A Brief Look at How Python Handles Strings - k6k4.com

This article uses Python 2.7 for its examples. Python 3.x is broadly similar, although it separates text (str) from bytes.

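1. Python file encoding

The encoding is usually declared as UTF-8 at the top of the file:

# encoding=utf8

Some files declare GBK instead, which is most common on Windows.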
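The declaration matters because the same character is stored as different bytes in each encoding, so it has to match how the file was actually saved. A minimal sketch of that difference follows; the sample text is an illustrative choice, not something from the original article.

```python
# -*- coding: utf-8 -*-
# The sample text is an illustrative choice.
text = u'中文'

utf8_bytes = text.encode('utf-8')  # each of these characters takes 3 bytes
gbk_bytes = text.encode('gbk')     # each of these characters takes 2 bytes

print(len(utf8_bytes))  # 6
print(len(gbk_bytes))   # 4
```

A file saved as GBK but declared as UTF-8 (or the reverse) therefore yields garbled or undecodable string literals.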

2. Common string operations

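s = 'I love python '
rs = s[::-1]  # reverse the string; rs is ' nohtyp evol I'
s[0]  # character at index 0: 'I'
s[-2]  # second-to-last character: 'n' (negative indices count from the end)
s[0:3]  # characters at indices 0 to 2 as a new string: 'I l' (half-open interval: start included, end excluded)
s.strip()  # returns a copy with whitespace (spaces, \n, \t, etc.) removed from both ends
array = s.split(' ')  # split on single spaces into a list; a crude tokenizer for English
print(array)  # ['I', 'love', 'python', '']; note that array has four items and the last is an empty string

ns = ','.join(array)  # join array with ',' into one string: 'I,love,python,'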
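The trailing empty string above comes from passing an explicit separator. As a small sketch of the alternative, calling split() with no argument splits on runs of whitespace and discards empty pieces at the ends; the variable names here are illustrative.

```python
s = 'I love python '

# split(' ') cuts at every single space, so the trailing space
# leaves an empty string at the end of the list.
with_sep = s.split(' ')

# split() with no argument splits on runs of whitespace and
# produces no empty strings at the ends.
no_sep = s.split()

print(with_sep)          # ['I', 'love', 'python', '']
print(no_sep)            # ['I', 'love', 'python']
print(','.join(no_sep))  # I,love,python
```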

Many more string methods are available. The easiest way to browse them is with IPython:

type s. at the IPython prompt and press TAB:

In [26]: s = 'abc'
In [27]: s.
s.capitalize s.format      s.isupper     s.rindex      s.strip
s.center      s.index       s.join        s.rjust       s.swapcase
s.count       s.isalnum     s.ljust       s.rpartition s.title
s.decode      s.isalpha     s.lower       s.rsplit      s.translate
s.encode      s.isdigit     s.lstrip      s.rstrip      s.upper
s.endswith    s.islower     s.partition   s.split       s.zfill
s.expandtabs s.isspace     s.replace     s.splitlines
s.find        s.istitle     s.rfind       s.startswith

To see the detailed documentation for a method, append ? to its name, for example:

In [27]: s.index?
Type: builtin_function_or_method
String Form:
Docstring:
S.index(sub [,start [,end]]) -> int

Like S.find() but raise ValueError when the substring is not found.
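The docstring's distinction between index and find can be checked directly. The sketch below also shows dir(), which lists the same methods when IPython is not available; the names raised and public_methods are illustrative.

```python
s = 'abc'

# find() signals a missing substring with -1.
print(s.find('z'))  # -1

# index() raises ValueError in the same situation.
raised = False
try:
    s.index('z')
except ValueError:
    raised = True
print(raised)  # True

# In a plain interpreter, dir() lists the methods and help(s.index)
# prints the docstring shown above.
public_methods = [name for name in dir(s) if not name.startswith('_')]
print('index' in public_methods)  # True
```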

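3. Long strings

Python code sometimes needs very long strings, such as SQL statements or long log messages. These easily exceed the 80-character line limit and hurt readability, while joining pieces with + is said to be inefficient and does not look much better.

Hence this tidier style: adjacent string literals wrapped in parentheses across several lines, which Python treats as a single string: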

ss = ('select a.name, a.age, a.class, '
      'b.content, b.url, b.title, b.time '
      'from user a '
      'left join page b on a.userId=b.userId ')
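As a quick check, the parenthesized form produces exactly the same string as explicit concatenation. The sketch also shows str.join for pieces that are only known at run time; the columns list is hypothetical and not part of the original query.

```python
ss = ('select a.name, a.age, a.class, '
      'b.content, b.url, b.title, b.time '
      'from user a '
      'left join page b on a.userId=b.userId ')

# The same text built with explicit + operators.
same = ('select a.name, a.age, a.class, ' +
        'b.content, b.url, b.title, b.time ' +
        'from user a ' +
        'left join page b on a.userId=b.userId ')

print(ss == same)  # True

# Adjacent-literal joining only works for literals; for run-time
# pieces, str.join is a common alternative.
columns = ['a.name', 'a.age', 'b.title']  # hypothetical column list
query = 'select ' + ', '.join(columns) + ' from user a'
print(query)  # select a.name, a.age, b.title from user a
```

Note that each literal has to carry its own trailing space, since nothing is inserted between adjacent literals.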

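4. Chinese characters

Chinese, Japanese and Korean characters take multiple bytes in encodings such as UTF-8 and GBK, which makes them slightly more complicated to handle than English text.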

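utf8 = '我爱机器学习算法与Python学习公众号'  # byte string, UTF-8 encoded when the file is saved and declared as UTF-8
uni = u'我爱机器学习算法与Python学习公众号'  # unicode string; named uni to avoid shadowing the built-in unicode type
utf8.decode('utf8') == uni  # True
uni.encode('utf8') == utf8  # True
len(utf8) == 48  # 14 Chinese characters at 3 bytes each in UTF-8, plus 6 ASCII bytes for 'Python'
len(uni) == 20  # 14 Chinese characters plus 6 ASCII characters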
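For comparison, here is a sketch of the same checks under Python 3, where str is already Unicode text and the encoded form is a separate bytes type. The variable names text and data are illustrative.

```python
# Python 3 only: str holds Unicode text, bytes holds encoded data.
text = '我爱机器学习算法与Python学习公众号'
data = text.encode('utf8')

print(data.decode('utf8') == text)  # True
print(len(text))  # 20 characters: 14 Chinese plus 6 ASCII
print(len(data))  # 48 bytes: 14 * 3 plus 6
print(type(text).__name__, type(data).__name__)  # str bytes
```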

To convert GBK to UTF-8,

first decode the bytes to unicode, then encode that unicode as UTF-8:

gbk.decode('gbk').encode('utf8')  # gbk is a GBK-encoded byte string
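The one-liner can be wrapped in a small helper. This is a minimal sketch: the function name gbk_to_utf8 and the sample text are hypothetical, and the GBK input is built from a Unicode literal so that the example is self-contained.

```python
def gbk_to_utf8(gbk_bytes):
    """Re-encode a GBK byte string as UTF-8 (hypothetical helper)."""
    return gbk_bytes.decode('gbk').encode('utf8')

# Build sample GBK input from Unicode escapes (illustrative sample text).
sample = u'\u4e2d\u6587'
gbk = sample.encode('gbk')
utf8 = gbk_to_utf8(gbk)

print(len(gbk))   # 4, two bytes per character in GBK
print(len(utf8))  # 6, three bytes per character in UTF-8
print(utf8.decode('utf8') == sample)  # True
```

If the input is not valid GBK, decode raises UnicodeDecodeError, so real data may need an errors argument such as 'replace'.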

Author: 郭昱良