Python Regular Expressions

2025-01-07 Technical Tutorial

Python provides regular expression support through the re module.

Constants

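    Constant                 Description
    re.M, re.MULTILINE       Multiline mode (^ and $ also match at the start and end of each line)
    re.S, re.DOTALL          Single-line mode (. also matches a newline)
    re.I, re.IGNORECASE      Ignore case
    re.X, re.VERBOSE         Ignore whitespace inside the expression

Use the | bitwise OR operator to enable several options at once.

Methods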

Compiling

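re.compile(pattern, flags=0)

Compiles the pattern with the given flags and returns a regular expression object (regex). pattern is the regular expression string and flags are the options.

A regular expression has to be compiled before it can be used. To improve efficiency, the compiled result is cached, so the same pattern does not need to be compiled again the next time it is used. The other functions in the re module go through the same compile step, which is why they also benefit from the cache.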
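As a minimal sketch of compiling with options, the following combines two flags with | and reuses the compiled object. The sample text and the names text and word_at_line_start are made up for illustration.

```python
import re

# Hypothetical sample text; any multi-line string works.
text = "Apple\nbanana\nAvocado"

# Combine options with bitwise OR: ignore case AND let ^ match at each line start.
word_at_line_start = re.compile(r"^a\w+", re.I | re.M)

# The compiled object is reused without passing the pattern again.
matches = word_at_line_start.findall(text)
print(matches)

# The flags are stored on the compiled object and can be tested with &.
print(bool(word_at_line_start.flags & re.I), bool(word_at_line_start.flags & re.M))
```

Without re.M only "Apple" would be found, and without re.I neither capitalized word would match.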

Single Match

re.match(pattern, string, flags=0)
regex.match(string[, pos[, endpos]])

match only matches at the beginning of the string. The match method of a regex object can additionally set the start and end positions. Returns a match object.

re.search(pattern, string, flags=0)
regex.search(string[, pos[, endpos]])

Searches from the beginning until the first match is found. The search method of a regex object can additionally set the start and end positions. Returns a match object.

re.fullmatch(pattern, string, flags=0)
regex.fullmatch(string[, pos[, endpos]])

The whole string must match the regular expression.

    import re

    s = '''bottle\nbag\nbig\napple'''
    for i, c in enumerate(s, 1):
        print((i-1, c), end='\n' if i % 8 == 0 else ' ')
    print()

Output of the index listing:

    (0, 'b') (1, 'o') (2, 't') (3, 't') (4, 'l') (5, 'e') (6, '\n') (7, 'b')
    (8, 'a') (9, 'g') (10, '\n') (11, 'b') (12, 'i') (13, 'g') (14, '\n') (15, 'a')
    (16, 'p') (17, 'p') (18, 'l') (19, 'e')

    # match method
    print('--match--')
    result = re.match('b', s)  # stops after the first match
    print(1, result)
    result = re.match('a', s)
    print(2, result)  # no match, returns None
    result = re.match('^a', s, re.M)  # still anchored at the start of the string; multiline mode does not help
    print(3, result)
    result = re.match('^a', s, re.S)  # still anchored at the start of the string
    print(4, result)
    # compile first, then use the regex object
    regex = re.compile('a')
    result = regex.match(s)  # still anchored at the start of the string
    print(5, result)
    result = regex.match(s, 15)  # start matching at index 15
    print(6, result)
    print()

    # search method
    print('--search--')
    result = re.search('a', s)  # scans for the first position that matches
    print(7, result)
    regex = re.compile('b')
    result = regex.search(s, 1)
    print(8, result)  # the b of bag
    regex = re.compile('^b', re.M)
    result = regex.search(s)  # multiline or not, returns as soon as a match is found
    print(8.5, result)  # the b of bottle
    result = regex.search(s, 8)
    print(9, result)  # the b of big

    # fullmatch method
    result = re.fullmatch('bag', s)
    print(10, result)
    regex = re.compile('bag')
    result = regex.fullmatch(s)
    print(11, result)
    result = regex.fullmatch(s, 7)
    print(12, result)
    result = regex.fullmatch(s, 7, 10)
    print(13, result)  # must match exactly, no more and no less; range [7, 10)

Full-Text Matching

re.findall(pattern, string, flags=0)
regex.findall(string[, pos[, endpos]])

Matches the whole string from left to right and returns a list of all matches.

re.finditer(pattern, string, flags=0)
regex.finditer(string[, pos[, endpos]])

Matches the whole string from left to right and returns all matches as an iterator. Note that each iteration yields a match object.

    import re

    s = '''bottle\nbag\nbig\nable'''
    for i, c in enumerate(s, 1):
        print((i-1, c), end='\n' if i % 8 == 0 else ' ')
    print()

Output of the index listing:

    (0, 'b') (1, 'o') (2, 't') (3, 't') (4, 'l') (5, 'e') (6, '\n') (7, 'b')
    (8, 'a') (9, 'g') (10, '\n') (11, 'b') (12, 'i') (13, 'g') (14, '\n') (15, 'a')
    (16, 'b') (17, 'l') (18, 'e')

    # findall method
    result = re.findall('b', s)
    print(1, result)
    regex = re.compile('^b')
    result = regex.findall(s)
    print(2, result)
    regex = re.compile('^b', re.M)
    result = regex.findall(s, 7)
    print(3, result)  # the b of bag and of big
    regex = re.compile('^b', re.S)
    result = regex.findall(s)
    print(4, result)  # the b of bottle
    regex = re.compile('^b', re.M)
    result = regex.findall(s, 7, 10)
    print(5, result)  # the b of bag

    # finditer method
    result = regex.finditer(s)
    print(type(result))
    print(next(result))
    print(next(result))
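To make the difference between findall and finditer concrete, this sketch collects both the matched text and its position. It reuses the sample string from the example; the pattern and the name line_starts are assumptions made for this illustration.

```python
import re

s = "bottle\nbag\nbig\nable"

# Hypothetical pattern: a word starting with 'b' at the start of any line.
regex = re.compile(r"^b\w+", re.M)

# findall returns plain strings ...
words = regex.findall(s)

# ... while finditer yields match objects, which also carry positions.
line_starts = [(m.group(), m.start(), m.end()) for m in regex.finditer(s)]

print(words)
print(line_starts)
```

finditer is the better fit when the position of each match matters or when the input is large, since matches are produced one at a time.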

Match and Replace

re.sub(pattern, replacement, string, count=0, flags=0)
regex.sub(replacement, string, count=0)

Matches pattern against string and replaces each match with replacement. replacement can be a string, bytes, or a function.

re.subn(pattern, replacement, string, count=0, flags=0)
regex.subn(replacement, string, count=0)

Same as sub, but returns a tuple (new_string, number_of_subs_made).

    import re

    s = '''bottle\nbag\nbig\napple'''
    for i, c in enumerate(s, 1):
        print((i-1, c), end='\n' if i % 8 == 0 else ' ')
    print()

Output of the index listing:

    (0, 'b') (1, 'o') (2, 't') (3, 't') (4, 'l') (5, 'e') (6, '\n') (7, 'b')
    (8, 'a') (9, 'g') (10, '\n') (11, 'b') (12, 'i') (13, 'g') (14, '\n') (15, 'a')
    (16, 'p') (17, 'p') (18, 'l') (19, 'e')

    # replacement methods (raw strings keep the backslashes intact)
    regex = re.compile(r'b\wg')
    result = regex.sub('magedu', s)
    print(1, result)
    result = regex.sub('magedu', s, 1)
    print(2, result)
    regex = re.compile(r'\s+')
    result = regex.subn('\t', s)
    print(3, result)  # tuple of the new string and the number of replacements

Splitting Strings

re.split(pattern, string, maxsplit=0, flags=0)

re.split splits a string wherever the pattern matches.

    import re

    s = '''01 bottle\n02 bag\n03 big1\n100 able'''
    for i, c in enumerate(s, 1):
        print((i-1, c), end='\n' if i % 8 == 0 else ' ')
    print()

    # goal: extract the word from each line
    print(s.split())  # str.split cannot do it: ['01', ..., 'big1', ...]
    result = re.split(r'[\s\d]+', s)
    print(1, result)
    regex = re.compile(r'^[\s\d]+')
    result = regex.split(s)
    print(2, result)
    regex = re.compile(r'^[\s\d]+', re.M)
    result = regex.split(s)
    print(3, result)
    regex = re.compile(r'\s\d+\s+')
    result = regex.split(' ' + s)
    print(4, result)
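The text mentions that the replacement may be a function, while the example only passes strings. Here is a small sketch of a callable replacement. The function name shout and its upper-casing behavior are assumptions chosen for illustration; the function receives each match object and returns the replacement text.

```python
import re

s = "bottle\nbag\nbig\napple"

def shout(match):
    # Called once per match; must return the replacement string.
    return match.group().upper()

regex = re.compile(r"b\wg")

# Replace every match using the function.
replaced = regex.sub(shout, s)

# subn also reports how many substitutions were made; count=1 stops after the first.
replaced_once, n = regex.subn(shout, s, count=1)

print(repr(replaced))
print(repr(replaced_once), n)
```

A function is useful whenever the replacement depends on what was matched, which a fixed string cannot express.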

Groups

Data captured by the parenthesized parts of a pattern is placed into groups.

The match and search functions return a match object. findall returns a list of strings (or a list of tuples when the pattern contains more than one group). finditer returns an iterator of match objects.

If the pattern uses groups and a match is found, the match object offers the following:

1. group(N) returns the corresponding group. 1 to N are the groups, and 0 returns the entire matched string.
2. If named groups are used, group('name') returns the group with that name.
3. groupdict() returns all named groups.

    import re

    s = '''bottle\nbag\nbig\napple'''
    for i, c in enumerate(s, 1):
        print((i-1, c), end='\n' if i % 8 == 0 else ' ')
    print()

    # groups
    regex = re.compile(r'(b\w+)')
    result = regex.match(s)
    print(type(result))
    print(1, 'match', result.groups())
    result = regex.search(s, 8)
    print(2, 'match', result.groups())

    # named groups
    regex = re.compile(r'(b\w+)\n(?P<name2>b\w+)\n(?P<name3>b\w+)')
    result = regex.match(s)
    print(3, 'match', result)
    print(4, result.group(3), result.group(2), result.group(1))
    print(5, result.group(0).encode())  # 0 returns the entire matched string
    print(6, result.group('name2'), result.group('name3'))
    print(7, result.groups())
    print(8, result.groupdict())

    result = regex.findall(s)
    for x in result:  # with three groups, each item is a tuple of strings
        print(type(x), x)

    regex = re.compile(r'(?P<head>b\w+)')
    result = regex.finditer(s)
    for x in result:
        print(type(x), x, x.group(), x.group('head'))
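As a sketch of how named groups are typically put to work, the following pulls two fields out of each line by name. The input format (a number followed by a word) and the group names num and word are invented for this example.

```python
import re

# Hypothetical input: each line is a number followed by a word.
s = "01 bottle\n02 bag\n03 big1\n100 able"

regex = re.compile(r"^(?P<num>\d+)\s+(?P<word>\w+)$", re.M)

# groupdict() maps each group name to the captured text.
records = [m.groupdict() for m in regex.finditer(s)]

# With more than one group, findall returns a list of tuples.
pairs = regex.findall(s)

print(records)
print(pairs)
```

Named groups keep the calling code readable when a pattern grows, because fields are looked up by name instead of by position.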