Using Python's re module

Common methods of the re module:

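match(): matches only at the beginning of the string
re.match() takes the regular expression as its first argument and the string to match as its second. It returns a Match object on success, or None if the pattern does not match at the start of the string.
group() returns the matched text.
span() returns the position of the match as a (start, end) tuple.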

```python
import re

content = 'The 123456 is my one phone number.'
print(len(content))                             # string length
result = re.match(r'The\s\d+\s\w*', content)    # pattern first, string second
print(result)                                   # the Match object
print(result.group())                           # the matched text
print(result.span())                            # (start, end) of the match

# Extracting a target with a capture group
content = 'The 123456 is my one phone number.'
print(len(content))                             # string length
result = re.match(r'^The\s(\d+)\sis', content)  # pattern first, string second
print(result)
print(result.group())                           # the whole match
print(result.group(1))                          # text captured by the first ( )
print(result.span())                            # position indices of the match
```
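Because match() only tries the start of the string, it returns None when the pattern is not found there, and calling group() on None raises AttributeError. A minimal sketch of checking for that, plus group(0), groups() and span() with a group number; the helper match_number is a hypothetical name used only for illustration.

```python
import re

def match_number(text):
    """Hypothetical helper: digits after a leading 'The ', or None if match() fails."""
    result = re.match(r'^The\s(\d+)', text)
    if result is None:  # match() only succeeds at position 0
        return None
    return result.group(1)

print(match_number('The 123456 is my one phone number.'))        # 123456
print(match_number('Other The 123456 is my one phone number.'))  # None

m = re.match(r'The\s(\d+)\s(\w+)', 'The 123456 is my one phone number.')
print(m.group(0))  # The 123456 is   (group(0) is the same as group())
print(m.groups())  # ('123456', 'is')  all captured groups as a tuple
print(m.span(1))   # (4, 10)  position of group 1 only
```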


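search(): does not have to match from the beginning of the string
re.search() is called the same way as match(): the first argument is the regular expression, the second is the string to search. It scans the whole string and returns the first match it finds.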

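```python
import re

content = 'Other The 123456 is my one phone number.'
# Raw string so \d reaches the regex engine unchanged; \. matches a literal dot
result = re.search(r'The.*?(\d+).*?number\.', content)
print(result.group())   # The 123456 is my one phone number.
print(result.group(1))  # 123456
```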
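To see the difference from match() on the same string, a short comparison; the second sample string, text, is made up for illustration. It also shows that search() stops at the first match, whereas findall() collects all of them.

```python
import re

content = 'Other The 123456 is my one phone number.'
pattern = r'The.*?(\d+).*?number\.'

print(re.match(pattern, content))  # None: the string starts with 'Other'
result = re.search(pattern, content)
print(result.group(1))             # 123456
print(result.span())               # (6, 40): the match begins after 'Other '

# search() returns only the first match; findall() returns every match
text = 'The 1 is one. The 22 is two.'
print(re.search(r'\d+', text).group())  # 1
print(re.findall(r'\d+', text))         # ['1', '22']
```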


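findall(): returns every substring that matches the pattern
findall() returns a list of all matches. If the pattern contains one group, each item is the captured text; with several groups, each item is a tuple.
findall() is often used together with compile(): compile() takes the pattern, and the findall() method of the compiled object searches a string for it.
For example:
contents = re.compile(...) compiles the pattern you want to look for
result = contents.findall(html.text) finds everything in the HTML text that matches the pattern in contents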
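A small sketch of compile() plus findall() on a made-up string, since html.text above presumably comes from a downloaded page (for example a requests response) that is not reproduced here. The patterns and sample text are illustrative assumptions; the sketch also shows how the return value changes with the number of groups.

```python
import re

text = 'Call 555-1234 or 555-9876.'

# Compile once, then reuse the Pattern object
phone = re.compile(r'\d{3}-\d{4}')
print(phone.findall(text))          # ['555-1234', '555-9876']  no groups: whole matches

area = re.compile(r'(\d{3})-\d{4}')
print(area.findall(text))           # ['555', '555']  one group: the captured text

pairs = re.compile(r'(\w+)=(\d+)')
print(pairs.findall('a=1, b=22'))   # [('a', '1'), ('b', '22')]  several groups: tuples

print(phone.findall('no numbers'))  # []  no match gives an empty list, not None
```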


```python
import re

html = """
<div id="songs-list">
<h2 class="title">歌单</h2>
<p class="introduction">歌单列表</p>
<ul id="list" class="list-group">
<li data-view="2">一路上有你</li>
<li data-view="7">
<a href="/2.mp3" singer="任贤齐">沧海一声笑</a>
</li>
<li data-view="4" class="active">
<a href="/3.mp3" singer="齐秦">往事随风</a>
</li>
<li data-view="6"><a href="/4.mp3" singer="beyond">光辉岁月</a></li>
<li data-view="5"><a href="/5.mp3" singer="程慧玲">记事本</a></li>
<li data-view="5">
<a href="/6.mp3" singer="邓丽君">但愿人长久</a>
</li>
</ul>
</div>
"""
# re.S (DOTALL) lets . also match newlines, so one match can span several lines
result = re.findall(r'<li.*?href="(.*?)".*?singer="(.*?)">(.*?)</a>', html, re.S)
if result:
    print(result)
    for res in result:
        print(res[0], res[1], res[2])  # link, singer, song title
```
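The re.S flag matters here because several <li> entries put the tag and the link on different lines. A cut-down sketch using two entries from the sample HTML above, run with and without re.S:

```python
import re

# Two entries from the sample HTML: one spread over several lines, one on a single line
html = '''<ul>
<li data-view="7">
<a href="/2.mp3" singer="任贤齐">沧海一声笑</a>
</li>
<li data-view="6"><a href="/4.mp3" singer="beyond">光辉岁月</a></li>
</ul>'''

pattern = r'<li.*?href="(.*?)".*?singer="(.*?)">(.*?)</a>'

with_s = re.findall(pattern, html, re.S)  # . may cross newlines
without_s = re.findall(pattern, html)     # . stops at every newline

print(with_s)     # [('/2.mp3', '任贤齐', '沧海一声笑'), ('/4.mp3', 'beyond', '光辉岁月')]
print(without_s)  # [('/4.mp3', 'beyond', '光辉岁月')]
```

Without re.S the multi-line entry is silently skipped rather than raising an error, which is why the call above passes re.S.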


### Basic file and directory operations:
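```python
import os
# os.path.exists(path)   check whether a path (file or directory) exists
# os.makedirs(path)      create a directory, including any missing parent directories
# os.mkdir(path)         create a single directory (its parent must already exist)
```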
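A minimal sketch of the three functions, run inside a temporary directory created with tempfile.mkdtemp() so nothing outside it is touched; the directory names a, b, c, d, x and y are arbitrary. It also uses the exist_ok=True argument of os.makedirs, which suppresses the error when the directory already exists.

```python
import os
import shutil
import tempfile

base = tempfile.mkdtemp()                   # throwaway directory for the demo
nested = os.path.join(base, 'a', 'b', 'c')

existed_before = os.path.exists(nested)     # False: nothing created yet
os.makedirs(nested)                         # creates a, a/b and a/b/c in one call
existed_after = os.path.exists(nested)      # True

os.mkdir(os.path.join(base, 'd'))           # creates exactly one new directory
d_exists = os.path.isdir(os.path.join(base, 'd'))

try:
    os.mkdir(os.path.join(base, 'x', 'y'))  # parent 'x' does not exist
    mkdir_failed = False
except FileNotFoundError:
    mkdir_failed = True                     # mkdir does not create parents

os.makedirs(nested, exist_ok=True)          # no error even though it already exists

print(existed_before, existed_after, d_exists, mkdir_failed)  # False True True True
shutil.rmtree(base)                         # remove the demo directory again
```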