Exercise 4: Regular Expressions 📗

📋 Objectives

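Learn to match text with regular expressions using the re module:

Use re.match() to match at the start of a string
Use re.search() to scan the whole string
Use re.findall() to find all matches
Learn the common regex metacharacters
Understand greedy vs. non-greedy matching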

💡 Key Concepts Review

Course reference: 05.处理响应.pdf (Handling Responses), section 5.3 Regular Expressions

Core functions:

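re.match(pattern, string) - match only at the beginning of the string
re.search(pattern, string) - scan the whole string and return the first match
re.findall(pattern, string) - return all non-overlapping matches as a list
re.sub(pattern, repl, string) - replace every match with repl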
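The sketch below runs all four functions on the same made-up sample string so you can compare them directly:

```python
import re

# A made-up sample string for comparing the four functions
text = "Order 42 shipped on 2024-05-01; order 7 is pending."

# re.match only tries position 0; the text starts with "Order", not a digit
at_start = re.match(r'\d+', text)
print(at_start)  # None

# re.match succeeds when the pattern fits at the very beginning
starts_with_word = re.match(r'Order', text)
print(starts_with_word.group())  # Order

# re.search scans the whole string and stops at the first match
first = re.search(r'\d+', text)
print(first.group())  # 42

# re.findall returns every non-overlapping match as a list of strings
all_numbers = re.findall(r'\d+', text)
print(all_numbers)  # ['42', '2024', '05', '01', '7']

# re.sub replaces every match; here each run of digits becomes "#"
masked = re.sub(r'\d+', '#', text)
print(masked)  # Order # shipped on #-#-#; order # is pending.
```

re.match and re.search return a match object, or None when nothing matches. Check the result before calling .group() on it.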

Common metacharacters:

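. - matches any single character except a newline
* - matches the preceding character (or group) 0 or more times
+ - matches the preceding character (or group) 1 or more times
? - matches the preceding character (or group) 0 or 1 times; placed after a quantifier (as in *?), it makes that quantifier non-greedy
.* - greedy (matches as much as possible); .*? - non-greedy (matches as little as possible)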
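Here is a quick sketch of each metacharacter, followed by the greedy vs. non-greedy difference on a made-up HTML fragment. Note that when a pattern contains one capture group, re.findall returns only that group's text:

```python
import re

# . matches any character except a newline, so "a\nc" is skipped
any_char = re.findall(r'a.c', 'abc a-c a\nc')
print(any_char)  # ['abc', 'a-c']

# * = 0 or more, + = 1 or more, ? = 0 or 1 of the preceding "b"
star = re.findall(r'ab*', 'a ab abb')
plus = re.findall(r'ab+', 'a ab abb')
optional = re.findall(r'ab?', 'a ab abb')
print(star)      # ['a', 'ab', 'abb']
print(plus)      # ['ab', 'abb']
print(optional)  # ['a', 'ab', 'ab']

html = '<td>zhangsan</td><td>lisi</td>'
# Greedy .* runs as far as it can, so it stops only at the LAST </td>
greedy = re.findall(r'<td>(.*)</td>', html)
print(greedy)  # ['zhangsan</td><td>lisi']
# Non-greedy .*? stops at the FIRST </td>, giving one result per cell
lazy = re.findall(r'<td>(.*?)</td>', html)
print(lazy)  # ['zhangsan', 'lisi']
```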

📝 Contact Information

Below is some contact information. Use regular expressions to extract the email addresses and phone numbers.

Name	Email	Phone
张三	zhangsan@example.com	13812345678
李四	lisi@example.com	13987654321
王五	wangwu@example.com	15612345678
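Before you fetch the live page, you can check this exercise's two patterns offline against the table above. This is a minimal sketch. It assumes the rows are copied as tab-separated plain text, while the live page serves them as HTML:

```python
import re

# The contact table above, copied as tab-separated plain text (assumed layout)
table = (
    "张三\tzhangsan@example.com\t13812345678\n"
    "李四\tlisi@example.com\t13987654321\n"
    "王五\twangwu@example.com\t15612345678\n"
)

# The two patterns used in this exercise
emails = re.findall(r'\w+@\w+\.\w+', table)
phones = re.findall(r'1[3-9]\d{9}', table)

# Pair each email with the phone number from the same row
for email, phone in zip(emails, phones):
    print(email, phone)
```

This works here because a tab separates each name from its email. If a Chinese name were written directly against the address, \w+ would absorb it, since \w matches Chinese characters in Python 3.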

💡 Example Code

Extracting email addresses

import requests
import re
from bs4 import BeautifulSoup

url = 'https://req.haleibc.com/practice4'
response = requests.get(url, timeout=10)
response.raise_for_status()  # fail fast on HTTP errors such as 404 or 500
soup = BeautifulSoup(response.content, 'html.parser')

# Get all table cells that contain an email address
email_cells = soup.find_all('td', class_='email-data')

# Extract the email addresses
emails = []
for cell in email_cells:
    text = cell.text
    # Email pattern: \w+@\w+\.\w+
    email = re.findall(r'\w+@\w+\.\w+', text)
    if email:
        emails.extend(email)

print("Emails:", emails)
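If the page markup is simple and predictable, you can also run the regex directly on the HTML source and skip BeautifulSoup. This is a minimal sketch. The HTML string below is hypothetical (including the `name-data` class) and is only assumed to resemble the practice page. On the real page you would pass `response.text` instead:

```python
import re

# Hypothetical markup, ASSUMED to resemble the practice page's table rows
html = '''
<tr><td class="name-data">张三</td><td class="email-data">zhangsan@example.com</td></tr>
<tr><td class="name-data">李四</td><td class="email-data">lisi@example.com</td></tr>
'''

# Capture the text of each email cell; the non-greedy .*? stops at the first </td>
cells = re.findall(r'<td class="email-data">(.*?)</td>', html)
print(cells)
```

This approach is more fragile than BeautifulSoup. A change in attribute order, in quoting, or an extra class on the cell will break the pattern.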

Extracting phone numbers

import requests
import re
from bs4 import BeautifulSoup

url = 'https://req.haleibc.com/practice4'
response = requests.get(url, timeout=10)
response.raise_for_status()  # fail fast on HTTP errors such as 404 or 500
soup = BeautifulSoup(response.content, 'html.parser')

# Get all table cells that contain a phone number
phone_cells = soup.find_all('td', class_='phone-data')

# Extract the phone numbers
phones = []
for cell in phone_cells:
    text = cell.text
    # Mobile number pattern: 1[3-9]\d{9} (a "1", then 3-9, then 9 more digits)
    phone = re.findall(r'1[3-9]\d{9}', text)
    if phone:
        phones.extend(phone)

print("Phone numbers:", phones)
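The pattern `1[3-9]\d{9}` has no boundaries, so it will also match 11 digits buried inside a longer number. The sketch below uses a made-up sample string to compare it with a variant that adds the lookarounds `(?<!\d)` and `(?!\d)`. Those lookarounds are an addition for illustration and are not part of the original exercise:

```python
import re

# Made-up text: a long order number followed by two real-looking mobile numbers
text = "Order 99913700001111999; call 13812345678 or 15612345678"

# Without boundaries, 11 digits inside the order number also match
loose = re.findall(r'1[3-9]\d{9}', text)

# (?<!\d) = not preceded by a digit, (?!\d) = not followed by a digit
strict = re.findall(r'(?<!\d)1[3-9]\d{9}(?!\d)', text)

print(loose)
print(strict)
```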

Using re.search()

import re

text = "My email is zhangsan@example.com, and my phone number is 13812345678"

# Search for an email address
email_match = re.search(r'\w+@\w+\.\w+', text)
if email_match:
    print(f"Found email: {email_match.group()}")

# Search for a phone number
phone_match = re.search(r'1[3-9]\d{9}', text)
if phone_match:
    print(f"Found phone: {phone_match.group()}")
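re.search() stops at the first match. When a string contains several matches and you also need their positions, re.finditer() yields one match object per hit. Here is a sketch on a made-up string:

```python
import re

text = "Backup email: lisi@example.com; main email: zhangsan@example.com"
pattern = r'\w+@\w+\.\w+'

# search: only the first email, plus its (start, end) position
first = re.search(pattern, text)
print(first.group(), first.span())

# finditer: every email, each with its own position
all_found = [(m.group(), m.span()) for m in re.finditer(pattern, text)]
for email, span in all_found:
    print(email, span)
```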

✅ Tasks

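1. Use regular expressions to extract all email addresses on the page

2. Use regular expressions to extract all phone numbers on the page

3. Try re.match(), re.search(), and re.findall() and compare how their results differ

4. Practice greedy and non-greedy matching and compare the difference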

💡 Common Regex Patterns

Email: \w+@\w+\.\w+ or [a-zA-Z0-9_-]+@[a-zA-Z0-9_-]+\.[a-zA-Z0-9_-]+
Mobile number (mainland China): 1[3-9]\d{9}
URL: https?://[^\s]+
Date (YYYY-MM-DD): \d{4}-\d{2}-\d{2}
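On Chinese pages, two of these patterns can over-match. In Python 3, `\w` matches Unicode letters by default, including Chinese characters. Also, the full-width comma `，` is not whitespace, so `[^\s]+` runs straight past it. The sketch below uses a made-up sample string. The `re.ASCII` flag and the `[^\s，]` class are just one way to tighten these patterns and are not part of the original list:

```python
import re

# Made-up Chinese page text: "email ..., URL ..., date ..." with no spaces
text = "邮箱zhangsan@example.com，网址https://req.haleibc.com/practice4，日期2024-05-01"

# \w also matches the Chinese characters before the address
unicode_email = re.findall(r'\w+@\w+\.\w+', text)

# re.ASCII restricts \w to [a-zA-Z0-9_]
ascii_email = re.findall(r'\w+@\w+\.\w+', text, flags=re.ASCII)

# The explicit ASCII character-class version from the list above
class_email = re.findall(r'[a-zA-Z0-9_-]+@[a-zA-Z0-9_-]+\.[a-zA-Z0-9_-]+', text)

# [^\s]+ runs past the full-width comma to the end of the string
loose_url = re.findall(r'https?://[^\s]+', text)
# Excluding the full-width comma as well stops the URL at the right place
tight_url = re.findall(r'https?://[^\s，]+', text)

dates = re.findall(r'\d{4}-\d{2}-\d{2}', text)

print(unicode_email, ascii_email, class_email)
print(loose_url, tight_url, dates)
```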
