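This level requires scraping two layers of pages, a list page and the detail pages it links to:
1. Extract the links to all detail pages
2. Loop over the links and request each detail page
3. Extract the complete information from each detail page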
import requests
from bs4 import BeautifulSoup

# Step 1: fetch the list page
url = 'https://req.haleibc.com/level5'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')

# Step 2: collect the detail page links
detail_links = []
for link in soup.find_all('a', class_='detail-link'):
    # href is assumed to be a root-relative path, so prepend the site origin
    detail_url = 'https://req.haleibc.com' + link['href']
    detail_links.append(detail_url)

# Step 3: visit each detail page
for detail_url in detail_links:
    response = requests.get(detail_url)
    soup = BeautifulSoup(response.text, 'html.parser')
    # Extract the detail page information...
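Joining the site origin and the href with `+` only works when every href is a root-relative path. As a hedged alternative, `urljoin` from the standard library resolves root-relative and absolute hrefs against the list page URL. The helper name `build_detail_urls` and the sample href values below are made up for illustration and are not taken from the actual page.

```python
from urllib.parse import urljoin

# URL of the list page, as used in the code above
LIST_URL = 'https://req.haleibc.com/level5'


def build_detail_urls(hrefs, base=LIST_URL):
    """Resolve each href against the list page URL, dropping duplicates in order."""
    seen = set()
    urls = []
    for href in hrefs:
        full = urljoin(base, href)  # handles root-relative and absolute hrefs
        if full not in seen:
            seen.add(full)
            urls.append(full)
    return urls


# Hypothetical href values; in the scraper they would come from link['href']
sample_hrefs = [
    '/level5/detail/1',
    'https://req.haleibc.com/level5/detail/2',
    '/level5/detail/1',  # repeated link, kept only once
]
detail_urls = build_detail_urls(sample_hrefs)
```

In the scraper itself, the list passed in would be `[link['href'] for link in soup.find_all('a', class_='detail-link')]`.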
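You should end up with detailed information for 5 books.
Each book includes the title, author, price, description, publication information, and so on.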
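A quick sanity check on the result can catch a selector that silently matched nothing. The sketch below assumes each book was collected as a dict. The key names (`title`, `author`, `price`, `description`, `publication_info`) are hypothetical labels for the fields listed above, and the sample records are placeholders rather than real scraped data.

```python
# Hypothetical key names for the fields each book should carry
REQUIRED_FIELDS = ('title', 'author', 'price', 'description', 'publication_info')
EXPECTED_COUNT = 5  # the level is expected to yield 5 books


def find_problems(books):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if len(books) != EXPECTED_COUNT:
        problems.append(f'expected {EXPECTED_COUNT} books, got {len(books)}')
    for index, book in enumerate(books, start=1):
        # A field counts as missing when the key is absent or its value is empty
        missing = [name for name in REQUIRED_FIELDS if not book.get(name)]
        if missing:
            problems.append(f'book {index} is missing: {", ".join(missing)}')
    return problems


# Placeholder records: four complete books and one with gaps
complete = {name: 'placeholder' for name in REQUIRED_FIELDS}
sample_books = [dict(complete) for _ in range(4)] + [{'title': 'placeholder', 'author': ''}]
problems = find_problems(sample_books)
```

Running the check right after the scraping loop makes it obvious whether a detail page failed to parse or a field selector needs adjusting.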