Scraping Classic Novels with Python

Posted by Admin on 2023-08-31 08:04:43 in Software Development


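Python is a powerful programming language that can be used to scrape many kinds of text from the internet. One use is downloading electronic editions of classic novels for research or study. This article shows how to do that with Python.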

import requests
from bs4 import BeautifulSoup

# Table-of-contents page for Romance of the Three Kingdoms
url = 'http://www.shicimingju.com/book/sanguoyanyi.html'
html = requests.get(url).text
soup = BeautifulSoup(html, 'lxml')
# Each <a> in the table of contents links to one chapter
titles = soup.select('.book-mulu >ul >li >a')
for title in titles:
    # Build the chapter URL by prepending the host to the link's href
    chapter_url = 'http://www.shicimingju.com' + title['href']
    article_html = requests.get(chapter_url).text
    article_soup = BeautifulSoup(article_html, 'lxml')
    article_title = article_soup.h1.text
    article_content = article_soup.select('.chapter_content')[0].text
    # Append the chapter title and body to the output file
    with open('san_guo_yan_yi.txt', 'a', encoding='utf-8') as f:
        f.write(article_title)
        f.write('\n\n')
        f.write(article_content)
        f.write('\n\n')

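In the code above, we first call requests.get() to fetch the HTML of the table-of-contents page for Romance of the Three Kingdoms (三国演义), then parse it with BeautifulSoup to obtain the list of chapter links (.book-mulu >ul >li >a). We then iterate over that list, build each chapter's URL from its link, fetch the chapter page, and append the chapter title and content to a txt file.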
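The two parsing steps can also be pulled out of the network loop so they can be checked offline. The following is a minimal sketch under stated assumptions: the function names extract_chapter_links and extract_chapter are hypothetical, the sample HTML and its href paths are made up to mirror the selectors used in the article, and it uses Python's built-in 'html.parser' instead of 'lxml' so that no extra parser needs to be installed.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def extract_chapter_links(index_html, base_url):
    """Return absolute chapter URLs found in a table-of-contents page."""
    soup = BeautifulSoup(index_html, 'html.parser')
    anchors = soup.select('.book-mulu > ul > li > a')
    # urljoin resolves relative hrefs against the base URL
    return [urljoin(base_url, a['href']) for a in anchors]


def extract_chapter(chapter_html):
    """Return (title, content) for a single chapter page."""
    soup = BeautifulSoup(chapter_html, 'html.parser')
    title = soup.h1.get_text(strip=True)
    content = soup.select_one('.chapter_content').get_text()
    return title, content


# Made-up sample pages (assumption: the real pages use the same class names)
SAMPLE_INDEX = '''
<div class="book-mulu"><ul>
  <li><a href="/book/sanguoyanyi/1.html">Chapter 1</a></li>
  <li><a href="/book/sanguoyanyi/2.html">Chapter 2</a></li>
</ul></div>
'''
SAMPLE_CHAPTER = '<h1>Chapter 1</h1><div class="chapter_content">Some text.</div>'

links = extract_chapter_links(SAMPLE_INDEX, 'http://www.shicimingju.com')
title, content = extract_chapter(SAMPLE_CHAPTER)
print(links)
print(title, '|', content)
```

Using urljoin rather than string concatenation is a design choice here: it gives the same result for site-relative links and also copes with hrefs that are already absolute.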

import requests
from bs4 import BeautifulSoup

# Table-of-contents page for Dream of the Red Chamber
url = 'http://www.shicimingju.com/book/hongloumeng.html'
html = requests.get(url).text
soup = BeautifulSoup(html, 'lxml')
# Each <a> in the table of contents links to one chapter
titles = soup.select('.book-mulu >ul >li >a')
for title in titles:
    # Build the chapter URL by prepending the host to the link's href
    chapter_url = 'http://www.shicimingju.com' + title['href']
    article_html = requests.get(chapter_url).text
    article_soup = BeautifulSoup(article_html, 'lxml')
    article_title = article_soup.h1.text
    article_content = article_soup.select('.chapter_content')[0].text
    # Append the chapter title and body to the output file
    with open('hong_lou_meng.txt', 'a', encoding='utf-8') as f:
        f.write(article_title)
        f.write('\n\n')
        f.write(article_content)
        f.write('\n\n')

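The code above scrapes Dream of the Red Chamber (红楼梦). To scrape another classic you are interested in, change only url and the output file name, provided the site hosts that book with the same page layout.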
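Since only the URL and the file name change between books, the script can be wrapped in one reusable function. This is a sketch rather than the article's own code: scrape_book, fetch_html, and the fetch and delay parameters are hypothetical names introduced for illustration, 'html.parser' is swapped in for 'lxml', and the selectors are assumed to match the site as in the scripts above.

```python
import time
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup


def fetch_html(url):
    """Download a page and return its text; raises on HTTP errors."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.text


def scrape_book(index_url, out_path, fetch=fetch_html, delay=0.0):
    """Write every chapter linked from index_url to out_path.

    fetch is any callable mapping a URL to HTML text, which makes it
    possible to substitute canned pages when testing.
    Returns the number of chapters written.
    """
    index_soup = BeautifulSoup(fetch(index_url), 'html.parser')
    anchors = index_soup.select('.book-mulu > ul > li > a')
    # Open the file once in write mode so a rerun does not duplicate chapters
    with open(out_path, 'w', encoding='utf-8') as f:
        for a in anchors:
            chapter_url = urljoin(index_url, a['href'])
            chapter_soup = BeautifulSoup(fetch(chapter_url), 'html.parser')
            title = chapter_soup.h1.get_text(strip=True)
            content = chapter_soup.select_one('.chapter_content').get_text()
            f.write(f'{title}\n\n{content}\n\n')
            time.sleep(delay)  # optional pause between requests
    return len(anchors)


# Example call (performs real network requests, so it is left commented out):
# scrape_book('http://www.shicimingju.com/book/hongloumeng.html',
#             'hong_lou_meng.txt', delay=1.0)
```

Unlike the scripts above, this version opens the output file once in 'w' mode instead of appending per chapter, so running it twice overwrites the file rather than doubling its contents.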

Source: 丸子建站

https://www.wanzijz.com/view/75613.html
