Scraping Qichacha with Python

Admin, 2023-08-30 08:07:58, Software Development

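Python crawlers are automated data-collection tools with wide application in many fields. This article describes how to use Python to scrape basic company information from the Qichacha website.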

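First, import the necessary libraries: requests and BeautifulSoup. requests is a library for sending HTTP requests, which lets us fetch the specific page we want to scrape; BeautifulSoup is a library for parsing HTML and XML documents, which helps us extract the information we need.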

import requests
from bs4 import BeautifulSoup

Next, define the URL of the target page and send a request to get its content.

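# URL of the company detail page to scrape
url = 'https://www.qichacha.com/firm_1b4d36ad9ad48dd31f8c079e722dbaca.html'
# Send a GET request for the page
req = requests.get(url)
# Raw response body as bytes; BeautifulSoup works out the encoding when parsing
html = req.content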
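
In practice, a request sent with default settings may be rejected or may hang, so it can help to set a browser-like User-Agent, a timeout, and an explicit status check. The following is a minimal sketch under stated assumptions, not the article's original method: the `fetch_html` helper, the `DEFAULT_HEADERS` value, and the 10-second timeout are all illustrative, and the sketch cannot guarantee that Qichacha serves the page without a login.

```python
import requests

# Assumed, illustrative header value; any realistic browser User-Agent string would do.
DEFAULT_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/120.0 Safari/537.36"
    )
}


def fetch_html(url, session=None, timeout=10):
    """Hypothetical helper: download a page and return its raw bytes.

    Raises requests.HTTPError on 4xx/5xx responses instead of silently
    returning an error page.
    """
    # Use the caller's session if one is given, otherwise the module-level API.
    client = session if session is not None else requests
    response = client.get(url, headers=DEFAULT_HEADERS, timeout=timeout)
    response.raise_for_status()  # fail loudly on HTTP error statuses
    return response.content
```

With this helper, `html = fetch_html(url)` would stand in for the two request lines above.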

With the page content in hand, we can use BeautifulSoup to parse the HTML document and extract the information we need.

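# Parse the HTML with Python's built-in parser
soup = BeautifulSoup(html, 'html.parser')
# Each lookup assumes the element exists; find() returns None if it does not
company_name = soup.find('h1', {'class': 'company-name'}).text.strip()
company_status = soup.find('div', {'class': 'status-icon'}).text.strip()
legal_person = soup.find('div', {'class': 'legalPersonName'}).text.strip()
# Registered capital is read from the second <dd> inside the 'tb-new' list
registered_capital = soup.find('dl', {'class': 'tb-new'}).find_all('dd')[1].text.strip()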
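
`soup.find` returns `None` when no element matches, so chaining `.text` directly raises `AttributeError` whenever the page layout differs from what the selectors expect. A minimal sketch of a more defensive lookup follows. The `find_text` helper and the `SAMPLE_HTML` fragment are hypothetical and exist only for illustration; the class names are the ones used in the code above.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment for illustration; the real page markup may differ.
SAMPLE_HTML = """
<h1 class="company-name"> Example Co., Ltd. </h1>
<div class="status-icon">Active</div>
"""


def find_text(soup, tag, class_name, default=None):
    """Return the stripped text of the first matching element, or `default`."""
    element = soup.find(tag, {"class": class_name})
    if element is None:
        return default
    return element.get_text(strip=True)


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
company_name = find_text(soup, "h1", "company-name")
company_status = find_text(soup, "div", "status-icon")
# No element with this class exists in the sample, so the default is returned.
legal_person = find_text(soup, "div", "legalPersonName", default="N/A")
```

Returning a default instead of raising lets the script record which fields were missing rather than stopping at the first absent element.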

Finally, we can print the extracted information or save it to a local file.

print('Company name:', company_name)
print('Company status:', company_status)
print('Legal representative:', legal_person)
print('Registered capital:', registered_capital)
with open('company_info.txt', 'w', encoding='utf-8') as f:
    f.write('Company name: ' + company_name + '\n')
    f.write('Company status: ' + company_status + '\n')
    f.write('Legal representative: ' + legal_person + '\n')
    f.write('Registered capital: ' + registered_capital + '\n')
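
If more fields are added later, the repeated `print` and `f.write` calls can be collapsed by keeping the values in a dictionary. This is only a sketch of one possible layout: the `format_company_info` and `save_company_info` helpers and the sample values are hypothetical, and the output keeps the one-field-per-line format written above.

```python
def format_company_info(info):
    """Render a dict of label -> value as one 'label: value' line per field."""
    return "".join(f"{label}: {value}\n" for label, value in info.items())


def save_company_info(info, path):
    """Hypothetical helper: write the formatted fields to a UTF-8 text file."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(format_company_info(info))


# Placeholder values for illustration, not real scraped data.
info = {
    "Company name": "Example Co., Ltd.",
    "Company status": "Active",
}
print(format_company_info(info), end="")
```

Calling `save_company_info(info, 'company_info.txt')` would then write the same lines to disk.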

With the code above, we can use Python to scrape basic company information from Qichacha.

Source: 丸子建站

https://www.wanzijz.com/view/75278.html
