Scraping a Taobao Shop with Python

Admin, 2023-08-24 08:12:28, Software Development

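Python's HTTP and HTML-parsing libraries make it straightforward to scrape product information from a Taobao shop. This article briefly walks through one way to do it, using requests to fetch the shop page and lxml to query it with XPath.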

import json

import requests
from lxml import etree


def first(matches):
    # xpath() returns a list; take the first match, or '' if nothing matched
    return matches[0].strip() if matches else ''


def get_shop_info(shop_url):
    # Build the request headers
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) '
                      'Chrome/58.0.3029.110 Safari/537.36'}
    # Fetch the shop page HTML; fail fast on timeouts and HTTP errors
    response = requests.get(shop_url, headers=headers, timeout=10)
    response.raise_for_status()
    # Parse the HTML so it can be queried with XPath
    xml_data = etree.HTML(response.text)
    # Shop name
    shop_name = first(xml_data.xpath('//div[@class="tb-shop-name"]/dl/dd/strong/a/text()'))
    # Product list
    goods_list = xml_data.xpath('//div[@class="shop-hesper-bd gridview"]/div')
    goods_info_list = []
    for goods in goods_list:
        # Product title
        title = first(goods.xpath('.//div[@class="item"]/a/img/@alt'))
        # Product price
        price = first(goods.xpath('.//div[@class="item"]/div[@class="price g_price g_price-highlight"]/strong/text()'))
        # Product URL
        url = first(goods.xpath('.//div[@class="item"]/a/@href'))
        # Product image
        img = first(goods.xpath('.//div[@class="item"]/a/img/@src'))
        # Assemble the information for one product
        goods_info = {'title': title, 'price': price, 'url': url, 'img': img}
        goods_info_list.append(goods_info)
    # Assemble the shop information
    shop_info = {'shop_name': shop_name, 'goods_list': goods_info_list}
    return shop_info


if __name__ == '__main__':
    shop_url = 'https://********.taobao.com/search.htm?orderType=hotsell_desc'
    shop_info = get_shop_info(shop_url)
    print(json.dumps(shop_info, ensure_ascii=False, indent=4))

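This code is a simple scraper: given the URL of a Taobao shop, it returns the shop name and the product list, which the script prints as JSON. Each product entry contains the title, price, URL, and image URL.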
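The scraped values are raw strings taken straight from the page, so some cleanup is usually useful before storing them. The following is a minimal sketch, not part of the original script: `normalize_goods` and `BASE_URL` are hypothetical names, and it assumes that links on the page may be protocol-relative (starting with `//`) and that the price text is a plain decimal number.

```python
from decimal import Decimal, InvalidOperation
from urllib.parse import urljoin

# Assumption: a placeholder base URL; replace it with the real shop address.
BASE_URL = "https://example.taobao.com/"


def normalize_goods(goods, base_url=BASE_URL):
    """Return a cleaned copy of one product record (hypothetical helper)."""
    cleaned = dict(goods)
    cleaned["title"] = goods["title"].strip()
    # urljoin resolves protocol-relative ("//host/path") and relative links
    # against the base URL and leaves absolute links unchanged.
    for key in ("url", "img"):
        cleaned[key] = urljoin(base_url, goods[key].strip())
    # Parse the price text into a Decimal; use None if it is not a number.
    try:
        cleaned["price"] = Decimal(goods["price"].strip())
    except InvalidOperation:
        cleaned["price"] = None
    return cleaned


# Made-up sample record, shaped like one entry of goods_list.
sample = {
    "title": " Sample item ",
    "price": " 19.90 ",
    "url": "//item.taobao.com/item.htm?id=1",
    "img": "//img.example.com/a.jpg",
}
print(normalize_goods(sample))
```

Because `Decimal` values are not JSON-serializable by default, pass `default=str` to `json.dumps` when printing normalized records.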

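Note that the scraper uses the requests module to fetch the shop page and XPath expressions to extract the data. Because Taobao's page structure changes frequently, the XPath expressions are tied to the layout at the time of writing. Check the live page regularly and update the code whenever the markup changes.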
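One way to notice a layout change early is to check the result for empty fields after each run. The sketch below is an illustration under stated assumptions rather than part of the original script: `check_shop_info` is a hypothetical helper, and it assumes the result has the same `shop_name` and `goods_list` keys as the dictionary built by `get_shop_info`.

```python
def check_shop_info(shop_info):
    """Return a list of problems found in a scraped result (hypothetical helper)."""
    problems = []
    # An empty shop name usually means the shop-name XPath no longer matches.
    if not shop_info.get("shop_name"):
        problems.append("shop name not found")
    goods_list = shop_info.get("goods_list") or []
    # An empty product list usually means the container XPath no longer matches.
    if not goods_list:
        problems.append("no products found")
    # Report every product that lacks one of the expected fields.
    for index, goods in enumerate(goods_list):
        missing = [key for key in ("title", "price", "url", "img") if not goods.get(key)]
        if missing:
            problems.append(f"product {index} is missing: {', '.join(missing)}")
    return problems


# A made-up result in which none of the selectors matched anything.
empty_result = {"shop_name": [], "goods_list": []}
print(check_shop_info(empty_result))
# prints ['shop name not found', 'no products found']
```

An empty list from `check_shop_info` means every expected field was populated. Any other result is a hint to open the page in a browser and compare its markup with the XPath expressions.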

Source: 丸子建站

Title: Scraping a Taobao Shop with Python

https://www.wanzijz.com/view/73856.html
