Crawl Zhejiang industry and trade news page
2022-07-04 10:26:00 【weixin_ forty-six million three hundred and sixty-four thousand】
import re

import requests
import chardet
from lxml import etree


def after_label(text):
    # Return what follows the first colon (ASCII or full-width): "label:value" -> "value".
    # str.lstrip("label:") strips a set of characters, not a prefix, so it can eat into the value.
    return re.split(r"[::]", text, maxsplit=1)[-1].strip()


def jiexi(rep):
    # Parse one article page into a dict: title, author, source, release time, body text.
    et = etree.HTML(rep.text)
    biaoti = et.xpath("//h2/text()")[0]
    # The first "zz" div holds the author and the source, separated by whitespace.
    zz = et.xpath("//div[@class='zz'][1]/text()")[0].split()
    zuozhe = after_label(zz[0])
    laiyuan = after_label(zz[1])
    # The second "zz" div holds the release time after two non-breaking spaces.
    shijian = after_label(et.xpath("//div[@class='zz'][2]/text()")[0].split("\xa0\xa0")[1])
    # Concatenate every text node of the article body, then drop all whitespace.
    zw = "".join(et.xpath("//div[@class='nr-content-con fl']/div[1]//text()"))
    zhengwen = "".join(zw.split())
    return {
        "biaoti": biaoti,
        "zuozhe": zuozhe,
        "laiyuan": laiyuan,
        "shijian": shijian,
        "zhengwen": zhengwen,
    }


# The newest list page has its own URL; the next nine are numbered downward from 358.
url_list = ["http://www.zjitc.net/xwzx/xyxw.htm"]
for i in range(1, 10):
    url_list.append("http://www.zjitc.net/xwzx/xyxw/" + str(359 - i) + ".htm")

headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36"}
ul_head = "http://www.zjitc.net/"

for url in url_list:
    response = requests.get(url, headers=headers)
    # Let chardet guess the encoding from the raw bytes before decoding the page.
    response.encoding = chardet.detect(response.content)["encoding"]
    et = etree.HTML(response.text)
    ul_list = []
    for li in et.xpath("//div[@class='right-1']/ul/li"):
        # Article links are relative ("../" or "../../"); lstrip("./") removes every
        # leading "." and "/" so the rest can be appended to the site root.
        ul_list.append(ul_head + li.xpath("./a/@href")[0].lstrip("./"))
    # Fetch and parse every article linked from this list page.
    wenzhang = []
    for ul in ul_list:
        rep = requests.get(ul, headers=headers)
        rep.encoding = chardet.detect(rep.content)["encoding"]
        wenzhang.append(jiexi(rep))
    print(wenzhang)
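
A note on `str.lstrip`, which the script relies on: it treats its argument as a set of characters, not as a prefix, so it can also remove the start of the value that follows a label. The sketch below shows the effect and two safer alternatives from the standard library. The sample strings, the article path `info/1011/2345.htm`, and the helper `strip_prefix` are made up for illustration and are not taken from the site.

```python
from urllib.parse import urljoin


def strip_prefix(text, prefix):
    """Remove prefix only when text really starts with it (hypothetical helper)."""
    return text[len(prefix):] if text.startswith(prefix) else text


# lstrip keeps removing characters while they belong to the given set,
# so the leading "t", "h", "o" of the name are stripped as well.
raw = "author: thomas"
print(raw.lstrip("author: "))          # mas
print(strip_prefix(raw, "author: "))   # thomas

# urljoin resolves "../" segments against the page the link was found on,
# so no manual stripping of dots and slashes is needed.
list_page = "http://www.zjitc.net/xwzx/xyxw/358.htm"
href = "../../info/1011/2345.htm"      # made-up relative link
print(urljoin(list_page, href))        # http://www.zjitc.net/info/1011/2345.htm
```

For the `../` links this gives the same URL as prepending the site root, but it would also handle a link that carries no `../` prefix at all.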