Crawl Zhejiang industry and trade news page
2022-07-04 10:26:00 【weixin_46364000】
import requests
import chardet
from lxml import etree


def strip_label(text, label):
    """Remove a leading label as an exact prefix.

    str.lstrip(label) would instead strip any leading characters that occur
    in `label`, which can also eat the start of the value itself.
    """
    text, label = text.strip(), label.strip()
    return text[len(label):].strip() if text.startswith(label) else text


def jiexi(rep):
    """Parse one article page into a dict of title, author, source, time and body."""
    et = etree.HTML(rep.text)
    biaoti = et.xpath("//h2/text()")[0]
    # The first "zz" div holds author and source; the second holds the release time.
    # The label literals are reproduced as they appear in this post and must
    # match the label text used on the page itself.
    zz1 = et.xpath("//div[@class='zz'][1]/text()")[0].split()
    zuozhe = strip_label(zz1[0], "author :")
    laiyuan = strip_label(zz1[1], "source :")
    shijian = strip_label(
        et.xpath("//div[@class='zz'][2]/text()")[0].split("\xa0\xa0")[1],
        "Release time :",
    )
    # Concatenate every text node of the body, then drop all whitespace.
    zw = "".join(et.xpath("//div[@class='nr-content-con fl']/div[1]//text()"))
    zhengwen = "".join(zw.split())
    return {
        "biaoti": biaoti,
        "zuozhe": zuozhe,
        "laiyuan": laiyuan,
        "shijian": shijian,
        "zhengwen": zhengwen,
    }


# The first list page, followed by the numbered pages 358.htm down to 350.htm.
url_list = ["http://www.zjitc.net/xwzx/xyxw.htm"]
for i in range(1, 10):
    url_list.append("http://www.zjitc.net/xwzx/xyxw/" + str(359 - i) + ".htm")

headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36"}
ul_head = "http://www.zjitc.net/"

for url in url_list:
    response = requests.get(url, headers=headers)
    response.encoding = chardet.detect(response.content)["encoding"]
    et = etree.HTML(response.text)

    # Collect the article links on this list page. The hrefs are relative
    # ("../" or "../../"), so strip the leading dots and slashes and prepend
    # the site root.
    ul_list = []
    for li in et.xpath("//div[@class='right-1']/ul/li"):
        href = li.xpath("./a/@href")[0]
        ul_list.append(ul_head + href.lstrip("./"))

    # Fetch and parse every article linked from this list page.
    wenzhang = []
    for ul in ul_list:
        rep = requests.get(ul, headers=headers)
        rep.encoding = chardet.detect(rep.content)["encoding"]
        d = jiexi(rep)
        wenzhang.append(d)
    print(wenzhang)
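
One detail in the field cleanup is worth illustrating: the author, source and time values are obtained by removing a leading label from the scraped text. `str.lstrip` takes a set of characters, not a prefix, so it can remove more than the label. The sketch below contrasts the two behaviours; the helper name `remove_prefix` and the sample strings are hypothetical and are not taken from the site.

```python
def remove_prefix(text, prefix):
    """Drop `prefix` only when `text` actually starts with it."""
    if text.startswith(prefix):
        return text[len(prefix):]
    return text


# Hypothetical sample: a label followed by a value made of the same letters.
label = "author:"
value = "author:arthur"

# lstrip removes every leading character found in the set {a, u, t, h, o, r, :},
# so the value is consumed together with the label.
stripped_chars = value.lstrip(label)

# The prefix-aware helper removes the label only.
stripped_prefix = remove_prefix(value, label)

print(repr(stripped_chars))
print(repr(stripped_prefix))
```

On Python 3.9 and later, the built-in `str.removeprefix` does the same job as the helper.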