Baidu Post Bar crawler gets web pages
2022-07-25 06:44:00 【You in Yangzhou】
"""Baidu Tieba crawler: download list pages of a post bar and save them as HTML files."""
import random
import time
from urllib import parse

import requests


class BaiduSpider:
    def __init__(self):
        # kw is the URL-encoded bar name; pn is the post offset of the requested page
        self.url = 'http://tieba.baidu.com/f?kw={}&pn={}'
        self.headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/13.0.782.41 Safari/535.1 QQBrowser/6.9.11079.201'
        }

    def get_html(self, url):
        """Request the page and return its body decoded as UTF-8."""
        response = requests.get(url=url, headers=self.headers, timeout=10)
        return response.content.decode('utf-8')

    def parse_html(self):
        """Placeholder: parsing is not implemented yet."""
        pass

    def save_html(self, filename, html):
        # Write as UTF-8 explicitly so the result does not depend on the platform default encoding
        with open(filename, 'w', encoding='utf-8') as f:
            f.write(html)

    def run(self):
        name = input('Name of the post bar: ')
        start = int(input('Start page: '))
        end = int(input('End page: '))
        params = parse.quote(name)
        for page in range(start, end + 1):
            # Page 1 maps to pn=0, page 2 to pn=50, and so on
            pn = (page - 1) * 50
            url = self.url.format(params, pn)
            html = self.get_html(url)
            filename = '{}_page_{}.html'.format(name, page)
            self.save_html(filename, html)
            print('Page {} saved'.format(page))
            # Pause 1 to 8 seconds between requests
            time.sleep(random.randint(1, 8))


if __name__ == '__main__':
    spider = BaiduSpider()
    spider.run()
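
The page-to-offset mapping in run() can be checked on its own, without sending any request. The following is only a minimal sketch: `build_url` and `POSTS_PER_PAGE` are hypothetical names that are not part of the original class, and the step of 50 is taken from the code above rather than from any Tieba documentation.

```python
from urllib import parse

URL_TEMPLATE = 'http://tieba.baidu.com/f?kw={}&pn={}'
POSTS_PER_PAGE = 50  # assumption: same step as the original code uses for pn


def build_url(name, page):
    """Return the list-page URL for a 1-based page number (hypothetical helper)."""
    if page < 1:
        raise ValueError('page must be >= 1')
    # Page 1 starts at offset 0, page 2 at offset 50, and so on
    pn = (page - 1) * POSTS_PER_PAGE
    # parse.quote percent-encodes non-ASCII bar names as UTF-8 bytes
    return URL_TEMPLATE.format(parse.quote(name), pn)


if __name__ == '__main__':
    for page in range(1, 4):
        print(build_url('python', page))
```

Keeping the URL construction in a separate function like this makes it possible to verify the offsets before the crawler starts downloading pages.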