Crawler Career from Scratch (IV): Crawling Bilibili (Station B) Bullet Comments via the API
2022-07-03 09:19:00 【fishfuck】
Preface
In this article we use Bilibili's API to crawl the bullet comments (danmaku) of a Bilibili video. This is the fourth article in this series.
The page to crawl
Analysis
Crawling approach
There is not much to analyze. We call two APIs found online: one that maps a video's BV number to its danmaku pool number (cid), and the danmaku pool API that returns the comments stored under that cid.
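The two-step flow can be sketched as two URL builders. This is a minimal sketch: the endpoints and query parameters are the ones the code below uses, while the helper names `pagelist_url` and `danmaku_url` are hypothetical. `urlencode` handles escaping instead of string concatenation.

```python
from urllib.parse import urlencode

# Endpoints as used in the article's code; the helper names are hypothetical.
PAGELIST_API = "https://api.bilibili.com/x/player/pagelist"
DANMAKU_API = "https://api.bilibili.com/x/v1/dm/list.so"


def pagelist_url(bvid: str) -> str:
    """Step 1: URL that maps a BV number to the video's part list (with cids)."""
    return PAGELIST_API + "?" + urlencode({"bvid": bvid, "jsonp": "jsonp"})


def danmaku_url(cid: int) -> str:
    """Step 2: URL of the danmaku pool (XML) for one cid."""
    return DANMAKU_API + "?" + urlencode({"oid": cid})
```

With requests, passing the same dict as `params=` to `requests.get` produces an equivalent query string.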
The crawler code
1. Development environment
Environment: Windows 10, Python 3.6.8
Tools: PyCharm
Third-party libraries: requests, BeautifulSoup (bs4), html5lib (the parser passed to BeautifulSoup)
2. Code walkthrough
(1). Import libraries
import requests
import json
from bs4 import BeautifulSoup
import re
(2). Get the cid (danmaku pool number)
def bvid2cid(bvid):  # Get the video's cid (danmaku pool number)
    url = "https://api.bilibili.com/x/player/pagelist?bvid=" + str(bvid) + "&jsonp=jsonp"
    r = requests.get(url)
    data = json.loads(r.text)
    # 'data' lists every part (page) of the video; take the first part's cid
    cid = data['data'][0]['cid']
    return cid
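bvid2cid only returns the first part's cid. For a multi-part video, a sketch like the following collects every part's cid from the same JSON response. It assumes the response shape that bvid2cid already indexes (`data` is a list of parts, each with a `cid`), plus a top-level `code` that is 0 on success; that `code` convention and the name `cids_from_pagelist` are assumptions, not something the article states.

```python
import json
from typing import List


def cids_from_pagelist(payload: str) -> List[int]:
    """Return the cid of every part listed in a pagelist JSON response.

    Assumed shape: {"code": 0, "data": [{"cid": ...}, ...]}.
    """
    response = json.loads(payload)
    # Assumption: a non-zero "code" signals an API error
    if response.get("code", 0) != 0:
        raise ValueError("API error {}: {}".format(response.get("code"), response.get("message")))
    return [part["cid"] for part in response["data"]]
```

Each returned cid can then be passed to cid2data, giving one output file per part.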
(3). Fetch the bullet comments
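def cid2data(cid):  # Download the danmaku pool and append each comment to <cid>.txt
    url = 'https://api.bilibili.com/x/v1/dm/list.so?oid=' + str(cid)
    r = requests.get(url=url)
    r.encoding = 'utf-8'
    html = BeautifulSoup(r.text, 'html5lib')
    ds = html.find_all('d')  # each <d> element holds one bullet comment
    # Open the output file once instead of once per comment
    with open(str(cid) + '.txt', 'a', encoding='utf-8') as f:
        for d in ds:
            # get_text() returns unescaped text ("&amp;" becomes "&"), unlike a regex over str(d)
            f.write(d.get_text() + '\n')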
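Since the danmaku endpoint returns XML, the standard library's XML parser can stand in for BeautifulSoup plus html5lib. A minimal sketch with `xml.etree.ElementTree`: it assumes each comment is a `<d>` element whose `p` attribute is a comma-separated list starting with the comment's appearance time in seconds (an assumption about the format, not stated in the article), and `parse_danmaku` is a hypothetical name.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple, Union


def parse_danmaku(xml_text: Union[str, bytes]) -> List[Tuple[float, str]]:
    """Return (appearance time, comment text) pairs sorted by time."""
    root = ET.fromstring(xml_text)
    comments = []
    for d in root.iter("d"):
        # Assumption: the first field of "p" is the time offset in seconds
        appear_at = float(d.get("p", "0").split(",")[0])
        # ElementTree already unescapes entities such as &amp;
        comments.append((appear_at, d.text or ""))
    return sorted(comments)
```

In cid2data, this would be called as `parse_danmaku(r.content)`; passing the raw bytes lets the parser read the encoding from the XML declaration.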
3. Complete code
import requests
import json
from bs4 import BeautifulSoup


def bvid2cid(bvid):  # Get the video's cid (danmaku pool number)
    url = "https://api.bilibili.com/x/player/pagelist?bvid=" + str(bvid) + "&jsonp=jsonp"
    r = requests.get(url)
    data = json.loads(r.text)
    # 'data' lists every part (page) of the video; take the first part's cid
    cid = data['data'][0]['cid']
    return cid


def cid2data(cid):  # Download the danmaku pool and append each comment to <cid>.txt
    url = 'https://api.bilibili.com/x/v1/dm/list.so?oid=' + str(cid)
    r = requests.get(url=url)
    r.encoding = 'utf-8'
    html = BeautifulSoup(r.text, 'html5lib')
    ds = html.find_all('d')  # each <d> element holds one bullet comment
    with open(str(cid) + '.txt', 'a', encoding='utf-8') as f:
        for d in ds:
            f.write(d.get_text() + '\n')


if __name__ == '__main__':
    cid = bvid2cid('BV1gp4y1e7cE')
    cid2data(cid)
Crawling results
The crawl succeeded: the comments were saved to <cid>.txt, one per line.
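To check the result beyond eyeballing the file, a small helper can count the saved comments and list the most frequent ones. This is a sketch that assumes the one-comment-per-line `<cid>.txt` format written by cid2data; `summarize` is a hypothetical name.

```python
from collections import Counter
from typing import List, Tuple


def summarize(path: str, top: int = 5) -> Tuple[int, List[Tuple[str, int]]]:
    """Return (number of comments, most common comments with their counts)."""
    with open(path, encoding="utf-8") as f:
        # Skip blank lines; strip only the trailing newline from each comment
        lines = [line.rstrip("\n") for line in f if line.strip()]
    return len(lines), Counter(lines).most_common(top)
```

Because cid2data opens the file in append mode, running the script twice doubles every count; delete the old file before re-crawling.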