Crawler career from scratch (IV): scraping Bilibili danmaku (bullet comments) through the API
2022-07-03 09:19:00 【fishfuck】
Preface
In this article, the fourth in this series, we use Bilibili's API to crawl the danmaku (bullet comments) of a Bilibili video.
The page to be crawled

Analysis
Crawler approach
There is not much to analyze: we call two interfaces found online, one that converts a BV number into a danmaku pool number (cid), and the danmaku pool interface itself.
The crawler code
1. Development environment
Development environment: Windows 10, Python 3.6.8
Tools: PyCharm
Third-party libraries: requests, BeautifulSoup (with the html5lib parser)
2. Code breakdown
(1). Import libraries
import requests
import json
from bs4 import BeautifulSoup
import re
(2). Get the cid (danmaku pool number)
def bvid2cid(bvid):  # Get the cid of a video from its BV number
    url = "https://api.bilibili.com/x/player/pagelist?bvid=" + str(bvid) + "&jsonp=jsonp"
    r = requests.get(url)
    dirt = json.loads(r.text)
    # 'data' is a list with one entry per video part; take the first part
    cid = dirt['data'][0]['cid']
    return cid
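The function above only reads the first entry of `data`. As a rough sketch of how the same response could be handled for a video with several parts, the snippet below parses a sample payload offline. The sample values are made up, the `page`, `part`, `code` and `message` fields are assumptions about the response shape, and `extract_cids` is a hypothetical helper name, not part of the original code.

```python
import json

# Illustrative payload only: the values are invented, and every field other
# than data[...]['cid'] is an assumption about the response shape.
SAMPLE = '''
{"code": 0, "message": "0",
 "data": [{"cid": 111, "page": 1, "part": "Part one"},
          {"cid": 222, "page": 2, "part": "Part two"}]}
'''


def extract_cids(payload_text):
    """Return a list of (page, cid) tuples, one per video part."""
    payload = json.loads(payload_text)
    # Assumption: a non-zero "code" signals an API-level error.
    if payload.get("code") != 0:
        raise ValueError("API error: " + str(payload.get("message")))
    return [(item.get("page"), item["cid"]) for item in payload["data"]]


cids = extract_cids(SAMPLE)
print(cids)
```

Checking the status field before indexing into `data` avoids a confusing `KeyError` or `TypeError` when the BV number is wrong.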
(3). Fetch the danmaku
def cid2data(cid):
    url = 'https://api.bilibili.com/x/v1/dm/list.so?oid=' + str(cid)
    r = requests.get(url=url)
    r.encoding = 'utf-8'
    html = BeautifulSoup(r.text, 'html5lib')
    ds = html.find_all('d')  # each <d> element holds one danmaku
    said = '.*">(.*)</d>.*'  # captures the text between the tags
    # Open the output file once and append one danmaku per line
    with open(str(cid) + '.txt', 'a', encoding='utf-8') as f:
        for d in ds:
            f.write(re.findall(said, str(d))[0] + '\n')
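As an alternative to the regular expression, the response can also be treated as XML. The sketch below uses the standard library's `xml.etree.ElementTree` on a made-up sample, so it runs without network access. It assumes that each `<d>` element carries a comma-separated `p` attribute whose first field is the time in seconds at which the danmaku appears. That layout is an assumption, and `parse_danmaku` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Made-up sample in the shape the code above expects: <d> elements whose
# text is the danmaku. The meaning of the fields in "p" is an assumption.
SAMPLE_XML = '''<i>
  <d p="12.5,1,25,16777215,1656800000,0,abcd1234,1">first comment</d>
  <d p="3.0,1,25,16777215,1656800001,0,efgh5678,2">A &amp; B</d>
</i>'''


def parse_danmaku(xml_text):
    """Return (seconds, text) tuples sorted by appearance time."""
    root = ET.fromstring(xml_text)
    comments = []
    for d in root.iter("d"):
        fields = d.get("p", "").split(",")
        # Assumption: the first field of "p" is the time offset in seconds.
        seconds = float(fields[0]) if fields[0] else 0.0
        comments.append((seconds, d.text or ""))
    comments.sort(key=lambda item: item[0])
    return comments


for seconds, text in parse_danmaku(SAMPLE_XML):
    print(seconds, text)
```

Reading the element text through the parser also decodes entities such as `&amp;`, which a regular expression applied to the serialized tag would leave escaped.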
3. The complete code
import requests
import json
from bs4 import BeautifulSoup
import re


def bvid2cid(bvid):  # Get the cid of a video from its BV number
    url = "https://api.bilibili.com/x/player/pagelist?bvid=" + str(bvid) + "&jsonp=jsonp"
    r = requests.get(url)
    dirt = json.loads(r.text)
    # 'data' is a list with one entry per video part; take the first part
    cid = dirt['data'][0]['cid']
    return cid


def cid2data(cid):
    url = 'https://api.bilibili.com/x/v1/dm/list.so?oid=' + str(cid)
    r = requests.get(url=url)
    r.encoding = 'utf-8'
    html = BeautifulSoup(r.text, 'html5lib')
    ds = html.find_all('d')  # each <d> element holds one danmaku
    said = '.*">(.*)</d>.*'  # captures the text between the tags
    # Open the output file once and append one danmaku per line
    with open(str(cid) + '.txt', 'a', encoding='utf-8') as f:
        for d in ds:
            f.write(re.findall(said, str(d))[0] + '\n')


cid = bvid2cid('BV1gp4y1e7cE')
cid2data(cid)
Crawling results


As you can see, the crawl was successful.