Crawler Career from Scratch (IV): Crawling Bilibili Danmaku Through the API
2022-07-03 09:19:00 【fishfuck】
Preface
In this article, the fourth in this series, we use Bilibili's API to crawl the danmaku (bullet comments) of a Bilibili video.
The page to be crawled

Analysis
Crawling approach
There is little to analyze: we call two interfaces found online. The first converts a BV number into a cid (the danmaku pool number), and the second returns the danmaku pool for that cid.
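The two-step flow described above can be sketched without touching the network by building only the request URLs. The endpoints are the ones used later in this article; the helper names `pagelist_url` and `danmaku_url` and the cid `123456` are placeholders made up for illustration.

```python
from urllib.parse import urlencode

# Endpoints taken from the crawler code in this article
PAGELIST_API = "https://api.bilibili.com/x/player/pagelist"
DANMAKU_API = "https://api.bilibili.com/x/v1/dm/list.so"


def pagelist_url(bvid):
    """Step 1: URL that maps a BV number to its parts, each carrying a cid."""
    return PAGELIST_API + "?" + urlencode({"bvid": bvid, "jsonp": "jsonp"})


def danmaku_url(cid):
    """Step 2: URL that returns the danmaku pool for one cid."""
    return DANMAKU_API + "?" + urlencode({"oid": cid})


if __name__ == "__main__":
    print(pagelist_url("BV1gp4y1e7cE"))
    print(danmaku_url(123456))  # 123456 is a placeholder cid, not a real one
```

Building the query string with `urlencode` rather than string concatenation also takes care of escaping if a parameter ever contains special characters.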
The crawler code
1. Development environment
Environment: Windows 10, Python 3.6.8
IDE: PyCharm
Third-party libraries: requests, BeautifulSoup (bs4), html5lib
2. Code breakdown
(1). Import the libraries
import requests
import json
from bs4 import BeautifulSoup
import re
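(2). Get the cid (danmaku pool number)
def bvid2cid(bvid):  # Get the cid of the video's first part
    url = "https://api.bilibili.com/x/player/pagelist?bvid=" + str(bvid) + "&jsonp=jsonp"
    r = requests.get(url, timeout=10)
    r.raise_for_status()  # Fail early on an HTTP error
    dirt = json.loads(r.text)
    cid = dirt['data'][0]['cid']  # 'data' lists the video's parts; take the first
    return cid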
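The function above reads only the first entry of `data`. For a video with several parts, a minimal sketch that collects every cid could look like the following. It assumes the response is a JSON object with a `code` field (0 on success) and a `data` list whose entries carry `cid` and `part`; the sample payload and its values are invented for illustration, and `extract_cids` is a hypothetical helper name.

```python
import json


def extract_cids(pagelist_json):
    """Return a (cid, part title) pair for every part in a pagelist response."""
    payload = json.loads(pagelist_json)
    # Assumption: a non-zero 'code' signals an API-level error
    if payload.get("code") != 0:
        raise ValueError("API error: " + str(payload.get("message")))
    return [(page["cid"], page.get("part", "")) for page in payload["data"]]


# Hypothetical sample payload; the cids and titles are made up
SAMPLE = """
{"code": 0, "message": "0", "data": [
  {"cid": 111, "page": 1, "part": "Part one"},
  {"cid": 222, "page": 2, "part": "Part two"}
]}
"""

if __name__ == "__main__":
    for cid, part in extract_cids(SAMPLE):
        print(cid, part)
```

Each returned cid could then be fed to the danmaku step in turn.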
(3). Fetch the danmaku
def cid2data(cid):
    url = 'https://api.bilibili.com/x/v1/dm/list.so?oid=' + str(cid)
    r = requests.get(url=url, timeout=10)
    r.encoding = 'utf-8'
    html = BeautifulSoup(r.text, 'html5lib')
    ds = html.find_all('d')  # Each <d> element holds one danmaku
    # Open the output file once instead of once per danmaku
    with open(str(cid) + '.txt', 'a', encoding='utf-8') as f:
        for d in ds:
            # get_text() returns the decoded text, so no regex is needed
            f.write(d.get_text() + '\n')
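Since the danmaku interface returns XML, the standard library can parse it as well. The sketch below is an alternative to the BeautifulSoup approach, not the method used in this article. It assumes each `<d>` element has a comma-separated `p` attribute whose first field is the time (in seconds) at which the comment appears, as this format is commonly described; the sample XML and the helper name `parse_danmaku` are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample shaped like a danmaku response; all values are made up
SAMPLE_XML = """<i>
  <d p="12.5,1,25,16777215,1600000000,0,abcd1234,1">first comment</d>
  <d p="30.0,1,25,16777215,1600000100,0,ef567890,2">a &amp; b</d>
</i>"""


def parse_danmaku(xml_text):
    """Return a list of (appearance time in seconds, text) pairs."""
    root = ET.fromstring(xml_text)
    result = []
    for d in root.iter("d"):
        fields = d.get("p", "").split(",")
        # Assumption: the first field of 'p' is the appearance time in seconds
        seconds = float(fields[0]) if fields[0] else 0.0
        result.append((seconds, d.text or ""))
    return result


if __name__ == "__main__":
    for seconds, text in parse_danmaku(SAMPLE_XML):
        print(seconds, text)
```

The parser decodes entities such as `&amp;`, so the stored text matches what viewers see on screen.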
3. Complete code
import requests
import json
from bs4 import BeautifulSoup


def bvid2cid(bvid):  # Get the cid of the video's first part
    url = "https://api.bilibili.com/x/player/pagelist?bvid=" + str(bvid) + "&jsonp=jsonp"
    r = requests.get(url, timeout=10)
    r.raise_for_status()  # Fail early on an HTTP error
    dirt = json.loads(r.text)
    cid = dirt['data'][0]['cid']  # 'data' lists the video's parts; take the first
    return cid


def cid2data(cid):
    url = 'https://api.bilibili.com/x/v1/dm/list.so?oid=' + str(cid)
    r = requests.get(url=url, timeout=10)
    r.encoding = 'utf-8'
    html = BeautifulSoup(r.text, 'html5lib')
    ds = html.find_all('d')  # Each <d> element holds one danmaku
    # Open the output file once instead of once per danmaku
    with open(str(cid) + '.txt', 'a', encoding='utf-8') as f:
        for d in ds:
            # get_text() returns the decoded text, so no regex is needed
            f.write(d.get_text() + '\n')


if __name__ == '__main__':
    cid = bvid2cid('BV1gp4y1e7cE')
    cid2data(cid)
Crawling results


As you can see, the crawl succeeded.