A Crawler Career from Scratch (IV): Scraping Bilibili Danmaku via the API
2022-07-03 09:19:00 【fishfuck】
Preface
In this article, the fourth in this series, we use Bilibili's API to crawl the danmaku (bullet comments) of a Bilibili video.
The page to be crawled

Analysis
Crawler approach
There is not much to analyze. We call two interfaces found online: one that converts a video's BV number into its danmaku pool number (cid), and one that returns the danmaku pool for that cid.
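The two-step flow can be sketched as a pair of URL builders. This is only a minimal sketch: the endpoints and query parameters are the ones used in the crawler code in this article, while the function names `pagelist_url` and `danmaku_url` are made up for illustration.

```python
from urllib.parse import urlencode

# Endpoints taken from the crawler code in this article.
PAGELIST_API = "https://api.bilibili.com/x/player/pagelist"
DANMAKU_API = "https://api.bilibili.com/x/v1/dm/list.so"


def pagelist_url(bvid):
    """Step 1: URL that maps a BV number to its danmaku pool number (cid)."""
    return PAGELIST_API + "?" + urlencode({"bvid": bvid, "jsonp": "jsonp"})


def danmaku_url(cid):
    """Step 2: URL that returns the danmaku pool for a given cid."""
    return DANMAKU_API + "?" + urlencode({"oid": cid})


print(pagelist_url("BV1gp4y1e7cE"))
print(danmaku_url(12345))  # 12345 is a placeholder cid, not a real one
```

Building the query string with `urlencode` instead of string concatenation keeps the parameters properly escaped if a value ever contains special characters.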
Crawler code
1. Development environment
Environment: Windows 10, Python 3.6.8
IDE: PyCharm
Third-party libraries: requests, BeautifulSoup (bs4, with the html5lib parser). Standard library modules: json, re
2. Code walkthrough
(1). Import the libraries
import requests
import json
from bs4 import BeautifulSoup
import re
(2). Get the cid (danmaku pool number)
def bvid2cid(bvid):  # Get the video's cid from its BV number
    url = "https://api.bilibili.com/x/player/pagelist?bvid=" + str(bvid) + "&jsonp=jsonp"
    r = requests.get(url)
    dirt = json.loads(r.text)  # the response body is JSON
    cid = dirt['data'][0]['cid']  # cid of the first entry in the page list
    return cid
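Note that `bvid2cid` reads only the first entry of `data`. If a video has several parts, the response presumably lists one entry per part, each with its own cid. The sketch below collects all of them from an already-parsed response. The function name `extract_cids` and the sample payload are made up for illustration; only the `data` list and the `cid` key come from the code above.

```python
def extract_cids(payload):
    """Return the cid of every entry in a parsed pagelist response."""
    entries = payload.get("data") or []
    if not entries:
        # Assumed failure case: the request was rejected or the BV number
        # was not found, so there is nothing to read a cid from.
        raise ValueError("no 'data' entries in response")
    return [entry["cid"] for entry in entries]


# Hand-written sample in the shape the article's code expects (not real data).
sample = {"data": [{"cid": 111}, {"cid": 222}]}
print(extract_cids(sample))  # [111, 222]
```

Raising an explicit error here is a design choice: it replaces the less readable `TypeError` or `IndexError` that indexing `['data'][0]` would produce on an empty response.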
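(3). Fetch the danmaku comments
def cid2data(cid):  # Download the danmaku pool and save it to <cid>.txt
    url = 'https://api.bilibili.com/x/v1/dm/list.so?oid=' + str(cid)
    r = requests.get(url=url)
    r.encoding = 'utf-8'
    html = BeautifulSoup(r.text, 'html5lib')
    ds = html.find_all('d')  # each <d> element holds one comment
    said = r'.*">(.*)</d>.*'  # captures the text between <d ...> and </d>
    # Open the output file once instead of reopening it for every comment
    with open(str(cid) + '.txt', 'a', encoding='utf-8') as f:
        for d in ds:
            # re.S lets '.' match newlines, so multi-line comments still match
            f.write(re.findall(said, str(d), re.S)[0] + '\n')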
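As an alternative to running a regex over `str(d)`, the `<d>` elements can be read with the standard library's XML parser, which also decodes entities such as `&amp;`. This is a minimal sketch, assuming the response is well-formed XML whose comments sit in `<d>` elements, as the code above implies. `extract_danmaku` and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET


def extract_danmaku(xml_text):
    """Return the text of every <d> element in the XML document."""
    root = ET.fromstring(xml_text)
    # iter('d') walks the whole tree, so nesting depth does not matter.
    return [d.text or "" for d in root.iter("d")]


# Hand-written sample; the root tag and the p attribute value are placeholders.
sample = '<i><d p="placeholder">hello</d><d p="placeholder">a &amp; b</d></i>'
print(extract_danmaku(sample))  # ['hello', 'a & b']
```

With requests, passing `r.content` (bytes) rather than `r.text` lets the parser honour the encoding declared in the document itself.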
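3. The complete code
import requests
import json
from bs4 import BeautifulSoup
import re


def bvid2cid(bvid):  # Get the video's cid from its BV number
    url = "https://api.bilibili.com/x/player/pagelist?bvid=" + str(bvid) + "&jsonp=jsonp"
    r = requests.get(url)
    dirt = json.loads(r.text)  # the response body is JSON
    cid = dirt['data'][0]['cid']  # cid of the first entry in the page list
    return cid


def cid2data(cid):  # Download the danmaku pool and save it to <cid>.txt
    url = 'https://api.bilibili.com/x/v1/dm/list.so?oid=' + str(cid)
    r = requests.get(url=url)
    r.encoding = 'utf-8'
    html = BeautifulSoup(r.text, 'html5lib')
    ds = html.find_all('d')  # each <d> element holds one comment
    said = r'.*">(.*)</d>.*'  # captures the text between <d ...> and </d>
    # Open the output file once instead of reopening it for every comment
    with open(str(cid) + '.txt', 'a', encoding='utf-8') as f:
        for d in ds:
            # re.S lets '.' match newlines, so multi-line comments still match
            f.write(re.findall(said, str(d), re.S)[0] + '\n')


cid = bvid2cid('BV1gp4y1e7cE')
cid2data(cid)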
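One thing to keep in mind: `cid2data` opens the output file in append mode (`'a'`), so running the script twice for the same video writes every comment twice. A possible variant, assuming that overwriting the old file is acceptable, collects the lines first and writes them in one go. `save_lines` is a hypothetical helper, not part of the original script.

```python
import os
import tempfile


def save_lines(path, lines):
    """Write one comment per line, replacing any existing file."""
    with open(path, "w", encoding="utf-8") as f:
        for line in lines:
            f.write(line + "\n")


# Demonstration with a temporary file: the second call replaces the first.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")
save_lines(path, ["first run"])
save_lines(path, ["second run"])
with open(path, encoding="utf-8") as f:
    print(f.read(), end="")  # second run
```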
Crawling results


As you can see, the crawl was successful.