Crawling Sina Weibo Data with a Crawler
2022-06-25 03:54:00 【Blockchain research】
Tool: Cloud Gathering crawler
Goal: capture all microblogs posted by one blogger
Analyzing the page structure:
Our approach is to simulate a browser that visits the pages automatically and crawls them.
First, look at the page structure. Each Weibo list page needs three or four pull-down loads before it is complete. Once a page-turn button appears at the bottom, we can conclude that the page has finished loading.
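The loading rule above can be sketched in Python as follows. This is only a minimal illustration, not the crawler tool's own code: `load_full_page`, `FakePage`, and the `max_scrolls` limit are hypothetical names introduced here, and the two callables stand in for whatever scroll and element-check operations your browser automation provides.

```python
from typing import Callable


def load_full_page(scroll_down: Callable[[], None],
                   has_pager: Callable[[], bool],
                   max_scrolls: int = 10) -> bool:
    """Scroll until the page-turn button appears.

    Returns True if the button showed up, False if we gave up
    after max_scrolls pull-downs (an assumed safety limit).
    """
    for _ in range(max_scrolls):
        if has_pager():
            return True
        scroll_down()
    # One last check after the final pull-down.
    return has_pager()


class FakePage:
    """Stand-in for a browser page: the pager appears after N scrolls."""

    def __init__(self, scrolls_needed: int = 3) -> None:
        self.scrolls = 0
        self.scrolls_needed = scrolls_needed

    def scroll_down(self) -> None:
        self.scrolls += 1

    def has_pager(self) -> bool:
        return self.scrolls >= self.scrolls_needed


if __name__ == "__main__":
    page = FakePage(scrolls_needed=3)
    print(load_full_page(page.scroll_down, page.has_pager), page.scrolls)
```

Checking for the button before each pull-down avoids a fixed "scroll four times" rule, so a page that finishes loading early is not scrolled more than needed.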

Login problem
Crawling requires being logged in. How do we log in?
Login normally does not ask for a verification code. Only after a failed attempt are you asked to enter one, so logging in poses no technical difficulty.
We can create a [Login module]: log in once with a browser, and from then on every page is fetched with the cookies shared from that browser.
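As a rough sketch of the cookie-sharing idea, the function below turns cookies exported from a logged-in browser into a `Cookie` header for later requests. It assumes the cookies arrive as a list of dicts with `name`, `value`, and `domain` keys (the shape Selenium's `get_cookies()` returns); `cookie_header` and the sample cookie values are made up for illustration.

```python
from typing import Dict, List


def cookie_header(cookies: List[Dict[str, str]], host: str) -> str:
    """Build a Cookie header from browser cookies whose domain matches host."""
    pairs = []
    for cookie in cookies:
        # Browsers often store domains with a leading dot, e.g. ".weibo.com".
        domain = cookie.get("domain", "").lstrip(".")
        if host == domain or host.endswith("." + domain):
            pairs.append(f"{cookie['name']}={cookie['value']}")
    return "; ".join(pairs)


if __name__ == "__main__":
    # Hypothetical cookies captured after a manual browser login.
    browser_cookies = [
        {"name": "session_id", "value": "abc123", "domain": ".weibo.com"},
        {"name": "lang", "value": "zh", "domain": "weibo.com"},
        {"name": "tracker", "value": "1", "domain": "example.org"},
    ]
    print(cookie_header(browser_cookies, "weibo.com"))
```

The resulting string can be sent as the `Cookie` request header by any HTTP client, so the login only has to happen once.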

Flow chart design:

We do not need the Weibo detail pages, so the crawler flow has no detail-page step. All data is extracted from the list pages.
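To illustrate list-only extraction, here is a small sketch using Python's standard `html.parser`. The markup is an assumption: the class names `post`, `text`, and `time` are invented for this example and do not describe Weibo's real page structure, so the matching rules would need adapting to the actual list HTML.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional


class ListExtractor(HTMLParser):
    """Collect one record per list item, without visiting detail pages."""

    def __init__(self) -> None:
        super().__init__()
        self.posts: List[Dict[str, str]] = []
        self._field: Optional[str] = None  # field currently being read

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class", "")
        if cls == "post":
            # A new list item starts: open an empty record.
            self.posts.append({"text": "", "time": ""})
        elif cls in ("text", "time") and self.posts:
            self._field = cls

    def handle_endtag(self, tag):
        # Simplification: any closing tag ends the current field.
        self._field = None

    def handle_data(self, data):
        if self._field:
            self.posts[-1][self._field] += data.strip()


def extract_posts(html: str) -> List[Dict[str, str]]:
    parser = ListExtractor()
    parser.feed(html)
    return parser.posts


if __name__ == "__main__":
    sample = (
        '<div class="post"><p class="text">hello</p>'
        '<span class="time">2022-06-01</span></div>'
    )
    print(extract_posts(sample))
```

Because every field comes from the list markup, one page load yields all records on that page, which keeps the crawl flow to a single loop over list pages.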
Crawling results:
The crawl took 5 minutes in total and captured 10 pages, 400 microblogs altogether. The count is small because I do not post on Weibo very often.
The data are as follows:

Making a simple word cloud:
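A word cloud is driven by word frequencies, so the counting step can be sketched as below. This is not the method used for the picture in this post: `top_words` is a hypothetical helper, and it assumes the text is already split into words by spaces or punctuation. Chinese posts would first need a word segmentation step, which is not shown here.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple


def top_words(posts: Iterable[str], n: int = 50) -> List[Tuple[str, int]]:
    """Return the n most frequent words across all posts."""
    counts: Counter = Counter()
    for text in posts:
        # Keep word-like tokens longer than one character, case-folded.
        counts.update(
            w.lower() for w in re.findall(r"\w+", text) if len(w) > 1
        )
    return counts.most_common(n)


if __name__ == "__main__":
    posts = ["crawl the web", "crawl data"]
    print(top_words(posts, 3))
```

The (word, count) pairs can then be handed to whatever word cloud renderer you prefer, with the count controlling the font size.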
