Go crawler framework: colly in practice (I)
2022-06-25 00:17:00 【You're like an ironclad treasure】
Original link: Hzy Blog
1. Some grumbling
For the past few days I have wanted to write crawlers in Go. I used to write them in Python, but writing the scheduling part in Python wore me out, and since I had just learned Go, I wanted to see how pleasant writing a crawler in Go could be.
Earlier I wrote a simple concurrent crawler framework, following other people's ideas on GitHub, and learned a little about Go concurrency along the way. Then I stumbled across colly. Compared with it, what I wrote looks, well...
2. A brief introduction to using colly
GitHub: https://github.com/gocolly/colly
Official website: http://go-colly.org/
2.1 Introducing colly
colly is a crawler framework. With it we can quickly build a concurrent crawler that is also easy to understand and easy to extend.
The core of colly is the Collector: the Collector gathers data from the pages it visits and stores it. (It is process-oriented.)
2.1 Callbacks in colly while fetching a page
- Before the collector sends a request: OnRequest()
- When a fetch fails: OnError()
- After the collector receives a response: OnResponse()
- When the collector receives HTML: OnHTML()
- When the collector receives XML: OnXML()
- The last callback, run after the collector finishes scraping: OnScraped()
With these callbacks we can write a crawler quickly. The official website also has many examples for reference; if those are not enough, read the source code.
2.2 Collector configuration in colly
- The full configuration options are on the official website; only a few are mentioned here.
- Restrict which domains are crawled, limit the maximum depth, and choose whether URLs may be revisited, which avoids infinite loops.
- Enable asynchronous mode, set the number of concurrent requests, set a random delay, and so on.
- Control whether HTTP keep-alive connections are used, limit the number of connections, and so on.
- Distributed crawling is also supported.
- Through extensions, we can also set a random User-Agent and the Referer header.
2.3 Storage in colly
- By default, data is stored in memory.
- The official website recommends storing it in Redis.
- It can also be stored in SQLite3 or MongoDB; the official website has examples.
- colly-sqlite3 storage
- colly-mongo storage
3. Wrapping up
- To learn more, read the article "Source code and software architecture analysis of the Go crawler framework colly", which covers colly's design.
- "Colly source code analysis: the underlying implementation, explained with examples" walks through the main functions in the colly source.
Tomorrow I will write about using this framework to crawl LeetCode problems.