Go crawler framework colly in practice (II): crawling the Douban Top 250
2022-06-25 00:17:00 【You're like an ironclad treasure】
Original link: Hzy Blog
1. Today let's use colly to crawl the Douban Top 250! (Everyone seems to practice on it...)
Here is the code, with comments.
package main

import (
	"fmt"
	"regexp"
	"strings"
	"sync"
	"time"

	"github.com/PuerkitoBio/goquery"
	"github.com/gocolly/colly"
	"github.com/gocolly/colly/extensions"
)

func main() {
	t := time.Now()
	number := 1
	// Guards number: with Async enabled, callbacks may run concurrently.
	var mu sync.Mutex

	// Create the collector.
	c := colly.NewCollector(
		func(c *colly.Collector) {
			extensions.RandomUserAgent(c) // send a random User-Agent header
			c.Async = true
		},
		// Only visit URLs of the form https://movie.douban.com/top250?start=0&filter=
		colly.URLFilters(
			regexp.MustCompile("^(https://movie\\.douban\\.com/top250)\\?start=[0-9].*&filter="),
		),
	)

	// For HTML responses, extract the links in the page and follow them.
	c.OnHTML("a[href]", func(e *colly.HTMLElement) {
		link := e.Attr("href")
		//fmt.Printf("find link: %s\n", e.Request.AbsoluteURL(link))
		// The returned error is ignored on purpose: links rejected by the
		// URL filter or already visited are simply skipped.
		c.Visit(e.Request.AbsoluteURL(link))
	})

	// Extract the movie information.
	c.OnHTML("div.info", func(e *colly.HTMLElement) {
		e.DOM.Each(func(i int, selection *goquery.Selection) {
			movies := selection.Find("span.title").First().Text()
			// Collapse runs of whitespace in the director/cast paragraph.
			director := strings.Join(strings.Fields(selection.Find("div.bd p").First().Text()), " ")
			quote := selection.Find("p.quote span.inq").Text()

			// number counts entries in arrival order, which is not
			// necessarily the ranking order when pages load concurrently.
			mu.Lock()
			fmt.Printf("%d --> %s:%s %s\n", number, movies, director, quote)
			number++
			mu.Unlock()
		})
	})

	c.OnError(func(response *colly.Response, err error) {
		fmt.Println(err)
	})

	if err := c.Visit("https://movie.douban.com/top250?start=0&filter="); err != nil {
		fmt.Println(err)
	}
	c.Wait() // block until all asynchronous requests have finished
	fmt.Printf("Elapsed time: %s\n", time.Since(t))
}
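
Two steps in the crawler are pure functions that can be tried without any network access: the URL filter and the whitespace cleanup applied to the director paragraph. The following is a minimal standard-library sketch of just those two steps. The helper names `allowed` and `normalizeSpace`, the sample URLs, and the sample text are illustrative assumptions and are not part of colly or of the original program.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// top250Filter is the same pattern that the crawler passes to colly.URLFilters.
var top250Filter = regexp.MustCompile(`^(https://movie\.douban\.com/top250)\?start=[0-9].*&filter=`)

// allowed (hypothetical helper) reports whether the pattern accepts rawURL.
func allowed(rawURL string) bool {
	return top250Filter.MatchString(rawURL)
}

// normalizeSpace (hypothetical helper) collapses every run of whitespace,
// including newlines, into a single space and trims both ends.
func normalizeSpace(s string) string {
	return strings.Join(strings.Fields(s), " ")
}

func main() {
	// Illustrative URLs: one list page, one URL without a query, one detail page.
	urls := []string{
		"https://movie.douban.com/top250?start=25&filter=",
		"https://movie.douban.com/top250",
		"https://movie.douban.com/subject/1292052/",
	}
	for _, u := range urls {
		fmt.Printf("%-50s allowed=%t\n", u, allowed(u))
	}

	// Made-up sample text, not copied from the real page.
	raw := "\n   Director: Jane Doe   Starring: John Roe\n      1994 / USA / Drama\n"
	fmt.Printf("%q\n", normalizeSpace(raw))
}
```

Only the first URL passes the pattern, which is why the link-following callback can call Visit on every href it finds: anything that is not a Top 250 list page is rejected before a request is made.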

GitHub address: GitHub address
I find this framework very convenient to use. Tomorrow I will try crawling some websites that require login!