Go crawler framework colly in practice (II): crawling the Douban Top 250
2022-06-25 00:17:00 【You're like an ironclad treasure】
Original link: Hzy Blog
1. Today, let's use colly to crawl the Douban Top 250! (Everyone likes to practice on it...)
Here is the code, with comments.
package main

import (
	"fmt"
	"regexp"
	"strings"
	"sync"
	"time"

	"github.com/PuerkitoBio/goquery"
	"github.com/gocolly/colly"
	"github.com/gocolly/colly/extensions"
)

func main() {
	t := time.Now()
	// In async mode the callbacks can run concurrently, so the counter is guarded by a mutex.
	var mu sync.Mutex
	number := 1
	// Create the collector
	c := colly.NewCollector(func(c *colly.Collector) {
		extensions.RandomUserAgent(c) // Send a random User-Agent header
		c.Async = true
	},
		// URL filter: only visit URLs of the form
		// https://movie.douban.com/top250?start=0&filter=
		colly.URLFilters(
			regexp.MustCompile("^(https://movie\\.douban\\.com/top250)\\?start=[0-9].*&filter="),
		),
	)
	// The response is HTML: extract the links on the page and follow them.
	// Links rejected by the URL filter make Visit return an error, which is ignored here.
	c.OnHTML("a[href]", func(e *colly.HTMLElement) {
		link := e.Attr("href")
		//fmt.Printf("find link: %s\n", e.Request.AbsoluteURL(link))
		c.Visit(e.Request.AbsoluteURL(link))
	})
	// Extract the movie information
	c.OnHTML("div.info", func(e *colly.HTMLElement) {
		e.DOM.Each(func(i int, selection *goquery.Selection) {
			movies := selection.Find("span.title").First().Text()
			// Collapse the newlines and runs of spaces in the director/cast paragraph
			director := strings.Join(strings.Fields(selection.Find("div.bd p").First().Text()), " ")
			quote := selection.Find("p.quote span.inq").Text()
			// Pages are fetched concurrently, so number is a running count, not the Douban rank
			mu.Lock()
			fmt.Printf("%d --> %s:%s %s\n", number, movies, director, quote)
			number++
			mu.Unlock()
		})
	})
	c.OnError(func(response *colly.Response, err error) {
		fmt.Println(err)
	})
	c.Visit("https://movie.douban.com/top250?start=0&filter=")
	c.Wait() // Wait for all asynchronous requests to finish
	fmt.Printf("Time spent: %s\n", time.Since(t))
}
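
The two pieces of the crawler that do not depend on the network, the URL filter and the whitespace cleanup, can be checked on their own. The following is a minimal sketch using only the standard library. The helper names `allowed` and `normalize` and the sample text are made up for illustration; the regular expression is the one passed to `colly.URLFilters` above.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Same pattern as the one given to colly.URLFilters in the crawler.
var top250 = regexp.MustCompile(`^(https://movie\.douban\.com/top250)\?start=[0-9].*&filter=`)

// allowed (hypothetical helper) reports whether a URL would pass the filter.
func allowed(url string) bool {
	return top250.MatchString(url)
}

// normalize (hypothetical helper) collapses every run of whitespace,
// including newlines, into a single space and trims both ends.
func normalize(s string) string {
	return strings.Join(strings.Fields(s), " ")
}

func main() {
	fmt.Println(allowed("https://movie.douban.com/top250?start=0&filter="))  // true
	fmt.Println(allowed("https://movie.douban.com/top250?start=25&filter=")) // true
	fmt.Println(allowed("https://movie.douban.com/subject/1292052/"))        // false

	// Made-up sample shaped like the text of a "div.bd p" element.
	raw := "\n    Director: Some Director   Starring: Some Actor\n    1994 / Country / Genre\n  "
	fmt.Println(normalize(raw))
}
```

Because the link handler calls `c.Visit` on every `a[href]`, the filter is what keeps the crawl on the ten list pages: any URL for which `allowed` returns false is rejected before a request is made.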

GitHub address: GitHub repository
I find this framework very convenient to use. Tomorrow I will try crawling some websites that require login!