Go crawler framework colly in practice (4): crawling Zhihu answers (2): visualizing a word cloud
2022-06-25 00:16:00 【You're like an ironclad treasure】
Original link: Hzy Blog
Today I'll try some simple processing of the data and then visualize it. The idea is to take rough statistics on the anime mentioned in the answers and then output a word cloud based on word frequency!
Let's look at the result first.

The code is on my GitHub, along with some other small projects I wrote while learning Go.
This continues from yesterday, when I crawled the answers on Zhihu and saved them to a file.
- First, read the file line by line (each line is one answer).
- After reading each sentence, do some simple word segmentation, for example extracting only the anime titles inside the book-title marks 《》. (P.S. Other languages have libraries for this kind of analysis, such as jieba in Python, but I couldn't find a similar one for Go, so I just wrote a simple splitter myself.)
- Once the anime titles are extracted and counted, visualize them. For this I found go-echarts on GitHub.
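The second step, keeping only what sits inside the book-title marks, can be sketched with a regular expression. This is a minimal illustration and not the code used in this post (which splits on the marks instead); `titleRe` and `extractTitles` are hypothetical names made up for the example.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// titleRe matches the text between a pair of book-title marks, e.g. 《...》.
// The character class [^《》] keeps one match from running across two titles.
var titleRe = regexp.MustCompile(`《([^《》]+)》`)

// extractTitles returns every title quoted with 《》 in s, with surrounding
// whitespace trimmed. Text outside the marks is ignored.
func extractTitles(s string) []string {
	var titles []string
	for _, m := range titleRe.FindAllStringSubmatch(s, -1) {
		if t := strings.TrimSpace(m[1]); t != "" {
			titles = append(titles, t)
		}
	}
	return titles
}

func main() {
	line := "I recommend 《Naruto》 and 《 One Piece 》, both are great."
	for _, t := range extractTitles(line) {
		fmt.Println(t)
	}
}
```

Compared with splitting on the marks, this drops the surrounding commentary, so only quoted titles would reach the counter.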
2. A brief introduction to go-echarts
Install:
go get -u github.com/go-echarts/go-echarts
Documentation: https://go-echarts.github.io/go-echarts/
go-echarts is built on ECharts, the charting library open-sourced by Baidu, and provides a concise API.
3. Everything is ready, so it's time to write the code
3.1 First, open the file and read it line by line, split each line to find the anime titles, then count them.
/*
Word count
*/
package wordCount

import (
	"bufio"
	"fmt"
	"io"
	"os"
	"sort"
	"strings"
)

// Pair and PairList implement sort.Interface, because a map cannot be sorted by value directly.
type Pair struct {
	Key   string
	Value int
}
type PairList []Pair

func (p PairList) Swap(i, j int)      { p[i], p[j] = p[j], p[i] }
func (p PairList) Len() int           { return len(p) }
func (p PairList) Less(i, j int) bool { return p[j].Value < p[i].Value } // descending by count

// WordCount maps a word to its count. The values are interface{} so the map can be passed directly to the chart's Add call in the handler.
type WordCount map[string]interface{}
// SplitByMoreStr reports whether r is one of the symbols used to split a sentence.
func SplitByMoreStr(r rune) bool {
	splitSymbol := []rune("《》<>")
	for _, v := range splitSymbol {
		if r == v {
			return true
		}
	}
	return false
}
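// SplitAndStatistics splits one line on the symbols above and updates the counts.
// Every fragment between the symbols is counted, not only the text inside 《》.
func (wc WordCount) SplitAndStatistics(s string) {
	fields := strings.FieldsFunc(s, SplitByMoreStr)
	for _, v := range fields {
		// Remove spaces and the trailing line break, and skip empty fragments:
		// an empty key would match every later fragment.
		v = strings.TrimSpace(strings.Replace(v, " ", "", -1))
		if v == "" {
			continue
		}
		found := false
		for key := range wc {
			if strings.Contains(v, key) { // the fragment contains a word already in the map: increment that word
				wc[key] = wc[key].(int) + 1
				found = true
			}
		}
		if !found {
			if wc[v] == nil {
				wc[v] = 1
			} else {
				wc[v] = wc[v].(int) + 1
			}
		}
	}
}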
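// ReadFile reads the file line by line and counts each line.
func (wc WordCount) ReadFile(f *os.File) {
	rd := bufio.NewReader(f)
	for {
		line, err := rd.ReadString('\n') // read up to and including the next '\n'
		if line != "" {
			wc.SplitAndStatistics(line) // split and count; this also covers a last line without '\n'
		}
		if err != nil {
			if err != io.EOF { // io.EOF only means the end of the file was reached
				fmt.Println("read error:", err)
			}
			break
		}
	}
}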
// AnalysisResut sorts the counts and prints them. It is not used in the final program.
func (wc WordCount) AnalysisResut() {
	// Copy the map into a PairList, which implements sort.Interface, so it can be sorted by value.
	pl := make(PairList, len(wc))
	i := 0
	for k, v := range wc {
		pl[i] = Pair{k, v.(int)}
		i++
	}
	sort.Sort(pl)
	for _, pair := range pl {
		fmt.Println(pair.Value, pair.Key)
	}
}
3.2 After splitting, all that's left is to output the word cloud.
With the library above installed, this is all that's needed.
// handler is the route that renders the word cloud.
func handler(w http.ResponseWriter, _ *http.Request) {
	nwc := charts.NewWordCloud()
	nwc.SetGlobalOptions(charts.TitleOpts{Title: "Zhihu question:"})
	wc := make(wordCount.WordCount)
	f, err := os.Open(wordCount.Path + "answer.txt")
	if err != nil {
		panic(err)
	}
	defer f.Close()
	wc.ReadFile(f)
	nwc.Add("wordcloud", wc, charts.WordCloudOpts{SizeRange: []float32{14, 250}})
	nwc.Render(w)
}
// Exists reports whether the file at path exists.
func Exists(path string) bool {
	_, err := os.Stat(path) // os.Stat returns the file info, or an error if it cannot be read
	if err != nil {
		// Treat any error (typically "file does not exist") as missing.
		return false
	}
	return true
}
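func main() {
	// Crawl the answers only if they have not been saved yet.
	if !Exists(wordCount.Path + "answer.txt") {
		wordCount.QuestionAnswer()
	}
	http.HandleFunc("/", handler)
	if err := http.ListenAndServe(":8081", nil); err != nil {
		panic(err)
	}
}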
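The existence check before crawling can also tell "the file is missing" apart from other failures such as a permission error. Here is a minimal sketch of that variant using errors.Is with os.ErrNotExist; `fileExists` is a hypothetical name, and the post's own Exists simply treats every error as "missing".

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// fileExists reports whether path exists. A "does not exist" error maps to
// (false, nil); any other error (for example, permission denied) is returned
// so the caller can decide what to do.
func fileExists(path string) (bool, error) {
	_, err := os.Stat(path)
	if err == nil {
		return true, nil
	}
	if errors.Is(err, os.ErrNotExist) {
		return false, nil
	}
	return false, err
}

func main() {
	ok, err := fileExists("answer.txt")
	fmt.Println(ok, err)
}
```

With this shape, the caller can start a fresh crawl only when the file is really absent, and report anything else.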
Summary: this was quite fun. Next time I'd like to try better, more accurate statistical methods. That is really a natural language processing problem, ha ha. Tools for it do exist, but I haven't tried them yet…