Go crawler framework colly in practice (4): crawling Zhihu answers (2), visualizing a word cloud
2022-06-25 00:16:00 【You're like an ironclad treasure】
Original link: Hzy Blog
Today I'll try some simple processing of the data and then visualize it. The idea is to compile rough statistics on which anime titles come up in the answers, and then generate a word cloud based on word frequency!
Let's take a look at the rendered result first.

The code is on my GitHub, along with some other small projects I wrote while learning Go.
This continues from yesterday: yesterday I scraped the answers to a Zhihu question and saved them to a file.
- First, read the file line by line (each line is one answer).
- After reading each line, do some simple segmentation, for example extracting only the anime titles enclosed in the Chinese title marks 《》. (PS: other languages have libraries for this kind of text analysis, such as jieba in Python, but I couldn't find a similar library for Go, so I just wrote a simple one myself.)
- After extracting and counting the titles, visualize them. For that I found go-echarts on GitHub.
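Note that simply cutting a line at the title marks also yields the text between titles. If you want strictly the text inside 《 and 》, a small state machine over the runes is enough. Below is a minimal sketch under that assumption; `extractTitles` is a hypothetical helper (not from the author's repo), and the sample answers are made up for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// extractTitles returns the text enclosed in 《 and 》, trimmed of spaces.
// Text outside the marks is ignored, and an unclosed 《 at the end of the
// line is dropped. (Hypothetical helper for illustration.)
func extractTitles(line string) []string {
	var titles []string
	var buf strings.Builder
	inside := false
	for _, r := range line {
		switch {
		case r == '《':
			inside = true
			buf.Reset() // a repeated 《 restarts the current title
		case r == '》' && inside:
			if t := strings.TrimSpace(buf.String()); t != "" {
				titles = append(titles, t)
			}
			inside = false
		case inside:
			buf.WriteRune(r)
		}
	}
	return titles
}

func main() {
	// Made-up sample answers, one per line as in answer.txt.
	answers := []string{
		"推荐《进击的巨人》和《钢之炼金术师》",
		"必须是《进击的巨人》！",
	}
	counts := map[string]int{}
	for _, a := range answers {
		for _, t := range extractTitles(a) {
			counts[t]++
		}
	}
	fmt.Println(counts["进击的巨人"], counts["钢之炼金术师"]) // prints "2 1"
}
```

The trade-off is that titles mentioned without 《》 are no longer picked up at all, which the "contains a known title" trick in the code below tries to compensate for.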
2. go-echarts: a brief introduction
Installation
go get -u github.com/go-echarts/go-echarts
Documentation: https://go-echarts.github.io/go-echarts/
go-echarts wraps ECharts, the chart library open-sourced by Baidu, and provides a concise API.
3. Everything is ready, time to write the code
3.1 First open the file, read it line by line, split each line to find the anime titles, then count them.
/*
Word count
*/
package wordCount

import (
	"bufio"
	"fmt"
	"os"
	"sort"
	"strings"
)

// Pair and PairList implement sort.Interface, because a map cannot be sorted by value directly.
type Pair struct {
	Key   string
	Value int
}

type PairList []Pair

func (p PairList) Swap(i, j int)      { p[i], p[j] = p[j], p[i] }
func (p PairList) Len() int           { return len(p) }
func (p PairList) Less(i, j int) bool { return p[j].Value < p[i].Value } // descending order

// WordCount maps a title to its count. Values are stored as interface{} so the
// map can be passed directly to the word cloud's Add method in main.
type WordCount map[string]interface{}

// SplitByMoreStr reports whether r is one of the symbols a sentence is split at.
func SplitByMoreStr(r rune) bool {
	return strings.ContainsRune("《》<>", r)
}

// SplitAndStatistics splits one line at the title marks and updates the counts.
// Note: FieldsFunc also returns the text outside the marks, so every fragment is counted.
func (wc WordCount) SplitAndStatistics(s string) {
	for _, v := range strings.FieldsFunc(s, SplitByMoreStr) {
		// Drop spaces and the trailing newline left by ReadString.
		v = strings.TrimSpace(strings.ReplaceAll(v, " ", ""))
		if v == "" {
			continue // an empty key would "match" every later fragment
		}
		found := false
		for key := range wc {
			// The fragment contains a title seen before: increment that title.
			if strings.Contains(v, key) {
				wc[key] = wc[key].(int) + 1
				found = true
			}
		}
		// If v were already a key, the loop above would have matched it,
		// so a fragment that matched nothing is new.
		if !found {
			wc[v] = 1
		}
	}
}

// ReadFile reads the file line by line (one answer per line) and counts each line.
func (wc WordCount) ReadFile(f *os.File) {
	rd := bufio.NewReader(f)
	for {
		line, err := rd.ReadString('\n') // read up to and including '\n'
		if len(line) > 0 {
			// Also process the last line when the file does not end with '\n'.
			wc.SplitAndStatistics(line)
		}
		if err != nil { // io.EOF or a read error
			break
		}
	}
}

// AnalysisResult sorts the counts in descending order and prints them.
// It is not used by the word cloud below.
func (wc WordCount) AnalysisResult() {
	// Copy map[string]interface{} into a PairList so it can be sorted by value.
	pl := make(PairList, 0, len(wc))
	for k, v := range wc {
		pl = append(pl, Pair{k, v.(int)})
	}
	sort.Sort(pl)
	for _, pair := range pl {
		fmt.Println(pair.Value, pair.Key)
	}
}
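Since Go 1.8, `sort.Slice` can sort a slice with a closure, which avoids declaring a `PairList` type with three methods. A minimal sketch, assuming the counts are held in a plain `map[string]int` rather than the `map[string]interface{}` above; `topN` is a hypothetical helper that also breaks ties by key so the output order is stable between runs (map iteration order in Go is random).

```go
package main

import (
	"fmt"
	"sort"
)

// Pair mirrors the struct above: a title and its count.
type Pair struct {
	Key   string
	Value int
}

// topN sorts the counts by value in descending order, breaking ties by key,
// and returns at most n pairs. (Hypothetical helper for illustration.)
func topN(counts map[string]int, n int) []Pair {
	pl := make([]Pair, 0, len(counts))
	for k, v := range counts {
		pl = append(pl, Pair{k, v})
	}
	sort.Slice(pl, func(i, j int) bool {
		if pl[i].Value != pl[j].Value {
			return pl[i].Value > pl[j].Value
		}
		return pl[i].Key < pl[j].Key
	})
	if len(pl) > n {
		pl = pl[:n]
	}
	return pl
}

func main() {
	counts := map[string]int{"Naruto": 3, "One Piece": 5, "Bleach": 3, "Gintama": 1}
	for _, p := range topN(counts, 3) {
		fmt.Println(p.Value, p.Key)
	}
	// prints:
	// 5 One Piece
	// 3 Bleach
	// 3 Naruto
}
```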
3.2 After splitting and counting, we output the word cloud to finish up.
Once the library above is installed, this part is straightforward.
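package main

import (
	"log"
	"net/http"
	"os"

	"github.com/go-echarts/go-echarts/charts"
	// plus the author's wordCount package from section 3.1 (its import path is not shown in the post)
)

// handler serves the word cloud at "/".
func handler(w http.ResponseWriter, _ *http.Request) {
	nwc := charts.NewWordCloud()
	nwc.SetGlobalOptions(charts.TitleOpts{Title: "Zhihu question:"})
	wc := make(wordCount.WordCount)
	f, err := os.Open(wordCount.Path + "answer.txt")
	if err != nil {
		// Report the error to the client instead of panicking inside the handler.
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	defer f.Close()
	wc.ReadFile(f)
	nwc.Add("wordcloud", wc, charts.WordCloudOpts{SizeRange: []float32{14, 250}})
	nwc.Render(w)
}

// Exists reports whether the file at path exists.
func Exists(path string) bool {
	_, err := os.Stat(path) // os.Stat gets the file information
	if err == nil {
		return true
	}
	// os.IsExist never matches an os.Stat error; the check needed here is os.IsNotExist.
	return !os.IsNotExist(err)
}

func main() {
	// Crawl the answers first if answer.txt does not exist yet.
	if !Exists(wordCount.Path + "answer.txt") {
		wordCount.QuestionAnswer()
	}
	http.HandleFunc("/", handler)
	log.Fatal(http.ListenAndServe(":8081", nil))
}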
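Because the split in 3.1 counts every fragment, long sentence fragments that appear only once can end up in the cloud next to the titles. One option is to filter the map before handing it to the word cloud. A minimal sketch, assuming thresholds you would tune yourself; `filterForCloud`, `minCount` and `maxRunes` are hypothetical names, and the result keeps the `map[string]interface{}` shape the handler passes to `Add`.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// filterForCloud keeps entries whose count is at least minCount and whose key
// is at most maxRunes characters long, so long sentence fragments do not crowd
// the cloud. Values that are not int are skipped. (Hypothetical helper.)
func filterForCloud(wc map[string]interface{}, minCount, maxRunes int) map[string]interface{} {
	out := make(map[string]interface{})
	for k, v := range wc {
		n, ok := v.(int)
		if !ok || n < minCount || utf8.RuneCountInString(k) > maxRunes {
			continue
		}
		out[k] = n
	}
	return out
}

func main() {
	wc := map[string]interface{}{
		"进击的巨人": 4,
		"火影忍者": 2,
		"这部动画的剧情真的非常好看": 1,
	}
	fmt.Println(len(filterForCloud(wc, 2, 10))) // prints 2
}
```

In the handler this would slot in as `nwc.Add("wordcloud", filterForCloud(wc, 2, 20), ...)`; a `WordCount` value can be passed because its underlying type is the same map type.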
Summary: this was fun. Next time I'll try better, more accurate counting methods. This is really a natural language processing problem, ha ha. I know a little about it, but haven't played with it yet…