Data visualization - White Snake 2: The Tribulation of the Green Snake (3)
2022-07-28 00:45:00 【Python slag】
Contents
7 Film review word cloud analysis
Importing Image from PIL
Word cloud result
Drink a bottle of Wangzai, let's continue ......

7 Film review word cloud analysis
Installation
Install jieba with pip. If the download fails, try again a few times; if it still fails, look for a solution online.
pip install jieba
The collections module counts how often each word occurs.
import jieba
import wordcloud
import collections
lcut cuts a string into a list of words:
jieba.lcut(df[' Comment on '][0])
[Figure: output of jieba.lcut for the first comment]

I have a stop-word file (stopwords.txt); you can also find one online.
Read the stop words into stop_words, stripping the trailing \n from each line.
The code is as follows:
with open('stopwords.txt', 'r', encoding='utf-8') as fp:
    words = fp.readlines()
stop_words = []
for word in words:
    w = word.strip('\n')  # Strip the trailing newline (\n) from the word
    stop_words.append(w)
stop_words
[Figure: the resulting stop_words list]
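A possible variation on the loading step, sketched under assumptions: storing the stop words in a set rather than a list makes each `word not in stop_words` check a constant-time lookup, which helps when the filter runs over every word of every comment. The helper name `load_stop_words` and the temporary demo file are illustrative and not part of the original. Unlike the code above, `strip()` here also removes surrounding spaces, and blank lines are skipped.

```python
import os
import tempfile


def load_stop_words(path):
    """Read one stop word per line and return them as a set."""
    with open(path, 'r', encoding='utf-8') as fp:
        # strip() drops the trailing newline and surrounding whitespace;
        # lines that are empty after stripping are skipped
        return {line.strip() for line in fp if line.strip()}


# Demo: a small temporary file standing in for stopwords.txt (hypothetical content)
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False,
                                 encoding='utf-8') as tmp:
    tmp.write('的\n了\n\n是\n')
    demo_path = tmp.name

demo_stop_words = load_stop_words(demo_path)
os.remove(demo_path)
print(sorted(demo_stop_words))
```

With the real file, `stop_words = load_stop_words('stopwords.txt')` would be a drop-in replacement, since the later `if word not in stop_words` check works the same way on a set.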
Stop word processing:
Next we deal with the stop words. Each comment is segmented with word_list = jieba.lcut(comment).
We then iterate over word_list and drop every word that is a stop word (stop-word lists can be found online).
# Stop word processing
good_words = []
for comment in df[' Comment on ']:
    word_list = jieba.lcut(comment)
    # Iterate over word_list and keep only the words that are not stop words
    for word in word_list:
        if word not in stop_words:
            good_words.append(word)
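The filter-then-count idea can be tried in isolation with a few pre-segmented comments. The sketch below is illustrative only: the token lists stand in for the output of `jieba.lcut`, and `count_words`, `sample_tokens`, and `sample_stop` are hypothetical names that do not appear in the original. It folds the filtering loop and `collections.Counter` into one function, so the most frequent words can be checked before drawing anything.

```python
import collections


def count_words(token_lists, stop_words):
    """Count every token that is not a stop word across all comments."""
    counter = collections.Counter()
    for tokens in token_lists:
        # Counter.update adds one count per item yielded by the generator
        counter.update(t for t in tokens if t not in stop_words)
    return counter


# Hypothetical pre-segmented comments (what jieba.lcut might return)
sample_tokens = [
    ['小青', '很', '飒'],
    ['小青', '的', '打斗', '很', '精彩'],
    ['打斗', '精彩'],
]
sample_stop = {'很', '的'}

counts = count_words(sample_tokens, sample_stop)
# most_common(n) returns the n highest (word, count) pairs
print(counts.most_common(3))
```

Because a `Counter` is a dictionary of word to count, the result can be passed directly to `generate_from_frequencies`.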
In the following code we set the font ( Mona Chao gang Heijian .ttf), pass the word counts to the word cloud object, and finally display the image. The code is as follows:
import matplotlib.pyplot as plt  # needed for plt.imshow below

c = collections.Counter(good_words)
wc = wordcloud.WordCloud(font_path=' Mona Chao gang Heijian .ttf', width=500, height=300,
                         background_color='white',
                         max_font_size=200,
                         min_font_size=5,
                         max_words=1000)
# Pass the word counts to the word cloud object
wc.generate_from_frequencies(c)
# Show the image
plt.imshow(wc)
The result is shown below. At this point the layout and colors of the word cloud are the library defaults:
[Figure: word cloud with the default appearance]
Importing Image from PIL
from PIL import Image
Building on the previous word cloud code, we add a heart-shaped background image, define a dark-to-light color gradient for the word cloud, and turn off the axes around the figure. The code is as follows:
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

back_image = Image.open(r'C:\Users\1\Desktop\1.png')
c = collections.Counter(good_words)
# Set the figure size
plt.figure(figsize=(12, 7))
wc = wordcloud.WordCloud(
    font_path=' Mona Chao gang Heijian .ttf', width=500, height=300,
    # White background
    background_color='white',
    # Font sizes range from 5 to 200
    max_font_size=200,
    min_font_size=5,
    # Show at most 1000 words
    max_words=1000,
    # Use the picture as the mask that shapes the cloud
    mask=np.array(back_image),
    # Dark-to-light pink gradient. Remember to set as_cmap=True; otherwise
    # dark_palette returns a list of colors rather than a colormap
    colormap=sns.dark_palette('pink', as_cmap=True)
)
# Pass the word counts to the word cloud object
wc.generate_from_frequencies(c)
# Show the image
plt.imshow(wc)
# Turn off the axes around the figure
plt.axis('off')
Word cloud result:
[Figure: the word cloud drawn with the heart-shaped background image]
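One caveat about the mask, offered as a hedged sketch: per the wordcloud documentation, only pure white pixels of the mask are treated as masked out, and the mask's shape takes precedence over width and height. A PNG with a transparent background may therefore not give the expected outline, because transparent pixels are not necessarily white. The helper `image_to_mask` below is a hypothetical name, not part of the original; it flattens the picture onto a white background with Pillow before converting it to an array. The small circle image is a stand-in for the heart picture.

```python
import numpy as np
from PIL import Image, ImageDraw


def image_to_mask(image):
    """Flatten an image onto a white background and return an RGB array."""
    rgba = image.convert('RGBA')
    white = Image.new('RGBA', rgba.size, (255, 255, 255, 255))
    # Transparent pixels take the white background; opaque pixels keep their color
    flattened = Image.alpha_composite(white, rgba).convert('RGB')
    return np.array(flattened)


# Demo: a hypothetical 40x40 picture with a transparent background and a
# filled circle, standing in for the heart image used in the article
demo = Image.new('RGBA', (40, 40), (0, 0, 0, 0))
ImageDraw.Draw(demo).ellipse((10, 10, 30, 30), fill=(200, 0, 80, 255))

demo_mask = image_to_mask(demo)
print(demo_mask.shape)     # (height, width, 3)
print(demo_mask[0, 0])     # corner pixel, outside the circle
print(demo_mask[20, 20])   # center pixel, inside the circle
```

If the outline looks wrong with the real picture, passing `mask=image_to_mask(back_image)` instead of `mask=np.array(back_image)` is one thing to try.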

This concludes the data visualization analysis of White Snake 2: The Tribulation of the Green Snake. If you found it useful, please give it a like.

