Data Visualization: White Snake 2: The Tribulation of the Green Snake (3)
2022-07-27 21:58:00 【python小渣渣】
Grab a bottle of Wangzai milk and let's keep going ......

7 Word cloud analysis of the reviews
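Installation
Install jieba with pip (if the download fails the first time, retry a few times; if it still fails, search online for a fix):
pip install jieba
The wordcloud package imported below is installed the same way:
pip install wordcloud
collections is part of the Python standard library, so there is nothing to install; it is used here to count word frequencies.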
import jieba
import wordcloud
import collections
lcut splits a string into a list of words:
jieba.lcut(df['评论'][0])
The output is shown below:

I have a stop-word file (stopwords.txt); you can find similar resources online.
Read the stop words into stop_words, stripping the trailing \n from each one.
The code is as follows:
with open('stopwords.txt', 'r', encoding='utf-8') as fp:
    words = fp.readlines()
stop_words = []
for word in words:
    w = word.strip('\n')  # strip the trailing newline from each word
    stop_words.append(w)
stop_words
Output:
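As a possible variation on the loop above, the stop words can be loaded into a set, which removes duplicates and makes the later `not in stop_words` checks faster. This is only a sketch: `load_stop_words` and the demo file name are hypothetical names introduced here, and the temporary file stands in for the real stopwords.txt so that the snippet runs on its own.

```python
import os
import tempfile
from pathlib import Path


def load_stop_words(path):
    """Read one stop word per line and return them as a set (hypothetical helper)."""
    text = Path(path).read_text(encoding="utf-8")
    # splitlines() drops the line endings; strip() removes stray surrounding
    # whitespace, and blank lines are skipped.
    return {line.strip() for line in text.splitlines() if line.strip()}


# Demo with a throwaway file; in the article the path would be 'stopwords.txt'.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "stopwords_demo.txt")
    Path(demo_path).write_text("的\n了\n\n 是 \n", encoding="utf-8")
    demo_stop_words = load_stop_words(demo_path)
```

The rest of the article works unchanged with a set, because `word not in stop_words` behaves the same for lists and sets.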
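Stop-word filtering:
Next we filter out the stop words. Each comment is first segmented with word_list = jieba.lcut(comment).
We then iterate over word_list and drop every word that appears in the stop-word list (stop-word resources can be found online).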
# Stop-word filtering
good_words = []
for comment in df['评论']:
    word_list = jieba.lcut(comment)
    # Iterate over word_list and drop the stop words
    for word in word_list:
        if word not in stop_words:
            good_words.append(word)
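The filtering step can also be wrapped in a small function, sketched below. The name `filter_tokens` and its parameters are hypothetical. The tokenizer is passed in as an argument, so the demo uses `str.split` on English text and runs without jieba; in the article's setting you would pass `jieba.lcut` instead. Dropping whitespace-only tokens is an extra assumption, not something the original loop does.

```python
def filter_tokens(comments, stop_words, tokenize):
    """Tokenize each comment and keep the tokens that are not stop words."""
    stop_set = set(stop_words)  # set membership tests are fast
    kept = []
    for comment in comments:
        for token in tokenize(comment):
            token = token.strip()
            # Assumption: whitespace-only tokens are noise and are dropped too.
            if token and token not in stop_set:
                kept.append(token)
    return kept


# Demo with a whitespace tokenizer standing in for jieba.lcut.
demo_tokens = filter_tokens(
    ["the film is great", "great music , great story"],
    ["the", "is", ","],
    str.split,
)
```

With the article's data, the equivalent call would be `good_words = filter_tokens(df['评论'], stop_words, jieba.lcut)`.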
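The code below sets the font (蒙纳超刚黑简.ttf; a font that contains Chinese glyphs is needed for the Chinese words to render), passes the word counts to the word cloud object, and finally displays the image: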
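import matplotlib.pyplot as plt  # skip if already imported earlier in this series

c = collections.Counter(good_words)
wc = wordcloud.WordCloud(font_path='蒙纳超刚黑简.ttf', width=500, height=300,
                         background_color='white',
                         max_font_size=200,
                         min_font_size=5,
                         max_words=1000)
# Pass the word counts to the word cloud object
wc.generate_from_frequencies(c)
# Display the image
plt.imshow(wc)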
Output: at this point the word sizes and colors are all the library defaults, as shown:
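To see what kind of data `generate_from_frequencies` receives, it may help to look at `collections.Counter` on its own. The word list below is made up for illustration and is not taken from the review data.

```python
import collections

# Hypothetical tokens standing in for good_words.
demo_words = ["小青", "小白", "小青", "好看", "小青", "好看"]

# Counter is a dict subclass that maps each word to its number of occurrences.
demo_counts = collections.Counter(demo_words)

# most_common(n) returns the n most frequent (word, count) pairs, highest first.
top_two = demo_counts.most_common(2)
print(top_two)
```

Because a Counter behaves like a dictionary of word to frequency, it can be handed to the word cloud object directly, and words with higher counts are drawn larger.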
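Import Image from PIL:
from PIL import Image
Building on the previous word cloud code, we add a heart-shaped background image as a mask, give the word cloud a single color that runs from dark to light, and turn off the axes around the figure. The code is as follows: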
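import numpy as np     # skip these imports if they were already
import seaborn as sns  # run earlier in this series

back_image = Image.open(r'C:\Users\1\Desktop\1.png')
c = collections.Counter(good_words)
# Set the figure size
plt.figure(figsize=(12, 7))
wc = wordcloud.WordCloud(font_path='蒙纳超刚黑简.ttf', width=500, height=300,
                         background_color='white',
                         # White background
                         max_font_size=200,
                         min_font_size=5,
                         # Font sizes range from 5 to 200
                         max_words=1000,
                         # At most 1000 words are drawn
                         mask=np.array(back_image),
                         # The image used as the mask
                         colormap=sns.dark_palette('pink', as_cmap=True)
                         # Color the words in shades of a single color (pink), from dark to light.
                         # as_cmap=True is required so that seaborn returns a colormap object
                         # rather than a list of colors.
                         )
# Pass the word counts to the word cloud object
wc.generate_from_frequencies(c)
# Display the image
plt.imshow(wc)
# Turn off the axes around the image
plt.axis('off')
Word cloud output: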
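If no suitable background picture is at hand, a mask can also be built directly as a NumPy array. According to the wordcloud documentation, pure white (255) entries of the mask are treated as masked out and all other entries are free to draw on. The sketch below builds a circle rather than a heart; `circular_mask` and its parameters are hypothetical names, not part of wordcloud.

```python
import numpy as np


def circular_mask(size=300):
    """Return a size x size uint8 array: 0 inside a centred circle, 255 outside."""
    y, x = np.ogrid[:size, :size]
    centre = (size - 1) / 2
    radius = size * 0.45
    outside = (x - centre) ** 2 + (y - centre) ** 2 > radius ** 2
    mask = np.zeros((size, size), dtype=np.uint8)
    mask[outside] = 255  # white means "do not draw here"
    return mask


demo_mask = circular_mask(300)
```

Such an array could then be passed as `mask=demo_mask` in place of `np.array(back_image)`. Note that when a mask is supplied, wordcloud uses the shape of the mask and ignores `width` and `height`.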

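That wraps up this data visualization analysis of White Snake 2: The Tribulation of the Green Snake. If you found it useful, please give it a like. Sending hearts.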

