Perform jieba word segmentation on the requirements content and export Excel files sorted by word frequency
2022-07-25 22:21:00 【Buddhist monk】
Read in the Excel data:
import pandas as pd
import jieba

df = pd.read_excel('xuqiufenxi.xls')
print(df)

# Create a new column to store the word segmentation results
df['fenci'] = ''

# Segment the text of each row and save the result in the new column
for i in range(len(df)):
    print(i)
    # Use .loc instead of chained indexing so the value is written to df itself
    df.loc[i, 'fenci'] = ' '.join(jieba.cut(df.loc[i, ' Content of requirements ']))
    print(df.loc[i, 'fenci'])

    # Count how many times each word appears in this row
    word_count = {}
    for word in df.loc[i, 'fenci'].split():
        if word in word_count:
            word_count[word] += 1
        else:
            word_count[word] = 1

    # Convert the word_count dictionary into a DataFrame
    word_count_df = pd.DataFrame(list(word_count.items()), columns=['word', 'count'])
    # Sort by count in descending order
    word_count_df = word_count_df.sort_values(by='count', ascending=False)
    # Write one Excel file per row, named after that row's ' function ' value
    word_count_df.to_excel(f"{df.loc[i, ' function ']}.xlsx", index=False)
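Output: one .xlsx file per row, named after the row's ' function ' value, with the columns word and count sorted by count in descending order.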
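The manual counting loop in the code above can also be written with collections.Counter from the standard library. The following is only a minimal sketch, not the author's method: the helper name count_words and the sample string are hypothetical, and the input is assumed to be an already segmented, space-separated string such as the values in the 'fenci' column.

```python
from collections import Counter


def count_words(segmented: str) -> list[tuple[str, int]]:
    """Return (word, count) pairs sorted by count, highest first.

    `segmented` is assumed to be a space-separated string, like the
    values stored in the 'fenci' column.
    """
    return Counter(segmented.split()).most_common()


# Hypothetical sample; real input would come from ' '.join(jieba.cut(text))
sample = "用户 登录 用户 注册 登录 用户"
for word, count in count_words(sample):
    print(word, count)
```

The returned list of pairs could be passed to pd.DataFrame(..., columns=['word', 'count']) in place of the dictionary items, and it is already in descending order of count.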