NLP keyword extraction overview
2022-06-09 17:55:00 【Brother prawn flying】
NLP keyword extraction: a review
One. Approaches to keyword extraction

Two. TF-IDF
The TF-IDF algorithm uses statistics to evaluate how important a word is to a document. Its first idea is term frequency (TF): the more often a word appears in a document, the more representative of that document it is likely to be. Its second idea is inverse document frequency (IDF): if a word appears in many documents, then no matter how often it occurs it does little to distinguish one document from another. Combining the two, a word that occurs often in a document but appears in few documents overall distinguishes that document well and is representative of it.
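The two ideas can be sketched in a few lines of Python. This is only a minimal illustration, not a reference implementation: the function name `tfidf_keywords`, the toy documents, and the unsmoothed `log(N / df)` form of IDF are assumptions chosen for clarity (libraries usually apply some smoothing), and tokenization is assumed to have been done already.

```python
import math
from collections import Counter


def tfidf_keywords(docs, doc_index, top_k=3):
    """Rank the words of docs[doc_index] by TF-IDF score.

    docs is a list of token lists; tokenization is assumed to be done elsewhere.
    """
    n_docs = len(docs)
    # Document frequency: the number of documents that contain each word.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))

    tokens = docs[doc_index]
    counts = Counter(tokens)
    scores = {}
    for word, count in counts.items():
        tf = count / len(tokens)           # how often the word occurs in this document
        idf = math.log(n_docs / df[word])  # 0 when the word occurs in every document
        scores[word] = tf * idf
    # Highest score first; ties are broken alphabetically.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]


# Hypothetical toy corpus, for illustration only.
docs = [
    "the cat sat on the mat".split(),
    "the dog chased the cat".split(),
    "the stock market fell sharply as the market panicked".split(),
]
print(tfidf_keywords(docs, 2, top_k=1))
```

In this toy corpus "market" ranks first for the third document: it occurs twice there and nowhere else. "the" occurs in every document, so its IDF, and therefore its score, is zero.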
Three. TextRank
The TextRank algorithm does not depend on a corpus: it can extract the keywords of a document by analyzing that single document alone. This is an important characteristic of TextRank. Its basic idea comes from Google's PageRank algorithm.
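As a rough sketch of how the PageRank idea carries over to words, the example below links words that occur near each other in one document and then runs the PageRank update on that graph. The function name `textrank_keywords`, the co-occurrence window, the damping factor of 0.85, the iteration count, and the toy token list are all illustrative assumptions; a practical version would also filter stop words and often keeps only certain parts of speech.

```python
from collections import defaultdict


def textrank_keywords(tokens, window=2, damping=0.85, iterations=50, top_k=3):
    """Score words with PageRank over an undirected co-occurrence graph."""
    # Link two distinct words if they occur within `window` tokens of each other.
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + 1 + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)

    scores = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        new_scores = {}
        for word in neighbors:
            # Each neighbor shares its score equally among its own neighbors.
            incoming = sum(scores[n] / len(neighbors[n]) for n in neighbors[word])
            new_scores[word] = (1 - damping) + damping * incoming
        scores = new_scores
    # Highest score first; ties are broken alphabetically.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]


# Hypothetical single "document": every other word co-occurs with "graph".
tokens = "graph ranking graph nodes graph edges graph scores".split()
print(textrank_keywords(tokens, window=1, top_k=1))
```

Only the one document is needed: "graph" is linked to every other word, so it collects the highest score without any corpus statistics.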
Four. LDA
LDA is one of the most popular methods for keyword extraction. It treats each document as made up of different words and, at the same time, of several latent topics, such as sports, entertainment, news, or politics. Each topic has its own words: the “sports” topic may contain “football, basketball, match”, and the “entertainment” topic may contain “star, movie, record”, and so on. In general, the main content of an article is most likely to concentrate on a few topics; if every topic were covered, the topics would say nothing about the focus of the article. Under these assumptions, LDA uses the words in a document to find its most likely topics and the words that belong to them.
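The text does not say how the topics are inferred. One common choice is collapsed Gibbs sampling, sketched below on a toy corpus built from the example words above. Everything here is an assumption made for illustration: the function names `lda_gibbs` and `top_words`, the hyperparameters `alpha` and `beta`, the iteration count, and the fixed seed. Real projects would normally use a library implementation.

```python
import random


def lda_gibbs(docs, n_topics=2, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling.

    Returns per-topic word counts and per-document topic counts.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    v = len(vocab)
    topic_word = [dict.fromkeys(vocab, 0) for _ in range(n_topics)]
    topic_total = [0] * n_topics
    doc_topic = [[0] * n_topics for _ in docs]

    # Give every token a random initial topic.
    assignments = []
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(n_topics)
            zs.append(z)
            topic_word[z][w] += 1
            topic_total[z] += 1
            doc_topic[d][z] += 1
        assignments.append(zs)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                z = assignments[d][i]
                # Remove this token's current assignment from the counts.
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                doc_topic[d][z] -= 1
                # P(topic | all other assignments) is proportional to
                # (doc-topic count + alpha) * (topic-word count + beta)
                # / (topic size + V * beta).
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + v * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = z
                topic_word[z][w] += 1
                topic_total[z] += 1
                doc_topic[d][z] += 1
    return topic_word, doc_topic


def top_words(topic_word, k=3):
    """Return the k most frequent words of each topic."""
    return [sorted(c, key=lambda w: (-c[w], w))[:k] for c in topic_word]


# Hypothetical toy corpus using the example words from the text.
docs = [
    "football basketball match football match".split(),
    "basketball match football basketball".split(),
    "star movie record movie star".split(),
    "record star movie record".split(),
]
topic_word, doc_topic = lda_gibbs(docs, n_topics=2)
print(top_words(topic_word))
print(doc_topic)
```

The per-document topic counts show which topics a document concentrates on, and the top words of those topics serve as keyword candidates. Because sampling is random, the topic numbering and the exact counts depend on the seed.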
Five. word2vec
The word2vec algorithm mainly studies the relationships between words. It converts every distinct word in the text dataset into a vector, and these vectors capture how similar each word is to all other words. Words can therefore be grouped according to their relationships: a clustering algorithm yields several categories and their centers, the similarity between each word and its category center is computed and sorted, and the first few words closest to each center are chosen as the keywords.
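The clustering and ranking step can be sketched as follows. Training word2vec is out of scope here, so the two-dimensional vectors are hand-made stand-ins for learned embeddings. The function names `kmeans` and `keywords_per_cluster`, the use of k-means, Euclidean distance as the closeness measure, and the deterministic initialization are likewise assumptions made to keep the example small and repeatable.

```python
import math


def kmeans(vectors, k, iterations=20):
    """Plain k-means on a dict of word -> vector; returns (centers, labels)."""
    words = sorted(vectors)
    # Simplification: start from the first k words in sorted order.
    # Real implementations usually use random or k-means++ initialization.
    centers = [list(vectors[w]) for w in words[:k]]
    labels = {}
    for _ in range(iterations):
        # Assign each word to its nearest center.
        for w in words:
            labels[w] = min(range(k), key=lambda c: math.dist(vectors[w], centers[c]))
        # Move each center to the mean of its members.
        for c in range(k):
            members = [vectors[w] for w in words if labels[w] == c]
            if members:
                centers[c] = [sum(xs) / len(members) for xs in zip(*members)]
    return centers, labels


def keywords_per_cluster(vectors, k=2, top_k=2):
    """For each cluster, return the top_k words closest to the cluster center."""
    centers, labels = kmeans(vectors, k)
    clusters = {}
    for c in range(k):
        members = [w for w in vectors if labels[w] == c]
        members.sort(key=lambda w: (math.dist(vectors[w], centers[c]), w))
        clusters[c] = members[:top_k]
    return clusters


# Hypothetical 2-D "embeddings"; real word2vec vectors have far more dimensions.
vectors = {
    "football": (0.9, 0.1),
    "basketball": (1.0, 0.2),
    "match": (0.8, 0.0),
    "star": (0.1, 0.9),
    "movie": (0.0, 1.0),
    "record": (0.2, 0.8),
}
print(keywords_per_cluster(vectors, k=2, top_k=1))
```

With these toy vectors the sports words and the entertainment words form two clusters, and the word nearest each center ("football" and "star") is returned as that cluster's keyword. Cosine similarity is a common alternative to Euclidean distance for word vectors.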