NLP keyword extraction overview

2022-06-09 17:55:00 Brother prawn flying


1. Methods for keyword extraction

[Figure: keyword extraction methods; image not included]

2. TF-IDF

The TF-IDF algorithm uses statistics to evaluate how important a word is to a document. Its first idea (term frequency) is that the more often a word appears in a document, the more representative of that document it is likely to be. But if a word appears in many documents, it cannot distinguish one document from another, no matter how often it occurs. Its second idea (inverse document frequency) is therefore that the fewer documents a word appears in, the stronger its ability to distinguish documents. A word that occurs often in a document yet appears in few documents overall is both representative and discriminating, so it scores highest.
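
The two ideas above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: it assumes documents are already tokenized, uses the plain `log(N / df)` form of inverse document frequency (libraries often use smoothed variants), and the function name `tf_idf` and the sample sentences are made up for the example.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {word: score} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each word.
    df = Counter(word for doc in docs for word in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        scores.append({
            # Term frequency times inverse document frequency.
            word: (count / total) * math.log(n_docs / df[word])
            for word, count in counts.items()
        })
    return scores

# Hypothetical toy corpus.
docs = [
    "the match ended after the football team scored".split(),
    "the movie star signed the record deal".split(),
    "the election results shaped the politics debate".split(),
]
scores = tf_idf(docs)
top_words = sorted(scores[0], key=scores[0].get, reverse=True)[:3]
print(top_words)
```

Because "the" occurs in every document, its inverse document frequency is `log(1) = 0`, so it never ranks as a keyword however often it appears.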

3. TextRank

The TextRank algorithm does not depend on a corpus: it can extract the keywords of a document by analyzing that single document alone. This is its defining characteristic. The basic idea comes from Google's PageRank algorithm: words are treated as nodes in a graph, linked when they occur near each other, and ranked the way PageRank ranks web pages.
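
As a rough sketch of the PageRank-style idea, the code below links words that occur within a small window of each other in one document and then iterates a PageRank-like update over that graph. The window size, damping factor, iteration count, and the function name `textrank` are assumptions chosen for illustration; practical implementations usually also remove stop words and filter by part of speech first.

```python
from collections import defaultdict

def textrank(tokens, window=2, damping=0.85, iterations=50):
    """Rank the words of one tokenized document by a PageRank-style score."""
    # Undirected co-occurrence graph: words at most `window` positions apart are linked.
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + 1 + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)
    scores = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        # Each word shares its current score equally among its neighbors.
        scores = {
            word: (1 - damping) + damping * sum(
                scores[n] / len(neighbors[n]) for n in neighbors[word]
            )
            for word in neighbors
        }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical single document; no corpus is needed.
tokens = "football team wins football match as football fans cheer".split()
ranked = textrank(tokens)
print(ranked[:3])
```

In this toy document "football" co-occurs with the most distinct words, so it ends up with the highest score.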

4. LDA

LDA (Latent Dirichlet Allocation) is one of the most popular methods for keyword extraction. It treats each document as a collection of words and, at the same time, as a mixture of several latent topics, such as sports, entertainment, news, or politics. Each topic has its own characteristic words: the "sports" topic may contain "football", "basketball", and "match", while the "entertainment" topic may contain "star", "movie", and "record". In general, the main content of an article concentrates on a few topics; if an article covered every topic, those topics would clearly not reflect its focus. Building on these assumptions, LDA uses the words in a document to find its most likely topics and the words most associated with them.
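
To make the idea of "documents as mixtures of topics, topics as distributions over words" concrete, here is a small sketch that fits LDA with collapsed Gibbs sampling, one common inference method (the source does not say which one it has in mind). The hyperparameters `alpha` and `beta`, the iteration count, the function name `lda_gibbs`, and the toy documents are all assumptions for illustration.

```python
import random

def lda_gibbs(docs, n_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA; returns (doc_topic, topic_word)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    v = len(vocab)
    doc_topic_n = [[0] * n_topics for _ in docs]   # words in doc d assigned to topic k
    topic_word_n = [dict.fromkeys(vocab, 0) for _ in range(n_topics)]
    topic_n = [0] * n_topics                       # total words assigned to topic k
    assign = []
    for d, doc in enumerate(docs):                 # random initial topic per word
        zs = []
        for w in doc:
            k = rng.randrange(n_topics)
            zs.append(k)
            doc_topic_n[d][k] += 1
            topic_word_n[k][w] += 1
            topic_n[k] += 1
        assign.append(zs)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assign[d][i]                   # remove the current assignment
                doc_topic_n[d][k] -= 1
                topic_word_n[k][w] -= 1
                topic_n[k] -= 1
                weights = [                        # resample given all other words
                    (doc_topic_n[d][t] + alpha)
                    * (topic_word_n[t][w] + beta) / (topic_n[t] + v * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic_n[d][k] += 1
                topic_word_n[k][w] += 1
                topic_n[k] += 1
    doc_topic = [
        [(n + alpha) / (len(doc) + n_topics * alpha) for n in row]
        for row, doc in zip(doc_topic_n, docs)
    ]
    topic_word = [
        {w: (c + beta) / (topic_n[k] + v * beta) for w, c in topic_word_n[k].items()}
        for k in range(n_topics)
    ]
    return doc_topic, topic_word

# Hypothetical toy documents built from the example words above.
docs = [
    "football basketball match football".split(),
    "star movie record movie".split(),
    "match football basketball match".split(),
    "movie star record star".split(),
]
doc_topic, topic_word = lda_gibbs(docs, n_topics=2)
# Keywords for document 0: its words ranked by probability under its dominant topic.
main_topic = max(range(2), key=lambda k: doc_topic[0][k])
keywords = sorted(set(docs[0]), key=lambda w: topic_word[main_topic][w], reverse=True)[:2]
print(main_topic, keywords)
```

Sampling is random, so the topic numbering and exact probabilities depend on the seed. In practice a library implementation would normally be used instead of hand-written sampling.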

5. word2vec

The word2vec algorithm focuses on the relationships between words. It converts every distinct word in the text dataset into a vector, and these vectors encode how similar each word is to all the other words. The words can therefore be grouped according to these relationships: a clustering algorithm yields several clusters and their centers. The similarity between each word in a cluster and the cluster center is then computed and sorted, and the first few words closest to the center are chosen as the keywords.
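
The selection step described above (cluster the word vectors, then keep the words nearest each cluster center) can be sketched as follows. The sketch assumes the vectors have already been trained elsewhere; the 2-D vectors here are invented purely for illustration, and the helper names `kmeans` and `keywords_near_centers` are hypothetical. It uses plain k-means with Euclidean distance as the closeness measure, which is one possible choice; cosine similarity is another.

```python
import math

def kmeans(vectors, k, iterations=20):
    """Plain k-means over a {word: vector} dict; returns (centers, clusters)."""
    words = list(vectors)
    centers = [vectors[w] for w in words[:k]]  # naive init: the first k words
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for w in words:
            # Assign each word to its nearest center.
            nearest = min(range(k), key=lambda c: math.dist(vectors[w], centers[c]))
            clusters[nearest].append(w)
        # Move each center to the mean of its members (keep it if the cluster is empty).
        centers = [
            [sum(col) / len(members) for col in zip(*(vectors[w] for w in members))]
            if members else centers[c]
            for c, members in enumerate(clusters)
        ]
    return centers, clusters

def keywords_near_centers(vectors, k, top_n=2):
    """For each cluster, return the top_n words closest to its center."""
    centers, clusters = kmeans(vectors, k)
    return [
        sorted(members, key=lambda w: math.dist(vectors[w], center))[:top_n]
        for center, members in zip(centers, clusters)
    ]

# Made-up 2-D "embeddings"; real word2vec vectors have far more dimensions.
vectors = {
    "football": [0.9, 0.1],
    "star": [0.1, 0.9],
    "basketball": [0.8, 0.3],
    "match": [0.6, 0.1],
    "movie": [0.3, 0.8],
    "record": [0.2, 0.5],
}
keywords = keywords_near_centers(vectors, k=2)
print(keywords)
```

The naive initialization keeps the example deterministic; real k-means implementations usually pick starting centers more carefully.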


Copyright notice
This article was written by [Brother prawn flying]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/160/202206091742469819.html