NER – Named Entity Recognition Summary
2022-06-30 09:38:00 【A grain of sand in the vast sea of people】
1. What is named entity recognition?
Named entity recognition (NER), also called "proper name recognition", is the task of identifying entities with a specific meaning in text, mainly person names, place names, organization names and other proper nouns. Put simply, it identifies the boundaries and the categories of entity mentions in natural-language text.
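To make "boundaries and categories" concrete: many NER systems label each token with a BIO tag (an assumption here; the article does not name a tag scheme), where `B-X` begins an entity of type `X`, `I-X` continues it and `O` is outside any entity. A minimal sketch that turns such tags back into entity spans might look like this; the label names `PER`, `ORG` and `LOC` are illustrative only.

```python
def bio_to_spans(tokens, tags):
    """Group BIO tags into (entity_text, label, start, end) spans; end is exclusive."""
    spans = []
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes the last open entity
        # a B- tag, an O tag, or an I- tag of a different type ends the current entity
        if tag.startswith("B-") or tag == "O" or (tag.startswith("I-") and tag[2:] != label):
            if start is not None:
                spans.append((" ".join(tokens[start:i]), label, start, i))
                start, label = None, None
            if tag.startswith("B-") or tag.startswith("I-"):
                # a stray I- tag with no open entity of its type starts a new entity
                start, label = i, tag[2:]
    return spans


tokens = ["Zhang", "San", "works", "at", "Peking", "University", "in", "Beijing"]
tags = ["B-PER", "I-PER", "O", "O", "B-ORG", "I-ORG", "O", "B-LOC"]
print(bio_to_spans(tokens, tags))
# [('Zhang San', 'PER', 0, 2), ('Peking University', 'ORG', 4, 6), ('Beijing', 'LOC', 7, 8)]
```

For Chinese text labeled character by character, `"".join` would replace `" ".join` when rebuilding the entity string.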
2. The history of named entity recognition
Early rule-based and dictionary-based methods are not covered in detail here. The most widely used approaches today are statistical (and depend heavily on the corpus): an annotation model is learned from a large annotated corpus and then used to label each position in a sentence. The conditional random field (CRF) is currently the mainstream model for NER. Its objective function includes not only state feature functions over the input but also label transition feature functions. Once the model is trained, predicting the output sequence for an input sequence means finding the label sequence that maximizes the objective function. This is a dynamic programming problem, and the optimal tag sequence can be decoded with the Viterbi algorithm. The advantage of a CRF is that, when labeling a position, it can exploit rich internal and contextual feature information.
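The Viterbi decoding step described above can be sketched in plain Python. This is a minimal illustration, assuming the CRF's state and transition feature functions have already been collapsed into per-position label scores (`emissions`) and a label-to-label score matrix (`transitions`). The score of a tag sequence is the sum of both, and dynamic programming keeps only the best path into each label at each position. The labels and numbers in the usage example are made up.

```python
def viterbi(emissions, transitions, labels):
    """Return (best label sequence, its score) for a linear-chain CRF.

    emissions[t][j]:   score of label j at position t (state features)
    transitions[i][j]: score of label i followed by label j (transition features)
    """
    n, k = len(emissions), len(labels)
    score = list(emissions[0])  # best score of any path ending in label j at t = 0
    backpointers = []
    for t in range(1, n):
        new_score, pointers = [], []
        for j in range(k):
            # best previous label i for reaching label j at position t
            best_i = max(range(k), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    # trace the best path backwards from the highest-scoring final label
    best = max(range(k), key=lambda j: score[j])
    best_score = score[best]
    path = [best]
    for pointers in reversed(backpointers):
        best = pointers[best]
        path.append(best)
    path.reverse()
    return [labels[j] for j in path], best_score


labels = ["O", "B-LOC", "I-LOC"]
emissions = [[2, 0, 0], [0, 1, 1.5], [0, 0, 2]]
transitions = [[0, 0, -5],   # O -> I-LOC is heavily penalized
               [0, 0, 1],    # B-LOC -> I-LOC is encouraged
               [0, 0, 1]]    # I-LOC -> I-LOC is encouraged
print(viterbi(emissions, transitions, labels))  # (['O', 'B-LOC', 'I-LOC'], 6)
```

The transition matrix is what lets the decoder avoid ill-formed sequences such as `O` followed directly by `I-LOC`, something a per-token classifier cannot do.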
3. BiLSTM-CRF
The BiLSTM-CRF model used for NER consists mainly of an embedding layer (mostly character embeddings, word embeddings and some additional features), a bidirectional LSTM layer, and a final CRF layer. Experimental results show that BiLSTM-CRF matches or exceeds feature-based CRF models, and it has become the most popular model among deep-learning-based NER methods. In terms of features, it inherits the advantages of deep learning: no feature engineering is required, and character and word embeddings alone achieve good results. High-quality dictionary features can improve it further.
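During training, the CRF layer on top of the BiLSTM is typically fitted by minimizing the negative log-likelihood of the gold tag sequence, whose normalizer (log-partition) is computed with the forward algorithm, the sum-over-paths counterpart of Viterbi. The pure-Python sketch below shows only that computation, assuming the BiLSTM has already produced one emission score per token and label. Real implementations express the same recurrence in a deep-learning framework so that gradients flow back into the BiLSTM and the embeddings.

```python
import math


def logsumexp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def crf_neg_log_likelihood(emissions, transitions, gold):
    """Negative log-likelihood of the gold label indices under a linear-chain CRF.

    emissions[t][j]: score of label j at token t (in BiLSTM-CRF, the BiLSTM output)
    transitions[i][j]: learned score of label i followed by label j (the CRF layer)
    gold: list of gold label indices, one per token
    """
    n, k = len(emissions), len(emissions[0])
    # score of the gold path: its emission terms plus its transition terms
    gold_score = emissions[0][gold[0]] + sum(
        transitions[gold[t - 1]][gold[t]] + emissions[t][gold[t]] for t in range(1, n)
    )
    # forward algorithm: alpha[j] = log of the summed exp-scores of all prefixes ending in j
    alpha = list(emissions[0])
    for t in range(1, n):
        alpha = [
            logsumexp([alpha[i] + transitions[i][j] for i in range(k)]) + emissions[t][j]
            for j in range(k)
        ]
    log_partition = logsumexp(alpha)
    return log_partition - gold_score
```

Minimizing this loss raises the gold path's score relative to all other paths; at prediction time the same emission and transition scores are decoded with Viterbi.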
4. Summary
Combining a neural network with a CRF, i.e. CNN/RNN-CRF, is now the mainstream approach to NER. Neither CNNs nor RNNs have an absolute advantage; each has its merits. Because an RNN naturally models sequences, RNN-CRF is more widely used. Neural NER methods inherit the advantages of deep learning and do not need large numbers of hand-crafted features: character and word embeddings alone reach mainstream performance, and adding high-quality dictionary features can improve results further. When only a small labeled training set is available, transfer learning and semi-supervised learning should be the focus of future research.
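One common way to inject the dictionary features mentioned above (an illustrative choice, not necessarily what the cited work uses) is to run forward maximum matching of the sentence against an entity lexicon and give each character a B/I/O match tag as an extra input feature. The lexicon and the `max_len` limit below are hypothetical.

```python
def lexicon_features(chars, lexicon, max_len=4):
    """Tag each character B/I/O by forward maximum matching against a lexicon.

    The returned tags can be added to each character's features (or embedded
    and concatenated to its vector) as an extra dictionary feature.
    """
    tags = ["O"] * len(chars)
    i = 0
    while i < len(chars):
        # try the longest candidate word starting at position i first
        for length in range(min(max_len, len(chars) - i), 0, -1):
            word = "".join(chars[i:i + length])
            if word in lexicon:
                tags[i] = "B"
                for j in range(i + 1, i + length):
                    tags[j] = "I"
                i += length
                break
        else:
            i += 1  # no lexicon entry starts here
    return tags


lexicon = {"北京", "天安门"}  # hypothetical place-name dictionary
print(lexicon_features(list("我爱北京天安门"), lexicon))
# ['O', 'O', 'B', 'I', 'B', 'I', 'I']
```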
5. Recommended tools
5.1 Stanford NER
A named entity recognition system based on conditional random fields, developed by Stanford University. Its model parameters were trained on the CoNLL, MUC-6, MUC-7 and ACE named-entity corpora.
Address: https://nlp.stanford.edu/software/CRF-NER.shtml
Python wrapper on GitHub: https://github.com/Lynten/stanford-corenlp
# Install: pip install stanfordcorenlp
# Install from the Tsinghua mirror (China): pip install stanfordcorenlp -i https://pypi.tuna.tsinghua.edu.cn/simple
# Use stanfordcorenlp to recognize named entities (a Java runtime is required)
# Download the CoreNLP package first: https://nlp.stanford.edu/software/corenlp-backup-download.html
# Chinese entity recognition (the Chinese models jar must also be placed in the CoreNLP directory)
from stanfordcorenlp import StanfordCoreNLP

zh_model = StanfordCoreNLP(r'stanford-corenlp-full-2018-02-27', lang='zh')
s_zh = '我爱自然语言处理技术!'  # "I love natural language processing technology!"
ner_zh = zh_model.ner(s_zh)
s_zh1 = '我爱北京天安门!'  # "I love Beijing Tiananmen!"
ner_zh1 = zh_model.ner(s_zh1)
print(ner_zh)
print(ner_zh1)
zh_model.close()  # shut down the background CoreNLP server
# Output (Chinese tokens glossed in English):
# [('I love', 'O'), ('natural', 'O'), ('language', 'O'), ('processing', 'O'), ('technology', 'O'), ('!', 'O')]
# [('I love', 'O'), ('Beijing', 'STATE_OR_PROVINCE'), ('Tiananmen', 'FACILITY'), ('!', 'O')]
# Entity recognition for English
eng_model = StanfordCoreNLP(r'stanford-corenlp-full-2018-02-27')
s_eng = 'I love natural language processing technology!'
ner_eng = eng_model.ner(s_eng)
s_eng1 = 'I love Beijing Tiananmen!'
ner_eng1 = eng_model.ner(s_eng1)
print(ner_eng)
print(ner_eng1)
eng_model.close()  # shut down the background CoreNLP server
# Output:
# [('I', 'O'), ('love', 'O'), ('natural', 'O'), ('language', 'O'), ('processing', 'O'), ('technology', 'O'), ('!', 'O')]
# [('I', 'O'), ('love', 'O'), ('Beijing', 'CITY'), ('Tiananmen', 'LOCATION'), ('!', 'O')]
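The `ner()` calls above return one tag per token, without B-/I- prefixes. A small helper, sketched under the assumption that adjacent tokens sharing the same non-`O` tag belong to one entity, can collapse them into entity strings. Two distinct same-type entities that happen to be adjacent would be merged, so treat this as a heuristic rather than part of the stanfordcorenlp API.

```python
from itertools import groupby


def merge_entities(tagged, sep=" "):
    """Merge runs of adjacent tokens with the same non-'O' tag into (text, tag) entities.

    tagged: list of (token, tag) pairs, e.g. the result of StanfordCoreNLP.ner()
    sep: joiner for multi-token entities (use "" for Chinese)
    """
    entities = []
    # groupby only groups *consecutive* pairs that share a tag
    for tag, group in groupby(tagged, key=lambda pair: pair[1]):
        if tag != "O":
            entities.append((sep.join(token.strip() for token, _ in group), tag))
    return entities


print(merge_entities([('I', 'O'), ('love', 'O'), ('Beijing', 'CITY'),
                      ('Tiananmen', 'LOCATION'), ('!', 'O')]))
# [('Beijing', 'CITY'), ('Tiananmen', 'LOCATION')]
```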
5.2 MALLET
An open-source toolkit for statistical natural language processing developed at the University of Massachusetts; its sequence-labeling tools can be used for named entity recognition. Official site: http://mallet.cs.umass.edu/
5.3 HanLP
HanLP is an NLP toolkit made up of a series of models and algorithms. It is led by Dakuai Search and fully open source, with the goal of popularizing natural language processing in production environments. It supports named entity recognition. GitHub (Python wrapper pyhanlp): https://github.com/hankcs/pyhanlp
Official website: http://hanlp.linrunsoft.com/
# Install: pip install pyhanlp
# Install from the Tsinghua mirror (China): pip install pyhanlp -i https://pypi.tuna.tsinghua.edu.cn/simple
# Recognize entities with the CRF algorithm
from pyhanlp import *

# Create a CRF-based segmenter; its output also carries entity tags such as ns (place name)
CRFnewSegment = HanLP.newSegment("crf")
term_list = CRFnewSegment.seg("我爱北京天安门!")  # "I love Beijing Tiananmen!"
print(term_list)
# Output (Chinese words glossed in English; /ns marks a place name):
# [I/r, love/v, Beijing/ns, Tiananmen/ns, !/w]
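HanLP marks entities through its part-of-speech tags: in its tag set `nr` is a person name, `ns` a place name and `nt` an organization, with longer tags such as `nrf` (transliterated person name) refining them. The sketch below keeps only `(word, tag)` pairs with those prefixes. With pyhanlp the pairs could be built as `[(term.word, str(term.nature)) for term in term_list]`; the English category names are hypothetical.

```python
# Hypothetical mapping from HanLP tag prefixes to readable entity categories
ENTITY_PREFIXES = {"nr": "PERSON", "ns": "LOCATION", "nt": "ORGANIZATION"}


def hanlp_entities(pairs):
    """Keep (word, tag) pairs whose HanLP part-of-speech tag marks a named entity."""
    result = []
    for word, nature in pairs:
        for prefix, label in ENTITY_PREFIXES.items():
            if str(nature).startswith(prefix):  # e.g. 'nrf' also counts as a person name
                result.append((word, label))
                break
    return result


print(hanlp_entities([("我", "r"), ("爱", "v"), ("北京", "ns"), ("天安门", "ns"), ("!", "w")]))
# [('北京', 'LOCATION'), ('天安门', 'LOCATION')]
```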
5.4 NLTK
NLTK is a leading platform for building Python programs that work with human natural language data.
GitHub: https://github.com/nltk/nltk
Official website: http://www.nltk.org/
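# Install: pip install nltk
# Install from the Tsinghua mirror (China): pip install nltk -i https://pypi.tuna.tsinghua.edu.cn/simple
import nltk

# One-time download of the tokenizer, tagger and NE-chunker data
# (resource names differ across NLTK versions; newer releases use e.g. 'punkt_tab')
for resource in ['punkt', 'averaged_perceptron_tagger', 'maxent_ne_chunker', 'words']:
    nltk.download(resource)

s = 'I love natural language processing technology!'
s_token = nltk.word_tokenize(s)        # split the sentence into tokens
s_tagged = nltk.pos_tag(s_token)       # add part-of-speech tags
s_ner = nltk.chunk.ne_chunk(s_tagged)  # group tagged tokens into named-entity chunks
print(s_ner)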
5.5 spaCy
An industrial-strength natural language processing library. Older releases did not support Chinese; Chinese pipelines (e.g. zh_core_web_sm) are available in newer versions. GitHub: https://github.com/explosion/spaCy Official website: https://spacy.io/
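# Install: pip install spacy
# Install from the Tsinghua mirror (China): pip install spacy -i https://pypi.tuna.tsinghua.edu.cn/simple
# Download the English model: python -m spacy download en_core_web_sm
import spacy

# spaCy v3 no longer accepts the 'en' shortcut; load the model package by name
eng_model = spacy.load('en_core_web_sm')
s = 'I want to Beijing learning natural language processing technology!'
# Named entity recognition
s_ent = eng_model(s)
for ent in s_ent.ents:
    print(ent, ent.label_, ent.label)
# Output:
# Beijing GPE 382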
5.6 CRFsuite (sklearn-crfsuite)
You can train a CRF entity recognition model on your own dataset.
Documentation: https://sklearn-crfsuite.readthedocs.io/en/latest/?badge=latest
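sklearn-crfsuite expects each training sentence as a list of per-token feature dicts, plus a parallel list of label strings. A minimal character-level feature extractor for Chinese text might look like the sketch below; the window of neighbouring characters and bigrams, and the feature names, are arbitrary choices made here for illustration. The commented lines at the end show how the features would be passed to the library, with `train_sents` and `train_labels` standing for your own data.

```python
def char2features(sent, i):
    """Feature dict for the i-th character of a sentence (a string or list of characters)."""
    ch = sent[i]
    features = {
        "bias": 1.0,
        "char": ch,
        "char.isdigit": ch.isdigit(),
    }
    if i > 0:
        features["-1:char"] = sent[i - 1]
        features["-1:bigram"] = sent[i - 1] + ch
    else:
        features["BOS"] = True  # beginning of sentence
    if i < len(sent) - 1:
        features["+1:char"] = sent[i + 1]
        features["+1:bigram"] = ch + sent[i + 1]
    else:
        features["EOS"] = True  # end of sentence
    return features


def sent2features(sent):
    """One feature dict per character, the per-sentence format sklearn-crfsuite expects."""
    return [char2features(sent, i) for i in range(len(sent))]


# Training sketch (requires `pip install sklearn-crfsuite`; train_sents are strings and
# train_labels the matching lists of BIO tags, both your own data):
# import sklearn_crfsuite
# crf = sklearn_crfsuite.CRF(algorithm='lbfgs', c1=0.1, c2=0.1,
#                            max_iterations=100, all_possible_transitions=True)
# crf.fit([sent2features(s) for s in train_sents], train_labels)
# print(crf.predict([sent2features('我爱北京天安门')]))
```

`c1` and `c2` are the L1 and L2 regularization strengths; they are worth tuning on a development set rather than taken as given.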
Code uploaded to: https://github.com/yuquanle/StudyForNLP/blob/master/NLPbasic/NER.ipynb
Reference: https://mp.weixin.qq.com/s/7R3Y2-nD5fELPa9rtnK7Tg