Knowledge graph: named entity recognition with pyhanlp (with named entity recognition code)
2022-07-27 18:37:00 【zkkkkkkkkkkkkk】
The previous article covered text segmentation with jieba, pyhanlp, and smoothnlp. Building on that segmentation work, this article goes on to explain named entity recognition. If you are interested, see the previous article for how to use the segmentation tools. At the end of this article I also share some of my own, still immature, ideas about named entity recognition.
1. What is an entity?
It is hard to give a crisp answer to this question. The defining property of an entity is that it is definite, so an entity can be understood as one specific, identifiable instance. For example, "Glory of Kings" names one real thing, whereas "king" or "glory" on its own does not tell us at once what is being referred to. That is the idea of an entity.
2. What is named entity recognition?
Named entity recognition is abbreviated NER. Named entities are entities in a text that have a specific meaning or strong referential value, and usually include person names, place names, organization names, dates and times, and proper nouns. Put simply, NER finds the person names, place names, organization names, dates and times, and proper nouns in a sentence. It is an essential step in building the triples of a knowledge graph.
3. Categories of named entities
Named entities are usually divided into three categories and seven subcategories:
The three categories are: entity, time, and number
The seven subcategories are: person name, place name, organization, time, date, currency, and percentage
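The taxonomy above can be written down as a small lookup table. This is only a minimal sketch: the text does not say which subcategory belongs to which category, so the grouping below is my assumption based on the usual convention (person, place, and organization under entity; time and date under time; currency and percentage under number). The names `NER_CATEGORIES` and `SUBCATEGORY_TO_CATEGORY` are made up for illustration.

```python
# Assumed grouping of the seven subcategories into the three categories.
NER_CATEGORIES = {
    "entity": ["person name", "place name", "organization"],
    "time": ["time", "date"],
    "number": ["currency", "percentage"],
}

# Reverse lookup: subcategory -> category.
SUBCATEGORY_TO_CATEGORY = {
    sub: category
    for category, subs in NER_CATEGORIES.items()
    for sub in subs
}

print(SUBCATEGORY_TO_CATEGORY["currency"])
```

A table like this is handy later when recognized entities need to be grouped or filtered by type.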
4. Simple named entity recognition with pyhanlp
Here is my own, still immature, idea. Since we need to recognize person names, place names, organizations, and so on, we can segment the text, pick out the terms whose part-of-speech tag marks one of these entity types, and treat those terms as entities. Doesn't that achieve entity extraction? (Of course, this is only my own view.) Let's go straight to the code:
from pyhanlp import *

# doc.txt contains: Zhao Ruth is an actor
with open("doc.txt", "r", encoding="utf-8") as file:
    txt = file.read()

# Build a segmenter with person-name recognition enabled
nlp = HanLP.newSegment().enableNameRecognize(True)
# Segment the text; each term prints as word/tag
cut_word = nlp.seg(txt)
print(cut_word)

# Collect every term whose tag starts with nr (person name)
entity_list = []
for word in cut_word:
    # Split on the last '/' so only the tag is tested, not the word itself
    text, _, tag = str(word.toString()).rpartition('/')
    if tag.startswith("nr"):
        entity_list.append(text)
print(entity_list)

# The output is shown below. Extracting the nr terms, that is, the person names, gives us the entities.
# [Zhao Ruth/nr, is/vshi, a/m, famous actor/n]
# ['Zhao Ruth']

The code above extracts nr terms. Other entity types, such as place names and organization or company names, can be handled the same way once we know their part-of-speech tags. All of this only works if the text is segmented correctly, which shows how important segmentation is. In practice, segmentation tools do not split every entity correctly; when the segments differ from what we expect, the accuracy of named entity recognition suffers.
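To make the "same way for other entity types" point concrete, here is a minimal sketch that works on plain `word/tag` strings, so it runs without pyhanlp installed. It relies on the HanLP tag prefixes nr (person name), ns (place name), and nt (organization name). The function name `extract_entities`, the type labels, and the sample terms are hypothetical, not real segmenter output.

```python
# Tag prefixes from the HanLP tag set: nr = person, ns = place, nt = organization.
TAG_TO_TYPE = {"nr": "PERSON", "ns": "LOCATION", "nt": "ORGANIZATION"}


def extract_entities(tagged_terms, tag_to_type=TAG_TO_TYPE):
    """Return (word, entity_type) pairs for terms whose tag starts with a known prefix."""
    entities = []
    for term in tagged_terms:
        # Split on the last '/' so a word that itself contains '/' stays intact.
        word, sep, tag = str(term).rpartition("/")
        if not sep:
            continue  # no tag on this term, skip it
        for prefix, entity_type in tag_to_type.items():
            if tag.startswith(prefix):
                entities.append((word, entity_type))
                break
    return entities


# Hypothetical segmenter output, written by hand for illustration.
sample = ["Zhao Ruth/nr", "is/vshi", "a/m", "famous actor/n",
          "Beijing/ns", "Tsinghua University/nt"]
result = extract_entities(sample)
print(result)
```

With pyhanlp, the input could be built as `[str(t) for t in nlp.seg(txt)]`. For place and organization tags to show up, the segmenter would presumably also need `enablePlaceRecognize(True)` and `enableOrganizationRecognize(True)` alongside `enableNameRecognize(True)`.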
5. Adding words to the dictionary
As mentioned above, inaccurate segmentation hurts the accuracy of named entity recognition. One way to improve segmentation accuracy is to add words to the dictionary, though the drawback is that the dictionary has to be maintained by hand. Take pyhanlp, the tool used in my code, as an example: it ships with its own dictionaries. My pyhanlp is installed under Anaconda3, so my dictionary directory is: C:\Users\dell\Anaconda3\Lib\site-packages\pyhanlp\static\data\dictionary\custom

This directory contains many dictionaries, and we choose the one that matches the kind of word we want to add. In this example we want to add a person name, so we open the person-name dictionary. For example, to add the person name Xiao Ming, append one line (word, part-of-speech tag, frequency): Xiao Ming nr 1

After adding the entry, you can use the code above to recognize the new name as a named entity.
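Editing the dictionary by hand gets tedious, so here is a minimal sketch of a helper that appends an entry in the `word tag frequency` format described above and skips words that are already listed. The helper name `add_dictionary_entry` and the demo file name are hypothetical, and the demo writes to a temporary file rather than to the real pyhanlp directory. The word 小明 is my assumption for the original Chinese form of Xiao Ming.

```python
import os
import tempfile


def add_dictionary_entry(dict_path, word, nature="nr", frequency=1):
    """Append 'word nature frequency' to dict_path; return False if word is already listed."""
    content = ""
    if os.path.exists(dict_path):
        with open(dict_path, "r", encoding="utf-8") as f:
            content = f.read()
    # The first whitespace-separated field of each line is the word itself.
    existing = {line.split()[0] for line in content.splitlines() if line.strip()}
    if word in existing:
        return False
    with open(dict_path, "a", encoding="utf-8") as f:
        # Avoid gluing the new entry onto an unterminated last line.
        if content and not content.endswith("\n"):
            f.write("\n")
        f.write(f"{word} {nature} {frequency}\n")
    return True


# Demo on a throwaway file (hypothetical name), not the real dictionary.
demo_path = os.path.join(tempfile.mkdtemp(), "person_names_demo.txt")
first = add_dictionary_entry(demo_path, "小明")
second = add_dictionary_entry(demo_path, "小明")  # duplicate, should be skipped
with open(demo_path, encoding="utf-8") as f:
    demo_lines = f.read().splitlines()
print(first, second, demo_lines)
```

Two caveats, as far as I know. HanLP caches the compiled custom dictionary in a `.bin` file next to the text files, so that cache may need to be deleted before a file edit takes effect. For quick experiments, pyhanlp also exposes `CustomDictionary.insert(word, "nr 1")`, which adds a word at runtime without touching the files.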
6. Summary
This method is admittedly limited. Many colleagues point out that it requires maintaining the dictionary by hand, which is costly. To analyze the entities in a document, you first need to obtain the document and find its keywords; whenever a keyword is not segmented correctly, you have to add it to the dictionary.