Knowledge graph: Chinese word segmentation (with part-of-speech tagging) using jieba, pyhanlp, and smoothnlp
2022-07-27 18:36:00 【zkkkkkkkkkkkkk】
Recently we have been doing preliminary research on knowledge graph technologies, which involves a fair amount of natural language processing. So far I have looked into word segmentation and named entity recognition. Today's post records how to use several word segmentation tools.
1. What is a knowledge graph?
As I understand it, a knowledge graph is a huge semantic network, much like the Internet. Every node in the network is an entity, and an edge between two entities represents a relationship or an attribute. Building one therefore comes down to finding triples of the form (entity, relationship, entity) or (entity, attribute, attribute value). The key step is extracting triples in this form while also making sure the extracted triples are correct, which is undoubtedly a huge challenge.
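As a rough illustration of the two triple shapes described above, the sketch below stores a few hand-written triples in plain Python and looks up everything recorded about one entity. The `Triple` type, the `facts_about` helper, and the sample facts are hypothetical names chosen for this example; they are not part of any of the tools discussed later.

```python
from typing import List, NamedTuple


class Triple(NamedTuple):
    """One edge of the graph: (head, relation-or-attribute, tail)."""
    head: str
    predicate: str
    tail: str


# Hand-written sample facts (illustrative only, not extracted by any tool).
triples = [
    Triple("jieba", "written_in", "Python"),                # (entity, relationship, entity)
    Triple("jieba", "supports", "part-of-speech tagging"),  # (entity, relationship, entity)
    Triple("Python", "first_released", "1991"),             # (entity, attribute, attribute value)
]


def facts_about(entity: str, graph: List[Triple]) -> List[Triple]:
    """Return every triple whose head is the given entity."""
    return [t for t in graph if t.head == entity]


for t in facts_about("jieba", triples):
    print(f"({t.head}, {t.predicate}, {t.tail})")
```

A flat list is enough to show the shape of the data. A real graph would normally live in a graph database or an indexed structure, and the hard part remains extracting correct triples from text in the first place.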
2. Knowledge graph flow chart
The flow chart below reflects my own understanding. If anything is wrong, please point it out (and go easy on me)!

3. What frameworks (tools) are there for knowledge graphs?
The open source frameworks I know of so far include deepke, bert, and cndeepdive.
①: deepke is an open source framework for Chinese relation extraction from the knowledge engine laboratory of Zhejiang University.
Project address: https://github.com/zjunlp/deepke/
②: bert is an open source pretrained language model from Google.
Project address: https://github.com/google-research/bert
③: deepdive is an open source knowledge extraction framework from Stanford University, but it has been in maintenance mode since 2017.
Project address: https://github.com/HazyResearch/deepdive
4. Word segmentation with the tools
As far as I know, jieba, pyhanlp, and smoothnlp can all segment Chinese text. The three tools are introduced below.
①: jieba word segmentation
Installation is quick: just run pip install jieba.
import jieba.posseg as pseg

def postag(text):
    # Segment the text and tag each token with its part of speech
    return pseg.cut(text)

# Define the text (Chinese input)
text = 'jieba的主要功能是做中文分词,可以进行简单分词、并行分词、命令行分词,当然它的功能不限于此,目前还支持关键词提取、词性标注、词位置查询等。更让人愉悦的是jieba虽然立足于python,但同样支持其他语言和平台,诸如:C++、Go、R、Rust、Node.js、PHP、 iOS、Android等。所以jieba能满足各类开发者的需求'
# Split the text into sentences on the Chinese full stop
jieba_list = []
for sentence in text.split('。'):
    if not sentence:
        continue  # skip empty pieces left by the split
    # Each token w has two attributes: w.flag is the part of speech, w.word is the word
    for w in postag(sentence):
        jieba_list.append([w.flag, w.word])
# Print the jieba segmentation result
print(jieba_list)
②: pyhanlp word segmentation
Note that running pyhanlp requires a local Java environment, because it calls Java interfaces under the hood.
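Before importing pyhanlp it can help to confirm that Java is actually reachable. The check below is a minimal sketch that uses only the Python standard library. It assumes that a `java` executable on `PATH` is a reasonable proxy for a usable Java environment (pyhanlp may also locate Java by other means, such as `JAVA_HOME`), and the helper names `find_java` and `java_version` are hypothetical.

```python
import shutil
import subprocess
from typing import Optional


def find_java() -> Optional[str]:
    """Return the path of the `java` executable on PATH, or None if absent."""
    return shutil.which("java")


def java_version(java_path: str) -> str:
    """Return the first line printed by `java -version` (usually written to stderr)."""
    result = subprocess.run(
        [java_path, "-version"], capture_output=True, text=True, check=False
    )
    output = result.stderr or result.stdout
    return output.splitlines()[0] if output else "unknown"


java_path = find_java()
if java_path is None:
    print("No java executable found on PATH; install a JDK or JRE before importing pyhanlp.")
else:
    print("Found", java_path, "->", java_version(java_path))
```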
from pyhanlp import HanLP

text = 'jieba的主要功能是做中文分词,可以进行简单分词、并行分词、命令行分词,当然它的功能不限于此,目前还支持关键词提取、词性标注、词位置查询等。更让人愉悦的是jieba虽然立足于python,但同样支持其他语言和平台,诸如:C++、Go、R、Rust、Node.js、PHP、 iOS、Android等。所以jieba能满足各类开发者的需求'
# Split the text into sentences on the Chinese full stop
pyhanlp_list = []
for sentence in text.split('。'):
    if sentence:  # skip empty pieces left by the split
        # HanLP.segment returns the terms of one sentence, each carrying a word and its part of speech
        pyhanlp_list.append(HanLP.segment(sentence))
# Print the pyhanlp segmentation result
print(pyhanlp_list)
③: smoothnlp word segmentation
This one can be installed directly with pip install smoothnlp.
from smoothnlp.algorithm.phrase import extract_phrase

text = 'jieba的主要功能是做中文分词,可以进行简单分词、并行分词、命令行分词,当然它的功能不限于此,目前还支持关键词提取、词性标注、词位置查询等。更让人愉悦的是jieba虽然立足于python,但同样支持其他语言和平台,诸如:C++、Go、R、Rust、Node.js、PHP、 iOS、Android等。所以jieba能满足各类开发者的需求'
# Split the text into sentences on the Chinese full stop
smoothnlp_list = []
for sentence in text.split('。'):
    if sentence:  # skip empty pieces left by the split
        smoothnlp_list.append(extract_phrase(sentence))
# Print the smoothnlp result
print("smoothnlp:", smoothnlp_list)
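The listings in this section cut the text into sentences by splitting on a single period character, which drops the punctuation and ignores sentences that end with an exclamation or question mark. A slightly more robust splitter could be sketched with the standard `re` module as below. The `split_sentences` helper and the sample string are hypothetical, and the set of sentence-ending marks (。!?) is an assumption that may need extending for other text.

```python
import re
from typing import List

# A run of characters that are not sentence-ending marks,
# followed by an optional Chinese full stop, exclamation mark, or question mark.
_SENTENCE = re.compile(r"[^。!?]+[。!?]?")


def split_sentences(text: str) -> List[str]:
    """Split text into sentences, keeping the ending mark and dropping blank pieces."""
    return [s.strip() for s in _SENTENCE.findall(text) if s.strip()]


sample = "jieba支持词性标注。它也支持Node.js等平台!你用过吗?"
for sentence in split_sentences(sample):
    print(sentence)
```

Because only full-width marks are treated as sentence ends, the ASCII period in `Node.js` does not break the sentence apart. Each returned sentence can then be passed to whichever segmenter is in use.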
5. Part-of-speech table
The meaning of each tag can be looked up in the linked part-of-speech list of common words.
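Once the tags are understood, a common next step is to group or filter tokens by tag. The sketch below works on `[flag, word]` pairs in the same shape as `jieba_list` from the jieba section. The pairs are hand-written for illustration and are not output copied from jieba, and the helper names `group_by_flag` and `keep_flags` are hypothetical. It assumes a tag set in which flags beginning with `n` mark nouns and flags beginning with `v` mark verbs; check this against the part-of-speech table for the tool in use.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hand-written [flag, word] pairs in the same shape as jieba_list;
# illustrative only, NOT actual output copied from jieba.
tagged = [
    ["eng", "jieba"],
    ["v", "支持"],
    ["n", "关键词"],
    ["v", "提取"],
    ["x", "、"],
    ["n", "词性"],
    ["v", "标注"],
]


def group_by_flag(pairs: List[List[str]]) -> Dict[str, List[str]]:
    """Map each part-of-speech flag to the words carrying it, in input order."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for flag, word in pairs:
        groups[flag].append(word)
    return dict(groups)


def keep_flags(pairs: List[List[str]], prefixes: Tuple[str, ...]) -> List[str]:
    """Keep words whose flag starts with one of the prefixes (so 'n' also matches 'nr', 'ns')."""
    return [word for flag, word in pairs if flag.startswith(prefixes)]


print(group_by_flag(tagged))
print(keep_flags(tagged, ("n",)))
```

Filtering down to nouns in this way is one simple route to candidate entities for the triples discussed in section 1, though named entity recognition is the more reliable approach.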
6. Summary
Today's post mainly records how to use the word segmentation tools, along with my understanding of knowledge graphs. The next article will cover the techniques and ideas behind named entity recognition. See: named entity recognition article