Knowledge graph: Chinese word segmentation (with part-of-speech tagging) using jieba, pyhanlp, and smoothnlp

2022-07-27 18:36:00 【zkkkkkkkkkkkkk】

Recently, we have been doing preliminary research on knowledge graph technologies, which involves some natural language processing (NLP) content and techniques. So far we have looked into word segmentation and named entity recognition. Today's post records how to use several word segmentation tools.

1. What is a knowledge graph?

As the author understands it, a knowledge graph is a huge semantic network, much like the Internet. Every node in the semantic network is an entity, and an edge between two entities represents a relationship or an attribute. In practice, building one comes down to finding triples of the form (entity, relationship, entity) or (entity, attribute, attribute value). The key step is how to extract such triples while also ensuring that the extracted triples are correct, which is undoubtedly a huge challenge.
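
To make the triple form concrete, here is a minimal sketch of how such triples could be held in memory and queried. The `Triple` class, the `neighbors` helper, and the predicate names are illustrative assumptions, not part of any framework mentioned in this article; the facts themselves are taken from statements made elsewhere in this post.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One edge of a knowledge graph: (head, relation or attribute, tail)."""
    head: str
    predicate: str
    tail: str


# Hand-written example facts (hypothetical predicate names).
triples = [
    Triple("jieba", "written_in", "Python"),                # (entity, relationship, entity)
    Triple("jieba", "supports", "part-of-speech tagging"),  # (entity, relationship, entity)
    Triple("DeepDive", "maintenance_since", "2017"),        # (entity, attribute, attribute value)
]


def neighbors(graph, entity):
    """Return every (predicate, tail) pair whose head is the given entity."""
    return [(t.predicate, t.tail) for t in graph if t.head == entity]


print(neighbors(triples, "jieba"))
```

A flat list of triples is the simplest possible representation; a real system would more likely store them in a graph database, but the shape of each record stays the same.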

2. Knowledge graph flow chart

The flow chart is drawn from my own understanding; if anything is wrong, please point it out! (Experts, please go easy on me.)

3. What frameworks (tools) exist for knowledge graphs?

The open-source frameworks for knowledge graphs that we currently know of include deepke, bert, cndeepdive, and so on.

①: deepke is an open-source framework for Chinese relation extraction from the knowledge engine laboratory of Zhejiang University.

Project address: https://github.com/zjunlp/deepke/

②: bert is an open-source pretrained language model from Google.

Project address: https://github.com/google-research/bert

③: deepdive is an open-source knowledge extraction framework from Stanford University, but it has been in maintenance mode since 2017.

Project address: https://github.com/HazyResearch/deepdive

4. Using tools to segment text

From what I have learned, jieba, pyhanlp, and smoothnlp can all perform Chinese word segmentation. The three tools are introduced below.

①: jieba word segmentation

Install it directly with pip install jieba; installation is quick.

import jieba
import jieba.posseg as pseg


def postag(text):
    words = pseg.cut(text)
    return words


# Define the text to segment
text = 'jieba Its main function is to do Chinese word segmentation , Simple word segmentation 、 Parallel word segmentation 、 Command line participle , Of course, its function is not limited to this , At present, keyword extraction is also supported 、 Part of speech tagging 、 Word location query, etc . What is more pleasant is jieba Although based on python, But it also supports other languages and platforms , Such as ：C++、Go、R、Rust、Node.js、PHP、 iOS、Android etc. . therefore jieba It can meet the needs of all kinds of developers '

# Split the text into sentences on periods
# (note: this naive split also breaks inside tokens such as "Node.js")
jieba_list = []
sentences = text.split('.')
for sentence in sentences:
    words = postag(sentence)    # segment the sentence and tag each word
    for w in words:
        # each w has two attributes: w.flag is the part of speech, w.word is the word
        jieba_list.append([w.flag, w.word])

# Print the jieba segmentation list
print(jieba_list)
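
Splitting on every '.' also cuts tokens such as "Node.js" in half. The following is a hedged sketch of a slightly more careful splitter using only the standard library; `split_sentences` is a hypothetical helper, not part of jieba, and its rule (an ASCII period ends a sentence only when followed by whitespace or the end of the text) is a heuristic that will still split after abbreviations such as "etc.".

```python
import re

# A sentence ends at a Chinese 。！？, or at an ASCII . ! ? that is
# followed by whitespace or the end of the text.
_SENTENCE_END = re.compile(r'[。！？]|[.!?](?=\s|$)')


def split_sentences(text):
    """Split text into sentences, dropping the end punctuation and empty pieces."""
    parts = _SENTENCE_END.split(text)
    return [p.strip() for p in parts if p.strip()]


print(split_sentences("It supports Node.js and Go. It is flexible."))
print("It supports Node.js and Go. It is flexible.".split('.'))  # naive split, for comparison
```

Each returned sentence can then be passed to postag in place of the pieces produced by text.split('.').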

②: pyhanlp word segmentation

Note that running pyhanlp requires a local Java environment, because it calls Java interfaces.

from pyhanlp import HanLP

text = 'jieba Its main function is to do Chinese word segmentation , Simple word segmentation 、 Parallel word segmentation 、 Command line participle , Of course, its function is not limited to this , At present, keyword extraction is also supported 、 Part of speech tagging 、 Word location query, etc . What is more pleasant is jieba Although based on python, But it also supports other languages and platforms , Such as ：C++、Go、R、Rust、Node.js、PHP、 iOS、Android etc. . therefore jieba It can meet the needs of all kinds of developers '

# Split the text into sentences on periods
sentences = text.split('.')
pyhanlp_list = []
for sentence in sentences:
    pyhanlp_list.append(HanLP.segment(sentence))

# Print the pyhanlp segmentation result
print(pyhanlp_list)
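
The jieba example stores [part of speech, word] pairs, while HanLP.segment returns term objects. Below is a small sketch for bringing the two into the same shape, assuming each term exposes `word` and `nature` attributes as HanLP's Term class does. `terms_to_pairs` is a hypothetical helper, and the stand-in objects and their tags are made up so the sketch runs without Java.

```python
from types import SimpleNamespace


def terms_to_pairs(terms):
    """Convert HanLP-style terms into [part_of_speech, word] pairs."""
    return [[str(term.nature), str(term.word)] for term in terms]


# Stand-in terms with illustrative tags; with pyhanlp installed,
# pass HanLP.segment(sentence) instead.
fake_terms = [
    SimpleNamespace(word="中文", nature="n"),
    SimpleNamespace(word="分词", nature="v"),
]
print(terms_to_pairs(fake_terms))
```

With both tools producing the same pair format, their results are easier to compare side by side.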

③: smoothnlp word segmentation

This one can also be installed directly with pip install smoothnlp.

from smoothnlp.algorithm.phrase import extract_phrase

text = 'jieba Its main function is to do Chinese word segmentation , Simple word segmentation 、 Parallel word segmentation 、 Command line participle , Of course, its function is not limited to this , At present, keyword extraction is also supported 、 Part of speech tagging 、 Word location query, etc . What is more pleasant is jieba Although based on python, But it also supports other languages and platforms , Such as ：C++、Go、R、Rust、Node.js、PHP、 iOS、Android etc. . therefore jieba It can meet the needs of all kinds of developers '

# Split the text into sentences on periods
sentences = text.split('.')
smoothnlp_list = []
for sentence in sentences:
    smoothnlp_list.append(extract_phrase(sentence))

# Print the smoothnlp result
print("smoothnlp:", smoothnlp_list)

5. Part-of-speech table

For the meaning of each part-of-speech tag, see the article "Part-of-speech list of common words".
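
Once the tag meanings are known, the [part of speech, word] pairs from the jieba example can be filtered by tag. The sketch below assumes the usual convention in which noun tags start with "n" and verb tags start with "v"; `filter_by_pos` is a hypothetical helper, and the sample pairs are written by hand for illustration rather than copied from real jieba output.

```python
def filter_by_pos(pairs, prefixes=("n",)):
    """Keep the words whose part-of-speech flag starts with one of the prefixes."""
    return [word for flag, word in pairs if flag.startswith(tuple(prefixes))]


# Hand-written sample in the same [flag, word] shape as jieba_list.
sample_pairs = [
    ["eng", "jieba"],
    ["v", "支持"],
    ["n", "关键词"],
    ["vn", "提取"],
    ["x", "、"],
    ["n", "词性"],
]

print(filter_by_pos(sample_pairs))             # noun-like words
print(filter_by_pos(sample_pairs, ("v",)))     # verb-like words
```

Filtering on a prefix rather than an exact tag keeps subtypes such as "vn" together with their parent category.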

6. Summary

Today's post mainly records how to use word segmentation tools, along with my understanding of knowledge graphs. The next article will cover techniques and ideas related to named entity recognition. See: named entity recognition article.

Copyright notice
This article was written by [zkkkkkkkkkkkkk]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/208/202207271611597439.html