spaCy tutorial (continuously updated...)
2022-06-28 14:47:00 【The gods were silent】
The gods were silent - personal CSDN Blog Directory
Last updated: 2022.6.27
First published: 2022.6.27
This article covers how to use spaCy models, that is, it is a tutorial on the spaCy API. The spaCy API is mostly used through a specific model (a trained pipeline). The examples here mainly use the English (en_core_web_sm) and Chinese (zh_core_web_sm) models, since those are the only two languages I know.
spaCy models official site: Trained Models & Pipelines · spaCy Models Documentation
1. Tokenization
Example from the official website (it can be run directly on the web page, via Docker):
import spacy
from spacy.lang.en.examples import sentences
nlp = spacy.load("en_core_web_sm")
doc = nlp(sentences[0])
print(doc.text)
for token in doc:
    print(token.text, token.pos_, token.dep_)
Output:
Apple is looking at buying U.K. startup for $1 billion
Apple PROPN nsubj
is AUX aux
looking VERB ROOT
at ADP prep
buying VERB pcomp
U.K. PROPN dobj
startup NOUN advcl
for ADP prep
$ SYM quantmod
1 NUM compound
billion NUM pobj
As you can see, the model has tokenized the text and assigned each token a part-of-speech tag (pos_) and a dependency relation (dep_). (I don't know what the latter is. See: DependencyParser · spaCy API Documentation)
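The tags and labels in the output above can be looked up with spacy.explain, which returns a short description from spaCy's built-in glossary. A minimal sketch, assuming only that spaCy itself is installed (no trained model is loaded here); the list of labels is simply copied from the output above:

```python
import spacy

# Labels copied from the output above. spacy.explain looks each one up in
# spaCy's built-in glossary and returns None for labels it does not know.
labels = ["PROPN", "AUX", "nsubj", "dobj", "pobj"]
explanations = {label: spacy.explain(label) for label in labels}

for label, meaning in explanations.items():
    print(f"{label}: {meaning}")
```

For example, nsubj is described as "nominal subject", which matches Apple being the subject of the sentence.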
2. Stop words
For Defaults, see: Language · spaCy API Documentation
import spacy
sp=spacy.load('en_core_web_sm')
StopWord=sp.Defaults.stop_words
StopWord is a set of stop words (each one a string).
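One common use of this set is filtering stop words out of a text. A minimal sketch, assuming a tokenizer-only pipeline created with spacy.blank("en") instead of en_core_web_sm so that no model download is needed; the sample sentence is the one from section 1:

```python
import spacy

# Assumption: a blank English pipeline (tokenizer only) is enough here,
# because the stop-word list belongs to the language defaults.
nlp = spacy.blank("en")
stop_words = nlp.Defaults.stop_words

doc = nlp("Apple is looking at buying U.K. startup for $1 billion")

# token.is_stop flags tokens that appear in the stop-word set.
kept = [token.text for token in doc if not token.is_stop]

print("the" in stop_words)
print(kept)
```

Membership tests on the set and token.is_stop give the same kind of answer; is_stop is usually more convenient when a Doc is already available.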
3. Sentence segmentation
For Sentencizer, see: Sentencizer · spaCy API Documentation
A sentence here means a complete sentence, delimited by punctuation such as the full stop.
import spacy
from spacy.lang.zh.examples import sentences
nlp = spacy.load("zh_core_web_sm")
total_doc=''.join(sentences)
nlp.add_pipe('sentencizer', name='sentence_segmenter', before='parser')
doc = nlp(total_doc)
print(doc.text)
for token in doc:
    print(token)
    print(token.is_sent_start)
for sent in doc.sents:
    print(sent)
Output omitted. In short, a token whose is_sent_start attribute is True is the first token of a sentence, and doc.sents is an iterator over the sentences.
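The same two attributes can be checked on a small input. A minimal sketch, assuming a blank English pipeline with only the sentencizer added (no parser and no trained model); the two-sentence text is made up for illustration:

```python
import spacy

# Assumption: a blank pipeline plus the rule-based sentencizer, so the
# sentence boundaries come only from punctuation.
nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")

doc = nlp("This is a sentence. This is another one.")

# Tokens that open a sentence, and the sentences themselves.
starts = [token.text for token in doc if token.is_sent_start]
sentences = [sent.text for sent in doc.sents]

print(starts)
print(sentences)
```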
In addition, spaCy v2.0 supported the following way of writing this. It is not available in spaCy v3.0 (I am on 3.2.4), and I haven't tried it:
from seg.newline.segmenter import NewLineSegmenter # note that pip package is called spacyss
import spacy
nlseg = NewLineSegmenter()
nlp = spacy.load('en')
nlp.add_pipe(nlseg.set_sent_starts, name='sentence_segmenter', before='parser')
doc = nlp(my_doc_text)
The required package is: spacyss · PyPI
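In spaCy v3, a comparable newline-based splitter can probably be written as a custom component. The following is a minimal sketch, not the spacyss implementation: the component name newline_segmenter and the sample text are made up for illustration, and a blank English pipeline is assumed so that no trained model is needed.

```python
import spacy
from spacy.language import Language


# Hypothetical component name; it is not part of spaCy or spacyss.
@Language.component("newline_segmenter")
def newline_segmenter(doc):
    # Mark a token as a sentence start only if the previous token
    # contains a newline character.
    for token in doc[:-1]:
        doc[token.i + 1].is_sent_start = "\n" in token.text
    return doc


nlp = spacy.blank("en")
nlp.add_pipe("newline_segmenter")

doc = nlp("First line\nSecond line")
# The newline token stays at the end of its sentence, so strip it.
lines = [sent.text.strip() for sent in doc.sents]
print(lines)
```

With a trained pipeline, the same component would presumably be added with before='parser', as in the v2 snippet, because sentence starts have to be set before the parser runs.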