Forward maximum matching method
2022-07-06 21:08:00 【wx5d786476cd8b2】
class MM(object):
    """Forward maximum matching (FMM) word segmenter."""

    def __init__(self, dic_path):
        self.dictionary = set()
        self.maximum = 0  # length of the longest dictionary word
        # Read the dictionary, one word per line
        with open(dic_path, 'r', encoding='utf-8') as f:
            for line in f:
                line = line.strip()
                if not line:
                    continue
                self.dictionary.add(line)
                if self.maximum < len(line):
                    self.maximum = len(line)

    def cut(self, text):
        result = []
        index = len(text)  # number of characters still to be scanned
        n = 0              # start position of the current window
        while index > 0:
            word = None
            # Try the longest window first and shrink it until a match is found
            for size in range(self.maximum, 0, -1):
                if index - size < 0:
                    continue
                piece = text[n:n + size]
                if piece in self.dictionary:
                    word = piece
                    result.append(word)
                    index -= size
                    n += size
                    break
            # No dictionary word starts at this position: skip one character
            if word is None:
                n += 1
                index -= 1
        return result


def main():
    text = " Nanjing Yangtze River Bridge "
    t = MM(r'C:\Users\ljy\Desktop\learning-nlp-master\chapter-3\data\imm_dic.utf8')
    print(len(t.cut(text)))


if __name__ == '__main__':
    main()
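For trying the idea without a dictionary file, here is a minimal self-contained sketch of the same greedy left-to-right scan. The function name `forward_max_match`, the in-memory `sample_dict`, and the sample sentence are all assumptions made for illustration and are not taken from the listing above. Unlike `cut`, which skips characters that match no dictionary word, this variant keeps them as single-character tokens.

```python
def forward_max_match(text, dictionary, max_len=None):
    """Greedy left-to-right segmentation (hypothetical helper for illustration)."""
    if max_len is None:
        # Longest dictionary word; fall back to 1 for an empty dictionary.
        max_len = max((len(w) for w in dictionary), default=1)
    result = []
    pos = 0
    while pos < len(text):
        # Try the longest candidate first, shrinking down to one character.
        for size in range(min(max_len, len(text) - pos), 0, -1):
            piece = text[pos:pos + size]
            # A single character is always accepted as a fallback token.
            if size == 1 or piece in dictionary:
                result.append(piece)
                pos += size
                break
    return result


# Assumed toy dictionary and sentence, chosen only for this example.
sample_dict = {"南京", "南京市", "南京市长", "长江", "长江大桥", "大桥"}
tokens = forward_max_match("南京市长江大桥", sample_dict)
print(tokens)  # ['南京市长', '江', '大桥']
```

With this toy dictionary the scan greedily takes the four-character entry first, which leaves a stray single character in the middle. This is a known weakness of purely forward matching on ambiguous input.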