当前位置:网站首页>Forward maximum matching method
Forward maximum matching method
2022-07-06 21:08:00 【wx5d786476cd8b2】
class MM(object):
def __init__(self,dic_path):
self.dictionary=set()
self.maximum=0
# Read the dictionary
with open(dic_path,'r',encoding='utf-8') as f:
for line in f:
line=line.strip()
if not line:
continue
self.dictionary.add(line)
if self.maximum<len(line):
self.maximum=len(line)
def cut(self, text):
print(self.dictionary)
result=[]
index=len(text)
print(index)
n=0
while index>0:
word=None
for size in range(self.maximum,0,-1):
print(size)
if index-size<0:
continue
piece = text[n:n+size]
print('piece',piece)
if piece in self.dictionary:
word=piece
result.append(word)
index-=size
print('ooooop',index)
n+=size
break
if word is None:
n+=1
index-=1
return result[::]
def main():
text=" Nanjing Yangtze River Bridge "
t=MM(r'C:\Users\ljy\Desktop\learning-nlp-master\chapter-3\data\imm_dic.utf8')
print(len(t.cut(text)))
main()
- 1.
- 2.
- 3.
- 4.
- 5.
- 6.
- 7.
- 8.
- 9.
- 10.
- 11.
- 12.
- 13.
- 14.
- 15.
- 16.
- 17.
- 18.
- 19.
- 20.
- 21.
- 22.
- 23.
- 24.
- 25.
- 26.
- 27.
- 28.
- 29.
- 30.
- 31.
- 32.
- 33.
- 34.
- 35.
- 36.
- 37.
- 38.
- 39.
- 40.
- 41.
- 42.
- 43.
边栏推荐
- Study notes of grain Mall - phase I: Project Introduction
- Hardware development notes (10): basic process of hardware development, making a USB to RS232 module (9): create ch340g/max232 package library sop-16 and associate principle primitive devices
- Database - how to get familiar with hundreds of tables of the project -navicat these unique skills, have you got it? (exclusive experience)
- Laravel笔记-自定义登录中新增登录5次失败锁账户功能(提高系统安全性)
- 华为设备命令
- Web开发小妙招:巧用ThreadLocal规避层层传值
- 防火墙基础之外网服务器区部署和双机热备
- 硬件开发笔记(十): 硬件开发基本流程,制作一个USB转RS232的模块(九):创建CH340G/MAX232封装库sop-16并关联原理图元器件
- PHP saves session data to MySQL database
- #yyds干货盘点#重新梳理箭头函数的this
猜你喜欢

审稿人dis整个研究方向已经不仅仅是在审我的稿子了怎么办?

Intel 48 core new Xeon run point exposure: unexpected results against AMD zen3 in 3D cache

Spark SQL chasing Wife Series (initial understanding)

强化学习-学习笔记5 | AlphaGo

OAI 5g nr+usrp b210 installation and construction

Infrared thermometer based on STM32 single chip microcomputer (with face detection)

Deployment of external server area and dual machine hot standby of firewall Foundation

请问sql group by 语句问题

OneNote 深度评测:使用资源、插件、模版

【Redis设计与实现】第一部分 :Redis数据结构和对象 总结
随机推荐
Notes - detailed steps of training, testing and verification of yolo-v4-tiny source code
全网最全的知识库管理工具综合评测和推荐:FlowUs、Baklib、简道云、ONES Wiki 、PingCode、Seed、MeBox、亿方云、智米云、搜阅云、天翎
Simple continuous viewing PTA
Huawei device command
【Redis设计与实现】第一部分 :Redis数据结构和对象 总结
Is it safe to open an account in flush? Which securities company is good at opening an account? Low handling charges
OSPF multi zone configuration
KDD 2022 | 通过知识增强的提示学习实现统一的对话式推荐
快过年了,心也懒了
Taylor series fast Fourier transform (FFT)
3D face reconstruction: from basic knowledge to recognition / reconstruction methods!
Redis insert data garbled solution
The biggest pain point of traffic management - the resource utilization rate cannot go up
新型数据库、多维表格平台盘点 Notion、FlowUs、Airtable、SeaTable、维格表 Vika、飞书多维表格、黑帕云、织信 Informat、语雀
Yyds dry goods count re comb this of arrow function
Common English vocabulary that every programmer must master (recommended Collection)
Hardware development notes (10): basic process of hardware development, making a USB to RS232 module (9): create ch340g/max232 package library sop-16 and associate principle primitive devices
Application layer of tcp/ip protocol cluster
审稿人dis整个研究方向已经不仅仅是在审我的稿子了怎么办?
Word bag model and TF-IDF