Natural language processing: wrong-word detection in Python with kenlm and pycorrector
2020-11-06 01:21:00 【Elementary school students in IT field】
Please credit the source when reposting: https://blog.csdn.net/HHTNAN
For n-gram language models, see: https://blog.csdn.net/HHTNAN/article/details/62046652
For the kenlm statistical language model, see: https://blog.csdn.net/HHTNAN/article/details/84231733
Chinese text error correction
Common error types in the Chinese text error correction task include:
- Homophone errors, e.g. 配副眼睛 → 配副眼镜 ("get a pair of eyes fitted" → "get a pair of glasses fitted")
- Confusable-sound errors, e.g. 流浪织女 → 牛郎织女 ("the wandering Weaver Girl" → "the Cowherd and the Weaver Girl")
- Reversed word order, e.g. 伍迪艾伦 → 艾伦伍迪 (Woody Allen → Allen Woody)
- Missing words to be completed, e.g. 爱有天意 → 假如爱有天意 ("love has providence" → "if love has providence")
- Similar-shape character errors, e.g. 高梁 → 高粱 (sorghum)
- Full pinyin spelling, e.g. xingfu → 幸福 (happiness)
- Pinyin abbreviation, e.g. sz → 深圳 (Shenzhen)
- Grammar errors, e.g. 想象难以 → 难以想象 ("imagine hard to" → "hard to imagine")
Of course, not every business scenario involves all of these problems. For example, input methods need to handle the first four types, search engines need to handle all of them, and text correction after speech recognition only needs to handle the first two. Similar-shape character errors mainly arise with Wubi, stroke-based, or handwriting input.
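One common way to handle homophone errors such as 配副眼睛 is to generate candidate replacements from a confusion set and keep the candidate that a language model scores highest. The sketch below illustrates that idea only. It assumes a tiny hand-made corpus and confusion set (`CORPUS`, `CONFUSION`, and every function name are hypothetical), and it uses an add-one-smoothed character bigram model as a stand-in for kenlm. It is not pycorrector's actual implementation.

```python
import math
from collections import Counter

# Hypothetical toy corpus; a real system would train on a large corpus,
# for example with kenlm, instead of counting bigrams by hand.
CORPUS = [
    "我去配副眼镜",
    "他的眼睛很大",
    "她的眼睛很亮",
    "配副眼镜要多少钱",
    "眼镜店可以配副眼镜",
]

# Hypothetical confusion set: characters that are easily mistaken for each other.
CONFUSION = {"睛": ["镜"], "镜": ["睛"]}


def train_bigrams(sentences):
    """Count character unigrams and bigrams, with sentence boundary markers."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        chars = ["<s>"] + list(sentence) + ["</s>"]
        unigrams.update(chars)
        bigrams.update(zip(chars, chars[1:]))
    return unigrams, bigrams


def log_prob(sentence, unigrams, bigrams):
    """Log-probability of a sentence under an add-one-smoothed bigram model."""
    vocab_size = len(unigrams)
    chars = ["<s>"] + list(sentence) + ["</s>"]
    return sum(
        math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab_size))
        for a, b in zip(chars, chars[1:])
    )


def correct(sentence, unigrams, bigrams):
    """Try each single-character substitution and keep the best-scoring sentence."""
    best, best_score = sentence, log_prob(sentence, unigrams, bigrams)
    for i, ch in enumerate(sentence):
        for alt in CONFUSION.get(ch, []):
            candidate = sentence[:i] + alt + sentence[i + 1:]
            score = log_prob(candidate, unigrams, bigrams)
            if score > best_score:
                best, best_score = candidate, score
    return best


UNIGRAMS, BIGRAMS = train_bigrams(CORPUS)
print(correct("配副眼睛", UNIGRAMS, BIGRAMS))      # 配副眼镜
print(correct("他的眼睛很大", UNIGRAMS, BIGRAMS))  # unchanged: 他的眼睛很大
```

With a corpus this small the scores are fragile, so the sketch only shows the shape of the approach: the quality of the language model and of the confusion set decides how well it works in practice.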
This article briefly summarizes the types of wrong-character errors in Chinese:
- Wrong characters: 感帽, 随然, 传然, 呕土 (for example, 感帽 written for 感冒, "a cold")
- Wrong personal or place names: 哈蜜 (correct: 哈密, Hami)
- Pinyin errors: 咳数 (ke shu) → ke sou (咳嗽, "cough")
- Knowledge errors: 广州黄浦 (correct: 埔, i.e. 广州黄埔, Huangpu in Guangzhou)
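Name and knowledge errors like the ones above are often easier to catch with a dictionary than with a language model. As a rough illustration, the sketch below looks up a possibly misspelled place name in a small gazetteer and returns the closest entry, using `difflib.get_close_matches` from the Python standard library. The gazetteer contents, the `suggest_place` helper, and the 0.5 similarity cutoff are assumptions made for this example.

```python
import difflib

# Hypothetical gazetteer of known place names; a real one would be far larger.
GAZETTEER = ["哈密", "广州黄埔", "上海黄浦", "深圳"]


def suggest_place(name, gazetteer=GAZETTEER, cutoff=0.5):
    """Return the name itself if known, else the closest known name, else None."""
    if name in gazetteer:
        return name
    # get_close_matches ranks candidates by SequenceMatcher similarity ratio.
    matches = difflib.get_close_matches(name, gazetteer, n=1, cutoff=cutoff)
    return matches[0] if matches else None


print(suggest_place("哈蜜"))      # 哈密
print(suggest_place("广州黄浦"))  # 广州黄埔
print(suggest_place("北京"))      # None, nothing similar in the gazetteer
```

String similarity alone cannot tell that 黄浦 is a real district of Shanghai and only wrong when paired with Guangzhou, which is why the gazetteer stores the full city and district name.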
Copyright notice
This article was written by [Elementary school students in IT field]. Please include a link to the original when reposting. Thank you.