Natural language processing - typo detection (in Python) with kenlm and pycorrector
2020-11-06 01:21:00 【Elementary school students in IT field】
When reprinting, please credit the source: https://blog.csdn.net/HHTNAN
For n-gram language models, see: https://blog.csdn.net/HHTNAN/article/details/62046652
For the kenlm statistical language model, see: https://blog.csdn.net/HHTNAN/article/details/84231733
Categories of Chinese text errors
In Chinese text error correction, the common error types include:
- Homophones, e.g. 配副眼睛 → 配副眼镜 ("a pair of eyes" → "a pair of glasses")
- Confusable sounds, e.g. 流浪织女 → 牛郎织女 ("wandering weaver girl" → "the Cowherd and the Weaver Girl")
- Reversed character order, e.g. 伍迪艾伦 → 艾伦伍迪 (Woody Allen → Allen Woody)
- Missing characters that need completion, e.g. 爱有天意 → 假如爱有天意 ("love has providence" → "if love has providence")
- Similar-shaped characters, e.g. 高梁 → 高粱 (sorghum)
- Full pinyin spelling, e.g. xingfu → 幸福 (happiness)
- Pinyin abbreviations, e.g. sz → 深圳 (Shenzhen)
- Grammatical errors, e.g. 想象难以 → 难以想象 ("imagine hard to" → "hard to imagine")
Of course, not every business scenario faces all of these problems. Input methods need to handle the first four types, search engines need to handle all of them, and error correction after speech recognition only needs to handle the first two. Similar-shape errors come mainly from Wubi, stroke-based, and handwriting input.
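The mapping from scenario to error types described above can be written down as a small lookup table. This is only a sketch: the type names, scenario keys, and the `types_for` helper are made up for illustration and do not come from kenlm or pycorrector.

```python
# Error types in the order they are listed in this article.
# All identifiers here are hypothetical labels chosen for this sketch.
ERROR_TYPES = [
    "homophone",
    "confusable_sound",
    "reversed_order",
    "missing_characters",
    "similar_shape",        # mainly Wubi, stroke-based, and handwriting input
    "full_pinyin",
    "pinyin_abbreviation",
    "grammar",
]

# Which error types each scenario has to handle, per the paragraph above.
SCENARIO_TYPES = {
    "input_method": ERROR_TYPES[:4],        # the first four
    "search_engine": ERROR_TYPES[:],        # all types
    "speech_recognition": ERROR_TYPES[:2],  # the first two
}


def types_for(scenario):
    """Return a copy of the error types that the given scenario must handle."""
    return list(SCENARIO_TYPES[scenario])


if __name__ == "__main__":
    for name in SCENARIO_TYPES:
        print(name, types_for(name))
```

A table like this makes it easy to switch individual checks on or off per product without touching the correction logic itself.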
Typos in Chinese text can be roughly grouped as follows:
- Wrong characters (别字): 感帽 (for 感冒, "a cold"), 随然 (for 虽然, "although"), 传然, 呕土 (for 呕吐, "vomiting")
- Wrong personal or place names: 哈蜜 (correct: 哈密, Hami)
- Pinyin errors: 咳数 (ke shu) → 咳嗽 (ke sou)
- Knowledge errors: 广州黄浦 (should be 埔, i.e. 广州黄埔, Huangpu in Guangzhou)
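One simple way to catch typos of these kinds is a lookup table of known wrong forms. The sketch below is a toy, not how kenlm or pycorrector work internally. The table entries are assumed Chinese spellings of the examples in the list above, and `CONFUSIONS` and `correct` are hypothetical names.

```python
# Hypothetical confusion table: known wrong form -> intended form.
# Entries are assumptions based on the typo examples in this article.
CONFUSIONS = {
    "感帽": "感冒",          # wrong character
    "哈蜜": "哈密",          # wrong place name
    "咳数": "咳嗽",          # pinyin error (ke shu -> ke sou)
    "广州黄浦": "广州黄埔",  # knowledge error; needs the "广州" context
}


def correct(text, table=CONFUSIONS):
    """Return (corrected_text, details).

    details is a list of (wrong, right, start) tuples, where start is the
    index of the wrong form in the original text, sorted by position.
    """
    details = []
    for wrong, right in table.items():
        start = text.find(wrong)
        while start != -1:
            details.append((wrong, right, start))
            start = text.find(wrong, start + len(wrong))
    details.sort(key=lambda item: item[2])

    corrected = text
    for wrong, right in table.items():
        corrected = corrected.replace(wrong, right)
    return corrected, details


if __name__ == "__main__":
    print(correct("我感帽了，一直咳数"))
```

The knowledge-error entry is keyed on the full phrase because 黄浦 on its own is a valid place name in Shanghai. That already hints at why a fixed table is not enough in practice and why context from a language model is useful for ranking candidates.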