Natural language processing: typo detection (Python-based) with kenlm and pycorrector
2020-11-06 01:21:00 【Elementary school students in IT field】
Please credit the source when reposting: https://blog.csdn.net/HHTNAN
For n-gram language models, see: https://blog.csdn.net/HHTNAN/article/details/62046652
For the kenlm statistical language model, see: https://blog.csdn.net/HHTNAN/article/details/84231733
Categories of Chinese text errors
In the Chinese text error correction task, common error types include:
- Homophone errors, e.g. 配副眼睛 → 配副眼镜 ("a pair of eyes" → "a pair of glasses")
- Confusable-sound words, e.g. 流浪织女 → 牛郎织女 ("the Wandering Weaver" → "the Cowherd and the Weaving Maid")
- Reversed word order, e.g. 伍迪艾伦 → 艾伦伍迪 ("Woody Allen" → "Allen Woody")
- Word completion, e.g. 爱有天意 → 假如爱有天意 ("love has Providence" → "if love has Providence")
- Similar-shape character errors, e.g. 高梁 → 高粱 (sorghum)
- Full pinyin spelling, e.g. xingfu → 幸福 (happiness)
- Pinyin abbreviation, e.g. sz → 深圳 (Shenzhen)
- Grammar errors, e.g. 想象难以 → 难以想象 ("imagine hard to" → "hard to imagine")
Not every business scenario has to handle all of these error types. Input methods need to deal with the first four, search engines need to deal with all of them, and error correction of speech recognition output only needs to deal with the first two. Similar-shape character errors mainly arise with Wubi or stroke-based input methods and with handwriting input.
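The mapping from scenario to error types described above can be written down as a small lookup table. This is only an illustrative sketch: the names `ErrorType`, `SCENARIO_ERROR_TYPES`, and `types_for` are hypothetical and are not part of kenlm or pycorrector.

```python
from enum import Enum


class ErrorType(Enum):
    """The eight error types, in the order listed above (names are illustrative)."""
    HOMOPHONE = 1        # homophone errors
    CONFUSABLE = 2       # confusable-sound words
    WORD_ORDER = 3       # reversed word order
    COMPLETION = 4       # word completion
    SIMILAR_SHAPE = 5    # similar-shape character errors
    PINYIN_FULL = 6      # full pinyin spelling
    PINYIN_ABBREV = 7    # pinyin abbreviation
    GRAMMAR = 8          # grammar errors


# Enum members iterate in definition order, so slicing gives "the first N" types.
ALL_TYPES = list(ErrorType)

# Hypothetical scenario keys; the split follows the paragraph above.
SCENARIO_ERROR_TYPES = {
    "input_method": ALL_TYPES[:4],        # first four types
    "search_engine": ALL_TYPES,           # all types
    "speech_recognition": ALL_TYPES[:2],  # first two types
}


def types_for(scenario):
    """Return the error types a given scenario is expected to handle."""
    return SCENARIO_ERROR_TYPES[scenario]
```

A corrector could consult such a table to enable only the detectors that matter for its scenario, for example skipping pinyin handling when correcting speech recognition output.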
This article briefly summarizes the types of typos found in Chinese text:
- Wrong characters: 感帽, 随然, 传然, 呕土 (for 感冒 "a cold", 虽然 "although", 传染 "to infect", 呕吐 "to vomit")
- Wrong personal or place names: 哈蜜 (correct: 哈密, Hami)
- Pinyin errors: 咳数 (ke shu) → 咳嗽 (ke sou)
- Knowledge errors: 广州黄浦 (should be 黄埔, the Huangpu district of Guangzhou)
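Several of the typo types above (wrong characters, wrong place names, knowledge errors) can be caught with a hand-built confusion dictionary before any language model is involved. The following is a minimal sketch of that idea, not the API of pycorrector or kenlm; `CONFUSION` and `correct_with_confusion` are hypothetical names, and the entries are illustrative.

```python
# Hypothetical confusion dictionary: wrong form -> intended form.
CONFUSION = {
    "感帽": "感冒",          # wrong character
    "哈蜜": "哈密",          # wrong place name
    "广州黄浦": "广州黄埔",  # knowledge error: only wrong in the Guangzhou context
}


def correct_with_confusion(text, confusion=CONFUSION):
    """Replace every known wrong form; return (corrected_text, details).

    details lists the (wrong, right) pairs that were found, in dictionary order.
    """
    details = []
    for wrong, right in confusion.items():
        if wrong in text:
            details.append((wrong, right))
            text = text.replace(wrong, right)
    return text, details


if __name__ == "__main__":
    print(correct_with_confusion("我在广州黄浦感帽了"))
    print(correct_with_confusion("上海黄浦区"))  # left unchanged: 黄浦 is valid here
```

The knowledge-error key includes its context (广州) because 黄浦 on its own is a valid place name; a plain dictionary cannot make that distinction without the context, which is one reason to combine it with a statistical language model.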
Copyright notice
This article was written by [Elementary school students in IT field]. Please include a link to the original when reposting. Thank you.