Natural language processing: typo detection in Python with kenlm and pycorrector
2020-11-06 01:21:00 【Elementary school students in IT field】
When reposting, please credit the source: https://blog.csdn.net/HHTNAN
For background on n-gram models, see: https://blog.csdn.net/HHTNAN/article/details/62046652
For the kenlm statistical language model, see: https://blog.csdn.net/HHTNAN/article/details/84231733
Categories of Chinese text errors
In the Chinese text error correction task, common error types include:
- Homophones, e.g. 配副眼睛 - 配副眼镜 ("a pair of eyes" - "a pair of glasses")
- Confusable words, e.g. 流浪织女 - 牛郎织女 ("the wandering weaver girl" - "the Cowherd and the Weaver Girl")
- Reversed word order, e.g. 伍迪艾伦 - 艾伦伍迪 ("Woody Allen" - "Allen Woody")
- Word completion, e.g. 爱有天意 - 假如爱有天意 ("love has providence" - "if love has providence")
- Similar-looking characters, e.g. 高梁 - 高粱 (sorghum, with 梁 written in place of 粱)
- Full pinyin spelling, e.g. xingfu - 幸福 (happiness)
- Pinyin abbreviation, e.g. sz - 深圳 (Shenzhen)
- Grammatical errors, e.g. 想象难以 - 难以想象 ("imagine hard to" - "hard to imagine")
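As a rough illustration of how the homophone and similar-shape cases can be handled, the sketch below substitutes characters from a confusion set and keeps the candidate that scores best. The confusion set, the frequency table, and the function names are all made up for this example; a real system would use much larger confusion sets and a statistical language model such as kenlm for scoring, rather than the toy substring count used here.

```python
# Hypothetical confusion sets: each character maps to characters it is
# commonly mistaken for (same sound or similar shape).
CONFUSION = {
    "睛": ["镜"],  # homophone: both read "jing"
    "梁": ["粱"],  # similar shape
}

# Made-up frequencies standing in for a real language model.
WORD_FREQ = {
    "眼睛": 80,       # eyes
    "眼镜": 50,       # glasses
    "配副眼镜": 100,  # get a pair of glasses
    "高粱": 30,       # sorghum
}


def candidates(text):
    """Yield the text itself, then every single-character substitution."""
    yield text
    for i, ch in enumerate(text):
        for alt in CONFUSION.get(ch, []):
            yield text[:i] + alt + text[i + 1:]


def score(text):
    """Sum the frequencies of all known words that occur in the text."""
    return sum(freq for word, freq in WORD_FREQ.items() if word in text)


def correct(text):
    """Return the best-scoring candidate; ties keep the original text."""
    return max(candidates(text), key=score)


print(correct("配副眼睛"))  # 配副眼镜
print(correct("高梁"))      # 高粱
print(correct("眼睛"))      # 眼睛 (unchanged: correct on its own)
```

The last call shows why context matters: 眼睛 is a valid word by itself and is only replaced when the surrounding characters make 眼镜 more likely.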
Not every business scenario faces all of these problems. Input methods need to handle the first four types, search engines need to handle all of them, and error correction after speech recognition only needs to handle the first two. Similar-looking character errors arise mainly with Wubi or stroke-based input and with handwriting input.
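The scenario-to-error-type mapping described above could be written down as a small configuration, so that a correction pipeline only runs the checks its scenario needs. This is only a sketch: the identifiers below are invented labels for the eight error types and the three scenarios, not names from pycorrector or any other library.

```python
# Hypothetical labels for the eight error types, in the order listed above.
ERROR_TYPES = (
    "homophone",
    "confusable_word",
    "reversed_order",
    "word_completion",
    "similar_shape",
    "pinyin_full",
    "pinyin_abbreviation",
    "grammar",
)

# Which error types each scenario has to handle.
SCENARIO_ERROR_TYPES = {
    "input_method": ERROR_TYPES[:4],        # the first four
    "search_engine": ERROR_TYPES,           # all types
    "speech_recognition": ERROR_TYPES[:2],  # the first two
}


def needs_check(scenario, error_type):
    """Return True if the scenario has to handle the given error type."""
    return error_type in SCENARIO_ERROR_TYPES[scenario]


print(needs_check("speech_recognition", "homophone"))      # True
print(needs_check("speech_recognition", "similar_shape"))  # False
```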
The types of typos found in Chinese text can be briefly summarized as follows:
- Wrong characters (别字): 感帽, 随然, 传然, 呕土 (for example, 感帽 written for 感冒, "a cold")
- Wrong personal or place names: 哈蜜 (correct: 哈密, Hami)
- Pinyin errors: 咳数 (ke shu) -> ke sou (咳嗽, "cough")
- Knowledge errors: 广州黄浦 (the Huangpu district of Guangzhou is written 黄埔, not 黄浦)
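Knowledge errors differ from ordinary typos because the wrong form can itself be a valid word: 黄浦 is a real district of Shanghai, while the district in Guangzhou is 黄埔. One way to catch this is to validate a name against a lookup table keyed by its context. The sketch below assumes a tiny hand-written table of districts and hypothetical helper names; it is not how kenlm or pycorrector handle this case.

```python
# Small hand-written lookup table, for illustration only.
DISTRICTS = {
    "广州": {"黄埔", "天河", "越秀"},
    "上海": {"黄浦", "徐汇", "静安"},
}


def differs_by_one(a, b):
    """True if a and b have equal length and differ in exactly one character."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1


def check_district(city, district):
    """Return the district if valid for the city, a one-character fix if
    exactly such a neighbour exists in the table, otherwise None."""
    known = DISTRICTS.get(city, set())
    if district in known:
        return district
    matches = sorted(d for d in known if differs_by_one(district, d))
    return matches[0] if matches else None


print(check_district("广州", "黄浦"))  # 黄埔 (corrected)
print(check_district("上海", "黄浦"))  # 黄浦 (already valid)
print(check_district("广州", "徐汇"))  # None (no close match)
```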