Natural language processing: typo detection in Python with kenlm and pycorrector
2020-11-06 01:21:00 【Elementary school students in IT field】
When reprinting, please credit the source: https://blog.csdn.net/HHTNAN
For n-gram models, see: https://blog.csdn.net/HHTNAN/article/details/62046652
For the kenlm statistical language model, see: https://blog.csdn.net/HHTNAN/article/details/84231733
Classification of Chinese text errors
In the Chinese text error correction task, common error types include:
- Homophone errors, e.g. 配副眼睛 ("get a pair of eyes") for 配副眼镜 ("get a pair of glasses")
- Confusable-sound errors, e.g. 流浪织女 ("the Wandering Weaver Girl") for 牛郎织女 ("the Cowherd and the Weaver Girl")
- Reversed word order, e.g. 伍迪艾伦 (Woody Allen) vs. 艾伦伍迪 (Allen Woody)
- Missing words that need completing, e.g. 爱有天意 ("Love Has Providence") for 假如爱有天意 ("If Love Has Providence")
- Similar-shape character errors, e.g. 高梁 for 高粱 ("sorghum")
- Full pinyin input, e.g. xingfu for 幸福 ("happiness")
- Pinyin abbreviations, e.g. sz for 深圳 (Shenzhen)
- Grammar errors, e.g. 想象难以 for 难以想象 ("hard to imagine")
Not every business scenario faces all of these problems. Input methods need to handle the first four types, search engines need to handle all of them, and error correction of speech recognition output only needs to handle the first two. Similar-shape errors arise mainly from Wubi or stroke-based input methods and from handwriting input.
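The scenario-to-error-type mapping described above can be written down as a small lookup table. This is only an illustrative sketch: the enum member names and the scenario keys are hypothetical labels chosen here, not identifiers from kenlm or pycorrector.

```python
from enum import Enum


class ErrorType(Enum):
    """The eight error types, in the order they are listed above."""
    HOMOPHONE = 1
    CONFUSABLE_SOUND = 2
    WORD_ORDER = 3
    MISSING_WORD = 4
    SIMILAR_SHAPE = 5
    FULL_PINYIN = 6
    PINYIN_ABBREVIATION = 7
    GRAMMAR = 8


# Iterating an Enum yields its members in definition order.
ALL_TYPES = list(ErrorType)

# Hypothetical scenario names; the coverage follows the paragraph above.
SCENARIO_ERROR_TYPES = {
    "input_method": ALL_TYPES[:4],        # the first four types
    "search_engine": ALL_TYPES,           # all types
    "speech_recognition": ALL_TYPES[:2],  # the first two types
}


def types_to_handle(scenario):
    """Return the error types a given scenario needs to handle."""
    return SCENARIO_ERROR_TYPES[scenario]
```

A table like this makes it easy to switch individual detectors on or off per product, for example skipping the similar-shape check for speech recognition output.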
Briefly, typographical errors in Chinese fall into the following types:
- Wrong characters (别字), e.g. 感帽, 随然, 传然, 呕土
- Wrong personal or place names, e.g. 哈蜜 (correct: 哈密, Hami)
- Pinyin errors, e.g. 咳数 (ke shu) -> 咳嗽 (ke sou)
- Factual (knowledge) errors, e.g. 广州黄浦 for 广州黄埔 (Huangpu, Guangzhou; the correct character is 埔)
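Name and place-name errors such as writing 哈蜜 for 哈密 (Hami) can be caught, in the simplest case, by fuzzy-matching against a list of known names. The sketch below uses Python's standard `difflib` module for this. It is not how kenlm or pycorrector work internally, and `KNOWN_PLACES` and `suggest_place` are hypothetical names introduced for illustration.

```python
import difflib

# Hypothetical mini gazetteer; a real system would load a full list of place names.
KNOWN_PLACES = ["哈密", "广州黄埔", "深圳"]


def suggest_place(name, cutoff=0.5):
    """Return the closest known place name, or None if nothing is close enough."""
    if name in KNOWN_PLACES:
        return name  # already a known name, nothing to correct
    # get_close_matches ranks candidates by character-level similarity ratio.
    matches = difflib.get_close_matches(name, KNOWN_PLACES, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

Character-level similarity is a crude signal: with a cutoff of 0.5, any two-character name that shares one character with a known name will match. A practical system would combine it with pinyin or shape similarity and a language-model score.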