Natural language processing - typo detection in Python with kenlm and pycorrector
2020-11-06 01:21:00 【Elementary school students in IT field】
When reposting, please credit the source: https://blog.csdn.net/HHTNAN
For n-gram models, see: https://blog.csdn.net/HHTNAN/article/details/62046652
For the kenlm statistical language model, see: https://blog.csdn.net/HHTNAN/article/details/84231733
Categories of Chinese text errors
In Chinese text error correction, common error types include:
- Homophones, e.g. 配副眼睛 - 配副眼镜 ("a pair of eyes" for "a pair of glasses")
- Confusable-sounding words, e.g. 流浪织女 - 牛郎织女 ("wandering Weaver Girl" for "the Cowherd and the Weaver Girl")
- Reversed word order, e.g. 伍迪艾伦 - 艾伦伍迪 (Woody Allen - Allen Woody)
- Word completion, e.g. 爱有天意 - 假如爱有天意 ("love has providence" for "if love has providence")
- Similar-shaped characters, e.g. 高梁 - 高粱 (sorghum)
- Full pinyin spelling, e.g. xingfu - 幸福 (happiness)
- Pinyin abbreviation, e.g. sz - 深圳 (Shenzhen)
- Grammar errors, e.g. 想象难以 - 难以想象 ("imagine hard to" for "hard to imagine")
Of course, not every business scenario has to deal with all of these problems. Input methods need to handle the first four types, search engines need to handle all of them, and text correction after speech recognition only needs to handle the first two. Similar-shape errors arise mainly with Wubi, stroke-based, and handwriting input.
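The scenario breakdown above can be written down as a small lookup table. This is only a sketch: the `ErrorType` names, the scenario keys, and the `error_types_for` helper are made up for illustration and do not come from kenlm or pycorrector.

```python
from enum import Enum


class ErrorType(Enum):
    """Error types in the same order as the list above (names are illustrative)."""
    HOMOPHONE = 1
    CONFUSABLE = 2
    WORD_ORDER = 3
    COMPLETION = 4
    SIMILAR_SHAPE = 5
    PINYIN_FULL = 6
    PINYIN_ABBREVIATION = 7
    GRAMMAR = 8


# Enum members iterate in definition order, so slicing picks "the first N" types.
ALL_TYPES = list(ErrorType)

# Hypothetical scenario keys mapped to the error types each one has to handle.
SCENARIO_ERROR_TYPES = {
    "input_method": ALL_TYPES[:4],        # first four types
    "search_engine": ALL_TYPES,           # all types
    "speech_recognition": ALL_TYPES[:2],  # first two types
}


def error_types_for(scenario):
    """Return the list of error types a given scenario needs to handle."""
    return SCENARIO_ERROR_TYPES[scenario]
```

For example, `error_types_for("speech_recognition")` returns only the homophone and confusable-word types, which suggests a speech pipeline could skip shape-based and pinyin-based checks.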
The types of typos in Chinese can be briefly summarized as follows:
Wrong characters: 感帽, 随然, 传然, 呕土
Wrong personal or place names: 哈蜜 (correct: 哈密)
Pinyin errors: 咳数 (ke shu) -> ke sou
Factual errors: 广州黄浦 (correct: 埔)
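To make these typo categories concrete, here is a minimal sketch that flags and fixes typos using a hand-written table. It is a plain table lookup, not the statistical approach a kenlm language model provides. The table entries and the names `TYPO_TABLE`, `find_typos`, and `correct_text` are assumptions chosen for illustration.

```python
# Hypothetical table of known typos mapped to their corrections.
TYPO_TABLE = {
    "感帽": "感冒",          # wrong character
    "哈蜜": "哈密",          # wrong place name
    "广州黄浦": "广州黄埔",  # factual error in a place name
}


def find_typos(text, table=TYPO_TABLE):
    """Return (start_index, wrong, right) for every known typo found in text."""
    hits = []
    for wrong, right in table.items():
        start = text.find(wrong)
        while start != -1:
            hits.append((start, wrong, right))
            # Continue searching after the current match.
            start = text.find(wrong, start + len(wrong))
    return sorted(hits)


def correct_text(text, table=TYPO_TABLE):
    """Replace every known typo in text with its correction."""
    for wrong, right in table.items():
        text = text.replace(wrong, right)
    return text
```

A table like this only catches typos that were listed in advance. Pinyin errors and unseen wrong characters need candidate generation and scoring, which is where a language model would come in.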
This article was written by [Elementary school students in IT field]. Please include a link to the original when reposting. Thank you.