Natural language processing: typo detection in Python with kenlm and pycorrector
2020-11-06 01:21:00 【Elementary school students in IT field】
Please credit the source when reposting: https://blog.csdn.net/HHTNAN
For background on n-grams, see: https://blog.csdn.net/HHTNAN/article/details/62046652
For the kenlm statistical language model, see: https://blog.csdn.net/HHTNAN/article/details/84231733
Categories of Chinese text errors
In the Chinese text error correction task, common error types include:
- Homophone errors, e.g. 配副眼睛 ("a pair of eyes") -> 配副眼镜 ("a pair of glasses")
- Confusable-sound words, e.g. 流浪织女 ("wandering weaver girl") -> 牛郎织女 ("the Cowherd and the Weaver Girl")
- Reversed word order, e.g. 伍迪艾伦 (Woody Allen) -> 艾伦伍迪 (Allen Woody)
- Missing words, e.g. 爱有天意 -> 假如爱有天意 ("if love has providence")
- Similar-shape character errors, e.g. 高梁 -> 高粱 (sorghum)
- Full pinyin spelling, e.g. xingfu -> 幸福 (happiness)
- Pinyin abbreviations, e.g. sz -> 深圳 (Shenzhen)
- Grammar errors, e.g. 想象难以 -> 难以想象 ("hard to imagine")
Not every error type arises in every business scenario. Input methods need to handle the first four types, search engines need to handle all of them, and error correction after speech recognition only needs to handle the first two. Similar-shape errors mainly come from Wubi, stroke-based, and handwriting input.
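The scenario-to-error-type mapping described above can be written down as a small lookup table. This is only a sketch: the names `ErrorType`, `SCENARIO_ERROR_TYPES`, and `needs_handling` are hypothetical and do not come from kenlm or pycorrector.

```python
from enum import Enum


class ErrorType(Enum):
    # Hypothetical labels, in the same order as the list above.
    HOMOPHONE = 1
    CONFUSABLE_SOUND = 2
    WORD_ORDER = 3
    MISSING_WORDS = 4
    SIMILAR_SHAPE = 5
    FULL_PINYIN = 6
    PINYIN_ABBREVIATION = 7
    GRAMMAR = 8


# Iterating an Enum yields its members in definition order.
ALL_TYPES = list(ErrorType)

# Which error types each scenario has to handle, as stated in the text.
SCENARIO_ERROR_TYPES = {
    "input_method": ALL_TYPES[:4],        # the first four types
    "search_engine": ALL_TYPES,           # all types
    "speech_recognition": ALL_TYPES[:2],  # the first two types
}


def needs_handling(scenario, error_type):
    """Return True if the given scenario has to handle this error type."""
    return error_type in SCENARIO_ERROR_TYPES[scenario]
```

For example, `needs_handling("speech_recognition", ErrorType.WORD_ORDER)` returns `False`, because post-ASR correction only covers the two sound-based types.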
In brief, the kinds of typos found in Chinese text include:
- Wrong characters (别字): 感帽, 随然, 传然, 呕土
- Wrong personal or place names: 哈蜜 (correct: 哈密, Hami)
- Pinyin errors: 咳数 (ke shu) -> 咳嗽 (ke sou, "cough")
- Knowledge errors: 广州黄浦 (should be 广州黄埔, Huangpu in Guangzhou)
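One simple way to catch errors like the ones listed above is a confusion-set lookup: a table of known wrong forms and their corrections. The sketch below is an illustration only, not how kenlm or pycorrector work internally. The table `CONFUSION_SET` and the functions `find_errors` and `correct` are hypothetical, and the Chinese strings are my reconstruction of the examples in the list.

```python
# Hypothetical confusion set: known wrong form -> intended form.
CONFUSION_SET = {
    "哈蜜": "哈密",          # place name (Hami)
    "咳数": "咳嗽",          # "cough", from the ke shu / ke sou mix-up
    "广州黄浦": "广州黄埔",  # knowledge error; the key includes the city as context
}


def find_errors(text, confusion_set=CONFUSION_SET):
    """Return sorted (start, wrong, suggestion) tuples for known wrong forms in text."""
    hits = []
    for wrong, right in confusion_set.items():
        start = text.find(wrong)
        while start != -1:
            hits.append((start, wrong, right))
            start = text.find(wrong, start + len(wrong))
    return sorted(hits)


def correct(text, confusion_set=CONFUSION_SET):
    """Replace every known wrong form in text with its suggested correction."""
    for wrong, right in confusion_set.items():
        text = text.replace(wrong, right)
    return text
```

Keying the knowledge error on 广州黄浦 rather than on 黄浦 alone keeps the lookup from "correcting" 上海黄浦, where 黄浦 is the right spelling. A fixed table only covers errors someone has already listed; anything else needs context, which is where a statistical language model would come in.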
Copyright notice
This article was written by [Elementary school students in IT field]. Please include a link to the original when reposting. Thank you.