Natural language processing - typo detection in Python with kenlm and pycorrector
2020-11-06 01:21:00 【Elementary school students in IT field】
When reprinting, please cite the source: https://blog.csdn.net/HHTNAN
For n-gram language models, see: https://blog.csdn.net/HHTNAN/article/details/62046652
For the kenlm statistical language model, see: https://blog.csdn.net/HHTNAN/article/details/84231733
Categories of Chinese text error correction
In the Chinese text error correction task, common error types include:
- Homophone errors, e.g. 配副眼睛 (a pair of eyes) - 配副眼镜 (a pair of glasses)
- Confusable-sound errors, e.g. 流浪织女 (wandering weaver) - 牛郎织女 (the Cowherd and the Weaving Maid)
- Reversed word order, e.g. 伍迪艾伦 (Woody Allen) - 艾伦伍迪 (Allen Woody)
- Word completion, e.g. 爱有天意 - 假如爱有天意
- Similar-shape character errors, e.g. 高梁 - 高粱 (sorghum)
- Full pinyin spelling, e.g. xingfu - 幸福 (happiness)
- Pinyin abbreviation, e.g. sz - 深圳 (Shenzhen)
- Grammar errors, e.g. 想象难以 - 难以想象 (hard to imagine)
Of course, not every business scenario has all of these problems. For example, an input method needs to handle the first four types, a search engine needs to handle all of them, and text correction after speech recognition only needs to handle the first two. Similar-shape character errors (the fifth type) arise mainly with Wubi, stroke-based, and handwriting input.
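The scenario-to-error-type mapping described above can be written down as a small lookup table. This is only a sketch: the names `ErrorType`, `SCENARIO_ERROR_TYPES`, and `error_types_for` are hypothetical and do not come from kenlm or pycorrector.

```python
from enum import Enum


class ErrorType(Enum):
    """The eight error types, in the order they are listed above."""
    HOMOPHONE = 1
    CONFUSABLE_SOUND = 2
    WORD_ORDER = 3
    COMPLETION = 4
    SIMILAR_SHAPE = 5
    PINYIN_FULL = 6
    PINYIN_ABBREVIATION = 7
    GRAMMAR = 8


# Enum members iterate in definition order, so slices match "the first N types".
ALL_TYPES = list(ErrorType)

# Hypothetical scenario keys; the coverage follows the paragraph above.
SCENARIO_ERROR_TYPES = {
    "input_method": ALL_TYPES[:4],        # first four types
    "search_engine": ALL_TYPES,           # all types
    "speech_recognition": ALL_TYPES[:2],  # first two types
}


def error_types_for(scenario):
    """Return the names of the error types a scenario needs to handle."""
    return [t.name for t in SCENARIO_ERROR_TYPES[scenario]]


if __name__ == "__main__":
    for name in SCENARIO_ERROR_TYPES:
        print(name, error_types_for(name))
```

A table like this makes it easy to switch individual detectors on or off per product, rather than running every check everywhere.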
This article briefly summarizes the types of wrong-character errors in Chinese:
- Wrong characters: 感帽, 随然, 传然, 呕土
- Wrong person or place names: 哈蜜 (correct: 哈密, Hami)
- Pinyin errors: 咳数 (ke shu) -> ke sou
- Factual errors: 广州黄浦 (correct: 黄埔, Huangpu in Guangzhou)
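Errors of the kinds listed above can be illustrated with a toy table-lookup corrector. This is a minimal sketch and not how kenlm or pycorrector work internally: the table `KNOWN_TYPOS` and the function `correct` are hypothetical names introduced here, and the entries simply pair a few typical typos with their standard spellings.

```python
# Hypothetical lookup table: wrong form -> standard form.
# A real system would use a language model and confusion sets instead.
KNOWN_TYPOS = {
    "感帽": "感冒",          # wrong character
    "哈蜜": "哈密",          # wrong place name
    "咳数": "咳嗽",          # pinyin error (ke shu -> ke sou)
    "广州黄浦": "广州黄埔",  # factual error
}


def correct(text):
    """Return (corrected_text, details).

    details is a list of (wrong, right, position) tuples sorted by position,
    where position is the index of the typo in the original text.
    """
    details = []
    for wrong, right in KNOWN_TYPOS.items():
        start = text.find(wrong)
        while start != -1:
            details.append((wrong, right, start))
            start = text.find(wrong, start + len(wrong))

    corrected = text
    for wrong, right in KNOWN_TYPOS.items():
        corrected = corrected.replace(wrong, right)

    return corrected, sorted(details, key=lambda d: d[2])


if __name__ == "__main__":
    fixed, found = correct("我感帽了,一直咳数")
    print(fixed)
    print(found)
```

A fixed table only catches typos it has already seen, which is why statistical approaches score candidate replacements with a language model instead.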