
Natural language processing: typo detection in Python with kenlm and pycorrector

2020-11-06 01:21:00 Elementary school students in IT field

Please credit the source when reposting: https://blog.csdn.net/HHTNAN
For n-gram models, see: https://blog.csdn.net/HHTNAN/article/details/62046652
For the kenlm statistical language model, see: https://blog.csdn.net/HHTNAN/article/details/84231733

Categories of Chinese text errors

In Chinese text error correction, the common error types include:

  • Homophone errors, such as 配副眼睛 ("get a pair of eyes") for 配副眼镜 ("get a pair of glasses")
  • Confusable-sound errors, such as 流浪织女 ("the wandering Weaver Girl") for 牛郎织女 ("the Cowherd and the Weaver Girl")
  • Reversed word order, such as 伍迪艾伦 (Woody Allen) written as 艾伦伍迪
  • Word completion, such as 爱有天意 completed to 假如爱有天意 ("if love has providence")
  • Similar-shape character errors, such as 高梁 for 高粱 ("sorghum")
  • Full pinyin spelling, such as xingfu for 幸福 ("happiness")
  • Pinyin abbreviation, such as sz for 深圳 (Shenzhen)
  • Grammar errors, such as 想象难以 for 难以想象 ("hard to imagine")
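To make the first and fifth error types concrete, here is a minimal sketch of one common way to correct them: replace a suspicious character with a candidate from a confusion set and keep the candidate if a language model scores the sentence higher. Everything in it is an assumption for illustration: the `CONFUSION` table, the five-sentence `CORPUS`, and the raw bigram-count score are toy stand-ins, where a real system would use a large confusion dictionary and an n-gram model such as one trained with kenlm.

```python
from collections import Counter

# Hypothetical confusion set: a character mapped to characters it is often mistaken for.
CONFUSION = {
    "睛": ["镜"],  # homophones (both read "jing")
    "梁": ["粱"],  # similar shape
}

# Tiny made-up corpus standing in for real language-model training data.
CORPUS = ["配副眼镜", "一副眼镜", "眼睛很大", "高粱酒", "红高粱"]


def bigrams(sentence):
    """Return the character bigrams of a sentence, with start/end markers."""
    padded = "^" + sentence + "$"
    return [padded[i:i + 2] for i in range(len(padded) - 1)]


# Character-bigram counts over the toy corpus.
BIGRAM_COUNTS = Counter(bg for s in CORPUS for bg in bigrams(s))


def score(sentence):
    """Toy fluency score: sum of corpus counts of the sentence's bigrams."""
    return sum(BIGRAM_COUNTS[bg] for bg in bigrams(sentence))


def correct(sentence):
    """Try each single-character substitution from the confusion set and
    return the variant with the strictly highest score (else the input)."""
    best, best_score = sentence, score(sentence)
    for i, ch in enumerate(sentence):
        for cand in CONFUSION.get(ch, []):
            candidate = sentence[:i] + cand + sentence[i + 1:]
            cand_score = score(candidate)
            if cand_score > best_score:
                best, best_score = candidate, cand_score
    return best


print(correct("配副眼睛"))
print(correct("高梁"))
```

Requiring a strictly higher score means a sentence that is already plausible, such as 眼睛很大, is left unchanged even though 睛 appears in the confusion set.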

Of course, not every business scenario has to deal with all of these problems. Input methods need to handle the first four types, search engines need to handle all of them, and error correction after speech recognition only needs to handle the first two. Similar-shape character errors mainly arise with Wubi or stroke-based input methods and with handwriting input.
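The scenario breakdown above can be written down as a small lookup table. This is only a sketch: the identifier names (`ERROR_TYPES`, `SCENARIOS`, `needs_handling`) and the English labels are made up for illustration, and the order of `ERROR_TYPES` follows the list of error types given earlier.

```python
# Error types in the same order as the list above (labels are illustrative).
ERROR_TYPES = [
    "homophone",
    "confusable_sound",
    "reversed_order",
    "completion",
    "similar_shape",
    "pinyin_full",
    "pinyin_abbreviation",
    "grammar",
]

# Which error types each scenario has to handle, as described in the text.
SCENARIOS = {
    "input_method": ERROR_TYPES[:4],        # the first four types
    "search_engine": ERROR_TYPES,           # all types
    "speech_recognition": ERROR_TYPES[:2],  # the first two types
}


def needs_handling(scenario, error_type):
    """Return True if the given scenario needs to handle this error type."""
    return error_type in SCENARIOS[scenario]


print(needs_handling("speech_recognition", "similar_shape"))
```

A table like this lets a correction pipeline skip detectors that are irrelevant to its input source, for example skipping similar-shape checks on speech recognition output.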

Typographical errors in Chinese can be briefly summarized as follows:

  1. Wrong-character substitutions (别字): 感帽 (for 感冒, "a cold"), 随然 (for 虽然, "although"), 传然, 呕土 (for 呕吐, "vomiting")

  2. Errors in personal and place names: 哈蜜 (correct form: 哈密, Hami)

  3. Pinyin errors: 咳数 (ke shu) where 咳嗽 (ke sou, "cough") is intended

  4. Factual (knowledge) errors: 广州黄浦 (Huangpu, Guangzhou), where the correct character is 埔
