Voice assistant -- QU -- query error correction and rewriting
2022-06-12 07:32:00 【Turned_MZ】
In this chapter, let's take a look at query error correction and rewriting in the voice assistant's QU layer.
Why error correction is needed:
Because the vast majority of voice-assistant queries come from spoken conversations, the output of the ASR module can be wrong due to extra audio being captured, audio being missed, or words being misrecognized, so the query fed into the NLU layer is wrong. A wrong query directly affects downstream recognition, causes the final execution result to be wrong, and hurts the user experience. For example:
- (wrong) "打开和平静音" (turn on "Peace Mute") -> (correct) "打开和平精英" (open Peace Elite, a game title).
- (wrong) "第一个7点的闹钟" (the first 7 o'clock alarm) -> (correct) "定一个7点的闹钟" (set an alarm for 7 o'clock).
- (wrong) "定一个7点的孬种" (set a 7 o'clock "coward") -> (correct) "定一个7点的闹钟" (闹钟 "alarm clock" misheard as its near-homophone 孬种).
Moreover, ASR applies its own language model to smooth the query, so what reaches NLU is already a smoothed result. Doing correction on the NLU side is harder because the user's original audio is unavailable. At the same time, given online real-time and stability requirements, this error correction must be high-performance.
Solution:
The query error-correction solution mainly consists of several steps: confusion mining, data cleaning, error detection, candidate recall, candidate ranking, and post-processing. Depending on quality and real-time requirements, the work can also be split between an online side and an offline side.
Each module is briefly described below.
1. Confusion mining and data cleaning
The purpose of this step is mainly to build confusion-word and confusion-phrase databases, providing database support for the subsequent error-detection and candidate-recall stages.
To mine potentially confusable words and phrases, new-word discovery can be used: mine strings with high internal cohesion and rich neighboring characters (new-word discovery methods are not covered here). Beyond that, confusable words and phrases can also be mined using near-homophone similarity, semantic similarity, and pinyin-string mining. The discovered new words plus the mined confusable words, after being cleaned in various ways, form a confusion dictionary, a pinyin trie, and so on, which serve as the basic databases (a minimal sketch follows).
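As a concrete reference, here is a minimal Python sketch of those two basic databases: a confusion dictionary and a pinyin trie. It assumes the pypinyin package for grapheme-to-pinyin conversion; all entries are toy examples, not data from the original system.

```python
# A minimal sketch of the basic databases: a confusion dictionary and a
# pinyin trie. Assumes the `pypinyin` package; all entries are toy examples.
from collections import defaultdict
from pypinyin import lazy_pinyin

# Confusion dictionary: wrong phrase -> candidate correct phrases.
confusion_dict = {
    "和平静音": ["和平精英"],  # "Peace Mute" -> "Peace Elite" (game title)
    "孬种": ["闹钟"],          # near-homophone of "alarm clock"
}

class PinyinTrie:
    """Trie keyed on pinyin syllables, mapping pronunciations to words."""
    def __init__(self):
        self.children = defaultdict(PinyinTrie)
        self.words = []  # words whose pinyin ends exactly at this node

    def insert(self, word):
        node = self
        for syllable in lazy_pinyin(word):
            node = node.children[syllable]
        node.words.append(word)

    def lookup(self, syllables):
        node = self
        for syllable in syllables:
            if syllable not in node.children:
                return []
            node = node.children[syllable]
        return node.words

trie = PinyinTrie()
for w in ["闹钟", "孬种", "和平精英"]:
    trie.insert(w)

# Both words share the toneless pinyin "nao zhong".
print(trie.lookup(["nao", "zhong"]))  # ['闹钟', '孬种']
```

At recall time, such a trie lets the system map the pinyin of a suspicious span back to every word that shares (or nearly shares) its pronunciation.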
2. Error detection
This step performs error detection on the input query: deciding whether an error is likely present and, if so, where. Common methods include:
- Confusion-dictionary matching: after segmenting the query, check whether any token matches an entry in the confusion dictionary.
- An ngram language model: compute the co-occurrence probability of each word with its left and right neighbors in the sentence. If that probability is very low, the word is very likely wrong. This assumes that in the full-sentence data, correct co-occurrences far outnumber wrong ones; otherwise the error cannot be detected (see the sketch after this list).
- Pinyin CBOW: introduce pinyin vectors, feed the pinyin vectors of the target word's context into a CBOW model to predict the target word's pinyin vector, then compare the prediction with the word's actual pinyin vector; low similarity suggests an error.
- BERT as a language model: directly predict the probability of each character in the sentence. The effect of this method is hard to control, and BERT is slow, so it is rarely used in online real-time correction; it can, however, be used for offline mining.
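Below is a minimal sketch of the ngram detection idea using smoothed character bigrams. The corpus, smoothing constant, and threshold are toy values chosen for illustration, not from the original article.

```python
# A minimal sketch of ngram-based error detection: flag characters whose
# co-occurrence probabilities with BOTH neighbors are low. The corpus,
# smoothing constant, and threshold are toy values for illustration.
from collections import Counter

corpus = ["定一个七点的闹钟", "打开和平精英", "定一个八点的闹钟"]

unigrams, bigrams = Counter(), Counter()
for sent in corpus:
    unigrams.update(sent)
    bigrams.update(zip(sent, sent[1:]))

def bigram_prob(a, b, alpha=0.01):
    """Add-alpha smoothed P(b | a)."""
    return (bigrams[(a, b)] + alpha) / (unigrams[a] + alpha * len(unigrams))

def detect(query, threshold=0.1):
    """Return positions whose left AND right co-occurrence probs are low."""
    suspects = []
    for i, ch in enumerate(query):
        left = bigram_prob(query[i - 1], ch) if i > 0 else 1.0
        right = bigram_prob(ch, query[i + 1]) if i < len(query) - 1 else 1.0
        if left < threshold and right < threshold:
            suspects.append(i)
    return suspects

print(detect("定一个七点的孬钟"))  # [6] -> the character "孬" is suspicious
```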
3. Candidate recall
This step mainly finds the correct word for each detected wrong word. For example, in "和平静音" (Peace Mute), the token "静音" (mute) should be "精英" (elite). Various methods are used to find the correct words corresponding to a wrong word, together with their probabilities, and the results are then ranked jointly. Available features and options include:
- Confusion dictionary: look the detected wrong word up in the confusion dictionary to find the correct word.
- Language model: use a bidirectional 2-gram to predict high-probability candidate words for the erroneous position.
- Pinyin CBOW: use the context pinyin vectors with CBOW to predict the pinyin vector at the erroneous position, and from it the corresponding candidate words.
- DTW near-homophone similarity: before introducing DTW, it is worth knowing the open-source library dimsim, which measures the similarity of two Chinese pronunciations, mainly by taking inner products of initial (shengmu) and final (yunmu) vectors. It requires, however, that the two strings being compared have the same length, whereas in practice ASR may pick up extra or fewer syllables, so we need to compare the pinyin similarity of strings of different lengths. This is where DTW, i.e. Dynamic Time Warping, comes in. Its core idea is dynamic programming: a point at one time step of one sequence may align with multiple consecutive points of the other. It is not explained further here; a quick Google search will turn up plenty of material. Matching by near-homophone similarity lets us find more similar words in the confusion dictionary (see the sketch after this list).
- BERT language model: directly use BERT to predict the correct word for the erroneous position.
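Here is a minimal DTW sketch over pinyin syllable sequences, so that strings of different lengths (ASR insertions or deletions) can still be compared. The per-syllable cost below is plain edit distance between syllable strings; it stands in for a proper initial/final phonetic distance such as the one dimsim provides.

```python
# A minimal DTW sketch over pinyin syllable sequences. The per-syllable
# cost is plain edit distance -- a stand-in for a real initial/final
# (shengmu/yunmu) phonetic distance such as dimsim's.
from pypinyin import lazy_pinyin

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two pinyin syllables."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,          # delete from a
                                     dp[j - 1] + 1,      # insert into a
                                     prev + (ca != cb))  # substitute
    return dp[-1]

def dtw_pinyin(w1: str, w2: str) -> float:
    """DTW alignment cost between the pinyin of two Chinese strings."""
    s, t = lazy_pinyin(w1), lazy_pinyin(w2)
    INF = float("inf")
    dp = [[INF] * (len(t) + 1) for _ in range(len(s) + 1)]
    dp[0][0] = 0.0
    for i in range(1, len(s) + 1):
        for j in range(1, len(t) + 1):
            cost = edit_distance(s[i - 1], t[j - 1])
            dp[i][j] = cost + min(dp[i - 1][j],      # extra syllable in w1
                                  dp[i][j - 1],      # extra syllable in w2
                                  dp[i - 1][j - 1])  # syllables aligned
    return dp[len(s)][len(t)]

print(dtw_pinyin("孬种", "闹钟"))        # 0.0: identical toneless pinyin
print(dtw_pinyin("和平静音", "和平精英"))  # small cost: near-homophones
```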
4. Candidate ranking
This step takes the potentially correct words found above and ranks them with all features combined. Available features include: edit distance, near-homophone similarity, language-model probability, black/white lists, and the PPL score.
A word on the PPL score: perplexity measures how well a probability distribution or model predicts a sample. Put simply, it measures how much a sentence reads like natural language; the lower the score, the better (see the sketch below).
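For reference, here is a minimal sketch of sentence perplexity under a bigram language model; `bigram_prob` is a placeholder for whatever smoothed language model the system actually uses.

```python
# A minimal sketch of sentence perplexity (PPL):
#   PPL = exp(-(1/N) * sum_i log P(w_i | w_{i-1}))
# Lower PPL means the sentence looks more like natural language.
# `bigram_prob` stands in for any smoothed (non-zero) language model.
import math

def perplexity(query, bigram_prob):
    log_prob = 0.0
    for i in range(1, len(query)):
        log_prob += math.log(bigram_prob(query[i - 1], query[i]))
    n = max(len(query) - 1, 1)
    return math.exp(-log_prob / n)

# Among rewrite candidates, prefer the one with the lowest PPL, e.g.:
# best = min(candidates, key=lambda q: perplexity(q, bigram_prob))
```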

Take the candidate words recalled in step 3, compute each candidate's score on each feature dimension from step 4, and then use gradient boosted decision trees (GBDT) for the final ranking. Of course, you can also use XGBoost, which tends to work better; a later article will compare GBDT and XGBoost (a minimal ranking sketch follows).
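As a minimal illustration of the ranking step, the sketch below trains scikit-learn's GBDT on made-up feature rows (edit distance, phonetic similarity, LM log-probability, PPL, whitelist flag); a real system would train on labeled correction data.

```python
# A minimal sketch of candidate ranking with a GBDT. Each candidate is a
# feature vector: [edit_distance, phonetic_similarity, lm_log_prob, ppl,
# whitelist_flag]. All numbers below are made up for illustration.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

X_train = np.array([
    [1, 0.95,  -3.2,  40.0, 1],  # a good correction
    [3, 0.40,  -9.8, 310.0, 0],  # a bad correction
    [1, 0.90,  -4.1,  55.0, 0],  # a good correction
    [4, 0.20, -12.5, 500.0, 0],  # a bad correction
])
y_train = np.array([1, 0, 1, 0])  # 1 = correct rewrite, 0 = wrong rewrite

ranker = GradientBoostingClassifier(n_estimators=50, max_depth=3)
ranker.fit(X_train, y_train)

# At serving time, score every recalled candidate and keep the best one.
candidates = np.array([[1, 0.92, -3.5,  45.0, 1],
                       [2, 0.60, -7.0, 120.0, 0]])
scores = ranker.predict_proba(candidates)[:, 1]
print(int(scores.argmax()))  # index of the top-ranked candidate
```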
Extended ideas:
Several of the modules above mention BERT. In fact, BERT can be used directly to build an end-to-end translation model: treat error correction as a translation task, where the input is the wrong query and the output is the corrected query. However, the effect of this approach is hard to control, and the risk of doing it online in real time is high: once this step makes a wrong correction, the query becomes even harder to understand, and the error cascades like a butterfly effect into downstream tasks. Of course, if the application scenario resembles a search engine, where a candidate result is offered for the user to choose rather than executed directly, this is also a good choice (a hedged sketch follows).
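As a sketch of this end-to-end framing, a correction model can be run like any seq2seq model via Hugging Face transformers. The checkpoint name below is hypothetical: it stands for a model you would fine-tune yourself on (wrong query -> corrected query) pairs.

```python
# A sketch of the end-to-end "translation" framing with Hugging Face
# transformers. The checkpoint name is HYPOTHETICAL: it stands for a
# seq2seq model fine-tuned on (wrong query -> corrected query) pairs.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

MODEL = "your-org/query-correction-seq2seq"  # hypothetical checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL)

def correct(query: str) -> str:
    inputs = tokenizer(query, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=32, num_beams=4)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

# Safer as a suggestion ("did you mean ...?") than as a silent rewrite,
# since an uncontrolled correction cascades errors downstream.
print(correct("定一个七点的孬种"))
```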