
Voice assistant: multi-turn dialogue (theory and concepts)

2022-06-12 07:32:00 【Turned_ MZ】

In this chapter we look at how multi-turn dialogue is implemented in a voice assistant. Current implementations fall into two broad groups: rule-based (rule_base) and model-based (model_base). Because online behavior has to be controllable and development has to be fast, most multi-turn dialogue in production is still rule-based, usually built on either a finite state machine or a dialogue script. Within such a pipeline, models can still be used in individual modules to improve accuracy and recall. Purely model-based multi-turn dialogue is, for now, mostly confined to academia, and very few such systems have been deployed in industry.

Basic concepts:

1. Closed-domain multi-turn dialogue

Task-driven: the main goal is to fill the required slots and then execute the intent based on those slots.
A polling state is maintained for each session, with clear entry and exit states.

1.1 Polling state

Closed-domain multi-turn dialogue can be implemented with a finite state machine (FSM). Once the user's intent has been preliminarily identified, the dialogue enters the polling state. When the required slots are filled or the user actively interrupts, it exits polling. For example:

U: Help me navigate  # The intent to navigate is recognized, but a required slot is missing, so the polling state is entered
A: Where are you going?  # Polling state
U: Beijing  # Polling state; the user answers the question
A: OK  # The required slot is filled; exit the polling state

While in the polling state, the bot queries the user according to the current FSM node and moves to the next node according to the user's answer, until the polling state ends. The original post illustrates this with a figure at this point.
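
The polling loop described above can be sketched as a small state machine. This is only a minimal illustration, not the author's implementation: the class `NavigateSession`, the `State` enum, and the `PROMPTS` table are hypothetical names, and the sketch assumes that intent recognition has already decided the first utterance is a navigation request. It also takes the naive approach of treating the whole reply as the slot value.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class State(Enum):
    IDLE = auto()      # no task in progress
    POLLING = auto()   # asking the user for missing slots
    DONE = auto()      # all required slots are filled


# Hypothetical prompt for each required slot of a "navigate" intent.
PROMPTS = {"destination": "Where are you going?"}


@dataclass
class NavigateSession:
    state: State = State.IDLE
    slots: dict = field(default_factory=dict)

    def missing(self):
        """Required slots that have no value yet, in prompt order."""
        return [name for name in PROMPTS if name not in self.slots]

    def step(self, utterance: str) -> str:
        if self.state is State.IDLE:
            # Assumption: upstream intent recognition chose "navigate".
            self.state = State.POLLING
        elif self.state is State.POLLING:
            # Naive: the whole reply becomes the value of the slot being asked.
            self.slots[self.missing()[0]] = utterance.strip()
        missing = self.missing()
        if missing:
            return PROMPTS[missing[0]]   # stay in the polling state
        self.state = State.DONE          # exit the polling state
        return "OK"
```

Calling `step("Help me navigate")` and then `step("Beijing")` on a fresh session reproduces the two assistant turns of the example. Exiting on an active user interruption is left out of this sketch.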

Because closed-domain multi-turn dialogue has an explicit polling state and is led by the assistant, it is generally assumed that the user's next utterance is most likely a reply to the assistant's current question, so the reply can simply be filled into the slot directly. A more careful implementation has to check whether the user's reply actually provides the missing slot. Two situations arise:

U: Buy a train ticket  # Intent identified; the destination and time slots are missing
A: Where are you going?  # Enter the polling state
U: I'd like one that leaves tomorrow  # The user did not answer the question but gave another slot of the current scenario; it should be filled into the time slot
A: OK, and where do you want to go?  # The destination slot is still missing, so keep asking

U: Buy a train ticket  # Intent identified
A: Where are you going?  # Enter the polling state
U: What's the weather like today?  # The user did not answer the question; this is a different executable intent
A: It's sunny today, the temperature ..  # The bot must be able to leave the polling flow and execute the other intent
U: Beijing  # The user now answers the earlier question; this is interruption recovery, and the bot must be able to resume

Both situations are common in closed-domain multi-turn dialogue, and the solutions are straightforward. For the first, NER can be used to check whether the user's reply contains an entity type that corresponds to a slot. For the second, a scenario classifier or intent model can be used to determine whether the reply carries a different intent.
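
As a rough sketch of how these two checks might be combined while polling, the code below first looks for another intent and only then fills slots from recognized entities. The lexicon-based `extract_entities` and keyword-based `detect_intent` are toy stand-ins for a real NER model and intent classifier, and all names and vocabularies here are assumptions made for illustration.

```python
# Toy stand-ins for a real NER model and intent classifier (assumptions).
ENTITY_LEXICON = {
    "destination": {"beijing", "shanghai", "shenzhen"},
    "time": {"today", "tomorrow"},
}
INTENT_KEYWORDS = {"weather": "query_weather"}


def extract_entities(text):
    """Return {slot_name: value} for every lexicon word found in text."""
    words = text.lower().replace("?", " ").replace(".", " ").split()
    found = {}
    for slot, vocab in ENTITY_LEXICON.items():
        for word in words:
            if word in vocab:
                found[slot] = word
    return found


def detect_intent(text):
    """Return the name of another intent if a keyword matches, else None."""
    lowered = text.lower()
    for keyword, intent in INTENT_KEYWORDS.items():
        if keyword in lowered:
            return intent
    return None


def handle_reply(text, required_slots, slots):
    """Decide what to do with a reply received in the polling state.

    Returns ("interrupt", intent), ("ask", missing_slot) or ("execute", None).
    `slots` is updated in place and is left untouched by an interruption,
    so the original task can resume later (interruption recovery).
    """
    other = detect_intent(text)
    if other is not None:
        return ("interrupt", other)
    for slot, value in extract_entities(text).items():
        if slot in required_slots:
            slots[slot] = value
    missing = [s for s in required_slots if s not in slots]
    if missing:
        return ("ask", missing[0])
    return ("execute", None)
```

Checking for another intent before filling slots is a design choice of this sketch: "What's the weather like today?" contains a time entity, and filling it into the ticket's time slot would be wrong. Keeping `slots` alive across the interruption is what lets a later "Beijing" complete the original request.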

2. Open-domain multi-turn dialogue

Unlike closed-domain multi-turn dialogue, open-domain multi-turn dialogue generally has no explicit polling state. Whether the user's current utterance is related to the preceding context has to be judged through semantic understanding. There are generally three types: anaphora resolution, ellipsis completion, and semantic inheritance.

For example:

(Anaphora resolution)
U: Who is Jay Chou?
A: Jay Chou is .......
U: I want to listen to his songs  # Should be understood as: I want to listen to Jay Chou's songs
(Ellipsis completion)
U: What's the weather like today?
A: The weather today ......
U: What about tomorrow?  # Should be understood as: What's the weather like tomorrow?
(Semantic inheritance)
U: Set an alarm for 8 o'clock tomorrow morning
A: OK, I've set it for you
U: Cancel the alarm  # Should be understood as: cancel the alarm set for 8 o'clock tomorrow morning

Note the difference between ellipsis completion and semantic inheritance. Ellipsis completion means that the current utterance, taken on its own, does not determine the intent, so the preceding context is needed to work out the specific intent. Semantic inheritance means that the current utterance has a clear intent and only slots are missing, so the preceding context has to be checked for slot information that can be carried over.

Besides these three types, there are two further cases. They rarely occur online, so we have not implemented them yet. They are introduced briefly here:

(Correction) The user's current utterance corrects their previous utterance
U: Call Xiaohong
U: "Hong" as in "rainbow"  # Should be understood as: call Xiaohong, with "hong" written as the character used in "rainbow"
(Entity disambiguation) An entity in the current utterance is ambiguous and has to be disambiguated using the preceding context
U: How about Samsung's phones?
U: I prefer apple  # "apple" is ambiguous on its own; the previous utterance shows that the phone brand is meant

2.1 Context

The context here is not necessarily the utterance closest to the current one. It is the earlier content that is semantically related to the current intent, as in the interruption-recovery example above.

Context is also not limited to the dialogue history. It can be divided into the following categories:

Dialogue context:
- The user's current utterance and the user's previous utterances form the context, as in all the examples listed above.
- The user's current utterance and the assistant's previous reply form the context, as in the following example:

U: Where is Window of the World?
A: Window of the World is in Shenzhen.
U: Buy a ticket to go there.  # Should be understood as: buy a ticket to Shenzhen.

Device context: the user's current utterance is related to the current state of the user's device, as in the following example:

# The user is currently playing Faye Wong's "Red Bean" in QQ Music
U: Change to Fang Datong's.  # Should be understood as: play the version of "Red Bean" sung by Fang Datong
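
One way to picture device context is as a second source of slot values next to the dialogue history. The sketch below is an assumption-laden illustration rather than a description of any real assistant: the function name, the dictionary keys (`app_state`, `song`, `artist`), and the `play_music` intent label are all hypothetical, and the NLU output for the user's utterance is written by hand.

```python
def resolve_with_device_context(frame, device_state):
    """Fill slots missing from `frame` using what the device is playing.

    Both arguments are plain dicts with hypothetical keys. Slots the user
    stated explicitly always win over the device state.
    """
    resolved = dict(frame)
    is_music = frame.get("intent") == "play_music"
    is_playing = device_state.get("app_state") == "playing"
    if is_music and is_playing:
        for slot in ("song", "artist"):
            if slot not in resolved and slot in device_state:
                resolved[slot] = device_state[slot]
    return resolved


device_state = {"app": "QQ Music", "app_state": "playing",
                "song": "Red Bean", "artist": "Faye Wong"}
# Assumed NLU output for "Change to Fang Datong's."
frame = {"intent": "play_music", "artist": "Fang Datong"}
resolved = resolve_with_device_context(frame, device_state)
```

Here the song title is taken from the device state while the artist stated by the user is kept, which matches the intended reading of the example.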

2.2 Slot alignment

Once the semantic relationship between the current utterance and the context is understood, the context-related information has to be extracted and merged into the intent of the current turn. There are generally two types:

Slot inheritance

U: What's the weather like tomorrow?
U: The weather in Beijing?  # What's the weather like in Beijing tomorrow?

Intent inheritance

U: What's the weather like today?
U: What about tomorrow?  # What's the weather like tomorrow?
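
Both kinds of alignment can be sketched as a merge of the previous turn's frame into the current one. This is a simplified illustration under stated assumptions: frames are plain dicts of the form `{"intent": ..., "slots": {...}}`, the function name `align_with_context` is hypothetical, and the decision that the two turns are actually related is assumed to have been made upstream.

```python
def align_with_context(current, previous):
    """Merge the previous turn's frame into the current one.

    Intent inheritance: the current turn has no intent, so the previous
    intent is reused.
    Slot inheritance: when both turns share an intent, slots the current
    turn did not mention are copied from the previous turn.
    """
    intent = current["intent"] or previous["intent"]
    slots = dict(current["slots"])
    if intent == previous["intent"]:
        for name, value in previous["slots"].items():
            slots.setdefault(name, value)  # never overwrite current values
    return {"intent": intent, "slots": slots}


# Slot inheritance: "What's the weather like tomorrow?" then "The weather in Beijing?"
slot_case = align_with_context(
    {"intent": "query_weather", "slots": {"city": "Beijing"}},
    {"intent": "query_weather", "slots": {"date": "tomorrow"}},
)

# Intent inheritance: "What's the weather like today?" then "What about tomorrow?"
intent_case = align_with_context(
    {"intent": None, "slots": {"date": "tomorrow"}},
    {"intent": "query_weather", "slots": {"date": "today"}},
)
```

Using `setdefault` means a value stated in the current turn always takes priority, so "tomorrow" replaces "today" in the second case instead of being overwritten by the inherited slot.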

That's all for this chapter. In the next chapter we will look at the concrete implementation process and details of semantic inheritance, ellipsis completion, and anaphora resolution in open-domain multi-turn dialogue.

Copyright notice
This article was written by [Turned_ MZ]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/03/202203010556460697.html