Voice assistant - Multi-turn dialogue (theory and concepts)
2022-06-12 07:32:00 【Turned_ MZ】
In this chapter, we look at how multi-turn dialogue is implemented in voice assistants. Current approaches fall into two broad categories: rule-based (rule_base) and model-based (model_base). Because rule-based systems keep online behavior controllable and are faster to develop, most multi-turn dialogue in production is still rule-based, usually built on either a finite-state machine or a dialogue script. Models can still be used in individual modules of such a pipeline to improve recall accuracy. Purely model-based multi-turn dialogue remains largely an academic topic, with very few industrial deployments.
Basic concepts:
1. Closed-domain multi-turn dialogue
- Task-driven: the main goal is to fill the required slots and then execute the intent using those slot values.
- A polling state (the assistant asking follow-up questions) is maintained for each session, with clear entry and exit conditions.
1.1 Polling state
Closed-domain multi-turn dialogue can be implemented with a finite-state machine (FSM). Once the user's intent has been preliminarily identified, the dialogue enters the polling state; it exits when all required slots are filled or the user actively interrupts. For example:
U: Help me navigate. # Navigation intent recognized, but a required slot is missing; enter the polling state
A: Where are you going? # Polling state
U: Beijing. # Still polling; the user answers the question
A: OK. # All required slots are filled; exit the polling state
While in the polling state, the bot questions the user according to the current FSM node and jumps to the next node based on the user's answer, until the polling state ends, as shown in the figure below:
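The polling loop described above can be sketched as a tiny state machine. This is a minimal illustration, not the author's implementation: the intent name `navigate`, the slot `destination`, the prompt text and the class `SlotFillingFSM` are all hypothetical, and the reply is naively taken as the value of the slot being asked for (the simple policy the next paragraph starts from).

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical schema: required slots per intent and the question asked for each slot.
REQUIRED_SLOTS: Dict[str, List[str]] = {"navigate": ["destination"]}
PROMPTS: Dict[str, str] = {"destination": "Where are you going?"}


@dataclass
class SlotFillingFSM:
    """States: IDLE -> POLLING -> DONE, or POLLING -> CANCELLED on user interruption."""

    intent: str
    slots: Dict[str, str] = field(default_factory=dict)
    state: str = "IDLE"

    def missing(self) -> List[str]:
        return [s for s in REQUIRED_SLOTS[self.intent] if s not in self.slots]

    def start(self) -> str:
        # Called once the intent is recognized; enters POLLING if a slot is missing.
        return self._advance()

    def answer(self, reply: str) -> str:
        # Naive policy: treat the whole reply as the value of the slot being asked for.
        if self.state != "POLLING":
            raise RuntimeError("not in the polling state")
        self.slots[self.missing()[0]] = reply
        return self._advance()

    def interrupt(self) -> None:
        # The user actively leaves the flow.
        self.state = "CANCELLED"

    def _advance(self) -> str:
        gaps = self.missing()
        if gaps:
            self.state = "POLLING"
            return PROMPTS[gaps[0]]
        self.state = "DONE"
        return "OK."


if __name__ == "__main__":
    fsm = SlotFillingFSM("navigate")
    print(fsm.start())            # Where are you going?
    print(fsm.answer("Beijing"))  # OK.
    print(fsm.state, fsm.slots)   # DONE {'destination': 'Beijing'}
```

Each missing slot effectively acts as one FSM node; a real system would usually load the nodes and transitions from configuration rather than hard-coding them.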

Because closed-domain multi-turn dialogue has an explicit polling state and is led by the assistant, the user's next utterance is usually assumed to be a reply to the assistant's current question, so the reply can be used directly to fill the slot. A more careful implementation, however, must check whether the reply really supplies the missing slot. Two situations arise:
(Case 1: the reply fills a different slot)
U: Buy a train ticket. # User intent recognized; the destination and time slots are missing
A: Where are you going? # Enter the polling state
U: I'd like one that leaves tomorrow. # The user doesn't answer the question but gives another slot of the current scenario; this value must go into the time slot
A: OK, so where do you want to go? # The destination slot is still missing, so keep asking
(Case 2: the reply carries another intent)
U: Buy a train ticket. # User intent recognized
A: Where are you going? # Enter the polling state
U: What's the weather like today? # The user doesn't answer the question but raises another executable intent
A: It's sunny today, and the temperature is ... # The bot must be able to jump out and execute the other intent
U: Beijing. # The user answers the earlier question; this is interruption recovery, and the bot must resume the earlier flow
These two cases are common problems in closed-domain multi-turn dialogue, and the solutions are straightforward. For the first, NER can identify whether the user's reply contains an entity of the type a slot expects; for the second, a scene-classification or intent model can determine whether the reply expresses another intent.
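Both fixes can be combined in one dialogue manager. The sketch below is an assumption-laden illustration: `toy_ner` (a gazetteer plus regex) and `toy_intent` (keyword matching) are hypothetical stand-ins for real NER and intent models, and the slot names and prompts are invented. Any missing slot whose entity type appears in the reply gets filled (case 1), a new intent is pushed on top of a stack of tasks (case 2), and a later reply can fill an older, interrupted task (interruption recovery).

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Toy stand-ins (assumptions): a gazetteer/regex "NER" and a keyword intent classifier.
CITIES = ("Beijing", "Shanghai", "Shenzhen")
SLOT_TYPES = {"destination": "CITY", "date": "DATE"}  # slot name -> entity type
REQUIRED = {"buy_ticket": ["destination", "date"], "weather": []}
PROMPTS = {"destination": "Where are you going?", "date": "When do you want to leave?"}


def toy_ner(text: str) -> Dict[str, str]:
    entities: Dict[str, str] = {}
    for city in CITIES:
        if city in text:
            entities["CITY"] = city
    match = re.search(r"\b(today|tomorrow)\b", text, re.IGNORECASE)
    if match:
        entities["DATE"] = match.group(1).lower()
    return entities


def toy_intent(text: str) -> Optional[str]:
    lowered = text.lower()
    if "weather" in lowered:
        return "weather"
    if "train ticket" in lowered:
        return "buy_ticket"
    return None


@dataclass
class Task:
    intent: str
    slots: Dict[str, str] = field(default_factory=dict)

    def missing(self) -> List[str]:
        return [s for s in REQUIRED[self.intent] if s not in self.slots]

    def fill(self, entities: Dict[str, str]) -> bool:
        # Case 1: accept ANY missing slot whose entity type occurs in the reply,
        # not only the slot that was just asked about.
        filled = False
        for slot in self.missing():
            if SLOT_TYPES[slot] in entities:
                self.slots[slot] = entities[SLOT_TYPES[slot]]
                filled = True
        return filled


class DialogueManager:
    def __init__(self) -> None:
        self.stack: List[Task] = []  # top = task being polled; below = interrupted tasks

    def handle(self, text: str) -> str:
        intent, entities = toy_intent(text), toy_ner(text)
        if intent is not None:
            # Case 2: a new executable intent interrupts the current polling.
            task = Task(intent)
            task.fill(entities)
            self.stack.append(task)
        else:
            # Search from the newest task down; a match on an older task is interruption recovery.
            for i in range(len(self.stack) - 1, -1, -1):
                if self.stack[i].fill(entities):
                    self.stack.append(self.stack.pop(i))
                    break
            else:
                return "Sorry, I didn't catch that."
        return self._step()

    def _step(self) -> str:
        task = self.stack[-1]
        gaps = task.missing()
        if gaps:
            return PROMPTS[gaps[0]]
        self.stack.pop()
        return f"[execute {task.intent} with {task.slots}]"
```

Checking for a new intent before filling slots matters here: "What's the weather like today?" contains a date, and filling first would wrongly put "today" into the ticket's date slot.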
2. Open-domain multi-turn dialogue
Unlike closed-domain multi-turn dialogue, which has an explicit polling state, open-domain multi-turn dialogue generally has none, so semantic understanding is needed to judge whether the user's current utterance relates to the preceding context. There are generally three types: anaphora resolution, ellipsis completion, and semantic inheritance.
Examples:
(Anaphora resolution)
U: Who is Jay Chou?
A: Jay Chou is ...
U: I want to listen to his songs. # Should be understood as: I want to listen to Jay Chou's songs
(Ellipsis completion)
U: What's the weather like today?
A: Today's weather is ...
U: What about tomorrow? # Should be understood as: what's the weather like tomorrow?
(Semantic inheritance)
U: Set an alarm for 8 a.m. tomorrow.
A: OK, I've set it for you.
U: Cancel the alarm. # Should be understood as: cancel the 8 a.m. alarm set for tomorrow
The difference between ellipsis completion and semantic inheritance: in ellipsis completion, the current utterance on its own does not reveal the intent, which must be determined from the preceding context. In semantic inheritance, the current utterance has a clear intent but is missing slots, and we must judge whether the preceding context contains slot information that can naturally be carried over.
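The distinction can be written as a simple decision rule over the output of a single-turn NLU model. This is a hypothetical sketch, not the author's classifier: the pronoun list, the intent names and the `REQUIRED` slot table are invented for illustration, and `intent`/`slots` are assumed to be produced upstream.

```python
from typing import Dict, List, Optional

# Hypothetical resources: a pronoun list and required slots per intent.
PRONOUNS = {"he", "him", "his", "she", "her", "it", "its", "there"}
REQUIRED: Dict[str, List[str]] = {
    "play_music": ["singer"],
    "weather": ["date"],
    "cancel_alarm": ["time"],
}


def classify_turn(text: str, intent: Optional[str], slots: Dict[str, str]) -> str:
    """Decide which kind of context handling the current turn needs.

    `intent` and `slots` are assumed to come from a single-turn NLU model.
    """
    tokens = {w.strip("?.,!").lower() for w in text.split()}
    if tokens & PRONOUNS:
        return "anaphora_resolution"    # a pronoun points back to an earlier entity
    if intent is None:
        return "ellipsis_completion"    # intent unknown on its own: recover it from context
    if any(s not in slots for s in REQUIRED.get(intent, [])):
        return "semantic_inheritance"   # intent clear, slot missing: inherit from context
    return "self_contained"
```

A bare pronoun check is deliberately crude (e.g. the "it" in "is it raining" refers to nothing); in practice this decision is often made by a trained model instead.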
Besides these three types, there are two other cases. They rarely appear in online traffic, so we have not implemented them yet; here is a brief introduction:
(Correction) The user's current utterance corrects their previous one
U: Call Xiaohong (小红).
U: The "hong" as in "rainbow" (虹). # Should be understood as: call Xiaohong (小虹)
(Entity disambiguation) An entity in the current utterance is ambiguous and must be disambiguated using the preceding context
U: How good are Samsung phones?
U: I prefer Apple. # "Apple" here means the phone brand, not the fruit
2.1 Context
The relevant context is not necessarily the utterance closest to the current one, but the one semantically associated with the current intent, as in the interruption-recovery example above.
Moreover, the context is not limited to dialogue history; it falls into the following categories:
- Dialogue context:
  - The user's current utterance forms a context with the user's earlier utterance; all the examples above are of this kind.
  - The user's current utterance forms a context with the assistant's previous reply, for example:
U: Where is the Window of the World?
A: The Window of the World is in Shenzhen.
U: Buy a ticket to go there. # Should be understood as: buy a ticket to Shenzhen
- Device context: the user's current utterance relates to the current state of the user's device, for example:
# The user is currently playing Faye Wong's "Red Bean" (红豆) in QQ Music
U: Switch to Fang Datong's. # Should be understood as: play "Red Bean" sung by Fang Datong
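Both context sources can be modeled as plain data that the resolver consults. The sketch below is illustrative only: the `Turn` record, the entity type names (`POI`, `CITY`) and the device-state keys are assumptions, and the entities are assumed to come from an upstream NER. Assistant turns are searched just like user turns, and device state acts as a default that the current utterance overrides.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Turn:
    speaker: str              # "user" or "assistant"; both are searched as context
    entities: Dict[str, str]  # entity type -> value, assumed to come from an upstream NER


def latest_entity(history: List[Turn], entity_type: str) -> Optional[str]:
    """Dialogue context: newest entity of the given type from EITHER speaker."""
    for turn in reversed(history):
        if entity_type in turn.entities:
            return turn.entities[entity_type]
    return None


def apply_device_context(slots: Dict[str, str], device_state: Dict[str, str]) -> Dict[str, str]:
    """Device context: start from what the device is doing; the utterance overrides it."""
    return {**device_state, **slots}


if __name__ == "__main__":
    history = [
        Turn("user", {"POI": "Window of the World"}),
        Turn("assistant", {"POI": "Window of the World", "CITY": "Shenzhen"}),
    ]
    print(latest_entity(history, "CITY"))  # Shenzhen -> "buy a ticket to Shenzhen"
    now_playing = {"app": "QQ Music", "song": "Red Bean", "singer": "Faye Wong"}
    print(apply_device_context({"singer": "Fang Datong"}, now_playing))
```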
2.2 Slot alignment
Once the semantic relationship between the current utterance and the preceding context is understood, the relevant information must be extracted from the context and merged into the current turn's intent. There are generally two types:
- Slot inheritance
U: What's the weather like tomorrow?
U: What about the weather in Beijing? # What's the weather like in Beijing tomorrow?
- Intent inheritance
U: What's the weather like today?
U: What about tomorrow? # What's the weather like tomorrow?
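Both inheritance types reduce to merging the previous turn's (intent, slots) frame into the current one. The following is a minimal sketch under assumptions: the `Frame` tuple and the slot names `date`/`city` are hypothetical, and the policy of carrying nothing over when the intent changes is a simplification (it would not, for instance, handle the alarm-cancellation example, where the intent changes but the time slot should still carry over).

```python
from typing import Dict, Optional, Tuple

Frame = Tuple[Optional[str], Dict[str, str]]  # (intent, slots) produced by single-turn NLU


def inherit(prev: Frame, cur: Frame) -> Frame:
    prev_intent, prev_slots = prev
    cur_intent, cur_slots = cur
    if cur_intent is None or cur_intent == prev_intent:
        # Intent inheritance (no intent this turn) or slot inheritance (same intent):
        # keep the earlier slots and let the current turn's slots override them.
        return prev_intent, {**prev_slots, **cur_slots}
    # A different intent: this simplified policy carries nothing over.
    return cur_intent, dict(cur_slots)


if __name__ == "__main__":
    # Slot inheritance: "What about the weather in Beijing?" after "...weather tomorrow?"
    print(inherit(("weather", {"date": "tomorrow"}), ("weather", {"city": "Beijing"})))
    # Intent inheritance: "What about tomorrow?" after "...weather today?"
    print(inherit(("weather", {"date": "today"}), (None, {"date": "tomorrow"})))
```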
That's all for this chapter. In the next chapter, we'll look at the concrete implementation and details of semantic inheritance, ellipsis completion, and anaphora resolution in open-domain multi-turn dialogue.