
Voice assistant: multi-turn dialogue (theory and concepts)

2022-06-12 07:32:00 Turned_ MZ

In this chapter, we look at how multi-turn dialogue is implemented in a voice assistant. Current implementations fall into two broad categories: rule-based (rule_base) and model-based (model_base). Because rule-based systems give controllable online behavior and can be developed quickly, most multi-turn dialogue in production is still rule-based, typically built on either a finite state machine or a dialogue script. Models can still be used in individual modules of the pipeline to improve accuracy. Purely model-based multi-turn dialogue remains mostly in academia, and very few such systems have been deployed in industry.

Basic concepts:

1. Closed-domain multi-turn dialogue

  • Task-driven: the main goal is to fill the required slots and then execute the intent based on those slots.
  • Each session maintains a polling state with clear entry and exit states.

1.1 Polling state

Closed-domain multi-turn dialogue can be implemented with a finite state machine (FSM). Once the user's intent has been preliminarily identified, the dialogue enters the polling state. It exits polling when the required slots are filled or when the user actively interrupts. Take the following example:

U: Help me navigate        # The navigation intent is recognized, but a required slot is missing, so the dialogue enters the polling state

A: Where are you going?    # Polling state

U: Beijing                 # Polling state; the user answers the question

A: OK                      # The required slot is filled, so the dialogue exits the polling state

While in the polling state, the bot asks the user a question based on the current FSM node and transitions to another node according to the user's answer, until the polling state ends.

[Figure: FSM node transitions during the polling state]
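The polling loop described here can be sketched as a very small state machine. This is only an illustration under stated assumptions: the class name `PollingFSM`, the `REQUIRED_SLOTS` and `PROMPTS` tables, and the cancel phrases are all hypothetical, and the reply is naively taken as the value of the pending slot.

```python
from dataclasses import dataclass, field

# Hypothetical slot schema: required slots per intent and the question for each slot.
REQUIRED_SLOTS = {"navigate": ["destination"]}
PROMPTS = {"destination": "Where are you going?"}


@dataclass
class PollingFSM:
    intent: str
    slots: dict = field(default_factory=dict)
    state: str = "POLLING"  # POLLING -> DONE or CANCELLED

    def missing(self):
        """List the required slots that are still empty."""
        return [s for s in REQUIRED_SLOTS[self.intent] if s not in self.slots]

    def next_prompt(self):
        """Return the question for the first missing slot, or None when complete."""
        todo = self.missing()
        if not todo:
            self.state = "DONE"
            return None
        return PROMPTS[todo[0]]

    def on_user_reply(self, text):
        """Naively treat the reply as the answer to the pending question."""
        if self.state != "POLLING":
            return
        if text.strip().lower() in {"cancel", "never mind"}:
            self.state = "CANCELLED"  # the user actively interrupts
            return
        todo = self.missing()
        if todo:
            self.slots[todo[0]] = text.strip()
        if not self.missing():
            self.state = "DONE"  # all required slots filled, exit polling


fsm = PollingFSM("navigate")
print(fsm.next_prompt())
fsm.on_user_reply("Beijing")
print(fsm.state, fsm.slots)
```

A production FSM would usually have one node per slot or sub-dialogue and explicit transition tables; the single `state` field here only captures entry to and exit from polling.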

Because closed-domain multi-turn dialogue has an explicit polling state and is led by the assistant, the user's next utterance is generally assumed to be, with high probability, a reply to the assistant's current question. In that case you can simply fill the slot with the reply directly. A more careful implementation checks whether the reply actually contains the missing slot. Two situations arise:

(The reply fills a different slot)

U: Buy a train ticket                         # The intent is identified; the destination and time slots are missing

A: Where are you going?                       # Enter the polling state

U: I'd like to buy one that leaves tomorrow   # The user did not answer the question but supplied another slot of the current scenario; it must be filled into the time slot

A: OK, so where do you want to go?            # The destination slot is still missing, so keep asking

(The reply is a different intent)

U: Buy a train ticket                 # The intent is identified

A: Where are you going?               # Enter the polling state

U: What's the weather like today?     # The user did not answer the question; this is another executable intent

A: It's sunny today, the temperature ..   # The assistant must be able to leave polling and execute the other intent

U: Beijing                            # The user answers the earlier question; this is interruption recovery, and the assistant must be able to resume

Both cases are common problems in closed-domain multi-turn dialogue, and the solutions are straightforward. For the first, use NER to check whether the user's reply contains an entity of the type required by one of the slots. For the second, use a scenario classifier or an intent model to determine whether the reply expresses another intent.
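The two checks can be combined into a single routing step, sketched below. Everything here is an assumption for illustration: `toy_ner` and `toy_intent` are keyword lookups standing in for a real NER model and intent classifier, and the word lists, slot names, and intent labels are made up.

```python
# Toy stand-ins for real components; all names and word lists are illustrative.
CITY_WORDS = {"beijing", "shanghai", "shenzhen"}
DATE_WORDS = {"today", "tomorrow"}
OTHER_INTENT_KEYWORDS = {"weather": "query_weather"}


def toy_ner(text):
    """Return {slot_type: value} for entities found in the text."""
    found = {}
    for token in text.lower().replace("?", " ").split():
        if token in CITY_WORDS:
            found["destination"] = token
        elif token in DATE_WORDS:
            found["date"] = token
    return found


def toy_intent(text):
    """Return another executable intent if one is detected, else None."""
    for keyword, intent in OTHER_INTENT_KEYWORDS.items():
        if keyword in text.lower():
            return intent
    return None


def route_reply(text, pending_slots):
    """Decide how to handle a user reply while polling.

    pending_slots: names of the slots still missing for the active task.
    Returns an (action, payload) tuple.
    """
    other = toy_intent(text)
    if other is not None:
        # Second case: another intent; the active task is suspended, not dropped.
        return ("interrupt", other)
    entities = toy_ner(text)
    fills = {k: v for k, v in entities.items() if k in pending_slots}
    if fills:
        # First case: the reply fills a missing slot, not necessarily the one asked.
        return ("fill", fills)
    return ("reprompt", None)


print(route_reply("I'd like to buy one that leaves tomorrow", ["destination", "date"]))
print(route_reply("What's the weather like today?", ["destination", "date"]))
```

The intent check runs first in this sketch so that "today" in a weather question is not mistaken for a ticket date. Whether to check intent or entities first is a design choice that depends on how reliable each component is.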

2. Open-domain multi-turn dialogue

Unlike closed-domain multi-turn dialogue, which has an explicit polling state, open-domain multi-turn dialogue generally has none. Whether the user's current utterance is related to the preceding context must be judged through semantic understanding. There are generally three types: anaphora resolution, ellipsis completion, and semantic inheritance.

Take the following examples:

(Anaphora resolution)

U: Who is Jay Chou?

A: Jay Chou is .......

U: I want to listen to his songs      # Should be understood as: I want to listen to Jay Chou's songs

(Ellipsis completion)

U: What's the weather like today?

A: The weather today ......

U: What about tomorrow?               # Should be understood as: What's the weather like tomorrow?

(Semantic inheritance)

U: Set an alarm for 8 o'clock tomorrow morning

A: OK, I've set it for you

U: Cancel the alarm                   # Should be understood as: Cancel the alarm set for 8 o'clock tomorrow morning

The difference between ellipsis completion and semantic inheritance is as follows. In ellipsis completion, the current turn on its own does not determine the intent, so the preceding context is needed to work out what the intent is. In semantic inheritance, the current turn has a clear intent and only slots are missing, so the preceding context must be checked for slot information that can be carried over.

Besides these three types of multi-turn dialogue, there are two other cases. They rarely occur in online traffic, so we have not implemented them yet. Here is a brief introduction:

(Action correction) The user's current utterance corrects their previous utterance

U: Call Xiaohong

U: "Hong" as in "rainbow" (caihong)     # Should be understood as: call Xiaohong, with the name written using the character for "rainbow"

(Entity disambiguation) An entity in the current utterance is ambiguous and must be disambiguated using the preceding context

U: How about Samsung's mobile phones?

U: I prefer Apple                       # "Apple" could be the fruit or the brand; the previous turn points to the phone brand

2.1 Context

The context used here is not necessarily the turn closest to the current utterance. It is the turn that is semantically related to the current intent, as in the interruption-recovery example above.
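One simple way to picture this is to search the history from newest to oldest for a turn that the current utterance can attach to, rather than always taking the last turn. The sketch below is an assumption-laden illustration: the history record layout (`intent`, `missing_slots`, `done`) and the matching rule are hypothetical, and a real system would match on semantic relatedness rather than on a single entity type.

```python
def select_context(history, entity_type):
    """Return the most recent unfinished turn that is missing a slot of entity_type.

    history: list of dicts, oldest first, each with the keys
             'intent', 'missing_slots' (list of slot names) and 'done' (bool).
    """
    for turn in reversed(history):
        if not turn["done"] and entity_type in turn["missing_slots"]:
            return turn
    return None


# Interruption-recovery example: the weather query is the nearest turn,
# but the reply "Beijing" belongs to the suspended ticket task.
history = [
    {"intent": "buy_train_ticket", "missing_slots": ["destination"], "done": False},
    {"intent": "query_weather", "missing_slots": [], "done": True},
]
ctx = select_context(history, "destination")
print(ctx["intent"])  # buy_train_ticket
```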

Context is also not limited to the dialogue history. It can be divided into the following categories:

  • Dialogue context:
    • The user's current utterance and the user's previous utterance form the context. All of the examples listed above are of this kind.
    • The user's current utterance and the assistant's previous reply form the context. Take the following example:

U: Where is Window of the World?

A: Window of the World is in Shenzhen.

U: Buy a ticket to go there.     # Should be understood as: Buy a ticket to Shenzhen.

  • Device context: the user's current utterance is related to the current state of the user's device. Take the following example:

# The user is currently playing "Red Bean" by Faye Wong in QQ Music

U: Change to Fang Datong's.      # Should be understood as: Play the version of "Red Bean" sung by Fang Datong
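Device context can be treated as one more source of slot values. The following is a minimal sketch under stated assumptions: the frame layout, the `play_music` intent name, and the `device_state` keys are hypothetical and not taken from any particular assistant platform.

```python
def apply_device_context(frame, device_state):
    """Fill slots missing from the parsed frame using the current playback state.

    frame: {'intent': str, 'slots': dict} parsed from the user's utterance.
    device_state: for example {'app': ..., 'song': ..., 'artist': ...}.
    Returns a new frame; the inputs are not modified.
    """
    resolved = {"intent": frame["intent"], "slots": dict(frame["slots"])}
    if frame["intent"] == "play_music" and "song" not in resolved["slots"]:
        song = device_state.get("song")
        if song:
            # The user named only the artist, so keep the song that is playing.
            resolved["slots"]["song"] = song
    return resolved


device_state = {"app": "QQ Music", "song": "Red Bean", "artist": "Faye Wong"}
frame = {"intent": "play_music", "slots": {"artist": "Fang Datong"}}
print(apply_device_context(frame, device_state))
```

Slots stated by the user take priority over device state here, so the artist from the utterance replaces the one currently playing while the song is carried over.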

2.2 Slot alignment

Once the semantic relationship between the current utterance and the preceding context is understood, the relevant context information must be extracted and added to the intent of the current turn. There are generally two types:

  • Slot inheritance

U: What's the weather like tomorrow?

U: The weather in Beijing?       # What's the weather like in Beijing tomorrow?

  • Intent inheritance

U: What's the weather like today?

U: What about tomorrow?          # What's the weather like tomorrow?
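Both kinds of inheritance can be viewed as merging the previous frame into the current one. The sketch below assumes a hypothetical frame layout (`intent` may be `None` when the current turn has no intent of its own) and a hypothetical intent label `query_weather`. It is not the author's implementation, which is covered in the next chapter.

```python
def align_with_context(current, previous):
    """Merge the previous frame into the current one.

    Frames are {'intent': str or None, 'slots': dict}.
    - Intent inheritance: the current turn has no intent, so take the previous one.
    - Slot inheritance: the intents match, so copy slots the current turn
      did not mention. Slots stated in the current turn are never overwritten.
    """
    intent = current["intent"] or previous["intent"]
    slots = dict(current["slots"])
    if intent == previous["intent"]:
        for name, value in previous["slots"].items():
            slots.setdefault(name, value)
    return {"intent": intent, "slots": slots}


# Slot inheritance: "What's the weather like tomorrow?" then "The weather in Beijing?"
prev = {"intent": "query_weather", "slots": {"date": "tomorrow"}}
cur = {"intent": "query_weather", "slots": {"city": "Beijing"}}
print(align_with_context(cur, prev))

# Intent inheritance: "What's the weather like today?" then "What about tomorrow?"
prev = {"intent": "query_weather", "slots": {"date": "today"}}
cur = {"intent": None, "slots": {"date": "tomorrow"}}
print(align_with_context(cur, prev))
```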

That's all for this chapter. In the next chapter, we will look at the concrete implementation and details of semantic inheritance, ellipsis completion, and anaphora resolution in open-domain multi-turn dialogue.


Copyright notice
This article was written by [Turned_ MZ]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/03/202203010556460697.html