
Chapter 7-11: Deep Learning for Question Answering (2/2)

2022-06-13 02:13:00 【zjuPeco】

Table of contents

1 Simple Question: Match & Extract
2 Complex Question: Reasoning
3 Dialogue QA

This article is my course notes for Prof. Hung-yi Lee's lecture 【Deep Learning for Question Answering (2/2)】. The course video is on YouTube: click here (access may require a VPN).

All figures used below are taken from Prof. Hung-yi Lee's slides. If this infringes any rights, they will be removed.

Article index:

Previous part - 7-10 Deep Learning for Question Answering (1/2)

Next part - 7-12 Controllable Chatbot

Full index of notes

1 Simple Question: Match & Extract

A simple question is one whose answer can be found in a given source by matching and extracting. The questions in SQuAD are simple questions. Take the first question in the figure below. The keywords "precipitation" and "fall" locate the answer in the first sentence of the source (Match). We then extract the answer "gravity" from that sentence (Extract).
[Figure 7-11-1]

The figure below shows the model architecture for this problem, which is usually called "query-to-context attention". Reading it from the bottom up:

- The source is fed into the "module for source", which outputs embeddings for every token. Each token gets two embeddings. One plays the role of the key in attention, and the other plays the role of the value.
- The question is fed into the "module for question", which outputs one embedding. This embedding plays the role of the query.
- We match the yellow query in the lower right against each blue key in the lower left. After normalization, each green value receives a weight.
- The weighted sum of the values gives the dark blue vector at the top. This vector is fed into the "module for answer" to produce the final result.

If the answer is a single token, this is a classification problem. If the answer is a span of the source, the module predicts the positions of the span's start and end tokens.
[Figure 7-11-2]
The most classic model with this architecture is End-To-End Memory Networks (2015).
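The attention step described above can be sketched in plain Python. This is a minimal illustration under stated assumptions:

- The lecture does not fix the match function. Here we assume a dot product for matching and softmax for normalization.
- The function names are hypothetical.
- The embeddings are toy vectors rather than the outputs of a trained "module for source".

```python
import math
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Match score between two embeddings (dot product is an assumption)."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    """Normalize raw match scores into weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def query_to_context(query: Vector, keys: List[Vector],
                     values: List[Vector]) -> Tuple[List[float], Vector]:
    """Attend from one question embedding over all source tokens.

    keys[i] and values[i] are the two embeddings of source token i.
    Returns the per-token weights and the weighted sum of the values,
    i.e. the vector that would be passed to the "module for answer".
    """
    weights = softmax([dot(query, k) for k in keys])
    dim = len(values[0])
    summary = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return weights, summary
```

For example, calling `query_to_context([2.0, 0.0], keys, values)` gives the largest weight to the source token whose key points in the same direction as the query.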

This architecture can be modified. What if the "module for question" outputs several embeddings, that is, there are several queries? We can match every query against every key. Each value then gets several weights, and we keep the largest one.
[Figure 7-11-3]
A typical example of this approach is Ask, Attend and Answer: Exploring Question-Guided Spatial Attention for Visual Question Answering (2016).
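The multi-query variant can be sketched as follows. This is a minimal sketch, not the paper's exact formulation, and it rests on these assumptions:

- Match scores are dot products.
- We take the maximum raw score over all queries for each source token and then normalize with softmax. The text does not say whether normalization happens before or after taking the maximum.
- The function names are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def multi_query_weights(queries: List[Vector], keys: List[Vector]) -> List[float]:
    """Match every query against every key, keep the best score per
    source token, then normalize over the source tokens."""
    best = [max(dot(q, k) for q in queries) for k in keys]
    return softmax(best)


def multi_query_attention(queries: List[Vector], keys: List[Vector],
                          values: List[Vector]) -> Vector:
    """Weighted sum of the values using the max-over-queries weights."""
    weights = multi_query_weights(queries, keys)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
```

A source token that matches strongly with any one of the queries receives a high weight, even if it matches the other queries poorly.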

There is another architecture called "context-to-query attention". It works in three steps:

1. The embedding of each source token serves as the query, and the embedding of each question token serves as a key.
2. The two do attention, which produces a vector.
3. This vector is combined with the query used in that attention step (the source token's own embedding) to form the green vector in the figure below.

Every blue token gets its own green vector. All the green vectors are then fed into the "module for answer" to get the answer.
[Figure 7-11-4]
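Context-to-query attention can be sketched like this. It is a minimal illustration under stated assumptions:

- Matching uses dot products.
- The "combination" is assumed to be concatenation. Addition would be another option.
- The function names are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def context_to_query(context: List[Vector], question: List[Vector]) -> List[Vector]:
    """For each source token, attend over the question tokens and
    concatenate the attended vector to the token's own embedding."""
    outputs = []
    dim = len(question[0])
    for c in context:
        # The source token is the query; the question tokens are keys/values.
        weights = softmax([dot(c, q) for q in question])
        attended = [sum(w * q[d] for w, q in zip(weights, question)) for d in range(dim)]
        outputs.append(c + attended)  # the "green vector" for this token
    return outputs
```

The output contains one vector per source token, which is why the answer module can later point at token positions.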
In practice, the architecture above may also apply self-attention to the context. A classic example is Gated Self-Matching Networks for Reading Comprehension and Question Answering.

So which is better, query-to-context or context-to-query? Since it is hard to say, we might as well use both. The following papers do this:

- Bidirectional Attention Flow for Machine Comprehension
- Dynamic Coattention Networks For Question Answering
- QANet: Combining Local Convolution with Global Self-Attention for Reading Comprehension
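As a reference point, Bidirectional Attention Flow (BiDAF) combines both directions roughly as follows. This is written from memory of the BiDAF paper, so treat the symbols as illustrative:

- h_t is the embedding of context token t (t = 1..T).
- u_j is the embedding of question token j (j = 1..J).
- w_S is a learned weight vector.
- ∘ is the element-wise product.

Note that BiDAF's query-to-context step takes a max over question tokens, which is the same idea as the multi-query variant above.

```latex
% similarity between context token t and question token j
S_{tj} = \mathbf{w}_S^{\top}\,[\,h_t;\ u_j;\ h_t \circ u_j\,]

% context-to-query: each context token attends over the question
a_{tj} = \frac{\exp(S_{tj})}{\sum_{j'=1}^{J}\exp(S_{tj'})}, \qquad
\tilde{u}_t = \sum_{j=1}^{J} a_{tj}\, u_j

% query-to-context: max over question tokens, then softmax over the context
m_t = \max_{j} S_{tj}, \qquad
b_t = \frac{\exp(m_t)}{\sum_{t'=1}^{T}\exp(m_{t'})}, \qquad
\tilde{h} = \sum_{t=1}^{T} b_t\, h_t

% per-token output that combines both directions
g_t = [\,h_t;\ \tilde{u}_t;\ h_t \circ \tilde{u}_t;\ h_t \circ \tilde{h}\,]
```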

Seen from today, however, everything described above is already inside BERT, which is exactly why BERT is so strong: a single BERT is all we need.
[Figure 7-11-5]
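When BERT is used for span extraction, a common decoding scheme works as follows:

- The model outputs one start score and one end score per source token.
- We pick the pair (i, j) with i <= j that maximizes the sum of the two scores.
- We cap the answer length.

The sketch below assumes such per-token scores are already available. The function name and the default length cap of 30 tokens are hypothetical choices.

```python
from typing import List, Tuple


def best_span(start_logits: List[float], end_logits: List[float],
              max_answer_len: int = 30) -> Tuple[int, int]:
    """Return (start, end) token indices, inclusive, with start <= end and
    span length <= max_answer_len, maximizing start_logits[start] + end_logits[end]."""
    best, best_score = (0, 0), float("-inf")
    for i, s in enumerate(start_logits):
        # Only consider end positions at or after the start, within the cap.
        for j in range(i, min(i + max_answer_len, len(end_logits))):
            score = s + end_logits[j]
            if score > best_score:
                best, best_score = (i, j), score
    return best
```

Given the source tokens, the answer text is then `tokens[i:j + 1]`. The i <= j constraint rules out "answers" that end before they start.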

2 Complex Question: Reasoning

Complex questions still do not seem to have a satisfactory solution today. This section therefore only gives an overview of what complex questions are and the existing approaches.

Complex questions differ from simple questions in one key way. The answer to a simple question can always be found directly in a single passage of the source. A complex question, by contrast, requires hopping repeatedly between multiple sources and combining several pieces of information to reach the answer.

QAngaroo is a dataset of complex questions. For example, the question in the figure below asks which country "Hanging gardens of Mumbai" is in. From the first source we only learn that it is on the "Arabian Sea", but the Arabian Sea is not a country. We have to follow the next source to learn that the answer is India.
[Figure 7-11-6]

Other complex-question datasets include HotpotQA and DROP. Some of their questions even require the machine to add, subtract, multiply, or divide numbers taken from the source.

The figure below illustrates how a model hops between sources:

1. Match a sentence against the question.
2. Update the query with the extracted content.
3. Match, extract, and update again, until the needed answer is found.

The number of hops is usually a preset hyperparameter.
[Figure 7-11-7]
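The hop-by-hop procedure above can be illustrated with a toy word-overlap reader. This is a sketch under strong simplifying assumptions, not how any real model works:

- "Match" is assumed to mean picking the unused sentence with the largest word overlap with the current query.
- "Extract and update" is assumed to mean adding that sentence's words to the query.
- The function names and the three source sentences are made up for illustration.

```python
from typing import List, Set


def words(text: str) -> Set[str]:
    return set(text.lower().split())


def multi_hop(question: str, sources: List[str], num_hops: int = 2) -> List[int]:
    """Return the indices of the source sentences visited, one per hop.
    num_hops plays the role of the preset hyperparameter."""
    query = words(question)
    visited: List[int] = []
    for _ in range(num_hops):
        candidates = [i for i in range(len(sources)) if i not in visited]
        if not candidates:
            break
        # Match: the unused sentence sharing the most words with the query.
        best = max(candidates, key=lambda i: len(query & words(sources[i])))
        visited.append(best)
        # Extract + update: fold the matched sentence into the query.
        query |= words(sources[best])
    return visited


docs = [
    "the hanging gardens of mumbai overlook the arabian sea",
    "the arabian sea borders the west coast of india",
    "paris is the capital of france",
]
```

With the question "in which country are the hanging gardens of mumbai" and two hops, the first hop lands on the sentence about the Arabian Sea. After the query is updated, the second hop reaches the sentence mentioning India. A one-hop reader stops at the Arabian Sea.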

In terms of network architecture, this looks like the figure below:

- Each token of the source is turned into two embeddings (blue and orange), used as the key and the value respectively.
- The query embedding attends over them.
- The result is combined with the previous query to obtain a new query.

This repeats again and again, and the number of repetitions is preset by hand.
[Figure 7-11-8]
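At the embedding level, the loop can be sketched as repeated attention with a query update. This is a minimal sketch under stated assumptions:

- Matching uses dot products.
- "Combine with the previous query" is implemented as addition, new query = old query + attention read-out. This is the update used in End-To-End Memory Networks, but other combinations are possible.
- The value vectors are assumed to have the same size as the query.
- The function names are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def multi_hop_attention(query: Vector, keys: List[Vector], values: List[Vector],
                        num_hops: int = 3) -> Tuple[Vector, List[List[float]]]:
    """Attend num_hops times; after each hop the read-out vector is added
    to the query. Returns the final query and the weights of every hop."""
    q = list(query)
    hop_weights: List[List[float]] = []
    dim = len(values[0])
    for _ in range(num_hops):
        weights = softmax([dot(q, k) for k in keys])
        readout = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
        q = [a + b for a, b in zip(q, readout)]  # new query = old query + read-out
        hop_weights.append(weights)
    return q, hop_weights
```

Inspecting `hop_weights` shows which source tokens each hop focused on, which is a simple way to check whether the hops move between different sources.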

Of course, some models instead learn how many hops to take; see ReasoNet.

This hopping process maps naturally onto graph networks, so some researchers have tried using graph networks to solve complex questions; see Dynamically Fused Graph Network for Multi-hop Reasoning. Others, however, found that the graph network barely matters when BERT is also fine-tuned during training: results are almost the same with or without it. The reason is that BERT's self-attention is itself a dense graph network; see Is Graph Structure Necessary for Multi-hop Reasoning?.
[Figure 7-11-9]
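The claim that self-attention is a dense graph network can be made concrete with masked attention. In this sketch:

- Each node may only attend to its neighbours in an adjacency matrix.
- If every entry is True, the graph is fully connected, and the computation is exactly ordinary self-attention.
- If some entries are False, we get a sparse, graph-network-style attention.

The function name and toy inputs are hypothetical. Each node is assumed to have at least a self-loop; otherwise its row would have no valid scores.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # finite as long as the node has at least one neighbour
    exps = [math.exp(s - m) for s in scores]  # exp(-inf) == 0.0
    total = sum(exps)
    return [e / total for e in exps]


def graph_attention(nodes: List[Vector], adjacency: List[List[bool]]) -> List[List[float]]:
    """Row i holds node i's attention weights over all nodes; non-neighbours
    are masked with -inf so they receive exactly zero weight."""
    weights = []
    for i, h in enumerate(nodes):
        scores = [dot(h, g) if adjacency[i][j] else float("-inf")
                  for j, g in enumerate(nodes)]
        weights.append(softmax(scores))
    return weights
```

With an all-True adjacency matrix, every node attends to every other node, just like a Transformer layer. With a sparse adjacency matrix, the masked entries come out as exactly 0.0, which is what restricting messages to graph edges means here.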

3 Dialogue QA

Dialogue QA differs from the previous two types of problems in that it involves several consecutive questions, and the current question needs information from the earlier ones. A typical dataset is CoQA. The answers are still found in the source.
[Figure 7-11-10]
In some datasets, the questioner has never seen the source at all and may therefore ask questions that cannot be answered; see QuAC.

When solving this kind of problem, the embedding of the current question usually also attends to the embeddings of the previous questions.
[Figure 7-11-11]
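Attending over the dialogue history can be sketched as follows. This is a minimal illustration under stated assumptions:

- Question embeddings are plain vectors, and matching uses dot products.
- The attended history vector is concatenated to the current question's embedding. The concatenation is an assumption, as is padding with zeros when there is no history yet.
- The function name is hypothetical.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend_history(current: Vector, history: List[Vector]) -> Vector:
    """Let the current question attend over earlier question embeddings
    and append the attended vector to the current embedding."""
    if not history:
        # First turn of the dialogue: nothing to attend to, pad with zeros.
        return current + [0.0] * len(current)
    weights = softmax([dot(current, h) for h in history])
    context = [sum(w * h[d] for w, h in zip(weights, history))
               for d in range(len(current))]
    return current + context
```

The resulting history-aware question vector would then take the place of the plain question embedding in any of the attention architectures from Section 1.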
The figure below shows how current models perform on SQuAD. The top row is human performance, and the rows below are the top 5 models. All 5 of these models outperform humans. But are the models really that strong?
[Figure 7-11-12]
They are not. The models generalize very poorly: a model trained on SQuAD performs very badly on bAbI.
[Figure 7-11-13]

Some researchers even found the following. When the question "What did Tesla spend Astor’s money on" is reduced to just "did", the model still answers correctly, and with higher confidence. This shows that the model has not really learned semantics. Because the training and validation sets follow very similar distributions, the model has merely learned that distribution.
[Figure 7-11-14]

When the training and validation sets were made to follow different distributions, the models failed miserably.
[Figure 7-11-15]
It seems that in QA research, humans still have a long way to go.