Chapter7-11_ Deep Learning for Question Answering (2/2)
2022-06-13 02:13:00 【zjuPeco】
This article contains my notes for teacher lihongyi's lecture 【Deep Learning for Question Answering (2/2)】. The course video is on YouTube, click here (access may require a VPN).
The pictures used below are all from Mr. lihongyi's PPT; if there is any infringement, they will be deleted.
Article index:
Previous part - 7-10 Deep Learning for Question Answering (1/2)
Next part - 7-12 Controllable Chatbot
1 Simple Question: Match & Extract
A Simple Question is one whose answer can be found in the given source through Match and Extract. The questions in SQuAD are simple questions. Take the first question in the figure below: the keywords "precipitation" and "fall" let us locate the answer in the first sentence of the source (Match), and then the answer "gravity" is extracted from it (Extract).
The model architecture for this kind of problem is shown in the figure below and is usually called query-to-context attention. Reading it from the bottom up: the source is first fed into the "module for source", which outputs an embedding for each token; in fact every token produces two embeddings, one acting as the key and the other as the value of the attention. Meanwhile, the question is fed into the "module for question" to get a single embedding, which acts as the query. The yellow query in the lower right is matched against the blue keys in the lower left, and after normalization each green value receives a weight. The weighted sum gives the dark blue feature vector, which is fed into the "module for answer" to produce the final result. If the answer is a single token, this is a classification problem; if the answer is a span, the model predicts the positions of the start and end tokens.
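To make that data flow concrete, here is a minimal PyTorch sketch of query-to-context attention. The module names, dimensions, and the simple span classifier are illustrative assumptions, not the exact design of any particular paper mentioned in the lecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class QueryToContextQA(nn.Module):
    def __init__(self, vocab_size=30000, d_model=256):
        super().__init__()
        self.src_embed = nn.Embedding(vocab_size, d_model)   # "module for source" (simplified)
        self.q_embed = nn.Embedding(vocab_size, d_model)     # "module for question" (simplified)
        self.to_key = nn.Linear(d_model, d_model)    # each source token -> key
        self.to_value = nn.Linear(d_model, d_model)  # each source token -> value
        self.span_head = nn.Linear(2 * d_model, 2)   # start/end logits per source token

    def forward(self, source_ids, question_ids):
        src = self.src_embed(source_ids)              # (B, S, d)
        query = self.q_embed(question_ids).mean(1)    # (B, d): one query vector for the question
        keys, values = self.to_key(src), self.to_value(src)

        # match the query against every key and normalize into weights
        weights = F.softmax(torch.einsum("bd,bsd->bs", query, keys), dim=-1)
        context = torch.einsum("bs,bsd->bd", weights, values)   # weighted sum (the dark blue vector)

        # "module for answer": here a simple span classifier over the source tokens
        fused = torch.cat([src, context.unsqueeze(1).expand_as(src)], dim=-1)
        start_logits, end_logits = self.span_head(fused).unbind(-1)
        return start_logits, end_logits
```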
The most classic model with this architecture is the 2015 End-To-End Memory Networks.
The above architecture can be modified. What if the "module for question" outputs multiple embeddings, i.e. there are multiple queries? We can match every query against every key, so that each value receives multiple weights, and we keep the largest one.
A typical example of this approach is the 2016 paper Ask, Attend and Answer: Exploring Question-Guided Spatial Attention for Visual Question Answering.
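Below is a minimal sketch of this multi-query variant, under the assumption that the largest score per source token is kept before normalization; it only illustrates the idea and does not reproduce the cited paper.

```python
import torch
import torch.nn.functional as F

def multi_query_attention(queries, keys, values):
    """queries: (B, Q, d)   keys/values: (B, S, d)"""
    scores = torch.einsum("bqd,bsd->bqs", queries, keys)   # every query matched with every key
    best_per_token, _ = scores.max(dim=1)                  # (B, S): keep the largest score per value
    weights = F.softmax(best_per_token, dim=-1)
    return torch.einsum("bs,bsd->bd", weights, values)     # one attended vector
```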
There is also an architecture called context-to-query attention. In this attention, the embedding of each source token acts as the query and the embeddings of the question tokens act as the keys, and attention is computed between the two. The resulting vector is then combined with the query that produced it, giving the green vector in the figure below; each blue token gets one green vector, and all the green vectors are fed into the "module for answer" to get the answer.
In practice, the architecture shown in the figure above may also apply self-attention over the context; a classic example is Gated Self-Matching Networks for Reading Comprehension and Question Answering.
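As a rough illustration, here is a minimal sketch of context-to-query attention, assuming the attended question vector is simply concatenated onto each source token; the exact fusion step varies between papers.

```python
import torch
import torch.nn.functional as F

def context_to_query(source_emb, question_emb):
    """source_emb: (B, S, d)   question_emb: (B, Q, d)   ->   (B, S, 2d)"""
    scores = torch.einsum("bsd,bqd->bsq", source_emb, question_emb)
    weights = F.softmax(scores, dim=-1)                        # per source token, over question tokens
    attended = torch.einsum("bsq,bqd->bsd", weights, question_emb)
    return torch.cat([source_emb, attended], dim=-1)           # the "green vectors" fed to the answer module
```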
So which of query-to-context and context-to-query is better? Since it is hard to say, we might as well use both. Bidirectional Attention Flow for Machine Comprehension, Dynamic Coattention Networks For Question Answering, and QANet: Combining Local Convolution with Global Self-Attention for Reading Comprehension all use both.
Seen from today's perspective, though, BERT already covers everything mentioned above; this is one reason BERT is so strong. A single BERT is enough.
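As a concrete illustration of how little glue code an extractive QA model needs today, here is a minimal sketch using the Hugging Face transformers pipeline; the checkpoint name is just a commonly used example and is not from the lecture.

```python
from transformers import pipeline

qa = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")
result = qa(
    question="What causes precipitation to fall?",
    context="In meteorology, precipitation is any product of the condensation of "
            "atmospheric water vapor that falls under gravity.",
)
print(result["answer"], result["score"])   # expected span: "gravity"
```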
2 Complex Question: Reasoning
Complex questions still have no satisfactory solution today, so this section only gives a general introduction to what a complex question is and to the existing approaches.
The difference between complex and simple questions is that the answer to a simple question can be found directly in one passage of the source, while a complex question requires jumping repeatedly between multiple sources and combining several pieces of information to get the answer. Qangaroo is one such complex-question dataset. For example, the question in the figure below asks which country the "Hanging gardens of Mumbai" are in; from the first source we only learn that they are by the "Arabian Sea", but the Arabian Sea is not a country, so we have to follow the next source to learn that they are in India.
HotpotQA, DROP, and others are also complex-question datasets. Some of these questions even require the machine to add, subtract, multiply, and divide based on the source.
This way of jumping between sources can be illustrated as in the figure below. First, a sentence is matched according to the question; then the query is updated with the extracted content; then match, extract, and update again, until the needed answer is found. The number of hops is usually a preset hyperparameter.
As a network architecture, this can be represented as in the figure below. Each source token becomes two embeddings (blue and orange), the key and the value respectively. The query embedding is used to do attention, and the result is combined with the previous query to get a new query. This is repeated over and over, with the number of repetitions preset by hand.
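Here is a minimal sketch of that fixed-hop loop, assuming a GRU-style update that combines the extracted vector with the previous query; both the update rule and the hop count are illustrative assumptions, not the design of any specific paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiHopReader(nn.Module):
    def __init__(self, d_model=256, num_hops=3):
        super().__init__()
        self.num_hops = num_hops                    # preset hyperparameter
        self.update = nn.GRUCell(d_model, d_model)  # fuse what was extracted with the old query

    def forward(self, query, keys, values):
        """query: (B, d)   keys/values: (B, S, d)"""
        for _ in range(self.num_hops):
            weights = F.softmax(torch.einsum("bd,bsd->bs", query, keys), dim=-1)
            extracted = torch.einsum("bs,bsd->bd", weights, values)
            query = self.update(extracted, query)   # new query for the next hop
        return query                                # fed into the answer module
```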
Of course, there are also models that let the network learn how many hops to take; see ReasoNet.
This hopping style fits graph networks well, so people have also tried using graph networks to solve complex questions; see Dynamically Fused Graph Network for Multi-hop Reasoning. Others have found, however, that when BERT is also fine-tuned during training, the results are almost the same with or without the graph network, because the self-attention inside BERT is itself a dense graph network; see Is Graph Structure Necessary for Multi-hop Reasoning?.
3 Dialogue QA
Dialogue QA differs from the previous two types of problems in that there are multiple consecutive questions, and the current question needs information from the earlier ones; a typical dataset is CoQA. The answer must still appear in the source.
There is also the situation where the person asking has not seen the source at all; in this case unanswerable questions may be asked, see QuAC.
When solving this kind of problem, the embedding of the current question usually also needs to do attention with the embeddings of the previous questions.
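A minimal sketch of this idea, assuming the current question embedding attends over the embeddings of earlier questions and the result is fused by simple addition; the actual fusion differs from model to model.

```python
import torch
import torch.nn.functional as F

def fuse_dialogue_history(current_q, history_q):
    """current_q: (B, d)   history_q: (B, T, d) embeddings of the previous questions"""
    scores = torch.einsum("bd,btd->bt", current_q, history_q)
    weights = F.softmax(scores, dim=-1)
    history_context = torch.einsum("bt,btd->bd", weights, history_q)
    return current_q + history_context   # history-aware question representation
```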
The performance of current models on SQuAD is shown in the figure below. The top row is human performance, and the rows below it are the top-5 models; you can see that all five surpass humans. But are the models really that powerful?
Not really: the generalization ability of these models is poor. A model trained on SQuAD performs very badly on bAbI.
Some have even found that when the question "What did Tesla spend Astor's money on" is reduced to just "did", the model can still answer correctly, and with higher confidence. This shows the model has not really learned semantics; the distributions of the training and validation sets are simply very similar, and the model has learned that distribution.
When someone changed the distributions of the training and validation sets, the models failed miserably.
It seems that in QA research, there is still a long way to go.