Chapter7-11_ Deep Learning for Question Answering (2/2)
2022-06-13 02:13:00 【zjuPeco】
These are my course notes for Professor Hung-yi Lee's (Li Hongyi) lecture 【Deep Learning for Question Answering (2/2)】. The course video is on YouTube (access may require a VPN).
The pictures used below are all taken from Professor Lee's slides. If there is any infringement, they will be removed.
Article index:
Previous part - 7-10 Deep Learning for Question Answering (1/2)
Next part - 7-12 Controllable Chatbot
1 Simple Question: Match & Extract
A simple question is one whose answer can be found in a given source through Match and Extract. The questions in SQuAD are simple questions. Take the first question in the figure below: the two keywords "precipitation" and "fall" locate the answer in the first sentence of the source (Match), and the answer "gravity" is then extracted from that sentence (Extract).
The model architecture for this kind of problem is shown in the figure below and is usually called "query-to-context attention". Reading it from the bottom up: the source is fed into the "module for source", which outputs embeddings for each token. Every token gets two embeddings, one playing the role of the key in attention and the other the role of the value. At the same time, the question is fed into the "module for question", which produces one embedding that plays the role of the query. The yellow query in the lower right corner is matched against the blue keys in the lower left corner, and after normalization each green value receives a weight. The weighted sum of the values gives the dark blue feature vector at the top, which is fed into the "module for answer" to produce the final result. If the answer is a single token, this is a classification problem. If the answer is a span of the source, the model predicts the positions of the start and end tokens.
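The match, normalize and weighted-sum steps described above can be sketched in a few lines. This is only a minimal illustration, assuming a dot product as the match function and softmax as the normalization. The function names (`softmax`, `dot`, `query_to_context`) and all numbers are hypothetical and not taken from the lecture.

```python
import math

def softmax(scores):
    """Normalize raw match scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    """Dot product, used here as the match score between query and key."""
    return sum(a * b for a, b in zip(u, v))

def query_to_context(query, keys, values):
    """Match the query against every key, then take the weighted sum of the values."""
    weights = softmax([dot(query, k) for k in keys])
    dim = len(values[0])
    summary = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return weights, summary

# Toy source with three tokens; all numbers are made up for illustration.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
query = [2.0, 2.0]

weights, summary = query_to_context(query, keys, values)
print(weights)  # the third token matches the query best
print(summary)  # equals the weights here because the values are one-hot
```

In a real model the `summary` vector would be passed on to the "module for answer". Here it is simply printed.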
The most classic model with this architecture is End-To-End Memory Networks (2015).
This architecture can be modified. What should we do when the "module for question" outputs many embeddings, that is, when there are many queries? We can match every query against every key, so that each value receives multiple weights, and then keep the largest one.
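A rough sketch of this variant follows. It assumes that each query is normalized with its own softmax before the maximum is taken. The notes do not say whether the maximum comes before or after normalization, so this ordering is an assumption, and the names `max_over_queries`, `queries` and `keys` are hypothetical.

```python
import math

def softmax(scores):
    """Normalize raw match scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    """Dot product used as the match score."""
    return sum(a * b for a, b in zip(u, v))

def max_over_queries(queries, keys):
    """For each key, keep the largest weight it receives from any query."""
    # One row of weights per query (assumption: normalize per query first).
    per_query = [softmax([dot(q, k) for k in keys]) for q in queries]
    return [max(row[i] for row in per_query) for i in range(len(keys))]

# Two question-token queries and three source-token keys (made-up numbers).
queries = [[1.0, 0.0], [0.0, 1.0]]
keys = [[3.0, 0.0], [0.0, 3.0], [0.0, 0.0]]

merged = max_over_queries(queries, keys)
print(merged)  # the first two keys are each picked up by one of the queries
```

Note that after taking the maximum the weights no longer sum to 1, so an implementation may choose to renormalize them before the weighted sum.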
A typical example of this approach is Ask, Attend and Answer: Exploring Question-Guided Spatial Attention for Visual Question Answering (2016).
There is another architecture, called "context-to-query attention". Here the embedding of every token in the source serves as a query and the embedding of every token in the question serves as a key, and attention is computed between the two. Each source token ends up with one attended vector, which is then combined with the query that produced it to form the green vector in the figure below. Every blue token gets its own green vector, and all the green vectors are fed into the "module for answer" to get the answer.
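The following sketch illustrates context-to-query attention under stated assumptions: the question-token embeddings serve as both keys and values, the match function is a dot product, and "combining" the attended vector with the source token is done by concatenation. The notes do not specify the combination, so concatenation is an assumption, and `context_to_query` is a hypothetical name.

```python
import math

def softmax(scores):
    """Normalize raw match scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    """Dot product used as the match score."""
    return sum(a * b for a, b in zip(u, v))

def context_to_query(source_tokens, question_tokens):
    """Let every source token attend over the question tokens."""
    dim = len(question_tokens[0])
    fused = []
    for tok in source_tokens:
        # The source token is the query; the question tokens are the keys.
        weights = softmax([dot(tok, q) for q in question_tokens])
        attended = [sum(w * q[d] for w, q in zip(weights, question_tokens))
                    for d in range(dim)]
        # Assumption: combine by concatenating token and attended vector.
        fused.append(tok + attended)
    return fused

# Made-up embeddings: three source tokens, two question tokens.
source_tokens = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
question_tokens = [[1.0, 0.0], [0.0, 1.0]]

fused = context_to_query(source_tokens, question_tokens)
for vec in fused:
    print(vec)  # one fused ("green") vector per source token
```

Each source token yields exactly one fused vector, so the output has the same length as the source, which is what the "module for answer" needs in order to predict a span.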
When the architecture in the figure above is used in practice, self-attention may also be applied over the context. A classic example is Gated Self-Matching Networks for Reading Comprehension and Question Answering.
So which is better, query-to-context or context-to-query? Since there is no clear answer, we might as well use both. Bidirectional Attention Flow for Machine Comprehension, Dynamic Coattention Networks For Question Answering, and QANet: Combining Local Convolution with Global Self-Attention for Reading Comprehension all use both.
From today's point of view, however, BERT already has everything mentioned above, which is one reason BERT is so strong. A single BERT is enough.
2 Complex Question: Reasoning
As of today, no really suitable solution has been found for complex questions, so this section only gives a general introduction to what a complex question is and what the existing approaches look like.
The difference between a complex question and a simple question is this: the answer to a simple question can be found directly in one passage of the source, while a complex question requires hopping repeatedly between several sources and combining multiple pieces of information to get the answer. QAngaroo is one dataset of complex questions. For example, the question in the figure below asks which country the "Hanging gardens of Mumbai" is in. The first source only tells us that it is by the "Arabian Sea", but the Arabian Sea is not a country, so we have to follow the later sources to learn that it is in India.
Other complex-question datasets include HotpotQA and DROP. Some of these questions even require the machine to add, subtract, multiply and divide based on the source.
This hopping between sources can be pictured as in the figure below. First the question is matched against a sentence, then the query is updated with the extracted content, and match, extract and update are repeated until the needed answer is found. The number of hops is usually a preset hyperparameter.
As a network architecture, this can be drawn as in the figure below. Each token in the source is turned into two embeddings (blue and orange), which are the key and the value respectively. The query embedding is used to do attention, and the result is combined with the previous query to get a new query. This is repeated again and again, with the number of repetitions preset by hand.
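The repeated match, extract and update loop can be sketched as below. This is a toy illustration only: it assumes a dot-product match, a softmax normalization, and an additive query update (new query = old query + attended vector), which requires keys, values and query to share one dimension. The notes only say the result is "combined" with the previous query, so the addition is an assumption, and `multi_hop` and the toy embeddings are hypothetical.

```python
import math

def softmax(scores):
    """Normalize raw match scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    """Dot product used as the match score."""
    return sum(a * b for a, b in zip(u, v))

def multi_hop(query, keys, values, hops):
    """Run `hops` rounds of attention, updating the query after each round."""
    trace = []  # index of the most attended token at each hop
    for _ in range(hops):
        weights = softmax([dot(query, k) for k in keys])
        summary = [sum(w * v[d] for w, v in zip(weights, values))
                   for d in range(len(query))]
        # Assumption: combine by adding the extracted content to the query.
        query = [q + s for q, s in zip(query, summary)]
        trace.append(weights.index(max(weights)))
    return query, trace

# Toy chain: token 0's value points toward token 1's key, and token 1's
# value points toward token 2's key (made-up numbers).
keys = [[4.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 4.0]]
values = [[0.0, 3.0, 0.0], [0.0, 0.0, 3.0], [0.0, 0.0, 0.0]]
query = [1.0, 0.0, 0.0]

final_query, trace = multi_hop(query, keys, values, hops=3)
print(trace)  # which token each hop focused on
```

The `hops` argument corresponds to the preset number of repetitions. A model that learns when to stop would replace the fixed loop with a learned stopping decision.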
There are also models that learn by themselves how many hops to make; see ReasoNet.
This hopping procedure is a natural fit for graph networks, so some people have tried to solve complex questions with graph networks; see Dynamically Fused Graph Network for Multi-hop Reasoning. Others have found, however, that when BERT is also fine-tuned during training, the results are almost the same with or without the graph network, because the self-attention inside BERT is itself a densely connected graph network; see Is Graph Structure Necessary for Multi-hop Reasoning?.
3 Dialogue QA
Dialogue QA differs from the previous two types of problems in that there are multiple consecutive questions, and the current question may need information from the earlier ones. A typical dataset is CoQA, where the answer must be found in the source.
There is also the setting where the person asking the questions has not seen the source at all. In that case some questions may be unanswerable; see QuAC.
When solving this kind of problem, the embedding of the current question usually also needs to attend to the embeddings of the previous questions.
The figure below shows how current models perform on SQuAD. The top row is human performance and the rows below it are the top 5 models. All of the top 5 models surpass humans. But are the models really that strong?
Not really. The models generalize very poorly: a model trained on SQuAD performs very badly on bAbI.
Some have even found that when the question "What did Tesla spend Astor’s money on" is reduced to just "did", the model still answers correctly, and with higher confidence. This shows that the model has not really learned the semantics. The training and validation sets simply have very similar distributions, and the model has learned that distribution.
When someone changed the distribution between the training set and the validation set, the models failed miserably.
It seems that QA research still has a long way to go.