
Introduction to the SAKT method

2022-07-07 14:10:00 Try more, record more, accumulate more

Network architecture and embedding interpretation:

[Figure: SAKT network architecture]

The SAKT network: at each timestamp, attention weights are estimated only over the preceding elements. The keys, values, and queries are extracted from the embedding layer shown below. When the $j$-th element is the query and the $i$-th element is the key, the attention weight is $a_{ij}$.

[Figure: extracting queries, keys, and values from the embedding layer]

Embedding layer: embeds the current exercise the student is attempting together with their past interactions. At each step $t+1$, the exercise embedding projects the current exercise $e_{t+1}$ into the query space, while the interaction embedding projects the past interactions $x_t$ into the key and value spaces.
The method in detail:

Model objective: given the student's answers to exercises from time 1 to $t$ (the interaction sequence $X = x_1, x_2, \ldots, x_t$), predict the response to exercise $e_{t+1}$ at time $t+1$, i.e., the probability of answering it correctly.

Interaction tuple: $x_t = (e_t, r_t)$ consists of the exercise $e_t$ presented at time $t$ and the student's response $r_t$. To number $x_t$ with a single index, both parts are combined: $y_t = e_t + r_t \times E$, where $E$ is the total number of exercises. So for an incorrect answer the interaction number equals the exercise number, $y_t = e_t$, and for a correct answer the exercise number is shifted by the total number of exercises, $y_t = e_t + E$.
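As a concrete illustration, here is a minimal Python sketch of this numbering scheme; the exercise IDs and the exercise count `E` are made-up example values, not taken from the original post.

```python
E = 100  # total number of exercises (illustrative value)

def encode_interaction(e_t: int, r_t: int) -> int:
    """Map an (exercise, response) pair to a single interaction ID.

    r_t is 0 for an incorrect answer and 1 for a correct one, so the
    resulting IDs span [0, 2E): wrong answers keep the exercise number,
    correct answers are shifted by E.
    """
    return e_t + r_t * E

print(encode_interaction(7, 0))  # 7   (answered incorrectly: y_t = e_t)
print(encode_interaction(7, 1))  # 107 (answered correctly:  y_t = e_t + E)
```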

Embedding layer description:
The interaction sequence must be split so that all interaction sequences have the same length: longer sequences are truncated and shorter ones are padded.
The interaction sequence $y = (y_1, y_2, \ldots, y_t)$ is therefore turned into $s = (s_1, s_2, \ldots, s_n)$.
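A minimal sketch of this fixed-length preparation, assuming left padding with a reserved `PAD_ID` and truncation that keeps the most recent interactions (both conventions are assumptions, not specified above):

```python
PAD_ID = 0  # assumes interaction IDs are shifted to start at 1

def to_fixed_length(y: list, n: int) -> list:
    """Truncate or left-pad an interaction sequence to length n."""
    if len(y) >= n:
        return y[-n:]                        # keep the latest n interactions
    return [PAD_ID] * (n - len(y)) + y       # pad short sequences on the left

print(to_fixed_length([3, 15, 8], 5))           # [0, 0, 3, 15, 8]
print(to_fixed_length([1, 2, 3, 4, 5, 6], 5))   # [2, 3, 4, 5, 6]
```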
An interaction embedding matrix $M \in \mathbb{R}^{2E \times d}$ is trained, where $d$ is the latent dimension, and is used to obtain interaction embeddings. The embedding of $s_i$ is $M_{s_i}$.
An exercise embedding matrix $E \in \mathbb{R}^{E \times d}$ is used to obtain exercise embeddings. The embedding of $e_i$ is $E_{e_i}$.
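In PyTorch, the two embedding tables could look like the following sketch; all dimensions are illustrative, and `nn.Embedding` stands in for the trained matrices $M$ and $E$:

```python
import torch
import torch.nn as nn

E_num, d, n = 100, 64, 50  # number of exercises, latent dim, sequence length

interaction_emb = nn.Embedding(2 * E_num, d)  # M, one row per interaction ID
exercise_emb = nn.Embedding(E_num, d)         # exercise matrix, one row per exercise

s = torch.randint(0, 2 * E_num, (1, n))   # fixed-length interaction IDs s_i
e = torch.randint(0, E_num, (1, n))       # exercise IDs e_i

M_s = interaction_emb(s)   # key/value source, shape (1, n, d)
E_e = exercise_emb(e)      # query source, shape (1, n, d)
```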

Positional encoding:
To encode the order of the sequence, a learned parameter $P \in \mathbb{R}^{n \times d}$ is introduced and added to the interaction embeddings: $P_i$ is added to the $i$-th interaction embedding vector, producing a position-aware interaction embedding.
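A sketch of this addition (shapes are illustrative; `nn.Parameter` stands in for the learned matrix $P$):

```python
import torch
import torch.nn as nn

n, d = 50, 64

P = nn.Parameter(torch.zeros(n, d))  # learned positional embedding, one row per position
M_s = torch.randn(1, n, d)           # interaction embeddings (as in the sketch above)
x_hat = M_s + P                      # P_i is added to the i-th interaction embedding
```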

Self-attention layer:

Q: exercise embeddings (queries)
K: interaction embeddings (keys)
V: interaction embeddings (values)
Scaled dot-product attention is used:

$\text{Attention}(Q, K, V) = \text{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d}}\right) V$

The current exercise is related to every previously answered interaction, and attention weights are computed from these relations.
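A self-contained PyTorch sketch of this attention step (tensor shapes are illustrative; the optional `mask` argument is used in the causality subsection below):

```python
import math
import torch

def scaled_dot_product_attention(Q, K, V, mask=None):
    """Compute softmax(Q·Kᵀ / sqrt(d))·V; `weights` holds the a_ij values."""
    d = Q.size(-1)
    scores = Q @ K.transpose(-2, -1) / math.sqrt(d)   # shape (batch, n, n)
    if mask is not None:
        scores = scores.masked_fill(mask, float("-inf"))
    weights = torch.softmax(scores, dim=-1)           # attention weights a_ij
    return weights @ V

Q = torch.randn(1, 50, 64)  # exercise embeddings (queries)
K = torch.randn(1, 50, 64)  # interaction embeddings (keys)
V = torch.randn(1, 50, 64)  # interaction embeddings (values)
out = scaled_dot_product_attention(Q, K, V)  # shape (1, 50, 64)
```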

Multi-head attention
Captures information from different representation subspaces.
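A sketch using PyTorch's built-in `nn.MultiheadAttention`, chosen here for brevity (an assumption for illustration; the post does not show the original implementation):

```python
import torch
import torch.nn as nn

d, heads, n = 64, 4, 50
mha = nn.MultiheadAttention(embed_dim=d, num_heads=heads, batch_first=True)

q = torch.randn(1, n, d)    # exercise embeddings (queries)
kv = torch.randn(1, n, d)   # interaction embeddings (keys and values)
out, attn = mha(q, kv, kv)  # each head attends within its own d/heads subspace
```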

Causality
Because the data form a sequence, the model must not have access to information from interactions that occur after the exercise being predicted, so a causality layer masks out the attention weights learned from future interactions.
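A sketch of such a causal mask; passed as `mask` to the attention function above, it zeroes out the weights on future positions:

```python
import torch

n = 5
# True above the diagonal: query position j must not attend to positions i > j.
causal_mask = torch.triu(torch.ones(n, n, dtype=torch.bool), diagonal=1)
# In the attention sketch above, these entries are set to -inf before the
# softmax, so their attention weights become exactly zero.
print(causal_mask)
```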

Feedforward layer
To add nonlinearity to the model and account for interactions between different latent dimensions, a feed-forward network is used:

$F = \mathrm{ReLU}(S W^{(1)} + b^{(1)}) W^{(2)} + b^{(2)}$

where $S$ is the output of the self-attention layer and $W^{(1)}$, $W^{(2)}$, $b^{(1)}$, $b^{(2)}$ are learned parameters.
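A position-wise sketch of this network in PyTorch (the inner width is assumed equal to $d$ for simplicity); the last line also shows the residual connection discussed next:

```python
import torch
import torch.nn as nn

n, d = 50, 64

# F = ReLU(S·W1 + b1)·W2 + b2, applied independently at every position.
ffn = nn.Sequential(
    nn.Linear(d, d),
    nn.ReLU(),
    nn.Linear(d, d),
)

S = torch.randn(1, n, d)  # self-attention output
out = S + ffn(S)          # residual connection (see the next subsection)
```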

Residual connection
Adds the sublayer's input back to its output (the `S + ffn(S)` line in the sketch above), letting low-level information such as the raw embeddings reach the higher layers directly.

Prediction layer
Produces the predicted probability of a correct response:

$p_t = \sigma(F_t w + b)$

where $F_t$ is the feed-forward output at position $t$, $w$ and $b$ are learned parameters, and $\sigma$ is the sigmoid function.
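As a sketch, this is a single linear unit followed by a sigmoid:

```python
import torch
import torch.nn as nn

n, d = 50, 64

predict = nn.Linear(d, 1)  # weight vector w and bias b
F = torch.randn(1, n, d)   # feed-forward output
p = torch.sigmoid(predict(F)).squeeze(-1)  # p_t: probability of a correct answer
```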

Network training
The network is trained by minimizing the cross-entropy loss between the predicted probabilities and the true responses.
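A minimal training-step sketch; random tensors stand in for real predictions and labels, and `BCEWithLogitsLoss` is used as a numerically stable binary cross-entropy that fuses the sigmoid with the loss:

```python
import torch
import torch.nn as nn

n = 50
criterion = nn.BCEWithLogitsLoss()  # sigmoid + binary cross-entropy in one op

logits = torch.randn(1, n, requires_grad=True)  # raw prediction-layer scores
r = torch.randint(0, 2, (1, n)).float()         # true responses r_t in {0, 1}
loss = criterion(logits, r)
loss.backward()  # gradients for all trainable parameters
```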

Original article: https://yzsam.com/2022/188/202207071211572446.html

Copyright notice: this article was written by [Try more, record more, accumulate more]; please include the original link when reposting.