
Introduction to the SAKT method

2022-07-07 14:10:00 Try more, record more, accumulate more

Network architecture and embedding interpretation:

[Figure: SAKT network architecture]

The SAKT network: at each timestamp, attention weights are estimated only over the preceding elements. The keys, values, and queries are extracted from the embedding layer shown below. When the $j$-th element is the query and the $i$-th element is the key, the attention weight is $a_{ij}$.

[Figure: extracting queries, keys, and values from the embedding layer]

Embedding layer: embeds the current exercise the student is attempting together with their past interactions. At each step $t+1$, the exercise embedding projects the current exercise $e_{t+1}$ into the query space, while the interaction embedding projects the past interactions $x_t$ into the key and value spaces.
The method in detail:

Model objective: given the student's answers to exercises from time 1 to $t$ (the interaction sequence $X = x_1, x_2, \ldots, x_t$), predict the response to exercise $e_{t+1}$ at time $t+1$, i.e., the probability of answering it correctly.

Interaction tuple: $x_t = (e_t, r_t)$ consists of the exercise $e_t$ presented at time $t$ and the student's response $r_t$. To number $x_t$ with a single index, both parts are combined: $y_t = e_t + r_t \times E$, where $E$ is the total number of exercises. So for an incorrect answer the interaction number equals the exercise number, $y_t = e_t$, and for a correct answer the exercise number is shifted by the total number of exercises, $y_t = e_t + E$.
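As a concrete illustration, here is a minimal Python sketch of this numbering scheme; the exercise IDs and the exercise count `E` are made-up example values, not taken from the original post.

```python
E = 100  # total number of exercises (illustrative value)

def encode_interaction(e_t: int, r_t: int) -> int:
    """Map an (exercise, response) pair to a single interaction ID.

    r_t is 0 for an incorrect answer and 1 for a correct one, so the
    resulting IDs span [0, 2E): wrong answers keep the exercise number,
    correct answers are shifted by E.
    """
    return e_t + r_t * E

print(encode_interaction(7, 0))  # 7   (answered incorrectly: y_t = e_t)
print(encode_interaction(7, 1))  # 107 (answered correctly:  y_t = e_t + E)
```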

Embedding layer description:
The interaction sequence must be split so that all interaction sequences have the same length: longer sequences are truncated and shorter ones are padded.
The interaction sequence $y = (y_1, y_2, \ldots, y_t)$ is therefore turned into $s = (s_1, s_2, \ldots, s_n)$.
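A minimal sketch of this fixed-length preparation, assuming left padding with a reserved `PAD_ID` and truncation that keeps the most recent interactions (both conventions are assumptions, not specified above):

```python
PAD_ID = 0  # assumes interaction IDs are shifted to start at 1

def to_fixed_length(y: list, n: int) -> list:
    """Truncate or left-pad an interaction sequence to length n."""
    if len(y) >= n:
        return y[-n:]                        # keep the latest n interactions
    return [PAD_ID] * (n - len(y)) + y       # pad short sequences on the left

print(to_fixed_length([3, 15, 8], 5))           # [0, 0, 3, 15, 8]
print(to_fixed_length([1, 2, 3, 4, 5, 6], 5))   # [2, 3, 4, 5, 6]
```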
An interaction embedding matrix $M \in \mathbb{R}^{2E \times d}$ is trained, where $d$ is the latent dimension, and is used to obtain interaction embeddings. The embedding of $s_i$ is $M_{s_i}$.
An exercise embedding matrix $E \in \mathbb{R}^{E \times d}$ is used to obtain exercise embeddings. The embedding of $e_i$ is $E_{e_i}$.
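In PyTorch, the two embedding tables could look like the following sketch; all dimensions are illustrative, and `nn.Embedding` stands in for the trained matrices $M$ and $E$:

```python
import torch
import torch.nn as nn

E_num, d, n = 100, 64, 50  # number of exercises, latent dim, sequence length

interaction_emb = nn.Embedding(2 * E_num, d)  # M, one row per interaction ID
exercise_emb = nn.Embedding(E_num, d)         # exercise matrix, one row per exercise

s = torch.randint(0, 2 * E_num, (1, n))   # fixed-length interaction IDs s_i
e = torch.randint(0, E_num, (1, n))       # exercise IDs e_i

M_s = interaction_emb(s)   # key/value source, shape (1, n, d)
E_e = exercise_emb(e)      # query source, shape (1, n, d)
```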

Positional encoding:
To encode the order of the sequence, a learned parameter $P \in \mathbb{R}^{n \times d}$ is introduced and added to the interaction embeddings: $P_i$ is added to the $i$-th interaction embedding vector, producing a position-aware interaction embedding.
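A sketch of this addition (shapes are illustrative; `nn.Parameter` stands in for the learned matrix $P$):

```python
import torch
import torch.nn as nn

n, d = 50, 64

P = nn.Parameter(torch.zeros(n, d))  # learned positional embedding, one row per position
M_s = torch.randn(1, n, d)           # interaction embeddings (as in the sketch above)
x_hat = M_s + P                      # P_i is added to the i-th interaction embedding
```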

Self-attention layer:

Q: exercise embeddings (queries)
K: interaction embeddings (keys)
V: interaction embeddings (values)
Scaled dot-product attention is used:

$\text{Attention}(Q, K, V) = \text{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d}}\right) V$

The current exercise is related to every previously answered interaction, and attention weights are computed from these relations.
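A self-contained PyTorch sketch of this attention step (tensor shapes are illustrative; the optional `mask` argument is used in the causality subsection below):

```python
import math
import torch

def scaled_dot_product_attention(Q, K, V, mask=None):
    """Compute softmax(Q·Kᵀ / sqrt(d))·V; `weights` holds the a_ij values."""
    d = Q.size(-1)
    scores = Q @ K.transpose(-2, -1) / math.sqrt(d)   # shape (batch, n, n)
    if mask is not None:
        scores = scores.masked_fill(mask, float("-inf"))
    weights = torch.softmax(scores, dim=-1)           # attention weights a_ij
    return weights @ V

Q = torch.randn(1, 50, 64)  # exercise embeddings (queries)
K = torch.randn(1, 50, 64)  # interaction embeddings (keys)
V = torch.randn(1, 50, 64)  # interaction embeddings (values)
out = scaled_dot_product_attention(Q, K, V)  # shape (1, 50, 64)
```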

Multi-head attention
Captures information from different representation subspaces.
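A sketch using PyTorch's built-in `nn.MultiheadAttention`, chosen here for brevity (an assumption for illustration; the post does not show the original implementation):

```python
import torch
import torch.nn as nn

d, heads, n = 64, 4, 50
mha = nn.MultiheadAttention(embed_dim=d, num_heads=heads, batch_first=True)

q = torch.randn(1, n, d)    # exercise embeddings (queries)
kv = torch.randn(1, n, d)   # interaction embeddings (keys and values)
out, attn = mha(q, kv, kv)  # each head attends within its own d/heads subspace
```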

Causality
Because the data form a sequence, the model must not have access to information from interactions that occur after the exercise being predicted, so a causality layer masks out the attention weights learned from future interactions.
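A sketch of such a causal mask; passed as `mask` to the attention function above, it zeroes out the weights on future positions:

```python
import torch

n = 5
# True above the diagonal: query position j must not attend to positions i > j.
causal_mask = torch.triu(torch.ones(n, n, dtype=torch.bool), diagonal=1)
# In the attention sketch above, these entries are set to -inf before the
# softmax, so their attention weights become exactly zero.
print(causal_mask)
```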

Feedforward layer
To add nonlinearity to the model and account for interactions between different latent dimensions, a feed-forward network is used:

$F = \mathrm{ReLU}(S W^{(1)} + b^{(1)}) W^{(2)} + b^{(2)}$

where $S$ is the output of the self-attention layer and $W^{(1)}$, $W^{(2)}$, $b^{(1)}$, $b^{(2)}$ are learned parameters.
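A position-wise sketch of this network in PyTorch (the inner width is assumed equal to $d$ for simplicity); the last line also shows the residual connection discussed next:

```python
import torch
import torch.nn as nn

n, d = 50, 64

# F = ReLU(S·W1 + b1)·W2 + b2, applied independently at every position.
ffn = nn.Sequential(
    nn.Linear(d, d),
    nn.ReLU(),
    nn.Linear(d, d),
)

S = torch.randn(1, n, d)  # self-attention output
out = S + ffn(S)          # residual connection (see the next subsection)
```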

Residual connection
Adds the sublayer's input back to its output (the `S + ffn(S)` line in the sketch above), letting low-level information such as the raw embeddings reach the higher layers directly.

Prediction layer
Produces the predicted probability of a correct response:

$p_t = \sigma(F_t w + b)$

where $F_t$ is the feed-forward output at position $t$, $w$ and $b$ are learned parameters, and $\sigma$ is the sigmoid function.
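As a sketch, this is a single linear unit followed by a sigmoid:

```python
import torch
import torch.nn as nn

n, d = 50, 64

predict = nn.Linear(d, 1)  # weight vector w and bias b
F = torch.randn(1, n, d)   # feed-forward output
p = torch.sigmoid(predict(F)).squeeze(-1)  # p_t: probability of a correct answer
```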

Network training
The network is trained by minimizing the cross-entropy loss between the predicted probabilities and the true responses.
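A minimal training-step sketch; random tensors stand in for real predictions and labels, and `BCEWithLogitsLoss` is used as a numerically stable binary cross-entropy that fuses the sigmoid with the loss:

```python
import torch
import torch.nn as nn

n = 50
criterion = nn.BCEWithLogitsLoss()  # sigmoid + binary cross-entropy in one op

logits = torch.randn(1, n, requires_grad=True)  # raw prediction-layer scores
r = torch.randint(0, 2, (1, n)).float()         # true responses r_t in {0, 1}
loss = criterion(logits, r)
loss.backward()  # gradients for all trainable parameters
```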

Original article: https://yzsam.com/2022/188/202207071211572446.html

Copyright notice: this article was written by [Try more, record more, accumulate more]; please include the original link when reposting.