[AI4Code] Pythia: AI-assisted Code Completion System (KDD 2019)

2022-07-25 13:08:00 【chad_ lee】

Code completion

Code completion suggests attributes and methods by recommending items from a given candidate set. The simplest approach is to list the candidates in alphabetical order. The drawback is that scrolling through the drop-down menu can take the user longer than typing the code directly, although users can type more of the prefix to narrow the list.
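A minimal sketch of this alphabetical baseline, assuming a hypothetical candidate set (the public methods of a Python `list`) and a hypothetical `complete` helper: filter by the typed prefix, then sort A to Z. It shows why the menu can be long when little has been typed.

```python
def complete(prefix, candidates):
    """Baseline completion: keep candidates matching the typed prefix, sorted alphabetically."""
    return sorted(name for name in candidates if name.startswith(prefix))

# Hypothetical candidate set: the public methods of a Python list.
LIST_METHODS = ["append", "clear", "copy", "count", "extend", "index",
                "insert", "pop", "remove", "reverse", "sort"]

print(complete("", LIST_METHODS))   # nothing typed yet: the whole menu, A to Z
print(complete("c", LIST_METHODS))  # ['clear', 'copy', 'count']
```

Note that the ordering ignores context entirely; a model-based approach keeps the same candidate set but ranks it by predicted likelihood.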

Model-based code completion

Based on the abstract syntax tree (AST): Pythia, etc.
Based on code text: Deep TabNine, Galois, etc.

Data: AST and code text

An AST is an abstract representation of the syntactic structure of source code. It represents the syntax of a programming language as a tree, where each node corresponds to a construct in the source code. The syntax is called "abstract" because it does not represent every detail of the concrete syntax. For example, nested parentheses are implicit in the structure of the tree rather than appearing as nodes, and a conditional statement such as if-condition-then can be represented by a single node with three branches.
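Both properties can be checked with Python's standard `ast` module, used here only as a convenient illustration (Pythia targets Python, but this is not its parser pipeline). Redundant parentheses leave no trace in the tree, and an if/else becomes one `ast.If` node whose three children are the condition, the then-branch, and the else-branch.

```python
import ast

# Redundant parentheses leave no trace: both sources produce the same tree.
same = ast.dump(ast.parse("y = (((x)))")) == ast.dump(ast.parse("y = x"))
print(same)  # True

# An if/else statement becomes a single ast.If node with three branches:
# test (the condition), body (then-branch) and orelse (else-branch).
tree = ast.parse("if a > 0:\n    b = 1\nelse:\n    b = 2\n")
if_node = tree.body[0]
print(type(if_node).__name__, if_node._fields)  # If ('test', 'body', 'orelse')
```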

One option is to parse the code into an abstract syntax tree (AST). Each node has two attributes, type and value, so each node needs two embeddings. A depth-first traversal then flattens the nodes of the AST into a sequence.
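A minimal sketch of the flattening step, using Python's `ast` module. Which field counts as a node's "value", the `"EMPTY"` placeholder, and skipping `Load`/`Store` context markers are assumptions made for illustration, not Pythia's exact normalization. Each resulting `(type, value)` pair would then be looked up in two embedding tables, one for types and one for values.

```python
import ast

def node_value(node):
    """Hypothetical 'value' of an AST node; "EMPTY" when the node carries none."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Attribute):
        return node.attr
    if isinstance(node, ast.Constant):
        return repr(node.value)
    if isinstance(node, (ast.FunctionDef, ast.ClassDef)):
        return node.name
    return "EMPTY"

def flatten(node):
    """Pre-order depth-first traversal producing a sequence of (type, value) pairs."""
    seq = [(type(node).__name__, node_value(node))]
    for child in ast.iter_child_nodes(node):
        # Skip Load/Store/Del context markers, which add no information here.
        if isinstance(child, ast.expr_context):
            continue
        seq.extend(flatten(child))
    return seq

print(flatten(ast.parse("os.path.join")))
# [('Module', 'EMPTY'), ('Expr', 'EMPTY'), ('Attribute', 'join'), ('Attribute', 'path'), ('Name', 'os')]
```

Because the traversal is pre-order, the outermost attribute (`join`) appears before the receiver it is accessed on (`os`).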

The other option is to treat the code directly as text, keeping spaces, newlines, indentation, and so on.
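To contrast with the AST view, here is a sketch of a text-level tokenizer that keeps whitespace as tokens, so the original code can be reconstructed exactly. The regex and the `text_tokens` name are hypothetical; GPT-style models actually use learned subword (BPE) vocabularies, and this only illustrates that formatting survives in the text representation.

```python
import re

# Hypothetical tokenizer: a newline, a run of other whitespace,
# an identifier/number, or a single punctuation character.
TOKEN_RE = re.compile(r"\n|[^\S\n]+|\w+|[^\w\s]")

def text_tokens(code):
    return TOKEN_RE.findall(code)

src = "def f(x):\n    return x.strip()\n"
toks = text_tokens(src)
print(toks)
# ['def', ' ', 'f', '(', 'x', ')', ':', '\n', '    ', 'return', ' ', 'x', '.', 'strip', '(', ')', '\n']
print("".join(toks) == src)  # True: indentation and newlines are preserved
```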

Pythia (KDD'19)

Pythia collected code from the top 2,700 Python projects on GitHub (ranked by stars), which together contain 16 million method calls used as training data.
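To make the training data concrete, here is a sketch of turning source code into method-call completion examples with Python's standard `tokenize` module: each example pairs the tokens up to a `.` with the name that follows it, keeping only names that are immediately called. The `completion_examples` helper and the choice of which tokens to drop are assumptions, not Pythia's published extraction pipeline.

```python
import io
import tokenize

# Layout tokens dropped from the context (an assumption for brevity).
SKIP = {tokenize.NL, tokenize.NEWLINE, tokenize.INDENT, tokenize.DEDENT,
        tokenize.COMMENT, tokenize.ENDMARKER}

def completion_examples(source):
    """Return (context_tokens, method_name) pairs for every `obj.method(` call."""
    toks = [t for t in tokenize.generate_tokens(io.StringIO(source).readline)
            if t.type not in SKIP]
    examples = []
    for i in range(1, len(toks) - 1):
        after_dot = toks[i - 1].type == tokenize.OP and toks[i - 1].string == "."
        is_call = toks[i + 1].string == "("
        if after_dot and toks[i].type == tokenize.NAME and is_call:
            context = [t.string for t in toks[:i]]  # ends with "."
            examples.append((context, toks[i].string))
    return examples

code = "import os\np = os.path.join('a', 'b')\nnames = p.split('/')\n"
examples = completion_examples(code)
print([label for _, label in examples])  # ['join', 'split']
```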

The task: given a code snippet $C$ of length $T$ with tokens $c_t$, ending in the special token "." (member access), predict the token $m^{*}$ that follows. In other words, the model must predict a single token from a representation of the preceding sequence, which makes an LSTM a natural fit:
$\begin{aligned} x_{t} &=L c_{t} \\ h_{t} &=f\left(x_{t}, h_{t-1}\right) \\ P(m \mid C) &=y_{t}=\operatorname{softmax}\left(W h_{t}+b\right) \\ m^{*} &=\operatorname{argmax}(P(m \mid C)) \end{aligned}$
In other words, the LSTM is followed by a classifier. Pythia also uses weight tying: the LSTM output passes through a linear layer, the result is multiplied (inner product) directly with the embedding of each candidate token, and a softmax is applied to the resulting scores.
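The equations above, with a tied output layer, can be sketched in pure Python. This is an untrained toy, not Pythia's implementation: the tiny vocabulary, the dimensions, the random initialization, and the gate layout (no gate biases) are all assumptions chosen for brevity. The tied part is the last step: the projected hidden state is scored against each row of the same embedding matrix `L` used for the input lookup $x_t = L c_t$, so no separate output matrix is needed.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def lstm_step(x, h, c, p):
    """One LSTM step, h_t = f(x_t, h_{t-1}); gates act on [x_t; h_{t-1}] (no biases)."""
    z = x + h
    i = [sigmoid(v) for v in matvec(p["Wi"], z)]    # input gate
    f = [sigmoid(v) for v in matvec(p["Wf"], z)]    # forget gate
    o = [sigmoid(v) for v in matvec(p["Wo"], z)]    # output gate
    g = [math.tanh(v) for v in matvec(p["Wg"], z)]  # candidate cell
    c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
    h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
    return h_new, c_new

def softmax(logits):
    m = max(logits)  # subtract max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict(token_ids, L, p, P, b):
    H = len(p["Wi"])
    h, c = [0.0] * H, [0.0] * H
    for t in token_ids:
        h, c = lstm_step(L[t], h, c, p)        # x_t = L c_t (embedding lookup)
    q = [v + bj for v, bj in zip(matvec(P, h), b)]  # linear layer to embedding size
    probs = softmax(matvec(L, q))              # tied: inner product with every embedding
    best = max(range(len(probs)), key=probs.__getitem__)  # m* = argmax
    return best, probs

rng = random.Random(0)
vocab = ["os", ".", "path", "join", "exists", "getcwd"]  # hypothetical vocabulary
V, D, H = len(vocab), 4, 8
L = rand_matrix(V, D, rng)
params = {k: rand_matrix(H, D + H, rng) for k in ("Wi", "Wf", "Wo", "Wg")}
P, b = rand_matrix(D, H, rng), [0.0] * D
best, probs = predict([vocab.index("os"), vocab.index(".")], L, params, P, b)
print(vocab[best], round(sum(probs), 6))
```

Since the weights are random, the predicted token is arbitrary; the point is the data flow from token ids to a probability over the same vocabulary that the input embeddings cover.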

Pythia has also been released as a plugin for VSCode.

Code completion is therefore the same kind of task as session-based recommendation, and the same methods apply; the difference is that the candidate set in code completion is smaller.

Deep TabNine and Galois

These methods are similar to Pythia, but they differ in both data format and model: the input is code text, and the model is GPT (a decoder-only Transformer, with one fewer attention sublayer per block because there is no encoder-decoder attention) instead of an LSTM.
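What makes a decoder-only model suitable for next-token prediction is the causal mask: each position may attend only to itself and earlier positions. A minimal single-head sketch in pure Python; using the inputs directly as queries, keys, and values (no learned projections, no multiple heads or layers) is a simplifying assumption, not how Deep TabNine or Galois are built.

```python
import math

def causal_self_attention(X):
    """Single-head scaled dot-product attention with a causal mask.

    X is a list of T vectors. For simplicity X itself serves as queries,
    keys and values (no learned projections, an assumption).
    """
    d = len(X[0])
    out = []
    for t in range(len(X)):
        # Causal mask: position t only scores positions 0..t, never the future.
        scores = [sum(q * k for q, k in zip(X[t], X[s])) / math.sqrt(d)
                  for s in range(t + 1)]
        m = max(scores)  # subtract max for numerical stability
        weights = [math.exp(v - m) for v in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Output: attention-weighted average of the visible value vectors.
        out.append([sum(weights[s] * X[s][k] for s in range(t + 1)) for k in range(d)])
    return out

X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out = causal_self_attention(X)
# Changing the last token cannot affect the outputs at earlier positions.
X_changed = X[:2] + [[5.0, -3.0]]
print(causal_self_attention(X_changed)[:2] == out[:2])  # True
```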

However, both are paid plugins, with no open-source code, technical details, or papers available.

Copyright notice
This article was written by [chad_ lee]. Please include a link to the original when reposting. Thanks.
https://yzsam.com/2022/206/202207251110593067.html
