[AI4Code] "Pythia: AI-assisted Code Completion System" (KDD 2019)
2022-07-25 13:08:00 【chad_ lee】
Code completion
Code completion suggests attributes and methods, which amounts to recommending items from a given candidate set. The simplest approach is to list the candidates in alphabetical order. The drawback is that scrolling through the drop-down menu can take the user longer than typing the code directly. Users can narrow the list by typing a longer prefix.
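The alphabetical baseline with prefix narrowing can be sketched in a few lines. This is only an illustration: the function name `complete` and the member list are made up for the example and do not come from any particular editor.

```python
def complete(members, prefix=""):
    """Return the members that start with `prefix`, in alphabetical order."""
    return sorted(m for m in members if m.startswith(prefix))


# Hypothetical candidate set: a few methods of a Python list.
members = ["sort", "append", "pop", "insert", "index", "extend"]

all_items = complete(members)        # full menu, alphabetical
narrowed = complete(members, "in")   # menu after the user types "in"
print(all_items)
print(narrowed)
```

The full menu puts `sort` last even if it is the most likely choice, which is the weakness a learned ranking is meant to address.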

Model-based code completion
- Based on the abstract syntax tree (AST): Pythia, etc.
- Based on code text: Deep TabNine, Galois, etc.
Data: AST and code text
An AST is an abstract representation of the syntactic structure of source code. It represents that structure as a tree, in which each node stands for a construct in the source code. The syntax is "abstract" because the tree does not represent every detail of the concrete syntax. For example, nested parentheses are implicit in the tree structure rather than appearing as nodes, and a conditional statement such as if-condition-then can be represented by a single node with three branches.

One approach is to parse the code into an abstract syntax tree (AST). Each node has two attributes, type and value, so each node needs two embeddings. A depth-first traversal then flattens the AST nodes into a sequence.
The other approach is to treat the code directly as text, including spaces, newlines, indentation, and so on.
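The AST route can be illustrated with Python's built-in `ast` module. This is a minimal sketch, not Pythia's actual preprocessing: what counts as a node's "value" (`node_value` below) and the choice to skip `Load`/`Store` context nodes are assumptions made for the example.

```python
import ast


def node_value(node):
    """Pick a 'value' for a node (an assumption for this sketch), or None."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Attribute):
        return node.attr
    if isinstance(node, ast.Constant):
        return repr(node.value)
    if isinstance(node, (ast.FunctionDef, ast.ClassDef)):
        return node.name
    return None


def flatten(node):
    """Depth-first (pre-order) traversal yielding (type, value) pairs."""
    yield type(node).__name__, node_value(node)
    for child in ast.iter_child_nodes(node):
        # Skip Load/Store/Del markers to keep the sequence short.
        if isinstance(child, ast.expr_context):
            continue
        yield from flatten(child)


source = "x = np.zeros(3)"
sequence = list(flatten(ast.parse(source)))
for node_type, value in sequence:
    print(node_type, value)
```

Each pair in `sequence` would be looked up in two embedding tables, one for the type and one for the value. Note that `ast.walk` is not used here because it traverses breadth-first.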
Pythia (KDD '19)
Pythia collected the code of the 2,700 most-starred Python projects on GitHub, which contains about 16 million method calls, as training data.

The task is: given a code snippet $C$ of length $T$, in which each token is $c_t$, followed by a special token ".", predict the token $m^{*}$. In other words, given a sequence, predict one token from a representation of that sequence, which suits an LSTM well. Here $L$ is the token embedding matrix and $f$ is the LSTM cell:

$$
\begin{aligned}
x_{t} &= L c_{t} \\
h_{t} &= f\left(x_{t}, h_{t-1}\right) \\
P(m \mid C) &= y_{t} = \operatorname{softmax}\left(W h_{t} + b\right) \\
m^{*} &= \operatorname{argmax}(P(m \mid C))
\end{aligned}
$$
That is, the output of the LSTM is followed by a classifier. Embedding tying is also used here: the LSTM output goes through a linear layer and is then scored by inner product against the embeddings of the tokens in the candidate set, and a softmax is applied to the inner products.
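The scoring step with tied embeddings can be sketched in plain Python. This is a toy illustration under stated assumptions, not the paper's implementation: the projection matrix `A`, the hidden state `h`, and the embedding values are all made up, and the LSTM that would produce `h` is omitted.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(h, A, embeddings, candidates):
    """Project h into embedding space, then score candidates by inner product."""
    projected = matvec(A, h)  # linear layer: hidden size -> embedding size
    scores = [
        sum(p * e for p, e in zip(projected, embeddings[name]))
        for name in candidates
    ]
    probs = softmax(scores)
    best = max(range(len(candidates)), key=probs.__getitem__)
    return candidates[best], dict(zip(candidates, probs))


# Toy values (hypothetical): 2-d token embeddings shared with the input side.
embeddings = {"append": [1.0, 0.0], "pop": [0.0, 1.0], "sort": [-1.0, 0.0]}
A = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # 2x3 projection
h = [2.0, 0.5, -1.0]                     # final LSTM hidden state

best, probs = predict(h, A, embeddings, ["append", "pop", "sort"])
print(best, probs)
```

Because the output side reuses the input embedding table, restricting the softmax to a candidate set only means selecting which rows of that table take part in the inner products.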
Pythia has been released as a VSCode plug-in.

Code completion is therefore essentially the same task as session-based recommendation, and the same methods apply. The difference is that the candidate set in code completion is smaller.
Deep TabNine and Galois
These methods are similar to Pythia but differ in data format and model. The input is raw code text, and the model is GPT instead of an LSTM. GPT uses only the Transformer decoder, and each block has one fewer attention sub-layer because there is no encoder-decoder attention.

However, both are paid plug-ins, and neither has published technical details or a paper.