AlphaCode is not a substitute for programmers, but a tool for developers

2022-07-02 10:15:00 【AI technology base camp】

Compiled by | Hemu wood

Produced by | AI Technology Base Camp (ID: rgznai100)

DeepMind, the AI research laboratory, has introduced a deep learning model that can generate software source code with remarkable results. The model, called AlphaCode, is based on Transformers, the same architecture that OpenAI uses in its code generation models.

Programming is one of the promising applications of deep learning and large language models. The growing demand for programming talent has spurred a race to create tools that make developers more productive and that let non-developers create software.

In this respect, AlphaCode is truly impressive. It has successfully solved complex programming challenges that usually require hours of planning, coding, and testing. It may well become a good tool for turning problem descriptions into working code.

But it is not the equivalent of a human programmer of any level. It is a completely different way of creating software, and one that is incomplete without human thinking and intuition.

Coding competition

Example of a coding challenge description. Image from DeepMind.

AlphaCode is not the only system of its kind, but it accomplishes a far more complex task. Other similar systems focus on generating short snippets of code, such as functions or code blocks that perform small tasks (for example, setting up a web server or extracting information from an API). Impressive as they are, these tasks become trivial once a language model has been exposed to a large enough corpus of source code.

AlphaCode, on the other hand, is designed to solve competitive programming problems. Participants in a coding challenge must read the challenge description, understand the problem, turn it into an algorithmic solution, implement it in a general-purpose language, and evaluate it against a limited set of test cases. Finally, their results are scored on hidden tests that are not available during implementation. Coding challenges can also impose other conditions, such as time and memory limits.

Basically, a machine learning model that takes part in a coding challenge must generate a complete program that solves a problem it has never seen before.

Example of a coding challenge solution. Image from DeepMind.

Transformers and the power of large language models

AlphaCode is another example of the progress large language models have made in solving complex problems. This kind of deep learning system is often called a sequence-to-sequence (Seq2seq) model. A Seq2seq algorithm takes a sequence of values (letters, pixels, numbers, etc.) as input and produces another sequence of values. This is the approach used in many natural language tasks, such as machine translation, text generation, and speech recognition.

According to DeepMind's paper, AlphaCode uses an encoder-decoder Transformer architecture. Transformers have become especially popular in recent years because they can handle large sequences of data with much less memory and compute than their predecessors, recurrent neural networks (RNN) and long short-term memory networks (LSTM).

The structure of the Transformer network.

The encoder part of AlphaCode creates a numerical representation of the natural language description of the problem. The decoder part takes the embedding vectors produced by the encoder and tries to generate the source code of the solution.

Transformer models have proven to be good at such tasks, especially when they are given enough training data and computing power. But in the view of researchers, what really stands out about AlphaCode is not just the raw power of feeding data into a very large neural network. It is more about the ingenuity of DeepMind's scientists in designing the training process and the algorithms that generate and filter candidate solutions.

Unsupervised and supervised learning

To create AlphaCode, DeepMind's scientists combined unsupervised pretraining with supervised fine-tuning. Often called self-supervised learning, this approach has become popular in applications where there is not enough labeled data or where data annotation is expensive and time-consuming.

In the pretraining phase, AlphaCode underwent unsupervised learning on 715GB of data extracted from GitHub. The model is trained by trying to predict the missing parts of a language or code snippet. The advantage of this approach is that it does not require any kind of annotation, and as it is exposed to more and more samples, the ML model becomes better at creating numerical representations of the structure of text and source code.
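
To make "predicting the missing parts" concrete, here is a minimal sketch of how a training example could be built from unlabeled code. It is an illustration only and is not taken from DeepMind's pipeline: the `<mask>` placeholder, the whitespace tokenization, the `mask_ratio` value, and the function name `make_masked_example` are all assumptions.

```python
import random

MASK = "<mask>"  # hypothetical placeholder token, not from the paper


def make_masked_example(tokens, mask_ratio=0.15, seed=0):
    """Hide a random subset of tokens.

    Returns the masked sequence and a {position: original token} dict
    that a model would be trained to predict. No human labels are needed:
    the targets come from the text itself.
    """
    if not tokens:
        return [], {}
    rng = random.Random(seed)
    n_masked = max(1, round(len(tokens) * mask_ratio))
    positions = sorted(rng.sample(range(len(tokens)), n_masked))
    masked = list(tokens)
    targets = {}
    for pos in positions:
        targets[pos] = masked[pos]
        masked[pos] = MASK
    return masked, targets


# Naive whitespace tokenization, for illustration only.
tokens = "def add ( a , b ) : return a + b".split()
masked, targets = make_masked_example(tokens, mask_ratio=0.25)
print(" ".join(masked))
print(targets)
```

Because the targets are recovered from the original text, any amount of raw source code can serve as training data without annotation, which is the point the paragraph above makes.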

The training and application of the AlphaCode algorithm. Image from DeepMind.

The pretrained model is then fine-tuned on CodeContests, an annotated dataset created by the DeepMind team. The dataset contains problem statements, test cases, and correct and incorrect submissions collected from various sources, including Codeforces, Description2Code, and IBM's CodeNet. The model is trained to turn the text description of a challenge into source code. Its results are evaluated with the test cases and compared against the correct submissions.

When creating the dataset, the researchers took special care to avoid any historical overlap between the training, validation, and test sets. This ensures that the ML model does not produce memorized results when it faces a coding challenge.
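
One way to avoid historical overlap is to split problems by publication date, so that everything in the training set predates everything used for validation and testing. The sketch below assumes that approach; the `published` field, the cutoff dates, and the problem records are invented for illustration.

```python
from datetime import date


def split_by_date(problems, validation_start, test_start):
    """Split problems chronologically (assumes validation_start <= test_start).

    Training problems are older than validation problems, which are
    older than test problems, so no set overlaps another in time.
    """
    train = [p for p in problems if p["published"] < validation_start]
    validation = [
        p for p in problems
        if validation_start <= p["published"] < test_start
    ]
    test = [p for p in problems if p["published"] >= test_start]
    return train, validation, test


# Made-up records; only the relative order of the dates matters here.
problems = [
    {"name": "A", "published": date(2019, 5, 1)},
    {"name": "B", "published": date(2021, 8, 15)},
    {"name": "C", "published": date(2021, 10, 2)},
    {"name": "D", "published": date(2020, 1, 20)},
]
train, validation, test = split_by_date(
    problems, date(2021, 7, 15), date(2021, 9, 21)
)
print([p["name"] for p in train])
print([p["name"] for p in validation])
print([p["name"] for p in test])
```

With a split like this, a model evaluated on the test set cannot have seen those problems, or solutions published for them, during training.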

Code generation and filtering

Once AlphaCode is trained, it is tested on problems it has never seen before. When AlphaCode processes a new problem, it generates many solutions. It then uses a filtering algorithm to select the 10 best candidates and submits them to the competition. If at least one of them is correct, the problem is considered solved.

According to DeepMind's paper, AlphaCode can generate millions of samples per problem, although it usually generates thousands of solutions. The samples are then filtered to keep only those that pass the tests included in the problem statement. According to the paper, this removes about 99 percent of the generated samples, but it still leaves thousands of valid samples.

To optimize the sample selection process, a clustering algorithm is used to group the solutions. According to the researchers, the clustering process tends to group the working solutions together. This makes it easier to find a small set of candidates that are likely to pass the hidden tests of the competition.

According to DeepMind, when tested in actual programming competitions on the popular Codeforces platform, AlphaCode ranked in the top 54 percent on average, which is very impressive given the difficulty of coding challenges.
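
The "solved" rule described above can be written down in a few lines. This is only a sketch of the scoring rule as the article states it; the function name `is_solved`, the `passes_hidden_tests` callback, and the submission labels are hypothetical.

```python
from typing import Callable, List


def is_solved(
    ranked_submissions: List[str],
    passes_hidden_tests: Callable[[str], bool],
    budget: int = 10,
) -> bool:
    """A problem counts as solved if any of the first `budget`
    submissions passes the hidden tests."""
    return any(passes_hidden_tests(s) for s in ranked_submissions[:budget])


# Twelve made-up candidates, already ranked by the filtering step.
ranked = [f"s{i}" for i in range(1, 13)]

# The third candidate is correct: it falls inside the budget of 10.
print(is_solved(ranked, lambda s: s == "s3"))

# Only the eleventh candidate is correct: it is never submitted.
print(is_solved(ranked, lambda s: s == "s11"))
```

The second call shows why the selection step matters: a correct program that is not ranked among the submitted candidates does not count.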
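
The filtering step can be sketched as follows. To keep the example short, candidate programs are modeled as plain Python callables that map input text to output text, which is an assumption; a real judge would run each generated program in a sandbox with time and memory limits. The toy problem (print the sum of two integers) and all names are invented for illustration.

```python
from typing import Callable, Dict, List, Optional, Tuple

Example = Tuple[str, str]  # (input text, expected output text)


def run_safely(program: Callable[[str], str], text: str) -> Optional[str]:
    """Run a candidate; a crash or a non-string result counts as a failure."""
    try:
        return program(text).strip()
    except Exception:
        return None


def filter_by_examples(
    candidates: Dict[str, Callable[[str], str]],
    examples: List[Example],
) -> List[str]:
    """Keep only candidates that reproduce every example output."""
    return [
        name
        for name, program in candidates.items()
        if all(
            run_safely(program, given) == expected.strip()
            for given, expected in examples
        )
    ]


# Example tests as they might appear in a problem statement.
examples = [("2 3\n", "5\n"), ("10 -4\n", "6\n")]

candidates = {
    "adds": lambda s: str(sum(map(int, s.split()))),
    "multiplies": lambda s: str(int(s.split()[0]) * int(s.split()[1])),
    "crashes": lambda s: str(int(s)),  # int("2 3") raises ValueError
}

survivors = filter_by_examples(candidates, examples)
print(survivors)
```

Only the candidate that matches every example survives. Passing the example tests does not guarantee passing the hidden tests, which is why a further selection step is still needed.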
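
The article does not say how the solutions are clustered. One plausible sketch, assumed here, groups programs by their behavior: candidates that produce identical outputs on a set of probe inputs land in the same cluster, and one representative is taken from each of the largest clusters. Programs are again modeled as Python callables, and every name below is hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional


def run_safely(program: Callable[[str], str], text: str) -> Optional[str]:
    """Run a candidate; treat a crash as the special output None."""
    try:
        return program(text).strip()
    except Exception:
        return None


def cluster_by_behavior(
    programs: Dict[str, Callable[[str], str]],
    probe_inputs: List[str],
) -> List[List[str]]:
    """Group programs whose outputs agree on every probe input.

    Clusters are returned largest first; ties keep insertion order
    because Python's sort is stable.
    """
    clusters = defaultdict(list)
    for name, program in programs.items():
        signature = tuple(run_safely(program, x) for x in probe_inputs)
        clusters[signature].append(name)
    return sorted(clusters.values(), key=len, reverse=True)


def pick_submissions(programs, probe_inputs, budget=10):
    """Take one representative from each of the largest clusters."""
    clusters = cluster_by_behavior(programs, probe_inputs)
    return [cluster[0] for cluster in clusters[:budget]]


# Toy problem: print the absolute value of an integer.
programs = {
    "abs_builtin": lambda s: str(abs(int(s))),
    "abs_branch": lambda s: str(int(s) if int(s) >= 0 else -int(s)),
    "identity": lambda s: s.strip(),
}
probe_inputs = ["3", "-4", "0"]

clusters = cluster_by_behavior(programs, probe_inputs)
print(clusters)
print(pick_submissions(programs, probe_inputs, budget=1))
```

The intuition behind this design is that there are many ways to be wrong but correct programs must all agree with each other, so working solutions tend to pile up in one large cluster.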

AI vs. humans

DeepMind's blog correctly points out that AlphaCode is the first AI code generation system "to achieve a competitive level of performance in programming competitions."

However, some have mistaken this statement to mean that AI coding is "as good as human programmers." Comparing narrow AI with the general problem-solving ability of humans is a fallacy.

For example, Deep Blue and AlphaGo are the AI systems that beat the world champions at chess and Go. Although both systems are remarkable achievements in computer science and artificial intelligence, each is good at only one task. They cannot compete with their human opponents at any other task that requires careful planning and strategy, skills those humans acquired before they became masters of chess and Go.

The same can be said of competitive programming. A programmer who has reached a competitive level in coding challenges has spent years learning. They can think abstractly, solve simpler challenges, write simple programs, and demonstrate many other skills that are taken for granted and are not evaluated in programming competitions.

In short, these competitions are designed for humans. You can be confident that, generally speaking, someone who ranks at the top of competitive programming is a good programmer. That is why many companies use these challenges to make hiring decisions.

AlphaCode, on the other hand, is a shortcut to competitive programming, albeit an excellent one. It creates novel code and does not copy and paste from its training data. But it is not the equivalent of an ordinary programmer.

So instead of pitting AlphaCode against programmers, we should be more interested in what AlphaCode and other AI systems like it can do when they work alongside human programmers. These tools can have a huge impact on programmer productivity. They may even change the culture of programming, shifting humans toward formulating problems (still a discipline within the domain of human intelligence) and leaving code generation to AI systems.

But programmers will remain in control, and they will have to learn to work with both the power and the limits of AI-generated code.

Reference link:

https://thenextweb.com/news/deepmind-alphacode-tool-not-replacement-for-human-programmers-syndication