
The core battlefield of China US AI arms race: trillion level pre training model

2022-06-24 01:52:00 Data ape

In the field of artificial intelligence, the mainstream players are China and the United States. Overall, the United States leads and China is catching up. Both countries regard artificial intelligence as a strategic high ground and have poured enormous resources into it.

It is fair to say that the AI industry competition between China and the United States is already very intense. In a sense, the two countries are engaged in an artificial intelligence "arms race".

How is the competition currently going?

Artificial intelligence is a huge industry, and a comprehensive assessment is difficult. However, we can glimpse the whole picture from one representative field: the super-large-scale pre-trained model.

The reason the super-large-scale pre-trained model serves as a "window" onto the China-US AI competition is that this field matches several characteristics of an arms race:

First, it occupies a significant strategic position.

At this stage, artificial intelligence technology has serious limitations: a given model can only solve problems in a specific domain, and its ability to "generalize" is poor. Artificial general intelligence is the ultimate goal, and today's narrow AI models clearly fall short of it. One way to attack the problem is to increase a model's parameter count and complexity in order to improve its generalization ability. The hope is that a larger parameter scale will bring higher model accuracy, and that a single model will be able to solve problems across more domains.

Whether super-large-scale pre-trained models can lead to artificial general intelligence is still unknown, but for now this is the most promising path. Quantitative change produces qualitative change: only when the "quantity" is large enough does qualitative change become possible. Consider a comparison: the adult human brain contains roughly 85-86 billion neurons, each neuron forms on the order of 30,000 synaptic connections, and the total number of synapses in the human brain is estimated at about 2,500 trillion.
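As a sanity check, the figures above can be multiplied out directly (using the article's numbers, which are rough estimates; real neuroscience figures vary):

```python
# Back-of-the-envelope check of the synapse estimate cited above:
# ~86 billion neurons, each with ~30,000 synaptic connections.
neurons = 86e9
synapses_per_neuron = 3e4
total_synapses = neurons * synapses_per_neuron
print(f"total synapses ≈ {total_synapses:.2e}")  # ≈ 2.58e+15, i.e. ~2,500 trillion
```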

Where does human intelligence come from? Essentially, from these neurons and synapses. The human brain is itself a kind of computer, with neurons and synapses as its basic computing units. If artificial intelligence is to reach the human level, then matching or even surpassing the human brain in the number and scale of basic computing units is arguably a necessary condition.

Following this line of thinking, building a large-scale pre-trained model and adding parameters is equivalent to adding computing units to the model. Perhaps the "singularity" of artificial intelligence lies at 2,500 trillion computing units. Of course, a pre-trained model's parameters are not the same concept as the brain's computing units. But since there is no better approach at present, we can only push the parameter scale of models toward the 2,500-trillion mark and see what happens; a miracle may occur.

From this point of view, building a pre-trained model at such an enormous parameter scale is a human mega-project that may have a significant impact on nations and even on human society. Modern history has seen many super-scale scientific projects: the Manhattan Project, the Apollo moon-landing program, the Human Genome Project, and so on. Each of these projects raised the "ceiling" of human development.

Second, the results of the competition are easy to evaluate.

There are many metrics for judging which of two pre-trained models is better, but one key indicator is parameter scale. In general, a pre-trained model with 100 billion parameters is more capable than one with 10 billion parameters.

This is somewhat like a naval arms race. An important indicator when evaluating the combat effectiveness of two warships is tonnage: a ten-thousand-ton warship is generally more powerful than a thousand-ton one. The total tonnage of all warships has likewise become a key measure of the relative strength of two countries' navies.

By the same token, in the China-US artificial intelligence competition, the parameter scale of pre-trained models is a good indicator.

Third, it demands a huge investment of resources; it is a money-burning game.

As in an arms race, the super-large-scale pre-trained model demands not only technical capability but also financial firepower. Artificial intelligence has three core elements: algorithms, data, and computing power. A successful large-scale pre-trained model requires a great deal of talent to solve algorithmic problems, massive accumulated data, and enormous computing power for training. Every one of these depends on money.

Therefore, the super-large-scale pre-trained model is a game for giants. At present, the only players in the world come from China and the United States.

Fourth, the "war" is intense, with the two sides neck and neck.

I have tallied the major large-scale pre-trained models in China and the United States, especially the models that kept breaking the parameter-scale record, and compiled them into the following chart:

[Figure: China-US pre-trained model competition]

Several features are apparent from the chart above:

(1) The United States started early on large-scale pre-trained models and has kept evolving. Starting with AI2's ELMo, which had only 94 million parameters in 2018, American companies such as Google, Microsoft, NVIDIA, and OpenAI took up the relay and kept breaking the parameter-scale record. China did not begin working on large-scale pre-trained models until 2021, three years later than the United States.

(2) The large-scale pre-trained model is a game for only a handful of players; between China and the United States there are just a few. This makes sense: the technology, data, and compute thresholds of pre-trained models are very high, and only giants can afford to play.

(3) China has a clear latecomer's advantage. Although China is a few years behind the United States, each of its moves has raised the intensity of the competition. America's famous GPT-3 remains at the hundred-billion scale, and Google's Switch Transformer has only just reached the trillion threshold. China's "trillion club" already has two players: the model from the Zhiyuan Research Institute has 1.75 trillion parameters, surpassing Google's Switch Transformer at 1.6 trillion, and Alibaba's newly released M6 has broken through 10 trillion parameters.

It should be said that the ability of Chinese enterprises and institutions to catch up is inseparable from the development pattern of pre-trained models themselves. The parameter scale of pre-trained models grows not linearly but exponentially: the next generation of a model is not two or three times the size of the previous one, but likely an order of magnitude larger. The development in the United States over the past few years follows the same law, with parameter scale climbing step by step from hundreds of millions to billions, tens of billions, hundreds of billions, and then trillions.
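The order-of-magnitude jumps described above can be made concrete with a short sketch (parameter counts are approximate, taken from the figures cited in this article):

```python
import math

# Approximate headline parameter counts of the models discussed above.
models = {
    "ELMo (AI2, 2018)": 94e6,
    "GPT-3 (OpenAI)": 175e9,
    "Switch Transformer (Google)": 1.6e12,
    "Zhiyuan model": 1.75e12,
    "M6 (Alibaba)": 10e12,
}
for name, params in models.items():
    # Each successive record is roughly an order of magnitude larger.
    print(f"{name:30s} ~10^{math.log10(params):.1f} parameters")
```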

Therefore, the record set by Alibaba will soon be broken again. Google, Microsoft, OpenAI, NVIDIA, and other American companies remain formidable, and they may well be the next to break the record.

(4) China has also formed a "legion" of players. For a country to catch up and lead in a given field, relying on a single enterprise or institution is not safe; more than one player is needed. Besides the record-breaking Alibaba, the Zhiyuan Research Institute is also very strong. Huawei participates in this game as well: although it has so far released only a pre-trained model at the hundred-billion-parameter scale, given Huawei's character and its emphasis on artificial intelligence, it will surely not stop at that scale.

In addition, Chinese players include Tencent, Baidu, iFLYTEK, and others. For example, Baidu has ERNIE-M and Tencent has PatrickStar. Although they did not break the parameter-scale record at the time, they have their own strengths and count as "small but beautiful".

It is important to point out that while China and the United States are rivals, in the face of nature they are also teammates. Consider another figure: the largest pre-trained model today has about 10 trillion parameters, while the human brain has more than 2,500 trillion synapses. The parameter scale of artificial intelligence and the synapse count of the brain differ by more than two orders of magnitude. And if we account for the difference in "computing power" between a model parameter and a brain synapse, the gap is even larger.

[Figure: Comparison of pre-trained model parameter scale and human-brain synapse count]
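The gap mentioned above works out as a simple ratio of the two figures cited in this article:

```python
import math

model_params = 10e12       # largest pre-trained model cited above (M6)
brain_synapses = 2500e12   # estimated synapse count of the human brain
gap = brain_synapses / model_params
print(f"gap: {gap:.0f}x (~{math.log10(gap):.1f} orders of magnitude)")
# gap: 250x (~2.4 orders of magnitude)
```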

Raising the parameter scale of models to the 2,500-trillion mark is a challenge facing all of humanity. Of course, the countries with the capacity to take on this challenge are mainly China and the United States. The revolution has not yet succeeded; comrades must still strive on.

As mentioned above, the large-scale pre-trained model is a money-burning game: the larger the parameter scale, the higher the training cost. Take GPT-3, with its 175 billion parameters, as an example: a single training run costs as much as 12 million US dollars. How much would it cost to train a model with 2,500 trillion parameters? Although training cost does not grow linearly with parameter count, a larger model will certainly cost more money. If humanity designs a pre-trained model at the 2,500-trillion-parameter scale, the training cost may reach billions or even tens of billions.
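Purely as an illustration (not a real estimate), naively scaling GPT-3's cited per-run cost linearly with parameter count gives a sense of the magnitudes involved; the billions-to-tens-of-billions figure above implicitly assumes very large hardware and algorithmic efficiency gains over such a naive extrapolation:

```python
# Illustrative only: scale GPT-3's cited per-run training cost
# linearly with parameter count. Real costs depend heavily on
# hardware and algorithmic efficiency, so this is a crude sketch.
gpt3_params = 175e9
gpt3_cost_usd = 12e6          # ~$12 million per training run, as cited above
target_params = 2500e12       # the 2,500-trillion-parameter scale
naive_cost = gpt3_cost_usd * target_params / gpt3_params
print(f"naive linear estimate: ${naive_cost / 1e9:.0f} billion per run")
```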

This is a bit like particle accelerators. To probe the laws of physics at high energies, accelerators have grown ever larger and ever more expensive. The largest at present is Europe's LHC, a machine built jointly by dozens of countries at a cost of tens of billions. Within China's scientific community there has long been a debate over whether to spend tens or even hundreds of billions to build a particle collider a level above the LHC.

In the field of controlled nuclear fusion there is a similar project, the famous ITER (International Thermonuclear Experimental Reactor). The ITER device is a superconducting tokamak capable of producing large-scale fusion reactions, commonly called an "artificial sun". Even at 1998 currency values, it cost 5 billion US dollars, with dozens of countries participating.

In the field of super-large-scale pre-trained models, if reaching the end point, a model at the 2,500-trillion-parameter scale, will cost billions or even tens of billions, could we follow the examples above and pursue global cooperation? Of course, China and the United States would be the main force, with other countries playing supporting roles.

Just imagine: once the parameter scale of AI models reaches the level of synapses in the human brain, will a "singularity" arrive? I hold out some hope.

Written by: Gaze into the Deep Space / Data ape


Copyright notice: this article was created by Data ape. Please include the original link when reposting: https://yzsam.com/2021/11/20211111111022464g.html
