A first for China: this Chinese company's language AI is ranked No. 2 in the world, second only to Google

2022-07-07 19:07:00 qubit

Jin Lei, from Aofei Temple | QbitAI (official account: QbitAI)

AI comes alive and “drives programmers mad”; AI takes advanced math and outscores PhD students; AI writes code and successfully trains an agent...

After reading so many stories like these, do you feel that AI has become so hyper-competitive that it is about to take off for the heavens?

Today, let's go back to the source and talk about something less mysterious. Why does AI keep evolving? There is no secret underneath: it comes down to a few basic skills, such as language and vision.

Among these, language ability has a decisive impact on an AI's level of intelligence. Vision research studies how to “see”; language research studies how to “listen”, “speak”, and “understand”.

For humans, “listening”, “speaking”, and “understanding” together roughly add up to the ability to think. The same reasoning applies to AI.

Recently, the advisory firm Gartner released its Critical Capabilities for Cloud AI Developer Services report, which ranks the AI capabilities of cloud service providers worldwide.

In the language AI category, first place unsurprisingly went to Google.

Second place is more of a surprise: Alibaba. This is the first time since the list was launched that a Chinese company has entered the global top three in this field.

Of the global top ten, China's BAT (Baidu, Alibaba, Tencent) hold three seats, a remarkable showing.

Google scores 3.55, Alibaba 3.48

Language AI covers two major categories: speech and semantics.

Speech technology teaches the machine to “listen” and “speak”; semantics, that is, natural language processing (NLP), teaches the machine to “understand”.

First, let's look at the Gartner report's evaluation criteria for speech and semantics.

The report examines multiple sub-categories of cloud vendors' language AI services, such as speech recognition and language understanding, and rates how fully each service item's functionality is implemented.

Gartner divides the degree of implementation of each function into five levels, corresponding to scores of 1 to 5; the higher the score, the stronger the capability.

Alibaba Cloud's AI capabilities mainly include:

Alibaba received the highest scores in these key capabilities: speech recognition, natural language generation/speech synthesis, language understanding/processing, and text analytics.

The report assigns a weight to each sub-category and computes a total score from the individual scores and the category weights. In the end, Google's language AI ranked first with a total score of 3.55; Alibaba ranked second with 3.48.
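The scoring scheme described above amounts to a weighted average. The sketch below is only an illustration of that arithmetic: the item names, the individual scores, and the weights are all hypothetical, because the article does not publish Gartner's actual per-item figures.

```python
def weighted_total(scores, weights):
    """Combine per-item scores (1 to 5) into one total using item weights."""
    total_weight = sum(weights.values())
    return sum(scores[item] * weights[item] for item in weights) / total_weight


# Hypothetical per-item scores and weights, NOT figures from the report.
example_scores = {
    "speech_to_text": 4.0,
    "text_to_speech": 3.5,
    "language_understanding": 3.0,
    "text_analytics": 3.5,
}
example_weights = {
    "speech_to_text": 0.3,
    "text_to_speech": 0.2,
    "language_understanding": 0.3,
    "text_analytics": 0.2,
}

total = weighted_total(example_scores, example_weights)
print(f"weighted total: {total:.2f}")
```

Dividing by the sum of the weights keeps the total on the same 1 to 5 scale even if the weights do not add up to exactly 1.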

Beyond that, however, the Gartner report does not describe the more fine-grained capabilities in detail.

Cloud AI powered by DAMO Academy

So let's follow the Gartner report, split “language AI” in two, and look at what speech is and what semantics is.

First, AI technology at the speech level.

Voice applications are familiar to all of us: AI assistants such as Apple's Siri and Microsoft's Xiaoice interact with humans because the machine has been given the ability to speak.

Behind every voice product is a full stack of speech technology, both software and hardware.

What Alibaba Cloud relies on is DAMO Academy's deep accumulation in the field of speech AI.

DAMO Academy's work in speech AI began with speech recognition. Its technical capabilities now cover acoustic models and foundational frameworks for speech recognition, speaker diarization, acoustic models and vocoders for speech synthesis, spoken language processing, and jointly optimized acoustic front ends.

In 2019, Alibaba's speech AI was selected by MIT Technology Review as one of its “10 Breakthrough Technologies”; the technical capability behind it came from DAMO Academy.

Take the speech-to-text capability evaluated in the Gartner report, which is what we usually call “speech recognition”, as an example.

Beyond conventional skills such as near-field speech recognition, far-field speech scenarios, and multi-speaker “cocktail party” scenarios, DAMO Academy's speech AI also has some unique long-tail skills, such as “free mixing of Chinese and English” and “free speaking in dialect”.

For example, take an utterance that mixes Chinese and English, in which Chinese words are interleaved with English ones such as “iPad” and “show me paper”. How can a machine understand such a sentence?

End-to-end speech recognition (End-to-End ASR), which is popular in the industry, works well on monolingual tasks, but once it is switched to a multilingual code-switching scenario, the results are still not ideal.

To address this, DAMO Academy's speech lab borrowed the idea of the Mixture of Experts.

Within the end-to-end speech recognition model, one sub-network is designed for Chinese and another for English, and the outputs of the sub-networks are then weighted by a gating module.

To reduce the number of model parameters, the Chinese and English sub-networks share their lower layers and keep their upper layers independent. As a result, the model performs well in Chinese-only, English-only, and mixed Chinese-English scenarios.
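The gating idea described above can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not DAMO Academy's model: the weight matrices (`W_SHARED`, `W_ZH`, `W_EN`, `W_GATE`), their sizes, and the function names are all hypothetical, and a real system works on acoustic frames with far larger networks.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def relu(vector):
    return [max(0.0, v) for v in vector]


def softmax(vector):
    peak = max(vector)
    exps = [math.exp(v - peak) for v in vector]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical toy weights (3-dim input frame, 2-dim hidden state).
W_SHARED = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]  # lower layer shared by both languages
W_ZH = [[1.0, 0.0], [0.0, 1.0]]                  # upper layer of the Chinese expert
W_EN = [[0.0, 1.0], [1.0, 0.0]]                  # upper layer of the English expert
W_GATE = [[2.0, -1.0], [-1.0, 2.0]]              # gating module: one logit per expert


def moe_frame(frame):
    """Process one input frame; return the mixed output and the gate weights."""
    hidden = relu(matvec(W_SHARED, frame))    # shared bottom
    zh_out = matvec(W_ZH, hidden)             # independent top, Chinese
    en_out = matvec(W_EN, hidden)             # independent top, English
    gate = softmax(matvec(W_GATE, hidden))    # weights sum to 1, no language ID needed
    mixed = [gate[0] * z + gate[1] * e for z, e in zip(zh_out, en_out)]
    return mixed, gate


mixed, gate = moe_frame([1.0, 0.0, 0.0])
print("gate weights:", gate)
print("mixed output:", mixed)
```

Because the gate is computed from the input itself, the caller never has to say which language a frame belongs to; the mixture shifts toward whichever expert the gate favours.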

On this basis, DAMO Academy integrated the network structure of SAN-M, its self-developed end-to-end speech recognition technology, to build a new-generation end-to-end system for recognizing freely mixed Chinese and English speech.

The end result: without being given any language information, Alibaba's speech AI greatly improves recognition performance in mixed Chinese-English scenarios.

Figure: SAN-M network architecture

Drawing on the same modeling approach, DAMO Academy went on to unlock the “free speaking in dialect” skill by building an end-to-end dialect speech recognition system.

Without requiring a dialect ID, a single model can recognize 14 common dialects, while its recognition performance on pure Chinese remains essentially on par with that of a monolingual model.

DAMO Academy's AI technology is delivered mainly through Alibaba Cloud in an “integrated” manner, and it is widely used by telecom operators, e-commerce, logistics, electric power, and many other industries.

Beyond speech AI, Alibaba has also built a strong technical system in semantics.

Language itself is the combination of “sound” and “meaning”: “hearing” is valuable, but “understanding” is worth even more.

Human language is not hard for humans: a child of a few years can easily master one. But computers have their own programming languages, and understanding human language is extremely difficult for them.

The evolution of NLP technology is the prerequisite for AI's evolution from perceptual intelligence to cognitive intelligence. Over the past decade, the most iconic event in that evolution has been the emergence of large-scale pre-trained language models.

Alibaba DAMO Academy is one of the first teams in the industry to explore large models. In 2019 it began developing AliceMind, a large-scale pre-trained language model system, and it has used AliceMind as the technical foundation for both internal and external technical services.

In the “pre-large-model era”, the NLP approach was to design a separate model for each task. Model development was often complex, and small and medium-sized teams lacking compute, data, and technical strength often could not afford it.

After pre-trained language models emerged, the overall intelligence of AI rose well above earlier levels, and the way NLP technology is delivered gradually became the “pre-training + fine-tuning” paradigm.

That is, starting from a general pre-trained model, one adds a simple task layer and a small amount of scenario-specific corpus to train a high-quality task model at a lower cost.
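A minimal sketch of that recipe, assuming a deliberately tiny setup: the `PRETRAINED` word vectors stand in for a general pre-trained model, the task layer is a logistic-regression classifier, and `SCENE_CORPUS` is a four-sentence scenario corpus. All names and numbers here are hypothetical. As a simplification, only the task layer is trained while the “pre-trained” part stays frozen; real fine-tuning usually also updates the pre-trained weights.

```python
import math

# Hypothetical "pre-trained" word vectors standing in for a general model.
PRETRAINED = {
    "good": [1.0, 0.2], "great": [0.9, 0.1], "fast": [0.7, 0.3],
    "bad": [-1.0, 0.1], "slow": [-0.8, 0.2], "broken": [-0.9, 0.3],
    "delivery": [0.0, 1.0], "screen": [0.0, 0.9],
}


def encode(text):
    """Frozen encoder: average the pre-trained vectors of known words."""
    vectors = [PRETRAINED[w] for w in text.split() if w in PRETRAINED]
    if not vectors:
        return [0.0, 0.0]
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(2)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_task_layer(examples, epochs=200, lr=0.5):
    """Train only the small task layer (weights w, bias b) with plain SGD."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for text, label in examples:
            x = encode(text)
            error = sigmoid(w[0] * x[0] + w[1] * x[1] + b) - label
            w = [wi - lr * error * xi for wi, xi in zip(w, x)]
            b -= lr * error
    return w, b


def predict(text, w, b):
    x = encode(text)
    return 1 if sigmoid(w[0] * x[0] + w[1] * x[1] + b) >= 0.5 else 0


# A small, hypothetical scenario corpus: 1 = positive review, 0 = negative.
SCENE_CORPUS = [
    ("fast delivery", 1), ("great screen", 1),
    ("slow delivery", 0), ("broken screen", 0),
]

w, b = train_task_layer(SCENE_CORPUS)
print(predict("good screen", w, b))
```

The point of the design is that the expensive part (the pre-trained representation) is reused as-is, so the scenario-specific work shrinks to fitting three numbers on four sentences.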

Alibaba DAMO Academy's large-scale pre-trained language model system offers capabilities including reading, writing, translation, question answering, search, summary generation, and dialogue.

Large models are usually not used directly to solve application problems. Instead, they are combined with specific tasks and application scenarios to incubate “medium models” and “small models” layer by layer.

Based on the large-model system, DAMO Academy's Language Technology Lab has incubated a series of “medium models”, including:

  • General pre-trained model StructBERT
  • Generative pre-trained model PALM
  • Multilingual pre-trained model VECO
  • Ultra-large Chinese pre-trained model PLUG
  • Multimodal pre-trained model mPLUG
  • Structured pre-trained model StructuralLM
  • Dialogue pre-trained model SPACE
  • Table pre-trained model STAR, among others

Each of these models has its own specialty. StructBERT, mPLUG, and StructuralLM can mine the “structural” information in text, images, and tables, while the monolingual generation model PALM, the multilingual generation model VECO, and the ultra-large Chinese pre-trained model PLUG are all built for natural language generation (NLG) tasks.

StructBERT, for example, is a model that DAMO Academy optimized on the basis of Google's BERT. It helps machines better grasp human grammar and understand natural language.

On release, StructBERT set the then state of the art (SOTA) on the GLUE benchmark with a score of 89.0, and it also pushed the F1 score on SQuAD v1.1 question answering to a new high of 93.0.

Another example is the multilingual pre-trained model VECO, which took first place on XTREME, an authoritative international multilingual leaderboard, with results far better than models from international giants such as Meta and Microsoft.

The multimodal pre-trained model mPLUG surpassed human results on the visual question answering (VQA) task for the first time. The dialogue pre-trained model SPACE achieved SOTA on more than 10 international dialogue leaderboards and datasets.

Based on AliceMind, DAMO Academy has won 35 championships in total, and in some areas its level is already very close to human-level language understanding. The technology has also been open-sourced for developers worldwide.

As is well known, developing large-scale pre-trained models is extremely expensive, and the players are usually concentrated among top technology companies. But the new model delivery paradigm lets more small and medium-sized teams and individual developers share in the dividends of large models.


It is understood that Alibaba DAMO Academy's research in speech and semantics has had more than 300 papers accepted at top international conferences, and that the research has been applied in healthcare, electric power, e-commerce, and other fields.

Earlier, in IDC's 2021H2 China AI Cloud Service Market Research Report, Alibaba took first place in the speech and semantics market.

The history and future of speech and semantics

In the long history of AI development, speech and semantics are among the earliest technologies and are cornerstones of artificial intelligence.

Speech technology dates back to 1952, when Davis and colleagues at Bell Labs developed Audrey, the world's first experimental system able to recognize the pronunciation of the 10 English digits. This kicked off the development of speech recognition.

Semantic technology dates back to 1947, when British and American scientists jointly proposed the idea of using computers for automatic language translation. The birth of machine translation opened the door to the development of semantics.

Since then, getting machines to “hear” and “understand” human language has been a technological high ground that academia and industry compete to develop.

With investment from all quarters, the industry has produced many “epic” products, such as Siri, released by Apple in 2011, followed by Amazon's Alexa, Google Assistant, and Microsoft's Cortana.

Meanwhile, the underlying technology has gone through revolutionary iterations. In recent years, for example, the explosion of technologies such as Transformer and BERT has greatly advanced speech and semantic technology.

Behind this broad trend, what matters more is that speech and semantic technology is already “readily available” to ordinary people.

Take Alibaba as an example. Every day, DAMO Academy's machine translation technology translates hundreds of millions of words for 2 million small and medium-sized merchants in China, so that merchants who know neither English nor less widely spoken languages can also sell Chinese products to the world.

The technology has also been applied to ticket purchasing.

In the middle of last year, voice ticketing services were launched at Beijing Capital Airport and Daxing Airport: just say your destination aloud, and station selection completes within 1.6 seconds.

In fact, any hardware terminal may integrate language AI technology in the future. The application space is huge, which is exactly why scholars and technology giants at home and abroad are investing here.

As Zhou Ming, vice president of the China Computer Federation and founder and CEO of Langboat Technology, commented:

“Natural language technology is a core technology in the field of artificial intelligence. The rise of pre-trained models in the past few years has brought a qualitative leap to this field and has accelerated AI's progress from perceptual intelligence to cognitive intelligence. This series of breakthroughs will bring great value to all industries and even to personal life. I'm glad to see that Chinese technology companies, represented by Alibaba, have entered the world's first tier in this field.”

And as Gartner puts it in this report:

Enterprises are developing large-scale language models to provide a wider range of language services. Major cloud service providers are using their cloud infrastructure to develop proprietary language models. Smaller vendors are competing by taking advantage of open-source software, data, and machine learning models.

But across the whole development of speech and semantics, one thing has never changed: its ideal goal of talking to a machine as naturally as communicating with a human being.

Not long ago, a Google researcher's claim that “AI has a personality” sparked heated discussion in tech circles. Although Google later denied it, the fact behind the episode is that AI is gradually getting closer to humans.

So how speech and semantic technology will change people's lives in the future is worth looking forward to.

