Liang Yuqi, founder of AiTalk: the mirror image is the link between the virtual and the real

2022-07-29 05:37:00 【Elastic computing Bai Xiaosheng】

Figure: the 2022 Alibaba Cloud Visual Computing private sharing meeting

On May 11, at the "2022 Alibaba Cloud Visual Computing private sharing meeting", AiTalk founder Liang Yuqi gave a talk titled "Humanoid intelligent interaction: the mirror image is the link between the virtual and the real". The following is based on his speech.

In real life, we often see physical robots in public places such as airports and subway stations, and there are also many voice assistants such as Xiaodu, Xiaoice, and Xiaoai. But interaction between such robots and humans runs into many obstacles, and the robots often end up as mere decoration.

A digital human is in essence the same as these robots: it is one form of robot. However, digital humans, regarded as the core of the metaverse, still lack real-time communication skills.

There are currently several main types of digital humans on the market:

The first category: virtual idol products such as Liu Yexi and AYAYI, produced through traditional CG animation. This approach requires a team with strong creative planning ability, but it is hard for such characters to surpass Avatar and the metaverse world that Avatar created, so they serve film and entertainment only.

The second category: digital humans produced by companies such as Baidu, SenseTime, and Xiangxin. They basically share one technical architecture: cloud rendering plus stream pushing. But concurrency is severely limited, and the approach adapts poorly to on-device conditions such as weak networks, no network, or extremely noisy environments. In addition, the extra GPU cloud servers and live-streaming bandwidth add considerable cost.

The third category: voice interaction products such as Xiaoice and Xiaoai. In pure computation, AI far exceeds humans. But when such voice products interact with people, they can never talk smoothly and without obstacles. Adding more neural network layers or using a better deep learning model still does not solve the problem that voice products fail to follow people's everyday habits and logic when interacting.

For these reasons, we focus our research and development on giving digital humans the ability to communicate and interact with zero barriers. Many people ask the same thing of digital humans: to be able to "answer fluently" when talking with people.

Our core technology focuses on the following points:

1. STEP algorithm

We independently developed the STEP algorithm. Its principle is very simple: any intention expressed by anyone can be placed in a specific scene, revolves around a number of topics, and reaches the desired purpose through specific matters. This addresses the problems of habit and logic that arise when AI interacts with people.

For example, ask Siri where you can get coffee, and it will simply push search results. But normal human conversation works differently: A says he wants coffee, and B might tell him which coffee shops are downstairs and what kinds of coffee they serve. So the key point of the STEP algorithm is to solve the logic problem.
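The talk does not describe how STEP is implemented. As a rough illustration only, the idea of framing an intention by scene, topics, purpose, and specific matters could be modelled as below. Every name here (`Intent`, `respond`, `NEARBY_SHOPS`, the sample shop data) is hypothetical and is not AiTalk's API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Intent:
    """A user's intention, framed the way the talk describes it (hypothetical model)."""
    scene: str                 # where the conversation takes place
    topics: List[str]          # what the conversation revolves around
    purpose: str               # what the user ultimately wants
    matters: List[str] = field(default_factory=list)  # concrete steps toward the purpose


# Made-up knowledge for one scene; a real system would not hard-code this.
NEARBY_SHOPS = {
    "office building": {"Cafe A": ["latte", "americano"], "Cafe B": ["mocha"]},
}


def respond(intent: Intent) -> str:
    """Answer like person B in the example: name nearby options, not raw search results."""
    shops = NEARBY_SHOPS.get(intent.scene, {})
    if "coffee" in intent.topics and shops:
        options = "; ".join(
            f"{name} ({', '.join(drinks)})" for name, drinks in shops.items()
        )
        return f"There are coffee shops downstairs: {options}."
    # Without scene knowledge, ask a follow-up question instead of guessing.
    return "Could you tell me a bit more about what you are looking for?"


intent = Intent(
    scene="office building",
    topics=["coffee"],
    purpose="get a cup of coffee",
    matters=["pick a shop", "pick a drink"],
)
print(respond(intent))
```

The point of the sketch is only that the reply depends on the scene the request is placed in, which is what separates B's answer from a list of search results.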

2. Identifying interference

When you interact with a voice product and several users are in front of the screen, it records the sound and responds whether a user is talking to the AI or to a friend. This does not match normal habits. It is therefore necessary to identify and filter out interference in multi-person dialogue.

At present, when several people are talking in front of the AI, it can judge whether the current user is talking to it and whether it needs to respond.
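How this judgment is made is not stated in the talk. One plausible shape, sketched here purely as an assumption, is to combine a few perception cues into a score and respond only above a threshold. The cue names, weights, and the 1.5 m default are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    """Cues for one spoken sentence; every field is an assumed sensor output."""
    text: str
    facing_screen: bool     # e.g. from a camera-based head-pose estimate
    distance_m: float       # speaker's distance from the device
    follows_ai_turn: bool   # spoken right after the AI said something


def is_addressed_to_ai(u: Utterance, max_distance_m: float = 1.5) -> bool:
    """Return True when enough cues suggest the speaker is talking to the AI."""
    score = 0
    if u.facing_screen:
        score += 2          # looking at the screen is weighted most heavily
    if u.distance_m <= max_distance_m:
        score += 1
    if u.follows_ai_turn:
        score += 1
    return score >= 3


to_ai = Utterance("Where can I open an account?", True, 0.8, False)
to_friend = Utterance("Let's go eat after this.", False, 1.0, True)
print(is_addressed_to_ai(to_ai), is_addressed_to_ai(to_friend))
```

A production system would more likely learn such a decision from data than hand-tune weights; the sketch only shows that the device stays silent when the cues point to a side conversation.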

3. Conversation interruption

A speaker who does not understand a proper noun, or who is not interested in the content, will interrupt the conversation. After an interruption, we consider whether to resume the original conversation.
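The talk gives no detail on how interruptions are tracked. A minimal sketch, assuming a simple stack of parked topics, shows what "consider whether to resume" could look like. The `Dialogue` class and its methods are hypothetical.

```python
from typing import List, Optional


class Dialogue:
    """Tracks the active topic and any topics parked by an interruption."""

    def __init__(self) -> None:
        self.current: Optional[str] = None
        self.suspended: List[str] = []

    def start(self, topic: str) -> None:
        self.current = topic

    def interrupt(self, new_topic: str) -> None:
        """Park the active topic and switch to the interrupting one."""
        if self.current is not None:
            self.suspended.append(self.current)
        self.current = new_topic

    def finish(self, resume: bool = True) -> Optional[str]:
        """End the active topic; optionally return to the most recently parked one."""
        if resume and self.suspended:
            self.current = self.suspended.pop()
        else:
            # Not resuming: drop everything that was parked.
            self.suspended.clear()
            self.current = None
        return self.current


d = Dialogue()
d.start("opening a bank account")
d.interrupt("what does 'custodian bank' mean?")
print(d.finish())  # back to the parked topic
```

Passing `resume=False` models the other outcome mentioned above, where the original conversation is abandoned after the interruption.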

4. No wake word

Most voice products on the market require you to say "Hey Siri" or press a button to wake the AI. But when interacting with AI in physical venues such as 4S car dealerships, bank branches, and bus stops, users cannot remember the wake word of every AI. Wake-word-free processing is therefore needed to make these products easy to use.

Having solved the problem of humanoid interaction, so that AI no longer comes across as unintelligent, we also need to digitize the human image. Unlike other vendors' "cloud + stream pushing" architecture, we insist on real-time rendering on the client, which solves the latency problem. CPU consumption does not exceed 10%, and the software is compatible with iOS 9.0 and Android 4.4. This means that an Android phone that cost about 1,000 yuan seven or eight years ago can still run it, and it does not have to rely on the network.

The advantage of separating device and cloud is that, wherever the digital human appears, you can think of it as a "person". When communicating and interacting, whether through 3D holography, an intelligent interactive screen, or a future brain-computer interface or chip implanted in the cerebral cortex, it can create the effect of a face-to-face chat.

Image processing and interaction with the character are handled on the device. The cloud is mainly used to strengthen communication and interaction ability, chiefly for thinking and reasoning, and mostly for data processing and training. It is like a person who has to be trained and educated, and who keeps learning and improving.
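The split described above can be pictured as a small routing rule. This is a sketch under stated assumptions, not AiTalk's design: the task names are hypothetical labels, and the choice to postpone cloud work while offline is an assumption of the example.

```python
# Task names below are hypothetical labels, not AiTalk identifiers.
ON_DEVICE = {"render_avatar", "character_interaction"}
IN_CLOUD = {"reasoning", "data_processing", "training"}


def route(task: str, online: bool) -> str:
    """Decide where a task runs under the device/cloud split."""
    if task in ON_DEVICE:
        return "device"  # rendering and interaction never need the network
    if task in IN_CLOUD:
        # Assumption: cloud work is simply postponed while offline.
        return "cloud" if online else "deferred"
    raise ValueError(f"unknown task: {task}")


for task in ("render_avatar", "reasoning"):
    print(task, route(task, online=False))
```

Keeping rendering on the device is what allows the character to stay on screen with a weak or absent network, while heavier reasoning and training remain a cloud concern.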

AiTalk offers three types of standardized products:

The first category: humanoid interaction software. It comes in two forms. One is an SDK, for example for mobile apps or smart home products; it is fully standardized and very cheap. A license for a mobile app may cost less than 0.1 yuan. The other is a complete software suite for public places such as bank branches, 4S car dealerships, shopping malls, and supermarkets, usually paired with XR hardware.

The second category: XR hardware. It has multimodal perception, so you can interact with it at multiple levels, such as visual and auditory, and it can handle weak-network, no-network, and noisy conditions.

The third category: supporting application services that can handle interactive processes. When the AI interacts with people, it does more than chat; it can help enterprises or users complete transactions and processes. For business registration, for example, the digital human presents the relevant procedures and assists with some of them. There are also VR/AR applications that enhance immersive interaction.

Compared with our peers, AiTalk pays more attention to exploring humanoid interaction. Our advantages lie in the following two aspects:

1. Client-side real-time rendering. We are the only company that uses edge computing. There are no concurrency limits and no delay, which lets users in third-tier to ninth-tier cities use the products at zero cost. Model precision can reach 1.5 million faces, and costs are reduced by more than 90%.

2. Communication skills of virtual digital humans. The ability to communicate and interact is the soul of a digital human. We use many bionic techniques to give virtual digital humans zero-barrier communication with people, so that they no longer come across as unintelligent. Only then can digital humans and AI truly enter large-scale commercial use.

That is all I have to share. Special thanks to Alibaba Cloud for the invitation and to our partners for their long-standing support. Thank you.

Copyright notice
This article was created by [Elastic computing Bai Xiaosheng]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/210/202207290508035411.html
