Liang Yuqi, founder of AiTalk: the mirror image is the link between the virtual and the real
2022-07-29 05:37:00 【Elastic computing Bai Xiaosheng】
Figure: On site at the 2022 Alibaba Cloud Visual Computing Private Sharing Session
On May 11, at the 2022 Alibaba Cloud Visual Computing Private Sharing Session, AiTalk founder Liang Yuqi gave a talk titled "Humanoid Intelligent Interaction: The Mirror Image Is the Link Between the Virtual and the Real". The following is compiled from his speech.

In everyday life we often see physical robots in public places such as airports and subway stations, and there are many voice assistants such as Xiaodu, Xiaoice, and Xiao AI. But interaction between these robots and people still runs into many obstacles, and the robots often end up as mere decoration.

A digital human is essentially the same as the robots above: it is one form of robot. Yet although digital humans are regarded as the core of the metaverse, they still lack the ability to communicate in real time.
The digital humans currently on the market fall into three main categories:
The first category: virtual-idol products such as Liu Yexi and AYAYI, produced with traditional CG animation. This approach demands strong creative and planning skills from the team, but the product can hardly go beyond the avatar and the metaverse world built around it, and it serves film and entertainment only.
The second category: products from companies such as Baidu, SenseTime, and Xiangxin, which share essentially the same technical architecture of cloud rendering plus streaming. Their concurrency is severely limited, and they adapt poorly to on-device conditions such as weak networks, no network, or very noisy environments. In addition, the extra GPU cloud servers and live-streaming bandwidth add considerable cost.
The third category: voice interaction products such as Xiaoice. In pure computation, AI far surpasses humans. But when these voice products interact with people, they never manage a smooth, barrier-free conversation. Adding more neural-network layers or using a better deep learning model still does not solve the problem that their responses do not follow people's everyday habits and logic.

For these reasons, we focus our R&D on giving digital humans the ability to communicate and interact without barriers. What many people ask of a digital human is that it can hold a fluent conversation with people.
Our core technology centers on the following points:
1. STEP algorithm
We developed the STEP algorithm in-house. Its principle is simple: any intention a person expresses can be placed in a specific scene, revolves around a number of topics, and reaches its goal through specific matters. This addresses the habit and logic problems that arise when AI interacts with people.
For example, if you ask Siri where to get coffee, it pushes search results straight to you. But normal human conversation follows a different logic: A says he wants coffee, and B might tell him which coffee shops are downstairs and what kinds of coffee they serve. So the key point of the STEP algorithm is to solve this logic problem.
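The talk does not say how STEP is implemented. As a rough illustration of the idea only, the sketch below assumes an intention is resolved against a scene, then a topic within that scene, then the matters that belong to that topic. All names here (`Scene`, `Topic`, `Matter`, `respond`) and the keyword matching are hypothetical and are not AiTalk's API.

```python
from dataclasses import dataclass, field


@dataclass
class Matter:
    """A concrete thing the user may want done within a topic (hypothetical)."""
    name: str
    follow_up: str  # what a person would naturally say next


@dataclass
class Topic:
    """A topic inside a scene, matched by simple keywords (an assumption)."""
    name: str
    keywords: list[str]
    matters: list[Matter] = field(default_factory=list)


@dataclass
class Scene:
    """The situation the conversation takes place in (hypothetical)."""
    name: str
    topics: list[Topic] = field(default_factory=list)


def respond(scene: Scene, utterance: str) -> str:
    """Reply with conversational follow-ups instead of raw search results."""
    text = utterance.lower()
    for topic in scene.topics:
        if any(word in text for word in topic.keywords):
            options = " ".join(m.follow_up for m in topic.matters)
            return f"[{scene.name}/{topic.name}] {options}"
    # No topic matched: ask for more detail, as a person would.
    return f"[{scene.name}] Could you tell me more about what you need?"


office = Scene(
    name="office lobby",
    topics=[
        Topic(
            name="coffee",
            keywords=["coffee", "latte"],
            matters=[
                Matter("find a shop", "There are coffee shops downstairs."),
                Matter("pick a drink", "What kind of coffee would you like?"),
            ],
        )
    ],
)

reply = respond(office, "Where can I have coffee?")
fallback = respond(office, "Where is the parking garage?")
print(reply)
print(fallback)
```

The point of the structure is that the reply depends on where the conversation is happening and what it is about, not only on the literal words of the question.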
2. Interference recognition
When several users are in front of the screen, a voice product records the sound and responds regardless of whether a user is talking to the AI or to a friend. That is not how people normally behave, so interference in multi-person dialogue has to be recognized and filtered out.
We can now handle the case where several people are talking in front of the AI: it can judge whether the current speaker is addressing it and whether it needs to respond.
3. Conversation interruption
When a proper noun is not understood, or the content is of no interest, the conversation gets interrupted. After an interruption, we consider whether to resume the original thread.
4. No wake word
Most voice products on the market require you to say "Hey Siri" or press a button to wake the AI. But when interacting with AI at offline venues such as 4S car dealerships, bank branches, and bus stops, users cannot be expected to remember every AI's wake word, so interaction without a wake word is needed to make the products easy to use.
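Points 2 and 4 both depend on deciding whether a speaker is addressing the device. The talk does not describe how AiTalk makes that decision. The sketch below is a minimal illustration that assumes a vision module already supplies two per-speaker signals, `facing_screen` and `mouth_moving`. Both fields and the `should_respond` rule are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    """One detected speech segment with hypothetical perception signals."""
    speaker_id: str
    text: str
    facing_screen: bool  # assumed output of a vision module
    mouth_moving: bool   # assumed lip-activity signal for this speaker


def should_respond(utt: Utterance) -> bool:
    """Respond only when the speaker appears to be addressing the device."""
    return utt.facing_screen and utt.mouth_moving


segments = [
    Utterance("user_a", "Which loan products do you offer?", True, True),
    Utterance("user_b", "Let's grab lunch after this.", False, True),
]

# Keep only the speakers whose speech is judged to be aimed at the device.
addressed = [u.speaker_id for u in segments if should_respond(u)]
print(addressed)
```

A rule like this replaces the wake word: the device responds because of how the user is behaving, not because a fixed phrase was spoken. A real system would presumably combine more signals than these two.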

Solving humanoid interaction stops AI from seeming dim-witted, but we also need to give the digital human a visual image. Unlike other vendors' "cloud + stream pushing" architecture, we insist on real-time rendering on the client, which solves the latency problem. CPU consumption does not exceed 10%, and the software is compatible with iOS 9.0 and Android 4.4. This means an Android phone that cost about 1,000 yuan seven or eight years ago can run it, and it can run without relying on the network.

The advantage of separating device and cloud is that wherever the digital human appears, you can think of it as a "person". Whether the medium is a 3D hologram, an intelligent interactive screen, or in the future a brain-computer interface or a chip implanted in the cerebral cortex, the interaction can feel like a face-to-face chat.
Image processing and interaction with people are handled on the device. The cloud mainly strengthens the ability to communicate and interact: it is used more for thinking and reasoning, chiefly data processing and training. It is like a person who needs training and education and keeps recharging and improving.
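The division of labor described here can be sketched as follows. This is an illustration under stated assumptions, not AiTalk's implementation: `render_avatar_frame`, `local_reply`, and `handle_turn` are hypothetical stand-ins, and the cloud is modeled as an optional callable that raises `ConnectionError` when the network is unavailable.

```python
from typing import Callable, Optional


def render_avatar_frame(text: str) -> str:
    """Stand-in for on-device rendering; always runs locally."""
    return f"<avatar says: {text}>"


def local_reply(utterance: str) -> str:
    """Stand-in for a small on-device dialogue fallback."""
    return "Sorry, I can only handle simple requests while offline."


def handle_turn(
    utterance: str,
    cloud_reason: Optional[Callable[[str], str]],
) -> str:
    """Use the cloud for reasoning when reachable; keep rendering on the device."""
    reply = None
    if cloud_reason is not None:
        try:
            reply = cloud_reason(utterance)
        except ConnectionError:
            reply = None  # weak or missing network: fall back to the device
    if reply is None:
        reply = local_reply(utterance)
    return render_avatar_frame(reply)


def fake_cloud(utterance: str) -> str:
    """Hypothetical cloud reasoning service used for the demo."""
    return f"Here is what I found about: {utterance}"


def broken_cloud(utterance: str) -> str:
    """Simulates a network failure."""
    raise ConnectionError("no network")


online = handle_turn("business registration", fake_cloud)
offline = handle_turn("business registration", broken_cloud)
print(online)
print(offline)
```

Because rendering never leaves the device, a network failure degrades only the quality of the reply, not the avatar itself.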

AiTalk offers three types of standardized products:
The first category: humanoid interaction software. It comes in two forms. One is an SDK, for example for mobile apps or smart home products. It is fully standardized and very low cost: a license for a mobile app may cost less than 0.1 yuan. The other is a complete software suite for bank branches, 4S car dealerships, supermarkets, and other public places, usually paired with XR hardware.
The second category: XR hardware. It has multimodal perception, so you can interact with it on several levels, including visual and auditory, and it can cope with weak-network or no-network conditions and with noise.
The third category: supporting application services that handle interactive workflows. When the AI interacts with people, it does more than chat: it helps enterprises or users complete the relevant transactions and procedures. In business registration, for example, the digital human presents the relevant procedures and assists with some of them. There are also VR/AR applications that make the interaction more immersive.

Compared with peers in the industry, AiTalk pays more attention to exploring humanoid interaction. Its advantages lie in two areas:
1. Real-time rendering on the client. We are the only company that uses edge computing, so there are no concurrency limits and no delay, and users in third-tier cities and below can use the products at zero cost. Model precision can reach 1.5 million faces, and costs are reduced by more than 90%.
2. Communication skills of the virtual digital human. The ability to communicate and interact is the soul of a digital human. We use a great deal of bionic technology to give virtual digital humans barrier-free communication and interaction with people, so that they no longer seem dim-witted. Only then can digital humans and AI truly enter large-scale commercial use.
That is all I have to share. Special thanks to Alibaba Cloud for the invitation and to our partners for their strong, long-standing support. Thank you.
Click here to watch the playback video of this Visual Computing Private Sharing Session.