Robot decision-making system based on self-learning (CloudMinds, Zhao Kaiyong)
2022-07-23 19:43:00 【Master Ma】
On September 25-26, 2020, the Young Scientists Salon of the 2020 China Science and Technology Summit series held a new session, "AI Academic Ecology and Industrial Innovation". The event was sponsored by the China Association for Science and Technology and organized by the Department of Computer Science at Tsinghua University, AI TIME, and Zhipu·AI. The complete video of the conference is available on Bilibili from the "AI Time" account.
On the morning of September 25, the conference invited Zhao Kaiyong, chief architect and vice president of R&D at CloudMinds, to give a keynote speech titled "Robot Decision System Based on Self-Learning".
In his speech, Zhao Kaiyong mainly described how CloudMinds accelerates robot learning through its cloud platform and has developed an approach that combines traditional methods, human experience, and reinforcement learning.
Zhao Kaiyong holds a doctorate and is a senior practitioner in robotics, artificial intelligence, and high-performance computing, with many years of experience in technology development, team management, industry development, and mergers and acquisitions. He is currently chief architect and vice president of R&D at CloudMinds, where he leads the AI and Navigation Department. Previously he headed the Internet business department at DJI, where he was responsible for the company's Internet services and the overall strategy for applications in the 3D surveying and mapping industry. Dr. Zhao has long been engaged in high-performance computing.
1. Problems faced by robot control
CloudMinds was founded in 2015 by Huang Xiaoqing, former president of the China Mobile Research Institute. It is a cloud intelligent robot operator, mainly engaged in research on operator-grade secure cloud computing networks for cloud robots, large-scale hybrid artificial intelligence and machine learning platforms, secure intelligent terminals, and robot controller technology.
Its main products include cloud service robots, cloud security robots, cleaning robots, life robots, and cloud access control. These terminal devices connect to the cloud robot operating system HARIX over a secure, high-speed optical fiber network (VBN). Many practical problems came up along the way, and during the research many methods and results from academia were brought into robot application development. This talk presents the practical problems encountered in robot application and development, together with their solutions.

The figure above lists the topics of the talk. The first part covers the problems robots face under traditional control methods. Our service robot is a humanoid robot. How does such a robot learn to move? The traditional approach is to plan the trajectory of each action and code it by hand, which means reprogramming every time a new action is added.

The second part is about gradually improving the learning ability of the robot system. When a robot needs a new action, nobody has to reprogram it: the robot learns the action through machine learning and gradually improves its ability to learn and to make decisions.

The third part is about building a simulation platform, the digital twin platform. When I worked on robots 20 years ago, it was not convenient to train robots on a simulation platform. Thanks to the growth in computing power in recent years, it is now easy to build complete robot kinematics, dynamics, and control systems in the cloud or in a virtual environment, so a large amount of training can run in simulation without first building robot hardware. With such a simulation environment, the next step is to bring traditional control methods and existing biological experience into the platform to form a self-learning system. I will give a few examples later: how a humanoid robot learns to dance, how robots grasp, and how a quadruped robot dog learns its gait.

In the two pictures above, the left one shows the robot learning to dance to the music of "Jasmine". At the start of the choreography we specifically asked a teacher from the dance academy for help, but making the robot's movements softer and more humanlike is a very challenging problem. The right picture shows the robot grasping. The grasping action of a service robot is quite different from that of an industrial robot in a factory: service robots work in unstructured spaces such as everyday living environments, so the type, size, weight, and position of the items to grasp cannot be known in advance. The grasp plan may also have to avoid obstacles, so grasping is a relatively complex process.

The picture on the left shows a quadruped robot dog. Gait planning for quadruped robots is still an unsolved problem. Gaits generated by current traditional methods are very different from those of real quadrupeds, and most traditional methods do not account for how gait differs across situations. Even though different environments can now be simulated, traditional methods still cannot generate flexible gait plans. The right picture shows a robot avoiding obstacles in a residential community, where the environment is very complex. In the laboratory there are no children around the robot, but in a community some children may cover the camera or even the lidar, or climb onto the robot. These practical problems test the stability of the robot's planning and decision-making system. The robot dog's motion, grasping, and dancing described above have much in common, so we abstract these processes and define all of these control processes as robot decisions. We use bionic or reinforcement learning methods, combined with traditional methods, to realize robot decision control in the simulation environment.
2. Robot decision system

Traditional motor control includes a current loop, a speed loop, and a position loop. This is a mature process, and I won't discuss it today. We define the control of each joint as the base layer. On top of the base layer, multiple joints are combined into coordinated control. As the earlier videos show, traditional joint control, once combined, becomes multi-joint linkage, which is generally two-dimensional or three-dimensional path planning or gait planning. We abstract this combination into a robot decision-making process: basic action decisions. It is like the balance decisions made by our cerebellum, not just a simple planning problem.
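As a rough illustration of the base layer, the three nested loops can be sketched as cascaded proportional controllers, where the position loop sets a speed reference, the speed loop sets a current reference, and the current loop sets the motor voltage. The motor model, the constants, the gains, and all names below are assumptions made for this sketch, not the speaker's implementation.

```python
from dataclasses import dataclass


@dataclass
class JointState:
    position: float = 0.0  # rad
    speed: float = 0.0     # rad/s
    current: float = 0.0   # A


# Toy motor constants (assumed values, not taken from any real actuator).
RESISTANCE = 1.0     # ohm
INDUCTANCE = 0.01    # H
TORQUE_CONST = 0.5   # N*m/A
INERTIA = 0.01       # kg*m^2
FRICTION = 0.01      # N*m*s/rad


def cascaded_step(state, target_position, dt,
                  kp_position=5.0, kp_speed=1.0, kp_current=4.0):
    """Advance one joint by dt using position -> speed -> current loops."""
    # Outer loop: position error becomes a speed reference.
    speed_ref = kp_position * (target_position - state.position)
    # Middle loop: speed error becomes a current reference.
    current_ref = kp_speed * (speed_ref - state.speed)
    # Inner loop: current error becomes the applied voltage.
    voltage = kp_current * (current_ref - state.current)

    # Simplified motor dynamics, integrated with explicit Euler.
    d_current = (voltage - RESISTANCE * state.current) / INDUCTANCE
    d_speed = (TORQUE_CONST * state.current - FRICTION * state.speed) / INERTIA
    return JointState(
        position=state.position + state.speed * dt,
        speed=state.speed + d_speed * dt,
        current=state.current + d_current * dt,
    )


def simulate(target_position, duration=3.0, dt=1e-4):
    """Run the cascaded controller from rest and return the final state."""
    state = JointState()
    for _ in range(int(duration / dt)):
        state = cascaded_step(state, target_position, dt)
    return state
```

With these assumed gains the inner loops respond faster than the outer one, which is the usual reason for nesting them; a real drive would typically add integral terms and output limits.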
3. Digital twin
Inside the company, with the help of high-performance hardware and improved computing power, we built a robot training simulation platform that includes cloud management and storage as well as an AI training platform. Using bionics principles and human and animal motion data, combined with AI algorithms such as imitation learning and reinforcement learning, we built a library of basic learned actions. On the simulation platform we model each robot joint in a way that is close to the real physical model, so robot training can run in simulation. The platform can also modify hardware joint parameters according to control requirements and, in the end, produce requirements for the real production joints. This process is a great help for hardware design.

This is an open platform for cloud intelligent robots, and it is already in use at some universities. On this platform, each physical robot has a digital twin, close to the real one, that simulates it. On the cloud side, a 3D semantic environment is built first as the robot's usage scenario, and the robot model is placed into it. The existing knowledge base and traditional movement skills are also put into the system, and movements are then developed according to the training requirements. Large-scale AI training is then run on the data collected in steps 3 and 4 of the figure. This is equivalent to using traditional experience and human experience to build a limited space, and then using AI methods such as bionic learning and reinforcement learning to search a higher-level space, in several layers that cooperate and are trained differently. It is similar to AlphaGo: training first on human game records gives a foundation, and self-play then searches a larger space.
4. Robot control
Summarizing past studies, traditional control methods such as RRT and DMP define a control domain. If we combine them with bionic learning and reinforcement learning, it is equivalent to searching for the optimal solution over a larger range or in a higher-dimensional space.
The figure above is a schematic of cooperative training between a real robot and its digital twin in a virtual environment. For example, world information sensed by the real robot's sensors is used for 3D reconstruction in the virtual environment. AI reasoning and decision-making then produce a behavior, the action is tried and evaluated in the virtual environment, and once it is accurate it is downloaded to the real robot for execution.
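The evaluate-then-download step described above can be sketched as a simple gate: a candidate action is tried several times in simulation and sent to the real robot only if it succeeds often enough. The function names, the trial count, and the success threshold are all hypothetical, and `simulate` and `deploy` stand in for whatever the platform actually provides.

```python
def validate_then_deploy(action, simulate, deploy,
                         trials=20, required_success=0.95):
    """Try `action` in simulation; deploy it only if it passes the gate.

    simulate(action, trial_index) -> bool  (hypothetical simulator hook)
    deploy(action)                         (hypothetical real-robot hook)
    Returns (deployed, success_rate).
    """
    successes = sum(1 for trial in range(trials) if simulate(action, trial))
    success_rate = successes / trials
    if success_rate >= required_success:
        deploy(action)
        return True, success_rate
    return False, success_rate
```

Keeping the gate separate from the learner means the same check can be reused for dancing, grasping, or gait actions, with only the simulator hook changing.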
To learn a new action, the system starts from a video in which a person performs the action; each action is recognized in real time and a robot action is generated from it. We collected many videos from TikTok, recovered 3D poses from the 2D video, and mapped these poses onto the robot's joints. The mapping is not a simple one. A naive mapping causes problems: the joints and range of motion of the robot platform may not match those of the dancer, and collisions may occur. To make the generated actions more graceful and more humanlike, we learn from these data and generate data-driven behavior. In the process, the robot generates actions that are as similar as possible given its own structure and physical constraints, so that its dance looks as natural as possible.
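To show why a naive mapping is not enough, here is a minimal retargeting sketch that takes per-frame human joint angles and forces them to respect the robot's joint limits and a maximum change per frame. It covers only those two constraints; collision checking and the learned, data-driven part described above are left out. The joint names, limits, and function name are assumptions for illustration.

```python
def retarget(human_frames, joint_limits, max_step):
    """Map human joint angles to robot joint angles under simple constraints.

    human_frames: list of {joint_name: angle_rad} taken from pose estimation
    joint_limits: {joint_name: (low_rad, high_rad)} for the robot (assumed)
    max_step:     largest allowed change per frame, in rad (assumed)
    """
    robot_frames = []
    previous = None
    for frame in human_frames:
        pose = {}
        for joint, (low, high) in joint_limits.items():
            # Respect the robot's range of motion.
            angle = min(max(frame[joint], low), high)
            if previous is not None:
                # Limit how far the joint may move between frames.
                delta = angle - previous[joint]
                delta = min(max(delta, -max_step), max_step)
                angle = previous[joint] + delta
            pose[joint] = angle
        robot_frames.append(pose)
        previous = pose
    return robot_frames


frames = [{"elbow": 0.0}, {"elbow": 2.0}, {"elbow": 2.0}]
result = retarget(frames, {"elbow": (0.0, 1.5)}, max_step=1.0)
# result == [{"elbow": 0.0}, {"elbow": 1.0}, {"elbow": 1.5}]
```

The human elbow jumps straight to 2.0 rad, while the robot reaches its 1.5 rad limit over two frames, which is the kind of gap between dancer and robot that the learned approach is meant to close more gracefully.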

The second scenario is grasping. The real scene is on the left and the virtual scene is on the right. To generate a more humanlike grasping action, a person first wears motion capture equipment to record data. Combined with large-scale AI training on the simulation platform, this forms a robot grasping knowledge base, which also avoids collecting data again for every new grasping action.

The picture above shows the robot dog controlled in the simulation environment with the MIT model, for example walking forward and backward. You may have seen it online.

This robot action is realized by combining traditional control with deep learning and bionics. On top of the existing search space of traditional methods, machine learning methods such as reinforcement learning search a larger space. Traditional methods usually require modeling, so control quality is often limited by the simplifications made in the model. Combining them with reinforcement learning gives a broader search space.

Comparing the two: bionic training is end-to-end and needs no complicated design, but it is not flexible enough and cannot yet be deployed. The traditional method is more flexible and more robust, but its actions are not optimal. In walking, for example, the gait planned for a quadruped robot is quite different from that of a real animal. Combining the two reduces energy consumption and gives more stable motion.

This is the energy consumption curve of quadrupeds during walking. In traditional robot control each gait is a separate state, and the robot can only switch instantly from one gait to another, which is quite different from real animals. A recent Google paper captures a large amount of data online, or collects it externally, for this kind of AI training or search. We came to the same realization in our own work: people and nature already provide a great deal of data, and we need to combine such data into a data-driven method for training robot actions. There is no need to train an action completely from scratch, especially for quadruped gait or robot grasping, because plenty of empirical values already exist. Using these empirical values, and defining constraints and boundary conditions on the data, we search in a limited space, reach the desired result faster, and can minimize the energy these actions consume.
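The idea of searching inside a space bounded by existing data can be sketched as follows. Bounds are derived from recorded gait samples, and a random local search (used here as a simple stand-in for reinforcement learning) looks for lower energy without leaving those bounds. The recorded values, the parameter names, and the `toy_energy` cost are invented for illustration; a real system would measure energy in the simulator.

```python
import random

# Hypothetical recorded gait samples: (stride_length_m, step_frequency_hz).
# These numbers are made up for illustration only.
RECORDED_GAITS = [(0.50, 1.6), (0.55, 1.8), (0.62, 2.0), (0.58, 1.9)]


def bounds_from_data(samples, margin=0.1):
    """Per-parameter bounds: the data range widened by a relative margin."""
    bounds = []
    for values in zip(*samples):
        low, high = min(values), max(values)
        pad = margin * (high - low)
        bounds.append((low - pad, high + pad))
    return bounds


def toy_energy(params):
    """Stand-in energy cost with an arbitrary minimum inside the bounds."""
    stride, frequency = params
    return (stride - 0.57) ** 2 + (frequency - 1.85) ** 2


def clamp(value, low, high):
    return max(low, min(high, value))


def bounded_search(start, bounds, cost, iterations=300, step=0.05, seed=0):
    """Random local search that never leaves the data-derived bounds."""
    rng = random.Random(seed)
    best = [clamp(p, lo, hi) for p, (lo, hi) in zip(start, bounds)]
    best_cost = cost(best)
    for _ in range(iterations):
        candidate = [
            clamp(p + rng.uniform(-step, step), lo, hi)
            for p, (lo, hi) in zip(best, bounds)
        ]
        candidate_cost = cost(candidate)
        if candidate_cost < best_cost:  # keep only improvements
            best, best_cost = candidate, candidate_cost
    return best, best_cost
```

Starting from a recorded gait such as `[0.50, 1.6]` and calling `bounded_search(start, bounds_from_data(RECORDED_GAITS), toy_energy)` returns parameters that stay inside the recorded range plus margin, which is what keeps the search both faster and physically plausible.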

This shows gait training for robot dogs: multiple robots are trained at once, each with different parameters, such as different forces and different conditions.
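Training many simulated robots under varied conditions is often set up by sampling one configuration per robot. The sketch below does only that sampling step; the parameter names and ranges are assumptions for illustration and are not taken from the talk.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class EnvConfig:
    push_force_n: float     # external push applied to the robot, in newtons
    ground_friction: float  # friction coefficient of the floor
    slope_deg: float        # ground slope, in degrees


def sample_configs(count, seed=0):
    """Draw one randomized environment per simulated robot (assumed ranges)."""
    rng = random.Random(seed)
    return [
        EnvConfig(
            push_force_n=rng.uniform(0.0, 50.0),
            ground_friction=rng.uniform(0.3, 1.0),
            slope_deg=rng.uniform(-10.0, 10.0),
        )
        for _ in range(count)
    ]
```

Fixing the seed makes a training batch reproducible, so a gait that fails under one sampled condition can be replayed and inspected.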

This is large-scale scene training with different states. Because the platform is distributed, training can run very fast. The key point is that traditional methods give us one limited search space and empirical values give us another; reinforcement learning then combines the two and searches a wider range for the best solution. It is a bit like AlphaGo first learning from human game records, although here the learning is more controlled by people, with human experience built in.

This is the CloudMinds open platform, which is already used in colleges and universities. Once the training platform and the whole training pipeline are online, more people will be able to use it: you can train your robot on it, or even build your own robot system and put it on the platform to get the behavior you want. When we design robots in-house now, the process already differs from the traditional one. We first define the robot's required characteristics on the simulation platform. Thanks to today's computing power we are no longer tied to physical robots, so training happens in simulation first. The requirements for the structure, for each link, and for each joint come out of that. A quadruped robot, for example, can do gait training in simulation first, and the hardware requirements are set after training. This is also the purpose of our open platform.
