A newborn robot dog learns to walk after an hour of rolling around: the latest work from Andrew Ng's first PhD student
2022-07-01 15:39:00 【QbitAl】
Ming Min, reporting from Aofeisi
QbitAI | WeChat official account QbitAI
Now, let a robot dog roll around on its own for an hour, and it can learn to walk!

Its gait looks quite respectable:

It can also withstand being shoved around with a big stick:

Even when it is knocked flat on its back, it turns over and stands up again:

So it seems that training a robot dog is not so different from training a real one.

This is the latest work from UC Berkeley: robots train and learn directly in the real world, with no reliance on a simulator.
Using this method, the researchers trained 4 robots in a short time.

One is the robot dog shown at the beginning, which learned to walk in 1 hour.
There are also 2 robot arms which, after 8 to 10 hours of real-world grasping practice, perform close to human level.

And there is a small robot with computer vision which, after 2 hours of exploring on its own, can roll smoothly to a specified position.

The study comes from Pieter Abbeel and colleagues. Abbeel was Andrew Ng's first PhD student and recently received the 2021 ACM Prize in Computing.
All of the software infrastructure for this approach has been open-sourced.
An algorithm called “Dreamer”
The pipeline of this method can be roughly divided into 4 steps:

Step 1: Put the robot in the real environment and collect data.
Step 2: Feed the data into a replay buffer. This step trains on historical data, “summing up experience” so that the collected samples are used efficiently.
Step 3: The world model learns from the accumulated experience and then “imagines” a policy.
Step 4: An actor-critic algorithm is then used to improve the performance of the policy gradient method.
The cycle then repeats, with the refined policy applied back to the robot, so that the robot ends up appearing to “explore and learn by itself”.
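The four-step loop above can be sketched in a few lines of Python. This is only an illustrative toy, not the authors' implementation: `ToyEnv`, `ToyWorldModel`, `ReplayBuffer`, and `train` are hypothetical names, the "robot" is a point on a line that should reach position 5, and a greedy one-step lookahead inside the learned model stands in for the actor-critic step.

```python
import random
from collections import deque


class ReplayBuffer:
    """Step 2: fixed-size store of past transitions that can be re-sampled."""

    def __init__(self, capacity):
        self.storage = deque(maxlen=capacity)

    def add(self, transition):
        self.storage.append(transition)

    def sample(self, batch_size, rng):
        items = list(self.storage)
        return rng.sample(items, min(batch_size, len(items)))


class ToyEnv:
    """Hypothetical stand-in for the real robot: the state moves by the action."""

    def __init__(self):
        self.state = 0.0

    def step(self, action):
        self.state += action
        reward = -abs(self.state - 5.0)  # being closer to position 5 is better
        return self.state, reward


class ToyWorldModel:
    """Step 3: learns next_state ~ state + gain * action from replayed data."""

    def __init__(self):
        self.gain = 0.0

    def update(self, batch, lr=0.1):
        for state, action, next_state, _reward in batch:
            error = (state + self.gain * action) - next_state
            self.gain -= lr * error * action  # gradient step on squared error

    def imagine(self, state, action):
        return state + self.gain * action


def train(steps=200, explore_steps=20, seed=0):
    rng = random.Random(seed)
    env, buffer, model = ToyEnv(), ReplayBuffer(1000), ToyWorldModel()
    actions = [-1.0, 1.0]
    for t in range(steps):
        state = env.state
        if t < explore_steps:
            action = rng.choice(actions)  # Step 1: collect data by exploring
        else:
            # Step 4 (simplified): pick the action whose imagined outcome
            # is closest to the goal; a real system would train an
            # actor-critic on imagined rollouts instead.
            action = max(actions, key=lambda a: -abs(model.imagine(state, a) - 5.0))
        next_state, reward = env.step(action)
        buffer.add((state, action, next_state, reward))
        model.update(buffer.sample(16, rng))
    return env, model
```

Calling `train()` returns the environment and the learned model. In this toy setting the model's `gain` approaches 1, which is the true dynamics, and the state ends up hovering around the goal position.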
Specifically, the core component here is the World Model.
World Models is a fast unsupervised learning method proposed in 2018 by David Ha and colleagues; it received an oral presentation at NIPS 2018.
Its core idea is that humans form a mental model of the world based on past experience, and that our decisions and actions are based on this internal model.
For example, when people play baseball, they have to react faster than visual information can reach the brain. They can still hit the ball correctly because the brain has made an instinctive prediction.

Earlier, building on this kind of World Model “imagination”-based learning, Google proposed Dreamer, a scalable reinforcement learning method.
The method proposed this time builds on that work and is called DayDreamer.
(So perhaps it could be called a daydreamer?)

Specifically, a World Model is a model of an agent.
It includes a visual perception component, which compresses an image into a low-dimensional representation vector used as model input.
It also has a memory component, which predicts future representation vectors from historical information.
Finally, it includes a decision component, which decides what action to take based on the representation vectors from the visual perception component and the memory component.
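To make the three components concrete, here is a minimal Python sketch under strong simplifying assumptions. The names `encode`, `Memory`, and `decide` are hypothetical, and the hand-written averaging, moving-average, and scoring rules replace the neural networks a real World Model would learn.

```python
def encode(image):
    """Vision: compress an image (a list of equal-length rows) into a
    2-number representation: mean brightness of the left and right halves."""
    half = len(image[0]) // 2
    left = [pixel for row in image for pixel in row[:half]]
    right = [pixel for row in image for pixel in row[half:]]
    return [sum(left) / len(left), sum(right) / len(right)]


class Memory:
    """Memory: an exponential moving average over past representations,
    used here as a crude prediction of the next representation."""

    def __init__(self, dim, decay=0.5):
        self.h = [0.0] * dim
        self.decay = decay

    def update(self, z):
        self.h = [self.decay * h + (1 - self.decay) * zi
                  for h, zi in zip(self.h, z)]
        return self.h


def decide(z, h):
    """Decision: score each action from the current representation z and
    the memory state h, then pick the highest-scoring action."""
    scores = {
        "left": z[0] + h[0],
        "right": z[1] + h[1],
    }
    return max(scores, key=scores.get)


# Usage: an image that is bright on its right half.
image = [[0.0, 0.0, 1.0, 1.0],
         [0.0, 0.0, 1.0, 1.0]]
memory = Memory(dim=2)
z = encode(image)       # [0.0, 1.0]
h = memory.update(z)    # [0.0, 0.5]
action = decide(z, h)   # "right"
```

The point of the sketch is the data flow: the decision step never sees raw pixels, only the compressed vector and the memory state.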

Now, let's return to the method proposed by the UC Berkeley researchers.
It is not hard to see that the World Model Learning part is a process of accumulating experience, while the Behavior Learning part is a process of producing actions.

The method in this paper mainly addresses two problems in robot training:
efficiency and accuracy.
Generally speaking, the conventional way to train robots is reinforcement learning, which adjusts the robot's behavior through repeated trial and error.
But this approach often requires a very large number of trials to achieve good results.
It is not only inefficient but also expensive to train.
Later, many people proposed training robots in simulators, which can raise efficiency and cut costs.
But the authors argue that simulator-based training still falls short on accuracy, and that only the real environment lets the robot achieve the best results.
As for the results, when training the robot dog it took only 10 minutes for the dog to adapt its behavior.
Compared with the SAC method, the results are significantly better.

In training the robot arms, the new method also overcame the challenges of visual localization and sparse rewards, and after a few hours of training its results were clearly better than those of other methods.

Research team
It is worth mentioning that the research team behind these new results is also eye-catching.
Among them, Pieter Abbeel was Andrew Ng's first PhD student.

He is now a professor of electrical engineering and computer science at UC Berkeley, director of the Berkeley Robot Learning Lab, and co-director of the Berkeley AI research lab, and he previously worked at OpenAI.
Not long ago, he also received the 2021 ACM Prize in Computing in recognition of his contributions to robot learning.
He is also a co-founder of the AI robotics company Covariant.

Another author, Ken Goldberg, is also a top expert in the AI field.

He is now a professor of engineering at UC Berkeley, and his research covers reinforcement learning, human-computer interaction, and related areas.
In 2005, he was elected an IEEE Fellow.
Goldberg is also an artist and the founder of UC Berkeley's Art, Technology, and Culture Colloquium.
In addition, Philipp Wu, Alejandro Escontrela, and Danijar Hafner worked on the project together.
Of the three, Philipp Wu is only a senior undergraduate at UC Berkeley.
One More Thing
While watching the robot dog training video, we noticed that the researchers used a Unitree robot dog.

The brand belongs to the Chinese company Yushu Technology (Unitree). The robot calves that appeared on the Spring Festival Gala were also made by the company.

Recently, a video of a group test of Unitree's Go1 robot dogs was also released and became popular abroad.
Paper:
https://danijar.com/project/daydreamer/
Reference:
https://worldmodels.github.io/