UT2017 learning notes
2022-07-03 10:31:00 【IFI_ rccsim】
1. Domain description
Compared with previous years, the main change for the 2017 RoboCup 3D Simulation League was the removal of the crowding rule.
Previously, too many players crowding around the ball would cause some of them to be sent off to stand on the sideline. The crowding rule was introduced mainly to reduce the number of collisions between robots, because many simultaneous collisions slow down the simulator and can even cause it to crash.
With the existing touching rule, under which one player is moved to the sideline whenever three or more players are touching each other, and the charging fouls added in 2016, which penalize players who run into opponents, it was decided that the crowding rule was no longer needed.
2. Technical challenges
For the fourth year in a row there was an overall technical challenge consisting of three different league challenges: a free challenge, a passing and scoring challenge, and a Gazebo running challenge. For each league challenge, a participating team was awarded points based on the following equation:
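A plausible form of this rank-based scoring rule, as used in previous years' overall technical challenges (stated here as an assumption, since the equation itself is not reproduced above), is:

$points(rank) = 25 - 20 \cdot \frac{rank - 1}{numberOfParticipants - 1}$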
2.1 Free challenge
UT Austin Villa's free challenge submission introduced the team's fast-walk kick, discussed in Section 3. In addition, the submission presented preliminary work on representing the kicking strategy as a neural network and on using deep learning with the Trust Region Policy Optimization algorithm to learn longer kicks.
Another team provided details of an optimization framework it created, the magmaOffenburg team described the 2D simulator it uses to test its strategy layer, and the AIUT3D team introduced a motion editor for 3D Simulation League agents.
2.2 Passing and scoring challenge
In the passing and scoring challenge, four players on one team attempt to pass the ball among themselves, such that each player touches the ball at least once, and then score in as little time as possible. At the start of the challenge the ball is placed at the center of the field, and the agents must start at least three meters away from the ball along the X axis. If an agent's starting position does not conform to these rules, the team receives a score of 85.
A trial ends once a goal is scored, the ball leaves the field, or 80 seconds have passed. For every distinct player judged to have kicked the ball, where a kick counts only if the ball then moves freely for at least 2.5 meters, one point is deducted from the score. If a goal is scored, one further point is deducted. If the goal is scored after the ball has been kicked by all four players, the score is instead the time, in seconds, from the start of the trial to the goal.
The goal of the challenge is to get the lowest possible score .
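As a worked example of this scoring rule, the sketch below computes the score of a single trial. Treating 85 as the baseline from which the deductions are made is an assumption inferred from the penalty given for illegal starting positions, and the function name is illustrative.

```python
def passing_challenge_score(distinct_kickers, goal_scored, goal_time=None, legal_start=True):
    """Score of one passing-and-scoring trial (lower is better).

    distinct_kickers: number of different players judged to have kicked the ball
    (the ball moved freely for at least 2.5 m after the touch).
    goal_time: seconds from the start of the trial to the goal, if one was scored.
    """
    if not legal_start:
        return 85.0                    # illegal starting position
    if goal_scored and distinct_kickers == 4:
        return goal_time               # all four players kicked before the goal
    score = 85.0                       # assumed baseline for the deductions below
    score -= distinct_kickers          # one point per distinct kicker
    if goal_scored:
        score -= 1                     # one more point for scoring a goal
    return score

# Example: a goal after only three different players kicked scores 85 - 3 - 1 = 81,
# while a goal at 25.4 s after all four players kicked scores 25.4.
print(passing_challenge_score(3, True, 30.0), passing_challenge_score(4, True, 25.4))
```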
UT Austin Villa's starting positions and strategy for the passing and scoring challenge are shown in Figure 3. Whichever agent is closest to the ball passes it to a position about one meter in front of the agent farthest from the goal, as shown by the yellow arrows in Figure 3. Once the ball has been passed forward in turn between the agents and the agent closest to the goal receives it, that agent shoots at the goal, as shown by the pink arrow in Figure 3. Agents that are not closest to the ball simply stand in place.
2.3 Gazebo Running challenge
There is an ongoing effort in the RoboCup community to develop a plugin for the Gazebo robot simulator to support the RoboCup 3D Simulation League. A challenge was therefore held in which a robot tries to walk forward as fast as possible in the Gazebo simulator for 20 seconds without falling.
UT Austin Villa optimized fast-walking parameters for the team's omnidirectional walk engine in the Gazebo simulator using the CMA-ES algorithm. The walk engine parameters were optimized over 300 generations of CMA-ES with a population size of 150. Results of the Gazebo running challenge are shown in Table 7. Each team made four run attempts, and its score is the average walking speed over its three best attempts.
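As a rough illustration of this setup, the sketch below runs CMA-ES with the stated population size and generation budget. The `cma` Python package, the parameter dimensionality, and the `run_gazebo_trial` stub are assumptions for the sketch, not the team's actual tooling.

```python
# Minimal CMA-ES walk-optimization sketch; the Gazebo evaluation is stubbed out.
import cma  # pip install cma

def run_gazebo_trial(params):
    # Placeholder for a 20-second Gazebo trial: launch the robot with the given
    # walk-engine parameters and return the distance walked (0 if it falls).
    return -sum((p - 1.0) ** 2 for p in params)  # dummy value so the sketch runs

def cost(params):
    # CMA-ES minimizes, so negate the walking performance.
    return -run_gazebo_trial(params)

x0 = [0.5] * 25  # initial walk-engine parameter vector (dimension assumed)
es = cma.CMAEvolutionStrategy(x0, 0.5, {"popsize": 150, "maxiter": 300})
while not es.stop():
    candidates = es.ask()                              # one generation of 150 candidates
    es.tell(candidates, [cost(c) for c in candidates]) # rank them and update the search
print(es.result.xbest)                                 # best parameters found
```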
3. Other supplements
Bipedal walking optimization based on proxy tasks
* Target task and proxy task
    * Sometimes in reinforcement learning the cost of sampling on the target task is too high; the target task can then be optimized by running a different task instead, and the task that is actually run is called the proxy task.
    * The method introduced in the paper pursues two goals simultaneously while learning through proxy tasks: first, optimize the target task; second, make the proxy tasks reflect the target task more and more closely.
    * Here the target task is walking during a RoboCup 3D match (`SoccerGameplay`), the proxy task is walking over a specially designed course (`ObstacleCourse`), and 25 parameters are optimized.
* Task details
    * `SoccerGameplay`: play a game against a benchmark team
        * Reward function: $reward_{SoccerGameplay}=(goalsFor-goalsAgainst)\cdot\frac{1}{2}FieldLength+avgBallXPosition$
    * `ObstacleCourse`: walking drills for a single robot, built from two kinds of segments
        * WAYPOINT: walk to a target position, timed
        * STOP: come to a stop
        * $reward_{WAYPOINT}=d_{target}\cdot\frac{t_{total}}{t_{taken}}-Fall$
        * $reward_{STOP}=-d_{moved}-Fall$
        * $Fall$ is 5 if the robot fell and 0 otherwise
* Optimization algorithm (a code sketch of this loop is given at the end of this section)
    * Every N cycles, perform the following steps once:
        * Call generateNewBasisTasks() to generate a new batch of `ObstacleCourse` tasks and add them to B, doubling its size
        * Run every parameter set in the population P on `SoccerGameplay` and on each `ObstacleCourse` task
        * Call rankBasisTasks() to compute each `ObstacleCourse` task's similarity to the `SoccerGameplay` task (the Spearman correlation coefficient between the two tasks' evaluations of the parameter sets) and eliminate the half of B with lower similarity
        * Update the parameter sets in P using the `SoccerGameplay` evaluations
    * In the remaining cycles, run every parameter set in P on the task set B and update P from those evaluations
    * P is initialized and updated with the CMA-ES algorithm
* Results
    * Training is accelerated, but the effect of updating the `ObstacleCourse` task set is not large
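The sketch below illustrates the loop outlined above. The simulator-facing functions (`runSoccerGameplay`, `runObstacleCourse`, `generateNewBasisTasks`) are stubs, and the meta-update period, population size, initial task count, and mean aggregation of proxy rewards are illustrative assumptions rather than values from the paper.

```python
# Sketch of the proxy-task optimization loop; simulator calls are stubbed out.
import numpy as np
from scipy.stats import spearmanr
import cma  # pip install cma

def runSoccerGameplay(params):
    # Placeholder for the target task: play a game against a benchmark team and
    # return (goalsFor - goalsAgainst) * FieldLength / 2 + avgBallXPosition.
    return float(np.random.randn())

def runObstacleCourse(task, params):
    # Placeholder for one ObstacleCourse proxy task, e.g. a WAYPOINT segment
    # rewarded with d_target * t_total / t_taken - Fall.
    return float(np.random.randn())

def generateNewBasisTasks(n):
    # Placeholder: sample n new ObstacleCourse task specifications.
    return [("obstacle-course-spec", i) for i in range(n)]

def rankBasisTasks(B, P, target_rewards):
    # Score each proxy task by the Spearman correlation between its rewards for
    # the parameter sets in P and the target-task rewards; keep the better half.
    corrs = [spearmanr(target_rewards,
                       [runObstacleCourse(t, p) for p in P]).correlation
             for t in B]
    keep = np.argsort(corrs)[len(B) // 2:]        # drop the less similar half
    return [B[i] for i in keep]

N = 10                                            # meta-update period (assumed)
es = cma.CMAEvolutionStrategy(np.zeros(25), 0.5,  # 25 walk parameters
                              {"popsize": 20})    # population size (assumed)
B = generateNewBasisTasks(8)                      # initial proxy task set (assumed)

for cycle in range(100):
    P = es.ask()                                  # current population of parameter sets
    if cycle % N == 0:
        B += generateNewBasisTasks(len(B))        # double the proxy task set
        target_rewards = [runSoccerGameplay(p) for p in P]
        B = rankBasisTasks(B, P, target_rewards)  # keep tasks most similar to the target
        es.tell(P, [-r for r in target_rewards])  # CMA-ES minimizes, so negate rewards
    else:
        # Ordinary cycle: evaluate only on the (cheaper) proxy tasks in B.
        proxy_rewards = [np.mean([runObstacleCourse(t, p) for t in B]) for p in P]
        es.tell(P, [-r for r in proxy_rewards])
```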