UT2017 learning notes
2022-07-03 10:31:00 【IFI_ rccsim】
1. Domain description
Compared with previous years, the main change for the 2017 RoboCup 3D Simulation League was the removal of the crowding rule.
Previously, too many players crowding around the ball would cause some of them to be sent off to stand on the sideline. The crowding rule was introduced mainly to reduce the number of collisions between robots, because many simultaneous collisions slow down the simulator and can even cause it to crash.
With the existing touching rule, under which one player is moved to the sideline whenever three or more players are touching each other, and the charging fouls added in 2016, which penalize players who run into opponents, it was decided that the crowding rule was no longer needed.
2. Technical challenges
For the fourth year in a row there was an overall technical challenge consisting of three different league challenges: a free challenge, a passing and scoring challenge, and a Gazebo running challenge. For each league challenge, a participating team was awarded points based on the following equation:
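A plausible form of this rank-based scoring rule, as used in previous years' overall technical challenges (stated here as an assumption, since the equation itself is not reproduced above), is:

$points(rank) = 25 - 20 \cdot \frac{rank - 1}{numberOfParticipants - 1}$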
2.1 Free challenge
UT Austin Villa's free challenge submission introduced the team's fast-walk kick, discussed in Section 3. In addition, the submission presented preliminary work on representing the kicking strategy as a neural network and on using deep learning with the Trust Region Policy Optimization algorithm to learn longer kicks.
Another team provided details of an optimization framework it created, the magmaOffenburg team described the 2D simulator it uses to test its strategy layer, and the AIUT3D team introduced a motion editor for 3D Simulation League agents.
2.2 Passing and scoring challenge
In the passing and scoring challenge, four players on one team attempt to pass the ball among themselves, such that each player touches the ball at least once, and then score in as little time as possible. At the start of the challenge the ball is placed at the center of the field, and the agents must start at least three meters away from the ball along the X axis. If an agent's starting position does not conform to these rules, the team receives a score of 85.
A trial ends once a goal is scored, the ball leaves the field, or 80 seconds have passed. For every distinct player judged to have kicked the ball, where a kick counts only if the ball then moves freely for at least 2.5 meters, one point is deducted from the score. If a goal is scored, one further point is deducted. If the goal is scored after the ball has been kicked by all four players, the score is instead the time, in seconds, from the start of the trial to the goal.
The goal of the challenge is to get the lowest possible score .
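As a worked example of this scoring rule, the sketch below computes the score of a single trial. Treating 85 as the baseline from which the deductions are made is an assumption inferred from the penalty given for illegal starting positions, and the function name is illustrative.

```python
def passing_challenge_score(distinct_kickers, goal_scored, goal_time=None, legal_start=True):
    """Score of one passing-and-scoring trial (lower is better).

    distinct_kickers: number of different players judged to have kicked the ball
    (the ball moved freely for at least 2.5 m after the touch).
    goal_time: seconds from the start of the trial to the goal, if one was scored.
    """
    if not legal_start:
        return 85.0                    # illegal starting position
    if goal_scored and distinct_kickers == 4:
        return goal_time               # all four players kicked before the goal
    score = 85.0                       # assumed baseline for the deductions below
    score -= distinct_kickers          # one point per distinct kicker
    if goal_scored:
        score -= 1                     # one more point for scoring a goal
    return score

# Example: a goal after only three different players kicked scores 85 - 3 - 1 = 81,
# while a goal at 25.4 s after all four players kicked scores 25.4.
print(passing_challenge_score(3, True, 30.0), passing_challenge_score(4, True, 25.4))
```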
UT Austin Villa's starting positions and strategy for the passing and scoring challenge are shown in Figure 3. Whichever agent is closest to the ball passes it to a position about one meter in front of the agent farthest from the goal, as shown by the yellow arrows in Figure 3. Once the ball has been passed forward in turn between the agents and the agent closest to the goal receives it, that agent shoots at the goal, as shown by the pink arrow in Figure 3. Agents that are not closest to the ball simply stand in place.
2.3 Gazebo Running challenge
There is an ongoing effort in the RoboCup community to develop a plugin for the Gazebo robot simulator to support the RoboCup 3D Simulation League. A challenge was therefore held in which a robot tries to walk forward as fast as possible in the Gazebo simulator for 20 seconds without falling.
UT Austin Villa optimized fast-walking parameters for the team's omnidirectional walk engine in the Gazebo simulator using the CMA-ES algorithm. The walk engine parameters were optimized over 300 generations of CMA-ES with a population size of 150. Results of the Gazebo running challenge are shown in Table 7. Each team made four run attempts, and its score is the average walking speed over its three best attempts.
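As a rough illustration of this setup, the sketch below runs CMA-ES with the stated population size and generation budget. The `cma` Python package, the parameter dimensionality, and the `run_gazebo_trial` stub are assumptions for the sketch, not the team's actual tooling.

```python
# Minimal CMA-ES walk-optimization sketch; the Gazebo evaluation is stubbed out.
import cma  # pip install cma

def run_gazebo_trial(params):
    # Placeholder for a 20-second Gazebo trial: launch the robot with the given
    # walk-engine parameters and return the distance walked (0 if it falls).
    return -sum((p - 1.0) ** 2 for p in params)  # dummy value so the sketch runs

def cost(params):
    # CMA-ES minimizes, so negate the walking performance.
    return -run_gazebo_trial(params)

x0 = [0.5] * 25  # initial walk-engine parameter vector (dimension assumed)
es = cma.CMAEvolutionStrategy(x0, 0.5, {"popsize": 150, "maxiter": 300})
while not es.stop():
    candidates = es.ask()                              # one generation of 150 candidates
    es.tell(candidates, [cost(c) for c in candidates]) # rank them and update the search
print(es.result.xbest)                                 # best parameters found
```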
3. Other supplements
Bipedal walking optimization based on proxy tasks
* Target task and proxy task
    * Sometimes in reinforcement learning the cost of sampling on the target task is too high; the target task can then be optimized by running a different task instead, and the task that is actually run is called the proxy task.
    * The method introduced in the paper pursues two goals simultaneously while learning through proxy tasks: first, optimize the target task; second, make the proxy tasks reflect the target task more and more closely.
    * Here the target task is walking during a RoboCup 3D match (`SoccerGameplay`), the proxy task is walking over a specially designed course (`ObstacleCourse`), and 25 parameters are optimized.
* Task details
    * `SoccerGameplay`: play a game against a benchmark team
        * Reward function: $reward_{SoccerGameplay}=(goalsFor-goalsAgainst)\cdot\frac{1}{2}FieldLength+avgBallXPosition$
    * `ObstacleCourse`: walking drills for a single robot, built from two kinds of segments
        * WAYPOINT: walk to a target position, timed
        * STOP: come to a stop
        * $reward_{WAYPOINT}=d_{target}\cdot\frac{t_{total}}{t_{taken}}-Fall$
        * $reward_{STOP}=-d_{moved}-Fall$
        * $Fall$ is 5 if the robot fell and 0 otherwise
* Optimization algorithm (a code sketch of this loop is given at the end of this section)
    * Every N cycles, perform the following steps once:
        * Call generateNewBasisTasks() to generate a new batch of `ObstacleCourse` tasks and add them to B, doubling its size
        * Run every parameter set in the population P on `SoccerGameplay` and on each `ObstacleCourse` task
        * Call rankBasisTasks() to compute each `ObstacleCourse` task's similarity to the `SoccerGameplay` task (the Spearman correlation coefficient between the two tasks' evaluations of the parameter sets) and eliminate the half of B with lower similarity
        * Update the parameter sets in P using the `SoccerGameplay` evaluations
    * In the remaining cycles, run every parameter set in P on the task set B and update P from those evaluations
    * P is initialized and updated with the CMA-ES algorithm
* Results
    * Training is accelerated, but the effect of updating the `ObstacleCourse` task set is not large
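The sketch below illustrates the loop outlined above. The simulator-facing functions (`runSoccerGameplay`, `runObstacleCourse`, `generateNewBasisTasks`) are stubs, and the meta-update period, population size, initial task count, and mean aggregation of proxy rewards are illustrative assumptions rather than values from the paper.

```python
# Sketch of the proxy-task optimization loop; simulator calls are stubbed out.
import numpy as np
from scipy.stats import spearmanr
import cma  # pip install cma

def runSoccerGameplay(params):
    # Placeholder for the target task: play a game against a benchmark team and
    # return (goalsFor - goalsAgainst) * FieldLength / 2 + avgBallXPosition.
    return float(np.random.randn())

def runObstacleCourse(task, params):
    # Placeholder for one ObstacleCourse proxy task, e.g. a WAYPOINT segment
    # rewarded with d_target * t_total / t_taken - Fall.
    return float(np.random.randn())

def generateNewBasisTasks(n):
    # Placeholder: sample n new ObstacleCourse task specifications.
    return [("obstacle-course-spec", i) for i in range(n)]

def rankBasisTasks(B, P, target_rewards):
    # Score each proxy task by the Spearman correlation between its rewards for
    # the parameter sets in P and the target-task rewards; keep the better half.
    corrs = [spearmanr(target_rewards,
                       [runObstacleCourse(t, p) for p in P]).correlation
             for t in B]
    keep = np.argsort(corrs)[len(B) // 2:]        # drop the less similar half
    return [B[i] for i in keep]

N = 10                                            # meta-update period (assumed)
es = cma.CMAEvolutionStrategy(np.zeros(25), 0.5,  # 25 walk parameters
                              {"popsize": 20})    # population size (assumed)
B = generateNewBasisTasks(8)                      # initial proxy task set (assumed)

for cycle in range(100):
    P = es.ask()                                  # current population of parameter sets
    if cycle % N == 0:
        B += generateNewBasisTasks(len(B))        # double the proxy task set
        target_rewards = [runSoccerGameplay(p) for p in P]
        B = rankBasisTasks(B, P, target_rewards)  # keep tasks most similar to the target
        es.tell(P, [-r for r in target_rewards])  # CMA-ES minimizes, so negate rewards
    else:
        # Ordinary cycle: evaluate only on the (cheaper) proxy tasks in B.
        proxy_rewards = [np.mean([runObstacleCourse(t, p) for t in B]) for p in P]
        es.tell(P, [-r for r in proxy_rewards])
```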