[Deep learning] Maze task learning I (implementing random movement of the agent)

2022-06-29 06:10:00 【electrochemjy】

Deep reinforcement learning: the maze task, part I

- Build a maze
- The implementation of the agent

These are my study notes on deep reinforcement learning. I start with a maze task to learn the basic ideas of the reinforcement learning process.

[Maze task roadmap]
Stage 1: Implement an agent that searches the maze at random until it reaches the goal.
Stage 2: Make the agent head straight for the goal (policy iteration).
Stage 3: Value iteration (assign values to the agent's states and actions) and look for the most valuable actions and states (obtain the correct values).
PS: This post covers stage 1 first.

Build a maze

# Libraries used
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import animation
# Jupyter magic that shows figures inline; remove it when running as a plain script
%matplotlib inline

# Draw the initial state of the maze
def plot():
    fig=plt.figure(figsize=(5,5))
    ax=plt.gca()
    # Draw the walls (red lines)
    plt.plot([1,1],[0,1],color='red',linewidth=3)
    plt.plot([1,2],[2,2],color='red',linewidth=2)
    plt.plot([2,2],[2,1],color='red',linewidth=2)
    plt.plot([2,3],[1,1],color='red',linewidth=2)
    # Label the states S0 to S8
    plt.text(0.5,2.5,'S0',size=14,ha='center')
    plt.text(1.5,2.5,'S1',size=14,ha='center')
    plt.text(2.5,2.5,'S2',size=14,ha='center')
    plt.text(0.5,1.5,'S3',size=14,ha='center')
    plt.text(1.5,1.5,'S4',size=14,ha='center')
    plt.text(2.5,1.5,'S5',size=14,ha='center')
    plt.text(0.5,0.5,'S6',size=14,ha='center')
    plt.text(1.5,0.5,'S7',size=14,ha='center')
    plt.text(2.5,0.5,'S8',size=14,ha='center')
    plt.text(0.5,2.3,'START',ha='center')
    plt.text(2.5,0.3,'END',ha='center')
    # Set the drawing range and hide the ticks and tick labels
    ax.set_xlim(0,3)
    ax.set_ylim(0,3)
    # Recent matplotlib versions expect booleans here, not the strings 'on'/'off'
    plt.tick_params(axis='both',which='both',bottom=False,top=False,labelbottom=False,right=False,left=False,labelleft=False)
    # Mark the current position S0 with a green circle
    line,=ax.plot([0.5],[2.5],marker="o",color='g',markersize=60)
    # Show the figure
    plt.show()
    # Return the figure and the marker so that callers can reuse them
    return fig,line
# Check the function (this is a static display)
fig,line=plot()

[Figure: the 3x3 maze drawn by plot(), with states S0 to S8, red walls, and a green circle on the start state S0]

The implementation of the agent

The rule that determines how an agent behaves is called a policy, written pi_theta(a|s): the probability of taking action a in state s under the policy pi defined by the parameters theta.
In this task, the state s is the agent's position in the maze, and the action a is a move the agent can make from that state (up, right, down, or left). The parameters theta determine how likely each action is in each state.
The maze can therefore be described by a matrix of initial parameters, theta_0, with one row per state and one column per action.
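
In symbols, with theta stored as a matrix indexed by state and action, the policy can be written as below. This is only a sketch of the notation, assuming the simple ratio conversion that the code in this post applies to theta; A(s) is my own shorthand for the set of actions that are not blocked by a wall in state s.

```latex
\pi_\theta(a \mid s) =
\begin{cases}
\dfrac{\theta_{s,a}}{\sum_{a' \in A(s)} \theta_{s,a'}} & \text{if } a \in A(s) \\[2ex]
0 & \text{otherwise}
\end{cases}
```

For example, S3 has three open directions with theta equal to 1 each, so each of them gets probability 1/3.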

# Initial parameters of the maze: 1 means the agent can move in that direction, np.nan means a wall blocks it; columns are [up, right, down, left]
theta_0=np.array([[np.nan,1,1,np.nan], #S0
                  [np.nan,1,np.nan,1], #S1
                  [np.nan,np.nan,1,1], #S2
                  [1,1,1,np.nan], #S3
                  [np.nan,np.nan,1,1], #S4
                  [1,np.nan,np.nan,np.nan], #S5
                  [1,np.nan,np.nan,np.nan], #S6
                  [1,1,np.nan,np.nan],  #S7
                  ])   # S8 is the goal, so it has no policy row

# Convert the theta values of the open directions into probabilities
def simple_convert_into_pi_from_theta(theta):
    '''Simple ratio: divide each theta value by the sum of its row.'''
    [m,n]=theta.shape # size of the theta matrix
    pi=np.zeros((m,n))
    for i in range(0,m):
        pi[i,:]=theta[i,:]/np.nansum(theta[i,:]) # ratio over the open directions
    pi=np.nan_to_num(pi) # turn nan into 0, because the probability of moving into a wall is 0
    return pi
# Initial policy
pi_0=simple_convert_into_pi_from_theta(theta_0)
print("The initial policy is pi_0=",pi_0)

The initial policy is pi_0= [[0.         0.5        0.5        0.        ]
 [0.         0.5        0.         0.5       ]
 [0.         0.         0.5        0.5       ]
 [0.33333333 0.33333333 0.33333333 0.        ]
 [0.         0.         0.5        0.5       ]
 [1.         0.         0.         0.        ]
 [1.         0.         0.         0.        ]
 [0.5        0.5        0.         0.        ]]
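
The row-by-row loop above can also be written with NumPy broadcasting. The following is a minimal sketch, not the code used in this post: the function name convert_vectorized is hypothetical, and theta_0 is copied from above so that the block runs on its own. It also checks that every row of the resulting policy sums to 1.

```python
import numpy as np

# Same parameter matrix as theta_0 above: rows are S0..S7,
# columns are [up, right, down, left]; np.nan marks a wall.
theta_0 = np.array([
    [np.nan, 1, 1, np.nan],        # S0
    [np.nan, 1, np.nan, 1],        # S1
    [np.nan, np.nan, 1, 1],        # S2
    [1, 1, 1, np.nan],             # S3
    [np.nan, np.nan, 1, 1],        # S4
    [1, np.nan, np.nan, np.nan],   # S5
    [1, np.nan, np.nan, np.nan],   # S6
    [1, 1, np.nan, np.nan],        # S7
])

def convert_vectorized(theta):
    """Hypothetical loop-free variant of simple_convert_into_pi_from_theta."""
    row_sums = np.nansum(theta, axis=1, keepdims=True)  # shape (m, 1), nan ignored
    pi = theta / row_sums                               # broadcast over each row
    return np.nan_to_num(pi)                            # walls (nan) get probability 0

pi_0 = convert_vectorized(theta_0)

# Every row must be a valid probability distribution.
assert np.allclose(pi_0.sum(axis=1), 1.0)
print(pi_0[3])  # policy row for S3
```

Keeping the row sums with keepdims=True is what lets the division broadcast across the four action columns without an explicit loop.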

# Move the agent at random according to its current state
# Given the state index s, return the state after one move
def get_next_s(pi,s):
    direction = ["up", "right", "down", "left"]
    # Pick a direction at random with the probabilities in pi[s,:]; s is the agent's state (0-8)
    next_direction=np.random.choice(direction,p=pi[s,:])
    # Work out the next state from the chosen action
    if next_direction=="up":
        s_next = s - 3 # moving up lowers the state number by 3
    elif next_direction=="right":
        s_next = s + 1 # moving right raises the state number by 1
    elif next_direction=="down":
        s_next = s + 3 # moving down raises the state number by 3
    elif next_direction=="left":
        s_next = s - 1 # moving left lowers the state number by 1
    return s_next

# Keep moving the agent until it reaches the goal
def goal_maze(pi):
    s=0 # start at S0
    state_history=[0] # list that records the agent's trajectory
    while True:
        next_s=get_next_s(pi,s)
        state_history.append(next_s) # add the new state to the trajectory
        if next_s==8: # stop once the goal S8 is reached
            break
        else:
            s=next_s
    return state_history
# Trajectory of the agent from the start S0 to the goal S8
state_history=goal_maze(pi_0)
print("Trajectory from S0 to S8:",state_history) # varies between runs
# The history includes the start state, so the number of moves is one less than its length
print("Number of steps from S0 to S8:",len(state_history)-1) # varies between runs
# The agent moves at random according to the probabilities, so the trajectory can differ on every run

Trajectory from S0 to S8: [0, 3, 6, 3, 4, 7, 8]
Number of steps from S0 to S8: 6
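
A single run says little about how the random agent behaves on average, so it can help to repeat the walk many times. The sketch below is my own addition under stated assumptions: run_episode and ACTION_OFFSETS are hypothetical names, the seed 0 and the 1000 episodes are arbitrary choices, and it uses np.random.default_rng where the post uses np.random.choice. The shortest route in this maze is 4 moves (S0, S3, S4, S7, S8), so no episode can be shorter than that.

```python
import numpy as np

# State offsets for [up, right, down, left] on the 3x3 grid (hypothetical helper).
ACTION_OFFSETS = np.array([-3, 1, 3, -1])

# Same theta_0 as in the post; np.nan marks a wall.
theta_0 = np.array([
    [np.nan, 1, 1, np.nan],        # S0
    [np.nan, 1, np.nan, 1],        # S1
    [np.nan, np.nan, 1, 1],        # S2
    [1, 1, 1, np.nan],             # S3
    [np.nan, np.nan, 1, 1],        # S4
    [1, np.nan, np.nan, np.nan],   # S5
    [1, np.nan, np.nan, np.nan],   # S6
    [1, 1, np.nan, np.nan],        # S7
])
# Simple ratio conversion: each open direction gets an equal share.
pi_0 = np.nan_to_num(theta_0 / np.nansum(theta_0, axis=1, keepdims=True))

def run_episode(pi, rng, start=0, goal=8):
    """Random walk from start to goal; returns the list of visited states."""
    s = start
    history = [s]
    while s != goal:
        a = rng.choice(4, p=pi[s])       # sample an action index from the policy row
        s = int(s + ACTION_OFFSETS[a])   # apply the move
        history.append(s)
    return history

rng = np.random.default_rng(0)  # fixed seed (an assumption) so runs are repeatable
# Number of moves per episode = visited states minus the start state.
steps = [len(run_episode(pi_0, rng)) - 1 for _ in range(1000)]
print("shortest:", min(steps), "mean:", np.mean(steps), "longest:", max(steps))
```

The gap between the mean and the 4-move shortest route shows how much a random policy wastes, which is the motivation for the later stages.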

That completes the implementation of stage 1 of the maze task.

Original site

Copyright notice
This article was written by [electrochemjy]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/02/202202160907598108.html
