Using Deep Q-Network to Learn How To Play Flappy Bird

7 mins version: DQN for flappy bird

Overview

This project follows the description of the Deep Q Learning algorithm described in Playing Atari with Deep Reinforcement Learning [2] and shows that this learning algorithm can be further generalized to the notorious Flappy Bird.

Installation Dependencies:

  • Python 2.7 or 3
  • TensorFlow 0.7
  • pygame
  • OpenCV-Python

How to Run?

git clone https://github.com/yenchenlin1994/DeepLearningFlappyBird.git
cd DeepLearningFlappyBird
python deep_q_network.py

What is Deep Q-Network?

It is a convolutional neural network, trained with a variant of Q-learning, whose input is raw pixels and whose output is a value function estimating future rewards.

For those who are interested in deep reinforcement learning, I highly recommend reading the following post:

Demystifying Deep Reinforcement Learning

Deep Q-Network Algorithm

The pseudo-code for the Deep Q Learning algorithm, as given in [1], can be found below:

Initialize replay memory D to size N
Initialize action-value function Q with random weights θ
for episode = 1, M do
    Initialize state s_1
    for t = 1, T do
        With probability ϵ select a random action a_t
        otherwise select a_t = argmax_a Q(s_t, a; θ)
        Execute action a_t in the emulator and observe reward r_t and state s_(t+1)
        Store transition (s_t, a_t, r_t, s_(t+1)) in D
        Sample a minibatch of transitions (s_j, a_j, r_j, s_(j+1)) from D
        Set y_j :=
            r_j                                      for terminal s_(j+1)
            r_j + γ * max_(a') Q(s_(j+1), a'; θ)     for non-terminal s_(j+1)
        Perform a gradient descent step on (y_j - Q(s_j, a_j; θ))^2 with respect to θ
    end for
end for
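
As a rough, framework-agnostic illustration of the target computation in the inner loop, the y_j for a sampled minibatch could be formed as follows (NumPy; the names and the discount value are illustrative, not taken from this repository's code):

import numpy as np

GAMMA = 0.99  # discount factor γ (illustrative value)

def compute_targets(reward_batch, terminal_batch, q_next_batch, gamma=GAMMA):
    # reward_batch:   shape (batch,), rewards r_j
    # terminal_batch: shape (batch,), True where s_(j+1) is terminal
    # q_next_batch:   shape (batch, num_actions), Q(s_(j+1), a'; θ)
    max_q_next = q_next_batch.max(axis=1)                    # max over a'
    targets = reward_batch + gamma * max_q_next              # non-terminal case
    targets[terminal_batch] = reward_batch[terminal_batch]   # terminal case: y_j = r_j
    return targets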

Experiments

Environment

Since the deep Q-network is trained on the raw pixel values observed from the game screen at each time step, [3] finds that removing the background that appears in the original game makes it converge faster. This process can be visualized in the following figure:

Network Architecture

According to [1], I first preprocessed the game screens with the following steps (a minimal code sketch is given after the list):

  1. Convert image to grayscale
  2. Resize image to 80x80
  3. Stack last 4 frames to produce an 80x80x4 input array for network
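
Assuming OpenCV (cv2) and NumPy, this preprocessing could be sketched roughly as follows; the function names are illustrative, and the repository's actual code may differ in details (for instance, it may also binarize the image with a threshold):

import cv2
import numpy as np

def preprocess_frame(frame):
    # 1. convert the raw game frame to grayscale
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # 2. resize to 80x80
    return cv2.resize(gray, (80, 80))

def make_state(last_four_frames):
    # 3. stack the last 4 preprocessed frames into an 80x80x4 input array
    return np.stack(last_four_frames, axis=2)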

The architecture of the network is shown in the figure below. The first layer convolves the input image with an 8x8x4x32 kernel at a stride size of 4. The output is then put through a 2x2 max pooling layer. The second layer convolves with a 4x4x32x64 kernel at a stride of 2. We then max pool again. The third layer convolves with a 3x3x64x64 kernel at a stride of 1. We then max pool one more time. The last hidden layer consists of 256 fully connected ReLU nodes.
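
For illustration, below is a minimal sketch of this architecture written against the TensorFlow 1.x API (the repository itself targets TensorFlow 0.7, and its actual layer definitions may differ, e.g. in padding or in where pooling is applied); all names are illustrative:

import numpy as np
import tensorflow as tf

def weight_var(shape):
    # weights drawn from a normal distribution with std 0.01 (see the Training section)
    return tf.Variable(tf.truncated_normal(shape, stddev=0.01))

def bias_var(shape):
    return tf.Variable(tf.constant(0.01, shape=shape))

def build_network(num_actions):
    s = tf.placeholder(tf.float32, [None, 80, 80, 4])  # stacked 80x80x4 input frames

    # conv1: 8x8x4x32 kernel, stride 4, then 2x2 max pooling
    conv1 = tf.nn.relu(tf.nn.conv2d(s, weight_var([8, 8, 4, 32]),
                                    strides=[1, 4, 4, 1], padding="SAME") + bias_var([32]))
    pool1 = tf.nn.max_pool(conv1, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding="SAME")

    # conv2: 4x4x32x64 kernel, stride 2, then 2x2 max pooling
    conv2 = tf.nn.relu(tf.nn.conv2d(pool1, weight_var([4, 4, 32, 64]),
                                    strides=[1, 2, 2, 1], padding="SAME") + bias_var([64]))
    pool2 = tf.nn.max_pool(conv2, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding="SAME")

    # conv3: 3x3x64x64 kernel, stride 1, then 2x2 max pooling
    conv3 = tf.nn.relu(tf.nn.conv2d(pool2, weight_var([3, 3, 64, 64]),
                                    strides=[1, 1, 1, 1], padding="SAME") + bias_var([64]))
    pool3 = tf.nn.max_pool(conv3, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding="SAME")

    # fully connected hidden layer with 256 ReLU units
    flat_dim = int(np.prod(pool3.get_shape().as_list()[1:]))
    flat = tf.reshape(pool3, [-1, flat_dim])
    fc = tf.nn.relu(tf.matmul(flat, weight_var([flat_dim, 256])) + bias_var([256]))

    # output layer: one Q value per valid action (index 0 = do nothing)
    q_values = tf.matmul(fc, weight_var([256, num_actions])) + bias_var([num_actions])
    return s, q_values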

The final output layer has the same dimensionality as the number of valid actions that can be performed in the game, where the 0th index always corresponds to doing nothing. The values at this output layer represent the Q function of the input state for each valid action. At each time step, the network selects an action with an ϵ-greedy policy: with probability ϵ it takes a random action, and otherwise it performs the action with the highest Q value.
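
A minimal sketch of this ϵ-greedy selection (NumPy; names are illustrative):

import numpy as np

def select_action(q_values, epsilon, num_actions):
    # with probability epsilon, explore with a random action ...
    if np.random.rand() < epsilon:
        return np.random.randint(num_actions)
    # ... otherwise exploit the action with the highest Q value
    return int(np.argmax(q_values))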

Training

At first, I initialize all weight matrices randomly using a normal distribution with a standard deviation of 0.01, then set up the replay memory with a maximum size of 50,000 experiences.
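
The replay memory can be kept as a simple bounded FIFO buffer; one possible sketch uses collections.deque (names are illustrative, and this is not necessarily how the repository implements it):

import random
from collections import deque

REPLAY_MEMORY_SIZE = 50000  # maximum number of stored experiences

replay_memory = deque(maxlen=REPLAY_MEMORY_SIZE)

def store_transition(s_t, a_t, r_t, s_t1, terminal):
    # the oldest experience is dropped automatically once the buffer is full
    replay_memory.append((s_t, a_t, r_t, s_t1, terminal))

def sample_minibatch(batch_size=32):
    return random.sample(replay_memory, batch_size)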

I start training by choosing actions uniformly at random for the first 10,000 time steps, without updating the network weights. This allows the system to populate the replay memory before training begins.

Note that unlike [1], which initializes ϵ = 1, I linearly anneal ϵ from 0.1 to 0.0001 over the course of the next 3,000,000 frames. The reason I set it this way is that the agent can choose an action every 0.03 s (FPS = 30) in our game, so a high ϵ makes it flap too much, which keeps it at the top of the game screen until it finally bumps into a pipe in a clumsy way. This makes the Q function converge relatively slowly, since it only starts to see other situations once ϵ is low. However, in other games, initializing ϵ to 1 is more reasonable.
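
Linearly annealing ϵ from 0.1 to 0.0001 over 3,000,000 frames just means subtracting a fixed amount per time step; a minimal sketch using the same constants as the FAQ below:

INITIAL_EPSILON = 0.1     # starting exploration rate
FINAL_EPSILON = 0.0001    # exploration rate after annealing
EXPLORE = 3000000         # number of frames over which epsilon is annealed

def anneal_epsilon(epsilon):
    # decrease epsilon by a fixed step until it reaches FINAL_EPSILON
    if epsilon > FINAL_EPSILON:
        epsilon -= (INITIAL_EPSILON - FINAL_EPSILON) / EXPLORE
    return max(epsilon, FINAL_EPSILON)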

During training, at each time step the network samples minibatches of size 32 from the replay memory to train on, and performs a gradient step on the loss function described above using the Adam optimization algorithm with a learning rate of 0.000001. After annealing finishes, the network continues to train indefinitely, with ϵ fixed at 0.0001.
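
Continuing the TensorFlow 1.x-style sketch from the Network Architecture section, the loss and optimizer could be wired up roughly like this (illustrative, not the repository's exact code):

import tensorflow as tf

def build_train_step(q_values, num_actions, learning_rate=1e-6):
    a = tf.placeholder(tf.float32, [None, num_actions])  # one-hot encoding of the chosen actions
    y = tf.placeholder(tf.float32, [None])                # targets y_j

    # Q(s_j, a_j): pick out the Q value of the action actually taken
    q_action = tf.reduce_sum(q_values * a, axis=1)

    # squared error between targets and predicted Q values, minimized with Adam
    loss = tf.reduce_mean(tf.square(y - q_action))
    train_step = tf.train.AdamOptimizer(learning_rate).minimize(loss)
    return a, y, loss, train_step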

FAQ

Checkpoint not found

Change the first line of saved_networks/checkpoint to

model_checkpoint_path: "saved_networks/bird-dqn-2920000"

How to reproduce?

  1. Comment out these lines

  2. Modify deep_q_network.py's parameters as follows:

OBSERVE = 10000
EXPLORE = 3000000
FINAL_EPSILON = 0.0001
INITIAL_EPSILON = 0.1

References

[1] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, and Demis Hassabis. Human-level Control through Deep Reinforcement Learning. Nature, 518(7540):529–533, 2015.

[2] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, and Martin Riedmiller. Playing Atari with Deep Reinforcement Learning. NIPS Deep Learning Workshop, 2013.

[3] Kevin Chen. Deep Reinforcement Learning for Flappy Bird. Report | YouTube result.

Disclaimer

This work is based heavily on the following repos:

  1. sourabhv/FlapPyBird (https://github.com/sourabhv/FlapPyBird)
  2. asrivat1/DeepLearningVideoGames