An introduction of Markov decision process (MDP) and two algorithms that solve MDPs (value iteration, policy iteration) along with their Python implementations.

Overview

Markov Decision Process

A Markov decision process (MDP), by definition, is a sequential decision problem for a fully observable, stochastic environment with a Markovian transition model and additive rewards. It consists of a set of states, a set of actions, a transition model, and a reward function. Here's an example.

This is a simple 4 x 3 environment, and each block represents a state. The agent can move left, right, up, or down from a state. The "intended" outcome occurs with probability 0.8, but with probability 0.2 the agent moves at right angles to the intended direction. A collision with the wall or boundary results in no movement. The two terminal states have reward +1 and -1, respectively, and all other states have a constant reward (e.g. of -0.01).

Also, a MDP usually has a discount factor γ , a number between 0 and 1, that describes the preference of an agent for current rewards over future rewards.

Policy

A solution to a MDP is called a policy π(s). It specifies an action for each state s. In a MDP, we aim to find the optimal policy that yields the highest expected utility. Here's an example of a policy.

Value Iteration

Value iteration is an algorithm that gives an optimal policy for a MDP. It calculates the utility of each state, which is defined as the expected sum of discounted rewards from that state onward.

This is called the Bellman equation. For example, the utility of the state (1, 1) in the MDP example shown above is:

For n states, there are n Bellman equations with n unknowns (the utilities of states). To solve this system of equations, value iteration uses an iterative approach that repeatedly updates the utility of each state (starting from zero) until an equilibrium is reached (converge). The iteration step, called a Bellman update, looks like this:

Here's the pseudocode for calculating the utilities of states.

Then, after the utilities of states are calculated, we can use them to select an optimal action for each state.

valueIteration.py contains the Python implementation of this algorithm for the MDP example shown above.

(Modified) Policy Iteration

Policy iteration is another algorithm that solves MDPs. It starts with a random policy and alternates the following two steps until the policy improvement step yields no change:

(1) Policy evaluation: given a policy, calculate the utility U(s) of each state s if the policy is executed;

(2) Policy improvement: update the policy based on U(s).

For the policy evaluation step, we use a simplified version of the Bellman equation to calculate the utility of each state.

For n states, there are n linear equations with n unknowns (the utilities of states), which can be solved in O(n^3) time. To make the algorithm more efficient, we can perform some number of simplified Bellman updates (simplified because the policy is fixed) to get an approximation of the utilities instead of calculating the exact solutions.

Here's the pseudocode for policy iteration.

policyIteration.py contains the Python implementation of this algorithm for the MDP example shown above.

Reference

Stuart Russell and Peter Norvig. Artificial Intelligence: A Modern Approach (3rd ed.).

Owner
Yu Shen
UCLA '24
Yu Shen
Basic auth for Django.

Basic auth for Django.

bichanna 2 Mar 25, 2022
Pingo provides a uniform API to program devices like the Raspberry Pi, BeagleBone Black, pcDuino etc.

Pingo provides a uniform API to program devices like the Raspberry Pi, BeagleBone Black, pcDuino etc. just like the Python DBAPI provides an uniform API for database programming in Python.

Garoa Hacker Clube 12 May 22, 2022
Django-registration (redux) provides user registration functionality for Django websites.

Description: Django-registration provides user registration functionality for Django websites. maintainers: Macropin, DiCato, and joshblum contributor

Andrew Cutler 920 Jan 08, 2023
Login-python - Login system made in Python, using native libraries

login-python Sistema de login feito 100% em Python, utilizando bibliotecas nativ

Nicholas Gabriel De Matos Leal 2 Jan 28, 2022
Simple Login - Login Extension for Flask - maintainer @cuducos

Login Extension for Flask The simplest way to add login to flask! How it works First, install it from PyPI: $ pip install flask_simplelogin Then, use

Flask Extensions 181 Jan 01, 2023
Implementation of Supervised Contrastive Learning with AMP, EMA, SWA, and many other tricks

SupCon-Framework The repo is an implementation of Supervised Contrastive Learning. It's based on another implementation, but with several differencies

Ivan Panshin 132 Dec 14, 2022
Integrated set of Django applications addressing authentication, registration, account management as well as 3rd party (social) account authentication.

Welcome to django-allauth! Integrated set of Django applications addressing authentication, registration, account management as well as 3rd party (soc

Raymond Penners 7.7k Jan 03, 2023
Python module for generating and verifying JSON Web Tokens

python-jwt Module for generating and verifying JSON Web Tokens. Note: From version 2.0.1 the namespace has changed from jwt to python_jwt, in order to

David Halls 210 Dec 24, 2022
Luca Security Concept

Luca Security Concept This is the document source of luca's security concept. Please go here for the HTML version: https://luca-app.de/securityconcept

luca 43 Oct 22, 2022
Two factor authentication system using azure services and python language and its api's

FUTURE READY TALENT VIRTUAL INTERSHIP PROJECT PROJECT NAME - TWO FACTOR AUTHENTICATION SYSTEM Resources used: * Azure functions(python)

BHUSHAN SATISH DESHMUKH 1 Dec 10, 2021
python-social-auth and oauth2 support for django-rest-framework

Django REST Framework Social OAuth2 This module provides OAuth2 social authentication support for applications in Django REST Framework. The aim of th

1k Dec 22, 2022
Google Auth Python Library

Google Auth Python Library This library simplifies using Google's various server-to-server authentication mechanisms to access Google APIs. Installing

Google APIs 598 Jan 07, 2023
Social auth made simple

Python Social Auth Python Social Auth is an easy-to-setup social authentication/registration mechanism with support for several frameworks and auth pr

Matías Aguirre 2.8k Dec 24, 2022
Per object permissions for Django

django-guardian django-guardian is an implementation of per object permissions [1] on top of Django's authorization backend Documentation Online docum

3.3k Jan 01, 2023
Mock authentication API that acceccpts email and password and returns authentication result.

Mock authentication API that acceccpts email and password and returns authentication result.

Herman Shpryhau 1 Feb 11, 2022
Cack facebook tidak login

Cack facebook tidak login

Angga Kurniawan 5 Dec 12, 2021
Auth for use with FastAPI

FastAPI Auth Pluggable auth for use with FastAPI Supports OAuth2 Password Flow Uses JWT access and refresh tokens 100% mypy and test coverage Supports

David Montague 95 Jan 02, 2023
Django CAS 1.0/2.0/3.0 client authentication library, support Django 2.0, 2.1, 2.2, 3.0 and Python 3.5+

django-cas-ng django-cas-ng is Django CAS (Central Authentication Service) 1.0/2.0/3.0 client library to support SSO (Single Sign On) and Single Logou

django-cas-ng 347 Dec 18, 2022
Basic auth for Django.

easy-basicauth WARNING! THIS LIBRARY IS IN PROGRESS! ANYTHING CAN CHANGE AT ANY MOMENT WITHOUT ANY NOTICE! Installation pip install easy-basicauth Usa

bichanna 2 Mar 25, 2022
python implementation of JSON Web Signatures

python-jws 🚨 This is Unmaintained 🚨 This library is unmaintained and you should probably use For histo

Brian J Brennan 57 Apr 18, 2022