Implementation (TF2) of proximal policy optimization (PPO) for a continuous control task

dstuemk/simple-ppo-continuous

Continuous PPO with TensorFlow 2.0

A minimalistic implementation of OpenAI's proximal policy optimization (PPO) algorithm. It learns to swing up a pendulum (from OpenAI Gym). There is still much room for performance improvement: so far, training computation happens on only one GPU even if more resources are available.

Usage:

  • To train the agent, run `python main.py train`
  • To run an episode (after training), run `python main.py enjoy`

Generalized Advantage Estimation

The advantage values are calculated using the Generalized Advantage Estimation (GAE) method. When reading through the source code, one might wonder about the usage of LinearOperatorToeplitz. This operator lets us calculate the GAE values with a simple matrix-vector multiplication:

$$
\begin{pmatrix} A_1 \\ A_2 \\ \vdots \\ A_T \end{pmatrix}
=
\begin{pmatrix}
1 & \gamma\lambda & (\gamma\lambda)^2 & \cdots & (\gamma\lambda)^{T-1} \\
0 & 1 & \gamma\lambda & \cdots & (\gamma\lambda)^{T-2} \\
\vdots & & \ddots & \ddots & \vdots \\
0 & \cdots & & 0 & 1
\end{pmatrix}
\begin{pmatrix} \delta_1 \\ \delta_2 \\ \vdots \\ \delta_T \end{pmatrix}
$$

The delta terms are the temporal difference errors (TD-errors):

$$
\delta_t = r_t + \gamma V(s_{t+1}) - V(s_t)
$$
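
For illustration, here is a minimal sketch (not the repository's exact code) of how the TD-errors and GAE values can be computed with `LinearOperatorToeplitz` as a single matrix-vector product; the discount factor, λ, and the reward/value tensors below are made-up example inputs.

```python
# Minimal sketch (not the repository's exact code): TD-errors and GAE values
# via a Toeplitz matrix-vector product. gamma, lam, rewards and values are
# made-up example inputs.
import tensorflow as tf

gamma, lam = 0.99, 0.95                                   # example hyperparameters
rewards = tf.constant([0.1, -0.2, 0.3, 0.0])              # r_t (example data)
values  = tf.constant([1.0, 0.9, 0.8, 0.7, 0.6])          # V(s_t), length T+1 (last entry is the bootstrap value)
T = rewards.shape[0]

# TD-errors: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
deltas = rewards + gamma * values[1:] - values[:-1]

# Upper-triangular Toeplitz operator with (gamma*lam)^l on the l-th superdiagonal.
row = tf.pow(gamma * lam, tf.range(T, dtype=tf.float32))  # first row: 1, gl, gl^2, ...
col = tf.concat([[1.0], tf.zeros(T - 1)], axis=0)         # first column: 1, 0, ..., 0
toeplitz = tf.linalg.LinearOperatorToeplitz(col=col, row=row)

# Advantages: A_t = sum_l (gamma*lam)^l * delta_{t+l}, computed as one matvec.
advantages = toeplitz.matvec(deltas)
print(advantages.numpy())
```

The single `matvec` call replaces the usual reverse-time loop over the trajectory when accumulating the discounted TD-errors.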
