doctorblinch/Reinforcement-Learning

Reinforcement-Learning

Value-based

Value-based report link

This part covers standard methods for value-based tabular reinforcement learning, namely Q-learning, SARSA, n-step Q-learning, and Monte Carlo methods, and compares them to the classic dynamic-programming approach. Hyperparameter tuning and the exploration/exploitation trade-off are then discussed, and on-policy and off-policy algorithms are compared experimentally and analysed.
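To make the tabular setting concrete, here is a minimal sketch of the Q-learning update on a hypothetical five-state chain MDP (the environment, reward scheme, and hyperparameter values are illustrative, not taken from the repository):

```python
import numpy as np

# Hypothetical 5-state chain: action 0 moves left, action 1 moves right;
# reward 1 for reaching the terminal right end, 0 otherwise.
N_STATES, N_ACTIONS = 5, 2
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.1
rng = np.random.default_rng(0)

def step(s, a):
    s_next = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    reward = 1.0 if s_next == N_STATES - 1 else 0.0
    return s_next, reward, s_next == N_STATES - 1

# optimistic initialization encourages systematic early exploration
Q = np.ones((N_STATES, N_ACTIONS))
for _ in range(500):
    s, done = 0, False
    while not done:
        # epsilon-greedy action selection
        a = rng.integers(N_ACTIONS) if rng.random() < EPS else int(np.argmax(Q[s]))
        s_next, r, done = step(s, a)
        # off-policy Q-learning target: bootstrap from the greedy successor value
        target = r if done else r + GAMMA * np.max(Q[s_next])
        Q[s, a] += ALPHA * (target - Q[s, a])
        s = s_next

print(np.argmax(Q[:-1], axis=1))  # greedy policy in non-terminal states
```

Replacing the `max` in the target with the Q-value of the action actually taken next would turn this into on-policy SARSA, which is the core of the on/off-policy comparison.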

Deep Q-Learning

DQL report link

As reinforcement-learning problems become high-dimensional and continuous, with exponential branching factors, the methodology moves away from the tabular approach toward deep learning methods. We consider the simple CartPole-v1 environment in Gym and implement a Deep Q-Network (DQN) to perform Q-value iteration and solve the environment, along with an automated, parallelizable framework for solving reinforcement-learning problems with hyperparameter optimization (HPO) and subsequent analysis. We present the network architecture, hyperparameters, and optimization strategies, and perform and discuss an ablation study comparing the performance impact of various DQN features such as exploration strategies, experience replay, and a target network.
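A minimal sketch of two of the ablated DQN ingredients, experience replay and a periodically synced target network, is shown below. A linear function approximator stands in for the neural network so the example stays self-contained; all names, sizes, and hyperparameter values are illustrative assumptions, not the repository's actual configuration:

```python
import random
from collections import deque

import numpy as np

STATE_DIM, N_ACTIONS = 4, 2                # CartPole-like shapes
GAMMA, LR, SYNC_EVERY, BATCH = 0.99, 1e-2, 50, 32

rng = np.random.default_rng(0)
W_online = rng.normal(scale=0.1, size=(N_ACTIONS, STATE_DIM))
W_target = W_online.copy()
replay = deque(maxlen=10_000)              # experience replay buffer

def q_values(W, s):
    return W @ s                           # linear stand-in for the Q-network

def train_step(step_count):
    global W_target
    # sampling uniformly from the buffer breaks temporal correlations
    batch = random.sample(replay, BATCH)
    for s, a, r, s_next, done in batch:
        # TD target uses the *frozen* target weights for stability
        target = r if done else r + GAMMA * np.max(q_values(W_target, s_next))
        td_error = target - q_values(W_online, s)[a]
        W_online[a] += LR * td_error * s   # semi-gradient update
    if step_count % SYNC_EVERY == 0:
        W_target = W_online.copy()         # periodic hard sync

# fill the buffer with random transitions, then run a few update steps
for t in range(200):
    s, s_next = rng.normal(size=STATE_DIM), rng.normal(size=STATE_DIM)
    replay.append((s, rng.integers(N_ACTIONS), rng.random(), s_next, t % 20 == 0))
for t in range(1, 101):
    train_step(t)
```

The ablation in the report amounts to switching these pieces off: sampling only the latest transition removes replay, and syncing every step (or using the online weights in the target) removes the target network.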

Policy-based

PB report link

In reinforcement-learning problems with high-dimensional or continuous action spaces, the value-based approach fails to replicate the performance it achieves in lower-dimensional problems. Therefore, instead of learning the Q-value function and using a separate rule to pick the optimal next action, it is possible to learn the policy directly. This work considers the simple CartPole-v1 environment in Gym and implements the REINFORCE and actor-critic algorithms to perform gradient-based policy search to solve the environment. Additionally, an ablation study compares the REINFORCE algorithm to the more sophisticated actor-critic approach and investigates the performance impact of bootstrapping and baseline subtraction in the latter.
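The policy-gradient idea can be sketched in a few lines on a two-armed bandit, where episodes are a single step. The softmax policy, reward means, and running-average baseline below are illustrative assumptions, not the repository's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
theta = np.zeros(2)                        # policy logits, one per action
LR, TRUE_MEANS = 0.1, np.array([0.2, 0.8])
baseline = 0.0

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

for _ in range(2000):
    probs = softmax(theta)
    a = rng.choice(2, p=probs)
    r = TRUE_MEANS[a] + rng.normal(scale=0.1)
    # grad of log pi(a) for a softmax policy: one-hot(a) - probs
    grad_log_pi = np.eye(2)[a] - probs
    # REINFORCE update; subtracting a baseline reduces variance
    # without biasing the gradient estimate
    theta += LR * (r - baseline) * grad_log_pi
    baseline += 0.05 * (r - baseline)      # running-average baseline

print(softmax(theta))                      # mass shifts toward the better arm
```

In the actor-critic variant studied in the report, a learned state-value estimate replaces this scalar baseline, and bootstrapped n-step returns replace the full Monte Carlo return.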
