This repository contains code for the course CS780: Deep Reinforcement Learning.


CS780 Assignments

Assignment 1

  • Implementation of Bernoulli and Gaussian bandit environments using the Gymnasium library, and simulation of them for different combinations of hyperparameters
  • Implementation of different exploration strategies (pureExploitation, pureExploration, epsilonGreedyExploration, decayingEpsilonGreedyExploration, softmaxExploration and UCBExploration), simulation of each strategy on both environments, and hyperparameter tuning for each environment
  • Implementation of the Random Walk Environment, and generation of trajectories for simulation with the generateTrajectory function
  • Implementation of MonteCarloPrediction, in both first-visit (FVMC) and every-visit (EVMC) variants, and TemporalDifferencePrediction to estimate the state values of the environment
  • Plotting the evolution of the state-value estimates over episodes, on both linear and log-scale episode axes, with plots averaged over multiple seeds to reduce noise
  • Analysing the variation of the target values for a particular state in both environments
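For reference, a minimal sketch of TD(0) prediction on a five-state random walk, in plain Python with no Gymnasium dependency. The names `generate_trajectory` and `td0_prediction` are hypothetical stand-ins, not the repository's `generateTrajectory` or `TemporalDifferencePrediction`, and the environment layout (terminal states at both ends, reward 1 only for exiting on the right) is assumed from the classic random-walk example.

```python
import random


def generate_trajectory(n_states=5, rng=random):
    """Roll out one episode of a symmetric random walk (hypothetical layout).

    Non-terminal states are 1..n_states; 0 and n_states + 1 are terminal.
    Each step moves left or right with equal probability, and only the
    transition into the right terminal yields reward 1.
    Returns a list of (state, reward, next_state) tuples.
    """
    state = (n_states + 1) // 2  # start in the middle state
    trajectory = []
    while 0 < state <= n_states:
        next_state = state + rng.choice((-1, 1))
        reward = 1.0 if next_state == n_states + 1 else 0.0
        trajectory.append((state, reward, next_state))
        state = next_state
    return trajectory


def td0_prediction(n_states=5, episodes=5000, alpha=0.01, gamma=1.0, seed=0):
    """Estimate the state values of the uniform random policy with TD(0)."""
    rng = random.Random(seed)
    V = [0.0] * (n_states + 2)  # terminal entries stay at 0
    for _ in range(episodes):
        for state, reward, next_state in generate_trajectory(n_states, rng):
            # TD(0) target: immediate reward plus the discounted estimate of the next state
            td_target = reward + gamma * V[next_state]
            V[state] += alpha * (td_target - V[state])
    return V[1:n_states + 1]
```

With `gamma=1.0` the true values of states 1 to 5 are 1/6, 2/6, 3/6, 4/6 and 5/6, so the returned estimates should land close to those numbers. Replacing the one-step TD target with the full return of the episode would turn this into every-visit Monte Carlo prediction.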

Assignment 2

  • Implementation of control algorithms such as MonteCarloControl, SARSAControl, Q-learning, Double Q-learning, SARSA($\lambda$) with eligibility traces and Q($\lambda$) with eligibility traces
  • Implementation of model-based algorithms such as Dyna-Q and Trajectory Sampling to compute the optimal policy and the value of each state in the Random Maze Environment
  • Comparison of the on-policy and off-policy control algorithms on this environment
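A minimal sketch of tabular Q-learning, to illustrate the off-policy update that separates it from SARSA. It runs on a hypothetical six-state corridor rather than the Random Maze Environment, and `q_learning` and its parameters are illustrative names, not the repository's API.

```python
import random


def q_learning(n_states=6, episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical deterministic corridor.

    States are 0..n_states-1; action 0 moves left (state 0 stays put) and
    action 1 moves right. Reaching the last state ends the episode with
    reward 1; every other transition gives reward 0.
    """
    rng = random.Random(seed)
    goal = n_states - 1
    Q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        state = 0
        while state != goal:
            # Epsilon-greedy behaviour policy; ties are broken at random
            if rng.random() < epsilon or Q[state][0] == Q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if Q[state][0] > Q[state][1] else 1
            next_state = max(0, state - 1) if action == 0 else state + 1
            reward = 1.0 if next_state == goal else 0.0
            # Off-policy target: bootstrap from the greedy action in next_state.
            # SARSA would instead bootstrap from the action actually taken next.
            bootstrap = 0.0 if next_state == goal else max(Q[next_state])
            Q[state][action] += alpha * (reward + gamma * bootstrap - Q[state][action])
            state = next_state
    return Q
```

After training, the greedy action in every non-terminal state should be "move right". Double Q-learning would keep two such tables and use one to pick the argmax action and the other to evaluate it.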

Assignment 3

This assignment primarily covers the implementation of 5 value-based Deep RL models, namely:

  • Neural Fitted Q Iteration (NFQ)
  • Deep Q Network (DQN)
  • Double Deep Q Network (DDQN)
  • Dueling Double Deep Q Network (D3QN)
  • Dueling Double Deep Q Network with Prioritized Experience Replay (D3QN-PER)

and 2 policy-based Deep RL models, namely:

  • Vanilla Policy Gradient (VPG)

on two OpenAI Gym environments: CartPole-v0 and MountainCar-v0.
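The value-based models above mainly differ in how the bootstrap target and the Q-values are formed. A framework-free NumPy sketch of those pieces, assuming batched arrays of shape `(batch, n_actions)`; the function names are hypothetical, and network definitions, replay buffers and PER sampling are omitted.

```python
import numpy as np


def dqn_targets(rewards, dones, q_target_next, gamma=0.99):
    """DQN target: the target network both selects and evaluates the next action."""
    return rewards + gamma * (1.0 - dones) * q_target_next.max(axis=1)


def double_dqn_targets(rewards, dones, q_online_next, q_target_next, gamma=0.99):
    """Double DQN target: the online network selects, the target network evaluates."""
    best_actions = q_online_next.argmax(axis=1)
    evaluated = q_target_next[np.arange(len(best_actions)), best_actions]
    return rewards + gamma * (1.0 - dones) * evaluated


def dueling_aggregate(state_value, advantages):
    """Dueling head (as in D3QN): Q(s, a) = V(s) + A(s, a) - mean over a' of A(s, a').

    state_value has shape (batch, 1); advantages has shape (batch, n_actions).
    """
    return state_value + advantages - advantages.mean(axis=1, keepdims=True)
```

Decoupling action selection from evaluation is what reduces the overestimation bias of the plain `max` target; subtracting the mean advantage keeps the V/A split identifiable.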

Assignment 4

This assignment primarily covers the implementation of 3 Deep RL models for continuous action spaces, namely:

  • Deep Deterministic Policy Gradient (DDPG)
  • Twin Delayed Deep Deterministic Policy Gradient (TD3)
  • Proximal Policy Optimization (PPO)

on three OpenAI Gym environments: Pendulum-v1, Hopper-v4 and HalfCheetah-v1.
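Two small building blocks from this family, sketched in NumPy with hypothetical names: the PPO clipped surrogate objective and the Polyak (soft) target-network update used by DDPG and TD3. A real implementation would operate on framework tensors and network parameters; plain arrays are an assumption that keeps the sketch self-contained.

```python
import numpy as np


def ppo_clip_objective(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """PPO clipped surrogate objective (to be maximized)."""
    ratio = np.exp(new_log_probs - old_log_probs)  # pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Pessimistic bound: elementwise minimum of the two surrogates
    return np.minimum(unclipped, clipped).mean()


def soft_update(target_params, online_params, tau=0.005):
    """Polyak averaging for target networks: theta' <- tau * theta + (1 - tau) * theta'."""
    return [tau * w + (1.0 - tau) * w_t for w_t, w in zip(target_params, online_params)]
```

For example, a sample whose probability ratio has grown to 1.5 with a positive advantage only gets credit up to a ratio of 1.2, which is what keeps each policy update close to the previous policy.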


  • Implementation of the Random Maze Environment and its simulations
  • Implementation of Policy Iteration and Value Iteration to compute the optimal policy and the value of each state in the environment, with a comparative analysis of the two
  • Implementation of Monte Carlo, n-step Temporal Difference and TD($\lambda$) algorithms to estimate the value of each state under the optimal policy, with a comparative analysis
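A minimal value iteration sketch, assuming a Gym-style transition table where `P[s][a]` is a list of `(probability, next_state, reward, done)` tuples. `make_slippery_chain` builds a hypothetical three-state MDP for illustration; it is not the Random Maze Environment.

```python
def make_slippery_chain(p_success=0.8):
    """Hypothetical MDP: states 0 and 1 are non-terminal, state 2 is terminal.

    Action 0 moves left deterministically (state 0 stays put).
    Action 1 moves right with probability p_success, otherwise stays.
    Entering state 2 gives reward 1; everything else gives 0.
    """
    fail = 1.0 - p_success
    return [
        [[(1.0, 0, 0.0, False)], [(p_success, 1, 0.0, False), (fail, 0, 0.0, False)]],
        [[(1.0, 0, 0.0, False)], [(p_success, 2, 1.0, True), (fail, 1, 0.0, False)]],
        [[(1.0, 2, 0.0, True)], [(1.0, 2, 0.0, True)]],  # absorbing terminal state
    ]


def value_iteration(P, gamma=0.99, theta=1e-10):
    """Repeat Bellman optimality backups until the largest change is below theta."""
    V = [0.0] * len(P)
    while True:
        # One-step lookahead: expected reward plus discounted value (0 past a terminal)
        Q = [[sum(p * (r + (0.0 if done else gamma * V[s2])) for p, s2, r, done in P[s][a])
              for a in range(len(P[s]))] for s in range(len(P))]
        new_V = [max(q) for q in Q]
        if max(abs(a - b) for a, b in zip(new_V, V)) < theta:
            break
        V = new_V
    policy = [max(range(len(q)), key=q.__getitem__) for q in Q]  # greedy policy
    return new_V, policy
```

Policy Iteration reaches the same fixed point by alternating full policy evaluation with greedy improvement, which usually takes fewer but more expensive iterations.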