This repository showcases an implementation of the Double Deep Q-Learning algorithm for the FrozenLake environment from OpenAI's Gym library.
The main idea behind Q-learning is that if we had a function $Q^*: \text{State} \times \text{Action} \rightarrow \mathbb{R}$ that told us the expected return of taking a given action in a given state, we could easily construct an optimal policy by acting greedily: $\pi^*(s) = \operatorname{argmax}_a Q^*(s, a)$.
But this is not scalable, as we must compute and store $Q^*(s, a)$ for every state-action pair. Since neural networks are universal function approximators, we instead train a network to approximate $Q^*$.
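The greedy policy above can be sketched in tabular form. This is a toy illustration with made-up values (`Q_star` and `greedy_policy` are illustrative names, not code from this repository):

```python
# Toy Q-table with assumed values: with access to Q*(s, a) for every
# state-action pair, acting optimally is just a greedy argmax over actions.
Q_star = {
    (0, "left"): 0.1, (0, "right"): 0.9,
    (1, "left"): 0.5, (1, "right"): 0.2,
}

def greedy_policy(state, actions=("left", "right")):
    # pi*(s) = argmax_a Q*(s, a)
    return max(actions, key=lambda a: Q_star[(state, a)])

print(greedy_policy(0))  # "right"
```

The table already needs one entry per state-action pair, which is exactly the scalability problem a neural approximator avoids.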
For our training update rule, we'll use the fact that every $Q$ function for some policy obeys the Bellman equation: $Q^\pi(s, a) = r + \gamma \, Q^\pi(s', \pi(s'))$.
The difference between the two sides of the equality is known as the temporal difference (TD) error: $\delta = Q(s, a) - \left(r + \gamma \max_{a'} Q(s', a')\right)$.
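Computed on a single transition, the TD error looks like the following minimal sketch (`td_error` is a hypothetical helper, not part of this repository; terminal states are bootstrapped with a target of just $r$):

```python
def td_error(q_sa, reward, q_next_max, gamma=0.99, done=False):
    """delta = Q(s, a) - (r + gamma * max_a' Q(s', a'))."""
    # No bootstrapping past a terminal state: the target is the reward alone.
    target = reward if done else reward + gamma * q_next_max
    return q_sa - target

print(td_error(1.0, 0.5, 1.0, gamma=0.9))  # 1.0 - (0.5 + 0.9 * 1.0) = -0.4
```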
To minimise this error, we will use the Huber loss. The Huber loss acts like the mean squared error when the error is small, but like the mean absolute error when the error is large - this makes it more robust to outliers when the estimates of $Q$ are very noisy.
We will implement Double Deep Q-Learning here. Double Deep Q-Learning reduces the maximisation bias of standard Q-Learning. This entails using two separate networks: an online network that selects the next action via the $\operatorname{argmax}$, and a target network that evaluates the value of that action; the target network's weights are periodically synchronised with the online network's.
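The decoupling of selection and evaluation can be sketched as follows (`double_dqn_target` is an illustrative helper operating on plain lists of Q-values, one per action; a real implementation would work on batched network outputs):

```python
def double_dqn_target(reward, q_online_next, q_target_next, gamma=0.99, done=False):
    """Double DQN target: online net picks the action, target net scores it."""
    if done:
        return reward
    # Selection: argmax over the online network's Q-values for s'.
    best_action = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    # Evaluation: the target network's estimate for that same action.
    return reward + gamma * q_target_next[best_action]

# Online net prefers action 1, so the target net's value for action 1 is used.
print(double_dqn_target(1.0, [1.0, 2.0], [0.5, 0.1], gamma=0.9))  # 1.0 + 0.9 * 0.1
```

Because the noisy $\max$ over one network is no longer evaluated by that same network, overestimated actions are less likely to inflate the target.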