Reinforcement learning exercises from R. S. Sutton and A. G. Barto's "Reinforcement Learning: An Introduction" (1998)
Finds the policy for Jack that maximizes expected reward (please refer to the book for details of the problem).
The heatmaps show the policy from Day 0 through Day 5.
The colors represent the number of cars to be moved from lot 1 to 2.
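
The policy shown in such heatmaps can be computed with policy iteration. The following is a minimal sketch, not necessarily the code behind these figures. It assumes the parameters from the book's statement of the problem (at most 20 cars per lot, at most 5 cars moved per night, $10 per rental, $2 per moved car, Poisson requests with means 3 and 4, Poisson returns with means 3 and 2, discount 0.9). The names `lot_model`, `capped_pmf`, and `policy_iteration` are hypothetical helpers introduced for illustration.

```python
import math
import numpy as np

MAX_CARS, MAX_MOVE = 20, 5               # lot capacity, nightly move limit
RENT, MOVE_COST, GAMMA = 10.0, 2.0, 0.9  # assumed values from the book's problem

def poisson(n, lam):
    return math.exp(-lam) * lam ** n / math.factorial(n)

def capped_pmf(k, cap, lam):
    """P(min(X, cap) == k) for X ~ Poisson(lam): the tail mass is lumped at cap."""
    if k < cap:
        return poisson(k, lam)
    return 1.0 - sum(poisson(i, lam) for i in range(cap))

def lot_model(req_lam, ret_lam, max_cars=MAX_CARS):
    """One lot in isolation. P[n, m]: probability of ending the day with m cars
    after starting it with n. R[n]: expected rental income when starting with n."""
    P = np.zeros((max_cars + 1, max_cars + 1))
    R = np.zeros(max_cars + 1)
    for n in range(max_cars + 1):
        for rented in range(n + 1):            # cannot rent more than n cars
            p_req = capped_pmf(rented, n, req_lam)
            R[n] += p_req * RENT * rented
            left = n - rented
            room = max_cars - left             # returns beyond capacity vanish
            for ret in range(room + 1):
                P[n, left + ret] += p_req * capped_pmf(ret, room, ret_lam)
    return P, R

def policy_iteration(max_cars=MAX_CARS, theta=1e-3, max_rounds=50):
    P1, R1 = lot_model(3, 3, max_cars)
    P2, R2 = lot_model(4, 2, max_cars)
    n = max_cars + 1
    V = np.zeros((n, n))
    policy = np.zeros((n, n), dtype=int)       # start from "move nothing"

    def q(i, j, a):
        # a > 0 moves a cars from lot 1 to lot 2; a < 0 moves them back.
        n1, n2 = min(i - a, max_cars), min(j + a, max_cars)
        future = P1[n1] @ V @ P2[n2]           # expected next-state value
        return -MOVE_COST * abs(a) + R1[n1] + R2[n2] + GAMMA * future

    for _ in range(max_rounds):
        # Policy evaluation: sweep in place until the values settle.
        while True:
            delta = 0.0
            for i in range(n):
                for j in range(n):
                    v = q(i, j, policy[i, j])
                    delta = max(delta, abs(v - V[i, j]))
                    V[i, j] = v
            if delta < theta:
                break
        # Policy improvement: act greedily with respect to V.
        stable = True
        for i in range(n):
            for j in range(n):
                actions = range(-min(j, MAX_MOVE), min(i, MAX_MOVE) + 1)
                best = max(actions, key=lambda a: q(i, j, a))
                if q(i, j, best) > q(i, j, policy[i, j]) + 1e-9:
                    policy[i, j] = best
                    stable = False
        if stable:
            break
    return policy, V
```

Calling `policy, V = policy_iteration()` returns a 21 x 21 integer array in which `policy[i, j]` is the number of cars to move from lot 1 to lot 2 when the lots hold `i` and `j` cars, which is the quantity a heatmap like the ones above would color. Treating the two lots as independent lets the expected next-state value collapse into one vector-matrix-vector product, which keeps each sweep cheap.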
Uses Sarsa, an on-policy TD control algorithm, to find the quickest route to the goal while the wind is blowing upward.
The color represents the number of steps.
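
As a rough illustration of the Sarsa update, here is a minimal sketch on a windy gridworld. The grid size, wind strengths, start and goal cells, and the hyperparameters (step size 0.5, exploration rate 0.1, reward of -1 per step, no discounting) are assumptions taken from the book's version of the example and may differ from the setup behind the figure. The function names are hypothetical.

```python
import random

ROWS, COLS = 7, 10
WIND = [0, 0, 0, 1, 1, 1, 2, 2, 1, 0]          # upward push per column (assumed)
START, GOAL = (3, 0), (3, 7)                    # assumed start and goal cells
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]    # up, down, left, right

def step(state, action):
    """Apply the action plus the wind of the current column, clipped to the grid."""
    r, c = state
    dr, dc = ACTIONS[action]
    r = min(max(r + dr - WIND[c], 0), ROWS - 1)  # row 0 is the top edge
    c = min(max(c + dc, 0), COLS - 1)
    return (r, c)

def epsilon_greedy(Q, state, eps, rng):
    """Pick a random action with probability eps, else a greedy one."""
    if rng.random() < eps:
        return rng.randrange(len(ACTIONS))
    values = Q[state]
    return values.index(max(values))

def sarsa(episodes=200, alpha=0.5, eps=0.1, gamma=1.0, seed=0):
    rng = random.Random(seed)
    Q = {(r, c): [0.0] * len(ACTIONS) for r in range(ROWS) for c in range(COLS)}
    for _ in range(episodes):
        s = START
        a = epsilon_greedy(Q, s, eps, rng)
        while s != GOAL:
            s2 = step(s, a)
            a2 = epsilon_greedy(Q, s2, eps, rng)
            # On-policy target: uses the action a2 that will actually be taken.
            Q[s][a] += alpha * (-1.0 + gamma * Q[s2][a2] - Q[s][a])
            s, a = s2, a2
    return Q

def greedy_path(Q, limit=1000):
    """Follow the greedy policy from START; gives up after `limit` moves."""
    path = [START]
    while path[-1] != GOAL and len(path) <= limit:
        s = path[-1]
        path.append(step(s, Q[s].index(max(Q[s]))))
    return path
```

After `Q = sarsa()`, `greedy_path(Q)` lists the cells visited by the greedy policy, and the index of each cell in that list is one possible step count to color by. Because Sarsa keeps exploring while it learns, the greedy route after a fixed number of episodes is not guaranteed to be the shortest one.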
Using TD(
After 10 episodes
After 100 episodes
After 1000 episodes
Value function (after 100 episodes)