Use of R seems to be different in this implementation compared to original article #2

Open
joabim opened this issue May 31, 2016 · 0 comments


joabim commented May 31, 2016

Hi,

When comparing your code to the pseudocode in the article, you seem to append the same bootstrapped value R_t on every iteration where you add values to the R array (currently line 255). Shouldn't we instead append something along the lines of `rewards[i] + GAMMA*R[i]`, where the first element is either 0.0 or V(s_t, Theta'_t), depending on whether s_t was terminal?
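
To make the suggestion concrete, here is a minimal sketch of the backward recursion as I read it from the article's pseudocode. It is not taken from this repository: the function name `discounted_returns` and its parameters are hypothetical, and `bootstrap_value` stands in for either 0.0 (terminal s_t) or V(s_t, Theta'_t) (non-terminal s_t).

```python
def discounted_returns(rewards, bootstrap_value, gamma=0.99):
    """Compute one discounted return per time step.

    Hypothetical helper, not the repository's code.
    rewards:         rewards collected during the rollout, oldest first
    bootstrap_value: 0.0 if the last state was terminal, otherwise the
                     value estimate of the last state
    """
    R = bootstrap_value
    returns = []
    # Walk the rollout backwards so each step reuses the return of the next one.
    for r in reversed(rewards):
        R = r + gamma * R
        returns.append(R)
    # Restore chronological order so returns[i] lines up with rewards[i].
    returns.reverse()
    return returns


if __name__ == "__main__":
    # Terminal rollout: the recursion starts from 0.0.
    print(discounted_returns([1.0, 0.0, 2.0], bootstrap_value=0.0, gamma=0.5))
    # Non-terminal rollout: the recursion starts from the value estimate.
    print(discounted_returns([1.0, 0.0, 2.0], bootstrap_value=4.0, gamma=0.5))
```

With this form every time step gets its own return, rather than the single bootstrapped value repeated for the whole rollout.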

Kind regards,
joabim
