Thesis discussion: Why can the end-to-end algorithm work properly? #22
In the paper, it seems that you combined the two steps of reinforcement learning into one step, forming an end-to-end training method. The specific algorithm is shown in the figure.

However, I have a problem with this end-to-end method. Suppose we have obtained the opponent player model for round t, and we now want to learn the opponent player model for round t+1 through the end-to-end algorithm. Since $p_{\theta}$ and $p_{\theta_t}$ are one and the same model at the start of the round, isn't the resulting loss 0? That would mean we cannot obtain a $p_{\theta_{t+1}}$ that makes any progress. Is my understanding wrong?
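For reference, here is a sketch of the end-to-end objective the question refers to, as I read it from the paper ($\lambda$ is the regularization parameter and $\ell(t) = \log(1 + e^{-t})$ the logistic loss; notation is mine, not a verbatim quote):

$$
L_{\mathrm{SPIN}}(\theta, \theta_t) = \mathbb{E}_{y \sim p_{\mathrm{data}}(\cdot \mid x),\; y' \sim p_{\theta_t}(\cdot \mid x)} \left[ \ell\!\left( \lambda \log \frac{p_{\theta}(y \mid x)}{p_{\theta_t}(y \mid x)} - \lambda \log \frac{p_{\theta}(y' \mid x)}{p_{\theta_t}(y' \mid x)} \right) \right]
$$

At $\theta = \theta_t$ the argument of $\ell$ is exactly 0.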
Comments

Thank you for your question.

SPIN/spin/alignment/trainer.py Line 405 in e84b7be

In the spin_loss definition, at step 0 the loss value starts at the fixed value 0.6931 when $p_{\theta}$ equals $p_{\theta_t}$.
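To see where 0.6931 comes from, and why it does not imply a zero gradient, here is a minimal PyTorch sketch of a logistic loss of this form (illustrative variable names and values; this is not the repository's actual spin_loss signature):

```python
import torch
import torch.nn.functional as F

# Illustrative log-probabilities. In SPIN these would come from the policy
# (p_theta) scoring a ground-truth response y and a generated response y'.
policy_real_logps = torch.tensor([-12.0], requires_grad=True)
policy_generated_logps = torch.tensor([-15.0], requires_grad=True)

# At the start of round t+1 the opponent p_{theta_t} is a frozen copy of the
# policy, so its log-probs are numerically identical but carry no gradient.
opponent_real_logps = policy_real_logps.detach()
opponent_generated_logps = policy_generated_logps.detach()

beta = 0.1  # regularization strength (lambda in the paper)
logits = beta * (
    (policy_real_logps - opponent_real_logps)
    - (policy_generated_logps - opponent_generated_logps)
)

# Logistic loss: -log(sigmoid(logits)) == log(1 + exp(-logits)).
loss = -F.logsigmoid(logits).mean()
print(f"loss = {loss.item():.4f}")  # 0.6931 = log 2, since logits == 0 here

loss.backward()
# The gradient is NOT zero: the derivative of -logsigmoid at 0 is -1/2,
# which backpropagates through beta into the policy's log-probabilities.
print(policy_real_logps.grad)       # tensor([-0.0500])
print(policy_generated_logps.grad)  # tensor([0.0500])
```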
I know that the initial loss is not equal to 0 in the actual code, but that comes from how the loss is actually computed. Still, it is undeniable that the value of this formula in the paper is 0, isn't it?
First of all, a monotonically decreasing and convex function $\ell$ has $\ell'(0) < 0$, so even though the loss value at $\theta = \theta_t$ is the constant $\ell(0) = \log 2 \approx 0.6931$, its gradient with respect to $\theta$ is not zero. The parameters therefore still receive a nontrivial update at the start of each round, which is why $p_{\theta_{t+1}}$ can improve over $p_{\theta_t}$.
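A short derivation of that point, sketched under the objective written above (with $\ell(t) = \log(1 + e^{-t})$, so $\ell'(0) = -\tfrac{1}{2}$):

$$
\nabla_\theta L_{\mathrm{SPIN}}(\theta, \theta_t)\Big|_{\theta = \theta_t} = \ell'(0)\,\lambda\; \mathbb{E}\!\left[ \nabla_\theta \log p_{\theta}(y \mid x) - \nabla_\theta \log p_{\theta}(y' \mid x) \right]\Big|_{\theta = \theta_t}
$$

This expectation vanishes when $p_{\theta_t}$ already matches $p_{\mathrm{data}}$, so that $y$ and $y'$ are identically distributed; otherwise the gradient is generally nonzero despite the constant loss value.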