Thesis discussion: Why can the end-to-end algorithm work properly? #22

Open
nomadlx opened this issue Mar 1, 2024 · 5 comments

nomadlx commented Mar 1, 2024

In the paper, it seems that you combined the two steps of reinforcement learning into one step, forming an end-to-end training method. The specific algorithm is shown in the figure.

[figure: screenshot of the end-to-end training algorithm from the paper]

However, I have a question about this end-to-end method. Say we have obtained the Opponent Player model for round $t$ and now want to learn the Opponent Player model for round $t+1$ through the end-to-end algorithm. Since $P_{\theta}$ and $P_{\theta_{t}}$ start out as one and the same model, isn't the resulting loss 0? That would mean we cannot obtain a $P_{\theta_{t+1}}$ that makes any progress. Is my understanding wrong?

angelahzyuan (Collaborator) commented Apr 7, 2024

Thank you for your question. Since $\ell$ here is a monotonically decreasing and convex function (the logistic loss in our paper, $\ell(t) = \log(1 + \exp(-t))$), the gradient is nonzero even when $P_{\theta}$ and $P_{\theta_t}$ are the same. Let us know if you have further questions.
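To make this concrete, here is a minimal sketch of a SPIN-style logistic loss in PyTorch. The function name `spin_logistic_loss`, the argument names, and the factor `beta` (standing in for the paper's $\lambda$) are illustrative assumptions, not the repo's exact `spin_loss` API; the point is only that at $\theta = \theta_t$ the loss equals $\log 2$ while its gradient is nonzero.

```python
import torch
import torch.nn.functional as F

def spin_logistic_loss(policy_real_logps, policy_generated_logps,
                       opponent_real_logps, opponent_generated_logps,
                       beta=0.1):
    """Logistic loss ell(t) = log(1 + exp(-t)) applied to the scaled margin
    between the real-data and self-generated log-ratios (illustrative sketch)."""
    real_logratio = policy_real_logps - opponent_real_logps                 # log p_theta / p_theta_t on real data
    generated_logratio = policy_generated_logps - opponent_generated_logps  # same on self-generated data
    margin = beta * (real_logratio - generated_logratio)
    return F.softplus(-margin)                                              # log(1 + exp(-margin))

# At the start of an iteration theta == theta_t, so both log-ratios are exactly 0.
policy_real = torch.tensor(-10.0, requires_grad=True)   # hypothetical sequence log-probs
policy_gen = torch.tensor(-12.0, requires_grad=True)
loss = spin_logistic_loss(policy_real, policy_gen,
                          opponent_real_logps=policy_real.detach(),
                          opponent_generated_logps=policy_gen.detach())
loss.backward()
print(loss.item())              # ~0.6931 = log(2): the value of ell at 0
print(policy_real.grad.item())  # -0.05 = beta * ell'(0): descent raises the real log-prob
print(policy_gen.grad.item())   # +0.05: descent lowers the generated log-prob
```

So even though the loss value is constant at initialization, backpropagation still pushes $P_{\theta}$ to put more mass on the ground-truth responses and less on the self-generated ones, which is what drives progress from $\theta_t$ to $\theta_{t+1}$.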

wnzhyee commented Apr 8, 2024

`def spin_loss(` (the `spin_loss` definition in the repo)

In the `spin_loss` definition, at step 0 the loss starts at the fixed value 0.6931 when `p_theta` equals `p_theta_t`.


nomadlx (Author) commented Apr 9, 2024

I know that the initial loss is not 0 in the actual code, but that comes from how it is computed in practice. Still, it is undeniable that the value of this formula in the paper is 0, isn't it?
[figure: screenshot of the loss formula from the paper]

angelahzyuan (Collaborator) commented Apr 9, 2024

First of all, the algorithm requires $\ell$ to be a monotonically decreasing and convex function, and its value at 0 is $\ell(0) = \log(2) \approx 0.6931$, so this formula does not evaluate to 0 there. Secondly, the progress from $\theta_t$ does not depend on the value of the loss but on its gradient. Let us know if there are any questions.
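Written out in the paper's notation (a sketch; $y$ is the ground-truth response, $y'$ the self-generated one, and $\lambda$ the scaling parameter), the gradient at $\theta = \theta_t$ is

$$\nabla_\theta\, \ell\!\left(\lambda \log\frac{p_\theta(y\mid x)}{p_{\theta_t}(y\mid x)} - \lambda \log\frac{p_\theta(y'\mid x)}{p_{\theta_t}(y'\mid x)}\right)\Bigg|_{\theta=\theta_t} = \lambda\,\ell'(0)\,\Big(\nabla_\theta \log p_\theta(y\mid x) - \nabla_\theta \log p_\theta(y'\mid x)\Big)\Bigg|_{\theta=\theta_t},$$

with $\ell'(0) = -\tfrac{1}{2} \neq 0$ for the logistic loss. It vanishes only when the real and self-generated responses have the same score function, i.e. when $p_{\theta_t}$ already matches the target data distribution, so in general the update away from $\theta_t$ is nontrivial.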
