Hi @nikitakaraevv,

Thank you for your excellent work.

I have a question regarding the training pipeline. I'm currently trying to reproduce the results in Table 3 of your paper. When I trained the model from scratch on the Kubric dataset, the best evaluation result on the TAP-Vid DAVIS dataset is as follows:
"occlusion_accuracy": 0.8503666396802487
"average_jaccard": 0.5575681919643163
"average_pts_within_thresh": 0.7087581437592014
These results are significantly lower than those obtained with your provided checkpoint. I'm using Torch 2.1.0 with CUDA 12.3, and I trained the model on 8 A100 GPUs for 200,000 iterations with gradient accumulation of 4 to mimic your setting.
Do you think the issue could be due to mismatched library versions, or might I be missing something else? I appreciate any guidance you can provide.
Thank you.
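For reference, the "gradient accumulation of 4" setup described above can be sketched in PyTorch roughly as follows. This is a minimal illustration with a generic model, optimizer, and loss, not the actual CoTracker training code; the function name and batch layout are my own:

```python
import torch

def train_with_accumulation(model, optimizer, loss_fn, batches, accum_steps=4):
    # Accumulate gradients over `accum_steps` micro-batches before each
    # optimizer step, so 8 GPUs with accumulation 4 approximate the
    # effective batch size of a 32-GPU run.
    optimizer.zero_grad()
    for i, (x, y) in enumerate(batches):
        # Scale each micro-batch loss so the summed gradients match the
        # gradient of the mean loss over the full effective batch.
        loss = loss_fn(model(x), y) / accum_steps
        loss.backward()
        if (i + 1) % accum_steps == 0:
            optimizer.step()
            optimizer.zero_grad()
```

With a mean-reduced loss and equally sized micro-batches, one accumulated step is numerically equivalent (up to floating-point summation order) to a single step on the concatenated batch, which is why accumulation is commonly used to mimic a larger-GPU setup.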
Hi @ngoductuanlhp, I don't think there could be such a big gap due to mismatched library versions.
We either train it on 32 GPUs for 50k iterations or on 8 GPUs for 200k. I obtained similar performance with both settings, though 32 GPUs is slightly better. So, have you tried training the model on 8 GPUs for 200k iterations without gradient accumulation?
I haven't tried training the model for 200k iterations without gradient accumulation. But I did train the model for 50k iterations on 8 GPUs with the same learning rate of 0.0005, and the results are not good.
I used your evaluation script to evaluate on TAP-Vid DAVIS (first/strided) and on the Dynamic Replica validation set.