
Large differences in experimental results when BATCH_SIZE = 16 and EPOCH=500 #9

Open
Xiyan-Xu opened this issue Oct 30, 2023 · 11 comments


@Xiyan-Xu

Thanks for sharing your great work!
I have trained the model myself following your README guidelines, but set BATCH_SIZE = 16 and EPOCH = 500 due to limited computing resources. With this setting, my trained model performs much worse than the evaluation results presented in the paper. I am wondering whether the exact same training settings are essential for the model to reach performance similar to the paper's model. Also, could you kindly release the checkpoint that was trained exclusively on the training set? I think that would be really helpful for me!
Thanks for your time and patience!

@tr3e
Owner

tr3e commented Nov 2, 2023

Sorry about that; there were some typos in evaluator.py. We have already fixed them.
Please make sure your code is up to date.

@Xiyan-Xu
Author

Xiyan-Xu commented Nov 2, 2023

Thanks for the reply. I am sure my code is up to date.
Can you release the checkpoint that was trained exclusively on the training set? That would be really helpful.

@pabloruizponce

I have trained for 1500 epochs with a batch size of 16, and I get an FID of 12.9409 compared to the 5.9 reported in the paper. Is there any reason for such a difference? Are all the other parameters in the config files the same ones used to train the model reported in the paper?

Thanks :)

@tr3e
Owner

tr3e commented Dec 5, 2023

I am figuring it out. I will contact you as soon as possible.

@pabloruizponce

@tr3e Any news on the issue? I have trained a model with the same configuration as the one in your repo (except for the batch size):

GENERAL:
  EXP_NAME: IG-S-8
  CHECKPOINT: ./checkpoints
  LOG_DIR: ./log

TRAIN:
  LR: 1e-4
  WEIGHT_DECAY: 0.00002
  BATCH_SIZE: 16
  EPOCH: 2000
  STEP: 1000000
  LOG_STEPS: 10
  SAVE_STEPS: 20000
  SAVE_EPOCH: 100
  RESUME: #checkpoints/IG-S/8/model/epoch=99-step=17600.ckpt
  NUM_WORKERS: 2
  MODE: finetune
  LAST_EPOCH: 0
  LAST_ITER: 0

But these are my results using your evaluation script:

========== MM Distance Summary ==========
---> [ground truth] Mean: 3.7844 CInterval: 0.0012
---> [InterGen] Mean: 3.8818 CInterval: 0.0017
========== R_precision Summary ==========
---> [ground truth](top 1) Mean: 0.4306 CInt: 0.0070;(top 2) Mean: 0.6110 CInt: 0.0086;(top 3) Mean: 0.7092 CInt: 0.0060;
---> [InterGen](top 1) Mean: 0.2517 CInt: 0.0071;(top 2) Mean: 0.3818 CInt: 0.0048;(top 3) Mean: 0.4662 CInt: 0.0046;
========== FID Summary ==========
---> [ground truth] Mean: 0.2966 CInterval: 0.0085
---> [InterGen] Mean: 10.7803 CInterval: 0.1791
========== Diversity Summary ==========
---> [ground truth] Mean: 7.7673 CInterval: 0.0440
---> [InterGen] Mean: 7.8075 CInterval: 0.0274
========== MultiModality Summary ==========
---> [InterGen] Mean: 1.5340 CInterval: 0.0615

As you can observe, the results are very far from the ones provided in the paper. I am doing ongoing research using your dataset, and in order to make a fair comparison we need to be able to replicate your results.

Hope you find what's going on :)
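
For readers parsing these summaries: in this family of benchmarks, Mean / CInterval usually denote the mean and the 95% confidence half-interval over repeated evaluation runs. A minimal sketch under that assumption (the exact procedure in this repo's evaluator.py may differ; the helper name and the numbers are purely illustrative):

import numpy as np

def mean_and_cinterval(values, z=1.96):
    # Mean and 95% confidence half-width over repeated evaluation
    # replications (often around 20), as in the usual text-to-motion protocol.
    values = np.asarray(values, dtype=np.float64)
    return values.mean(), z * values.std() / np.sqrt(len(values))

# e.g. FID from several evaluation replications (made-up numbers):
print(mean_and_cinterval([10.7, 10.9, 10.6, 10.8, 10.8]))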

@tr3e
Owner

tr3e commented Dec 23, 2023

Hello!
I have run the newest training code from this repo, exactly as released, with a batch size of 64 (32 on each of 2 GPUs) for 1500 epochs.
The results are as follows:

========== MM Distance Summary ==========
---> [ground truth] Mean: 3.7847 CInterval: 0.0007
---> [InterGen] Mean: 4.1817 CInterval: 0.0009
========== R_precision Summary ==========
---> [ground truth](top 1) Mean: 0.4248 CInt: 0.0046;(top 2) Mean: 0.6036 CInt: 0.0044;(top 3) Mean: 0.7026 CInt: 0.0047;
---> [InterGen](top 1) Mean: 0.3785 CInt: 0.0052;(top 2) Mean: 0.5163 CInt: 0.0040;(top 3) Mean: 0.6350 CInt: 0.0032;
========== FID Summary ==========
---> [ground truth] Mean: 0.2981 CInterval: 0.0057
---> [InterGen] Mean: 5.8447 CInterval: 0.0735
========== Diversity Summary ==========
---> [ground truth] Mean: 7.7516 CInterval: 0.0163
---> [InterGen] Mean: 7.8750 CInterval: 0.0324
========== MultiModality Summary ==========
---> [InterGen] Mean: 1.5634 CInterval: 0.0334

We suggest that you update to the newest code and increase the batch size.
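
A point worth making explicit from the exchange above: the 64 here is the global batch size (32 per GPU on 2 GPUs), whereas the configs shown earlier keep LR at 1e-4 regardless of batch size. Below is a minimal sketch of that relationship together with the common linear LR-scaling heuristic; whether InterGen actually benefits from rescaling the LR at batch size 16 is not confirmed anywhere in this thread, so treat it purely as an illustration:

def effective_batch_size(per_gpu_batch: int, num_gpus: int) -> int:
    # In DDP-style training each GPU processes its own mini-batch,
    # so one optimizer step covers per_gpu_batch * num_gpus samples.
    return per_gpu_batch * num_gpus

def linearly_scaled_lr(base_lr: float, base_batch: int, new_batch: int) -> float:
    # Common heuristic (not from this repo): scale the LR proportionally
    # with the global batch size when it changes.
    return base_lr * new_batch / base_batch

print(effective_batch_size(32, 2))       # 64, the setting reported above
print(linearly_scaled_lr(1e-4, 64, 16))  # 2.5e-05 for a batch size of 16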

@pabloruizponce

@tr3e I am still unable to replicate the results. Could you provide a contact method so we can discuss this without filling up this issue thread?

@Xiyan-Xu
Author

Xiyan-Xu commented Jan 8, 2024

@tr3e I am still unable to replicate the results. Could you provide a contact method so we can discuss this without filling up this issue thread?

Me too.

@tr3e
Owner

tr3e commented Jan 9, 2024

My email is [email protected] :)

@szqwu

szqwu commented May 6, 2024

(Quoting @tr3e's results above: batch size 64 for 1500 epochs, giving InterGen MM Dist 4.1817 and FID 5.8447.)

Hi, I found that the MM Dist here does not match what is presented in the paper. When I reproduce your work, as well as with my own model, the MM Dist is always around 4. Is there any mistake in the calculation?
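
For context on what this metric usually measures: MM Dist in this family of benchmarks is commonly the mean Euclidean distance between each text embedding and the embedding of the motion generated for that text, so its absolute scale depends entirely on the learned embedding space. A minimal sketch under that assumption (not necessarily what this repo's evaluator.py does):

import torch

def mm_distance(text_emb: torch.Tensor, motion_emb: torch.Tensor) -> torch.Tensor:
    # text_emb, motion_emb: (N, D) paired embeddings from the evaluation model.
    # Lower is better; a value around 4 is not by itself evidence of a
    # calculation mistake, since it depends on the embedding scale.
    return torch.linalg.norm(text_emb - motion_emb, dim=-1).mean()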

@RunqiWang77

[Screenshot: reproduced evaluation results, 2024-06-21]
The R_precision of the InterGen model that I reproduced is always higher than that of the ground truth. Does anyone know the reason for this? Thank you very much.
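
For reference, R_precision in these benchmarks is usually top-k retrieval accuracy within small candidate pools, measured in the evaluator's embedding space; generated motions can occasionally land closer to the text embeddings than the ground-truth motions do, which would push their R_precision above GT, although GT is normally the upper bound. A minimal sketch, assuming the common protocol with 32-sample pools (the pool size and helper are assumptions, not taken from this repo):

import torch

def r_precision(text_emb: torch.Tensor, motion_emb: torch.Tensor,
                top_k: int = 3, pool_size: int = 32) -> torch.Tensor:
    # For each text, its matched motion competes against pool_size - 1
    # other motions; we report the fraction of texts whose matched motion
    # ranks within the top k by Euclidean distance.
    hits = torch.zeros(top_k)
    n_pools = text_emb.shape[0] // pool_size
    for p in range(n_pools):
        s = p * pool_size
        t = text_emb[s:s + pool_size]                     # (P, D) text embeddings
        m = motion_emb[s:s + pool_size]                   # (P, D) motion embeddings
        ranks = torch.cdist(t, m).argsort(dim=1)          # nearest motions per text
        match = ranks == torch.arange(pool_size).unsqueeze(1)
        for k in range(top_k):
            hits[k] += match[:, :k + 1].sum()
    return hits / (n_pools * pool_size)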
