Significant Performance Drop in GSM8k Evaluation with Updated SFT ckpt #24

yinyueqin · 2024-03-12T22:49:47Z

Hi,

Thank you for your work. We're re-running the experiments with the updated SFT ckpt from https://huggingface.co/alignment-handbook/zephyr-7b-sft-full and evaluating with lm-evaluation-harness v0.4.0. We've noticed a significant performance drop on GSM8k. We trained the model for 6 epochs in each iteration. Have you observed this issue, or do you have any insight into potential causes?

junkangwu · 2024-03-13T08:49:54Z

It could be related to the version of lm-evaluation-harness. For more details, see #12 (comment).

Additionally, after switching to the updated SFT checkpoint from https://huggingface.co/alignment-handbook/zephyr-7b-sft-full, the relative improvement between iteration 0 and iteration 1 appears to be marginal. Are any new parameter settings recommended?

yinyueqin · 2024-03-15T04:22:07Z

I use lm-evaluation-harness v0.4.0 for evaluation, which is the same version the author used. In addition, the results displayed above were obtained with num_train_epochs=6 for training.
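
Since the harness version keeps coming up in this thread, here is a minimal sketch for confirming which version is actually installed in the evaluation environment. It assumes the harness is installed under the distribution name `lm_eval`; the helper names `installed_version` and `matches_expected` are hypothetical and not part of the harness or of this repository.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

# Assumption: the harness is installed under the distribution name "lm_eval".
HARNESS_DIST = "lm_eval"
# Version mentioned in this thread.
EXPECTED_VERSION = "0.4.0"


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def matches_expected(installed: Optional[str], expected: str) -> bool:
    """True only when a version was found and it equals the expected string."""
    return installed is not None and installed == expected


if __name__ == "__main__":
    found = installed_version(HARNESS_DIST)
    print(f"{HARNESS_DIST} installed version: {found!r}")
    print(f"matches {EXPECTED_VERSION}: {matches_expected(found, EXPECTED_VERSION)}")
```

Running this in the same environment used for evaluation, and posting the output alongside the GSM8k numbers, would make it easier to rule the harness version in or out as a cause.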

AGTSAAA · 2024-05-10T00:02:35Z

Hi @yinyueqin, have you been able to reproduce the performance? I cannot reproduce it either.
