Discrepancy on WER benchmark result in Tedlium dataset #135

MLMonkATGY · 2024-06-04T03:26:26Z

Hi.

I am unable to reproduce the WER reported in the paper for the test split of distil-whisper/tedlium with the model distil-whisper/distil-large-v2 when using run_eval.py. For every other dataset benchmarked in the paper, my results are close to the reported numbers (< 1% difference). Any idea what could be causing this discrepancy?

I followed the suggestion in issue #131 to use EnglishTextNormalizer instead of BasicTextNormalizer.

Reported WER (paper): 9.6%
Achieved WER: 12.69%
Difference: 3.09 percentage points (absolute)
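
For reference, WER is the total number of word substitutions, deletions and insertions divided by the total number of reference words, so the gap above is 12.69 - 9.6 = 3.09 points in absolute terms (roughly a 32% relative increase). Below is a minimal, stdlib-only sketch of corpus-level WER, assuming whitespace tokenization of text that has already been normalized. The helper names word_edit_distance and corpus_wer are hypothetical; this is not the implementation used by run_eval.py.

```python
from typing import List


def word_edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Minimum substitutions + deletions + insertions turning ref into hyp."""
    # prev[j] holds the distance between the previous ref prefix and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of a reference word
                curr[j - 1] + 1,         # insertion of a hypothesis word
                prev[j - 1] + (r != h),  # substitution (or match if equal)
            )
        prev = curr
    return prev[-1]


def corpus_wer(refs: List[str], hyps: List[str]) -> float:
    """Total word errors divided by total reference words over all utterances."""
    errors = sum(word_edit_distance(r.split(), h.split()) for r, h in zip(refs, hyps))
    words = sum(len(r.split()) for r in refs)
    return errors / words


# One deletion out of six reference words gives a WER of 1/6
print(corpus_wer(["the cat sat on the mat"], ["the cat sat on mat"]))
```

Note that errors and word counts are summed over the whole split before dividing; averaging per-utterance WERs instead generally gives a different number.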

Command:

python run_eval.py \
  --model_name_or_path "distil-whisper/distil-large-v2" \
  --dataset_name "distil-whisper/tedlium" \
  --dataset_config_name "release3" \
  --dataset_split_name "test" \
  --text_column_name "text" \
  --batch_size 64 \
  --dtype "bfloat16" \
  --generation_max_length 256 \
  --language "en" \
  --attn_implementation "flash_attention_2"

Modification: used EnglishTextNormalizer as the text normalizer.
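
To illustrate why the normalizer choice affects WER, here is a hypothetical, simplified normalizer loosely modeled on Whisper's BasicTextNormalizer: it lowercases, drops bracketed and parenthesized text, and replaces marks, symbols and punctuation with spaces. It is not EnglishTextNormalizer, which additionally standardizes numbers, spellings and contractions, and the name simple_normalize is made up for this sketch.

```python
import re
import unicodedata


def simple_normalize(text: str) -> str:
    """Hypothetical normalizer, loosely modeled on Whisper's BasicTextNormalizer."""
    text = text.lower()
    # Drop bracketed annotations such as "[laughter]" or "<unk>"
    text = re.sub(r"[<\[][^>\]]*[>\]]", "", text)
    # Drop parenthesized asides such as "(applause)"
    text = re.sub(r"\(([^)]+?)\)", "", text)
    # Replace marks (M), symbols (S) and punctuation (P) with spaces
    text = "".join(
        " " if unicodedata.category(c)[0] in "MSP" else c
        for c in unicodedata.normalize("NFKC", text)
    )
    # Collapse runs of whitespace and trim the ends
    return re.sub(r"\s+", " ", text).strip()


print(simple_normalize("Hello, World! (applause)"))  # hello world
print(simple_normalize("It's <unk> great."))         # it s great
```

The second example splits the contraction "it's" into two tokens; a normalizer that keeps or expands contractions would produce different word counts, so the same transcriptions can score differently depending on which normalizer is applied to references and predictions.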

Thanks in advance.


bryanyzhu · 2024-06-04T05:27:31Z

I'm facing the same issue; only TED-LIUM shows this discrepancy.
