stt_en_conformer_ctc_large_ls.nemo WER 0.51 with common voice 13.0 #13229

jhss opened this issue Apr 24, 2025 · 1 comment


jhss commented Apr 24, 2025

Transcribing: 100%|██████████| 1024/1024 [01:12<00:00, 14.07it/s]
[NeMo I 2025-04-24 18:10:21 transcribe_speech:420] Model time for iteration 0: 74.435
[NeMo I 2025-04-24 18:10:21 transcribe_speech:425] Model time avg: 74.435
[NeMo I 2025-04-24 18:10:21 transcribe_speech:431] Finished transcribing from manifest file: datasets/en/mozilla-foundation/common_voice_13_0/en/test/test_mozilla-foundation_common_voice_13_0_manifest.json
[NeMo I 2025-04-24 18:10:21 transcribe_speech:436] Writing transcriptions into file: tmp
[NeMo I 2025-04-24 18:10:23 transcribe_speech:459] Finished writing predictions to tmp!
Backend tkagg is interactive backend. Turning interactive mode on.
[NeMo I 2025-04-24 18:11:12 transcribe_speech:477] Writing prediction and error rate of each sample to tmp!
[NeMo I 2025-04-24 18:11:12 transcribe_speech:478] {'samples': 16372, 'tokens': 152585, 'wer': 0.5112756824065275, 'ins_rate': 0.053124487990300485, 'del_rate': 0.023324704263197563, 'sub_rate': 0.43482649015302943}

I ran speech_to_text_eval.py with the model stt_en_conformer_ctc_large_ls.nemo.
The WER (0.51) seems high. Is this expected?
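As a sanity check on the numbers in the log above: WER is the sum of the insertion, deletion, and substitution rates, and the reported components do add up to the reported 0.511. A minimal verification of that arithmetic:

```python
# Error rates copied from the evaluation log above.
ins_rate = 0.053124487990300485
del_rate = 0.023324704263197563
sub_rate = 0.43482649015302943
reported_wer = 0.5112756824065275

# WER = (insertions + deletions + substitutions) / reference tokens,
# so the three component rates should sum to the overall WER.
wer = ins_rate + del_rate + sub_rate
assert abs(wer - reported_wer) < 1e-12
print(f"WER = {wer:.4f}")  # substitutions dominate the error
```

Note that substitutions account for roughly 85% of the errors, which often points to a domain mismatch between the model's training data (LibriSpeech, per the `_ls` suffix) and the Common Voice test set, rather than a broken evaluation pipeline.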

@nithinraok
Collaborator

Yes, those are older models and may not be suitable for all datasets. Could you try the latest https://huggingface.co/nvidia/parakeet-tdt_ctc-110m with the CTC decoder? You can switch the decoder to CTC before running inference using:
asr_model.change_decoding_strategy(decoder_type='ctc')
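A minimal sketch of the suggested workflow, assuming NeMo's `ASRModel.from_pretrained` interface; the audio file paths are placeholders, and this requires a working NeMo installation to run:

```python
import nemo.collections.asr as nemo_asr

# Load the hybrid TDT/CTC Parakeet model suggested above
# (downloads the checkpoint from Hugging Face on first use).
asr_model = nemo_asr.models.ASRModel.from_pretrained("nvidia/parakeet-tdt_ctc-110m")

# Hybrid models default to the TDT decoder; switch the decoding
# branch to CTC before inference, as recommended in this thread.
asr_model.change_decoding_strategy(decoder_type='ctc')

# Transcribe audio files (paths here are placeholders, not from the issue).
transcripts = asr_model.transcribe(["sample1.wav", "sample2.wav"])
print(transcripts)
```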
