Transcribing: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1024/1024 [01:12<00:00, 14.07it/s]
[NeMo I 2025-04-24 18:10:21 transcribe_speech:420] Model time for iteration 0: 74.435
[NeMo I 2025-04-24 18:10:21 transcribe_speech:425] Model time avg: 74.435
[NeMo I 2025-04-24 18:10:21 transcribe_speech:431] Finished transcribing from manifest file: datasets/en/mozilla-foundation/common_voice_13_0/en/test/test_mozilla-foundation_common_voice_13_0_manifest.json
[NeMo I 2025-04-24 18:10:21 transcribe_speech:436] Writing transcriptions into file: tmp
[NeMo I 2025-04-24 18:10:23 transcribe_speech:459] Finished writing predictions to tmp!
Backend tkagg is interactive backend. Turning interactive mode on.
[NeMo I 2025-04-24 18:11:12 transcribe_speech:477] Writing prediction and error rate of each sample to tmp!
[NeMo I 2025-04-24 18:11:12 transcribe_speech:478] {'samples': 16372, 'tokens': 152585, 'wer': 0.5112756824065275, 'ins_rate': 0.053124487990300485, 'del_rate': 0.023324704263197563, 'sub_rate': 0.43482649015302943}
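I ran speech_to_text_eval.py with the model stt_en_conformer_ctc_large_ls.nemo on the Common Voice 13.0 English test set (log above).
The reported WER is about 51%, which seems high to me. Is this expected for this model?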
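For anyone reading the numbers in the log, here is a small sketch that checks how the reported WER splits into insertion, deletion, and substitution rates. The values are copied from the log line above; the helper name `error_breakdown` is only illustrative and is not part of NeMo.

```python
# Metrics copied verbatim from the transcribe_speech log line above.
metrics = {
    "samples": 16372,
    "tokens": 152585,
    "wer": 0.5112756824065275,
    "ins_rate": 0.053124487990300485,
    "del_rate": 0.023324704263197563,
    "sub_rate": 0.43482649015302943,
}


def error_breakdown(m):
    """Return the summed error rate and each error type's share of the WER.

    Illustrative helper (not a NeMo function). WER is the sum of the
    insertion, deletion, and substitution rates.
    """
    keys = ("ins_rate", "del_rate", "sub_rate")
    total = sum(m[k] for k in keys)
    shares = {k: m[k] / m["wer"] for k in keys}
    return total, shares


total, shares = error_breakdown(metrics)
print(f"sum of rates: {total:.6f} (reported WER: {metrics['wer']:.6f})")
for name, share in shares.items():
    print(f"{name}: {share:.1%} of all errors")
```

The three rates add up to the reported WER, and substitutions account for roughly 85% of the errors, so most of the mismatch is at the word level rather than missing or extra words.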
Yes, those are older models and might not be suitable for all datasets. Could you try the latest model, https://huggingface.co/nvidia/parakeet-tdt_ctc-110m, with the CTC decoder? You can switch the decoder to CTC before running inference with: asr_model.change_decoding_strategy(decoder_type='ctc')
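A minimal sketch of that suggestion in Python, assuming NeMo with the ASR collection is installed and the checkpoint can be downloaded from Hugging Face. The function name `load_parakeet_with_ctc` and the audio path in the usage comment are hypothetical; the import sits inside the function so that defining it does not require NeMo.

```python
def load_parakeet_with_ctc(model_name="nvidia/parakeet-tdt_ctc-110m"):
    """Load the hybrid TDT/CTC model and switch it to the CTC decoder.

    Hypothetical helper for illustration. Requires the NeMo toolkit
    and network access to fetch the checkpoint.
    """
    import nemo.collections.asr as nemo_asr

    asr_model = nemo_asr.models.ASRModel.from_pretrained(model_name=model_name)
    # Switch decoding to the CTC head, as suggested in the reply above.
    asr_model.change_decoding_strategy(decoder_type="ctc")
    return asr_model


# Usage (not executed here; "sample.wav" is a placeholder path):
#   asr_model = load_parakeet_with_ctc()
#   results = asr_model.transcribe(["sample.wav"])
#   print(results[0])
```

The format of the value returned by `transcribe` differs between NeMo releases, so inspect `results[0]` before relying on a particular field.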