How to convert the torch_dist ckpt to the nemo file? #11761
Kamizato-Ayaka asked this question in Q&A
Hi, thanks for the question, and apologies for the late reply. I see that you are using the NeMo 1.0 codepath. We have since introduced NeMo 2.0, which is more seamlessly compatible with Hugging Face. We have some documentation here. Any supported model can be exported to HF format using the export_ckpt API. We recommend upgrading to NeMo 2.0 and testing the Hugging Face conversion there.
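As a rough illustration of the export_ckpt route mentioned above, here is a minimal sketch. It assumes NeMo 2.0 is installed, that the checkpoint directory was produced by NeMo 2.0 for a model with a Hugging Face exporter, and that llm.export_ckpt accepts path, target and output_path as described in the NeMo 2.0 documentation. The helper name export_to_hf and both paths are hypothetical. Whether a checkpoint trained on the NeMo 1.0 codepath loads this way is not established in this thread.

```python
from pathlib import Path


def export_to_hf(nemo2_ckpt_dir: str, hf_output_dir: str) -> Path:
    """Export a NeMo 2.0 checkpoint directory to Hugging Face format.

    Hypothetical helper: the name and arguments are illustrative only.
    """
    ckpt = Path(nemo2_ckpt_dir)
    if not ckpt.is_dir():
        raise FileNotFoundError(f"checkpoint directory not found: {ckpt}")

    # Imported lazily so the path check above runs even without NeMo installed.
    # Assumption: NeMo 2.0 exposes export_ckpt under nemo.collections.llm.
    from nemo.collections import llm

    # target="hf" selects the Hugging Face exporter registered for the model.
    return llm.export_ckpt(path=ckpt, target="hf", output_path=Path(hf_output_dir))
```

A call would look like export_to_hf("/path/to/nemo2_checkpoint", "/path/to/hf_model"), with both paths replaced by your own.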
Description:
I have trained a MegatronGPTSFT model using NeMo. After the training, I only obtained the torch_dist-organized checkpoint directory but did not get a .nemo file. I now need the .nemo file to convert the model to the Hugging Face format. However, I’m unable to figure out how to convert the torch_dist checkpoint into a .nemo file.
I have tried using the following scripts provided by NeMo:
NeMo/scripts/checkpoint_converters/convert_zarr_to_torch_dist.py
NeMo/examples/nlp/language_modeling/megatron_ckpt_to_nemo.py
Both methods fail at the load_checkpoints step. The error log is as follows:
Steps I Tried:
I verified the checkpoints in multiple Docker environments, but the issue persists.
I reviewed the NeMo documentation and examples but could not find a resolution.
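One further check that may help when a converter fails at load time is to confirm which sharded backend the checkpoint directory actually declares. The sketch below assumes the Megatron-Core distributed checkpointing layout, in which the directory holding the sharded weights contains a metadata.json file with a sharded_backend key (for example "torch_dist" or "zarr"). The function name is hypothetical, and the layout may differ between versions.

```python
import json
from pathlib import Path
from typing import Optional


def sharded_backend(ckpt_dir: str) -> Optional[str]:
    """Return the backend named in metadata.json, or None if it is absent.

    Assumption: the checkpoint follows the Megatron-Core dist-checkpointing
    layout, where metadata.json records a "sharded_backend" field.
    """
    meta = Path(ckpt_dir) / "metadata.json"
    if not meta.is_file():
        return None
    with meta.open() as f:
        return json.load(f).get("sharded_backend")
```

If this reports "torch_dist", the directory is already in the format that convert_zarr_to_torch_dist.py produces, so that script is unlikely to be the right tool for it.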
Question:
Is there a better or more robust way to convert torch_dist checkpoints into a .nemo file? Any suggestions or best practices would be greatly appreciated!