ValueError: Cannot load because encoder.conv_out.conv.weight expected shape torch.Size([129, 512, 3, 3, 3]), but got torch.Size([129, 2048, 3, 3, 3]). #136

maxin-cn · 2025-04-08T04:51:13Z

import torch
from diffusers import AutoencoderKLLTXVideo, LTXImageToVideoPipeline, LTXVideoTransformer3DModel

# `single_file_url` could also be https://huggingface.co/Lightricks/LTX-Video/ltx-video-2b-v0.9.1.safetensors
single_file_url = "https://huggingface.co/Lightricks/LTX-Video/ltx-video-2b-v0.9.5.safetensors"
transformer = LTXVideoTransformer3DModel.from_single_file(
  single_file_url, torch_dtype=torch.bfloat16
)
vae = AutoencoderKLLTXVideo.from_single_file(single_file_url, torch_dtype=torch.bfloat16)
pipe = LTXImageToVideoPipeline.from_pretrained(
  "Lightricks/LTX-Video", transformer=transformer, vae=vae, torch_dtype=torch.bfloat16
)

# ... inference code ...

The code above fails with the following error:

Traceback (most recent call last):
  File "./ltx-video/1.py", line 9, in <module>
    vae = AutoencoderKLLTXVideo.from_single_file(single_file_url, torch_dtype=torch.bfloat16)
  File "/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
  File "/python3.10/site-packages/diffusers/loaders/single_file_model.py", line 355, in from_single_file
    unexpected_keys = load_model_dict_into_meta(
  File "/python3.10/site-packages/diffusers/models/model_loading_utils.py", line 230, in load_model_dict_into_meta
    raise ValueError(
ValueError: Cannot load  because encoder.conv_out.conv.weight expected shape torch.Size([129, 512, 3, 3, 3]), but got torch.Size([129, 2048, 3, 3, 3]). If you want to instead overwrite randomly initialized weights, please make sure to pass both `low_cpu_mem_usage=False` and `ignore_mismatched_sizes=True`. For more information, see also: https://github.com/huggingface/diffusers/issues/1619#issuecomment-1345604389 as an example.

It seems that version 0.9.5 is not yet integrated into diffusers.
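
To see which tensors disagree before calling `from_single_file`, one option is to read the shapes straight from the `.safetensors` header and compare them with what the model expects. The sketch below is only an illustration, not diffusers' own loading code. `read_safetensors_shapes` and `find_shape_mismatches` are hypothetical helper names, the reader assumes a locally downloaded checkpoint file, and key names inside the original checkpoint may differ from the diffusers names shown in the error.

```python
import json
import struct


def read_safetensors_shapes(path):
    """Return {tensor_name: shape} from a .safetensors header without loading weights."""
    with open(path, "rb") as f:
        # First 8 bytes: little-endian unsigned 64-bit length of the JSON header.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len).decode("utf-8"))
    # "__metadata__" is an optional free-form entry, not a tensor.
    return {
        name: tuple(info["shape"])
        for name, info in header.items()
        if name != "__metadata__"
    }


def find_shape_mismatches(expected, actual):
    """List (name, expected_shape, actual_shape) for shared keys whose shapes differ."""
    return [
        (name, tuple(expected[name]), tuple(actual[name]))
        for name in sorted(expected.keys() & actual.keys())
        if tuple(expected[name]) != tuple(actual[name])
    ]


if __name__ == "__main__":
    # Shapes copied from the error message above.
    expected = {"encoder.conv_out.conv.weight": (129, 512, 3, 3, 3)}
    actual = {"encoder.conv_out.conv.weight": (129, 2048, 3, 3, 3)}
    for name, want, got in find_shape_mismatches(expected, actual):
        print(f"{name}: model expects {want}, checkpoint has {got}")
```

With the shapes from this issue, the only difference is the second dimension (512 vs. 2048 input channels), which suggests the model config used for loading does not match the 0.9.5 VAE architecture.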


Zh1ym · 2025-04-16T04:02:06Z

+1, same problem.

XuWuLingYu · 2025-04-21T07:46:29Z

Same problem. Only v0.9.1 is supported for training.

nitinmukesh · 2025-04-21T08:21:09Z

@maxin-cn @Zh1ym
Inference with 0.9.5 is integrated in diffusers. I am not sure about training.

I have integrated it here (all three inference types are supported):
https://github.com/newgenai79/sd-diffuser-webui

You can refer to this file for the code:
https://github.com/newgenai79/sd-diffuser-webui/blob/main/modules/video_generation/tab_ltx095.py

rayli09 · 2025-04-25T22:09:45Z

Any updates on this? I'm running into the same problem and can't use 0.9.5 for diffusers inference.
