System Info
❗ Summary
Deployment of the mistralai/Ministral-8B-Instruct-2410 model using Text Generation Inference (TGI) on Google Cloud Vertex AI fails during model shard initialization due to a tokenizer loading error.
📍 Environment
Platform: Google Cloud Vertex AI (Endpoints)
Container: ghcr.io/huggingface/text-generation-inference:1.4
Model ID: mistralai/Ministral-8B-Instruct-2410
GPU: NVIDIA T4 (CUDA capability 7.5); the same failure occurs on A100 and L4
HF Token: Verified & working
Model download: Succeeds (all .safetensors files retrieved)
✅ Steps to Reproduce
1. Configure the gcloud client and create a project in gcloud.
2. Create the Docker compose file and configure Docker.
3. Tag the Docker image:
docker tag ghcr.io/huggingface/text-generation-inference:1.4 \
us-central1-docker.pkg.dev/YOUR_PROJECT_ID/mistral-container-repo/mistral8b-inference:latest
4. Push the Docker image.
5. Upload the model to Vertex AI and note your numeric Model ID by listing models; you will need it in the next steps.
6. Create a Vertex endpoint:
gcloud ai endpoints create \
--region=us-central1 \
--display-name="mistral-8b-endpoint"
Response:
Using endpoint [https://us-central1-aiplatform.googleapis.com/]
Waiting for operation [****************]...done.
Created Vertex AI endpoint: projects/********/locations/us-central1/endpoints/*********************.
Note the endpoint ID after /endpoints/; you will need it in the next steps.
7. Deploy the model to the Vertex AI endpoint (with the accelerator set for T4 or A100), replacing the endpoint ID and model ID with the real numeric values you have.
8. After a while, the command line will produce a link to the logs where you can see all the details.
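For reference, the same upload-and-deploy flow can be scripted with the Vertex AI Python SDK. This is a minimal sketch under assumptions: display names, machine type, and the MODEL_ID environment variable are placeholders, not the exact configuration used above.

# Minimal sketch using the Vertex AI Python SDK (google-cloud-aiplatform).
# Display names, machine type, and env vars are illustrative assumptions.
from google.cloud import aiplatform

aiplatform.init(project="YOUR_PROJECT_ID", location="us-central1")

# Upload the custom TGI serving container as a Vertex AI Model.
model = aiplatform.Model.upload(
    display_name="mistral8b-inference",
    serving_container_image_uri=(
        "us-central1-docker.pkg.dev/YOUR_PROJECT_ID/"
        "mistral-container-repo/mistral8b-inference:latest"
    ),
    serving_container_environment_variables={
        "MODEL_ID": "mistralai/Ministral-8B-Instruct-2410"
    },
)

# Deploy to a fresh endpoint; a T4 is shown, swap in A100 or L4 as needed.
endpoint = aiplatform.Endpoint.create(display_name="mistral-8b-endpoint")
model.deploy(
    endpoint=endpoint,
    machine_type="n1-standard-8",
    accelerator_type="NVIDIA_TESLA_T4",
    accelerator_count=1,
)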
❌ Error Observed (see the attached log file for details)
TGI fails on model initialization with:
Exception: data did not match any variant of untagged enum ModelWrapper at line 1217944 column 3
The full stack trace includes:
tokenization_llama_fast.py
TokenizerFast.from_file(...)
AutoTokenizer.from_pretrained(...)
(Available in full in the attached log.)
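The failing code path can be exercised outside TGI, because the same Rust deserializer backs the tokenizers Python package. The sketch below is an assumption about how to reproduce the failure, not something taken from the attached log: with a tokenizers version as old as the one bundled in the TGI 1.4 image, Tokenizer.from_file should raise the same untagged-enum error.

# Sketch: exercise the serde-based Rust deserializer that TGI relies on.
from huggingface_hub import hf_hub_download
from tokenizers import Tokenizer

# Fetch only the fast-tokenizer file from the public model repo.
path = hf_hub_download(
    repo_id="mistralai/Ministral-8B-Instruct-2410",
    filename="tokenizer.json",
)

# Rust-side deserialization happens here; an old `tokenizers` build is
# expected to fail with "data did not match any variant of untagged
# enum ModelWrapper".
tok = Tokenizer.from_file(path)
print("vocab size:", tok.get_vocab_size())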
📂 Model Download Logs (Success)
/data/models--mistralai--Ministral-8B-Instruct-2410/snapshots/.../model-0000X-of-00004.safetensors
consolidated.safetensors downloaded successfully
Logs confirm: "Successfully downloaded weights."
🧪 Additional Verifications
✅ AutoTokenizer.from_pretrained(...) works locally on CPU
✅ huggingface-cli login and transformers-cli env confirmed
⚠️ TGI tokenizer loading fails only inside the container on GCP
🤔 Hypothesis
The tokenizer JSON or fast tokenizer files (most likely tokenizer.json) contain an enum tag or structure that is not properly parsed by tokenizers or TGI.
There may be a schema incompatibility between the tokenizer JSON and the Rust-based tokenizers backend used inside TGI.
The Ministral-8B-Instruct-2410 tokenizer may include experimental or malformed constructs that are not validated in TGI's inference pipeline.
The tokenizer.json file is very large (the error points to line 1217944) and might be malformed or partially truncated during sync/caching on GCP.
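Two of these hypotheses, truncation and an unexpected model variant, can be tested directly against the downloaded file. A minimal sketch, assuming tokenizer.json has been copied to the working directory:

# Sketch: test the truncation and unexpected-variant hypotheses.
import json

# json.load fails fast with JSONDecodeError if the file is truncated.
with open("tokenizer.json", "r", encoding="utf-8") as f:
    data = json.load(f)

# The "model" section is what the untagged ModelWrapper enum must match
# against one of its variants (BPE, WordPiece, Unigram, WordLevel).
print("model type:", data["model"]["type"])
print("top-level keys:", sorted(data.keys()))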
📌 Suggested Cross-References
MistralAI: mistralai/Ministral-8B-Instruct-2410
TGI GitHub Repository
Consider syncing with the Hugging Face and Mistral teams to validate the tokenizer.json file against the expected schema.
🧷 Suggested Fix Paths
Validate and lint the tokenizer config (tokenizer.json) for malformed entries, and rule out a library version skew (see the sketch after this list).
Test deployment with a smaller/older model such as Mistral-7B-Instruct-v0.2 (this worked).
Provide a fallback or validation inside AutoTokenizer.from_pretrained() that raises a meaningful error.
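On the first fix path: since transformers reads the same files happily while TGI's Rust side rejects them, a version skew between the tokenizers build inside the TGI 1.4 image and the one that wrote tokenizer.json is worth ruling out. A sketch that only prints versions; comparing them against the container is an assumed manual step:

# Sketch: record library versions on both sides of the discrepancy.
# Run locally and again inside the TGI container, then compare: older
# `tokenizers` releases cannot deserialize tokenizer.json files written
# by newer ones.
import tokenizers
import transformers

print("tokenizers  :", tokenizers.__version__)
print("transformers:", transformers.__version__)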
📎 Log File
I've attached the full TGI shard log from GCP as evidence:
downloaded-logs-20250409-170852.json
Reproduction
See ✅ Steps to Reproduce under System Info above.
Expected behavior
✅ Expected Behavior
The model mistralai/Ministral-8B-Instruct-2410 should load successfully in TGI when MODEL_ID is set in the environment.
TGI should load the tokenizer from the model repository using the files provided:
tokenizer_config.json
tokenizer.json (the fast tokenizer graph)
special_tokens_map.json
tokenizer.model (if used for SentencePiece or BPE)
This should match the behavior of the following Python code, which works correctly:
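(The original snippet did not survive extraction; the following is a minimal sketch consistent with the "works locally on CPU" verification above, with the exact arguments assumed.)

# Sketch of the local check that succeeds per "Additional Verifications";
# the model ID is real, the test sentence is an arbitrary placeholder.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "mistralai/Ministral-8B-Instruct-2410"
)
print(tokenizer("Hello, world!")["input_ids"])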
TGI should deserialize the tokenizer using the Hugging Face tokenizers Rust backend.
If the tokenizer format includes fields outside the expected schema, TGI should either:
gracefully handle the mismatch (as transformers does), or
emit a clearer error message and suggest workarounds.
The Rust-based deserializer in TGI currently fails with:
Exception: data did not match any variant of untagged enum ModelWrapper at line 1217944 column 3
This likely originates from the tokenizers Rust crate, where .json deserialization is handled via serde. The ModelWrapper enum expects well-defined variants such as BPE, WordPiece, Unigram, or WordLevel.
Python-based transformers uses flexible class instantiation with fallback logic and can tolerate schema drift or missing fields, which TGI cannot.
Since the model is public, loads fine in transformers, and includes all expected files, the user expects it to work in text-generation-inference as well, especially since TGI is positioned as a standard HF-hosted inference backend.
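To illustrate the second option above, here is a hypothetical wrapper (not TGI's actual code, which is Rust) showing the kind of actionable message this report is asking for:

# Hypothetical illustration only: wrap the deserializer so a schema
# mismatch produces an actionable error instead of a bare serde message.
from tokenizers import Tokenizer

def load_tokenizer_with_hint(path: str) -> Tokenizer:
    try:
        return Tokenizer.from_file(path)
    except Exception as err:  # serde errors surface as generic exceptions
        raise RuntimeError(
            f"Failed to deserialize {path}: {err}. The file may use a "
            "newer tokenizer.json schema than this tokenizers build "
            "supports; try upgrading tokenizers/TGI, or re-export the "
            "tokenizer with a matching library version."
        ) from err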