I'm experiencing a token count discrepancy when using the Qwen2.5-VL model.
I uploaded 5 images at 2048 x 1365 resolution with the setting max_pixels=16384, and the endpoint reported a total of around 71,000 tokens. However, when I preprocess the same images locally with the Qwen2.5-VL processor, the total token count is only around 18,000 tokens.
This large difference is causing inference failures on hosted endpoints due to token limits.
Could you please help clarify:
1. Why is the token count so much higher via the endpoint?
2. Is there a recommended way to align the endpoint token count with local processing?
3. Is there any way to reduce token usage on multi-image inputs?
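For reference, here is a back-of-the-envelope sketch of how image resolution maps to visual tokens. It assumes Qwen2.5-VL's documented patch size of 14 with a 2x2 patch merge (so one visual token per 28x28 pixel block) and a smart-resize step that clamps total area into [min_pixels, max_pixels]; the default bounds below are assumptions taken from the Qwen2-VL processor defaults, not measured from the endpoint.

```python
import math

# One visual token per 28x28 pixel block: patch_size (14) * merge_size (2).
FACTOR = 28

def estimate_visual_tokens(height, width, min_pixels=56 * 56, max_pixels=1280 * 28 * 28):
    """Rough estimate of visual tokens for one image, mimicking smart-resize."""
    # Round each side to the nearest multiple of 28.
    h_bar = max(FACTOR, round(height / FACTOR) * FACTOR)
    w_bar = max(FACTOR, round(width / FACTOR) * FACTOR)
    if h_bar * w_bar > max_pixels:
        # Scale down so the total area fits under max_pixels.
        beta = math.sqrt((height * width) / max_pixels)
        h_bar = math.floor(height / beta / FACTOR) * FACTOR
        w_bar = math.floor(width / beta / FACTOR) * FACTOR
    elif h_bar * w_bar < min_pixels:
        # Scale up so the total area reaches min_pixels.
        beta = math.sqrt(min_pixels / (height * width))
        h_bar = math.ceil(height * beta / FACTOR) * FACTOR
        w_bar = math.ceil(width * beta / FACTOR) * FACTOR
    return (h_bar // FACTOR) * (w_bar // FACTOR)

# Example: a 280x280 image maps to a 10x10 grid of tokens.
print(estimate_visual_tokens(280, 280))  # 100
```

Under these assumptions the per-image token count is bounded by max_pixels / 784, so a large gap between endpoint and local counts suggests the two sides are applying different min_pixels/max_pixels (or no resizing at all) to the same images.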
System Info
Thanks in advance!
Information
Tasks
Reproduction
from openai import OpenAI

# Client pointed at the hosted TGI endpoint; base_url and api_key are placeholders.
client = OpenAI(base_url="http://localhost:8080/v1", api_key="-")

image_block = {
    "type": "image_url",
    "image_url": {
        "url": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg"
    },
}

chat_completion = client.chat.completions.create(
    model="tgi",
    messages=[
        {
            "role": "user",
            # The same demo image is sent 5 times, followed by the text prompt.
            "content": [image_block] * 5
            + [{"type": "text", "text": "Describe this image in one sentence."}],
        }
    ],
    top_p=None,
    temperature=None,
    max_tokens=150,
    stream=True,
    seed=None,
    stop=None,
    frequency_penalty=None,
    presence_penalty=None,
)

# Consume the stream.
for chunk in chat_completion:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
Expected behavior
The total token count should be about 18,000, matching local preprocessing.
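On reducing multi-image token usage (question 3), one client-side lever is to cap max_pixels per image: under the assumption of one visual token per 28x28 pixel block, a token budget translates directly into a pixel budget. The helper name below is hypothetical; the min_pixels/max_pixels knobs themselves are documented on the Qwen2.5-VL processor.

```python
# Hypothetical helper: pick a max_pixels value that caps each image at
# token_budget visual tokens, assuming one token per 28x28 pixel block
# (Qwen2.5-VL's patch_size=14 with a 2x2 merge).
def max_pixels_for_budget(token_budget):
    return token_budget * 28 * 28

# e.g. cap each image at 512 visual tokens:
print(max_pixels_for_budget(512))  # 401408
```

The resulting value could then be passed as max_pixels when constructing the processor (or via qwen_vl_utils) so that five images cost at most 5 x token_budget visual tokens.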