huggingface / text-generation-inference Public

Issues: huggingface/text-generation-inference

247 Open 1,268 Closed

Issues list

Gemma3: CUDA error: an illegal memory access was encountered

#3227 opened May 14, 2025 by sebastianliebscher

2 of 4 tasks

Quantized Qwen3

#3226 opened May 14, 2025 by stevaras2

1 of 4 tasks

Launching a container with an unprivileged user

#3225 opened May 12, 2025 by chyundunovDatamonsters

Docker Image that works on RTX 5090

#3219 opened May 9, 2025 by shawnrushefsky

Model request: RecurrentGemma

#3216 opened May 9, 2025 by redbrain

2 tasks done

Phi 4 Reasoning Not able to Start

#3215 opened May 8, 2025 by zacksiri

2 of 4 tasks

Strange output when using Structured output with Gemma 3 12b it

#3214 opened May 8, 2025 by zacksiri

2 of 4 tasks

Whether it supports Huawei Atlas300 graphics card?

#3213 opened May 8, 2025 by fxb392

1 of 4 tasks

Upgrade torch to 2.7 to support Blackwell GPUs

#3209 opened May 5, 2025 by CrashTimeV

Can I use TGI in a Supercomputer?

#3208 opened May 3, 2025 by sleepingcat4

Conflicting short argument -p

#3205 opened May 1, 2025 by leejuyuu

1 of 4 tasks

Different result between /chat/completions and /generate endpoint

#3203 opened May 1, 2025 by ibndias

2 of 4 tasks

Qwen 3 support

#3199 opened Apr 29, 2025 by maziyarpanahi

2 tasks done

Failure to install https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct

#3195 opened Apr 27, 2025 by vishnuchalla

3 of 4 tasks

Extremely high calculated token count for VLM (Qwen 2.5 VL

#3191 opened Apr 24, 2025 by boris-lok-pentadoc

2 of 4 tasks

Does tgi support RTX 5090 yet?

#3190 opened Apr 24, 2025 by bino282

Zero config not working for VLMs

#3181 opened Apr 16, 2025 by loganlebanoff

2 of 4 tasks

deepseek-r1-awq overflow

#3180 opened Apr 16, 2025 by taishan1994

3 of 4 tasks

Token count discrepancy when using Qwen2.5-VL with multiple images

#3177 opened Apr 15, 2025 by JjjFangg

4 tasks

RuntimeError on CUDA capture with FP8 when deploying Llama-4-Maverick on TGI 3.2.3 with using H100 GPUS

#3175 opened Apr 15, 2025 by nskpro-cmd

Extended queuing time when using guidance (structured generation) and grammar

#3173 opened Apr 15, 2025 by jazken

2 of 4 tasks

Llama Inference using TGI

#3168 opened Apr 13, 2025 by Akhil-ender

4 tasks

BatchPrefillWithPagedKVCacheWrapper.plan() got an unexpected keyword argument 'head_dim'

#3165 opened Apr 11, 2025 by ruckc

Tokenizer loading fails for mistralai/Ministral-8B-Instruct-2410 using TGI on GCP Vertex AI

#3163 opened Apr 10, 2025 by pavlonator

2 of 4 tasks

Depricated system cuda?

#3159 opened Apr 9, 2025 by SquadUpSquid

2 of 4 tasks
