System Info
Running TGI on a Kubernetes pod with the following arguments:
image: ghcr.io/huggingface/text-generation-inference:3.1.1
args: ["--model-id", "/models/llama3_3_70b_instruct", "--dtype", "bfloat16"]
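As a sanity check, the deployment above can be confirmed against TGI's /info endpoint. This is only an illustrative sketch: the localhost URL assumes the pod has been port-forwarded locally.

import requests

# Hypothetical port-forwarded endpoint; the response reports the served model and dtype.
info = requests.get("http://localhost:8080/info", timeout=10)
info.raise_for_status()
print(info.json())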
Information
Docker
The CLI directly
Tasks
An officially supported command
My own modifications
Reproduction
Steps to reproduce the behaviour:
Use AsyncClient (text_generation) with generate and the grammar parameter, passing the following Pydantic schema as the grammar value (a minimal call sketch follows the schema):
import uuid
from typing import List

from pydantic import BaseModel, HttpUrl

class Citation(BaseModel):
    chunk_id: uuid.UUID
    citation_id: int
    extract: str
    url: HttpUrl

class Sentence(BaseModel):
    content: str
    citation_ids: List[int]

class Paragraph(BaseModel):
    header: str
    sentences: List[Sentence]

class Generation(BaseModel):
    paragraphs: List[Paragraph]
    citations: List[Citation]
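A minimal sketch of the call that triggers the slowdown, based on the description above: the endpoint URL, prompt, and max_new_tokens are placeholders, and the dict-form grammar argument assumes a text_generation client recent enough to support grammars.

import asyncio

from text_generation import AsyncClient


async def main() -> None:
    # Hypothetical port-forwarded TGI endpoint.
    client = AsyncClient("http://localhost:8080", timeout=300)

    response = await client.generate(
        "Summarize the retrieved chunks with citations.",  # placeholder prompt
        max_new_tokens=1024,
        # Constrain the output to the Generation schema defined above.
        # (model_json_schema() is the Pydantic v2 name; use Generation.schema() on v1.)
        grammar={"type": "json", "value": Generation.model_json_schema()},
    )
    print(response.generated_text)


asyncio.run(main())

With the grammar argument the request sits in the queue for minutes (see the logs below); without it the same request is scheduled almost immediately.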
Logs (key line; the queue time jumps to over 160 seconds):
validation_time="5.669107ms" queue_time="161.454300714s"
Expected behavior
Expected the queue time to be similar to normal inference, or only slightly longer.
Running the same container without guidance/structured generation gives the following:
validation_time="5.188854ms" queue_time="129.181µs"
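For completeness, a rough client-side comparison of the two cases can be made as below. This is a hypothetical harness (endpoint and prompt are placeholders), and the authoritative queue_time numbers still come from the TGI router logs.

import asyncio
import time

from text_generation import AsyncClient


async def timed_generate(grammar=None) -> float:
    # Hypothetical port-forwarded TGI endpoint.
    client = AsyncClient("http://localhost:8080", timeout=300)
    start = time.perf_counter()
    await client.generate(
        "Summarize the retrieved chunks with citations.",  # placeholder prompt
        max_new_tokens=256,
        grammar=grammar,
    )
    return time.perf_counter() - start


async def main() -> None:
    plain = await timed_generate()
    constrained = await timed_generate(
        {"type": "json", "value": Generation.model_json_schema()}
    )
    print(f"without grammar: {plain:.1f}s  with grammar: {constrained:.1f}s")


asyncio.run(main())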