
random_seed seems to be ignored (or at least inconsistent) for inflight_batcher_llm #468

Open
Description

@dyoshida-continua

System Info

I've converted Llama 3 using TensorRT-LLM's convert_checkpoint script and am serving it with the inflight_batcher_llm template. I'm trying to get diverse samples for a fixed input, but I've found that if I make several requests concurrently, several of them come back with identical outputs.

I'm setting top_p=1, top_k=1024, temperature=1.0, and beam_width=1, and generating a unique random seed for each request. The requests are made over the gRPC API, and I'm using v0.9.0 of both TensorRT-LLM and tensorrtllm_backend.
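
For reference, this is roughly how the sampling parameters are attached to each request (a sketch, not my exact client code; the tensor names and dtypes (FP32/INT32 scalars, UINT64 for random_seed) follow the stock inflight_batcher_llm ensemble config and may differ in other deployments):

```python
# Sketch: per-request sampling tensors for the inflight_batcher_llm ensemble.
# Tensor names and dtypes are assumptions taken from the stock config.pbtxt.
import numpy as np
import tritonclient.grpc as grpcclient
from tritonclient.utils import np_to_triton_dtype

def sampling_inputs(seed):
    """Build the sampling tensors for one request; `seed` is unique per request."""
    params = {
        "top_p": np.array([[1.0]], dtype=np.float32),
        "top_k": np.array([[1024]], dtype=np.int32),
        "temperature": np.array([[1.0]], dtype=np.float32),
        "beam_width": np.array([[1]], dtype=np.int32),
        "random_seed": np.array([[seed]], dtype=np.uint64),
    }
    tensors = []
    for name, arr in params.items():
        t = grpcclient.InferInput(name, list(arr.shape), np_to_triton_dtype(arr.dtype))
        t.set_data_from_numpy(arr)
        tensors.append(t)
    return tensors
```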

Who can help?

@byshiue

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

  1. Serve a model, essentially following this guide with some settings changed: https://developer.nvidia.com/blog/turbocharging-meta-llama-3-performance-with-nvidia-tensorrt-llm-and-nvidia-triton-inference-server/
  2. Make 5 gRPC requests concurrently (a repro sketch follows below)
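
A minimal, self-contained sketch of step 2 (not my exact script; the model name "ensemble", the tensor names text_input/max_tokens/text_output, the prompt, and the URL are stand-ins based on the stock inflight_batcher_llm ensemble config, so adjust them to your deployment):

```python
# Concurrent repro sketch: 5 requests, identical prompt, distinct seeds.
from concurrent.futures import ThreadPoolExecutor

import numpy as np
import tritonclient.grpc as grpcclient
from tritonclient.utils import np_to_triton_dtype

URL = "localhost:8001"
PROMPT = "Write a short story about a robot."
SEEDS = [101, 202, 303, 404, 505]

def _tensor(name, arr):
    t = grpcclient.InferInput(name, list(arr.shape), np_to_triton_dtype(arr.dtype))
    t.set_data_from_numpy(arr)
    return t

def infer_once(seed):
    inputs = [
        _tensor("text_input", np.array([[PROMPT]], dtype=object)),
        _tensor("max_tokens", np.array([[128]], dtype=np.int32)),
        _tensor("top_p", np.array([[1.0]], dtype=np.float32)),
        _tensor("top_k", np.array([[1024]], dtype=np.int32)),
        _tensor("temperature", np.array([[1.0]], dtype=np.float32)),
        _tensor("beam_width", np.array([[1]], dtype=np.int32)),
        _tensor("random_seed", np.array([[seed]], dtype=np.uint64)),  # unique per request
    ]
    # One client per call keeps the worker threads independent.
    client = grpcclient.InferenceServerClient(url=URL)
    try:
        result = client.infer("ensemble", inputs)
    finally:
        client.close()
    return result.as_numpy("text_output").flatten()[0].decode(errors="replace")

# Fire all 5 requests at once so they land in the same in-flight batch.
with ThreadPoolExecutor(max_workers=5) as pool:
    outputs = list(pool.map(infer_once, SEEDS))

for seed, out in zip(SEEDS, outputs):
    print(f"seed={seed}: {out!r}")
```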

Expected behavior

I expect each request with a different seed to yield a different response.

Actual behavior

Several of the 5 responses are consistently identical.

Additional notes

I changed my test script to wait for each response before sending the next request, and with that change all 5 outputs are distinct, so the concurrency/in-flight batching really does seem to be the problem.
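
Concretely, the control run only changes the driver loop, along these lines (reusing infer_once and SEEDS from the repro sketch above):

```python
# Sequential control: wait for each response before sending the next request.
# With this change, all 5 outputs come back distinct.
outputs = [infer_once(seed) for seed in SEEDS]
```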

Labels

bug (Something isn't working), triaged (Issue has been triaged by maintainers)
