The easiest way is to set a breakpoint here and wait for the assistant message:
https://github.com/ggml-org/llama.cpp/blob/0cf6725e9f9a164c39f7a87214d60342f7f946d8/tools/main/main.cpp#L270
I noticed this char before; I always just assumed it was a spurious prompt print (since most templates end with >), but I see now that it's repeating the last processed token of the template.
Name and Version
version: 5327 (27ebfca)
built with cc (Debian 12.2.0-14) 12.2.0 for x86_64-linux-gnu
Operating systems
Linux
GGML backends
CUDA
Hardware
NVIDIA
Models
all
Problem description & steps to reproduce
After the user prompt is provided, the code enters this branch:
llama.cpp/tools/main/main.cpp, line 716 (commit 0cf6725)
No new tokens are generated.
However, the following code assumes that a new token was generated and inserts it into the assistant response:
llama.cpp/tools/main/main.cpp, line 824 (commit 0cf6725)
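To make the failure mode concrete, here is a minimal standalone sketch. The names `embd` and `assistant_ss` mirror those used in main.cpp, but the code below is an illustrative stand-in, not the actual implementation: since the interactive branch samples nothing new, the buffer that the assistant-message loop reads from still holds the last template token, and that token gets appended to the reply.

```cpp
// Simplified stand-in for the flow in tools/main/main.cpp (not the real code):
// after the user prompt is consumed in conversation mode, no new token is
// sampled, but the display/record buffer still contains the last token that
// was processed from the chat template. The loop that builds the assistant
// message assumes everything in that buffer is freshly generated and appends
// it, so the template's trailing piece (often ">") leaks into the reply.
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

int main() {
    // pretend this is what is left over after processing the chat template
    std::vector<std::string> embd = { ">" };   // last processed template piece
    std::ostringstream assistant_ss;           // accumulates the assistant reply

    // interactive branch: user input handled, nothing new generated,
    // but embd is not cleared before the recording loop runs
    for (const std::string & piece : embd) {
        assistant_ss << piece;                 // stale token ends up in the reply
    }

    std::cout << "assistant reply starts with: '" << assistant_ss.str() << "'\n";
}
```

A fix along the lines of skipping the recording step (or clearing the buffer) when no new token was actually produced would avoid the stray character, though the exact change belongs in the branch referenced above.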
First Bad Commit
No response
Relevant log output