Releases · ggml-org/llama.cpp
b5322
llama : one-off chat template fix for Mistral-Small-2503 (#13398)
* update readme
* add mistral-v7-tekken
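The new built-in template can be exercised directly through the public API by passing its name to `llama_chat_apply_template`. A minimal sketch, assuming the current `llama.h` signature (the messages and buffer size are illustrative):

```cpp
// Sketch: rendering a conversation with the built-in "mistral-v7-tekken"
// template added in this release, via llama_chat_apply_template (llama.h).
#include <cstdio>
#include <vector>
#include "llama.h"

int main() {
    llama_chat_message msgs[] = {
        {"system", "You are a helpful assistant."},
        {"user",   "Hello!"},
    };
    std::vector<char> buf(1024);
    int32_t n = llama_chat_apply_template("mistral-v7-tekken",
                                          msgs, 2, /*add_ass=*/true,
                                          buf.data(), (int32_t) buf.size());
    if (n > (int32_t) buf.size()) {
        // the function returns the required size; retry with a larger buffer
        buf.resize(n);
        n = llama_chat_apply_template("mistral-v7-tekken", msgs, 2, true,
                                      buf.data(), (int32_t) buf.size());
    }
    if (n >= 0) {
        printf("%.*s\n", n, buf.data());
    }
    return 0;
}
```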
b5321
rpc : add rpc_msg_set_tensor_hash_req (#13353)
Use a dedicated struct for the request of RPC_CMD_SET_TENSOR_HASH, which makes the code cleaner.
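The idea follows the pattern used by the other RPC commands: a trivially-copyable request struct that can be deserialized in one read instead of parsing fields out of a raw byte buffer. The sketch below is illustrative only; the field layout is assumed, not copied from the PR:

```cpp
// Illustrative sketch (not the exact layout from ggml-rpc.cpp) of a
// dedicated request struct for RPC_CMD_SET_TENSOR_HASH.
#include <cstdint>
#include <cstring>

struct rpc_msg_set_tensor_hash_req {
    uint64_t tensor_id; // which remote tensor the data targets (placeholder field)
    uint64_t offset;    // byte offset inside the tensor
    uint64_t hash;      // hash of the payload, so the server can reuse cached data
};

// One-shot deserialization: the request either matches the struct size exactly
// or is rejected, which is what makes the dedicated-struct approach cleaner.
static bool deserialize(const void * buf, size_t size,
                        rpc_msg_set_tensor_hash_req & out) {
    if (size != sizeof(out)) {
        return false;
    }
    std::memcpy(&out, buf, sizeof(out));
    return true;
}
```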
b5320
vulkan: Allow up to 4096 elements for mul_mat_id row_ids (#13326)
This assert fired when running Qwen_Qwen3-30B-A3B-Q2_K.gguf: GGML_ASSERT(nei0 * nei1 <= 3072); The tensor is 8 x 512, i.e. 4096 elements, so the array size is increased to accommodate it.
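For context, the arithmetic behind the fix: the expert-selection tensor contributes nei0 * nei1 = 8 * 512 = 4096 row ids, which exceeds the old 3072 bound. A toy sketch of the bound change, with variable names taken from the assert above:

```cpp
// Why the old bound fired for Qwen3-30B-A3B: 8 * 512 = 4096 > 3072.
#include <cassert>

int main() {
    const int nei0 = 8;
    const int nei1 = 512;
    // old bound: assert(nei0 * nei1 <= 3072);  // fires: 4096 > 3072
    assert(nei0 * nei1 <= 4096);                // new bound fits the tensor
    return 0;
}
```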
b5318
ci : limit write permission to only the release step + fixes (#13392)
* fix Windows CUDA file name
* fix license file copy on multi-config generators
b5317
mtmd : Expose helper_decode_image_chunk (#13366)
* expose helper_decode_image, output_embd_copy, image_tokens_copy/free
b5315
server : (webui) revamp the input area, plus many small UI improvements…
b5313
mtmd : fix the calculation of n_tokens for smolvlm (#13381)
Co-authored-by: Taichi Nishimura <[email protected]>
b5311
context : remove logits_all flag (#13284)
* llama : remove logits_all flag + reorder llama_context_params
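With the flag gone, callers opt into logits per position through the `logits` array of `llama_batch` instead of requesting them for every token. A minimal sketch, assuming a batch allocated with `llama_batch_init(n, 0, 1)`:

```cpp
// Sketch: after the removal of logits_all, logits are requested per
// position via the logits flags in llama_batch (llama.h).
#include "llama.h"

void fill_batch(llama_batch & batch, const llama_token * tokens, int n) {
    batch.n_tokens = n;
    for (int i = 0; i < n; ++i) {
        batch.token[i]     = tokens[i];
        batch.pos[i]       = i;
        batch.n_seq_id[i]  = 1;
        batch.seq_id[i][0] = 0;
        batch.logits[i]    = (i == n - 1); // logits only for the last token
    }
}
```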
b5310
ci : move release workflow to a separate file (#13362)
b5309
llama : print size and type of overridden tensors (#13364)
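ggml already exposes the metadata such a message needs; below is a hedged sketch of a log line built from stock accessors (the exact format printed by the PR may differ):

```cpp
// Sketch of logging a tensor's type and size with stock ggml accessors;
// the exact message format used by the PR may differ.
#include <cstdio>
#include "ggml.h"

static void print_tensor_override(const struct ggml_tensor * t, const char * buft_name) {
    fprintf(stderr, "tensor %s (%s, %zu bytes) buffer type overridden to %s\n",
            t->name, ggml_type_name(t->type), ggml_nbytes(t), buft_name);
}
```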