Releases · ggml-org/llama.cpp
b5322
llama : one-off chat template fix for Mistral-Small-2503 (#13398)
* update readme
* add mistral-v7-tekken
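The new built-in template can be exercised directly through the public API by passing its name to `llama_chat_apply_template`. A minimal sketch, assuming the current `llama.h` signature (the messages and buffer size are illustrative):

```cpp
// Sketch: rendering a conversation with the built-in "mistral-v7-tekken"
// template added in this release, via llama_chat_apply_template (llama.h).
#include <cstdio>
#include <vector>
#include "llama.h"

int main() {
    llama_chat_message msgs[] = {
        {"system", "You are a helpful assistant."},
        {"user",   "Hello!"},
    };
    std::vector<char> buf(1024);
    int32_t n = llama_chat_apply_template("mistral-v7-tekken",
                                          msgs, 2, /*add_ass=*/true,
                                          buf.data(), (int32_t) buf.size());
    if (n > (int32_t) buf.size()) {
        // the function returns the required size; retry with a larger buffer
        buf.resize(n);
        n = llama_chat_apply_template("mistral-v7-tekken", msgs, 2, true,
                                      buf.data(), (int32_t) buf.size());
    }
    if (n >= 0) {
        printf("%.*s\n", n, buf.data());
    }
    return 0;
}
```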
b5321
rpc : add rpc_msg_set_tensor_hash_req (#13353)
Use a dedicated struct for the request of RPC_CMD_SET_TENSOR_HASH, which makes the code cleaner.
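The idea follows the pattern used by the other RPC commands: a trivially-copyable request struct that can be deserialized in one read instead of parsing fields out of a raw byte buffer. The sketch below is illustrative only; the field layout is assumed, not copied from the PR:

```cpp
// Illustrative sketch (not the exact layout from ggml-rpc.cpp) of a
// dedicated request struct for RPC_CMD_SET_TENSOR_HASH.
#include <cstdint>
#include <cstring>

struct rpc_msg_set_tensor_hash_req {
    uint64_t tensor_id; // which remote tensor the data targets (placeholder field)
    uint64_t offset;    // byte offset inside the tensor
    uint64_t hash;      // hash of the payload, so the server can reuse cached data
};

// One-shot deserialization: the request either matches the struct size exactly
// or is rejected, which is what makes the dedicated-struct approach cleaner.
static bool deserialize(const void * buf, size_t size,
                        rpc_msg_set_tensor_hash_req & out) {
    if (size != sizeof(out)) {
        return false;
    }
    std::memcpy(&out, buf, sizeof(out));
    return true;
}
```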
b5320
vulkan: Allow up to 4096 elements for mul_mat_id row_ids (#13326)
This assert fired when running Qwen_Qwen3-30B-A3B-Q2_K.gguf: GGML_ASSERT(nei0 * nei1 <= 3072); The tensor is 8 x 512, i.e. 4096 elements, so the array size is increased to accommodate it.
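For context, the arithmetic behind the fix: the expert-selection tensor contributes nei0 * nei1 = 8 * 512 = 4096 row ids, which exceeds the old 3072 bound. A toy sketch of the bound change, with variable names taken from the assert above:

```cpp
// Why the old bound fired for Qwen3-30B-A3B: 8 * 512 = 4096 > 3072.
#include <cassert>

int main() {
    const int nei0 = 8;
    const int nei1 = 512;
    // old bound: assert(nei0 * nei1 <= 3072);  // fires: 4096 > 3072
    assert(nei0 * nei1 <= 4096);                // new bound fits the tensor
    return 0;
}
```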
b5318
ci : limit write permission to only the release step + fixes (#13392)
* fix Windows CUDA file name
* fix license file copy on multi-config generators
b5317
mtmd : Expose helper_decode_image_chunk (#13366)
* expose helper_decode_image, output_embd_copy, image_tokens_copy/free
b5315
server : (webui) revamp the input area, plus many small UI improvements…
b5313
mtmd : fix the calculation of n_tokens for smolvlm (#13381)
Co-authored-by: Taichi Nishimura <[email protected]>
b5311
context : remove logits_all flag (#13284)
* llama : remove logits_all flag + reorder llama_context_params
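With the flag gone, callers opt into logits per position through the `logits` array of `llama_batch` instead of requesting them for every token. A minimal sketch, assuming a batch allocated with `llama_batch_init(n, 0, 1)`:

```cpp
// Sketch: after the removal of logits_all, logits are requested per
// position via the logits flags in llama_batch (llama.h).
#include "llama.h"

void fill_batch(llama_batch & batch, const llama_token * tokens, int n) {
    batch.n_tokens = n;
    for (int i = 0; i < n; ++i) {
        batch.token[i]     = tokens[i];
        batch.pos[i]       = i;
        batch.n_seq_id[i]  = 1;
        batch.seq_id[i][0] = 0;
        batch.logits[i]    = (i == n - 1); // logits only for the last token
    }
}
```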
b5310
ci : move release workflow to a separate file (#13362)
b5309
llama : print size and type of overridden tensors (#13364)
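ggml already exposes the metadata such a message needs; below is a hedged sketch of a log line built from stock accessors (the exact format printed by the PR may differ):

```cpp
// Sketch of logging a tensor's type and size with stock ggml accessors;
// the exact message format used by the PR may differ.
#include <cstdio>
#include "ggml.h"

static void print_tensor_override(const struct ggml_tensor * t, const char * buft_name) {
    fprintf(stderr, "tensor %s (%s, %zu bytes) buffer type overridden to %s\n",
            t->name, ggml_type_name(t->type), ggml_nbytes(t), buft_name);
}
```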