examples : add configuration presets #10932
Hi! I'm interested in contributing to this issue as a first-time contributor. I'd like to work on implementing the chat server preset for commonly used models.
IMO, having a flag like `llama-server --preset qwen-fim-7b` or `llama-server --preset embd-bert` would be nice. Or we could even introduce positional parameters (I stole the idea from ollama): `llama-server launch qwen-fim-7b`.
This commit adds default embeddings presets for the following models:

- bge-small-en-v1.5
- e5-small-v2
- gte-small

These can be used with llama-embedding and llama-server. For example, with llama-embedding:

```console
./build/bin/llama-embedding --embd-gte-small-default -p "Hello, how are you?"
```

And with llama-server:

```console
./build/bin/llama-server --embd-gte-small-default
```

And the embeddings endpoint can then be called with a POST request:

```console
curl --request POST \
    --url http://localhost:8080/embeddings \
    --header "Content-Type: application/json" \
    --data '{"input": "Hello, how are you?"}'
```

I'm not sure if these are the most common embedding models, but hopefully this can be a good starting point for discussion and further improvements.

Refs: ggml-org#10932
This commit adds a preset for llama.vim to use the default Qwen 2.5 Coder models. The motivation for this change is to make it easier to start a server suitable for use with the llama.vim plugin. For example, the server can be started with a command like the following:

```console
$ llama-server --fim-qwen-1.5b-default
```

Refs: ggml-org#10932
Hi! I'd like to contribute by adding a preset for a chat server using the Mistral-7B model (planning to name it …).
@vahedshaik tbh I'm not comfortable adding normal language models to this preset list (for now, we only add models having a very particular config, like FIM). Allowing more models would only open a floodgate of people wanting to promote their models by adding them to our list, basically polluting it. btw @ggerganov I'm thinking about adding a positional argument, which would basically make it work like the ollama-style command suggested earlier in this thread.
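Presumably that positional form would look something like the following (hypothetical syntax, mirroring the `llama-server launch qwen-fim-7b` idea from the earlier comment; neither the subcommand nor the preset name exists at this point):

```console
# hypothetical: a positional "launch" subcommand that selects a named preset,
# similar in spirit to `ollama run <model>`
llama-server launch qwen-fim-7b
```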
Description
I was recently looking for ways to demonstrate some of the functionality of the llama.cpp examples, and some of the commands can become very cumbersome. For example, the command I use for the llama.vim FIM server takes a long list of flags. It would be much cleaner if I could just run a single preset flag instead.
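A sketch of what that could look like, reusing the `--fim-qwen-1.5b-default` flag name from the commit messages quoted earlier in this thread:

```console
# start a FIM server configured for llama.vim with one preset flag
llama-server --fim-qwen-1.5b-default
```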
Or if I could turn the similarly long embedding server command into something simpler.
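For example, borrowing a flag name from the embeddings-preset commit message quoted above:

```console
# run an embedding model via a single preset flag
llama-embedding --embd-gte-small-default -p "Hello, how are you?"
```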
Implementation
There is already an initial example of how we can create such configuration presets:
https://github.com/ggerganov/llama.cpp/blob/5cd85b5e008de2ec398d6596e240187d627561e3/common/arg.cpp#L2208-L2220
This preset configures the model URLs so that the models are automatically downloaded from Hugging Face when the example runs, which simplifies the command significantly. It can additionally set various default values, such as the context size, batch size, pooling type, etc.
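As a rough sketch of the idea, a preset flag would simply expand into an ordinary set of arguments. The expansion below is illustrative only: the repository name and default values are assumptions, not what the linked code actually sets:

```console
# hypothetical expansion of a preset flag (values are illustrative):
llama-server --fim-qwen-1.5b-default

# would behave roughly like:
llama-server \
    --hf-repo ggml-org/Qwen2.5-Coder-1.5B-Q8_0-GGUF \
    --port 8012 -c 0 -ngl 99
```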
Goal
The goal of this issue is to create such presets for various common tasks:
- FIM server for llama.vim
The list of configuration presets would require curation and proper documentation.
I think this is a great task for new contributors to help with and to get involved in the project.