Eval bug: Regex #13347

Open
Ritanlisa opened this issue May 7, 2025 · 1 comment

@Ritanlisa

Name and Version

Officially compiled ollama 0.6.6 (I don't know the exact llama.cpp version it bundles)

Operating systems

Windows

GGML backends

CUDA

Hardware

11th Gen Intel(R) Core(TM) i7-11800H
NVIDIA GeForce RTX 3050 Laptop

Models

deepseek-ai/DeepSeek-R1-Distill-Llama-8B

Problem description & steps to reproduce

(I created an issue in the ollama project; they pointed out that the regex handling comes from this project.)
The regex escapes \s and \S are not translated correctly.
It also crashes when given a (?:ExpA|ExpB) regular expression.

First Bad Commit

No response

Relevant log output

**All following output comes from ollama**
\S expression:

parse: error parsing grammar: unknown escape at \S\s]* "\n</think>\n") "\"" space
space ::= | " " | "\n"{1,2} [ \t]{0,20}


root ::= "\"" ("<think>\n" [\S\s]* "\n<\/think>\n") "\"" space
space ::= | " " | "\n"{1,2} [ \t]{0,20}

llama_grammar_init_impl: failed to parse grammar

(?:ExpA|ExpB) expression: got only a simple output in the console, and ollama exited.

@gauravyad86

Both changes go in src/llama-grammar.cpp:

1. Add support for \s and \S

  1. Open up src/llama-grammar.cpp

  2. Scroll to the function called:

    static std::pair<uint32_t, const char *> parse_char(const char * src)

  3. Inside that function, you'll see a switch (src[1]). After the existing case 'n': and case 't': entries, add:

        case 's':
            return std::make_pair(' ', src + 2); // treat \s as space for now
        case 'S':
            return std::make_pair(0xFFFF, src + 2); // placeholder for non-space logic

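Note that this only makes the parse error go away: \s matches just a literal space (not tabs or newlines), and 0xFFFF is a single placeholder code point, not a real non-whitespace class. You can later replace ' ' and 0xFFFF with more precise logic if you need the actual regex semantics.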
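
As a rough illustration of what "more precise logic" could look like, here is a minimal standalone sketch that expands \s and \S into code point ranges instead of a single character. It is not llama.cpp code: expand_class_escape and matches are hypothetical helper names, and the sketch assumes whitespace means ASCII whitespace only (space, \t, \n, \v, \f, \r). Wiring ranges like these into the grammar parser would need more work than the two case labels above, since parse_char returns a single code point.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Inclusive code point range [first, second].
using char_range = std::pair<uint32_t, uint32_t>;

// Hypothetical helper (not part of llama.cpp): expand the letter that follows
// a backslash into code point ranges. Assumption: ASCII-only whitespace.
static std::vector<char_range> expand_class_escape(char letter) {
    // \t \n \v \f \r are 0x09..0x0D; space is 0x20.
    const std::vector<char_range> ws = { {0x09, 0x0D}, {0x20, 0x20} };
    if (letter == 's') {
        return ws;
    }
    assert(letter == 'S' && "only \\s and \\S are handled in this sketch");

    // \S is the complement of ws over the Unicode range 0..0x10FFFF.
    std::vector<char_range> out;
    uint32_t next = 0;
    for (const auto & r : ws) {
        if (r.first > next) {
            out.push_back({next, r.first - 1});
        }
        next = r.second + 1;
    }
    out.push_back({next, 0x10FFFF});
    return out;
}

// True if code point cp falls inside any of the ranges.
static bool matches(const std::vector<char_range> & ranges, uint32_t cp) {
    for (const auto & r : ranges) {
        if (cp >= r.first && cp <= r.second) {
            return true;
        }
    }
    return false;
}
```

With ranges like these, a class such as [\S\s] covers every code point, which is what the grammar in the report relies on.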


2. Skip over ?: in group parsing

  1. In the same file, look for this line:

    else if (*pos == '(') {

  2. Just a few lines below that, you'll find:

    pos = parse_space(pos + 1, true);

  3. Right after that line, add:

        if (pos[0] == '?' && pos[1] == ':') {
            pos += 2; // skip over the "?:" part
        }
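
The same check can be tried in isolation. This is a minimal sketch, assuming the input is a NUL-terminated string as in the parser; skip_non_capturing_prefix is a hypothetical name used only for illustration, not a llama.cpp function.

```cpp
#include <cassert>

// Hypothetical helper (not part of llama.cpp): given a pointer to the text
// right after an opening '(', step over a leading "?:" if it is present.
static const char * skip_non_capturing_prefix(const char * pos) {
    assert(pos != nullptr);
    // pos[1] is only read when pos[0] is '?', so on a NUL-terminated string
    // this never reads past the terminator.
    if (pos[0] == '?' && pos[1] == ':') {
        return pos + 2;
    }
    return pos;
}
```

For the input "?:ExpA|ExpB)" the returned pointer starts at "ExpA|ExpB)", while a plain group body such as "ExpA|ExpB)" or a lone "?" is returned unchanged. This treats a non-capturing group exactly like an ordinary group, which seems reasonable here because the grammar has no notion of capture.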

After both changes, the parser should no longer report “unknown escape at \S” or reject “(?:…)” groups.
