Regex with the backreference to a non-greedy group fails unexpectedly #127291

Closed
Frimaire opened this issue Nov 26, 2024 · 3 comments
Labels
pending (The issue will be closed if no feedback is provided), type-bug (An unexpected behavior, bug, or error)

Comments

@Frimaire

Frimaire commented Nov 26, 2024

Bug report

Bug description:

Hello,

There is a problem in the re module. A regular expression with a backreference to a non-greedy group unexpectedly fails (matches nothing) on some inputs.

For example:

import re

# Greedy outer quantifier
r1 = re.compile(r'(a+)+\1')
print(r1.search('a' * 28))
# OK, matches the whole string

# Non-greedy outer quantifier
r2 = re.compile(r'(a+)+?\1')
print(r2.search('a' * 28))
# runs very slowly and eventually returns None
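
To see how the slowdown grows with input length without waiting on the full 28-character case, a small timing helper may be useful. This is only a sketch: the helper names `time_search` and `compare` and the default lengths are hypothetical choices, not part of the original report.

```python
import re
import time


def time_search(pattern, length):
    """Search 'a' * length with pattern; return (matched_whole, seconds)."""
    text = "a" * length
    compiled = re.compile(pattern)
    start = time.perf_counter()
    match = compiled.search(text)
    elapsed = time.perf_counter() - start
    # True only if the match covers the entire input string.
    matched_whole = match is not None and match.group(0) == text
    return matched_whole, elapsed


def compare(lengths=(4, 8, 12, 16)):
    """Print timings for the greedy and non-greedy variants at each length.

    The lengths are deliberately small (an assumption made for safety);
    raise them gradually when probing an affected interpreter.
    """
    for n in lengths:
        for pattern in (r"(a+)+\1", r"(a+)+?\1"):
            ok, secs = time_search(pattern, n)
            print(f"n={n} pattern={pattern!r} whole_match={ok} time={secs:.4f}s")
```

Calling `compare()` prints one line per pattern and length. Nothing runs at import time, so the slow case is only triggered when asked for explicitly.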

This problem seems to go back as far as Python 2.7 and has been fixed in Python 3.11. Since version 3.9 is still under maintenance, will the fix be included in an upcoming 3.9 release? (Or would a pull request for this problem be accepted?)

Thank You!

CPython versions tested on:

3.9

Operating systems tested on:

Linux, Windows

@Frimaire Frimaire added the type-bug label Nov 26, 2024
@hugovk
Member

hugovk commented Nov 26, 2024

I can reproduce on macOS with Python 3.9 and 3.10, but 3.11-3.14 match the whole string quickly.

3.9-3.11 are only receiving security updates, whereas 3.12+ get bug fixes:

https://devguide.python.org/versions/

I don't think this is a security issue? I recommend upgrading to a newer Python version.
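
For code that has to run on several interpreters, a version guard along these lines might help decide when to avoid the pattern. It is a rough sketch: the 3.11 cut-off is an assumption taken only from the versions tested in this thread (3.9 and 3.10 reproduce, 3.11-3.14 do not), and the name `lazy_backref_affected` is hypothetical.

```python
import sys

# Assumption: first minor version observed in this thread to behave correctly.
FIRST_UNAFFECTED = (3, 11)


def lazy_backref_affected(version_info=sys.version_info):
    """Return True if the given (major, minor, ...) version predates the fix."""
    return tuple(version_info[:2]) < FIRST_UNAFFECTED


if lazy_backref_affected():
    print("This interpreter may mishandle patterns such as (a+)+?\\1")
```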

@hugovk hugovk added the pending label Nov 26, 2024
@wjssz

wjssz commented Nov 26, 2024

I tested it. The bug was fixed in PR #12427, by the change for JUMP_REPEAT_ONE_2.

You can use the third-party regex module instead; it has no known bugs. Just:

import regex as re
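
If the third-party package cannot be guaranteed to be installed, a guarded import is one option. This is a sketch only: the `BACKEND` variable and the `search_whole` helper are hypothetical names, and falling back to the standard library brings back the behaviour described above on affected Python versions.

```python
try:
    import regex as re  # third-party package, used when available
    BACKEND = "regex"
except ImportError:
    import re  # standard library fallback
    BACKEND = "re"


def search_whole(pattern, text):
    """Return the matched text if pattern matches all of text, else None."""
    match = re.search(pattern, text)
    if match is not None and match.group(0) == text:
        return match.group(0)
    return None
```

A call such as `search_whole(r'(a+)+?\1', 'a' * 28)` then goes through whichever backend was imported, and `BACKEND` records which one that was.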

@ericvsmith
Member

I don't think this is a security issue, and we can close this.

@ericvsmith ericvsmith closed this as not planned Nov 26, 2024