Question about \p script matching #351

elirnm · 2017-03-28T10:32:43Z

The docs just say that \p{Blah} matches "Unicode character class (general category or script)." I took this to mean that it would match against anything listed under scripts on http://www.unicode.org/charts/, but there are a number of those that it doesn't recognize, including some I would have expected to work by default, such as CJK (or CJK Unified Ideographs).

Is there some documentation of which scripts are supported in \p matching?


BurntSushi · 2017-03-28T11:16:08Z

> Is there some documentation of which scripts are supported in \p matching?

The documentation is intended to be provided by Unicode.

More specifically, the scripts available in this crate are generated directly from this file: http://www.unicode.org/Public/UNIDATA/Scripts.txt

I don't know what the correspondence is between the link you provided and the Scripts.txt file. :-/ It seems like there is a lot of overlap, and maybe there is just a naming mismatch. For example, Scripts.txt contains a Han script, but I don't see a granular breakdown of CJK as listed on your chart page.
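To illustrate the naming mismatch, here is a minimal std-only Rust sketch that parses lines in the Scripts.txt format (`CODEPOINTS ; ScriptName # comment`) and looks up the script of a character. The data is a small hand-written excerpt in that format, not the real file, and `parse_scripts`/`script_of` are hypothetical helpers, not part of the regex crate. The point it shows: the label "CJK" only appears in the trailing comment, while the script name a CJK ideograph actually gets is Han.

```rust
/// One entry from a file in the Scripts.txt format.
#[derive(Debug)]
struct ScriptRange {
    start: u32,
    end: u32,
    script: String,
}

/// Parses Scripts.txt-style text, skipping blank lines and `#` comments.
/// Malformed lines are silently ignored in this sketch.
fn parse_scripts(data: &str) -> Vec<ScriptRange> {
    let mut ranges = Vec::new();
    for line in data.lines() {
        // Drop any trailing comment, then surrounding whitespace.
        let line = line.split('#').next().unwrap_or("").trim();
        if line.is_empty() {
            continue;
        }
        let mut fields = line.split(';').map(str::trim);
        let (cps, script) = match (fields.next(), fields.next()) {
            (Some(cps), Some(script)) => (cps, script),
            _ => continue,
        };
        // The first field is either a single code point or `START..END`.
        let (start, end) = cps.split_once("..").unwrap_or((cps, cps));
        if let (Ok(start), Ok(end)) = (u32::from_str_radix(start, 16), u32::from_str_radix(end, 16)) {
            ranges.push(ScriptRange { start, end, script: script.to_string() });
        }
    }
    ranges
}

/// Returns the script name of `c`, if any parsed range contains it.
fn script_of(ranges: &[ScriptRange], c: char) -> Option<&str> {
    let cp = c as u32;
    ranges
        .iter()
        .find(|r| r.start <= cp && cp <= r.end)
        .map(|r| r.script.as_str())
}

fn main() {
    // Illustrative excerpt in the Scripts.txt format (not the full file).
    let data = "\
# Illustrative comment line
0041..005A    ; Latin # L&  [26] LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z
00AA          ; Latin # Lo       FEMININE ORDINAL INDICATOR
4E00..9FFF    ; Han # Lo [20992] CJK UNIFIED IDEOGRAPH-4E00..9FFF
";
    let ranges = parse_scripts(data);
    // U+6F22 falls in 4E00..9FFF, whose script is named Han, not CJK.
    println!("{:?}", script_of(&ranges, '漢')); // Some("Han")
    println!("{:?}", script_of(&ranges, 'A')); // Some("Latin")
}
```

A linear scan keeps the sketch short; a real lookup over the full file would more likely sort the ranges and binary-search them.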

elirnm · 2017-03-28T11:26:44Z

Ok, thanks. It looks like Unicode explains the discrepancy on the help page for charts:

> Script and symbol groups are arranged in an order that provide the best fit for the table. The order of presentation may change as additional code charts are added. The names of blocks may be an abbreviated or otherwise modified form of the official block name. In some cases, a subhead may simply be a name which covers the content of several closely related blocks. The names of ranges are freely chosen and may change.

BurntSushi · 2017-03-28T11:33:22Z

@elirnm Well that seems... a little annoying. Maybe we should provide the full list in the regex docs somewhere if Unicode isn't going to make it more accessible themselves.

elirnm · 2017-03-29T09:38:59Z

Or at least provide a link to that scripts file so users can get some idea of what's supported?
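As a rough idea of what such a list could look like, here is a minimal std-only Rust sketch that extracts the distinct script names from text in the Scripts.txt format. The excerpt is hand-written and illustrative, and `script_names` is a hypothetical helper; it is not the crate's own table-generation tooling.

```rust
use std::collections::BTreeSet;

/// Collects the distinct script names from text in the Scripts.txt format
/// (`CODEPOINTS ; ScriptName # comment`), sorted alphabetically.
fn script_names(data: &str) -> BTreeSet<String> {
    data.lines()
        // Strip trailing comments; whole-line comments become empty.
        .map(|line| line.split('#').next().unwrap_or("").trim())
        .filter(|line| !line.is_empty())
        // The script name is the second `;`-separated field.
        .filter_map(|line| line.split(';').nth(1))
        .map(|name| name.trim().to_string())
        .collect()
}

fn main() {
    // Illustrative excerpt in the Scripts.txt format (not the full file).
    let data = "\
# Illustrative comment line
0041..005A    ; Latin # L&  [26] LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z
00AA          ; Latin # Lo       FEMININE ORDINAL INDICATOR
0370..0373    ; Greek # L&   [4] GREEK CAPITAL LETTER HETA..GREEK SMALL LETTER ARCHAIC SAMPI
3041..3096    ; Hiragana # Lo  [86] HIRAGANA LETTER SMALL A..HIRAGANA LETTER SMALL KE
4E00..9FFF    ; Han # Lo [20992] CJK UNIFIED IDEOGRAPH-4E00..9FFF
";
    let names: Vec<String> = script_names(data).into_iter().collect();
    println!("{}", names.join(", ")); // Greek, Han, Hiragana, Latin
}
```

Using a `BTreeSet` both removes the duplicate entries (Latin appears on many lines in the real file) and yields the names in sorted order, which is convenient for a documentation list.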

saona-raimundo · 2024-01-03T17:55:57Z

Hi!
I would like to revive this thread (I can open a new one if needed).
I was having trouble understanding the extent of the \p notation.
Here are my thoughts.

- The crate-level documentation has a list. The first element links to UNICODE.md, while the second-to-last element talks about the \p notation. Since the elements of a list are at the same level, I suggest moving the link to UNICODE.md before the whole list.
- The documentation in UNICODE.md includes a list of "all properties supported by the regex crate", but the list is missing "Greek", which is used in the examples in the documentation. Is the list out of date? Would you like the list to be generated from the implementation?

BurntSushi · 2024-01-03T18:46:46Z

@saona-raimundo While your comment is tangentially related, please don't bump issues closed years ago. If you have questions, you should open a new Discussion. If you click the "new issue" button, there is even an explicit category for "ask a question".

I copied your second bullet point into a distinct question here and tried to answer it: #1144

I don't understand your first bullet point. Please open a new discussion question and fill out the question more.

BurntSushi added the question label Mar 28, 2017

elirnm closed this as completed Mar 28, 2017
