This issue was moved to a discussion.

What is the true behavior of the regex "\b" #1091

Closed
TOETOE55 opened this issue Sep 25, 2023 · 0 comments

"\b" matches a word boundary in regex, but it seems to behave differently in different implementations.

In the regex crate, "can't" is split apart:

use regex::Regex;

fn main() {
    let s = "The quick (\"brown\") fox can't jump 32.3 feet, right?";

    // Split at every position where the empty pattern `\b` matches.
    let re = Regex::new(r"\b").unwrap();
    let res = re.split(s).collect::<Vec<&str>>();
    // ["", "The", " ", "quick", " (\"", "brown", "\") ", "fox", " ", "can", "'", "t", " ", "jump", " ", "32", ".", "3", " ", "feet", ", ", "right", "?"]
    println!("{res:?}");
}
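
For comparison, the output above is what you would expect if "\b" only means "a `\w` character on one side and a non-`\w` character (or the end of the text) on the other", which is how the regex crate documents it. Below is a minimal std-only sketch of that rule. The names `is_word_char` and `split_at_simple_word_boundaries` are made up for illustration, `is_word_char` only approximates the Unicode `\w` class, and the sketch does not emit the empty leading piece that `Regex::split` produces.

```rust
/// Rough stand-in for regex `\w`: alphanumeric characters plus underscore.
/// (Assumption: the real Unicode `\w` class covers a few more categories.)
fn is_word_char(c: char) -> bool {
    c.is_alphanumeric() || c == '_'
}

/// Splits `s` wherever a word character sits next to a non-word character.
/// Boundaries at the very start or end of the text produce no empty pieces.
fn split_at_simple_word_boundaries(s: &str) -> Vec<&str> {
    let mut pieces = Vec::new();
    let mut start = 0;
    let mut prev_is_word: Option<bool> = None;

    for (i, c) in s.char_indices() {
        let cur_is_word = is_word_char(c);
        if let Some(prev) = prev_is_word {
            // A boundary exists when the "wordness" changes between neighbors.
            if prev != cur_is_word {
                pieces.push(&s[start..i]);
                start = i;
            }
        }
        prev_is_word = Some(cur_is_word);
    }
    if start < s.len() {
        pieces.push(&s[start..]);
    }
    pieces
}

fn main() {
    let s = "The quick (\"brown\") fox can't jump 32.3 feet, right?";
    println!("{:?}", split_at_simple_word_boundaries(s));
}
```

Under this rule the apostrophe in "can't" and the period in "32.3" are non-word characters, so each one creates a boundary on both sides.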

In Swift 5.7, "can't" and "32.3" stay whole:

let s = "The quick (\"brown\") fox can't jump 32.3 feet, right?"

let words = s.split(separator: /\b/)
// ["The", " ", "quick", " ", "(", "\"", "brown", "\"", ")", " ", "fox", " ", "can\'t", " ", "jump", " ", "32.3", " ", "feet", ",", " ", "right", "?"]
print(words)

The unicode-segmentation crate gives the same result as the Swift regex, following the rules of [Unicode Standard Annex #29](http://www.unicode.org/reports/tr29/):

use unicode_segmentation::UnicodeSegmentation;

fn main() {
    let s = "The quick (\"brown\") fox can't jump 32.3 feet, right?";

    // Split at UAX #29 word boundaries.
    let res = s.split_word_bounds().collect::<Vec<&str>>();
    // ["The", " ", "quick", " ", "(", "\"", "brown", "\"", ")", " ", "fox", " ", "can't", " ", "jump", " ", "32.3", " ", "feet", ",", " ", "right", "?"]
    println!("{res:?}");
}
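
The difference seems to come from the UAX #29 rules that forbid a break around certain "mid" characters when they sit between two letters (WB6/WB7) or between two digits (WB11/WB12). The following is a rough, ASCII-only sketch of just that idea, not an implementation of the annex. The function names are hypothetical, only the apostrophe and the full stop are treated as "mid" characters, and every other rule (grapheme clusters, underscores, non-ASCII text and so on) is ignored.

```rust
/// "Mid" characters this sketch allows inside a word or number.
/// (Assumption: only apostrophe and full stop; UAX #29 lists more.)
fn is_mid(c: char) -> bool {
    c == '\'' || c == '.'
}

/// True when both characters are ASCII letters or both are ASCII digits.
fn same_kind(a: char, b: char) -> bool {
    (a.is_ascii_alphabetic() && b.is_ascii_alphabetic())
        || (a.is_ascii_digit() && b.is_ascii_digit())
}

/// Breaks between every pair of characters, except inside runs of ASCII
/// letters/digits and around a "mid" character flanked by the same kind.
fn split_ascii_word_bounds(s: &str) -> Vec<&str> {
    let chars: Vec<(usize, char)> = s.char_indices().collect();
    let mut pieces = Vec::new();
    let mut start = 0;

    for i in 1..chars.len() {
        let prev = chars[i - 1].1;
        let cur = chars[i].1;
        let joined = if prev.is_ascii_alphanumeric() && cur.is_ascii_alphanumeric() {
            // Letters and digits stick together (as in "feet" or "32").
            true
        } else if is_mid(cur) {
            // Look ahead: letter ' letter, or digit . digit.
            chars.get(i + 1).map_or(false, |&(_, next)| same_kind(prev, next))
        } else if is_mid(prev) && i >= 2 {
            // Look behind: the other half of the same rule.
            same_kind(chars[i - 2].1, cur)
        } else {
            false
        };
        if !joined {
            pieces.push(&s[start..chars[i].0]);
            start = chars[i].0;
        }
    }
    if start < s.len() {
        pieces.push(&s[start..]);
    }
    pieces
}

fn main() {
    let s = "The quick (\"brown\") fox can't jump 32.3 feet, right?";
    println!("{:?}", split_ascii_word_bounds(s));
}
```

With the look-ahead and look-behind in place, "can't" and "32.3" come out as single pieces, while the comma in "feet," still breaks because it is not followed by a character of the same kind.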

rust-lang locked and limited conversation to collaborators on Sep 25, 2023
BurntSushi converted this issue into discussion #1092 on Sep 25, 2023