Skip to content

str::splitn(N, ' ') is slower than it ought to be #82471

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
jeffsmith82 opened this issue Feb 24, 2021 · 11 comments
Open

str::splitn(N, ' ') is slower than it ought to be #82471

jeffsmith82 opened this issue Feb 24, 2021 · 11 comments
Labels
A-str Area: str and String C-enhancement Category: An issue proposing an enhancement or a PR with one. C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such E-easy Call for participation: Easy difficulty. Experience needed to fix: Not much. Good first issue. I-slow Issue: Problems and improvements with respect to performance of generated code. T-libs Relevant to the library team, which will review and decide on the PR/issue.

Comments

@jeffsmith82
Copy link

jeffsmith82 commented Feb 24, 2021

I wrote a small benchmark and was surprised at how much slower it was using str::splitn rather than using bytes directly.

splitn
Time elapsed in parse_response() is: 2.290853462s

as bytes
Time elapsed in parse_response() is: 661.947666ms

Is it possible to do something similar to #46693 where if splitting on a single byte.

https://users.rust-lang.org/t/performance-comparison/56041/2

use std::time::{Duration, Instant};
fn main() {
    let response = String::from("HTTP/1.1 418 I'm a teapot\r\n");
    let mut res: (&str, &str, &str) = ("", "", "");
    let start = Instant::now();
    for _ in 0..100_000_000 {
        res = match parse_http(&response) {
            Ok(data) => data,
            Err(_) => {
                continue;
            }
        };
    }
    let duration = start.elapsed();

    println!("version:{}\ncode:{}\ndescription:{}\n", res.0, res.1, res.2);
    println!("Time elapsed in parse_response() is: {:?}", duration);
    
    let start2 = Instant::now();
    for _ in 0..100_000_000 {
        res = match parse_http2(&response) {
            Ok(data) => data,
            Err(_) => {
                continue;
            }
        };
    }
    let duration2 = start2.elapsed();

    println!("version:{}\ncode:{}\ndescription:{}\n", res.0, res.1, res.2);
    println!("Time elapsed in parse_response() is: {:?}", duration2);
}

fn parse_http(s: &str) -> Result<(&str, &str, &str), &str> {
    let mut parts = s.splitn(3, ' ');
    let version = parts.next().ok_or("No Version")?;
    let code = parts.next().ok_or("No status code")?;
    let description = parts.next().ok_or("No description")?;
    Ok((version, code, description))
}

fn parse_http2(s: &str) -> Result<(&str, &str, &str), &str> {
    let mut bytes = s.as_bytes().iter();
    let i = bytes.position(|b| *b == b' ').ok_or("No Version")?;
    let j = bytes.position(|b| *b == b' ').ok_or("No status code")? + i + 1;

    let version = &s[..i];
    let code = &s[(i + 1)..j];
    let description = &s[(j + 1)..];
    Ok((version, code, description))
}
@jonas-schievink jonas-schievink added C-enhancement Category: An issue proposing an enhancement or a PR with one. I-slow Issue: Problems and improvements with respect to performance of generated code. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Feb 24, 2021
@the8472 the8472 added the A-str Area: str and String label Feb 8, 2022
@lolbinarycat lolbinarycat added E-easy Call for participation: Easy difficulty. Experience needed to fix: Not much. Good first issue. C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such labels Apr 11, 2025
@bikbov
Copy link

bikbov commented Apr 24, 2025

Hi. I would like to take this issue.

@mati865
Copy link
Member

mati865 commented Apr 24, 2025

@rustbot assign @bikbov

You can use @[DELETE_THIS]rustbot claim (obviously, without [DETLETE_THIS]) to assign yourself.

@rustbot
Copy link
Collaborator

rustbot commented Apr 24, 2025

Error: Parsing assign command in comment failed: ...'ign bikbov' | error: user should start with @ at >| '

You can '...

Please file an issue on GitHub at triagebot if there's a problem with this bot, or reach out on #t-infra on Zulip.

@mati865
Copy link
Member

mati865 commented Apr 24, 2025

@rustbot assign @bikbov

@bikbov
Copy link

bikbov commented Apr 25, 2025

@rustbot claim

@bikbov
Copy link

bikbov commented Apr 25, 2025

@rustbot assign @bikbov

You can use @[DELETE_THIS]rustbot claim (obviously, without [DETLETE_THIS]) to assign yourself.

thank you

@bend-n
Copy link
Contributor

bend-n commented May 25, 2025

its worth mentioning that the changes made in #46735 (at least in 2025) already modify this code path; large gains cant really be expected, at least for this extremely small string.

(and besides, memchr isnt really that good for extremely small strings, and rustc's memchr is not that good at all.)

jieyouxu added a commit to jieyouxu/rust that referenced this issue May 26, 2025
speed up charsearcher for ascii chars

attempt at fixing rust-lang#82471

this implementation should be valid because ascii characters are always one byte and there are no continuation bytes that overlap with ascii characters

im not completely sure that this is _always_ an improvement but it seems to be an improvement for this case and i dont think it can significantly regress any cases
@bikbov
Copy link

bikbov commented May 26, 2025

@rustbot release-assignment

rust-timer added a commit that referenced this issue May 26, 2025
Rollup merge of #141516 - bend-n:okay, r=workingjubilee

speed up charsearcher for ascii chars

attempt at fixing #82471

this implementation should be valid because ascii characters are always one byte and there are no continuation bytes that overlap with ascii characters

im not completely sure that this is _always_ an improvement but it seems to be an improvement for this case and i dont think it can significantly regress any cases
@bend-n
Copy link
Contributor

bend-n commented May 27, 2025

@rustbot close

@rustbot
Copy link
Collaborator

rustbot commented May 27, 2025

Error: The feature close is not enabled in this repository.
To enable it add its section in the triagebot.toml in the root of the repository.

Please file an issue on GitHub at triagebot if there's a problem with this bot, or reach out on #t-infra on Zulip.

github-actions bot pushed a commit to rust-lang/miri that referenced this issue May 27, 2025
speed up charsearcher for ascii chars

attempt at fixing rust-lang/rust#82471

this implementation should be valid because ascii characters are always one byte and there are no continuation bytes that overlap with ascii characters

im not completely sure that this is _always_ an improvement but it seems to be an improvement for this case and i dont think it can significantly regress any cases
@RalfJung
Copy link
Member

RalfJung commented May 29, 2025

Reopening as #141516 is getting reverted.

@RalfJung RalfJung reopened this May 29, 2025
github-actions bot pushed a commit to model-checking/verify-rust-std that referenced this issue May 30, 2025
speed up charsearcher for ascii chars

attempt at fixing rust-lang#82471

this implementation should be valid because ascii characters are always one byte and there are no continuation bytes that overlap with ascii characters

im not completely sure that this is _always_ an improvement but it seems to be an improvement for this case and i dont think it can significantly regress any cases
rust-bors bot added a commit that referenced this issue May 31, 2025
… r=<try>

faster charsearcher

attempt to do #141516 better
resolves #82471

r? `@workingjubilee`
rust-bors bot added a commit that referenced this issue Jun 2, 2025
… r=<try>

faster charsearcher

attempt to do #141516 better
resolves #82471

r? `@workingjubilee`
rust-bors bot added a commit that referenced this issue Jun 3, 2025
… r=<try>

faster charsearcher

attempt to do #141516 better
resolves #82471

r? `@workingjubilee`
rust-bors bot added a commit that referenced this issue Jun 4, 2025
… r=<try>

faster charsearcher

attempt to do #141516 better
resolves #82471

r? `@workingjubilee`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-str Area: str and String C-enhancement Category: An issue proposing an enhancement or a PR with one. C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such E-easy Call for participation: Easy difficulty. Experience needed to fix: Not much. Good first issue. I-slow Issue: Problems and improvements with respect to performance of generated code. T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

10 participants