Optimize is_ascii_digit() and is_ascii_hexdigit() #67143
The current implementation of these two functions performs redundant checks that LLVM does not currently optimize away. The is_digit() method on char only matches ASCII digits, and it does so without the extra check of whether the character is in the ASCII range.

This change optimizes the code in both debug and release mode on various major architectures. It removes all conditional branches on x86, and all conditional branches and returns on ARM.

Pseudocode version of LLVM's optimized output:

old version:
```rust
if c >= 128 { return false; }
c -= 48;
if c > 9 { return false; }
true
```

new version:
```rust
c -= 48;
if c < 10 { return true; }
false
```

The is_ascii_hexdigit() change similarly shortens the emitted code in release mode. The optimized version uses bit manipulation to remove one comparison and one conditional branch. It would be possible to make a similar change to the is_digit() code, but that has not been done yet.

Godbolt comparison between the two implementations: https://godbolt.org/z/gDwfSF
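The pseudocode in the description can be approximated in plain Rust. The following is a minimal sketch, not the code from this PR: the function names `is_ascii_digit_old_shape` and `is_ascii_digit_new_shape` are hypothetical, and the second function assumes that a wrapping subtraction followed by a single unsigned comparison is a fair rendering of the "new version" pseudocode.

```rust
/// Old shape: an explicit ASCII range check, then the byte-level digit check.
/// (Hypothetical name, used only for this illustration.)
pub fn is_ascii_digit_old_shape(c: char) -> bool {
    c.is_ascii() && (c as u8).is_ascii_digit()
}

/// New shape: one unsigned range check.
/// Subtracting '0' (48) with wrapping turns every code point below '0'
/// into a very large u32, so a single `< 10` comparison rejects values
/// on both sides of the '0'..='9' range, including all non-ASCII chars.
pub fn is_ascii_digit_new_shape(c: char) -> bool {
    (c as u32).wrapping_sub('0' as u32) < 10
}
```

Both functions should agree with `char::is_ascii_digit` for every `char`. Whether the second form actually produces shorter machine code depends on the compiler version and target, which is what the Godbolt link above compares.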
Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @sfackler (or someone else) soon. If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes. Please see the contribution instructions for more information.
I forgot to invert the constant to 0x20 after putting an inversion operator in front of it.
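For context on the constant mentioned here: 0x20 is the bit that distinguishes ASCII upper-case from lower-case letters, so masking with `!0x20` folds 'a'..='f' onto 'A'..='F'. The sketch below only illustrates that masking technique. The function name `is_ascii_hexdigit_folded` is hypothetical, the code is not the PR's implementation (which is not shown in full in this thread), and the exact form of the original mistake is not known from the comment.

```rust
/// Hex-digit check using one case-folded letter range.
/// (Hypothetical name; a sketch of the bit-manipulation idea only.)
pub fn is_ascii_hexdigit_folded(c: char) -> bool {
    let cp = c as u32;
    // '0'..='9' as a single unsigned range check.
    let is_digit = cp.wrapping_sub('0' as u32) < 10;
    // Clearing bit 0x20 maps 'a'..='f' onto 'A'..='F' and leaves
    // 'A'..='F' unchanged, so one range check covers both cases.
    // Note the mask is `!0x20` (all bits except 0x20), not `!0xDF`.
    let is_letter = (cp & !0x20).wrapping_sub('A' as u32) < 6;
    is_digit || is_letter
}
```

For a byte-sized value, `!0x20u8` equals `0xDF`, which is why writing the inverted constant after adding the `!` operator would select the wrong bits.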
@bors try @rust-timer queue
Awaiting bors try build completion
☀️ Try build successful - checks-azure
Queued 0d5360f with parent e862c01, future comparison URL.
Finished benchmarking try commit 0d5360f, comparison URL.
@@ -1245,7 +1245,14 @@ impl char {
     #[stable(feature = "ascii_ctype_on_intrinsics", since = "1.24.0")]
     #[inline]
     pub fn is_ascii_hexdigit(&self) -> bool {
-        self.is_ascii() && (*self as u8).is_ascii_hexdigit()
+        if !self.is_ascii() {
Why not change this like is_ascii_digit to self.is_digit(16)? Does that end up with worse codegen?
Yes, currently is_digit(x) checks two separate alphabetic ranges instead of one bit-manipulated range, but I could change that if desired.
Yeah, seems like it'd be best to fix is_digit.
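The difference discussed in this thread can be sketched as follows. This is an illustration under stated assumptions, not the standard library's code: both function names are hypothetical, the first function is only a guess at what "two separate alphabetic ranges" looks like, and the second shows one possible way to fold the two letter ranges into one by setting the 0x20 case bit.

```rust
/// Radix check with separate ranges for lower-case and upper-case letters.
/// (Hypothetical name; assumed shape of the "two ranges" approach.)
pub fn is_digit_two_ranges(c: char, radix: u32) -> bool {
    assert!((2..=36).contains(&radix), "radix must be in 2..=36");
    let value = match c {
        '0'..='9' => c as u32 - '0' as u32,
        'a'..='z' => c as u32 - 'a' as u32 + 10,
        'A'..='Z' => c as u32 - 'A' as u32 + 10,
        _ => return false,
    };
    value < radix
}

/// Radix check with one case-folded letter range.
/// (Hypothetical name; a sketch of the bit-manipulated alternative.)
pub fn is_digit_folded(c: char, radix: u32) -> bool {
    assert!((2..=36).contains(&radix), "radix must be in 2..=36");
    let cp = c as u32;
    let digit = cp.wrapping_sub('0' as u32);
    let value = if digit < 10 {
        digit
    } else {
        // Setting bit 0x20 maps 'A'..='Z' onto 'a'..='z'. Anything that is
        // not a letter wraps to a large value or lands at 36 or above, and
        // saturating_add keeps the wrapped value from overflowing back down.
        (cp | 0x20).wrapping_sub('a' as u32).saturating_add(10)
    };
    value < radix
}
```

With radix 16, both functions accept exactly the ASCII hex digits, which is the case the reviewer's suggestion of self.is_digit(16) relies on.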
Ping from triage: @DarkKirb could you add the change mentioned on the review?
Pinging again from triage:
Pinging again from triage -