Skip to content

Add warnings about UTF-16 vs UTF-8 strings #1416

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Apr 5, 2019

Conversation

alexcrichton
Copy link
Contributor

This commit aims to address #1348 via a number of strategies:

  • Documentation is updated to warn about UTF-16 vs UTF-8 problems
    between JS and Rust. Notably documenting that as_string and handling
    of arguments is lossy when there are lone surrogates.

  • A JsString::is_valid_utf16 method was added to test whether
    as_string is lossless or not.

The intention is that most default behavior of wasm-bindgen will
remain, but where necessary bindings will use JsString instead of
str/String and will manually check for is_valid_utf16 as
necessary. It's also hypothesized that this is relatively rare and not
too performance critical, so an optimized intrinsic for is_valid_utf16
is not yet provided.

Closes #1348

@alexcrichton alexcrichton requested a review from Pauan April 1, 2019 18:19
@alexcrichton alexcrichton force-pushed the js-string-valid-utf16 branch from 588a54c to c8fdfac Compare April 2, 2019 20:54
@alexcrichton alexcrichton requested a review from Pauan April 2, 2019 20:54
Copy link
Contributor

@Pauan Pauan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I also still think as_string should be changed to into)

@alexcrichton alexcrichton force-pushed the js-string-valid-utf16 branch from c8fdfac to d212fac Compare April 3, 2019 15:23
@alexcrichton alexcrichton requested a review from Pauan April 3, 2019 15:23
Copy link
Contributor

@Pauan Pauan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than a few nits, this is looking really good! I like how this turned out.

@alexcrichton alexcrichton force-pushed the js-string-valid-utf16 branch from d212fac to 2452d25 Compare April 4, 2019 20:38
@alexcrichton
Copy link
Contributor Author

👍 Thanks for the thorough review!

Copy link
Contributor

@Pauan Pauan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a couple more nits, and then this is ready to merge!

This commit aims to address rustwasm#1348 via a number of strategies:

* Documentation is updated to warn about UTF-16 vs UTF-8 problems
  between JS and Rust. Notably documenting that `as_string` and handling
  of arguments is lossy when there are lone surrogates.

* A `JsString::is_valid_utf16` method was added to test whether
  `as_string` is lossless or not.

The intention is that most default behavior of `wasm-bindgen` will
remain, but where necessary bindings will use `JsString` instead of
`str`/`String` and will manually check for `is_valid_utf16` as
necessary. It's also hypothesized that this is relatively rare and not
too performance critical, so an optimized intrinsic for `is_valid_utf16`
is not yet provided.

Closes rustwasm#1348
@alexcrichton alexcrichton force-pushed the js-string-valid-utf16 branch from 2452d25 to 44738e0 Compare April 5, 2019 15:11
@alexcrichton alexcrichton merged commit 16745ed into rustwasm:master Apr 5, 2019
@alexcrichton alexcrichton deleted the js-string-valid-utf16 branch April 5, 2019 15:12
@alexcrichton
Copy link
Contributor Author

Thanks @Pauan!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants