Commit 071b4b6

correct the primitive char doc's use of bytes and code points
Previously the docs suggested that '❤️' doesn't fit in a char because it's 6 bytes. But that's misleading. 'a̚' also doesn't fit in a char, even though it's only 3 bytes. The important thing is the number of code points, not the number of bytes. Clarify the primitive char docs around this.
1 parent 17d284b commit 071b4b6
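
The point of the commit message can be checked directly: byte length and code point count vary independently, and only the code point count decides whether something fits in a `char`. The sketch below is illustrative and not part of the commit. The helper `byte_and_char_counts` is a hypothetical name, and the code assumes that the 'a̚' in the message is `a` followed by U+031A (COMBINING LEFT ANGLE ABOVE).

```rust
/// Hypothetical helper: returns (UTF-8 byte length, number of code points).
fn byte_and_char_counts(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

fn main() {
    // Assumed encoding of 'a̚': 'a' (1 byte) + U+031A (2 bytes).
    // Only 3 bytes, yet 2 code points, so it cannot be a single `char`.
    let a_combining = "a\u{031A}";
    assert_eq!(byte_and_char_counts(a_combining), (3, 2));

    // U+2764 (3 bytes) + U+FE0F (3 bytes): 6 bytes, 2 code points.
    let heart = "\u{2764}\u{FE0F}";
    assert_eq!(byte_and_char_counts(heart), (6, 2));

    // A single code point may need 4 bytes in UTF-8 and still fits in a `char`.
    let crab = "\u{1F980}";
    assert_eq!(byte_and_char_counts(crab), (4, 1));
    let _fits: char = '\u{1F980}';
}
```

The last case shows the other direction: a string longer than 'a̚' in bytes can still be a valid `char` literal, because it is one code point.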

1 file changed: 11 additions, 21 deletions

src/libstd/primitive_docs.rs

Lines changed: 11 additions & 21 deletions
@@ -50,27 +50,30 @@ mod prim_bool { }
 /// [`String`]: string/struct.String.html
 ///
 /// As always, remember that a human intuition for 'character' may not map to
-/// Unicode's definitions. For example, emoji symbols such as '❤️' are more than
-/// one byte; ❤️ in particular is six:
+/// Unicode's definitions. For example, emoji symbols such as '❤️' can be more
+/// than one Unicode code point; this ❤️ in particular is two:
 ///
 /// ```
 /// let s = String::from("❤️");
 ///
-/// // six bytes times one byte for each element
-/// assert_eq!(6, s.len() * std::mem::size_of::<u8>());
+/// // we get two chars out of a single ❤️
+/// let mut iter = s.chars();
+/// assert_eq!(Some('\u{2764}'), iter.next());
+/// assert_eq!(Some('\u{fe0f}'), iter.next());
+/// assert_eq!(None, iter.next());
 /// ```
 ///
-/// This also means it won't fit into a `char`, and so trying to create a
-/// literal with `let heart = '❤️';` gives an error:
+/// This means it won't fit into a `char`. Trying to create a literal with
+/// `let heart = '❤️';` gives an error:
 ///
 /// ```text
 /// error: character literal may only contain one codepoint: '❤
 /// let heart = '❤️';
 ///             ^~
 /// ```
 ///
-/// Another implication of this is that if you want to do per-`char`acter
-/// processing, it can end up using a lot more memory:
+/// Another implication of the 4-byte fixed size of a `char`, is that
+/// per-`char`acter processing can end up using a lot more memory:
 ///
 /// ```
 /// let s = String::from("love: ❤️");
@@ -79,19 +82,6 @@ mod prim_bool { }
 /// assert_eq!(12, s.len() * std::mem::size_of::<u8>());
 /// assert_eq!(32, v.len() * std::mem::size_of::<char>());
 /// ```
-///
-/// Or may give you results you may not expect:
-///
-/// ```
-/// let s = String::from("❤️");
-///
-/// let mut iter = s.chars();
-///
-/// // we get two chars out of a single ❤️
-/// assert_eq!(Some('\u{2764}'), iter.next());
-/// assert_eq!(Some('\u{fe0f}'), iter.next());
-/// assert_eq!(None, iter.next());
-/// ```
 mod prim_char { }
 
 #[doc(primitive = "unit")]
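
The memory claim in the revised docs follows from `char` always being 4 bytes while UTF-8 uses 1 to 4 bytes per code point. A minimal sketch of that arithmetic, not part of the commit, is shown here. The helper names `utf8_bytes` and `char_buffer_bytes` are hypothetical, and the second one counts element storage only, ignoring the `Vec` header and any spare capacity.

```rust
use std::mem::size_of;

/// Hypothetical helper: bytes occupied by the UTF-8 text itself.
fn utf8_bytes(s: &str) -> usize {
    s.len()
}

/// Hypothetical helper: bytes occupied by the elements of a `Vec<char>`
/// collected from `s` (element storage only).
fn char_buffer_bytes(s: &str) -> usize {
    s.chars().count() * size_of::<char>()
}

fn main() {
    // "love: " is 6 ASCII bytes; the heart is U+2764 + U+FE0F, 3 bytes each.
    let s = "love: \u{2764}\u{FE0F}";
    assert_eq!(utf8_bytes(s), 12);

    // 8 code points at 4 bytes each.
    assert_eq!(char_buffer_bytes(s), 32);

    // Iterating lazily over `chars()` avoids building that buffer at all.
    let non_ascii = s.chars().filter(|c| !c.is_ascii()).count();
    assert_eq!(non_ascii, 2);
}
```

When the per-character work is a single pass, iterating over `chars()` directly keeps the text in its UTF-8 form and sidesteps the extra allocation.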
