
do not claim that transmute is like memcpy #99614

Merged: 4 commits, merged on Aug 3, 2022
Changes from 2 commits
29 changes: 16 additions & 13 deletions library/core/src/intrinsics.rs
@@ -1207,29 +1207,32 @@ extern "rust-intrinsic" {

 /// Reinterprets the bits of a value of one type as another type.
 ///
-/// Both types must have the same size. Neither the original, nor the result,
-/// may be an [invalid value](../../nomicon/what-unsafe-does.html).
+/// Both types must have the same size. Compilation will fail if this is not guaranteed.
 ///
 /// `transmute` is semantically equivalent to a bitwise move of one type
 /// into another. It copies the bits from the source value into the
-/// destination value, then forgets the original. It's equivalent to C's
-/// `memcpy` under the hood, just like `transmute_copy`.
+/// destination value, then forgets the original. Note that source and destination
+/// are passed by-value, which means if `T` or `U` contains padding, that padding
+/// might *not* be preserved by `transmute`.
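The padding caveat in the new wording can be made concrete with a small `#[repr(C)]` struct. In this sketch the `Padded` type and the helper functions are hypothetical, written only for illustration: it checks the layout with `size_of`/`align_of` and shows one way to produce the bytes without touching the padding byte. Transmuting `Padded` to `[u8; 4]` instead would read that byte, which is uninitialized, and is therefore undefined behavior.

```rust
use std::mem::{align_of, size_of};

// Hypothetical type: with `#[repr(C)]`, `a` sits at offset 0, `b` must be
// 2-byte aligned so it sits at offset 2, and offset 1 is a padding byte.
#[repr(C)]
#[derive(Clone, Copy, Debug)]
pub struct Padded {
    pub a: u8,
    pub b: u16,
}

/// Number of bytes in `Padded` that belong to no field, i.e. padding.
pub fn padding_bytes() -> usize {
    size_of::<Padded>() - (size_of::<u8>() + size_of::<u16>())
}

/// Alignment of `Padded`, inherited from its `u16` field.
pub fn padded_align() -> usize {
    align_of::<Padded>()
}

/// Writes each field explicitly instead of `transmute::<Padded, [u8; 4]>`,
/// which would read the uninitialized padding byte (undefined behavior).
/// The output mirrors the in-memory layout, with the padding slot set to 0.
pub fn to_bytes(p: Padded) -> [u8; 4] {
    let b = p.b.to_ne_bytes();
    [p.a, 0, b[0], b[1]]
}
```

Filling the padding slot with an explicit `0` is a design choice of this sketch; any fixed value works, as long as it is written by the program rather than copied out of the padding.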
workingjubilee (Member), Jul 22, 2022:

I don't believe programmers will anticipate "the top bits of a boolean" to be what they think of as "padding". I also think the "bitwise move" may throw people off. People's deep intuition is that move means no innate effect on the object unless it is hurled with great force.

Like, I might imagine the Rust AM executes something that looks like this:

    // Like a "real" machine register, but contains potentially any number of bytes.
    struct Register([u8]);

    // Not a pointer or a reference, but simply the location of something in the AM.
    // This can even include something being held in a register.
    struct Address;

    pub const unsafe extern "rust-abstract-machine" fn transmute<T, U>(arg: T) -> U {
        let mut src_addr = Address::load_address_from(arg);
        let value = Register::load_as_type::<U>(&mut src_addr);
        src_addr.destroy_range_of::<T>(arg); // Burn our bridges behind us.
        return value
    }

The catch here is that this is a "move of the bits", but my understanding is what we are really doing is creating a new value that was derived from the original argument. If the Rust AM knows an arbitrary bit must be set or unset in the new type U, the effect of Register::load_as_type::<U> may be to automatically always set or unset that bit. Or it may be preserved exactly as-is. Or it may check if the bit is correctly set or unset in the original T and then, if it is, finish creating U with the appropriate bytes, or if it isn't, pull the bytes from getrandom() instead. "It's UB, I ain't gotta explain shit!"

...In other words, with mem::transmute, we are in fact hurling things with great force, but then, if it was a valid transmutation, we find there's ahem padding at the end, enough to soften the landing.
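As a rough analogue of the pseudocode above in real Rust, here is a hypothetical `reinterpret` function built from `ManuallyDrop` and `ptr::read_unaligned`: it keeps the source from ever being dropped as a `T` and performs a typed read of the source's storage at type `U`. It is a sketch of the mental model only, not how `transmute` is implemented, and unlike `transmute` it can only check the size requirement at run time.

```rust
use std::mem::{size_of, ManuallyDrop};
use std::ptr;

/// Hypothetical model of the pseudocode above, using real `std` APIs.
///
/// # Safety
/// Same contract as `std::mem::transmute`: the bytes of `arg` must form a
/// valid value of type `U`.
pub unsafe fn reinterpret<T, U>(arg: T) -> U {
    // `transmute` rejects a size mismatch at compile time; this sketch can
    // only check at run time.
    assert_eq!(size_of::<T>(), size_of::<U>(), "size mismatch");
    // "Burn our bridges": the source value will never be dropped as a `T`.
    let src = ManuallyDrop::new(arg);
    // The location of the source, now viewed as holding a `U`.
    let src_ptr = &*src as *const T as *const U;
    // Typed read at `U`; `read_unaligned` because `U` may need stricter
    // alignment than `T`.
    unsafe { ptr::read_unaligned(src_ptr) }
}
```

For example, `unsafe { reinterpret::<f32, u32>(1.0) }` yields the same value as the safe `1.0f32.to_bits()`.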

PR author (Member):

> I don't believe programmers will anticipate "the top bits of a boolean" to be what they think of as "padding".

Neither do I, so I am not sure what you are alluding to here.

I am not quite sure what to make of your comment -- do you have some wording suggestions?

Note that nothing here is even specific to transmute. Any by-value passing of arguments works this way.

> what we are really doing is creating a new value that was derived from the original argument

We are serializing the argument to memory using one format (T), and then deserializing using another format (U).
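The "serialize as `T`, deserialize as `U`" framing can be checked directly for a pair of types where every bit pattern is valid at both types. In this sketch (the function names are hypothetical), transmuting a `u32` to `[u8; 4]` is compared against the safe `to_ne_bytes`/`from_ne_bytes` encode/decode pair from the standard library; both yield the native-endian byte representation of the integer.

```rust
use std::mem::transmute;

/// "Serialize" with the `u32` format, "deserialize" with the `[u8; 4]` format.
pub fn via_transmute(x: u32) -> [u8; 4] {
    // SAFETY: both types are 4 bytes, neither has padding, and every
    // 4-byte pattern is a valid `[u8; 4]`.
    unsafe { transmute::<u32, [u8; 4]>(x) }
}

/// The same step written with the standard library's explicit encoder.
pub fn via_bytes(x: u32) -> [u8; 4] {
    x.to_ne_bytes()
}

/// Round trip: encode through `transmute`, decode with `from_ne_bytes`.
pub fn round_trip(x: u32) -> u32 {
    u32::from_ne_bytes(via_transmute(x))
}
```

The safe pair is usually preferable in real code: it states the intended byte order and needs no `unsafe`.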

Member:

> > I don't believe programmers will anticipate "the top bits of a boolean" to be what they think of as "padding".
>
> Neither do I, so I am not sure what you are alluding to here.

Ah, I mostly meant that with this wording I think the transformation created in #96140 would still be surprising.

PR author (Member):

Oh, I should have looked more carefully at the issue I am linking. Oops.

Yeah, that issue is just complaining about UB code not doing what they expect it to do. We already say:

    /// Both types must have the same size. Neither the original, nor the result,
    /// may be an [invalid value](../../nomicon/what-unsafe-does.html).

Do you think that needs to be clarified somehow?
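The validity requirement quoted above is what rules out the "top bits of a boolean" scenario raised earlier in this thread: only `0` and `1` are valid bit patterns for `bool`, so `transmute::<u8, bool>(2)` is undefined behavior. A minimal sketch (both function names are hypothetical) of checking validity before converting, rather than transmuting an arbitrary `u8`:

```rust
/// Checked `u8` -> `bool` conversion with no `unsafe` at all.
pub fn u8_to_bool(byte: u8) -> Option<bool> {
    match byte {
        0 => Some(false),
        1 => Some(true),
        _ => None, // not a valid `bool`: refuse instead of transmuting
    }
}

/// Once validity has been checked, the transmute itself is sound.
pub fn u8_to_bool_transmute(byte: u8) -> Option<bool> {
    if byte <= 1 {
        // SAFETY: `byte` is 0 or 1, the only valid `bool` representations.
        Some(unsafe { std::mem::transmute::<u8, bool>(byte) })
    } else {
        None
    }
}
```

Both functions agree on every input; the first is the more idiomatic choice because it needs no `unsafe`.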

workingjubilee (Member), Jul 23, 2022:

Well, I think the main detail is that programmers often think of UB in an unhelpful way that is more misleading than informative, and here we have an especially UB-prone function, which people are trying to expect things of, so it might help to reiterate the usual concerns of UB just to set expectations more.

Because... compiling while risking UB is not quite "aha, I detect UB, gotcha! nasal demons!" Yet I think that's the folkloric understanding. It's more "for all possible traces of control flow through this function, I may select a machine encoding of this function that produces the correct results assuming mem::transmute's invariants were upheld, and may go wildly wrong if they were not." This is... the "same thing" to logicians, yes? But programmers are often not logicians, even when they are comfortable with logic.

So since this is a Wildly Unsafe function that does Wildly Unsafe transformations yet is nonetheless "necessary", in a certain sense, at least for now, I think it might be helpful to reiterate some form of the usual "these invariants must be upheld, and the compiler may 'help' by inflicting them on your program in a way it deems appropriate, such as (but not limited to) replacing invalid values with valid ones, or removing code that would have resulted in producing an invalid value (for example, the entire function body that contains an invalid call to mem::transmute, which may include your entire program if the compiler has also made certain inlining decisions)."
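One concrete way the compiler relies on validity invariants, as described above, is niche optimization. The sketch below (the function names are hypothetical) uses `std::num::NonZeroU32`: because `0` is never a valid `NonZeroU32`, the standard library documents that `Option<NonZeroU32>` is the same size as `u32`, with `None` using the all-zero pattern. A `NonZeroU32` manufactured by transmuting `0u32` would be indistinguishable from `None` once wrapped in `Some`; `NonZeroU32::new` is the checked alternative.

```rust
use std::mem::size_of;
use std::num::NonZeroU32;

/// True when `Option<NonZeroU32>` needs no extra tag byte, i.e. the
/// invalid value `0` is reused to represent `None`.
pub fn niche_is_used() -> bool {
    size_of::<Option<NonZeroU32>>() == size_of::<u32>()
}

/// Checked alternative to `transmute::<u32, NonZeroU32>`: returns `None`
/// for `0` instead of producing an invalid value.
pub fn to_nonzero(x: u32) -> Option<NonZeroU32> {
    NonZeroU32::new(x)
}
```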

PR author (Member):

Okay, I have expanded the wording a bit. I didn't want to go into quite as much length as you did though -- the docs link to the reference page on UB, so if necessary such clarification should be added there, IMO.

 ///
+/// Both the argument and the result must be [valid](../../nomicon/what-unsafe-does.html) at
+/// their given type. Violating this condition leads to [undefined behavior][ub]. The compiler
+/// will generate code *assuming that you, the programmer, ensure that there will never be
+/// undefined behavior*. It is therefore your responsibility to guarantee that every value
+/// passed to `transmute` is valid at both types `T` and `U`. Failing to uphold this condition
+/// may lead to unexpected and unstable compilation results. This makes `transmute` **incredibly
+/// unsafe**. `transmute` should be the absolute last resort.
+///
+/// Transmuting pointers to integers in a `const` context is [undefined behavior][ub].
+/// Any attempt to use the resulting value for integer operations will abort const-evaluation.
+///
 /// Because `transmute` is a by-value operation, alignment of the *transmuted values
 /// themselves* is not a concern. As with any other function, the compiler already ensures
 /// both `T` and `U` are properly aligned. However, when transmuting values that *point
 /// elsewhere* (such as pointers, references, boxes…), the caller has to ensure proper
 /// alignment of the pointed-to values.
 ///
-/// `transmute` is **incredibly** unsafe. There are a vast number of ways to
-/// cause [undefined behavior][ub] with this function. `transmute` should be
-/// the absolute last resort.
-///
-/// Transmuting pointers to integers in a `const` context is [undefined behavior][ub].
-/// Any attempt to use the resulting value for integer operations will abort const-evaluation.
-///
-/// The [nomicon](../../nomicon/transmutes.html) has additional
-/// documentation.
+/// The [nomicon](../../nomicon/transmutes.html) has additional documentation.
 ///
 /// [ub]: ../../reference/behavior-considered-undefined.html
 ///
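The alignment paragraph applies when the transmuted values are themselves pointers or references. A minimal sketch, assuming a hypothetical helper `as_u32_ref`, that reinterprets `&[u8; 4]` as `&u32` only after checking the pointee's address against `align_of::<u32>()`. The hypothetical `read_u32` shows the alternative of copying the bytes out with `u32::from_ne_bytes`, which has no alignment requirement.

```rust
use std::mem::{align_of, transmute};

/// Reinterprets a 4-byte array reference as `&u32`, but only when the
/// pointed-to bytes are suitably aligned for `u32`.
pub fn as_u32_ref(bytes: &[u8; 4]) -> Option<&u32> {
    let addr = bytes.as_ptr() as usize;
    if addr % align_of::<u32>() != 0 {
        // `transmute` does nothing about a misaligned pointee; refuse instead.
        return None;
    }
    // SAFETY: `[u8; 4]` and `u32` have the same size, the address was just
    // checked to be `u32`-aligned, every bit pattern is a valid `u32`, and
    // the returned reference keeps the lifetime of `bytes`.
    Some(unsafe { transmute::<&[u8; 4], &u32>(bytes) })
}

/// Alignment-free alternative: copy the bytes out by value.
pub fn read_u32(bytes: &[u8; 4]) -> u32 {
    u32::from_ne_bytes(*bytes)
}
```

Copying by value is usually the simpler choice; the checked reference cast is only worth it when avoiding the copy matters.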