Lifetime guide refactoring #14172

Closed
225 changes: 69 additions & 156 deletions src/doc/guide-lifetimes.md
@@ -3,14 +3,12 @@
# Introduction

References are one of the more flexible and powerful tools available in
Rust. A reference can point anywhere: into the managed or exchange
heap, into the stack, and even into the interior of another data structure. A
reference is as flexible as a C pointer or C++ reference. However,
unlike C and C++ compilers, the Rust compiler includes special static checks
that ensure that programs use references safely. Another advantage of
references is that they are invisible to the garbage collector, so
working with references helps reduce the overhead of automatic memory
management.
Rust. They can point anywhere: into the heap, stack, and even into the
interior of another data structure. A reference is as flexible as a C pointer
or C++ reference.

Unlike C and C++ compilers, the Rust compiler includes special static
checks that ensure that programs use references safely.
Member

Is there a particular motivation for this change?

Contributor Author

I wanted to get rid of the "managed and exchange heap" concept.

However, I can see that I deleted some important things, like:

  • the Rust compiler includes special static checks that ensure that programs use references safely.
  • working with references helps reduce the overhead of automatic memory management.

Sidenote:

A goal I have in mind is to change the guides to use the words "pointers" and "references" consistently, given that a reference != a pointer. Correct?

Member

I would agree with that, yes.

Contributor Author

Question:

> A reference is as flexible as a C pointer or C++ reference.

In what ways? I don't think we should compare references with C pointers. It's confusing for newcomers (assuming they are the target audience).

Also:

> the core concepts will be familiar to anyone who has worked with C or C++

AFAIK the community is not interested in newcomers from dynamic languages; should we really exclude them this bad?

What's your opinion?

Member

> should we really exclude them this bad

This is a perpetual question. The current answer is "assume people have systems experience, and let's create some helper stuff to get non-systems people up to speed."

I don't see this language as exclusionary; I see it as "for more, check out this concept in C or C++." Though maybe changing it to simply say that would be better.

Contributor Author

I agree.
After reading the comment today I think I was being very dramatic. Small changes in phrasing can make a difference though.

Thanks for the feedback <3

Member

No worries. :) You're totally right, small changes can matter.


Despite their complete safety, a reference's representation at runtime
is the same as that of an ordinary pointer in a C program. They introduce zero
@@ -26,7 +24,7 @@ through several examples.

References, sometimes known as *borrowed pointers*, are only valid for
a limited duration. References never claim any kind of ownership
over the data that they point to: instead, they are used for cases
over the data that they point to, instead, they are used for cases
where you would like to use data for a short time.

As an example, consider a simple struct type `Point`:
@@ -36,27 +34,23 @@ struct Point {x: f64, y: f64}
~~~

We can use this simple definition to allocate points in many different ways. For
example, in this code, each of these three local variables contains a
point, but allocated in a different place:
example, in this code, each of these local variables contains a point,
but allocated in a different place:

~~~
# struct Point {x: f64, y: f64}
let on_the_stack : Point = Point {x: 3.0, y: 4.0};
let managed_box : @Point = @Point {x: 5.0, y: 1.0};
let owned_box : Box<Point> = box Point {x: 7.0, y: 9.0};
let on_the_stack : Point = Point {x: 3.0, y: 4.0};
let on_the_heap : Box<Point> = box Point {x: 7.0, y: 9.0};
~~~

Suppose we wanted to write a procedure that computed the distance between any
two points, no matter where they were stored. For example, we might like to
compute the distance between `on_the_stack` and `managed_box`, or between
`managed_box` and `owned_box`. One option is to define a function that takes
two arguments of type `Point`—that is, it takes the points by value. But if we
define it this way, calling the function will cause the points to be
copied. For points, this is probably not so bad, but often copies are
two points, no matter where they were stored. One option is to define a function
that takes two arguments of type `Point`—that is, it takes the points __by value__.
But if we define it this way, calling the function will cause the points __to be
copied__. For points, this is probably not so bad, but often copies are
expensive. Worse, if the data type contains mutable fields, copying can change
the semantics of your program in unexpected ways. So we'd like to define a
function that takes the points by pointer. We can use references to do
this:
the semantics of your program in unexpected ways. So we'd like to define
a function that takes the points by __reference__, that is, as a __borrowed pointer__:

~~~
# struct Point {x: f64, y: f64}
@@ -68,30 +62,27 @@ fn compute_distance(p1: &Point, p2: &Point) -> f64 {
}
~~~

Now we can call `compute_distance()` in various ways:
Now we can call `compute_distance()`:

~~~
# struct Point {x: f64, y: f64}
# let on_the_stack : Point = Point{x: 3.0, y: 4.0};
# let managed_box : @Point = @Point{x: 5.0, y: 1.0};
# let owned_box : Box<Point> = box Point{x: 7.0, y: 9.0};
# let on_the_heap : Box<Point> = box Point{x: 7.0, y: 9.0};
# fn compute_distance(p1: &Point, p2: &Point) -> f64 { 0.0 }
compute_distance(&on_the_stack, managed_box);
compute_distance(managed_box, owned_box);
compute_distance(&on_the_stack, on_the_heap);
~~~

Here, the `&` operator takes the address of the variable
`on_the_stack`; this is because `on_the_stack` has the type `Point`
(that is, a struct value) and we have to take its address to get a
reference. We also call this _borrowing_ the local variable
`on_the_stack`, because we have created an alias: that is, another
`on_the_stack`, because we have created __an alias__: that is, another
name for the same data.

In contrast, we can pass the boxes `managed_box` and `owned_box` to
`compute_distance` directly. The compiler automatically converts a box like
`@Point` or `~Point` to a reference like `&Point`. This is another form
of borrowing: in this case, the caller lends the contents of the managed or
owned box to the callee.
In contrast, we can pass `on_the_heap` to `compute_distance` directly.
The compiler automatically converts a box like `Box<Point>` to a reference like
`&Point`. This is another form of borrowing: in this case, the caller lends
the contents of the box to the callee.
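
As a rough illustration of both kinds of borrow, here is a sketch in current Rust syntax, which differs from the `box` syntax used in this guide, so treat the details as assumptions. The box is created with `Box::new` and lent by writing `&on_the_heap`, which the compiler coerces from `&Box<Point>` to `&Point`. The body of `compute_distance` is a plausible Euclidean distance, not necessarily the one elided from the diff, and `demo` is a hypothetical helper.

```rust
struct Point {
    x: f64,
    y: f64,
}

// Takes both points by reference, so neither is copied or moved.
fn compute_distance(p1: &Point, p2: &Point) -> f64 {
    let x_d = p1.x - p2.x;
    let y_d = p1.y - p2.y;
    (x_d * x_d + y_d * y_d).sqrt()
}

// Hypothetical helper showing a borrow of a local and a borrow of a box.
fn demo() -> f64 {
    let on_the_stack: Point = Point { x: 3.0, y: 4.0 };
    let on_the_heap: Box<Point> = Box::new(Point { x: 7.0, y: 7.0 });

    // `&on_the_stack` borrows the local variable; `&on_the_heap` is a
    // `&Box<Point>` that the compiler coerces to `&Point`.
    let d = compute_distance(&on_the_stack, &on_the_heap);

    // The caller still owns both values once the call has returned.
    assert!(on_the_stack.x == 3.0 && on_the_heap.x == 7.0);
    d
}
```

Calling `demo()` from `main` exercises both borrows; the caller keeps ownership of both points throughout.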

Whenever a caller lends data to a callee, there are some limitations on what
the caller can do with the original. For example, if the contents of a
@@ -134,10 +125,10 @@ let on_the_stack2 : &Point = &tmp;

# Taking the address of fields

As in C, the `&` operator is not limited to taking the address of
The `&` operator is not limited to taking the address of
local variables. It can also take the address of fields or
individual array elements. For example, consider this type definition
for `rectangle`:
for `Rectangle`:

~~~
struct Point {x: f64, y: f64} // as before
@@ -153,9 +144,7 @@ Now, as before, we can define rectangles in a few different ways:
# struct Rectangle {origin: Point, size: Size}
let rect_stack = &Rectangle {origin: Point {x: 1.0, y: 2.0},
size: Size {w: 3.0, h: 4.0}};
let rect_managed = @Rectangle {origin: Point {x: 3.0, y: 4.0},
size: Size {w: 3.0, h: 4.0}};
let rect_owned = box Rectangle {origin: Point {x: 5.0, y: 6.0},
let rect_heap = box Rectangle {origin: Point {x: 5.0, y: 6.0},
size: Size {w: 3.0, h: 4.0}};
~~~

@@ -167,109 +156,29 @@ operator. For example, I could write:
# struct Size {w: f64, h: f64} // as before
# struct Rectangle {origin: Point, size: Size}
# let rect_stack = &Rectangle {origin: Point {x: 1.0, y: 2.0}, size: Size {w: 3.0, h: 4.0}};
# let rect_managed = @Rectangle {origin: Point {x: 3.0, y: 4.0}, size: Size {w: 3.0, h: 4.0}};
# let rect_owned = box Rectangle {origin: Point {x: 5.0, y: 6.0}, size: Size {w: 3.0, h: 4.0}};
# let rect_heap = box Rectangle {origin: Point {x: 5.0, y: 6.0}, size: Size {w: 3.0, h: 4.0}};
# fn compute_distance(p1: &Point, p2: &Point) -> f64 { 0.0 }
compute_distance(&rect_stack.origin, &rect_managed.origin);
compute_distance(&rect_stack.origin, &rect_heap.origin);
~~~

which would borrow the field `origin` from the rectangle on the stack
as well as from the managed box, and then compute the distance between them.
as well as from the owned box, and then compute the distance between them.
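
A self-contained version of this field borrow might look like the sketch below, written in current Rust syntax (`Box::new` instead of `box`). The helper names `area` and `origin_distance_and_area`, and the coordinate values, are made up for illustration.

```rust
struct Point { x: f64, y: f64 }
struct Size { w: f64, h: f64 }
struct Rectangle { origin: Point, size: Size }

fn compute_distance(p1: &Point, p2: &Point) -> f64 {
    let x_d = p1.x - p2.x;
    let y_d = p1.y - p2.y;
    (x_d * x_d + y_d * y_d).sqrt()
}

// Hypothetical helper that borrows only the `size` field.
fn area(s: &Size) -> f64 {
    s.w * s.h
}

// Hypothetical helper: borrows fields from a stack value and a boxed value.
fn origin_distance_and_area() -> (f64, f64) {
    let rect_stack = Rectangle {
        origin: Point { x: 1.0, y: 2.0 },
        size: Size { w: 3.0, h: 4.0 },
    };
    let rect_heap = Box::new(Rectangle {
        origin: Point { x: 4.0, y: 6.0 },
        size: Size { w: 3.0, h: 4.0 },
    });

    // Each `&rect.origin` points into the interior of its rectangle;
    // no `Point` is copied out.
    let d = compute_distance(&rect_stack.origin, &rect_heap.origin);
    (d, area(&rect_heap.size))
}
```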

# Borrowing managed boxes and rooting
# Lifetimes

We’ve seen a few examples so far of borrowing heap boxes, both managed
and owned. Up till this point, we’ve glossed over issues of
safety. As stated in the introduction, at runtime a reference
is simply a pointer, nothing more. Therefore, avoiding C's problems
with dangling pointers requires a compile-time safety check.
We’ve seen a few examples of borrowing data. Up till this point, we’ve glossed
over issues of safety. As stated in the introduction, at runtime a reference
is simply a pointer, nothing more. Therefore, avoiding C's problems with
dangling pointers requires a compile-time safety check.

The basis for the check is the notion of _lifetimes_. A lifetime is a
The basis for the check is the notion of __lifetimes__. A lifetime is a
static approximation of the span of execution during which the pointer
is valid: it always corresponds to some expression or block within the
program. Code inside that expression can use the pointer without
restrictions. But if the pointer escapes from that expression (for
example, if the expression contains an assignment expression that
assigns the pointer to a mutable field of a data structure with a
broader scope than the pointer itself), the compiler reports an
error. We'll be discussing lifetimes more in the examples to come, and
a more thorough introduction is also available.

When the `&` operator creates a reference, the compiler must
ensure that the pointer remains valid for its entire
lifetime. Sometimes this is relatively easy, such as when taking the
address of a local variable or a field that is stored on the stack:

~~~
struct X { f: int }
fn example1() {
let mut x = X { f: 3 };
let y = &mut x.f; // -+ L
// ... // |
} // -+
~~~

Here, the lifetime of the reference `y` is simply L, the
remainder of the function body. The compiler need not do any other
work to prove that code will not free `x.f`. This is true even if the
code mutates `x`.

The situation gets more complex when borrowing data inside heap boxes:

~~~
# struct X { f: int }
fn example2() {
let mut x = @X { f: 3 };
let y = &x.f; // -+ L
// ... // |
} // -+
~~~

In this example, the value `x` is a heap box, and `y` is therefore a
pointer into that heap box. Again the lifetime of `y` is L, the
remainder of the function body. But there is a crucial difference:
suppose `x` were to be reassigned during the lifetime L? If the
compiler isn't careful, the managed box could become *unrooted*, and
would therefore be subject to garbage collection. A heap box that is
unrooted is one such that no pointer values in the heap point to
it. It would violate memory safety for the box that was originally
assigned to `x` to be garbage-collected, since a non-heap
pointer *`y`* still points into it.

> *Note:* Our current implementation implements the garbage collector
> using reference counting and cycle detection.

For this reason, whenever an `&` expression borrows the interior of a
managed box stored in a mutable location, the compiler inserts a
temporary that ensures that the managed box remains live for the
entire lifetime. So, the above example would be compiled as if it were
written

~~~
# struct X { f: int }
fn example2() {
let mut x = @X {f: 3};
let x1 = x;
let y = &x1.f; // -+ L
// ... // |
} // -+
~~~

Now if `x` is reassigned, the pointer `y` will still remain valid. This
process is called *rooting*.

# Borrowing owned boxes

The previous example demonstrated *rooting*, the process by which the
compiler ensures that managed boxes remain live for the duration of a
borrow. Unfortunately, rooting does not work for borrows of owned
boxes, because it is not possible to have two references to an owned
box.

For owned boxes, therefore, the compiler will only allow a borrow *if
the compiler can guarantee that the owned box will not be reassigned
or moved for the lifetime of the pointer*. This does not necessarily
mean that the owned box is stored in immutable memory. For example,
program.

The compiler will only allow a borrow *if it can guarantee that the data will
not be reassigned or moved for the lifetime of the pointer*. This does not
necessarily mean that the data is stored in immutable memory. For example,
the following function is legal:

~~~
@@ -294,7 +203,7 @@ and `x` is declared as mutable. However, the compiler can prove that
and in fact is mutated later in the function.
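
The guide's own example is collapsed in this diff view, so the following is only a sketch of the same idea in current Rust syntax, with a hypothetical function name: a mutable variable may be borrowed and later reassigned, provided the reassignment happens after the borrow is no longer in use.

```rust
// Hypothetical example: `x` is mutable, but it is reassigned only after
// the borrow `y` has ended, so the compiler accepts it.
fn borrow_then_mutate() -> i32 {
    let mut x: Box<i32> = Box::new(3);

    let seen = {
        let y: &i32 = &x; // borrow of the box contents begins
        *y                // last use of `y`: the integer is copied out
    };                    // the borrow has ended here

    x = Box::new(4);      // accepted: nothing points into the old box now
    seen + *x
}
```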

It may not be clear why we are so concerned about mutating a borrowed
variable. The reason is that the runtime system frees any owned box
variable. The reason is that the runtime system frees any box
_as soon as its owning reference changes or goes out of
scope_. Therefore, a program like this is illegal (and would be
rejected by the compiler):
@@ -337,31 +246,34 @@ Once the reassignment occurs, the memory will look like this:
+---------+
~~~

Here you can see that the variable `y` still points at the old box,
which has been freed.
Here you can see that the variable `y` still points at the `f` field of
the old `Foo`, which has been freed.

In fact, the compiler can apply the same kind of reasoning to any
memory that is _(uniquely) owned by the stack frame_. So we could
memory that is (uniquely) owned by the stack frame. So we could
modify the previous example to introduce additional owned pointers
and structs, and the compiler will still be able to detect possible
mutations:
mutations. This time, we'll use an analogy to illustrate the concept.

~~~ {.ignore}
fn example3() -> int {
struct R { g: int }
struct S { f: Box<R> }
struct House { owner: Box<Person> }
struct Person { age: int }

let mut x = box S {f: box R {g: 3}};
let y = &x.f.g;
x = box S {f: box R {g: 4}}; // Error reported here.
x.f = box R {g: 5}; // Error reported here.
*y
let mut house = box House {
owner: box Person {age: 30}
};

let owner_age = &house.owner.age;
house = box House {owner: box Person {age: 40}}; // Error reported here.
house.owner = box Person {age: 50}; // Error reported here.
*owner_age
}
~~~

In this case, two errors are reported, one when the variable `x` is
modified and another when `x.f` is modified. Either modification would
invalidate the pointer `y`.
In this case, two errors are reported, one when the variable `house` is
modified and another when `house.owner` is modified. Either modification would
invalidate the pointer `owner_age`.
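
For comparison, here is a sketch of a variant that the compiler should accept, written in current Rust syntax (`Box::new`, `i32`) rather than the syntax used in this guide. The function name is hypothetical. The only change in substance is that the value behind `owner_age` is copied out before `house.owner` is replaced, so the borrow is over by the time the modification happens.

```rust
struct Person { age: i32 }
struct House { owner: Box<Person> }

// Hypothetical helper: returns the owner's age before and after a new
// owner moves in.
fn age_before_and_after() -> (i32, i32) {
    let mut house = Box::new(House {
        owner: Box::new(Person { age: 30 }),
    });

    let owner_age: &i32 = &house.owner.age;
    let before = *owner_age; // last use of the borrow: the age is copied

    // The borrow is finished, so replacing the owner is allowed.
    house.owner = Box::new(Person { age: 50 });

    (before, house.owner.age)
}
```

Moving the read of `*owner_age` below the assignment to `house.owner` brings back the situation described above, and the compiler rejects it.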

# Borrowing and enums

@@ -412,7 +324,7 @@ circle constant][tau] and not that dreadfully outdated notion of pi).

The second match is more interesting. Here we match against a
rectangle and extract its size: but rather than copy the `size`
struct, we use a by-reference binding to create a pointer to it. In
struct, we use a __by-reference binding__ to create a pointer to it. In
other words, a pattern binding like `ref size` binds the name `size`
to a pointer of type `&Size` into the _interior of the enum_.
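
The enum definition and the match are collapsed in this diff view, so the sketch below reconstructs the idea in current Rust syntax under some assumptions: the variant layout (`Circle(Point, f64)`, `Rectangle(Point, Size)`) and the name `compute_area` are guesses at the guide's example, and the circle area uses `PI` for brevity.

```rust
struct Point { x: f64, y: f64 }
struct Size { w: f64, h: f64 }

enum Shape {
    Circle(Point, f64),     // center and radius
    Rectangle(Point, Size), // origin and size
}

fn compute_area(shape: &Shape) -> f64 {
    match *shape {
        // `radius` is an `f64`, which is cheap to copy out of the enum.
        Shape::Circle(_, radius) => std::f64::consts::PI * radius * radius,
        // `ref size` binds `size` as a `&Size` pointing into the enum's
        // interior instead of moving or copying the `Size` struct.
        Shape::Rectangle(_, ref size) => size.w * size.h,
    }
}
```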

@@ -526,12 +438,12 @@ time one that does not compile:

~~~ {.ignore}
struct Point {x: f64, y: f64}
fn get_x_sh(p: @Point) -> &f64 {
fn get_x_sh(p: &Point) -> &f64 {
&p.x // Error reported here
}
~~~

Here, the function `get_x_sh()` takes a managed box as input and
Here, the function `get_x_sh()` takes a reference as input and
returns a reference. As before, the lifetime of the reference
that will be returned is a parameter (specified by the
caller). That means that `get_x_sh()` promises to return a reference
@@ -540,17 +452,18 @@ subtly different from the first example, which promised to return a
pointer that was valid for as long as its pointer argument was valid.

Within `get_x_sh()`, we see the expression `&p.x` which takes the
address of a field of a managed box. The presence of this expression
implies that the compiler must guarantee that, so long as the
resulting pointer is valid, the managed box will not be reclaimed by
the garbage collector. But recall that `get_x_sh()` also promised to
address of a field of a `Point`. The presence of this expression
implies that the compiler must guarantee that, so long as the
resulting pointer is valid, the original `Point` won't be moved or changed.

But recall that `get_x_sh()` also promised to
return a pointer that was valid for as long as the caller wanted it to
be. Clearly, `get_x_sh()` is not in a position to make both of these
guarantees; in fact, it cannot guarantee that the pointer will remain
valid at all once it returns, as the parameter `p` may or may not be
live in the caller. Therefore, the compiler will report an error here.
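
For contrast, the form the text describes as the first example, a returned pointer that is valid for as long as the pointer argument, can be sketched in current Rust with an explicit lifetime parameter. The names `get_x` and `demo_get_x` are assumptions made for this illustration.

```rust
struct Point { x: f64, y: f64 }

// The returned reference carries the same lifetime `'a` as the argument,
// so it is valid exactly as long as the caller's borrow of `p` is.
fn get_x<'a>(p: &'a Point) -> &'a f64 {
    &p.x
}

// Hypothetical caller: `p` outlives `x_ref`, so the borrow is sound.
fn demo_get_x() -> f64 {
    let p = Point { x: 1.5, y: 2.5 };
    let x_ref = get_x(&p);
    *x_ref + p.y
}
```

Here the caller, not the callee, guarantees that the `Point` stays alive, which is the guarantee `get_x_sh()` cannot make on its own.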

In general, if you borrow a managed (or owned) box to create a
In general, if you borrow a struct or a box to create a
reference, it will only be valid within the function
and cannot be returned. This is why the typical way to return references
is to take references as input (the only other case in