Ownership and Borrowing in Rust 2 (EECS490Lec21)
Mar 30, 2020
Last Time: We started to talk about Rust. As a recap:
- Rust is an imperative language, so it has reference semantics with side effects on memory that we need to worry about.
- Rust also has automatic memory management, where programmers did not need to have any explicit free for every memory allocation, which leads to memory safety.
- However, Rust manages to have this automatic memory management system in it’s static semantics rather than its dynamics semantics, removing the need for a garbage collector and moving memory management into compile-time, making it speedy and correct!
How? Rust manages to do this via the concept of ownership:
- The idea here is that every heap-allocated resource has an owner. So, if we had written
let x=String::from("hello")
, we say thatx
owns the resource in the heap that containshello
, and whatever else is needed to implement it. - Then, when the owner dies ( which Rust defines as when “its lifetime ends” ), Rust will add a statement to deallocate the memory at compile time.
- What makes this even more useful, however, is that Rust also checks for ownership semantics at compile time, so if a resource is deallocated, Rust will inform the user of that before the program is even run. This allows programmers to reason about memory management errors without needing to run
valgrind
or having to use a garbage collector.
Now, with all of this power and this speed, we have to fundamentally change our understanding of memory to use it, as memory objects in Rust can change owners. What I mean is the following:
- When you write a statement like
let y = x
in many languages, this can actually do a number of things, depending on the language and its semantics. In C++, for example, y would have a copy of the value of x, and, in JavaScript, a language with many quirks, if x was an object then y would be a “shallow copy” of x, simply copying all of the values for the object, making all heap references refer to the exact pieces of memory as x. In Rust, unsurprisingly, assignment has an additional semantic to it: assignment moves ownership. So, in this case,y
would own the Stringhello
, and trying to get the valuehello
from x, say by callingprintln!("{}!", x)
, would lead to a compiler error, as x does not own any heap allocated data at this point. - Another case where Rust changes ownership is in function calls. If we write a function, say
lol(x)
, passing inx
directly moves ownership ofx
into the calling function. Does, if we were to put the following code:let x = String::from("h"); lol(x); println!("{}!",x)
, then we’d also have a compiler error, aslol
owned the string and finished its lifetime, deallocating x.
Thus, while Rust is a clear example of how careful logic and reasoning can lead us to having better programming models that allow speed and logical clarity, we need to contend with this new concept of ownership transfer
to compensate. (This is really similar to how, by splitting up Haskell into a pure language and a command language to separate functional and imperative programming, we gain a feature, easy reasoning, we make writing our program slightly harder, as we have to understand the I/O Monad type and remember to structure our programs to make use of it, rather than allowing print statements everywhere like Python)
So, how do users of Rust cope with what seems like deal-breakers? Through borrowing.
Borrowing
In Rust, instead of creating functions that take in heap-allocated memory directly, we can create functions that take in heap-allocated string references. By doing so, Rust will not give ownership to the function; instead, it will allow the function to borrow ownership of the data allocated for the duration of the function, after which the owner transfers back to the original variable. Thus, if we have the following code:
fn make_greeting(s:&String){
format!("Hello, {}!", s)
}
fn make_footer(s:&String) {
format!("Sincerely, {}", s)
}
fn main() {
let s = String::from("Cyrus");
let greeting = make_greeting(&s);
let footer = make_footer(&s);
}
This all works for a few reasons:
- Firstly, makegreeting and makefooter take in all of this data as references. Thus, they do not take ownership away from s at any point, allowing us to call functions on s as many times as we like.
- Secondly, format!, and other functions that end in
!
in Rust, are not really functions, but compiler macros. When the compiler encounters them, they simply replace these nicer functions with implementations that preserve ownership semantics (i.e. they take in references to s, so they simply borrow the data)
Thus, we’re able to call functions multiple times and do fancy stuff, without explicitly needing to handle ownership at every point. Thanks Rust!
With the recap over with, let’s go and get to the new content. We’ll cover some new material regarding references, Rust’s sum and product types, mutability and concurrency, and Rust’s lifetime system.
References, Redux
So, now that we believe we have the idea of references down, let’s test our understanding by trying to write a program in Rust. From our C++ perspective, it seems like the following code should work:
fn f() {
let s = String::from("hello"); // Dynamically allocate hello
&s // Transfer a reference to this hello to the caller
}
In C++, this sort of function should work, as we’re simply allocating some memory and returning it to us via a reference, where we’ll then use it and de-allocate it like responsible programmers.
However, this actually results in a compiler error! Can you spot the problem?
The problem here comes from the fact that &s
does not own hello
; it’s merely a reference to the data owned by s
. As a result, when s
dies, the memory associated with &s
dies, and so Rust complains, as the lifetime of &s
is larger than the owner, which would lead to a double free error.
Thus, formally, there’s a rule we need to follow when it comes to (immutable) references in Rust:
References can’t outlive the owner of the resource they refer to, to prevent double-free errors.
Thus, while we can have resources borrowed by called functions, we cannot have those references outlive our owner, as that would lead to undeterminable behavior, something as language designers we strive to prevent.
The Product Type in Rust: Structs
In Rust, Structs serve as a product type. So, as an example, we could have a User struct:
Struct User = {
username: String,
email: String,
is_active: bool
};
and then create a User, in the following way:
let me = User {
username : String::from("comar"),
email: String::from("comar@umich.edu"),
is_active: false
};
As a side-note, structs must be defined before they can be created in Rust. If you’ve been through 370 by now, this is similar to C, where we need to have definitions before they are used.
While this might seem like normal behavior for all languages, be aware that there are language where this is not the case. To see an example of such a ‘feature’, simply look up “JavaScript Hoisting” when you get a chance.
In terms of owning heap-allocated data, structs in Rust own their fields. So, in this case, the variable me
owns the dynamic data contained in the fields username
and email
, even if you need to write me.username
and [me.email](http://me.email)
to access the data.
The Sum Type in Rust: Enums
This is a quick aside by Cyrus. You can probably safely skip this
Similarly, we could also define Sum Types in Rust, which are called “Enums”.
We define them through the following syntax
enum Bread {
White, Wheat, Multi(i32)
}
Thus, in this case, we simply have three types of Bread: White, Wheat, and Multi, which is a type constructor that needs a 32-bit integer to exist.
We can also pattern match on these values, using the following syntax:
match (b) {
White => ... , Wheat => ..., Multi(n) => ...
}
And, like Product Types above, the Sum Type owns any dynamically allocated data in it, so in this case if we had a name option type:
enum NameOpt {
Nil, Some(String)
}
let myName = NameOpt::Some(String::From("Arav"));
Then myName would own the String ‘Arav’.
Mutability: Part Deux
With these language features in mind, let’s return to mutability. So, if we remember, we need to use the mut
keyword to declare an owner to a mutable value, and we also need to use mut
to declare a mutable reference.
When we have structs in Rust, then, we’ll need to make sure we have defined fields to be mutable, as fields can only be mutated in very specific ways. As an example, let’s say we had the following code:
let me = User {
username : String::from("comar"),
email: String::from("comar@umich.edu"),
is_active: false
};
and then ran the following code:
me.username = String::from("cyrus");
This will lead to a compiler error, as me is an immutable object because we haven’t used the mut
keyword.
However, if we did the following:
let mut me = User {
username : String::from("comar"),
email: String::from("comar@umich.edu"),
is_active: false
};
me.username = String::from("cyrus");
This now works, as the owner dictates that the data is mutable, and so we can mutate the fields accordingly. What happens here is that, when the username field points to new data, it drops the old field data, deallocating it, which makes sense, as we essentially force that data to have no owner.
Now let’s say we have some mutable object, me
, and try to run the following function:
fn f(user: &User) {
user.username = String::from("cyrus");
}
where, as a side note, user.username
is simply syntactic sugar for (* user).username
, as that’s generally more readable.
Well, believe it or not, but Rust complains yet again.
This is simply because, although the owner is mutable, we only passed the data in through an immutable reference, &User
. If, instead, we used &mut User
, we’d get a mutable borrow of our data, allowing us to mutate the data again.
While this does make sense from a purely syntactic viewpoint, the main reason why Rust enforces this constraint is so it can do some automatic parallelism for you. If two functions, length(&me)
and user(&me)
are ran next to each other, then we can schedule them concurrently without needing to worry about concurrency problems, speeding up your program!
Now, while this is all fine and dandy, there is one caveat added by adding mutable borrows to the language:
Rust only allows one live mutable borrow to exist any point in time and, if there’s a mutable borrow, there cannot be any immutable borrows. In fact, the owner itself can read or write while a mutable borrow is in action.
So, for example, the following code does not compile:
fn main() {
let mut me = User {
username : String::from("comar"),
email: String::from("comar@umich.edu"),
is_active: false
};
let r = &mut me;
let r2 = &mut me;
length(&me)
}
As Rust recognizes that, if it tries to schedule any usage of r
, r2
, or length
concurrently, then we’ll have problems, as those mutable borrows might influence the immutable borrow or each other, depending on how the CPU scheduler operates. (Though you can change this through a lock, or any other mutex operation).
For more information about the CPU scheduler, and for a better understanding of why we need all of this in the first place, either take EECS 482, take EECS 482, or take EECS 482. If you’re impatient like me, then schedule yourself to take EECS482 next semester, AND also read “Operating Systems: Three Easy Pieces”, a set of lecture notes which goes into some really good descriptions on all of this. (It’s legally free through the instructor’s website!)
Operating Systems: Three Easy Pieces
In this way, Rust keeps you from making the hardest of bugs happen, concurrency-related bugs, with the minimal cost that you learn how to program with those issues in mind. In this manner, Rust allows us to get the benefits of parallelism and concurrency, without the trouble that’s usually caused by it.
The Work of a Lifetime: Rust’s Lifetime System
We’ve been a bit careless about how Rust handles when objects die, only stating that it happens and, when it does, the memory associated with any heap-object owned by those objects gets deallocated from memory. Now, we’ll take a more careful look into how Rust actually handles lifetime.
Originally, Rust had the lifetime of objects be simply the scope that they were defined in. Thus, variables exist only for the scope that something is defined in, an idea called lexical scope:
Lexical Scope: The idea that, after a block, variables are deallocated.
Thus, if we had the following code in Rust:
{
let mut me = User { ... };
let r = &mut me;
f(r);
let r2 = &me; // Error
g(r2);
}
Then the second to last line would cause a compiler error, as r
, as a mutable reference, is still alive, and we cannot have r2
, an immutable reference, exist when r
does.
However, this is kinda annoying, as we just want to mutate me using r in f(r)
, and so we need to do some work to change our idea of the code into reality, which can be annoying for simple tasks and potentially destructive for complex tasks.
Thus, in order to cope with the annoyance, Rust has created so-called Non-lexical lifetimes
, where a variable’s lifetime ends at its last use. Thus, the code above works if you’re using Rust’s non-lexical lifetime , as r
’s last use is before any mention of r2
, even if r
is still in scope.
This increases Rust’s flexibility, especially with short-lived mutable borrows.