Rust, and the Case of Memory Management (EECS490Lec20)
Mar 30, 2020
So far, we’ve been assuming automatic memory management.
If we have the following function:
let f y =
let counter = ref 0 in
.
.
.
.
!counter
in f 3
So here, when counter is stops being used, (i.e. at the end, at !counter), the OCaml garbage collector deallocates that memory for another program to use automatically, making memory management easy, but less efficient than C/C++
Garbage Collector: A program that manages the references to memory a program uses, and allocates/deallocates the memory depending on what and how it’s currently being used.
These programs have to be really clever and really sophisticated to work well, as they need to be able to handle a lot of strange scenarios, like the following:
let f y =
let counter = ref 0 in
..
..
..
counter
in
let x = f y in
!x
So here, we still can access the counter data, as we pass that data’s location into another variable. As a result, we need to not deallocate it, though if you had assumed references can’t go up a level in the stack that would have changed this function’s behavior.
Besides this scenario, you also have many more scenarios. For example, references can be:
- Bound to variables, like counter
- Placed inside other data structures
- Returned from functions
- Bound to multiple variables
and so on.
As you can see, Garbage Collection is a very complicated job, and thus cannot be done at compile time. Thus, OCaml, and other programs like it, uses a run-time garbage collector, which has to stop program execution every now and then to run these functions. It’s a very expensive operation, but it’s needed to maintain this abstraction.
So while GC is safe, so it won’t have use after free bugs and double free bugs, it’s slowww
Whereas, in C/C++, we have manual memory management, which is really cheap, but unsafe, via use after free bugs and double free bugs
Can we do better?
So, for a while, we had this dichotomy between memory-safe, but garbage collected, languages and non memory-safe, but fast, languages. Enter Rust:
Rust is a memory-safe imperative language without manual memory management and no garbage collector
This makes it both safe and cheap, though it does complicate memory management to achieve this.
It’s also being used in a lot of places, from Firefox, to Dropbox, to Microsoft.
Rust: Basic Principles
Before we learn what makes Rust tick, let’s go through and learn a bit of it beforehand.
Firstly, Rust is an expression oriented language:
let x = 5;
x + 1
This is an expression with type i32 and value b.
It also has all of the stuff we’re been talking about, and variables are given meaning through substitution like everything else we’ve been talking about.
Secondly, Rust distinguishes between stack and heap variables:
For stack variables:
// The let keyword defines a stack variable so:
let x = 5; // On the stack
let y = 5; // On the stack
...
// So even the following string is an immutable stack variable
let x = "Hello" // type: str ( with known size )
// As a result, we cannot concatenate strings directly:
fn f( x: str) {
let y = "hello";
let z = x + y; // INVALID, as we do not know the size of x+y at compile time
z // No explicit return as it's an expression oriented language
}
For dynamic variables, you use constructors:
// To make a heap-allocated string:
let x = String::from("hello"); //x:String
Thus, on the stack, x has the following structure:
x -> |pointer to the actual heap memory| length
| capacity (like vector grow from 281)
But, on the heap, there’s the actual string contents.
Again, this all makes sense, but how does Rust actually free heap-allocated memory without manual memory management or a garbage collector?
How Rust free’s heap-allocated memory
If we had the following code:
{
let x = String::from("hello");
println!("{}!", x);
}
Here, Rust says that, as the variable becomes out-of-scope, it’ll get freed at the end of the scope, which it figures out at compile time through its lifetime static semantics system.
Now that works and all, but let’s consider what would happen if we were to look at another example, assuming this is strictly how Rust works:
{
let x = String::from("hello");
{
let y = x;
println!("{}!", y);
}
println("{}!",x);
}
Here, if we were to follow that simple strategy, we’d run into a problem. After we transfer the pointer value to y, as y goes out of scope, y’s data would be erased. However, as y’s data is x’s data, x’s data would be erased, and we would have a use-after-free bug in our system.
However, what may be surprising here is that Rust calls the above program “illtyped” in its language, so the above program won’t even compile. How come? Rust adds a new concept called ownership to its type system, so in Rust:
- Every piece of Heap Data has a unique owner, which refers to the data that points to it. In our example, x was the original owner of this data.
- Ownership can be transferred, so when we wrote
let y = x
, Rust moved the ownership of x from x to y, making y the unique owner ( and all of this is done statically! ) - Deallocation occurs when the owner dies. Thus, after y goes out of scope, y’s heap data is freed, and so when x tries to print that data, we run into ill-typed, as x doesn’t own any data!
Thus, even the following example is poorly-typed, as x isn’t the owner of the string anymore:
let x = String::from("hello");
let y = x;
println!("{}!", y); //OK
println!("{}!", x); // Not OK, as x is not the owner.
With this concept, then, Rust is able to make all of the memory management calls compile-time bound, keeping memory management fast while still easy for basic uses.
For simple immutable stack data, Rust has no sense of ownership. Thus, the following code works:
let x = 5;
let y = x; // As x is a simple immutable, Rust just makes a copy
println("{}!", x); // So this is still okay
Additionally, there are explicit methods to copy Heap variables:
let x = String::from("hello");
let y = x.clone(); // Copy of x, so we don't need to worry changing ownership
Now that we understand the central concept that makes Rust useful, ownership, let’s dig a bit deeper into how ownership transfers and how it changes hands.
Changing Hands: Rust Ownership Transfer
Firstly, ownership is moved in and out of functions. If we had the following:
let identity(s:String) {
s
};
let x = String::from("hello");
let y = identity(x);
First, x transfers its ownership of Hello to identity’s s, as it transfers the ownership to the inner block.
Then, the identity function transfer’s its ownership to y, as y is assigned the reference value after identity is called.
Thus, the following still works:
println!("{}!", y);
But the following does not, as x transfers ownership to identity:
println!("{}!", x);
While that’s nice conceptually, this does lead to a few new problems we have to consider when writing code. For example, consider the following code, assuming a length function len : String -> i32
:
let x = String::from("hello");
let y = len(x);
println("{}!", x); // Whoops!
So we have a new problem; how do we call functions without transferring ownership to them, as we want to call length and other functions on Strings without needing to worry about ownership all of the time.
A First Attempt: Explicitly Moving the Resource in and Out
Well, one thing we could do is we could simply redefine length to return both the length and the string itself:
fn len(s:String) {
..
(n, s)
}
let x = String::from("hello");
let (y,x) = len(x); // Move ownership to len, which moves it back to x.
println!("{}!", x); //OK, as we move the transfer back to x.
However, this has the downside of complicating our functions, and it can lead to potentially a function taking 8 things returning 8 or more things ( which can happen in systems level code )
Obviously, as Rust is so used, this can’t be how Rust does it; no one would want to work in it.
A Second Attempt : Borrowing
So, what if instead of passing the string itself as an argument, what if we passed a reference to the string as an argument? Well, it turns out by doing this, Rust does not transfer ownership to the calling function. Therefore, we’re able to let a function “borrow” our string resource, compute its value, and transfer ownership back to us:
fn len (s : &String) {
...
n
}
let x = String::from("hello");
let y = len(&x); // Thus, x is still the owner, but len borrows access
This is how Rust generally works with data when dealing with immutable functions, but what about mutable functions / methods?
Mutability in Rust
Say we wish to follow the time-honored tradition of programmers and write the following Hello World program:
let x = String::from("Hello");
x.push_str(", World"); //OOF
println!("{}!", x);
If we were to compile this program, the second line will cause a compilation error, as we haven’t told Rust we want our data to be mutable. Thus, for a variable to be mutable, and for us to have a mutable piece of data, we need to add the mut
keyword:
let mut x = String::from("Hello");
x.push_str(", World"); //All good, so we mutate in place.
println!("{}!", x);
Now, let’s say we had a function, f, which edits a string in place, and we wanted to use it with a reference. We can’t do the following:
let mut x = String::from("Hello");
f(&x);
This is because &x
creates a static, or immutable, reference to x in Rust. To make it mutable, we need to add the mut keyword again:
let mut x = String::from("Hello");
f(& mut x);
That allows us to mutate the data in the inner function, and thus use our new heap data in all the ways we’d want to for most projects.
Summary
- Rust manages to not have explicit memory management AND not have a garbage collector by introducing a new concept: the owner
- Each piece of heap data has a unique owner, and frees when its owner dies
- Through borrowing and the
mut
keyword, we can have owners share resourced with functions without transferring ownership, allowing for usage like RAII in C++.