Merge branch 'version3'

This commit is contained in:
Carl Fredrik Samson
2020-04-05 17:10:01 +02:00
21 changed files with 3919 additions and 997 deletions


@@ -0,0 +1,557 @@
# Some Background Information
Before we go into the details about Futures in Rust, let's take a quick look
at the alternatives for handling concurrent programming in general and some
pros and cons for each of them.
While we do that, we'll pick up some knowledge about concurrency which will make it
easier for us when we dive into Futures specifically.
> For fun, I've added a small snippet of runnable code with most of the examples.
> If you're like me, things get way more interesting that way, and maybe you'll see some
> things you haven't seen before along the way.
## Threads provided by the operating system
Now, one way of handling concurrent tasks is to let the OS take care of everything for
us. We do this by simply spawning a new OS thread for each task we want to
accomplish and writing code like we normally would.
The runtime we use to handle concurrency for us is the operating system itself.
**Advantages:**
- Simple
- Easy to use
- Switching between tasks is reasonably fast
- You get parallelism for free
**Drawbacks:**
- OS level threads come with a rather large stack. If you have many tasks
waiting simultaneously (like you would in a web-server under heavy load) you'll
run out of memory pretty fast.
- There are a lot of syscalls involved. This can be pretty costly when the number
of tasks is high.
- The OS has many things it needs to handle. It might not switch back to your
thread as fast as you'd wish.
- Might not be an option on some systems
**Using OS threads in Rust looks like this:**
```rust
use std::thread;
fn main() {
    println!("So we start the program here!");
    let t1 = thread::spawn(move || {
        thread::sleep(std::time::Duration::from_millis(200));
        println!("We create tasks which get run when they're finished!");
    });
    let t2 = thread::spawn(move || {
        thread::sleep(std::time::Duration::from_millis(100));
        println!("We can even chain callbacks...");
        let t3 = thread::spawn(move || {
            thread::sleep(std::time::Duration::from_millis(50));
            println!("...like this!");
        });
        t3.join().unwrap();
    });
    println!("While our tasks are executing we can do other stuff here.");
    t1.join().unwrap();
    t2.join().unwrap();
}
```
OS threads sure have some pretty big advantages. So why all this talk about
"async" and concurrency in the first place?
First of all, for computers to be [_efficient_](https://en.wikipedia.org/wiki/Efficiency) they need to multitask. Once you
start to look under the covers (like [how an operating system works](https://os.phil-opp.com/async-await/))
you'll see concurrency everywhere. It's fundamental to everything we do.
Secondly, we have the web. Web servers are all about I/O and handling small tasks
(requests). When the number of small tasks is large, OS threads as they exist
today are not a good fit because of the memory they require and the overhead
involved in creating new threads. This gets even more relevant when the load is
variable, which means the number of tasks a program has at any point in time is
unpredictable. That's why you'll see so many async web frameworks and database
drivers today.
However, for a huge number of problems, the standard OS threads will often be the
right solution. So, just think twice about your problem before you reach for an
async library.
Now, let's look at some other options for multitasking. They all have in common
that they implement a way to do multitasking by having a "userland"
runtime:
## Green threads
Green threads use the same mechanism as an OS does: they create a thread for
each task, set up a stack, save the CPU's state, and jump from one
task (thread) to another by doing a "context switch".
We yield control to the scheduler (which is a central part of the runtime in
such a system), which then continues running a different task.
Rust had green threads once, but they were removed before it hit 1.0. The state
of execution is stored in each stack, so in such a solution there would be no
need for async, await, Futures or Pin. All of this would be an implementation
detail of the library.
The typical flow looks like this:
1. Run some non-blocking code
2. Make a blocking call to some external resource
3. The CPU "jumps" to the "main" thread which schedules a different thread to run and
"jumps" to that stack
4. Run some non-blocking code on the new thread until a new blocking call or the
task is finished
5. "Jump" back to the "main" thread, schedule a new thread to run and "jump" to that
These "jumps" are known as context switches. Your OS is doing them many times each
second as you read this.
**Advantages:**
1. Simple to use. The code will look like it does when using OS threads.
2. A "context switch" is reasonably fast
3. Each stack only gets a little memory to start with, so you can have hundreds of
thousands of green threads running.
4. It's easy to incorporate [_preemption_](https://cfsamson.gitbook.io/green-threads-explained-in-200-lines-of-rust/green-threads#preemptive-multitasking)
which puts a lot of control in the hands of the runtime implementors.
**Drawbacks:**
1. The stacks might need to grow. Solving this is not easy and will have a cost.
2. You need to save all the CPU state on every switch
3. It's not a _zero cost abstraction_ (Rust had green threads early on and this
was one of the reasons they were removed).
4. Complicated to implement correctly if you want to support many different
platforms.
If you were to implement green threads in Rust, it could look something like
this:
> The example presented below is adapted from an earlier gitbook I
> wrote about green threads called [Green Threads Explained in 200 lines of Rust.](https://cfsamson.gitbook.io/green-threads-explained-in-200-lines-of-rust/)
> If you want to know what's going on you'll find everything explained in detail
> in that book. The code below is wildly unsafe and it's just to show a real example.
> It's not in any way meant to showcase "best practice". Just so we're on
> the same page.
```rust, edition2018
#![feature(asm)]
#![feature(naked_functions)]
use std::ptr;
const DEFAULT_STACK_SIZE: usize = 1024 * 1024 * 2;
const MAX_THREADS: usize = 4;
static mut RUNTIME: usize = 0;
pub struct Runtime {
threads: Vec<Thread>,
current: usize,
}
#[derive(PartialEq, Eq, Debug)]
enum State {
Available,
Running,
Ready,
}
struct Thread {
id: usize,
stack: Vec<u8>,
ctx: ThreadContext,
state: State,
task: Option<Box<dyn Fn()>>,
}
#[derive(Debug, Default)]
#[repr(C)]
struct ThreadContext {
rsp: u64,
r15: u64,
r14: u64,
r13: u64,
r12: u64,
rbx: u64,
rbp: u64,
thread_ptr: u64,
}
impl Thread {
fn new(id: usize) -> Self {
Thread {
id,
stack: vec![0_u8; DEFAULT_STACK_SIZE],
ctx: ThreadContext::default(),
state: State::Available,
task: None,
}
}
}
impl Runtime {
pub fn new() -> Self {
let base_thread = Thread {
id: 0,
stack: vec![0_u8; DEFAULT_STACK_SIZE],
ctx: ThreadContext::default(),
state: State::Running,
task: None,
};
let mut threads = vec![base_thread];
threads[0].ctx.thread_ptr = &threads[0] as *const Thread as u64;
let mut available_threads: Vec<Thread> = (1..MAX_THREADS).map(|i| Thread::new(i)).collect();
threads.append(&mut available_threads);
Runtime {
threads,
current: 0,
}
}
pub fn init(&self) {
unsafe {
let r_ptr: *const Runtime = self;
RUNTIME = r_ptr as usize;
}
}
pub fn run(&mut self) -> ! {
while self.t_yield() {}
std::process::exit(0);
}
fn t_return(&mut self) {
if self.current != 0 {
self.threads[self.current].state = State::Available;
self.t_yield();
}
}
fn t_yield(&mut self) -> bool {
let mut pos = self.current;
while self.threads[pos].state != State::Ready {
pos += 1;
if pos == self.threads.len() {
pos = 0;
}
if pos == self.current {
return false;
}
}
if self.threads[self.current].state != State::Available {
self.threads[self.current].state = State::Ready;
}
self.threads[pos].state = State::Running;
let old_pos = self.current;
self.current = pos;
unsafe {
switch(&mut self.threads[old_pos].ctx, &self.threads[pos].ctx);
}
true
}
pub fn spawn<F: Fn() + 'static>(f: F){
unsafe {
let rt_ptr = RUNTIME as *mut Runtime;
let available = (*rt_ptr)
.threads
.iter_mut()
.find(|t| t.state == State::Available)
.expect("no available thread.");
let size = available.stack.len();
let s_ptr = available.stack.as_mut_ptr();
available.task = Some(Box::new(f));
available.ctx.thread_ptr = available as *const Thread as u64;
ptr::write(s_ptr.offset((size - 8) as isize) as *mut u64, guard as u64);
ptr::write(s_ptr.offset((size - 16) as isize) as *mut u64, call as u64);
available.ctx.rsp = s_ptr.offset((size - 16) as isize) as u64;
available.state = State::Ready;
}
}
}
fn call(thread: u64) {
let thread = unsafe { &*(thread as *const Thread) };
if let Some(f) = &thread.task {
f();
}
}
#[naked]
fn guard() {
unsafe {
let rt_ptr = RUNTIME as *mut Runtime;
let rt = &mut *rt_ptr;
println!("THREAD {} FINISHED.", rt.threads[rt.current].id);
rt.t_return();
};
}
pub fn yield_thread() {
unsafe {
let rt_ptr = RUNTIME as *mut Runtime;
(*rt_ptr).t_yield();
};
}
#[naked]
#[inline(never)]
unsafe fn switch(old: *mut ThreadContext, new: *const ThreadContext) {
asm!("
mov %rsp, 0x00($0)
mov %r15, 0x08($0)
mov %r14, 0x10($0)
mov %r13, 0x18($0)
mov %r12, 0x20($0)
mov %rbx, 0x28($0)
mov %rbp, 0x30($0)
mov 0x00($1), %rsp
mov 0x08($1), %r15
mov 0x10($1), %r14
mov 0x18($1), %r13
mov 0x20($1), %r12
mov 0x28($1), %rbx
mov 0x30($1), %rbp
mov 0x38($1), %rdi
ret
"
:
: "r"(old), "r"(new)
:
: "alignstack"
);
}
# #[cfg(not(windows))]
fn main() {
let mut runtime = Runtime::new();
runtime.init();
Runtime::spawn(|| {
println!("I haven't implemented a timer in this example.");
yield_thread();
println!("Finally, notice how the tasks are executed concurrently.");
});
Runtime::spawn(|| {
println!("But we can still nest tasks...");
Runtime::spawn(|| {
println!("...like this!");
})
});
runtime.run();
}
# #[cfg(windows)]
# fn main() { }
```
Still hanging in there? Good. Don't get frustrated if the code above is
difficult to understand. If I hadn't written it myself I would probably feel
the same. You can always come back later and read the book where it's explained in detail.
## Callback based approaches
You probably already know what we're going to talk about in the next paragraphs
from JavaScript, which I assume most people have at least some exposure to.
> If your exposure to JavaScript has given you any sort of PTSD earlier in life,
> close your eyes now and scroll down for 2-3 seconds. You'll find a link there
> that takes you to safety.
The whole idea behind a callback based approach is to save a pointer to a set of
instructions we want to run later. We can save that pointer on the stack before
we yield control to the runtime, or in some sort of collection as we do below.
The basic idea of not involving threads as a primary way to achieve concurrency
is the common denominator for the rest of the approaches, including the one
Rust uses today, which we'll soon get to.
**Advantages:**
- Easy to implement in most languages
- No context switching
- Low memory overhead (in most cases)
**Drawbacks:**
- Each task must save the state it needs for later; memory usage will grow
linearly with the number of callbacks in a chain of computations.
- Can be hard to reason about; many people already know this as "callback hell".
- It's a very different way of writing a program, and it can be difficult to
get an understanding of the program flow.
- Sharing state between tasks is a hard problem in Rust using this approach due
to its ownership model.
An extremely simplified example of how a callback based approach could look is:
```rust
fn program_main() {
    println!("So we start the program here!");
    set_timeout(200, || {
        println!("We create tasks which get run when they're finished!");
    });
    set_timeout(100, || {
        println!("We can even chain callbacks...");
        set_timeout(50, || {
            println!("...like this!");
        })
    });
    println!("While our tasks are executing we can do other stuff here.");
}

fn main() {
    RT.with(|rt| rt.run(program_main));
}

use std::sync::mpsc::{channel, Receiver, Sender};
use std::{cell::RefCell, collections::HashMap, thread};

thread_local! {
    static RT: Runtime = Runtime::new();
}

struct Runtime {
    callbacks: RefCell<HashMap<usize, Box<dyn FnOnce() -> ()>>>,
    next_id: RefCell<usize>,
    evt_sender: Sender<usize>,
    evt_receiver: Receiver<usize>,
}

fn set_timeout(ms: u64, cb: impl FnOnce() + 'static) {
    RT.with(|rt| {
        let id = *rt.next_id.borrow();
        *rt.next_id.borrow_mut() += 1;
        rt.callbacks.borrow_mut().insert(id, Box::new(cb));
        let evt_sender = rt.evt_sender.clone();
        thread::spawn(move || {
            thread::sleep(std::time::Duration::from_millis(ms));
            evt_sender.send(id).unwrap();
        });
    });
}

impl Runtime {
    fn new() -> Self {
        let (evt_sender, evt_receiver) = channel();
        Runtime {
            callbacks: RefCell::new(HashMap::new()),
            next_id: RefCell::new(1),
            evt_sender,
            evt_receiver,
        }
    }

    fn run(&self, program: fn()) {
        program();
        for evt_id in &self.evt_receiver {
            let cb = self.callbacks.borrow_mut().remove(&evt_id).unwrap();
            cb();
            if self.callbacks.borrow().is_empty() {
                break;
            }
        }
    }
}
```
We're keeping this super simple, and you might wonder what the difference is
between this approach and the one using OS threads and passing in the callbacks
to the OS threads directly. The difference is that in this example the callbacks
are all run on the same thread. The OS threads we create are basically just used
as timers.
## From callbacks to promises
You might start to wonder by now, when are we going to talk about Futures?
Well, we're getting there. You see, `promises`, `futures` and other names for
deferred computations are often used interchangeably. There are formal
differences between them, but we won't cover those here. It's worth
explaining `promises` a bit though, since they're widely known from being used in
JavaScript, and they will serve as a segue to Rust's Futures.
First of all, many languages have a concept of promises, but I'll use the ones
from JavaScript in the examples below.
Promises are one way to deal with the complexity which comes with a callback
based approach.
Instead of:
```js, ignore
setTimeout(() => {
  setTimeout(() => {
    setTimeout(() => {
      console.log("I'm the last one");
    }, 50);
  }, 100);
}, 200);
```
We can do this:
```js, ignore
function timer(ms) {
return new Promise((resolve) => setTimeout(resolve, ms))
}
timer(200)
  .then(() => timer(100))
  .then(() => timer(50))
  .then(() => console.log("I'm the last one"));
```
The change is even more substantial under the hood. You see, promises return
a state machine which can be in one of three states: `pending`, `fulfilled` or
`rejected`. So when we call `timer(200)` in the sample above, we get back a
promise in the state `pending`.
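If we were to sketch that state machine in Rust (purely as an illustration, since this is not how any promise library is actually implemented, and the type names are made up), it could look something like this:
```rust
// A made-up illustration of the three states a promise can be in.
#[allow(dead_code)]
enum Promise<T, E> {
    Pending,
    Fulfilled(T),
    Rejected(E),
}

fn main() {
    // Calling `timer(200)` hands us back something in the `Pending` state.
    let state: Promise<(), String> = Promise::Pending;
    match state {
        Promise::Pending => println!("still waiting..."),
        Promise::Fulfilled(_) => println!("done!"),
        Promise::Rejected(err) => println!("failed: {}", err),
    }
}
```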
Since promises are re-written as state machines, they also enable an even better
syntax which lets us write our last example like this:
```js, ignore
async function run() {
await timer(200);
await timer(100);
await timer(50);
console.log("I'm the last one");
}
```
You can consider the `run` function a _pausable_ task consisting of several
sub-tasks. On each "await" point it yields control to the scheduler (in this
case it's the well-known JavaScript event loop). Once one of the sub-tasks changes
state to either `fulfilled` or `rejected` the task is scheduled to continue to
the next step.
Syntactically, Rust's Futures 1.0 were a lot like the promises example above, and
Rust's Futures 3.0 are a lot like async/await in our last example.
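Just to show how close the surface syntax is, a rough sketch of our last example using Rust's async/await could look like this (assuming a hypothetical `timer` function that returns a future, like the timer futures provided by async runtimes):
```rust, ignore
async fn run() {
    timer(200).await;
    timer(100).await;
    timer(50).await;
    println!("I'm the last one");
}
```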
Now, this is also where the similarities with Rust's Futures stop. The reason we
go through all this is to get an introduction and get into the right mindset for
exploring Rust's Futures.
> To avoid confusion later on: There is one difference you should know. JavaScript
> promises are _eagerly_ evaluated. That means that once a promise is created, it starts
> running its task. Rust's Futures, on the other hand, are _lazily_ evaluated. They
> need to be polled once before they do any work. You'll see this in a moment.
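A tiny sketch of that laziness (this one assumes the `futures` crate for its `block_on` executor, which is just one of several ways to poll a future to completion):
```rust, ignore
async fn hello() {
    println!("I only run when I'm polled!");
}

fn main() {
    // Nothing is printed here; the future is lazy and hasn't been polled yet.
    let future = hello();

    // Only when an executor polls it does the body actually run.
    futures::executor::block_on(future);
}
```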
<br />
<div style="text-align: center; padding-top: 2em;">
<a href="/1_futures_in_rust.html" style="background: red; color: white; padding:2em 2em 2em 2em; font-size: 1.2em;"><strong>PANIC BUTTON (next chapter)</strong></a>
</div>


@@ -1,18 +1,12 @@
# Some background information
# Futures in Rust
> **Relevant for:**
> **Overview:**
>
> - High level introduction to concurrency in Rust
> - Knowing what Rust provides and not when working with async code
> - Understanding why we need runtimes
> - Understanding why we need a runtime-library in Rust
> - Getting pointers to further reading on concurrency in general
Before we start implementing our `Futures`, we'll go through some background
information that will help demystify some of the concepts we encounter.
Actually, after going through these concepts, implementing futures will seem
pretty simple. I promise.
## Futures
So what is a future?
@@ -107,7 +101,7 @@ The difference between Rust and other languages is that you have to make an
active choice when it comes to picking a runtime. Most often, in other languages
you'll just use the one provided for you.
An async runtime can be divided into two parts:
**An async runtime can be divided into two parts:**
1. The Executor
2. The Reactor


@@ -1,6 +1,6 @@
# Waker and Context
> **Relevant for:**
> **Overview:**
>
> - Understanding how the Waker object is constructed
> - Learning how the runtime knows when a leaf-future can resume
@@ -101,7 +101,7 @@ object from these parts:
and try to run it. If you want to go back, press the undo symbol. Keep an eye
out for these as we go forward. Many examples will be editable.
```rust, editable
```rust
// A reference to a trait object is a fat pointer: (data_ptr, vtable_ptr)
trait Test {
fn add(&self) -> i32;
@@ -160,7 +160,6 @@ fn main() {
println!("Sub: 3 - 2 = {}", test.sub());
println!("Mul: 3 * 2 = {}", test.mul());
}
```
Now that you know this you also know why and how we implement the `Waker` type


@@ -1,10 +1,10 @@
# Generators
>**Relevant for:**
>**Overview:**
>
>- Understanding how the async/await syntax works since it's how `await` is implemented
>- Knowing why we need `Pin`
>- Understanding why Rusts async model is very efficient
>- Understand how the async/await syntax works since it's how `await` is implemented
>- Know why we need `Pin`
>- Understand why Rust's async model is very efficient
>
>The motivation for `Generators` can be found in [RFC#2033][rfc2033]. It's very
>well written and I can recommend reading through it (it talks as much about
@@ -22,21 +22,9 @@ handle concurrency:
2. Using combinators.
3. Stackless coroutines, better known as generators.
### Stackful coroutines/green threads
I've written about green threads before. Go check out
[Green Threads Explained in 200 lines of Rust][greenthreads] if you're interested.
Green threads uses the same mechanism as an OS does by creating a thread for
each task, setting up a stack, save the CPU's state and jump from one
task(thread) to another by doing a "context switch".
We yield control to the scheduler (which is a central part of the runtime in
such a system) which then continues running a different task.
Rust had green threads once, but they were removed before it hit 1.0. The state
of execution is stored in each stack so in such a solution there would be no need
for `async`, `await`, `Futures` or `Pin`. All this would be implementation details for the library.
We covered [green threads in the background information](0_background_information.md#green-threads)
so we won't repeat that here. We'll concentrate on the variants of stackless
coroutines which Rust uses today.
### Combinators
@@ -93,10 +81,14 @@ async fn myfn() {
}
```
Generators in Rust are implemented as state machines. The memory footprint of a
chain of computations is only defined by the largest footprint of any single
step require. That means that adding steps to a chain of computations might not
require any increased memory at all.
Async in Rust is implemented using Generators. So to understand how Async really
works we need to understand generators first. Generators in Rust are implemented
as state machines. The memory footprint of a chain of computations is only
defined by the footprint of the largest single step.
That means that adding steps to a chain of computations might not require any
increased memory at all, and it's one of the reasons why Futures and Async in
Rust have very little overhead.
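One way to build a rough intuition for this (a simplified sketch, not actual compiler output) is to remember that an enum is only as big as its largest variant plus a discriminant:
```rust
// A made-up state machine. Its size is dominated by the largest state,
// so adding more small states barely changes the total footprint.
#[allow(dead_code)]
enum StateMachine {
    Waiting { buf: [u8; 64] },
    Processing { count: u8 },
    Done,
}

fn main() {
    // Roughly 64 bytes for the largest variant plus a discriminant.
    println!("{} bytes", std::mem::size_of::<StateMachine>());
}
```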
## How generators work
@@ -104,7 +96,6 @@ In Nightly Rust today you can use the `yield` keyword. Basically using this
keyword in a closure, converts it to a generator. A closure could look like this
before we had a concept of `Pin`:
```rust,noplaypen,ignore
#![feature(generators, generator_trait)]
use std::ops::{Generator, GeneratorState};
@@ -176,19 +167,17 @@ impl Generator for GeneratorA {
match std::mem::replace(&mut *self, GeneratorA::Exit) {
GeneratorA::Enter(a1) => {
/*|---code before yield---|*/
/*|*/ println!("Hello"); /*|*/
/*|*/ let a = a1 * 2; /*|*/
/*|------------------------|*/
/*----code before yield----*/
println!("Hello");
let a = a1 * 2;
*self = GeneratorA::Yield1(a);
GeneratorState::Yielded(a)
}
GeneratorA::Yield1(_) => {
/*|----code after yield----|*/
/*|*/ println!("world!"); /*|*/
/*|-------------------------|*/
GeneratorA::Yield1(_) => {
/*-----code after yield-----*/
println!("world!");
*self = GeneratorA::Exit;
GeneratorState::Complete(())
@@ -218,7 +207,7 @@ Instead of discussing it in theory, let's look at some code.
> in depth explanation see [Tyler Mandry's excellent article: How Rust optimizes async/await][optimizing-await]
```rust,noplaypen,ignore
let mut gen = move || {
let mut generator = move || {
let to_borrow = String::from("Hello");
let borrowed = &to_borrow;
yield borrowed.len();
@@ -226,15 +215,28 @@ let mut gen = move || {
};
```
We'll be hand-coding some versions of a state machine representing the
generator defined above.
We step through each step "manually" in every example, so it looks pretty
unfamiliar. We could add some syntactic sugar like implementing the `Iterator`
trait for our generators which would let us do this:
```rust, ignore
for val in generator {
println!("{}", val);
}
```
It's a pretty trivial change to make, but this chapter is already getting long.
Just keep this in the back of your head as we move forward.
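If you're curious, a minimal sketch of that sugar could look something like the following, assuming the simple (non-pinned) `Generator` trait we define in the examples below:
```rust, ignore
struct GenIter<G: Generator>(G);

impl<G: Generator> Iterator for GenIter<G> {
    type Item = G::Yield;

    fn next(&mut self) -> Option<Self::Item> {
        // Resume the generator and stop iterating once it completes.
        match self.0.resume() {
            GeneratorState::Yielded(val) => Some(val),
            GeneratorState::Complete(_) => None,
        }
    }
}
```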
Now what does our rewritten state machine look like with this example?
```rust,compile_fail
# // If you've ever wondered why the parameters are called Y and R the naming from
# // the original rfc most likely holds the answer
# enum GeneratorState<Y, R> {
# // originally called `CoResult`
# Yielded(Y), // originally called `Yield(Y)`
# Complete(R), // originally called `Return(R)`
# Yielded(Y),
# Complete(R),
# }
#
# trait Generator {
@@ -266,7 +268,7 @@ impl Generator for GeneratorA {
match std::mem::replace(&mut *self, GeneratorA::Exit) {
GeneratorA::Enter => {
let to_borrow = String::from("Hello");
let borrowed = &to_borrow;
let borrowed = &to_borrow; // <--- NB!
let res = borrowed.len();
*self = GeneratorA::Yield1 {to_borrow, borrowed};
@@ -298,31 +300,10 @@ into itself.
As you'll notice, this compiles just fine!
```rust,editable
pub fn main() {
let mut gen = GeneratorA::start();
let mut gen2 = GeneratorA::start();
if let GeneratorState::Yielded(n) = gen.resume() {
println!("Got value {}", n);
}
// If you uncomment this, very bad things can happen. This is why we need `Pin`
// std::mem::swap(&mut gen, &mut gen2);
if let GeneratorState::Yielded(n) = gen2.resume() {
println!("Got value {}", n);
}
// if you uncomment `mem::swap`.. this should now start gen2.
if let GeneratorState::Complete(()) = gen.resume() {
()
};
}
```rust, ignore
enum GeneratorState<Y, R> {
Yielded(Y), // originally called `Yield(Y)`
Complete(R), // originally called `Return(R)`
Yielded(Y),
Complete(R),
}
trait Generator {
@@ -335,7 +316,7 @@ enum GeneratorA {
Enter,
Yield1 {
to_borrow: String,
borrowed: *const String, // Normally you'll see `std::ptr::NonNull` used instead of *ptr
borrowed: *const String,
},
Exit,
}
@@ -349,20 +330,18 @@ impl Generator for GeneratorA {
type Yield = usize;
type Return = ();
fn resume(&mut self) -> GeneratorState<Self::Yield, Self::Return> {
// lets us get ownership over current state
match self {
GeneratorA::Enter => {
let to_borrow = String::from("Hello");
let borrowed = &to_borrow;
let res = borrowed.len();
// Trick to actually get a self reference
*self = GeneratorA::Yield1 {to_borrow, borrowed: std::ptr::null()};
match self {
GeneratorA::Yield1{to_borrow, borrowed} => *borrowed = to_borrow,
_ => unreachable!(),
};
// We set the self-reference here
if let GeneratorA::Yield1 {to_borrow, borrowed} = self {
*borrowed = to_borrow;
}
GeneratorState::Yielded(res)
}
@@ -378,139 +357,186 @@ impl Generator for GeneratorA {
}
```
> Try to uncomment the line with `mem::swap` and see the results.
Remember that our example is the generator we created which looked like this:
While the example above compiles just fine, we expose consumers of this API
to both possible undefined behavior and other memory errors while using just safe
Rust. This is a big problem!
But now, let's prevent this problem using `Pin`. We'll discuss
`Pin` more in the next chapter, but you'll get an introduction here by just
reading the comments.
```rust,editable
#![feature(optin_builtin_traits)] // needed to implement `!Unpin`
use std::pin::Pin;
pub fn main() {
let gen1 = GeneratorA::start();
let gen2 = GeneratorA::start();
// Before we pin the pointers, this is safe to do
// std::mem::swap(&mut gen, &mut gen2);
// constructing a `Pin::new()` on a type which does not implement `Unpin` is unsafe.
// However, as you'll see in the start of the next chapter value pinned to
// heap can be constructed while staying in safe Rust so we can use
// that to avoid unsafe. You can also use crates like `pin_utils` to do
// this safely, just remember that they use unsafe under the hood so it's
// like using an already-reviewed unsafe implementation.
let mut pinned1 = Box::pin(gen1);
let mut pinned2 = Box::pin(gen2);
// Uncomment these if you think it's safe to pin the values to the stack instead
// (it is in this case). Remember to comment out the two previous lines first.
//let mut pinned1 = unsafe { Pin::new_unchecked(&mut gen1) };
//let mut pinned2 = unsafe { Pin::new_unchecked(&mut gen2) };
if let GeneratorState::Yielded(n) = pinned1.as_mut().resume() {
println!("Gen1 got value {}", n);
}
if let GeneratorState::Yielded(n) = pinned2.as_mut().resume() {
println!("Gen2 got value {}", n);
```rust,noplaypen,ignore
let mut gen = move || {
let to_borrow = String::from("Hello");
let borrowed = &to_borrow;
yield borrowed.len();
println!("{} world!", borrowed);
};
// This won't work
// std::mem::swap(&mut gen, &mut gen2);
// This will work but will just swap the pointers. Nothing inherently bad happens here.
// std::mem::swap(&mut pinned1, &mut pinned2);
let _ = pinned1.as_mut().resume();
let _ = pinned2.as_mut().resume();
}
enum GeneratorState<Y, R> {
// originally called `CoResult`
Yielded(Y), // originally called `Yield(Y)`
Complete(R), // originally called `Return(R)`
}
trait Generator {
type Yield;
type Return;
fn resume(self: Pin<&mut Self>) -> GeneratorState<Self::Yield, Self::Return>;
}
enum GeneratorA {
Enter,
Yield1 {
to_borrow: String,
borrowed: *const String, // Normally you'll see `std::ptr::NonNull` used instead of *ptr
},
Exit,
}
impl GeneratorA {
fn start() -> Self {
GeneratorA::Enter
}
}
// This tells us that the underlying pointer is not safe to move after pinning. In this case,
// only we as implementors "feel" this, however, if someone is relying on our Pinned pointer
// this will prevent them from moving it. You need to enable the feature flag
// `#![feature(optin_builtin_traits)]` and use the nightly compiler to implement `!Unpin`.
// Normally, you would use `std::marker::PhantomPinned` to indicate that the
// struct is `!Unpin`.
impl !Unpin for GeneratorA { }
impl Generator for GeneratorA {
type Yield = usize;
type Return = ();
fn resume(self: Pin<&mut Self>) -> GeneratorState<Self::Yield, Self::Return> {
// lets us get a mutable reference to the current state
let this = unsafe { self.get_unchecked_mut() };
match this {
GeneratorA::Enter => {
let to_borrow = String::from("Hello");
let borrowed = &to_borrow;
let res = borrowed.len();
// Trick to actually get a self reference. We can't reference
// the `String` earlier since these references will point to the
// location in this stack frame which will not be valid anymore
// when this function returns.
*this = GeneratorA::Yield1 {to_borrow, borrowed: std::ptr::null()};
match this {
GeneratorA::Yield1{to_borrow, borrowed} => *borrowed = to_borrow,
_ => unreachable!(),
};
GeneratorState::Yielded(res)
}
GeneratorA::Yield1 {borrowed, ..} => {
let borrowed: &String = unsafe {&**borrowed};
println!("{} world", borrowed);
*this = GeneratorA::Exit;
GeneratorState::Complete(())
}
GeneratorA::Exit => panic!("Can't advance an exited generator!"),
}
}
}
```
Now, as you see, the consumer of this API must either:
Below is an example of how we could run this state-machine. But there is still
one huge problem with this:
```rust
pub fn main() {
let mut gen = GeneratorA::start();
let mut gen2 = GeneratorA::start();
if let GeneratorState::Yielded(n) = gen.resume() {
println!("Got value {}", n);
}
if let GeneratorState::Yielded(n) = gen2.resume() {
println!("Got value {}", n);
}
if let GeneratorState::Complete(()) = gen.resume() {
()
};
}
# enum GeneratorState<Y, R> {
# Yielded(Y),
# Complete(R),
# }
#
# trait Generator {
# type Yield;
# type Return;
# fn resume(&mut self) -> GeneratorState<Self::Yield, Self::Return>;
# }
#
# enum GeneratorA {
# Enter,
# Yield1 {
# to_borrow: String,
# borrowed: *const String,
# },
# Exit,
# }
#
# impl GeneratorA {
# fn start() -> Self {
# GeneratorA::Enter
# }
# }
# impl Generator for GeneratorA {
# type Yield = usize;
# type Return = ();
# fn resume(&mut self) -> GeneratorState<Self::Yield, Self::Return> {
# match self {
# GeneratorA::Enter => {
# let to_borrow = String::from("Hello");
# let borrowed = &to_borrow;
# let res = borrowed.len();
# *self = GeneratorA::Yield1 {to_borrow, borrowed: std::ptr::null()};
#
# // We set the self-reference here
# if let GeneratorA::Yield1 {to_borrow, borrowed} = self {
# *borrowed = to_borrow;
# }
#
# GeneratorState::Yielded(res)
# }
#
# GeneratorA::Yield1 {borrowed, ..} => {
# let borrowed: &String = unsafe {&**borrowed};
# println!("{} world", borrowed);
# *self = GeneratorA::Exit;
# GeneratorState::Complete(())
# }
# GeneratorA::Exit => panic!("Can't advance an exited generator!"),
# }
# }
# }
```
The problem however is that in safe Rust we can still do this:
_Run the code and compare the results. Do you see the problem?_
```rust
pub fn main() {
let mut gen = GeneratorA::start();
let mut gen2 = GeneratorA::start();
if let GeneratorState::Yielded(n) = gen.resume() {
println!("Got value {}", n);
}
std::mem::swap(&mut gen, &mut gen2); // <--- Big problem!
if let GeneratorState::Yielded(n) = gen2.resume() {
println!("Got value {}", n);
}
// This would now start gen2 since we swapped them.
if let GeneratorState::Complete(()) = gen.resume() {
()
};
}
# enum GeneratorState<Y, R> {
# Yielded(Y),
# Complete(R),
# }
#
# trait Generator {
# type Yield;
# type Return;
# fn resume(&mut self) -> GeneratorState<Self::Yield, Self::Return>;
# }
#
# enum GeneratorA {
# Enter,
# Yield1 {
# to_borrow: String,
# borrowed: *const String,
# },
# Exit,
# }
#
# impl GeneratorA {
# fn start() -> Self {
# GeneratorA::Enter
# }
# }
# impl Generator for GeneratorA {
# type Yield = usize;
# type Return = ();
# fn resume(&mut self) -> GeneratorState<Self::Yield, Self::Return> {
# match self {
# GeneratorA::Enter => {
# let to_borrow = String::from("Hello");
# let borrowed = &to_borrow;
# let res = borrowed.len();
# *self = GeneratorA::Yield1 {to_borrow, borrowed: std::ptr::null()};
#
# // We set the self-reference here
# if let GeneratorA::Yield1 {to_borrow, borrowed} = self {
# *borrowed = to_borrow;
# }
#
# GeneratorState::Yielded(res)
# }
#
# GeneratorA::Yield1 {borrowed, ..} => {
# let borrowed: &String = unsafe {&**borrowed};
# println!("{} world", borrowed);
# *self = GeneratorA::Exit;
# GeneratorState::Complete(())
# }
# GeneratorA::Exit => panic!("Can't advance an exited generator!"),
# }
# }
# }
```
Wait? What happened to "Hello"?
Turns out that while the example above compiles
just fine, we expose consumers of this API to both possible undefined
behavior and other memory errors while using just safe Rust. This is a big
problem!
We'll explain exactly what happened using a slightly simpler example in the next
chapter and we'll fix our generator using `Pin`, so join me as we explore
the last topic before we implement our main Futures example.
1. Box the value and thereby allocating it on the heap
2. Use `unsafe` and pin the value to the stack. The user knows that if they move
the value afterwards it will violate the guarantee they promise to uphold when
they did their unsafe implementation.
Hopefully, after this you'll have an idea of what happens when you use the
`yield` or `await` keywords inside an async function, and why we need `Pin` if
we want to be able to safely borrow across `yield/await` points.
## Bonus section - self referential generators in Rust today


@@ -1,98 +1,63 @@
# Pin
> **Relevant for**
> **Overview**
>
> 1. Understanding `Generators` and `Futures`
> 2. Knowing how to use `Pin` is required when implementing your own `Future`
> 3. Understanding how to make self-referential types safe to use in Rust
> 4. Learning how borrowing across `await` points is accomplished
> 1. Learn how to use `Pin` and why it's required when implementing your own `Future`
> 2. Understand how to make self-referential types safe to use in Rust
> 3. Learn how borrowing across `await` points is accomplished
> 4. Get a set of practical rules to help you work with `Pin`
>
> `Pin` was suggested in [RFC#2349][rfc2349]
We already got a brief introduction of `Pin` in the previous chapters, so we'll
start off without any further introduction.
Let's jump strait to some definitions and then create 10 rules to remember when
we work with `Pin`.
Let's jump straight to it. Pinning is one of those subjects which is hard to wrap
your head around at first, but once you unlock a mental model for it,
it gets significantly easier to reason about.
## Definitions
Pin is only relevant for pointers. A reference to an object is a pointer.
Pin consists of the `Pin` type and the `Unpin` marker. Pin's purpose in life is
to govern the rules that need to apply for types which implement `!Unpin`.
Pin is only relevant for pointers. A reference to an object is a pointer.
Yep, you're right, that's double negation right there. `!Unpin` means
"not-un-pin".
_This naming scheme is Rust deliberately testing if you're too tired to safely implement a type with this marker. If you're starting to get confused by
`!Unpin` it's a good sign that it's time to lay down the work and start over
tomorrow with a fresh mind._
> _This naming scheme is one of Rust's safety features where it deliberately
> tests if you're too tired to safely implement a type with this marker. If
> you're starting to get confused, or even angry, by `!Unpin` it's a good sign
> that it's time to lay down the work and start over tomorrow with a fresh mind._
> On a more serious note, I feel obliged to mention that there are valid reasons for the names
> that were chosen. If you want to you can read a bit of the discussion from the
> [internals thread][internals_unpin]. One of the best takeaways from there in my eyes
> is this quote from `tmandry`:
>
> _Think of taking a thumbtack out of a cork board so you can tweak how a flyer looks. For Unpin types, this unpinning is directly supported by the type; you can do this implicitly. You can even swap out the object with another before you put the pin back. For other types, you must be much more careful._
On a more serious note, I feel obliged to mention that there are valid reasons
for the names that were chosen. Naming is not easy, and I considered renaming
`Unpin` and `!Unpin` in this book to make them easier to reason about.
However, an experienced member of the Rust community convinced me that there
are just too many nuances and edge-cases to consider, which are easily overlooked when
naively giving these markers different names, and I'm convinced that we'll
just have to get used to them and use them as they are.
For the next paragraph we'll rename these markers to:
If you want to you can read a bit of the discussion from the
[internals thread][internals_unpin]. One of the best takeaways from there in my
eyes is this quote from `tmandry`:
> `!Unpin` = `MustStay` and `Unpin` = `CanMove`
>_Think of taking a thumbtack out of a cork board so you can tweak how a flyer
looks. For Unpin types, this unpinning is directly supported by the type; you
can do this implicitly. You can even swap out the object with another before you
put the pin back. For other types, you must be much more careful._
It just makes it much easier to talk about them.
## Pinning and self-referential structs
## Rules to remember
Let's start where we left off in the last chapter by taking the problem we
saw with the self-referential struct in our generator and making it a lot
simpler, using some self-referential structs that are easier to reason about
than our state machines:
1. If `T: CanMove` (which is the default), then `Pin<'a, T>` is entirely equivalent to `&'a mut T`. in other words: `CanMove` means it's OK for this type to be moved even when pinned, so `Pin` will have no effect on such a type.
For now our example will look like this:
2. Getting a `&mut T` to a pinned pointer requires unsafe if `T: MustStay`. In other words: requiring a pinned pointer to a type which is `MustStay` prevents the _user_ of that API from moving that value unless it choses to write `unsafe` code.
3. Pinning does nothing special with memory allocation like putting it into some "read only" memory or anything fancy. It only tells the compiler that some operations on this value should be forbidden.
4. Most standard library types implement `CanMove`. The same goes for most
"normal" types you encounter in Rust. `Futures` and `Generators` are two
exceptions.
5. The main use case for `Pin` is to allow self referential types, the whole
justification for stabilizing them was to allow that. There are still corner
cases in the API which are being explored.
6. The implementation behind objects that are `MustStay` is most likely unsafe.
Moving such a type can cause the universe to crash. As of the time of writing
this book, creating and reading fields of a self referential struct still requires `unsafe`.
7. You can add a `MustStay` bound on a type on nightly with a feature flag, or
by adding `std::marker::PhantomPinned` to your type on stable.
8. You can either pin a value to memory on the stack or on the heap.
9. Pinning a `MustStay` pointer to the stack requires `unsafe`
10. Pinning a `MustStay` pointer to the heap does not require `unsafe`. There is a shortcut for doing this using `Box::pin`.
> Unsafe code does not mean it's literally "unsafe", it only relieves the
> guarantees you normally get from the compiler. An `unsafe` implementation can
> be perfectly safe to do, but you have no safety net.
Let's take a look at an example:
```rust,editable
```rust, ignore
use std::pin::Pin;
fn main() {
let mut test1 = Test::new("test1");
test1.init();
let mut test2 = Test::new("test2");
test2.init();
println!("a: {}, b: {}", test1.a(), test1.b());
std::mem::swap(&mut test1, &mut test2); // try commenting out this line
println!("a: {}, b: {}", test2.a(), test2.b());
}
#[derive(Debug)]
struct Test {
a: String,
@@ -107,7 +72,7 @@ impl Test {
b: std::ptr::null(),
}
}
fn init(&mut self) {
let self_ref: *const String = &self.a;
self.b = self_ref;
@@ -133,26 +98,177 @@ possible.
`a` and `b`. Since `b` is a reference to `a` we store it as a pointer since
the borrowing rules of Rust don't allow us to define this lifetime.
Now, let's use this example to explain the problem we encounter in detail. As
you see, this works as expected:
```rust
fn main() {
let mut test1 = Test::new("test1");
test1.init();
let mut test2 = Test::new("test2");
test2.init();
println!("a: {}, b: {}", test1.a(), test1.b());
println!("a: {}, b: {}", test2.a(), test2.b());
}
# use std::pin::Pin;
# #[derive(Debug)]
# struct Test {
# a: String,
# b: *const String,
# }
#
# impl Test {
# fn new(txt: &str) -> Self {
# let a = String::from(txt);
# Test {
# a,
# b: std::ptr::null(),
# }
# }
#
# // We need an `init` method to actually set our self-reference
# fn init(&mut self) {
# let self_ref: *const String = &self.a;
# self.b = self_ref;
# }
#
# fn a(&self) -> &str {
# &self.a
# }
#
# fn b(&self) -> &String {
# unsafe {&*(self.b)}
# }
# }
```
In our main method we first instantiate two instances of `Test` and print out
the value of the fields on `test1`. We get:
the value of the fields on `test1`. We get what we'd expect:
```rust, ignore
a: test1, b: test1
a: test2, b: test2
```
Let's see what happens if we swap the data stored at the memory location
which `test1` is pointing to with the data stored at the memory location
`test2` is pointing to and vice a versa.
Next, we swap the data stored at the memory location which `test1` is pointing to
with the data stored at the memory location `test2` is pointing to, and vice versa.
```rust
fn main() {
let mut test1 = Test::new("test1");
test1.init();
let mut test2 = Test::new("test2");
test2.init();
We'd expect that printing the fields of `test2` displays the same as
`test1` (since the object we printed before the swap has moved there now).
println!("a: {}, b: {}", test1.a(), test1.b());
std::mem::swap(&mut test1, &mut test2);
println!("a: {}, b: {}", test2.a(), test2.b());
}
# use std::pin::Pin;
# #[derive(Debug)]
# struct Test {
# a: String,
# b: *const String,
# }
#
# impl Test {
# fn new(txt: &str) -> Self {
# let a = String::from(txt);
# Test {
# a,
# b: std::ptr::null(),
# }
# }
#
# fn init(&mut self) {
# let self_ref: *const String = &self.a;
# self.b = self_ref;
# }
#
# fn a(&self) -> &str {
# &self.a
# }
#
# fn b(&self) -> &String {
# unsafe {&*(self.b)}
# }
# }
```
Naively, we could think that we should get a debug print of `test1` two
times, like this:
```rust, ignore
a: test1, b: test1
a: test1, b: test1
```
But instead we get:
```rust, ignore
a: test1, b: test1
a: test1, b: test2
```
The pointer to `b` still points to the old location. That location is now
occupied with the string "test2". This can be a bit hard to visualize so I made
a figure that i hope can help.
The pointer to `test2.b` still points to the old location which is inside `test1`
now. The struct is not self-referential anymore; it holds a pointer to a field
in a different object. That means we can't rely on the lifetime of `test2.b` to
be tied to the lifetime of `test2` anymore.
If you're still not convinced, this should at least convince you:
```rust
fn main() {
let mut test1 = Test::new("test1");
test1.init();
let mut test2 = Test::new("test2");
test2.init();
println!("a: {}, b: {}", test1.a(), test1.b());
std::mem::swap(&mut test1, &mut test2);
test1.a = "I've totally changed now!".to_string();
println!("a: {}, b: {}", test2.a(), test2.b());
}
# use std::pin::Pin;
# #[derive(Debug)]
# struct Test {
# a: String,
# b: *const String,
# }
#
# impl Test {
# fn new(txt: &str) -> Self {
# let a = String::from(txt);
# Test {
# a,
# b: std::ptr::null(),
# }
# }
#
# fn init(&mut self) {
# let self_ref: *const String = &self.a;
# self.b = self_ref;
# }
#
# fn a(&self) -> &str {
# &self.a
# }
#
# fn b(&self) -> &String {
# unsafe {&*(self.b)}
# }
# }
```
That shouldn't happen. There is no serious error yet, but as you can imagine
it's easy to create serious bugs using this code.
I created a diagram to help visualize what's going on:
**Fig 1: Before and after swap**
![swap_problem](./assets/swap_problem.jpg)
@@ -160,9 +276,12 @@ a figure that i hope can help.
As you can see this results in unwanted behavior. It's easy to get this to
segfault, show UB and fail in other spectacular ways as well.
If we change the example to using `Pin` instead:
## Pinning to the stack
```rust,editable
Now, we can solve this problem by using `Pin` instead. Let's take a look at what
our example would look like then:
```rust, ignore
use std::pin::Pin;
use std::marker::PhantomPinned;
@@ -197,7 +316,18 @@ impl Test {
unsafe { &*(self.b) }
}
}
```
Now, what we've done here is pinning a stack address. That will always be
`unsafe` if our type implements `!Unpin`.
We use the same tricks here, including requiring an `init`. If we want to fix that
and let users avoid `unsafe` we need to pin our data on the heap instead which
we'll show in a second.
Let's see what happens if we run our example now:
```rust
pub fn main() {
let mut test1 = Test::new("test1");
test1.init();
@@ -206,36 +336,108 @@ pub fn main() {
test2.init();
let mut test2_pin = unsafe { Pin::new_unchecked(&mut test2) };
println!(
"a: {}, b: {}",
Test::a(test1_pin.as_ref()),
Test::b(test1_pin.as_ref())
);
// Try to uncomment this and see what happens
// std::mem::swap(test1_pin.as_mut(), test2_pin.as_mut());
println!(
"a: {}, b: {}",
Test::a(test2_pin.as_ref()),
Test::b(test2_pin.as_ref())
);
println!("a: {}, b: {}", Test::a(test1_pin.as_ref()), Test::b(test1_pin.as_ref()));
println!("a: {}, b: {}", Test::a(test2_pin.as_ref()), Test::b(test2_pin.as_ref()));
}
# use std::pin::Pin;
# use std::marker::PhantomPinned;
#
# #[derive(Debug)]
# struct Test {
# a: String,
# b: *const String,
# _marker: PhantomPinned,
# }
#
#
# impl Test {
# fn new(txt: &str) -> Self {
# let a = String::from(txt);
# Test {
# a,
# b: std::ptr::null(),
# // This makes our type `!Unpin`
# _marker: PhantomPinned,
# }
# }
# fn init(&mut self) {
# let self_ptr: *const String = &self.a;
# self.b = self_ptr;
# }
#
# fn a<'a>(self: Pin<&'a Self>) -> &'a str {
# &self.get_ref().a
# }
#
# fn b<'a>(self: Pin<&'a Self>) -> &'a String {
# unsafe { &*(self.b) }
# }
# }
```
Now, what we've done here is pinning a stack address. That will always be
`unsafe` if our type implements `!Unpin`.
Now, if we try to pull the same trick which got us into trouble the last time,
we get a compilation error.
We use some tricks here, including requiring an `init`. If we want to fix that
and let users avoid `unsafe` we need to pin our data on the heap instead.
```rust, compile_fail
pub fn main() {
let mut test1 = Test::new("test1");
test1.init();
let mut test1_pin = unsafe { Pin::new_unchecked(&mut test1) };
let mut test2 = Test::new("test2");
test2.init();
let mut test2_pin = unsafe { Pin::new_unchecked(&mut test2) };
> Stack pinning will always depend on the current stack frame we're in, so we
can't create a self referential object in one stack frame and return it since
any pointers we take to "self" is invalidated.
println!("a: {}, b: {}", Test::a(test1_pin.as_ref()), Test::b(test1_pin.as_ref()));
std::mem::swap(test1_pin.as_mut(), test2_pin.as_mut());
println!("a: {}, b: {}", Test::a(test2_pin.as_ref()), Test::b(test2_pin.as_ref()));
}
# use std::pin::Pin;
# use std::marker::PhantomPinned;
#
# #[derive(Debug)]
# struct Test {
# a: String,
# b: *const String,
# _marker: PhantomPinned,
# }
#
#
# impl Test {
# fn new(txt: &str) -> Self {
# let a = String::from(txt);
# Test {
# a,
# b: std::ptr::null(),
# // This makes our type `!Unpin`
# _marker: PhantomPinned,
# }
# }
# fn init(&mut self) {
# let self_ptr: *const String = &self.a;
# self.b = self_ptr;
# }
#
# fn a<'a>(self: Pin<&'a Self>) -> &'a str {
# &self.get_ref().a
# }
#
# fn b<'a>(self: Pin<&'a Self>) -> &'a String {
# unsafe { &*(self.b) }
# }
# }
```
The next example solves some of our friction at the cost of a heap allocation.
> It's important to note that stack pinning will always depend on the current
> stack frame we're in, so we can't create a self referential object in one
> stack frame and return it since any pointers we take to "self" are invalidated.
```rust, editable
## Pinning to the heap
For completeness let's remove some unsafe and the need for an `init` method
at the cost of a heap allocation. Pinning to the heap is safe so the user
doesn't need to implement any unsafe code:
```rust
use std::pin::Pin;
use std::marker::PhantomPinned;
@@ -275,9 +477,6 @@ pub fn main() {
let mut test2 = Test::new("test2");
println!("a: {}, b: {}",test1.as_ref().a(), test1.as_ref().b());
// Try to uncomment this and see what happens
// std::mem::swap(&mut test1, &mut test2);
println!("a: {}, b: {}",test2.as_ref().a(), test2.as_ref().b());
}
```
@@ -291,6 +490,47 @@ that the self-referential pointer stays valid.
There are ways to safely give some guarantees on stack pinning as well, but right
now you need to use a crate like [pin_project][pin_project] to do that.
## Practical rules for Pinning
1. If `T: Unpin` (which is the default), then `Pin<'a, T>` is entirely
equivalent to `&'a mut T`. In other words: `Unpin` means it's OK for this type
to be moved even when pinned, so `Pin` will have no effect on such a type.
2. Getting a `&mut T` to a pinned pointer requires unsafe if `T: !Unpin`. In
other words: requiring a pinned pointer to a type which is `!Unpin` prevents
the _user_ of that API from moving that value unless it chooses to write `unsafe`
code.
3. Pinning does nothing special with memory allocation like putting it into some
"read only" memory or anything fancy. It only tells the compiler that some
operations on this value should be forbidden.
4. Most standard library types implement `Unpin`. The same goes for most
"normal" types you encounter in Rust. `Futures` and `Generators` are two
exceptions.
5. The main use case for `Pin` is to allow self referential types, the whole
justification for stabilizing them was to allow that. There are still corner
cases in the API which are being explored.
6. The implementation behind objects that are `!Unpin` is most likely unsafe.
Moving such a type can cause the universe to crash. As of the time of writing
this book, creating and reading fields of a self referential struct still requires `unsafe`.
7. You can add a `!Unpin` bound on a type on nightly with a feature flag, or
by adding `std::marker::PhantomPinned` to your type on stable.
8. You can either pin a value to memory on the stack or on the heap.
9. Pinning a `!Unpin` pointer to the stack requires `unsafe`
10. Pinning a `!Unpin` pointer to the heap does not require `unsafe`. There is a shortcut for doing this using `Box::pin`.
> Unsafe code does not mean it's literally "unsafe", it only relieves the
> guarantees you normally get from the compiler. An `unsafe` implementation can
> be perfectly safe to do, but you have no safety net.
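To make rules 7, 9 and 10 a bit more concrete, here is a small sketch (the type and field names are made up just for this illustration):
```rust
use std::marker::PhantomPinned;
use std::pin::Pin;

struct NotUnpin {
    data: u8,
    // Rule 7: `PhantomPinned` makes the type `!Unpin` on stable Rust.
    _marker: PhantomPinned,
}

fn main() {
    // Rule 10: pinning to the heap is safe. `Box::pin` does it for us.
    let heap_pinned: Pin<Box<NotUnpin>> = Box::pin(NotUnpin { data: 1, _marker: PhantomPinned });

    // Rule 9: pinning a `!Unpin` value to the stack requires `unsafe`, since
    // we must promise never to move the value again afterwards.
    let mut on_stack = NotUnpin { data: 2, _marker: PhantomPinned };
    let stack_pinned: Pin<&mut NotUnpin> = unsafe { Pin::new_unchecked(&mut on_stack) };

    println!("{} {}", heap_pinned.data, stack_pinned.data);
}
```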
### Projection/structural pinning
In short, projection is a programming language term. `mystruct.field1` is a
@@ -311,4 +551,133 @@ we're soon finished.
[rfc2349]: https://github.com/rust-lang/rfcs/blob/master/text/2349-pin.md
[pin_project]: https://docs.rs/pin-project/
[internals_unpin]: https://internals.rust-lang.org/t/naming-pin-anchor-move/6864/12
[internals_unpin]: https://internals.rust-lang.org/t/naming-pin-anchor-move/6864/12
## Bonus section: Fixing our self-referential generator and learning more about Pin
But now, let's prevent this problem using `Pin`. We've already discussed
`Pin` in this chapter, so you'll get a good idea of what happens here by just
reading the comments.
```rust
#![feature(optin_builtin_traits)] // needed to implement `!Unpin`
use std::pin::Pin;
pub fn main() {
let gen1 = GeneratorA::start();
let gen2 = GeneratorA::start();
// Before we pin the pointers, this is safe to do
// std::mem::swap(&mut gen, &mut gen2);
// constructing a `Pin::new()` on a type which does not implement `Unpin` is
// unsafe. A value pinned to heap can be constructed while staying in safe
// Rust so we can use that to avoid unsafe. You can also use crates like
// `pin_utils` to pin to the stack safely, just remember that they use
// unsafe under the hood so it's like using an already-reviewed unsafe
// implementation.
let mut pinned1 = Box::pin(gen1);
let mut pinned2 = Box::pin(gen2);
// Uncomment these if you think it's safe to pin the values to the stack instead
// (it is in this case). Remember to comment out the two previous lines first.
//let mut pinned1 = unsafe { Pin::new_unchecked(&mut gen1) };
//let mut pinned2 = unsafe { Pin::new_unchecked(&mut gen2) };
if let GeneratorState::Yielded(n) = pinned1.as_mut().resume() {
println!("Gen1 got value {}", n);
}
if let GeneratorState::Yielded(n) = pinned2.as_mut().resume() {
println!("Gen2 got value {}", n);
};
// This won't work:
// std::mem::swap(&mut gen, &mut gen2);
// This will work but will just swap the pointers so nothing bad happens here:
// std::mem::swap(&mut pinned1, &mut pinned2);
let _ = pinned1.as_mut().resume();
let _ = pinned2.as_mut().resume();
}
enum GeneratorState<Y, R> {
Yielded(Y),
Complete(R),
}
trait Generator {
type Yield;
type Return;
fn resume(self: Pin<&mut Self>) -> GeneratorState<Self::Yield, Self::Return>;
}
enum GeneratorA {
Enter,
Yield1 {
to_borrow: String,
borrowed: *const String,
},
Exit,
}
impl GeneratorA {
fn start() -> Self {
GeneratorA::Enter
}
}
// This tells us that the underlying pointer is not safe to move after pinning.
// In this case, only we as implementors "feel" this, however, if someone is
// relying on our Pinned pointer this will prevent them from moving it. You need
// to enable the feature flag `#![feature(optin_builtin_traits)]` and use the
// nightly compiler to implement `!Unpin`. Normally, you would use
// `std::marker::PhantomPinned` to indicate that the struct is `!Unpin`.
impl !Unpin for GeneratorA { }
impl Generator for GeneratorA {
type Yield = usize;
type Return = ();
fn resume(self: Pin<&mut Self>) -> GeneratorState<Self::Yield, Self::Return> {
// lets us get a mutable reference to the current state
let this = unsafe { self.get_unchecked_mut() };
match this {
GeneratorA::Enter => {
let to_borrow = String::from("Hello");
let borrowed = &to_borrow;
let res = borrowed.len();
*this = GeneratorA::Yield1 {to_borrow, borrowed: std::ptr::null()};
// Trick to actually get a self reference. We can't reference
// the `String` earlier since these references will point to the
// location in this stack frame which will not be valid anymore
// when this function returns.
if let GeneratorA::Yield1 {to_borrow, borrowed} = this {
*borrowed = to_borrow;
}
GeneratorState::Yielded(res)
}
GeneratorA::Yield1 {borrowed, ..} => {
let borrowed: &String = unsafe {&**borrowed};
println!("{} world", borrowed);
*this = GeneratorA::Exit;
GeneratorState::Complete(())
}
GeneratorA::Exit => panic!("Can't advance an exited generator!"),
}
}
}
```
Now, as you see, the consumer of this API must either:
1. Box the value and thereby allocate it on the heap
2. Use `unsafe` and pin the value to the stack. The user knows that if they move
the value afterwards it will violate the guarantee they promised to uphold when
they wrote their unsafe implementation.
Hopefully, after this you'll have an idea of what happens when you use the
`yield` or `await` keywords inside an async function, and why we need `Pin` if
we want to be able to safely borrow across `yield/await` points.


@@ -2,7 +2,8 @@
[Introduction](./introduction.md)
- [Some background information](./1_background_information.md)
- [Background information](./0_background_information.md)
- [Futures in Rust](./1_futures_in_rust.md)
- [Waker and Context](./2_waker_context.md)
- [Generators](./3_generators_pin.md)
- [Pin](./4_pin.md)