660 lines
20 KiB
Markdown
660 lines
20 KiB
Markdown
# Generators and async/await
|
|
|
|
>**Overview:**
|
|
>
|
|
>- Understand how the async/await syntax works under the hood
|
|
>- See first hand why we need `Pin`
|
|
>- Understand what makes Rust's async model very memory efficient
|
|
>
|
|
>The motivation for `Generator`s can be found in [RFC#2033][rfc2033]. It's very
|
|
>well written and I can recommend reading through it (it talks as much about
|
|
>async/await as it does about generators).
|
|
|
|
## Why learn about generators?
|
|
|
|
Generators/yield and async/await are so similar that once you understand one
|
|
you should be able to understand the other.
|
|
|
|
It's much easier for me to provide runnable and short examples using Generators
|
|
instead of Futures which require us to introduce a lot of concepts now that
|
|
we'll cover later just to show an example.
|
|
|
|
Async/await works like generators but instead of returning a generator it returns
|
|
a special object implementing the Future trait.
|
|
|
|
A small bonus is that you'll have a pretty good introduction to both Generators
|
|
and Async/Await by the end of this chapter.
|
|
|
|
Basically, there were three main options discussed when designing how Rust would
|
|
handle concurrency:
|
|
|
|
1. Stackful coroutines, better known as green threads.
|
|
2. Using combinators.
|
|
3. Stackless coroutines, better known as generators.
|
|
|
|
We covered [green threads in the background information](0_background_information.md#green-threads)
|
|
so we won't repeat that here. We'll concentrate on the variants of stackless
|
|
coroutines which Rust uses today.
|
|
|
|
### Combinators
|
|
|
|
`Futures 0.1` used combinators. If you've worked with Promises in JavaScript,
|
|
you already know combinators. In Rust they look like this:
|
|
|
|
```rust,noplaypen,ignore
|
|
let future = Connection::connect(conn_str).and_then(|conn| {
|
|
conn.query("somerequest").map(|row|{
|
|
SomeStruct::from(row)
|
|
}).collect::<Vec<SomeStruct>>()
|
|
});
|
|
|
|
let rows: Result<Vec<SomeStruct>, SomeLibraryError> = block_on(future);
|
|
|
|
```
|
|
|
|
**There are mainly three downsides I'll focus on using this technique:**
|
|
|
|
1. The error messages produced could be extremely long and arcane
|
|
2. Not optimal memory usage
|
|
3. Did not allow borrowing across combinator steps.
|
|
|
|
Point #3, is actually a major drawback with `Futures 0.1`.
|
|
|
|
Not allowing borrows across suspension points ends up being very
|
|
un-ergonomic and to accomplish some tasks it requires extra allocations or
|
|
copying which is inefficient.
|
|
|
|
The reason for the higher than optimal memory usage is that this is basically
|
|
a callback-based approach, where each closure stores all the data it needs
|
|
for computation. This means that as we chain these, the memory required to store
|
|
the needed state increases with each added step.
|
|
|
|
### Stackless coroutines/generators
|
|
|
|
This is the model used in Rust today. It has a few notable advantages:
|
|
|
|
1. It's easy to convert normal Rust code to a stackless coroutine using
|
|
async/await as keywords (it can even be done using a macro).
|
|
2. No need for context switching and saving/restoring CPU state
|
|
3. No need to handle dynamic stack allocation
|
|
4. Very memory efficient
|
|
5. Allows us to borrow across suspension points
|
|
|
|
The last point is in contrast to `Futures 0.1`. With async/await we can do this:
|
|
|
|
```rust, ignore
|
|
async fn myfn() {
|
|
let text = String::from("Hello world");
|
|
let borrowed = &text[0..5];
|
|
somefuture.await;
|
|
println!("{}", borrowed);
|
|
}
|
|
```
|
|
|
|
Async in Rust is implemented using Generators. So to understand how async really
|
|
works we need to understand generators first. Generators in Rust are implemented
|
|
as state machines.
|
|
|
|
The memory footprint of a chain of computations is defined by _the largest footprint
|
|
that a single step requires_.
|
|
|
|
That means that adding steps to a chain of computations might not require any
|
|
increased memory at all and it's one of the reasons why Futures and Async in
|
|
Rust has very little overhead.
|
|
|
|
## How generators work
|
|
|
|
In Nightly Rust today you can use the `yield` keyword. Basically using this
|
|
keyword in a closure, converts it to a generator. A closure could look like this
|
|
before we had a concept of `Pin`:
|
|
|
|
```rust,noplaypen,ignore
|
|
#![feature(generators, generator_trait)]
|
|
use std::ops::{Generator, GeneratorState};
|
|
|
|
fn main() {
|
|
let a: i32 = 4;
|
|
let mut gen = move || {
|
|
println!("Hello");
|
|
yield a * 2;
|
|
println!("world!");
|
|
};
|
|
|
|
if let GeneratorState::Yielded(n) = gen.resume() {
|
|
println!("Got value {}", n);
|
|
}
|
|
|
|
if let GeneratorState::Complete(()) = gen.resume() {
|
|
()
|
|
};
|
|
}
|
|
```
|
|
|
|
Early on, before there was a consensus about the design of `Pin`, this
|
|
compiled to something looking similar to this:
|
|
|
|
```rust
|
|
fn main() {
|
|
let mut gen = GeneratorA::start(4);
|
|
|
|
if let GeneratorState::Yielded(n) = gen.resume() {
|
|
println!("Got value {}", n);
|
|
}
|
|
|
|
if let GeneratorState::Complete(()) = gen.resume() {
|
|
()
|
|
};
|
|
}
|
|
|
|
// If you've ever wondered why the parameters are called Y and R the naming from
|
|
// the original rfc most likely holds the answer
|
|
enum GeneratorState<Y, R> {
|
|
Yielded(Y), // originally called `Yield(Y)`
|
|
Complete(R), // originally called `Return(R)`
|
|
}
|
|
|
|
trait Generator {
|
|
type Yield;
|
|
type Return;
|
|
fn resume(&mut self) -> GeneratorState<Self::Yield, Self::Return>;
|
|
}
|
|
|
|
enum GeneratorA {
|
|
Enter(i32),
|
|
Yield1(i32),
|
|
Exit,
|
|
}
|
|
|
|
impl GeneratorA {
|
|
fn start(a1: i32) -> Self {
|
|
GeneratorA::Enter(a1)
|
|
}
|
|
}
|
|
|
|
impl Generator for GeneratorA {
|
|
type Yield = i32;
|
|
type Return = ();
|
|
fn resume(&mut self) -> GeneratorState<Self::Yield, Self::Return> {
|
|
// lets us get ownership over current state
|
|
match std::mem::replace(self, GeneratorA::Exit) {
|
|
GeneratorA::Enter(a1) => {
|
|
|
|
/*----code before yield----*/
|
|
println!("Hello");
|
|
let a = a1 * 2;
|
|
|
|
*self = GeneratorA::Yield1(a);
|
|
GeneratorState::Yielded(a)
|
|
}
|
|
|
|
GeneratorA::Yield1(_) => {
|
|
/*-----code after yield-----*/
|
|
println!("world!");
|
|
|
|
*self = GeneratorA::Exit;
|
|
GeneratorState::Complete(())
|
|
}
|
|
GeneratorA::Exit => panic!("Can't advance an exited generator!"),
|
|
}
|
|
}
|
|
}
|
|
|
|
```
|
|
|
|
>The `yield` keyword was discussed first in [RFC#1823][rfc1823] and in [RFC#1832][rfc1832].
|
|
|
|
Now that you know that the `yield` keyword in reality rewrites your code to become a state machine,
|
|
you'll also know the basics of how `await` works. It's very similar.
|
|
|
|
Now, there are some limitations in our naive state machine above. What happens when you have a
|
|
`borrow` across a `yield` point?
|
|
|
|
We could forbid that, but **one of the major design goals for the async/await syntax has been
|
|
to allow this**. These kinds of borrows were not possible using `Futures 0.1` so we can't let this
|
|
limitation just slip and call it a day yet.
|
|
|
|
Instead of discussing it in theory, let's look at some code.
|
|
|
|
> We'll use the optimized version of the state machines which is used in Rust today. For a more
|
|
> in depth explanation see [Tyler Mandry's excellent article: How Rust optimizes async/await][optimizing-await]
|
|
|
|
```rust,noplaypen,ignore
|
|
let mut generator = move || {
|
|
let to_borrow = String::from("Hello");
|
|
let borrowed = &to_borrow;
|
|
yield borrowed.len();
|
|
println!("{} world!", borrowed);
|
|
};
|
|
```
|
|
|
|
We'll be hand-coding some versions of a state-machines representing a state
|
|
machine for the generator defined above.
|
|
|
|
We step through each step "manually" in every example, so it looks pretty
|
|
unfamiliar. We could add some syntactic sugar like implementing the `Iterator`
|
|
trait for our generators which would let us do this:
|
|
|
|
```rust, ignore
|
|
while let Some(val) = generator.next() {
|
|
println!("{}", val);
|
|
}
|
|
```
|
|
|
|
It's a pretty trivial change to make, but this chapter is already getting long.
|
|
Just keep this in the back of your head as we move forward.
|
|
|
|
Now what does our rewritten state machine look like with this example?
|
|
|
|
```rust,compile_fail
|
|
# enum GeneratorState<Y, R> {
|
|
# Yielded(Y),
|
|
# Complete(R),
|
|
# }
|
|
#
|
|
# trait Generator {
|
|
# type Yield;
|
|
# type Return;
|
|
# fn resume(&mut self) -> GeneratorState<Self::Yield, Self::Return>;
|
|
# }
|
|
|
|
enum GeneratorA {
|
|
Enter,
|
|
Yield1 {
|
|
to_borrow: String,
|
|
borrowed: &String, // uh, what lifetime should this have?
|
|
},
|
|
Exit,
|
|
}
|
|
|
|
# impl GeneratorA {
|
|
# fn start() -> Self {
|
|
# GeneratorA::Enter
|
|
# }
|
|
# }
|
|
|
|
impl Generator for GeneratorA {
|
|
type Yield = usize;
|
|
type Return = ();
|
|
fn resume(&mut self) -> GeneratorState<Self::Yield, Self::Return> {
|
|
// lets us get ownership over current state
|
|
match std::mem::replace(self, GeneratorA::Exit) {
|
|
GeneratorA::Enter => {
|
|
let to_borrow = String::from("Hello");
|
|
let borrowed = &to_borrow; // <--- NB!
|
|
let res = borrowed.len();
|
|
|
|
*self = GeneratorA::Yield1 {to_borrow, borrowed};
|
|
GeneratorState::Yielded(res)
|
|
}
|
|
|
|
GeneratorA::Yield1 {to_borrow, borrowed} => {
|
|
println!("Hello {}", borrowed);
|
|
*self = GeneratorA::Exit;
|
|
GeneratorState::Complete(())
|
|
}
|
|
GeneratorA::Exit => panic!("Can't advance an exited generator!"),
|
|
}
|
|
}
|
|
}
|
|
```
|
|
|
|
If you try to compile this you'll get an error (just try it yourself by pressing play).
|
|
|
|
What is the lifetime of `&String`. It's not the same as the lifetime of `Self`. It's not `static`.
|
|
Turns out that it's not possible for us in Rust's syntax to describe this lifetime, which means, that
|
|
to make this work, we'll have to let the compiler know that _we_ control this correctly ourselves.
|
|
|
|
That means turning to unsafe.
|
|
|
|
Let's try to write an implementation that will compile using `unsafe`. As you'll
|
|
see we end up in a _self-referential struct_. A struct which holds references
|
|
into itself.
|
|
|
|
As you'll notice, this compiles just fine!
|
|
|
|
```rust
|
|
enum GeneratorState<Y, R> {
|
|
Yielded(Y),
|
|
Complete(R),
|
|
}
|
|
|
|
trait Generator {
|
|
type Yield;
|
|
type Return;
|
|
fn resume(&mut self) -> GeneratorState<Self::Yield, Self::Return>;
|
|
}
|
|
|
|
enum GeneratorA {
|
|
Enter,
|
|
Yield1 {
|
|
to_borrow: String,
|
|
borrowed: *const String, // NB! This is now a raw pointer!
|
|
},
|
|
Exit,
|
|
}
|
|
|
|
impl GeneratorA {
|
|
fn start() -> Self {
|
|
GeneratorA::Enter
|
|
}
|
|
}
|
|
impl Generator for GeneratorA {
|
|
type Yield = usize;
|
|
type Return = ();
|
|
fn resume(&mut self) -> GeneratorState<Self::Yield, Self::Return> {
|
|
match self {
|
|
GeneratorA::Enter => {
|
|
let to_borrow = String::from("Hello");
|
|
let borrowed = &to_borrow;
|
|
let res = borrowed.len();
|
|
*self = GeneratorA::Yield1 {to_borrow, borrowed: std::ptr::null()};
|
|
|
|
// NB! And we set the pointer to reference the to_borrow string here
|
|
if let GeneratorA::Yield1 {to_borrow, borrowed} = self {
|
|
*borrowed = to_borrow;
|
|
}
|
|
|
|
GeneratorState::Yielded(res)
|
|
}
|
|
|
|
GeneratorA::Yield1 {borrowed, ..} => {
|
|
let borrowed: &String = unsafe {&**borrowed};
|
|
println!("{} world", borrowed);
|
|
*self = GeneratorA::Exit;
|
|
GeneratorState::Complete(())
|
|
}
|
|
GeneratorA::Exit => panic!("Can't advance an exited generator!"),
|
|
}
|
|
}
|
|
}
|
|
```
|
|
|
|
Remember that our example is the generator we created which looked like this:
|
|
|
|
```rust,noplaypen,ignore
|
|
let mut gen = move || {
|
|
let to_borrow = String::from("Hello");
|
|
let borrowed = &to_borrow;
|
|
yield borrowed.len();
|
|
println!("{} world!", borrowed);
|
|
};
|
|
```
|
|
|
|
Below is an example of how we could run this state-machine and as you see it
|
|
does what we'd expect. But there is still one huge problem with this:
|
|
|
|
```rust
|
|
pub fn main() {
|
|
let mut gen = GeneratorA::start();
|
|
let mut gen2 = GeneratorA::start();
|
|
|
|
if let GeneratorState::Yielded(n) = gen.resume() {
|
|
println!("Got value {}", n);
|
|
}
|
|
|
|
if let GeneratorState::Yielded(n) = gen2.resume() {
|
|
println!("Got value {}", n);
|
|
}
|
|
|
|
if let GeneratorState::Complete(()) = gen.resume() {
|
|
()
|
|
};
|
|
}
|
|
# enum GeneratorState<Y, R> {
|
|
# Yielded(Y),
|
|
# Complete(R),
|
|
# }
|
|
#
|
|
# trait Generator {
|
|
# type Yield;
|
|
# type Return;
|
|
# fn resume(&mut self) -> GeneratorState<Self::Yield, Self::Return>;
|
|
# }
|
|
#
|
|
# enum GeneratorA {
|
|
# Enter,
|
|
# Yield1 {
|
|
# to_borrow: String,
|
|
# borrowed: *const String,
|
|
# },
|
|
# Exit,
|
|
# }
|
|
#
|
|
# impl GeneratorA {
|
|
# fn start() -> Self {
|
|
# GeneratorA::Enter
|
|
# }
|
|
# }
|
|
# impl Generator for GeneratorA {
|
|
# type Yield = usize;
|
|
# type Return = ();
|
|
# fn resume(&mut self) -> GeneratorState<Self::Yield, Self::Return> {
|
|
# match self {
|
|
# GeneratorA::Enter => {
|
|
# let to_borrow = String::from("Hello");
|
|
# let borrowed = &to_borrow;
|
|
# let res = borrowed.len();
|
|
# *self = GeneratorA::Yield1 {to_borrow, borrowed: std::ptr::null()};
|
|
#
|
|
# // We set the self-reference here
|
|
# if let GeneratorA::Yield1 {to_borrow, borrowed} = self {
|
|
# *borrowed = to_borrow;
|
|
# }
|
|
#
|
|
# GeneratorState::Yielded(res)
|
|
# }
|
|
#
|
|
# GeneratorA::Yield1 {borrowed, ..} => {
|
|
# let borrowed: &String = unsafe {&**borrowed};
|
|
# println!("{} world", borrowed);
|
|
# *self = GeneratorA::Exit;
|
|
# GeneratorState::Complete(())
|
|
# }
|
|
# GeneratorA::Exit => panic!("Can't advance an exited generator!"),
|
|
# }
|
|
# }
|
|
# }
|
|
```
|
|
|
|
The problem is that in safe Rust we can still do this:
|
|
|
|
_Run the code and compare the results. Do you see the problem?_
|
|
|
|
```rust, should_panic
|
|
# #![feature(never_type)] // Force nightly compiler to be used in playground
|
|
# // by betting on it's true that this type is named after it's stabilization date...
|
|
pub fn main() {
|
|
let mut gen = GeneratorA::start();
|
|
let mut gen2 = GeneratorA::start();
|
|
|
|
if let GeneratorState::Yielded(n) = gen.resume() {
|
|
println!("Got value {}", n);
|
|
}
|
|
|
|
std::mem::swap(&mut gen, &mut gen2); // <--- Big problem!
|
|
|
|
if let GeneratorState::Yielded(n) = gen2.resume() {
|
|
println!("Got value {}", n);
|
|
}
|
|
|
|
// This would now start gen2 since we swapped them.
|
|
if let GeneratorState::Complete(()) = gen.resume() {
|
|
()
|
|
};
|
|
}
|
|
# enum GeneratorState<Y, R> {
|
|
# Yielded(Y),
|
|
# Complete(R),
|
|
# }
|
|
#
|
|
# trait Generator {
|
|
# type Yield;
|
|
# type Return;
|
|
# fn resume(&mut self) -> GeneratorState<Self::Yield, Self::Return>;
|
|
# }
|
|
#
|
|
# enum GeneratorA {
|
|
# Enter,
|
|
# Yield1 {
|
|
# to_borrow: String,
|
|
# borrowed: *const String,
|
|
# },
|
|
# Exit,
|
|
# }
|
|
#
|
|
# impl GeneratorA {
|
|
# fn start() -> Self {
|
|
# GeneratorA::Enter
|
|
# }
|
|
# }
|
|
# impl Generator for GeneratorA {
|
|
# type Yield = usize;
|
|
# type Return = ();
|
|
# fn resume(&mut self) -> GeneratorState<Self::Yield, Self::Return> {
|
|
# match self {
|
|
# GeneratorA::Enter => {
|
|
# let to_borrow = String::from("Hello");
|
|
# let borrowed = &to_borrow;
|
|
# let res = borrowed.len();
|
|
# *self = GeneratorA::Yield1 {to_borrow, borrowed: std::ptr::null()};
|
|
#
|
|
# // We set the self-reference here
|
|
# if let GeneratorA::Yield1 {to_borrow, borrowed} = self {
|
|
# *borrowed = to_borrow;
|
|
# }
|
|
#
|
|
# GeneratorState::Yielded(res)
|
|
# }
|
|
#
|
|
# GeneratorA::Yield1 {borrowed, ..} => {
|
|
# let borrowed: &String = unsafe {&**borrowed};
|
|
# println!("{} world", borrowed);
|
|
# *self = GeneratorA::Exit;
|
|
# GeneratorState::Complete(())
|
|
# }
|
|
# GeneratorA::Exit => panic!("Can't advance an exited generator!"),
|
|
# }
|
|
# }
|
|
# }
|
|
```
|
|
|
|
Wait? What happened to "Hello"? And why did our code segfault?
|
|
|
|
Turns out that while the example above compiles just fine, we expose consumers
|
|
of this this API to both possible undefined behavior and other memory errors
|
|
while using just safe Rust. This is a big problem!
|
|
|
|
> I've actually forced the code above to use the nightly version of the compiler.
|
|
> If you run [the example above on the playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=5cbe9897c0e23a502afd2740c7e78b98),
|
|
> you'll see that it runs without panicking on the current stable (1.42.0) but
|
|
> panics on the current nightly (1.44.0). Scary!
|
|
|
|
We'll explain exactly what happened here using a slightly simpler example in the next
|
|
chapter and we'll fix our generator using `Pin` so don't worry, you'll see exactly
|
|
what goes wrong and see how `Pin` can help us deal with self-referential types safely in a
|
|
second.
|
|
|
|
Before we go and explain the problem in detail, let's finish off this chapter
|
|
by looking at how generators and the async keyword is related.
|
|
|
|
## Async and generators
|
|
|
|
Futures in Rust are implemented as state machines much the same way Generators
|
|
are state machines.
|
|
|
|
You might have noticed the similarities in the syntax used in async blocks and
|
|
the syntax used in generators:
|
|
|
|
```rust, ignore
|
|
let mut gen = move || {
|
|
let to_borrow = String::from("Hello");
|
|
let borrowed = &to_borrow;
|
|
yield borrowed.len();
|
|
println!("{} world!", borrowed);
|
|
};
|
|
```
|
|
|
|
Compare that with a similar example using async blocks:
|
|
|
|
```rust, ignore
|
|
let mut fut = async {
|
|
let to_borrow = String::from("Hello");
|
|
let borrowed = &to_borrow;
|
|
SomeResource::some_task().await;
|
|
println!("{} world!", borrowed);
|
|
};
|
|
```
|
|
|
|
The difference is that Futures have different states than what a `Generator` would
|
|
have.
|
|
|
|
An async block will return a `Future` instead of a `Generator`, however, the way
|
|
a Future works and the way a Generator work internally is similar.
|
|
|
|
Instead of calling `Generator::resume` we call `Future::poll`, and instead of
|
|
returning `Yielded` or `Complete` it returns `Pending` or `Ready`. Each `await`
|
|
point in a future is like a `yield` point in a generator.
|
|
|
|
Do you see how they're connected now?
|
|
|
|
Thats why knowing how generators work and the challenges they pose also teaches
|
|
you how futures work and the challenges we need to tackle when working with them.
|
|
|
|
The same goes for the challenges of borrowing across yield/await points.
|
|
|
|
## Bonus section - self referential generators in Rust today
|
|
|
|
Thanks to [PR#45337][pr45337] you can actually run code like the one in our
|
|
example in Rust today using the `static` keyword on nightly. Try it for
|
|
yourself:
|
|
|
|
>Beware that the API is changing rapidly. As I was writing this book, generators
|
|
had an API change adding support for a "resume" argument to get passed into the
|
|
generator closure.
|
|
>
|
|
>Follow the progress on the [tracking issue #4312][issue43122] for [RFC#033][rfc2033].
|
|
|
|
```rust
|
|
#![feature(generators, generator_trait)]
|
|
use std::ops::{Generator, GeneratorState};
|
|
|
|
|
|
pub fn main() {
|
|
let gen1 = static || {
|
|
let to_borrow = String::from("Hello");
|
|
let borrowed = &to_borrow;
|
|
yield borrowed.len();
|
|
println!("{} world!", borrowed);
|
|
};
|
|
|
|
let gen2 = static || {
|
|
let to_borrow = String::from("Hello");
|
|
let borrowed = &to_borrow;
|
|
yield borrowed.len();
|
|
println!("{} world!", borrowed);
|
|
};
|
|
|
|
let mut pinned1 = Box::pin(gen1);
|
|
let mut pinned2 = Box::pin(gen2);
|
|
|
|
if let GeneratorState::Yielded(n) = pinned1.as_mut().resume(()) {
|
|
println!("Gen1 got value {}", n);
|
|
}
|
|
|
|
if let GeneratorState::Yielded(n) = pinned2.as_mut().resume(()) {
|
|
println!("Gen2 got value {}", n);
|
|
};
|
|
|
|
let _ = pinned1.as_mut().resume(());
|
|
let _ = pinned2.as_mut().resume(());
|
|
}
|
|
```
|
|
|
|
[rfc2033]: https://github.com/rust-lang/rfcs/blob/master/text/2033-experimental-coroutines.md
|
|
[greenthreads]: https://cfsamson.gitbook.io/green-threads-explained-in-200-lines-of-rust/
|
|
[rfc1823]: https://github.com/rust-lang/rfcs/pull/1823
|
|
[rfc1832]: https://github.com/rust-lang/rfcs/pull/1832
|
|
[optimizing-await]: https://tmandry.gitlab.io/blog/posts/optimizing-await-1/
|
|
[pr45337]: https://github.com/rust-lang/rust/pull/45337/files
|
|
[issue43122]: https://github.com/rust-lang/rust/issues/43122
|