version3

2020-04-05 17:09:05 +02:00
parent 21db34c022
commit 9b2401b8a4
15 changed files with 2091 additions and 968 deletions
--- a/src/4_pin.md
+++ b/src/4_pin.md
@@ -1,98 +1,63 @@
 # Pin

-> **Relevant for**
+> **Overview**
 >
-> 1. Understanding `Generators` and `Futures`
-> 2. Knowing how to use `Pin` is required when implementing your own `Future`
-> 3. Understanding how to make self-referential types safe to use in Rust
-> 4. Learning how borrowing across `await` points is accomplished
+> 1. Learn how to use `Pin` and why it's required when implementing your own `Future`
+> 2. Understand how to make self-referential types safe to use in Rust
+> 3. Learn how borrowing across `await` points is accomplished
+> 4. Get a set of practical rules to help you work with `Pin`
 >
 > `Pin` was suggested in [RFC#2349][rfc2349]

-We already got a brief introduction of `Pin` in the previous chapters, so we'll
-start off without any further introduction.
-
-Let's jump strait to some definitions and then create 10 rules to remember when
-we work with `Pin`.
+Let's jump strait to it. Pinning is one of those subjects which is hard to wrap
+your head around in the start, but once you unlock a mental model for it
+it gets significantly easier to reason about.

 ## Definitions

+Pin is only relevant for pointers. A reference to an object is a pointer.
+
 Pin consists of the `Pin` type and the `Unpin` marker. Pin's purpose in life is
 to govern the rules that need to apply for types which implement `!Unpin`.

-Pin is only relevant for pointers. A reference to an object is a pointer.
-
 Yep, you're right, that's double negation right there. `!Unpin` means
 "not-un-pin".

-_This naming scheme is Rust deliberately testing if you're too tired to safely implement a type with this marker. If you're starting to get confused by
-`!Unpin` it's a good sign that it's time to lay down the work and start over
-tomorrow with a fresh mind._
+> _This naming scheme is one of Rusts safety features where it deliberately
+> tests if you're too tired to safely implement a type with this marker. If
+> you're starting to get confused, or even angry, by `!Unpin` it's a good sign
+> that it's time to lay down the work and start over tomorrow with a fresh mind._

-> On a more serious note, I feel obliged to mention that there are valid reasons for the names
-> that were chosen. If you want to you can read a bit of the discussion from the
-> [internals thread][internals_unpin]. One of the best takeaways from there in my eyes
-> is this quote from `tmandry`:
->
-> _Think of taking a thumbtack out of a cork board so you can tweak how a flyer looks. For Unpin types, this unpinning is directly supported by the type; you can do this implicitly. You can even swap out the object with another before you put the pin back. For other types, you must be much more careful._
+On a more serious note, I feel obliged to mention that there are valid reasons
+for the names that were chosen. Naming is not easy, and I considered renaming
+`Unpin` and `!Unpin` in this book to make them easier to reason about. 

+However, an experienced member of the Rust community convinced me that that there
+is just too many nuances and edge-cases to consider which is easily overlooked when
+naively giving these markers different names, and I'm convinced that we'll
+just have to get used to them and use them as is.

-For the next paragraph we'll rename these markers to:
+If you want to you can read a bit of the discussion from the
+[internals thread][internals_unpin]. One of the best takeaways from there in my
+eyes is this quote from `tmandry`:

-> `!Unpin` = `MustStay` and `Unpin` = `CanMove`
+>_Think of taking a thumbtack out of a cork board so you can tweak how a flyer
+looks. For Unpin types, this unpinning is directly supported by the type; you
+can do this implicitly. You can even swap out the object with another before you
+put the pin back. For other types, you must be much more careful._

-It just makes it much easier to talk about them.
+## Pinning and self-referential structs

-## Rules to remember
+Let's start where we left off in the last chapter by making the problem we
+saw using a self-referential struct in our generator a lot simpler by making
+some self-referential structs that are easier to reason about than our
+state machines:

-1. If `T: CanMove` (which is the default), then `Pin<'a, T>` is entirely equivalent to `&'a mut T`. in other words: `CanMove` means it's OK for this type to be moved even when pinned, so `Pin` will have no effect on such a type.
+For now our example will look like this:

-2. Getting a `&mut T` to a pinned pointer requires unsafe if `T: MustStay`. In other words: requiring a pinned pointer to a type which is `MustStay` prevents the _user_ of that API from moving that value unless it choses to write `unsafe` code.
-
-3. Pinning does nothing special with memory allocation like putting it into some "read only" memory or anything fancy. It only tells the compiler that some operations on this value should be forbidden.
-
-4. Most standard library types implement `CanMove`. The same goes for most
-"normal" types you encounter in Rust. `Futures` and `Generators` are two
-exceptions.
-
-5. The main use case for `Pin` is to allow self referential types, the whole
-justification for stabilizing them was to allow that. There are still corner
-cases in the API which are being explored.
-
-6. The implementation behind objects that are `MustStay` is most likely unsafe.
-Moving such a type can cause the universe to crash. As of the time of writing
-this book, creating and reading fields of a self referential struct still requires `unsafe`.
-
-7. You can add a `MustStay` bound on a type on nightly with a feature flag, or
-by adding `std::marker::PhantomPinned` to your type on stable.
-
-8. You can either pin a value to memory on the stack or on the heap.
-
-9. Pinning a `MustStay` pointer to the stack requires `unsafe`
-
-10. Pinning a `MustStay` pointer to the heap does not require `unsafe`. There is a shortcut for doing this using `Box::pin`.
-
-> Unsafe code does not mean it's literally "unsafe", it only relieves the 
-> guarantees you normally get from the compiler. An `unsafe` implementation can 
-> be perfectly safe to do, but you have no safety net.
-
-Let's take a look at an example:
-
-```rust,editable
+```rust, ignore
 use std::pin::Pin;

-fn main() {
-    let mut test1 = Test::new("test1");
-    test1.init();
-    let mut test2 = Test::new("test2");
-    test2.init();
-
-    println!("a: {}, b: {}", test1.a(), test1.b());
-    std::mem::swap(&mut test1, &mut test2); // try commenting out this line
-    println!("a: {}, b: {}", test2.a(), test2.b());
-
-}
-
 #[derive(Debug)]
 struct Test {
    a: String,
@@ -107,7 +72,7 @@ impl Test {
            b: std::ptr::null(),
        }
    }
-    
+
    fn init(&mut self) {
        let self_ref: *const String = &self.a;
        self.b = self_ref;
@@ -133,26 +98,177 @@ possible.
 `a` and `b`. Since `b` is a reference to `a` we store it as a pointer since
 the borrowing rules of Rust doesn't allow us to define this lifetime.

+Now, let's use this example to explain the problem we encounter in detail. As
+you see, this works as expected:
+
+```rust
+fn main() {
+    let mut test1 = Test::new("test1");
+    test1.init();
+    let mut test2 = Test::new("test2");
+    test2.init();
+
+    println!("a: {}, b: {}", test1.a(), test1.b());
+    println!("a: {}, b: {}", test2.a(), test2.b());
+
+}
+# use std::pin::Pin;
+# #[derive(Debug)]
+# struct Test {
+#     a: String,
+#     b: *const String,
+# }
+# 
+# impl Test {
+#     fn new(txt: &str) -> Self {
+#         let a = String::from(txt);
+#         Test {
+#             a,
+#             b: std::ptr::null(),
+#         }
+#     }
+# 
+#     // We need an `init` method to actually set our self-reference
+#     fn init(&mut self) {
+#         let self_ref: *const String = &self.a;
+#         self.b = self_ref;
+#     }
+#     
+#     fn a(&self) -> &str {
+#         &self.a
+#     } 
+#     
+#     fn b(&self) -> &String {
+#         unsafe {&*(self.b)}
+#     }
+# }
+```
+
 In our main method we first instantiate two instances of `Test` and print out
-the value of the fields on `test1`. We get:
+the value of the fields on `test1`. We get what we'd expect:

 ```rust, ignore
 a: test1, b: test1
+a: test2, b: test2
 ```

+Let's see what happens if we swap the data stored at the memory location
+which `test1` is pointing to with the data stored at the memory location
+`test2` is pointing to and vice a versa.

-Next we swap the data stored at the memory location which `test1` is pointing to
-with the data stored at the memory location `test2` is pointing to and vice a versa.
+```rust
+fn main() {
+    let mut test1 = Test::new("test1");
+    test1.init();
+    let mut test2 = Test::new("test2");
+    test2.init();

-We should expect that printing the fields of `test2` should display the same as
-`test1` (since the object we printed before the swap has moved there now).
+    println!("a: {}, b: {}", test1.a(), test1.b());
+    std::mem::swap(&mut test1, &mut test2);
+    println!("a: {}, b: {}", test2.a(), test2.b());
+
+}
+# use std::pin::Pin;
+# #[derive(Debug)]
+# struct Test {
+#     a: String,
+#     b: *const String,
+# }
+# 
+# impl Test {
+#     fn new(txt: &str) -> Self {
+#         let a = String::from(txt);
+#         Test {
+#             a,
+#             b: std::ptr::null(),
+#         }
+#     }
+# 
+#     fn init(&mut self) {
+#         let self_ref: *const String = &self.a;
+#         self.b = self_ref;
+#     }
+#     
+#     fn a(&self) -> &str {
+#         &self.a
+#     } 
+#     
+#     fn b(&self) -> &String {
+#         unsafe {&*(self.b)}
+#     }
+# }
+```
+
+Naively, we could think that what we should get a debug print of `test1` two
+times like this

 ```rust, ignore
+a: test1, b: test1
+a: test1, b: test1
+```
+
+But instead we get:
+
+```rust, ignore
+a: test1, b: test1
 a: test1, b: test2
 ```
-The pointer to `b` still points to the old location. That location is now
-occupied with the string "test2". This can be a bit hard to visualize so I made
-a figure that i hope can help.
+
+The pointer to `test2.b` still points to the old location which is inside `test1`
+now. The struct is not self-referential anymore, it holds a pointer to a field
+in a different object. That means we can't rely on the lifetime of `test2.b` to
+be tied to the lifetime of `test2` anymore.
+
+If your still not convinced, this should at least convince you:
+
+```rust
+fn main() {
+    let mut test1 = Test::new("test1");
+    test1.init();
+    let mut test2 = Test::new("test2");
+    test2.init();
+
+    println!("a: {}, b: {}", test1.a(), test1.b());
+    std::mem::swap(&mut test1, &mut test2);
+    test1.a = "I've totally changed now!".to_string();
+    println!("a: {}, b: {}", test2.a(), test2.b());
+
+}
+# use std::pin::Pin;
+# #[derive(Debug)]
+# struct Test {
+#     a: String,
+#     b: *const String,
+# }
+# 
+# impl Test {
+#     fn new(txt: &str) -> Self {
+#         let a = String::from(txt);
+#         Test {
+#             a,
+#             b: std::ptr::null(),
+#         }
+#     }
+# 
+#     fn init(&mut self) {
+#         let self_ref: *const String = &self.a;
+#         self.b = self_ref;
+#     }
+#     
+#     fn a(&self) -> &str {
+#         &self.a
+#     } 
+#     
+#     fn b(&self) -> &String {
+#         unsafe {&*(self.b)}
+#     }
+# }
+```
+
+That shouldn't happen. There is no serious error yet, but as you can imagine
+it's easy to create serious bugs using this code.
+
+I created a diagram to help visualize what's going on:

 **Fig 1: Before and after swap**
 ![swap_problem](./assets/swap_problem.jpg)
@@ -160,9 +276,12 @@ a figure that i hope can help.
 As you can see this results in unwanted behavior. It's easy to get this to 
 segfault, show UB and fail in other spectacular ways as well.

-If we change the example to using `Pin` instead:
+## Pinning to the stack

-```rust,editable
+Now, we can solve this problem by using `Pin` instead. Let's take a look at what
+our example would look like then:
+
+```rust, ignore
 use std::pin::Pin;
 use std::marker::PhantomPinned;

@@ -197,7 +316,18 @@ impl Test {
        unsafe { &*(self.b) }
    }
 }
+```

+Now, what we've done here is pinning a stack address. That will always be
+`unsafe` if our type implements `!Unpin`.
+
+We use the same tricks here, including requiring an `init`. If we want to fix that
+and let users avoid `unsafe` we need to pin our data on the heap instead which
+we'll show in a second.
+
+Let's see what happens if we run our example now:
+
+```rust
 pub fn main() {
    let mut test1 = Test::new("test1");
    test1.init();
@@ -206,36 +336,108 @@ pub fn main() {
    test2.init();
    let mut test2_pin = unsafe { Pin::new_unchecked(&mut test2) };

-    println!(
-        "a: {}, b: {}",
-        Test::a(test1_pin.as_ref()),
-        Test::b(test1_pin.as_ref())
-    );
-
-    // Try to uncomment this and see what happens
-    // std::mem::swap(test1_pin.as_mut(), test2_pin.as_mut());
-    println!(
-        "a: {}, b: {}",
-        Test::a(test2_pin.as_ref()),
-        Test::b(test2_pin.as_ref())
-    );
+    println!("a: {}, b: {}", Test::a(test1_pin.as_ref()), Test::b(test1_pin.as_ref()));
+    println!("a: {}, b: {}", Test::a(test2_pin.as_ref()), Test::b(test2_pin.as_ref()));
 }
-
+# use std::pin::Pin;
+# use std::marker::PhantomPinned;
+# 
+# #[derive(Debug)]
+# struct Test {
+#     a: String,
+#     b: *const String,
+#     _marker: PhantomPinned,
+# }
+# 
+# 
+# impl Test {
+#     fn new(txt: &str) -> Self {
+#         let a = String::from(txt);
+#         Test {
+#             a,
+#             b: std::ptr::null(),
+#             // This makes our type `!Unpin`
+#             _marker: PhantomPinned,
+#         }
+#     }
+#     fn init(&mut self) {
+#         let self_ptr: *const String = &self.a;
+#         self.b = self_ptr;
+#     }
+# 
+#     fn a<'a>(self: Pin<&'a Self>) -> &'a str {
+#         &self.get_ref().a
+#     }
+# 
+#     fn b<'a>(self: Pin<&'a Self>) -> &'a String {
+#         unsafe { &*(self.b) }
+#     }
+# }
 ```

-Now, what we've done here is pinning a stack address. That will always be
-`unsafe` if our type implements `!Unpin` (aka `MustStay`).
+Now, if we try to pull the same trick which got us in to trouble the last time
+you'll get a compilation error. So t

-We use some tricks here, including requiring an `init`. If we want to fix that
-and let users avoid `unsafe` we need to pin our data on the heap instead.
+```rust, compile_fail
+pub fn main() {
+    let mut test1 = Test::new("test1");
+    test1.init();
+    let mut test1_pin = unsafe { Pin::new_unchecked(&mut test1) }; 
+    let mut test2 = Test::new("test2");
+    test2.init();
+    let mut test2_pin = unsafe { Pin::new_unchecked(&mut test2) };

-> Stack pinning will always depend on the current stack frame we're in, so we
-can't create a self referential object in one stack frame and return it since
-any pointers we take to "self" is invalidated.
+    println!("a: {}, b: {}", Test::a(test1_pin.as_ref()), Test::b(test1_pin.as_ref()));
+    std::mem::swap(test1_pin.as_mut(), test2_pin.as_mut());
+    println!("a: {}, b: {}", Test::a(test2_pin.as_ref()), Test::b(test2_pin.as_ref()));
+}
+# use std::pin::Pin;
+# use std::marker::PhantomPinned;
+# 
+# #[derive(Debug)]
+# struct Test {
+#     a: String,
+#     b: *const String,
+#     _marker: PhantomPinned,
+# }
+# 
+# 
+# impl Test {
+#     fn new(txt: &str) -> Self {
+#         let a = String::from(txt);
+#         Test {
+#             a,
+#             b: std::ptr::null(),
+#             // This makes our type `!Unpin`
+#             _marker: PhantomPinned,
+#         }
+#     }
+#     fn init(&mut self) {
+#         let self_ptr: *const String = &self.a;
+#         self.b = self_ptr;
+#     }
+# 
+#     fn a<'a>(self: Pin<&'a Self>) -> &'a str {
+#         &self.get_ref().a
+#     }
+# 
+#     fn b<'a>(self: Pin<&'a Self>) -> &'a String {
+#         unsafe { &*(self.b) }
+#     }
+# }
+```

-The next example solves some of our friction at the cost of a heap allocation.
+> It's important to note that stack pinning will always depend on the current
+> stack frame we're in, so we can't create a self referential object in one
+> stack frame and return it since any pointers we take to "self" is invalidated.

-```rust, editbable
+## Pinning to the heap
+
+For completeness let's remove some unsafe and the need for an `init` method
+at the cost of a heap allocation. Pinning to the heap is safe so the user
+doesn't need to implement any unsafe code:
+
+```rust
 use std::pin::Pin;
 use std::marker::PhantomPinned;

@@ -275,9 +477,6 @@ pub fn main() {
    let mut test2 = Test::new("test2");

    println!("a: {}, b: {}",test1.as_ref().a(), test1.as_ref().b());
-
-    // Try to uncomment this and see what happens
-    // std::mem::swap(&mut test1, &mut test2);
    println!("a: {}, b: {}",test2.as_ref().a(), test2.as_ref().b());
 }
 ```
@@ -291,6 +490,47 @@ that the self-referential pointer stays valid.
 There are ways to safely give some guarantees on stack pinning as well, but right
 now you need to use a crate like [pin_project][pin_project] to do that.

+## Practical rules for Pinning
+
+1. If `T: Unpin` (which is the default), then `Pin<'a, T>` is entirely
+equivalent to `&'a mut T`. in other words: `Unpin` means it's OK for this type
+to be moved even when pinned, so `Pin` will have no effect on such a type.
+
+2. Getting a `&mut T` to a pinned pointer requires unsafe if `T: !Unpin`. In
+other words: requiring a pinned pointer to a type which is `!Unpin` prevents
+the _user_ of that API from moving that value unless it choses to write `unsafe`
+code.
+
+3. Pinning does nothing special with memory allocation like putting it into some
+"read only" memory or anything fancy. It only tells the compiler that some
+operations on this value should be forbidden.
+
+4. Most standard library types implement `Unpin`. The same goes for most
+"normal" types you encounter in Rust. `Futures` and `Generators` are two
+exceptions.
+
+5. The main use case for `Pin` is to allow self referential types, the whole
+justification for stabilizing them was to allow that. There are still corner
+cases in the API which are being explored.
+
+6. The implementation behind objects that are `!Unpin` is most likely unsafe.
+Moving such a type can cause the universe to crash. As of the time of writing
+this book, creating and reading fields of a self referential struct still requires `unsafe`.
+
+7. You can add a `!Unpin` bound on a type on nightly with a feature flag, or
+by adding `std::marker::PhantomPinned` to your type on stable.
+
+8. You can either pin a value to memory on the stack or on the heap.
+
+9. Pinning a `!Unpin` pointer to the stack requires `unsafe`
+
+10. Pinning a `!Unpin` pointer to the heap does not require `unsafe`. There is a shortcut for doing this using `Box::pin`.
+
+> Unsafe code does not mean it's literally "unsafe", it only relieves the 
+> guarantees you normally get from the compiler. An `unsafe` implementation can 
+> be perfectly safe to do, but you have no safety net.
+
+
 ### Projection/structural pinning

 In short, projection is a programming language term. `mystruct.field1` is a
@@ -311,4 +551,133 @@ we're soon finished.

 [rfc2349]: https://github.com/rust-lang/rfcs/blob/master/text/2349-pin.md
 [pin_project]: https://docs.rs/pin-project/
-[internals_unpin]: https://internals.rust-lang.org/t/naming-pin-anchor-move/6864/12
+[internals_unpin]: https://internals.rust-lang.org/t/naming-pin-anchor-move/6864/12
+
+## Bonus section: Fixing our self-referential generator and learning more about Pin
+
+But now, let's prevent this problem using `Pin`. We'll discuss
+`Pin` more in the next chapter, but you'll get an introduction here by just
+reading the comments.
+
+```rust
+#![feature(optin_builtin_traits)] // needed to implement `!Unpin`
+use std::pin::Pin;
+
+pub fn main() {
+    let gen1 = GeneratorA::start();
+    let gen2 = GeneratorA::start();
+    // Before we pin the pointers, this is safe to do
+    // std::mem::swap(&mut gen, &mut gen2);
+
+    // constructing a `Pin::new()` on a type which does not implement `Unpin` is
+    // unsafe. A value pinned to heap can be constructed while staying in safe
+    // Rust so we can use that to avoid unsafe. You can also use crates like
+    // `pin_utils` to pin to the stack safely, just remember that they use
+    // unsafe under the hood so it's like using an already-reviewed unsafe
+    // implementation.
+
+    let mut pinned1 = Box::pin(gen1);
+    let mut pinned2 = Box::pin(gen2);
+
+    // Uncomment these if you think it's safe to pin the values to the stack instead 
+    // (it is in this case). Remember to comment out the two previous lines first.
+    //let mut pinned1 = unsafe { Pin::new_unchecked(&mut gen1) };
+    //let mut pinned2 = unsafe { Pin::new_unchecked(&mut gen2) };
+
+    if let GeneratorState::Yielded(n) = pinned1.as_mut().resume() {
+        println!("Gen1 got value {}", n);
+    }
+    
+    if let GeneratorState::Yielded(n) = pinned2.as_mut().resume() {
+        println!("Gen2 got value {}", n);
+    };
+
+    // This won't work:
+    // std::mem::swap(&mut gen, &mut gen2);
+    // This will work but will just swap the pointers so nothing bad happens here:
+    // std::mem::swap(&mut pinned1, &mut pinned2);
+
+    let _ = pinned1.as_mut().resume();
+    let _ = pinned2.as_mut().resume();
+}
+
+enum GeneratorState<Y, R> {
+    Yielded(Y),  
+    Complete(R), 
+}
+
+trait Generator {
+    type Yield;
+    type Return;
+    fn resume(self: Pin<&mut Self>) -> GeneratorState<Self::Yield, Self::Return>;
+}
+
+enum GeneratorA {
+    Enter,
+    Yield1 {
+        to_borrow: String,
+        borrowed: *const String,
+    },
+    Exit,
+}
+
+impl GeneratorA {
+    fn start() -> Self {
+        GeneratorA::Enter
+    }
+}
+
+// This tells us that the underlying pointer is not safe to move after pinning.
+// In this case, only we as implementors "feel" this, however, if someone is
+// relying on our Pinned pointer this will prevent them from moving it. You need
+// to enable the feature flag `#![feature(optin_builtin_traits)]` and use the
+// nightly compiler to implement `!Unpin`. Normally, you would use
+// `std::marker::PhantomPinned` to indicate that the struct is `!Unpin`.
+impl !Unpin for GeneratorA { }
+
+impl Generator for GeneratorA {
+    type Yield = usize;
+    type Return = ();
+    fn resume(self: Pin<&mut Self>) -> GeneratorState<Self::Yield, Self::Return> {
+        // lets us get ownership over current state
+        let this = unsafe { self.get_unchecked_mut() };
+            match this {
+            GeneratorA::Enter => {
+                let to_borrow = String::from("Hello");
+                let borrowed = &to_borrow;
+                let res = borrowed.len();
+                *this = GeneratorA::Yield1 {to_borrow, borrowed: std::ptr::null()};
+
+                // Trick to actually get a self reference. We can't reference
+                // the `String` earlier since these references will point to the
+                // location in this stack frame which will not be valid anymore
+                // when this function returns.
+                if let GeneratorA::Yield1 {to_borrow, borrowed} = self {
+                    *borrowed = to_borrow;
+                }
+
+                GeneratorState::Yielded(res)
+            }
+
+            GeneratorA::Yield1 {borrowed, ..} => {
+                let borrowed: &String = unsafe {&**borrowed};
+                println!("{} world", borrowed);
+                *this = GeneratorA::Exit;
+                GeneratorState::Complete(())
+            }
+            GeneratorA::Exit => panic!("Can't advance an exited generator!"),
+        }
+    }
+}
+```
+
+Now, as you see, the consumer of this API must either:
+
+1. Box the value and thereby allocating it on the heap
+2. Use `unsafe` and pin the value to the stack. The user knows that if they move
+the value afterwards it will violate the guarantee they promise to uphold when
+they did their unsafe implementation.
+
+Hopefully, after this you'll have an idea of what happens when you use the
+`yield` or `await` keywords inside an async function, and why we need `Pin` if
+we want to be able to safely borrow across `yield/await` points.