Notes on Coffee, Books, and Programming

Distributed Computing (EECS 490, Lecture 24)

Apr 13, 2020

So, over the last few lectures we’ve covered the ideas of parallelism.

In particular, we’ve covered determinism: when the result of a program doesn’t depend on how we split it up, we can program in parallel efficiently and safely.

  • Obviously, we get this parallelism in a pure expression language, so Data Parallel Haskell, among others, enjoys this advantage
  • We also get it in an imperative language with ownership and unique borrowing, like Rust

In Rust, for example, the following would not be allowed:

use std::thread;

fn main() {
    let mut s = String::from("test");
    // Both closures try to mutably borrow `s` from the main thread.
    let handle = thread::spawn(|| {
        s.push_str("a");
    });
    let handle2 = thread::spawn(|| {
        s.push_str("b");
    });
    println!("{}!", s); // and the main thread reads `s` at the same time
}

Rust will complain about this, as each thread borrows s but may outlive it, and the two threads race with each other and with the final println!, which could make the program nondeterministic.

But we still have determinacy: any program Rust does accept is free of such races, as in the sketch below.
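
For contrast, here’s a minimal sketch of a version Rust does accept (my own example, not from lecture): ownership of s is moved into each thread in turn and handed back through the join handle, so no two threads ever touch s at once and the output is fully determinate.

use std::thread;

fn main() {
    let mut s = String::from("test");
    // Move `s` into the first thread; get it back when the thread finishes.
    let handle = thread::spawn(move || {
        s.push_str("a");
        s
    });
    let mut s = handle.join().unwrap();
    // Only now does the second thread get `s`, so there is no race.
    let handle2 = thread::spawn(move || {
        s.push_str("b");
        s
    });
    let s = handle2.join().unwrap();
    println!("{}!", s); // always prints "testab!"
}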

However, last lecture we covered concurrency, where different processes interact non-deterministically through side effects.

To reason about this concretely, we created a process calculus. This allowed us to reason about non-deterministic programs, even though the reasoning itself became harder.

However, in that calculus we had global signals, so every send or receive was heard by all parties. This leads to our first problem: we want some notion of a private channel, on which we can communicate data secretly.

  • Scala does this in its actor model, where each actor has a private mailbox (a channel-based sketch follows)
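
The lecture didn’t show code for this, but Rust’s standard library channels (std::sync::mpsc) give a concrete picture of a private channel: the two endpoints are ordinary values, so only the parties holding them can communicate on it; nobody else can even name it.

use std::sync::mpsc;
use std::thread;

fn main() {
    // Only holders of `tx` / `rx` can use this channel; it is
    // invisible to every other thread, unlike a global signal.
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        tx.send("secret data").unwrap(); // heard by the receiver alone
    });
    println!("received: {}", rx.recv().unwrap());
}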

It turns out there’s another kind of concurrency we can look at, known as “distributed computing”. This is like concurrency, but besides processes being interleaved non-deterministically in time, they are also separated in space: each process has access only to its own local resources.

  • An example of this could be a web client and server application. Each has access to its own resources, like the server’s database or the state of the user’s UI.

To formally reason about these ideas, we can simply introduce the idea of a world, a space where such actions occur:

$$e : \tau \ @\ w$$

Here we limit the execution of some expression/command e of type τ to some world w, which allows us to segregate functionality between parties. In our web client/server instance, this becomes:

        queryDB: Query -> Result @ server
        getFormData: unit -> FormData @ client

This allows us to combine both server and client code into one file, an approach known as “tier-less web programming”.
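
None of the real tierless systems work this way, but as a toy sketch: one could track world annotations in an ordinary type system, here using Rust phantom types as world markers, so that mixing up client and server values is a type error.

use std::marker::PhantomData;

// World markers, playing the role of `@ server` and `@ client`.
struct Server;
struct Client;

// A value of type `T` that only exists in world `W` (i.e. `T @ W`).
struct At<T, W>(T, PhantomData<W>);

fn at<T, W>(value: T) -> At<T, W> {
    At(value, PhantomData)
}

// queryDB : Query -> Result @ server
fn query_db(query: At<String, Server>) -> At<usize, Server> {
    at(query.0.len()) // stand-in for a real database lookup
}

fn main() {
    let q: At<String, Server> = at(String::from("SELECT *"));
    println!("rows: {}", query_db(q).0);
    let form: At<String, Client> = at(String::from("name=me"));
    // query_db(form); // rejected: `form` lives in the wrong world
    let _ = form;
}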

This approach exists today, in the following platforms:

  • The Links language
  • The ML5 language
  • The OCaml library Eliom

An alternative way to do this is through “Remote Procedure Calls”, or RPCs. Formally, this allows one computer, commonly called the master, to execute a function on another computer, commonly called the slave. This enables distributed file systems and other really cool systems-level tech, while keeping programmers from needing to worry about the network explicitly. One could write the following rule to reason about RPCs:

$$\frac{e : \tau \ @\ w' \qquad \tau \ \text{mobile}}{\texttt{at } w' \texttt{ do } e \,:\, \tau \ @\ w}$$

Here, all we are saying is: if e has type τ at world w', and τ is a so-called “mobile” type, meaning its values can be sent over a network, then from any world w we may write “at w' do e”, which executes e at w' and ships the result back, as usual.
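
The lecture stayed at the level of the rule, but here’s a toy, in-process picture of “at w' do e”: a thread stands in for the server machine, a channel stands in for the network, and Rust’s Send bound plays roughly the role of the “τ mobile” premise, since only such values may cross the boundary.

use std::sync::mpsc;
use std::thread;

fn main() {
    // The "server world": a thread with its own local resources.
    let (req_tx, req_rx) = mpsc::channel::<(String, mpsc::Sender<usize>)>();
    thread::spawn(move || {
        for (query, reply_tx) in req_rx {
            // queryDB : Query -> Result @ server
            let result = query.len(); // stand-in for a real database lookup
            reply_tx.send(result).unwrap();
        }
    });
    // The "client world": `at server do queryDB(q)` becomes a send
    // followed by a receive; everything shipped here is `Send`, i.e. mobile.
    let (reply_tx, reply_rx) = mpsc::channel();
    req_tx.send((String::from("SELECT *"), reply_tx)).unwrap();
    println!("result from the server world: {}", reply_rx.recv().unwrap());
}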

This is the magic behind clean distributed file system interfaces like AFS, the Andrew File System developed at Carnegie Mellon, and cloud services like AWS: it lets us make use of more computers, while letting us tighten security so that we do not accidentally break anything.


Written by Arav Agarwal, a UofM CS student who has a habit of explaining things aloud