Lorenzo Alvisi, Cornell University: “The Pit and the Pendulum”
A fundamental challenge in the design of distributed data stores is to balance two conflicting goals: strong consistency (which directly translates into ease of programming) and performance. We’ll explore this dilemma across distributed systems and databases: we will understand the why and the how of this tension, and become familiar with notions such as linearizability, eventual consistency, causal consistency, and the ACID/BASE dualism. Then, we will take a deep dive into recent attempts in both fields to bridge this gap.
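To make the consistency side of this trade-off concrete, here is a minimal sketch (all names hypothetical, not from the lectures) contrasting a linearizable store, which serves every read from the single authoritative copy, with an eventually consistent store, whose replica applies writes asynchronously and can therefore serve stale reads:

```python
class LinearizableStore:
    """One authoritative copy: every read observes the latest write."""
    def __init__(self):
        self.data = {}
    def write(self, key, value):
        self.data[key] = value
    def read(self, key):
        return self.data.get(key)

class EventuallyConsistentStore:
    """Primary plus a lagging replica: reads may miss recent writes."""
    def __init__(self):
        self.primary = {}
        self.replica = {}        # lags behind the primary
        self.pending = []        # writes not yet propagated
    def write(self, key, value):
        self.primary[key] = value
        self.pending.append((key, value))
    def read(self, key):
        return self.replica.get(key)   # may return stale (or no) data
    def anti_entropy(self):
        # Background propagation; replicas converge once it runs.
        for key, value in self.pending:
            self.replica[key] = value
        self.pending.clear()

ec = EventuallyConsistentStore()
ec.write("x", 1)
print(ec.read("x"))   # None: the replica has not yet seen the write
ec.anti_entropy()
print(ec.read("x"))   # 1: the stores converge after propagation
```

The stale first read is exactly the behavior a linearizable store rules out, and avoiding it is what costs performance.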
Hal Daumé III, University of Maryland: “Bias in Supervised Learning”
Machine learning technology now pervades our lives in seen ways (self-driving cars) and unseen ways (predicting recidivism of felons). This widespread application of technology comes with new failure modes that give rise to new opportunities for research. We will learn about, and apply, mechanisms for coping with the various forms of bias that lurk in “standard” machine learning technology. We will discuss the limits of such approaches, and consider the ethical implications of widespread use of automated decision making.
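One simple diagnostic for the kind of bias discussed above is the demographic parity difference: the gap in positive-prediction rates between groups. A minimal sketch, with made-up toy data (not from the lectures):

```python
def positive_rate(preds, groups, g):
    """Fraction of members of group g that received a positive prediction."""
    members = [p for p, grp in zip(preds, groups) if grp == g]
    return sum(members) / len(members)

def demographic_parity_diff(preds, groups):
    """Gap between the highest and lowest per-group positive rates."""
    rates = {g: positive_rate(preds, groups, g) for g in set(groups)}
    return max(rates.values()) - min(rates.values())

# Toy classifier outputs (1 = favorable decision) for two groups.
preds  = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
print(demographic_parity_diff(preds, groups))  # 0.5: group A favored 75% vs 25%
```

A large gap is only one symptom of bias, and equalizing it is only one of the coping mechanisms (and one with known limits) that the lectures will examine.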
Derek Dreyer, MPI-SWS: “Separation Logic: A Foundation for Verifying Realistic Concurrent Programs”
The behavior of concurrent programs is challenging to reason about because of “interference”: threads running in parallel can step on each other’s toes by mutating shared state. In this lecture series, we’ll see how to tame interference using separation logic, a verification framework in which assertions denote not only facts about the machine state, but also ownership of pieces of that state. Ownership and separation constitute one of the most fundamental conceptual developments in program verification in the past 20 years, and have had a major influence on the design of cutting-edge programming languages like Rust. We’ll give a hands-on tutorial on how to verify concurrent programs using a state-of-the-art implementation of separation logic in the Coq proof assistant. Finally, time permitting, we’ll explore how separation logic is currently being adapted to support reasoning about more realistic (“weak” or “relaxed”) memory models like those of C/C++ and Java.
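The idea that assertions denote ownership is captured by separation logic’s frame rule: a proof about a command $c$ over the state it owns ($P$) remains valid in the presence of any disjoint state $R$ that $c$ does not touch, which is what lets verification of one thread ignore the state owned by others:

```latex
\frac{\{P\}\; c\; \{Q\}}
     {\{P * R\}\; c\; \{Q * R\}}
\qquad \text{(where $c$ does not modify variables free in $R$)}
```

Here $P * R$ is the separating conjunction: the heap splits into two disjoint pieces, one satisfying $P$ and one satisfying $R$.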
Krishna Gummadi, MPI-SWS: “Understanding, Predicting, and Influencing Human Decision Making”
Today, it is possible to gather and analyze detailed data online about human societal interactions and decisions at a previously unimaginable scale. Sites like Facebook and Twitter enable passive observational studies of interactions between hundreds of millions of users, while sites like Amazon’s Mechanical Turk (AMT) and SurveyMonkey enable scientists to recruit humans for active experimental studies of their decision making at scale. Quantitative studies of data gathered in this way are often referred to as computational social science. In these lectures, I will discuss recent advances in (i) understanding information production, diffusion, and consumption in social media: who says what to whom? Who likes what from whom? To what extent can these information exchanges be modeled computationally and predicted? And (ii) understanding human and crowd decision making, particularly by peering through the lens of computation: can we better understand how or why humans made their decisions by modeling decision making as information processing? Can we detect whether certain decisions made by humans are “unreasonable” or “unfair”?
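One standard computational model of the diffusion question above (“who says what to whom?”) is the independent cascade model: each newly activated user gets one chance to activate each of their followers with some probability. A minimal sketch, with a hypothetical toy graph (not from the lectures):

```python
import random

def independent_cascade(followers, seeds, p, rng):
    """Simulate one cascade: each new adopter tries once, with
    probability p, to activate each of their followers."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        newly_active = []
        for user in frontier:
            for follower in followers.get(user, []):
                if follower not in active and rng.random() < p:
                    active.add(follower)
                    newly_active.append(follower)
        frontier = newly_active
    return active

# Toy "who sees whose posts" graph: an edge u -> v means v follows u.
followers = {"alice": ["bob", "carol"], "bob": ["dave"], "carol": ["dave"]}
reached = independent_cascade(followers, {"alice"}, p=0.5, rng=random.Random(0))
print(sorted(reached))
```

Fitting the activation probabilities of such a model to observed sharing data is one way the predictive questions in these lectures can be posed quantitatively.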
Michael Hicks, University of Maryland: “Programming Languages meet Cryptography”
The cryptography research community has developed a wealth of sophisticated techniques that can ensure the secrecy or integrity of a computation. For example, techniques such as homomorphic encryption, garbled circuits, and oblivious RAM can ensure that even when a program is run by an untrusted party, that party can learn nothing about the program’s secrets. Unfortunately, these techniques can be inefficient, and by-hand optimizations are both tedious and risky, as mistakes can compromise security. In this series of lectures, we show how ideas developed in the programming languages community—involving automated reasoning and compiler optimization—can be deployed to ensure that cryptographic computations are both efficient and secure. We will present background on how to mathematically describe the behavior of programming languages, using a technique called operational semantics. We will also consider how to mathematically express the way programs are type-checked and otherwise reasoned about through automated analysis. Then we will show how to apply these basic techniques to languages that employ cryptographic mechanisms, and how to show that their efficient use does not compromise security.
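To give a flavor of operational semantics, here is a minimal sketch (not from the lectures) of a small-step semantics for a toy arithmetic language: an expression is either an integer or a nested `("add", e1, e2)` tuple, and `step` performs one reduction at a time, in the same spirit as the reduction rules one writes in PLT Redex:

```python
def step(expr):
    """Perform a single reduction step; return None if expr is a value."""
    if isinstance(expr, int):
        return None                      # values do not step
    op, e1, e2 = expr
    if isinstance(e1, int) and isinstance(e2, int):
        return e1 + e2                   # rule: (add n1 n2) -> n1 + n2
    if not isinstance(e1, int):
        return (op, step(e1), e2)        # reduce the left operand first
    return (op, e1, step(e2))            # then the right operand

def evaluate(expr):
    """Iterate step() until a value is reached."""
    while not isinstance(expr, int):
        expr = step(expr)
    return expr

print(evaluate(("add", ("add", 1, 2), 4)))  # 7
```

The lectures extend this style of definition to languages with cryptographic constructs, where proofs about the semantics establish that optimizations preserve security.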
For advance reading, please read the following three blog posts:
- What is PL Research and how is it useful? http://www.pl-enthusiast.net/2015/05/27/what-is-pl-research-and-how-is-it-useful/
- The Synergy between Programming Languages and Cryptography http://www.pl-enthusiast.net/2014/12/17/synergy-programming-languages-cryptography/
- Formal Reasoning in PL and Crypto http://www.pl-enthusiast.net/2014/12/23/formal-reasoning-pl-crypto/
In addition, please skim the first three chapters of Robert Harper’s book, paying particular attention to 1.1, 2.1-2.3, 2.6, and 3.1.1. You are welcome also to look ahead to chapter 4, particularly 4.1 and 4.2. A preview of the book can be found at: https://www.cs.cmu.edu/~rwh/pfpl/2nded.pdf
Finally, please also install the Racket programming language, found at https://racket-lang.org. Many of the developments presented in lectures will be executable in the PLT Redex framework, provided by Racket.
I expect all of this should take 2-3 hours of work.
Vitaly Shmatikov, Cornell Tech: “Privacy in the Year 2027”
This series of lectures will focus on several developments that promise to dramatically change the problems and methodology of data privacy research in the next decade. The first development is that machine learning will become a ubiquitous commodity and the primary method for incorporating users’ personal data into apps and online services. I will discuss how trained machine-learning models reveal sensitive data and how to use machine learning to infer information about individuals. The second, related development is that a vast array of devices, from household gadgets to wearable sensors, will be generating streams of data correlated with individual user activities, presenting new privacy challenges and opportunities.
- “Membership Inference Attacks against Machine Learning Models”:
- “Privacy-Preserving Deep Learning”:
- “Federated Learning: Collaborative Machine Learning without Centralized Training Data”:
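The intuition behind the membership inference attacks in the first reading can be sketched in a few lines (toy numbers, not the paper’s method): overfit models tend to be more confident on the records they were trained on than on unseen records, so thresholding a model’s confidence on a record leaks whether it was in the training set:

```python
def infer_membership(confidence, threshold=0.9):
    """Guess 'member' when the model is suspiciously confident."""
    return confidence >= threshold

# Hypothetical per-record confidences from an overfit classifier.
train_confidences = [0.99, 0.97, 0.95]   # records the model trained on
test_confidences  = [0.62, 0.71, 0.55]   # records it never saw
print([infer_membership(c) for c in train_confidences])  # [True, True, True]
print([infer_membership(c) for c in test_confidences])   # [False, False, False]
```

The actual attack trains auxiliary "shadow" models to learn this decision boundary rather than fixing a threshold by hand, but the leakage channel is the same: the model’s behavior differs on members and non-members.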