Thank you very much. I would like to thank the organizers for giving this opportunity to learn so much from my peers and colleagues some of which are here. This is a wonderful survey of many of the works in this area. As I was sitting here, I was in real-time adjusting my slides to minimize the overlap. I will cut down on the amount of introduction because what we just saw (Mike Walfish) was wonderful. I would like to focus on the specific things that we have been looking at in one area of research.
For the sake of concreteness, let me start with the application of SNARKs. One of the motivating applications is zerocash http://zerocash-project.org. Decentralized anonymous payments system based on bitcoin, this was joint work with Eli Ben-Sasson, Chiesa, Garman, Green, Miers, Tromer, Virza. And it's all down to bitcoin. It's a decentralized digital currency. Ownership of coins is represented just by some numbers and to pay someone you pass some number to them, and that raises the issue of double spending. If you pass the number to them, then they would be convinced just like anyone else, therefore causing inflation. The way that bitcoin deals with this is the blockchain or distributed ledger, which posts every transaction, we see who paid whom and when and how much. Every time a transaction is recorded, everyone knows to respect the coin that was just spent. The problem here is privacy because the blockchain has all of the information. As a consumer, anyone could see your purchases. Merchants have cash flows exposed. Anyone can see your balance as paying merchants going down the street. This is clearly a problem. It has been recognized by many. There are some ways maybe to dismiss this like saying bitcoin is pseudonymous, but most users use very few addresses and there are works that show that you can deanonymize people by tracking the transaction graph.
Another issue is fungibility because coins have history associated with them, so different coins may have different value which can ruin the fungibility of the currency. So we started off creating a decentralized system, avoid trusting the bank, but with privacy there's much worse by trusting everyone to not look at the ledger, which is clearly not a reasonable way to build a currency with focus on privacy. There have been many adhoc solutions. The problem with most of them is that they rely on some mixing or mixnets that require essentially to launder money, to mix your money with others. We get the worst of both worlds. Can we build a currency in a way that is inherently private? Anyone can pay each other, in a completely anonymous way. There's still a notion of a blockchain, but instead of recording everything, there are zero knowledge proofs recorded that reveal nothing about the source or destination or payment except to those participating in that transaction.
We should be somewhat dubious about SNARKs I guess. The proofs are less than 300 bytes long. It takes just a few milliseconds to verify. Less than a minute to make a transaction. Some of the public parameters or common reference string is required for setup. They are less than 1 GB in size. So that's somewhat practical. There are some cryptographic assumptions, we'll be using pairing-based elliptic curve crypto, knowledge of exponent, and some properties of sha256.
So what is the approach of zerocash? Well, the idea is to use SNARKs to rely on proofs. Look at a situation where one party is sending some amount to a recipient and they have the records that prove they know who the money is from and they haven't spent it. The recipient is not yet convinced of course. That would lose privacy if it was just sent over. Imagine instead if you could have an accountant that could apply a predicate to the financial books, apply some decision algorithm to check them, then issue a signature attesting to that fact. Thta would be convincing, but then... wbut we do have the algorithm that the accountant would have run. We can try running that virtually in the sender's mind. And then we can see the proof that this algorithm accepted these financial records as a witness for this payment. Generalizing this, we have some NP statement, we have some NP witness, there's some NP decision algorithm, and there's a proof for this to work; we need the proof to be convincing. We have some relaxed notion of .. computationally sound proofs, or arguments as we like to call them. We would like it to be noninteractive. These proofs will later be posted in the blockchain so that the recipient of the payment will be able to convince others that it was legitimate, so it must be a string rather than interactive. It must be a proof of knowledge. That means that the prover doesn't just- not only do their exist financial records attesting to the statement, but the prover knows them, which is crucial for being able to actually send money to someone- and then the signature has to exist, the question is did they receive them. We want these proofs to be succinct, they must be short, and they must be quickly verifiable.
Is this a zero knowledge proof of knowledge? The verifier can be efficient. But the privacy of the prover, that is the zero knowledge property. These things give us a SNARK, a zero knowledge succinct noninteractive argument of knowledge. Adding the zero knowledge requirement we get zk-SNARKs.
GMR89 zero knowledge, well we know it exists since GMR. And then BFM88, NY90, BDMP91, FLS99 using a common reference string you can have a setup procedure, and then some fixed NP language L, you can send the proof to the verifier with some nice zero knowledge, there can be soundness properties, and they are actually pretty efficient, for succinct NIZKs. There is a NIZK for all of NP. There is efficient NIZK for specific languages. What about complicated languages? Can we have succint noninteractive zero knowledge... and by succinctness, we say that if a Turing machine executes it, succinctness means that the proofs are short, and verification is cheap.
If we had this, we could do many wonderful things. We could outsource computation, we could delegate it. We could send information to the cloud with instructions for what to do. The cloud could come back and have a claim that they know the results. If we could create a proof that convinces the verifier that the result is correct, then we wouldn't have to trust the cloud, just trust the result by efficiently verifying it. At BCGTV13, we had someone solve the traveling salesman program, and we verified it in 5 milliseconds on stage. The proof was a few hundred bytes, the computation was done in the cloud. The general case is not quite economically appealing yet.
If you want additional big thinking cloudy motivations, then the information technology supply chain, if you are applying for grants this is a good one. Everyone is buying hardware from the cheapest vendor, but there might be counterfeit chips. Could there be malicious hardware? We know that Chinese have been saying that we shouldn't bother building space ships- no, carrier ships- when they can just make sure that the adversary ships just malfunction at the right time. So SNARKs are well justified. So how do we construct them?
We just heard a lot about it. There has been amazing progress in just a few years. We have known for a while htat SNARKs and non-interactive zero knowledge proofs are possible, starting with Kilian92, Micali94, Valiant08. Those were based on PCPs with merkle trees on top, in the random oracle, to navigate the merkle tree. That's a very beautiful feasability result. We have an unpublished full PCPs proving simple statements, the overhead is very large, but it's not galactic, it's something we can run experiments for.
Q: What are the bottlenecks for short PCPs?
Staying with the full blown PCP track, we have a good understanding of what's needed. Random oracles well, we don't understand everything, there's a character.. extractable collision resistant hash function. There are constructions both ways. Extractable collision resistant hashes are CRHs with the extra property that if they output in the element in the range of that function, we actually know that something. Non-falisfyable knowledge assumption, we give some candidates, like others like ... little is known about these assumptions. The problem is that these are still based on PCPs. And the newer results take a different approach based on linear PCPs. BCIOP13 gives a good characterization of SNARK = linear PCP + linear-only encoding. The intuition for why the proofs are more efficient is that we shift some of the weight from the .. to the ... if you look at the full PCPs, they extend lots of resources to make sure that they are certain that they have structures between elements of the proof. But we can get some of this for free because only certain things can be constructed by linear construction so you can skip some of the checks at a very high level.
There are many protocols here and many implementations. Many of these are based on GGPR protocol preceded by protocols of Gro and Litmai that setup the stage. There are many ways to show this progress. SNARKs for C programs. To make it concrete for users, C programs are compiled into a version that not only produces the outputs but also produces a proof that the outputs are correct. In principle this is feasible for simple computation. You can do this for ongoing computation using proof-carrying data, where the proofs concatenate. Using the GGPR and PGHR protocol we proved feasability and the efficiency for verification leading up to a working prototype for small C programs running for a short amount of time. This can be extended by allowing any C programs using universality, using a universal processor that is for an architecture. You can construct a setup parameter once, and then also, you can using these proof composition extend it step by step and support programs running arbitrarily long just with a multiplactive constant overhead in both time and memory. You can think of it as a processor that just, issuing proofs, and it has constant resources, function of the security parameter in the original program size that does not grow with the length of the computation. The memory consumption of the original program, not just the size. This processor has a well-defined hertz rating, right now it's measured in decihertz which is kind of too slow for practice, but at least it's well defined. The big issue here is the proving overhead, the main challenge for practicality, like outsourcing to the cloud.
Geppetto, Buffet and so on with tighter frontends have much tighter reductions but the cost is universality, supporting random accesses and general control flow, it does support it but it has a fallback of performance that is comparable to what the other implementations use.
Imagine you have a data center full of machines none of which are trusted. You would like the first machine to produce an output, and then the other machine would consume it, and so on, and every value flowing in your data center has a proof associated with it. All of these are verifiable. You can concatenate the proofs and all of the values to check the proofs, but if you want succinctness or only the last value, then you need something more.
In order to support universality, you need to .... universality is the same circuit to verify multiple programs. So you represent a processor as a circuit. Yes, that's exactly the claim. Not everything has to go through the universality. There are ways to optimize that.
For general C programs, you have to have the underlying SNARK prover and verifier; you need some input, auxiliary input, then the output and the proof goes to the verifier. There's an algebraic construct, if you start with a C program, you need to go through algebraic constraint satisfaction problems, CSP generator, tinyram assembly code, compiler (new gcc backend), then to a C program (oops this is going backwards). All of this is fairly expensive and does contribute 2 orders of magnitude ovehread.
For zerocash, we tailored things in a tighter way. Let me tell you about this. If you do the fully universal version with scalability with recursive composition, that gives you a CPU on the order of decihertz, that is a cycle of the CPU takes about 10 seconds with a proof. That's without multithreading or GPU acceleration or anything that would be applied if you put your mind to it. It's doable but not yet appealing. I have some additional numbers about tradeoffs on this.
For zerocash, what if you want to get a practical application without going through all the extra layers of abstraction and reduction? Basic anonymous e-cash starts with a serial number. The most basic way to build e-cash not very well is to have a bunch of serial numbers. To create a new coin, you prove that you are eligible for it, you get a serial number, all serial numbers are posted, then you just say which serial number you are using. It makes sense but no privacy whatsoever. Then there's a variant of this from Sander and Ta-Shma 1999, the idea is to use a commitment scheme where every coin has a serial number, a commitment to the serial number using some randomness (a greeting function?), you spend some 1 BTC to create the coin commitment, all of these commitments are public, to spend a coin, you just burn a commitment by showing the preimage. Anyone can verify the serial number is unique and matches the commitment. But this is not private either. The idea of Sander Ta-Sham was to use a merkle tree of CHR over all of the commitments, then we can just show that the randomness and commitment in the tree such that these two match the serial number, but we don't say which one. This is done using a zero knowledge proof. It needs to be non-interactive so that people can broadcast this. It needs to be zero knowledge. It should be proof of knowledge. It should be succint. So zk-SNARKs. We can already do a very basic scheme to instantiate Ta-Shma Sanders.
We can get additional functionality with zk-SNARKs. We can have adding an extra commitment for adding variable denomination. You say that you spend v BTC to create a commitment, you reveal the value it's public, you reveal a randomness for another commit, there's an extra commit that you don't reveal, to spend you reveal everything and show that it's consistent, and protect it under zero knowledge. You can add direct anonymous payments, you have an address pair, then the coin structure becomes much more complicated, to derive the serial number from the secret key, and the public key involved in creating the commitment, you can setup things so that minting and spending are analogous, to use a coin you create a zero knowledge proof that the values you have are consistent with some commitment in the past, you reveal everything so that they see that it's valid, the receiver can decrypt using the secret key, and get the values for them to spend later.
This is summarized in a pour transaction. The pour transaction is a primitive for direct anonymous payments. You can merge coins, split coins, make change, provide transaction fees, and many other things. It consumes old coins, it creates new coins, it gets instructions on what is the value of the destination of the new coins, and outputs things that will be put in the ledger that consume the old coins and create the new coins, and create a proof that the old coins were indeed valid and that the monetary invariant was preserved, that the total value of the old coins equals the value of the new coins. This is the NP statement that is verified by that virtual accountant, and that's what you find in the ledger.
We implemented this in QAPs following GGPR and the improvements on that. You can see what a transaction looks like. The data structure of a zerocash pour transaction includes: root, serial number 1, serial number 2, commitment 1, commitment 2, v pub, pubkeyhash info, sigpk, sig, mac 1, mac 2, ciphertext 1, ciphertext 2, zkSNARKproof. The ciphertext is addressed to the recipient so that they can get ownership. And the zero knowledge proof is small enough to be inserted in there too.
This is formalized in decentralized anonymous payment systems. You can do setup, mint pour, verify transaction, ... ledger indistinguishability, balance, transaction non-malleability. There's a full working prototype (libzerocash). We replayed bitcoin transactions on ec2 and it could keep up with the rate of transactions on bitcoin which is non-trivial and cannot be said for previous ocnstructions with even less functionality.
Trusted setup is a big issue here. The trusted setup creates a template for what a correct construction looks like. It's about 1 GB in size. If it's corrupted, privacy still holds but soundness is lost. We can use multi-party computation where at least 1 party needs to remain honest for the currency to remain sound.
Open problems: reducing the proving time. It takes about a minute to create a transaction, compared to the 10 minute blockchain update rate, and we could hope for better. Preprocessing is a big issue. It could be achieved by better PCPCPs that don't have trusted setup or perhaps other approaches, we think it's a fascinating problem and we don't know much about impossibility here. Do you need trusted setup for zero knowledge? That's a good question. We are looking for more applications that have very tight reductions, creating small NP statements that can be expressed in the underlying language. There's also fascinating field work in interesting challenges here like PL and optimization techniques. On the theory side, can we get SNARks based on more assumptions? We know that we can do general ECHRs and knowledge of exponent. What about the others? Can we build SNARKs on the LWE assumptions? Can we build SNARKs that are sound in the presence of quantum computers?
Community-maintained archive to unlocking knowledge from technical bitcoin transcripts