Okay, thanks. I want to begin by thanking the organizers, especially the tall guy for allowing me to be here. This talk will be talking about a research area where the goal is deployable systems for verifiable computation for probabilistic proofs like SNARKs etc. The goal of this talk is to motivate the next theoretical steps in this area. I want to convey the excitement and also provide a reality check.
The high level setup is the following. We want to arrange for a client verifier to send a description of a computation F and some input X to a server or prover that returns the output Y with a short proof that establishes for the client that Y is the correct output given the input x. If F takes non-deterministic input, then the proof should establish that the prover knows a witness that is a setting of the non-deterministic input to the validity of the pair X, Y. The motivation of this setup is the general lack of trust in third-party computing. You can imagine a client wants assurance that a query performed on a remote database was done correctly.
The area has several requirements, beginning with efficiency which decomposes first of all into the running time of the verifier. We want it to be constant or logarithmic in the running time of the computation F. Likewise for the communication cost for the number of rounds, that's the ideal. The server's running time should be linear or quasilinear in the number of steps required to execute F. A second requirement is that in some applications it will be desirable for the prover's auxiliary input (or the witness W) to be private from the verifier. Variants of this setup and also non-trivial subsets of this property have been achievable in theory, like interactive proofs, arguments of various kinds, zero knowledge arguments of various kinds, etc.
There's an additional requirement. We want all of this to be practical. What I mean by practical may not be what cryptographers mean by practical. There has been good news and bad news. There has been a lot of attention to this area. On the left is every publication I am aware of that has reported on a probabilistic proof protocol. I have been trying to be comprehensive, so please let me know if there are others. This includes the work of Eli Ben Sasson, Eran Tromer, Alexandra Kiez, Daniel Genkin, Madars Virza, Bobescumonthu, Elaine Shi, Justin Thaler, Michael Mitzenmacher, Bryan P, and Craig Gentry as well. (Yes, all of these names are butchered.)
SBW11, CMT12, SMBW12, TRMP12, SVPBBW12, SBVBPW13, VSBW13, PGHR13, Thaler13, BCGTV13, BFRSBW13, BFR13, DFKP13, BCTV14a, BCTV14b, BCGGMTV14, FL14, KPPSST14, FGV14, BBFR14, WSRHBW15, CFHKKNPZ15, CTV15
So the other excitement other than just the sheer amount of activity is that there's been genuinely cool results. There have been cost reductions of 20 orders of magnitude relative to naive implementations of the theory. Compilers in the sense of an implementation that go from programs in a high-level language that go to binaries that implement the protocols entities. Extensions of the machinery to broader expressiveness that work over remote state, and so on. And most of these systems have verifiers that are generally and concretely efficient.
I also can't resist telling you about one of the papers done by my group appeared in SOSP (BFRSBW13), which is the premiere operating systems venue. This was a landmark or milestone for the area. It was also a milestone for SOSP. The system we sent there wasn't remotely practical. We were very upfront about this. This brings me to the bad news for this area. Just because something is implemented does not mean it's practical. All of the systems on this slide suffer from a limitation that they can only handle small computations and have outrageous expense. For the time being, this machinery is limited to carefully selected particular kinds of applications. I consider this to be bad news given the promise of the area. I'll describe the state of the art systems, then perform a reality check, then use these things to motivate the next theoretical steps in the area.
Summarizing the state of hte art, I wish to mention that all of these systems be decomposed into frontends and backends. The frontend takes in a high-level description of a program such as in C, and outputs an arithmetic circuit over a large finite field. The backend is a probabilistic proof protocol by which the prover can convince the verifier that there is a satisfying assignment to that arithmetic circuit. It's the job of the frontend to produce the circuit in such a way such that the satisfyability of that circuit is tantamount to correct program execution. The probabilistic proof protocols include interactive arguments, non-interactive arguments like SNARKs, as well as interactive proofs. For lack of time, I am not going ot be able to describe in detail the refinements that have been done to Groth10 or GKR08 which was done by principally Justin Taylor. I will restrict focus to arguments. All of these argument systems have the quadratic arithmetic program QAPs (GGPR12).
In order to explain the role of QAPs I am going to walk through a bunch of strawmen or bad ideas for probabilitsic proof protocols. So the first thing you can imagine doing is that if you're a graduate student taking a complexity theory class is you could say well PCPs are pretty awesome they allow us to check things, so what we're going to do is have the profer materialize an asymptotically efficient PCP and send it to the verifier; we know that PCPs have the property that if the computation was done correctly the verifier accepts and if the computation was not done correctly then the verifier is highly unlikely to accept the proof. And the verifiers work here in checking the proof in a number of location is work very low. The problem with this picture is that the prover has to encode the satisfying assignment in the proof, so the proof is actually larger than the satisfying assignment, which means on the basis of communication and running time of the verifier, we don't meet the efficiency requirement in the first slide. So the next thing you could imagine doing is saying well rather than have the verifier send the proof to the verifier, we could just have the prover materialize the proof and then have the verifier ask the proof what values the proof contains at a particular location, and this is the paradigm of Kilian's argument (Kilian92) and Micali's (Micali94) computational soundness proofs. This is actually an excellent idea in principle. The problem is that the constants that the tan of the constructions of short PCPs make this approach very difficult. Among other iissues, the underlying alphabet is not a great fit for the underlying arithmetic circuit model of computation. Eran Tromer who we will be speaking next can probably talk more about this, since I know that he and Eli Ben Sasson actually implemented this stuff although they haven't actually reported on experimental results. I don't know precisely what issues they ran into, but I think this is it.
The next thing you could imagine doing is to turn to a really cool idea from IKO07, SMBW12, SVPBBW12 who may be in this room, or maybe all of them. And the observation is the following. There are 3 pieces to this. The first is that the so called "simple" PCPs were bad idea PCPs that appeared in the first half of the PCP theorem in the ALS paper. These constructions are actually very simple. They are quite straight forward to implement. The second observation is that even though these proofs would rely if they were materialized in full, they would require the construction of an exponentially large table, there's actually a succint representation of these proofs as a function in a linear function that takes this input as a vector and returns this dot product of a vector V, and V encodes the satisfying assignment. The final observation of IKO was that it's possible to get the prover to commit to a function of this form. Then the verifier asks the prover what values the proof contains at a particular location; it's not just that the proof doesn't have to travel to the verifier, it's also that the prover only has to compute the locations in the proof that are asked for on demand, so neither party incurs the exponential blow-up. The advantage to this is that it's simple, my group implemented it in the earlier works in this area, the constants are favorable. Since the proof is exponentially sized, the locations where the verifier wants to inspect it, itself requires a very long string, so we have to work in a preprocessing model where the verifier has to generate these offline, and then amortize this over many uses of the same circuit. The econd disadvantage is that this vector V that you would use in the naive implementation is that it is actually quadratically larger than the satisfying assignment and hence the running time of the computation. This picture sets up another idea based on BCIOP13 and their observation was that if you are willing to weaken the cheating prover and assume it's restricted to only certain operations, then it's possible to have that previous concept from the other slide, but in a non-interactive context. The preprocessing cost can be incurred once, and then reused in all future instances. Those queries that I mentioned earlier would here be encrypted. The verifier may not have to incur this computation, perhaps a third-party does it. The prover looks very similar but they happen in cypherspace here.
The disadvantage to this approach is that besides the restriction on the prover which may or may not be an issue depending on philosophy, the provers have to participate in this is still quadratic. I want to put these on the same spectrum. lThanks to Rafael Pass for suggesting a figure of this kind. So if you look, we started with short PCPs, we moved to arguments which introduced the assumption of collision resistant hash functions, we then actually made a stronger cryptographic assumption or IKO did where we commit to a longer PCP which introduces the downside of preprocessing but the key advantage of simplicity and plausible constants, and the same picture can be realized in a non-interactive way, which gets rid of the issue of amortizing over a batch. But we still have the issue that the prover's work is quadratic. I am going to restrict focus to GGPR'S QAPs. We are going to exploit the quadratic arithmetic programs of GGPR. And I don't have time to go into the details of QAPs, but what they are is a way of encoding a circuit as a set of univariate high degree polynomials in such a way that if the circuit is satisfiable then a particular algebraic relationship holds among the polynomials, namely one polynomial divides another. And that algebraic relationsihp can be checked probabilistically, and further that the probabilistic check actually has a linear query structure. Exploiting that fact leads to that vector V that the prover would have otherwise had to materialize as quadratic, now becomes just H here is some stuff I'm not going to talk about that is quasilinear, and Z is the satisfying assignment to the circuits. So the prover's work is shrunk enormously and this was a critical development. In fairness to GGPR is that the fact that they have a linear PCP structure was not explicit in their paper, which was elucidated in BCIOP13 as well as in SBVBBW13 in my group, where we used different vocabulary but I think it's fair to say that both were vigorously expressed.
So what this leads to is the following picture. The foundation of all of the argument backends is QAPs, the queries can take place in plaintext or they can take place in the exponent. The backend that implements the approach on the left my group calls Zaatar, we're systems people so we pick things not based on our author names just to confuse you, and you can think of this as an interactive argument that refines the protocol of IKO. The approach on the right is what is now know as a SNARG or a zero knowledge SNARK (zk-SNARK) with preprocessing, and GGPR12 stands at the end of several orther works. And it was implemented by Bryan Parno and others as Pinocchio, and hten optimized in a very nice piece of engineering in what is a released software library called libsnark. There is a question to ask about this which is can we dispense with the preprocessing? It's absolutely possible. BCCT13, BCTV14b, CTV15. It shows that it is possible to dispense with preprocessing by applying this picture recursively. The experimental results by some of these authors make clear that these works should be considered theories and they are many orders of magnitude more expenside. The one on the right uses slightly stronger cryptographic assumptions and gets better cryptographic abilities like non-interactive and zeor knowledge variance, etc. The cost of providing zero knowledge is effectively zero which is a really cool result of GGPR. The reason why I am emphasising that these systems are based on GGPR, is that eventhough they are analyzed differently, they still have this commonality.
Okay, so now moving to state of the art frontends, there are effectively two different approaches. One approach is to treat the circuit as the unrolled execution of a general purpose processor and to compile a C program to a binary in the instruction set of that processor and supply the binary as the input to the arithmetic circuit, done by BCGTV13, BCTV14a, BCTV14b, CTV15. This is extremely interesting. The preprocessing work is done only once for all computations of the same length, because they can reuse the circuit. They have excellent reprogrammability, they can handle any program that compiles to general purpose CPU, which is all of them. Universality has a price. The second approach, followed by my group as well as Bryan Parno and his work and SBVBPW13, VSBW13, PGHR13, BFRSBW13, BCGGMTV14, BBFR14, FL14, KPPSST14, WSRBW15, CFHKKNPZ15. I think he will talk about zerocash work. The idea is to compile C programs to circuits directly, a tailored circuit for the computation directly. The circuits are much more concise because they are tailored to the computation. The preprocessing has to be incurred for each different circuit in play. This subset of C program in all of these systems, in this part of the picture, is a subset of C, so that raises the question how restrictive is that subset of C for this "ASIC" approach? I'll answer this by talking about the frontends and the tradeoff between programmability and cost. The more expressiveness, the more costly. The ASIC approach, the more like CPUs are down here. At this point, the approaches over here are not good at handling dynamic loops or where loop bounds are incurred at runtime. There are also problems with control flow and so on. Is it possible to get the best of both worlds? In work done with Buffet we find that it is possible to get the best of both worlds, it's a subset of C and it does not handle function pointers as a limitation. It has excellent-by-comparison costs. Hopefully some of you can push this frontier up into the right and further.
This is the common framework. We have backends based on QAPs and frontends with different approaches. I want to perform a reality check.
How important is it to develop cryptographic protocols that handle more expressive language veruss handling that work at a compiler level and asking the compiler to compile your favorite program into a compatible format? When you compile to remove function pointers, does that have an insane blowup in complexity? You could be asking two things; if we're willing to restrict the frontend, do we get a cheaper backend? If restricting the class of program constructs, does the circuit in the ASIC model too large? How much of the concern of expressiveness can be handled by having better compilers? Some. I'll talk about that in a bit. How can we come up, if we're willing to restrict programmability, can we have more efficient circuits or more concise circuits?
As a reality check, I'm going to do a quick performance study. Zaatar (ASIC), BCTV (CPU), and Buffet (ASIC). We're going to run them on the same computations on the same hardware. What are the verifier's cost and what are the prover's cost? The benchmarks for this is that .. ..... benchmark results go here. The preprocessing cost for the verifier, or the prover, actually scales with the size of the circuit meaning the number of gates. That raises two further questions. The size of the circuit is given by the frontend. We have constants in here like 180 microseconds, 60 microseconds, is that good or bad? I dunno. Let's do a quick comparison. Matrix multiplication, merge sort, Knuth-Morris-Pratt string search. Buffet is the best of them, we thought this was great because we're 2 orders of magnitude better... but some perspective is in order.. if you compare all of them to native execution, we're all terrible. Native execution is way better. This isn't even actual data, this is extrapolated because not even all of these systems can run at these input sizes that we're porporting to depict. The maxium input size is far too small to be called practical. The bottleneck is the memory at the prover and at the verifier preprocessing stage, because at a certain point there is a large operation that has to be performed. So we can't handle computations with more than 10 million gates. The preprocessing costs are extremely high like 10 to 30 minutes. Some of these can be addressed in theory.
The CPU approaches are better in theory but expensive in practice. The prover's overheads are scary. Memory is a scaling limit. What about proof-carrying data work like BCCT13, BCTV4b, CTV15 like at Eurocrypt. This deals with the prover's memory bottleneck in principle and asymptotically but the concrete costs make this difficult to deploy soon. GKR-derived systems (interactive proof sytsems) (CMT12, VSBW13), .. great verifier performance, some expressivity limitations.
Where do we go from here? Target domains where the prover cost is tolerable. One of these might be the Zerocash system like BCGGMTV Oakland14. The whole point of bitcoin is to make people spend CPU cycles, so that's okay. Another application is detailed in our work at SOSP is an application of the machinery to location privacy, I wont have time to dive into this, but I can point you to PBB Security09 or BFRSBW SOSP13 which describes this an application where you can imagine paying these costs becaus ethe size of the data in question is generated on human scales and is not very large.
I'd like to try to motivate theoretical advances in this area. We'd ideally like to have a 1000x improvement in prover performance and verifier preprocessing costs. Real programs interact with files, IO devices, etc., there's not a concretely efficient way to incorporate that into machinery. It would be nice to base this machinery on standard assumptions. Things that could be viewed as a luxury is non-interactivity, although sometimes it's absolutely critical. Pre-processing while it would be nice to get rid of it, it's fine as long as the costs aren't too high, I don't even care if the proof is non-constant in length, and I could tolerate a verifier that works harder if the costs are down.
One interesting thing would be if you remember the frontend/backend decomposition, and the frontend is producing arithmetic circuits, could we target probabilistic proof protocols that do not require circuits? We would have to identify good candidate problems and good protocols for those problems. A variant for this would be to take the circuit satisfyingability machinery to get special purpose algorithms that outsource common pieces of computations that are common like cryptographic operations and Badlis and Elaine have done work in this area. More efficient reductions from programs to circuit are also interesting, this might be the domain of programming languages.
The backend- the short PCP paradigm where the verifier gets a commitment to a short PCP from the prover, that was a good idea in principle, could we get short PCPs that are efficient and simple? It would be great if that existed. Another thing would be to endow IKO's arguments of the preprocessing with better properties, like reuse the preprocessing work beyond a batch, make the protocol zero knowledge, enhance efficiency via stronger assumptions. Or improve the analysis? Some of the cost of the protocol is from the way that soundness is established.
Another way is that GGPR's QAPs have come very far, but they are a principle source of cost. Is it possible to improve them? Or to improve the cryptography on top of QAPs? Imagine if the cloud had special purpose processors for cryptographic operations, and these are one of the key bottlenecks, so could we get to the point where the cryptographic operatoins can be outsourced and the verifier would then check that those were correct?
Even if we make lots of progress on these, this is not going to be plausibly deployable tomorrow. I would be thrilled if I have motivated any of you to work on these problems. This is an exciting area. The bad news is that implementation does not imply deployability. The overhead is from transforming programs into circuits and the overhead of the QAP backend and the cryptography on top of it. We need theoretical breakthroughs. I think the incentive here is very large. If we had low overhead ways of verifying computation, we would have extremely wide applicability. I'll stop here and take questions if there's time.
Q: Which of the overheads is bigger? If you stripped out all of the cryptography?
A: If we just strip the cryptography out of it, the prover still, and the cost can be addressed using their stuff, the prover still has to do a fairly large operation, QAPs mandate that hte prover does polynomial arithmetic and finite fast fourier transformations on the satisfying assignment, this is a bottleneck independent of the cryptography. Getting rid of the cryptography and just have QAPs, the cost would go down, I'm making this up, but it would go down like 2 orders of magnitude, just in running time not the memory bottleneck to be clear.
Q: You mentioned problems that come up with a lot, like having a special solution .. reducing overhead?
A: There are lots of computations that I care about, like the things I use computer sfor every day. What is the commonality? I don't actually know, but I want to think about. If there were operations that could allow transformations on databases, or data structures. I'm not sure. One of the things that tends to blowup which is a bummer is that in the ASIC based approahces, time and conditionality turn into space, so if you have an if statement or switch statement with 4 possibilities, the circuit now needs logic for each of those possibilities. Another thing that might sound crazy is inequality comparisons, if you check if one number is less than another, that introduces expense.
Community-maintained archive to unlocking knowledge from technical bitcoin transcripts