zkEVM | Notion

https://hackmd.io/@yezhang/S1_KMMbGt Dr.zhangye

zkEVM We thank Vitalik Buterin, Barry Whitehat, Chih-Cheng Liang, Kobi Gurkan and Georgios Konstantopoulos for their reviews and insightful comments.

TL;DR We believe zk-Rollup to be the holy grail — a best-in-class Layer 2 scaling solution that is very cheap and secure. However, existing zk-Rollups are application-specific, which makes it hard to build general composable DApps inside a zk-Rollup and migrate existing applications. We introduce zkEVM, which can generate zk proofs for general EVM verification. This allows us to build a fully EVM-compatible zk-Rollup, which any existing Ethereum application can easily migrate to.

In this article, we identify the design challenges of zkEVM and why it’s possible now. We also give a more specific intuition and describe the high-level ideas of how to build it from scratch.

Background zk-Rollup is recognized as the best scaling solution for Ethereum. It is as secure as Ethereum Layer 1 and has the shortest finalizing time comparing to all other Layer 2 solutions (Detailed comparisons here).

In the medium to long term, ZK rollups will win out in all use cases as ZK-SNARK technology improves. — Vitalik Buterin

The basic idea of zk-Rollup is to aggregate a huge number of transactions into one Rollup block and generate a succinct proof for the block off-chain. Then the smart contract on Layer 1 only needs to verify the proof and apply the updated state directly without re-executing those transactions. This can help save gas fee for an order of magnitude since proof verification is much cheaper than re-executing computations. Another saving comes from data compression (i.e., only keep minimum on-chain data for verification)

Although zk-Rollup is secure and efficient, its applications are still limited to payments and swaps. It’s hard to build general-purpose DApps due to the following two reasons.

First, if you want to develop DApps in a zk-Rollup, you need to write all your smart contract logic using a special language (i.e. R1CS). Not only is the syntax of required language complicated, but doing so also demands extremely strong expertise in zero-knowledge proof. Second, current zk-Rollup doesn’t support composability[1]. It means different zk-Rollup applications can’t interact with each other within Layer 2. Such quality significantly undermines the composability of DeFi applications. In a nutshell, zk-Rollup is developer-unfriendly and has limited functionality for now. That’s the biggest problem we want to tackle. We want to provide the best developer experience and support composability within Layer 2 by supporting native EVM verification directly, so that existing Ethereum applications can simply migrate over onto the zk-Rollup as is.

Build general DApps in zk-Rollup There are two ways to build general DApps in zk-Rollup.

One is building application-specific circuit (“ASIC”) for different DApps. The other is building a universal “EVM” circuit for smart contract execution. “circuit” refer to the program representation used in zero-knowledge proof. For example, if you want to prove hash(x) = y, you need to re-write the hash function using the circuit form. The circuit form only supports very limited expressions (i.e., R1CS only support addition and multiplication). So, it’s very hard to write program using the circuit language — you have to build all your program logic (including if else, loop and so on) using add and mul.

The first approach requires developer to design specialized “ASIC” circuits for different DApps. It is the most traditional way to use zero-knowledge proof. Each DApp will have a smaller overhead through customized circuit design. However, it brings the problem of composability since the circuit is “static” and terrible developer experience since it needs strong expertise in circuit design[2].

The second approach doesn’t require any special design or expertise for developer. The high-level idea of such machine-based proof is that any program will eventually run on CPU, so we only need to build a universal CPU circuit to verify the low-level CPU step. Then we can use this CPU circuit to verify any program execution. In our scenario, program is smart contract and CPU is EVM. However, this approach is not commonly adopted in the past years due to its large overhead. For example, even if you only want to prove the result of add is correct in one step, you still need to afford the overhead of an entire EVM circuit. If you have thousands of steps in your execution trace, it will be 1000x EVM circuit overhead on the prover side.[3]

Recently, there has been a lot of research going on to optimize zk proofs following those two approaches, including (i) proposing new zk-friendly primitives i.e. Poseidon hash can achieve 100x efficiency than SHA256 in circuit, (ii) ongoing work on improving efficiency of general-purpose verifiable VMs, as in TinyRAM, and (iii) a growing number of general-purpose optimization tricks like Plookup, and even more generally faster cryptographic libraries.

In our previous article, We propose to design “ASIC” circuit for each DApp and let them communicate through cryptographic commitments. However, based on the feedback from the community, we changed our priority and will focus on building a general EVM circuit (so called “zkEVM”) first following the second approach. zkEVM will allow exactly the same developer experience as developing on Layer 1. Instead of leaving design complexity to the developer, we will take over it and solve the efficiency problem through customized EVM circuit design.

Design challenges of zkEVM zkEVM is hard to build. Even though the intuition is clear for years, no one has built a native EVM circuit successfully. Different from TinyRAM, it’s even more challenging to design and implement zkEVM due to the following reasons:

First, EVM has limited support of elliptic curves. For now, EVM only supports BN254 pairing. It’s hard to do proof recursion since cyclic elliptic curve is not directly supported. It’s also hard to use other specialized protocols under this setting. The verification algorithm has to be EVM friendly. Second, EVM word size is 256bit. EVM operates over 256-bit integers (much like most regular VMs operate over 32-64 bit integers), whereas zk proofs most “naturally” work over prime fields. Doing “mismatched field arithmetic” inside a circuit requires range proofs, which will add ~100 constraints per EVM step. This will blow up EVM circuit size by two orders of magnitudes. Third, EVM has many special opcodes. EVM is different from traditional VM, it has many special opcodes like CALL and it also has error types related to the execution context and gas. This will bring new challenges to circuit design. Fourth, EVM is a stack-based virtual machine. The SyncVM (zksync) and Cairo (starkware) architecture defines its own IR/AIR in the register-based model. They built a specialized compiler to compile smart contract code into a new zk-friendly IR. Their approach is language compatible instead of native EVM-compatible. It’s harder to prove for stack-based model and support native tool chain directly. Fifth, Ethereum storage layout carries a huge overhead. The Ethereum storage layout highly relies on Keccak and a huge MPT[4], both of them are not zk-friendly and have a huge proving overhead. For example, Keccak hash is 1000x larger than Poseidon hash in circuit. However, if you replace Keccak with another hash, it will cause some compatibility problems for the existing Ethereum infrastructure. Sixth, machine-based proof has a gigantic overhead. Even if you can handle all the aforementioned problems properly, you still need to find an efficient way to compose them together to get a complete EVM circuit. As I mentioned in previous section, even simple opcodes like add might result in the overhead of the entire EVM circuit. Why possible now? Thanks for the great progress made by researchers in this area, more and more efficiency problems are solved in the last two years, the proving cost for zkEVM is eventually feasible! The biggest technology improvement comes from the following aspects:

The usage of polynomial commitment. In the past few years, most succinct zero-knowledge proof protocols stick to R1CS with PCP query encoded in an application-specific trusted setup. The circuit size usually blows up and you can’t do many customized optimizations since the degree of each constraint needs to be 2 (bilinear pairing only allows one multiplication in the exponent). With polynomial commitment schemes, you can lift your constraints to any degree with a universal setup or even transparent setup. This allows great flexibility in the choice of backend. The appearance of lookup table arguments and customized gadgets. Another strong optimization comes from the usage of lookup tables. The optimization is firstly proposed in Arya and then gets optimized in Plookup. This can save A LOT for zk-unfriendly primitives (i.e., bitwise operations like AND, XOR, etc.) . Customized gadgets allow you to do high degree constraints with efficiency. TurboPlonk and UltraPlonk defines elegant program syntax to make it easier to use lookup tables and define customized gadgets. This can be extremely helpful for reducing the overhead of EVM circuit. Recursive proof is more and more feasible. Recursive proof has a huge overhead in the past since it relies on special pairing-friendly cyclic elliptic curves (i.e. MNT curve-based construction). This introduces a large computation overhead. However, more techniques are making this possible without sacrificing the efficiency. For example, Halo can avoid the need of pairing-friendly curve and amortize the cost of recursion using special inner product argument. Aztec shows that you can do proof aggregation for existing protocols directly (lookup tables can reduce the overhead of non-native field operation thus can make the verification circuit smaller). It can highly improve the scalability of supported circuit size. Hardware acceleration is making proving more efficient. To the best of our knowledge, we have made the fastest GPU and ASIC/FPGA accelerator for the prover. Our paper describing ASIC prover has already been accepted by the largest computer conference (ISCA) this year. The GPU prover is around 5x-10x faster than Filecoin’s implementation. This can greatly improve the prover’s computation efficiency. Ok, so how does it work and how to build it? Besides the strong intuition and technology improvement, we still need to have a more clear idea of what we need to prove and figure out a more specific architecture. We will introduce more technical details and comparisons in the follow up articles. Here, we describe the overall workflow and some key ideas.

Workflow for Developers and Users For developers, they can implement smart contracts using any EVM-compatible language and deploy the compiled bytecode on Scroll. Then, users can send transactions to interact with those deployed smart contracts. The experience for both users and developers will be the exactly the same as Layer 1. However, the gas fee is significantly reduced and transactions are pre-confirmed instantly on Scroll (withdraw only takes a few minutes to finalize).

Workflow for zkEVM Even if the workflow outside remains the same, the underlying processing procedure for Layer 1 and Layer 2 are entirely different: