zk-ECDSA 第 2 部分:底层原理

中文：https://mp.weixin.qq.com/s/xV35d1WWQJnSImxYFgkZbA

by Yi Sun, Tony L., Wen-Ding L., and gubsheep

Earlier this year, we presented circom-ecdsa, an efficient proof-of-concept implementation of zkSNARK circuits for ECDSA algorithms in circom. This post goes more in-depth on some of the building blocks that go into the circuit constructions.

In a future post, we'll also discuss some of the more advanced optimizations we implemented for secp256k1 operations.

How does ECDSA work?

In Part 1 of our zk-ECDSA series, we covered the role that the Elliptic Curve Digital Signature Algorithm (ECDSA) plays in networks including Bitcoin and Ethereum. Now, we'll dive into the implementation details and see how to translate ECDSA operations into arithmetic circuits in circom, a programming language for ZK circuits. These circuits in turn specify a zkSNARK that we can use to implement proof of group membership, private airdrops, on-chain vote aggregation, etc. (For more details on how this "circuit to zkSNARK" translation is done, see Vitalik's post and a writeup from zCash.)

For our purposes, we'll focus on two ECDSA operations: (1) converting a private key to a public key; and (2) verifying an ECDSA signature. We won't need to implement signature generation itself, since zero-knowledge verification of a signature's validity accomplishes the same thing as zero-knowledge verification of the valid computation of a signature.

The ECDSA protocol takes several protocol parameters: an elliptic curve equation, a prime number specifying the field which (x, y) coordinates on the curve are drawn from, a generator point on the curve, and the order of the group generated by the generator point. We'll be operating on the secp256k1 curve. Specifically, the curve is defined by the equation:

y^2 \equiv x^3 + 7~(mod~p)y2≡x3+7 (mod p)

where the 256-bit prime p is

p=2^{256}-2^{32}-2^9-2^8-2^7-2^6-2^4-1p=2256−232−29−28−27−26−24−1

The secp256k1 standard also specifies a particular value for generator point G and corresponding group order n. See here for more details on the parameters.

Additionally, Ethereum relies on the cryptographically secure hash function keccak256, which is what we'll be using here. We use a circom implementation of keccak from Vocdoni.

ECDSAPrivToPub

Given a private key, ECDSA tells us how to generate a corresponding public key that can be used for signature verification.

The private key d_A is a randomly generated integer in the range [1, n - 1] and the public key is defined as Q_A = d_A * G. In other words, we simply need to multiply our base point G by the private key, using elliptic curve multiplication by a scalar. This will be done using a combination of elliptic curve point addition and point doubling.

ECDSAVerify

Signing a message m with a public key Q_A produces a signature (r, s). Now, say we are given a signature (r, s) and public key Q_A. We want to check that this signature is valid.

You can find a complete description of the signature verification algorithm here. At a high level, this algorithm can be broken down into the following operations:

Hashing the message m (with keccak256)
Modular multiplication and inverse
Elliptic curve multiplication by a scalar, and elliptic curve point addition

Circuits and Techniques

Now that we have a good understanding of the operations involved, we're onto the fun part: let's translate these operations into arithmetic circuits. Compared to using a procedural programming language, writing circuits in circom both introduces additional constraints, but also allows us to use a few tricks to reduce the amount of computation required. We'll measure progress based on how many constraints our final circuits require.

As a brief refresher on arithmetic circuits, a constraint is simply an equation of the form A * B + C = 0, where A, B, and C are linear combinations of signals, which for our purposes can be thought of as variables.

Producing a constraint-satisfying witness (variable assignments for intermediate values) is relatively easy to do since we can perform any operations to generate the witness--not just + and *. However, ensuring that our constraints are sufficient is much harder. Our goal is to construct circuits such that the output signals are the correct function of the input signals. In other words, there should be no way to manipulate the values without violating the constraints. Currently, we rely on auditing our code and running tests to verify the correctness of these circuits.

BigInt Arithmetic

BigInt Representation

First, let's find a suitable representation of big integers. Recall that today's zkSNARK's proving systems generally use elliptic curves with a 254-bit prime that effectively limits the register1 sizes to 254 bits. However, the prime p for secp256k1 is 256 bits, which creates a bit of awkwardness.

If we want to support multiplication "natively", we should limit our register size to at most (254 - 1) / 2 = 126 bits, since multiplying two k-bit numbers results in a 2k-bit number. However, this would require ceil(256 / 126) = 3 registers to represent integers in [0, p - 1]. Then we're "wasting" 3 * 126 - 256 = 122 bits, which is basically a full register

Given that we're forced to use 3 registers already, we can simply pick the number of bits to be ceil(256 / 3) = 86. This way, we're only "wasting" 3 * 86 - 256 = 2 bits. Not terrible! This is the best we can do given the constraints.

Let's assume 86-bit registers going forward, keeping in mind that we need 3 of them to represent integer ranges we care about. We'll represent a[k] and b[k] as two integers, each using k registers.

BigInt Comparison/Addition/Subtraction

For comparisons, we can simply check register by register, e.g. checking a < b amounts to:

a[k - 1] < b[k - 1]
or (a[k - 1] == b[k - 1] and a[k - 2] < b[k - 2])
or (a[k - 1] == b[k - 1] and a[k - 2] = b[k - 2] and a[k - 3] < b[k - 3])
etc...

These can be done by simply composing the existing AND/OR circuits in circom.

For addition, we can simply sum up each pair of registers a[i] + b[i] and emulate the carry with a separate signal. Subtraction is similar, if we assume a >= b and calculate a[i] - b[i] with a borrow signal.

BigInt Multiplication

Normally, with k registers, a naive multiplication implementation uses two for-loops to compute intermediate products, for O(k^2) constraints, followed by a bunch of carries that takes O(nk) constraints (the main cost being a set of range checks that each of 2k registers is at most n bits). Our initial implementation of bigint multiplication required several thousand constraints to verify the multiplication of two 256-bit numbers. Can we do better?

Our strategy is as follows. First, we define a few terms:

Define the proper representation of a number n in base x to be an array a such that each value a[i] is a nonnegative integer less than x, and such that n = a[0] + a[1] * x + ... + a[k - 1] * x ** (k - 1). Note that for any nonnegative integer n and any base x, there is exactly one proper representation of n.
Define an m-overflow representation of n to be an array a such that each value a[i] is a nonnegative integer less than m * x, and such that n = a[0] + a[1] * x + ... + a[k - 1] * x ** (k - 1). Note that a nonnegative integer n may have many different m-overflow representations for the same base x.

In the first part of our approach, we find a specific k * 2**n-overflow representation (c[2k]) of the product of our two bigint numbers, represented as a[k] and b[k]. Importantly, we pick a particular k * 2**n-overflow representation that can be efficiently constrained, inspired by a trick from xJsnark. In the second part of our approach, we build a circuit that constrains an overflow representation of a number to equal its proper representation--a process we call reducing.

Let's break down the first part. Since our result is 2k registers, the best we can hope for is roughly 2k constraints, e.g. one for each register.

We can define the polynomials:

A(x) = a[0] + a[1] * x + ... + a[k - 1] * x ** (k - 1)
B(x) = b[0] + b[1] * x + ... + b[k - 1] * x ** (k - 1)
C(x) = c[0] + c[1] * x + ... + c[2 * k - 2] * x ** (2 * k - 2)

If we ignore carries (we'll get back to these later), verifying a multiplication a * b = c is the same as verifying the polynomial identity A(x) * B(x) = C(x) using x = 2 ** 86, where the coefficients c[j] may be up to k * (2 ** 86) ** 2. The two for-loop implementation amounts to calculating each of the coefficients c[j] = sum a[i] * b[j - i] separately. However, what if we just evaluate the two polynomials on 2k - 1 points?

The key insight here: if two degree-m polynomials agree at m + 1 distinct points, they must be equal. For instance, two points uniquely determine a line (degree-1), three points determine a parabola (degree-2), and so on. Since our polynomials have degree 2 * k - 2, the 2 * k - 1 constraints C(y) = A(y) * B(y) for y = 0, ..., 2 * k - 2 are enough to check polynomial equality. Because each of A(y), B(y), C(y) is simply a weighted sum of the signals a[i], b[i], c[i], each equality only costs a single constraint in our computational model.

This "polynomial trick" allows us to obtain a K * (2 ** n)-overflow representation of our desired product, reducing the set of constraints from O(k ** 2) to O(k)!

Now we are left with the problem of reducing this overflow representation into the proper representation. In other words, we'd like to prove the equivalence of our overflow representation c[2k] and the proper representation d[2k]. Our strategy here will be to initialize an array of intermediate signals carry[2k], and going register-by-register to verify that the difference between c[i] and d[i] (including carries from previous registers) is always an integer multiple of 2 ** n. We use the following constraints:

Initialize intermediate signals carry[2k].
Constrain carry[0] = c[0] - d[0] * (2 ** -n).
Constrain carry[i] = (c[i] - d[i] + carry[i-1]) * (2 ** -n) for all i > 0. We can do this with additions and multiplications by constraining 2 ** n * carry[i] = c[i] - d[i] + carry[i-1].
Range check that each value d[i] is at most n bits.
Range check that each value carry[i] is at most n + log(k) bits.
Constrain carry[2k-1] = 0.

BigInt Division/Modulo

As it turns out, division is only slightly more complicated than multiplication in terms of constructing constraints. If we write a / b as (q, r), where q is the quotient and r is the remainder, we simply need to constrain a = b * q + r and 0 <= r < b to hold. We still need to calculate the values using long division to produce a witness, but constraining this witness is much more straightforward as we can reuse the BigInt multiplication constraints.

With division, we can implement the a mod p operation, which is just the remainder upon division by p. We can use this circuit to build addition, subtractions, and multiplication mod p as well! The only operation left is modular inverse, which we can constrain as a_inv * a = 1 (mod p) and reuse our multiplication circuit. We can calculate a_inv for the witness by using Fermat's Little Theorem, i.e. a ** (p - 2) = a_inv (mod p). Once again, we can turn division into multiplication when writing constraints.

ECC Operations

In this section, we discuss how to go from BigInt arithmetic to elliptic curve operations over a non-native field. Constraint counts below are quoted for our v0.0.0 implementation. In the past few months we've implemented further optimizations, which we're still in the process of cleaning up and documenting.

Point Addition and Doubling

In the conventional computational model, modular inverse is substantially more expensive than modular multiplication. As a result, point addition and doubling are usually done in Jacobian form, which removes modular inversions at the cost of more modular multiplications. In our setting, modular inverse and multiplication cost the same number of constraints, so we use the more efficient standard formulas in Weierstrass form. This is another example of how programming zkSNARKs creates a new regime for performance optimization.

Scalar Multiplication

Implementing ECDSAPrivToPub requires scalar multiplication d_A * G of a base point G on secp256k1 by the private key d_A, a scalar. The standard algorithm for this is double-and-add, which uses the binary representation of d_A:

d_A = bits[0] + bits[1] * 2 + ... + bits[255] * (2 ** 255).

It works by successively doubling and adding the base point, with pseudocode:

def double_and_add(bits):
    res = O                   // O is the additive identity on secp256k1
    for i in [0, ..., 255]:
        res = res + res       // doubling on secp256k1
        if bits[i] == 1:
            res = res + G     // addition of unequal points on secp256k1
    return res

For a 256-bit private key, in the worst case this requires 256 point additions and doublings, which translates to 9037920 constraints. Because the base point G is known and fixed in our setting, we can reduce the number of point operations by precomputing and caching multiples of G by powers of two. We can then compute the public key as the sum of cached multiples of G corresponding to the binary digits of the private key.

Gpow[256] = precompute_G_powers()  // Gpow[i] = (2 ** i) * G

def cached_double_and_add(bits):
    let res = O                    // O is the additive identity on secp256k1
    for i in [0, ..., 255]:
        if bits[i] == 1:
            res = res + Gpow[i]    // addition of unequal points on secp256k1
    return res

This cached double-and-add method requires at most 256 point additions and no point doublings, reducing the number of constraints to 3733697.

In our implementation, we further reduce the number of point additions using a cached version of the windowed method. That is, we group binary digits of the private key into groups of size stride, cache multiples of G corresponding to numbers with non-zero binary digits only in a single group, and represent G as a sum of the cached elliptic curve points:

// Gpow[i][j] = j * (2 ** (i * stride)) * G for j = 1, ..., 2**stride - 1
Gpow[ceil(256 / stride)][2 ** stride] = precompute_G_powers(stride)

def cached_windowed(bits, stride):
    let res = O                     // O is the additive identity on secp256k1
    for i in [0, ..., ceil(256 / stride) - 1]:
        j = bits[i * stride] + ... + bits[(i + 1) * stride - 1] * 2**(stride - 1)
        if j != 0:
            res = res + Gpow[i][j]  // addition of unequal points on secp256k1
    return res

This cached windowed method reduces the number of point additions to ceil(256 / stride), which decreases as stride increases. However, as stride increases, we must maintain a growing cache of size ceil(256 / stride) * 2**stride points and select the jth element from Gpow[i] using a multiplexer with 2 ** stride possible selections. Surprisingly, this is an expensive operation in a zkSNARK, requiring O(2 ** stride) constraints per selection.

Running an exhaustive search for the stride which balances constraint usage in point additions and multiplexers, we find the following relation between the stride and number of constraints:

Untitled

The number of constraints seems to be minimized at stride 10. The resulting stride 10 windowed method is what we use in our circuit with 416883 constraints, an over 20-fold decrease from our starting point!

Outlook and Future Work

Further optimizing the R1CS representation size

Beyond the strategy that we laid out above, there are still some opportunities for further optimizing our circuits in R1CS form. We'll briefly mention one common technique: lazy reduction.

After each big integer arithmetic operations which we apply during an ECC operation, we keep the big integer in range [0, p - 1] and keep all the registers in the desired range as well. As a result, we have two kinds of reduction which correspond to keeping these values in range. However, it is possible to skip some of these reductions, and only do the reduction when it is necessary. This is commonly called lazy reduction, and with this we are in fact able to further reduce the total number of constraints. In a future blog post, we'll go into some of these optimizations in more detail.

New Proving System - PLONK

One advantage of PLONK is that it doesn't require a trusted setup per application-specific circuit, and this will greatly ease the burden of developers who want to deploy ZK applications in real world. Our circuits in R1CS form can be trivially converted to the form for PLONK with just standard addition and multiplication gates. One caveat of such naive conversion is that addition gates do not come for free like in Groth16, so the circuits might be suboptimal.

We are aware that there are many recent developments of PLONKish proving systems with carefully designed custom gates or lookup tables which have very promising results on implementing ECC operations (e.g. Turbo Plonk). We are also thinking about optimize our circuit by taking advantage of the efficient range proof in PLONK and PLOOKUP, and redesign our circuit with smaller size of register. The bottom line right now however is that we'd like to deliver an implementation that actually works in today's circom ecosystem, as it can help developers to prototype and develop new zk applications more easily.

Pairing-Based Crypto and Other Primitives

It is possible to instead build an ECC module with a pairing-friendly curve besides the secp256k1 curve we used here since most of our techniques are general enough for other curves as well. Another 0xPARC team will soon be sharing an implementation of pairing based crypto inside a zkSNARK, which can potentially lead to additional applications like BLS signatures.

Circuit Auditing and Verification

As we noted in the first part of this serie, these circuits are unaudited and we do not recommend them for production use as of now. We did our best effort to have witness tests cover most circuit modules, but generally speaking, witness testing alone is not enough. There are a few directions we can take to address this issue:

Developing better testing method: Currently we only test whether witness generations is correct and satisfied the circuit constraints, it would be better to have a test that are able to catch "under-contrained" bugs such as the one report here.
Applying formal verfication: Besides testing, formal method is also suitable in our case because the spec for our circuits is easy to specified mathematically.
Submitting for professional audit: If there is strong demand, we can also submit them to professional auditting company for security audit.

Conclusion

We invite you to check out our repo and would love to hear any feedback, ideas for improvement / optimization, or application experiments you'd like to build on top!

This project was built during 0xPARC's Applied ZK Learning Group #1, and further optimized during the 0xPARC ZK ID Working Group.

We use a circom implementation of keccak from Vocdoni. We also use some circom utilities for converting an ECDSA public key to an Ethereum address implemented by lsankar4033, jefflau, and veronicaz41 for another ZK Learning Group project in the same cohort. We use an optimization for big integer multiplication from xJsnark.

Footnotes

In the literature these are commonly referred to as "limbs."↩
Concretely, the "waste" here comes from needing to perform range checks for unnecessarily large registers. Range checks are one of the most common (and most expensive, constraints-wise) operations in ZK circuits. The constraint cost of a range check is proportional to the number of bits of the number we are checking; range-checking three 126-bit numbers is more expensive than range-checking three 86-bit numbers.↩

今年早些时候，我们介绍了circom-ecdsa，这是一种用于 circom 中 ECDSA 算法的 zkSNARK 电路的有效概念验证实现。这篇文章更深入地介绍了电路结构中的一些构建块。

在以后的文章中，我们还将讨论我们为 secp256k1 操作实施的一些更高级的优化。

ECDSA 是如何工作的？

在我们的 zk-ECDSA 系列的第 1 部分中，我们介绍了椭圆曲线数字签名算法 (ECDSA)在包括比特币和以太坊在内的网络中所扮演的角色。现在，我们将深入研究实现细节，看看如何将 ECDSA 操作转换为 circom 中的算术电路，circom是一种用于 ZK 电路的编程语言。这些电路又指定了一个 zkSNARK，我们可以使用它来实现组成员身份证明、私人空投、链上投票聚合等。（有关如何完成“电路到 zkSNARK”转换的更多详细信息，请参阅**Vitalik 的帖子**和来自 zCash 的文章。）

出于我们的目的，我们将关注两个 ECDSA 操作：（1）将私钥转换为公钥；(2) 验证 ECDSA 签名。我们不需要自己实现签名生成，因为签名有效性的零知识验证与签名有效计算的零知识验证完成相同的事情。

ECDSA 协议采用几个协议参数：椭圆曲线方程、指定(x, y)曲线上坐标所来自的字段的素数、曲线上的生成点以及生成点生成的组的顺序。我们将在secp256k1曲线上运行。具体来说，曲线由以下等式定义：

是的2≡X3+7 ( m o d p )

256 位素数在p哪里

p=2256-232-29-28-27-26-24-1

Gsecp256k1 标准还为生成器点和相应的组顺序指定了一个特定值n。有关参数的更多详细信息，请参见此处。

此外，以太坊依赖于加密安全散列函数keccak256，这就是我们将在这里使用的。我们使用来自 Vocdoni的 keccak 的 circom 实现。

ECDSAPrivToPub

给定一个私钥，ECDSA 告诉我们如何生成对应的公钥，用于签名验证。

私钥d_A是范围内随机生成的整数，[1, n - 1]公钥定义为Q_A = d_A * G。换句话说，我们只需要将我们的基点乘以G私钥，使用椭圆曲线乘以一个标量。这将使用椭圆曲线点加法和点倍增的组合来完成。

ECDSA验证

m使用公钥对消息进行Q_A签名会产生签名(r, s)。现在，假设我们得到了一个签名(r, s)和公钥Q_A。我们要检查此签名是否有效。

您可以在此处找到签名验证算法的完整说明。在高层次上，该算法可以分解为以下操作：

散列消息m
模乘法和逆
椭圆曲线乘以标量和椭圆曲线点加法

电路和技术

现在我们对所涉及的操作有了很好的理解，我们进入了有趣的部分：让我们将这些操作转换为算术电路。与使用过程式编程语言相比，在 circom 中编写电路既引入了额外的约束，也允许我们使用一些技巧来减少所需的计算量。我们将根据最终电路需要多少约束来衡量进度。

作为对算术电路的简要回顾，约束只是一个形式为的方程A * B + C = 0，其中A、B和C是信号的线性组合，就我们的目的而言，可以将其视为变量。

生成满足约束的见证（中间值的变量分配）相对容易，因为我们可以执行任何操作来生成见证——不仅仅是+和*。然而，确保我们的约束是足够的要困难得多。我们的目标是构建电路，使输出信号是输入信号的正确函数。换句话说，不应该违反约束来操纵这些值。目前，我们依靠审核我们的代码和运行测试来验证这些电路的正确性。