https://bwetzel.medium.com/zero-knowledge-machine-learning-zkml-projects-exploring-the-space-fc9d5f04fb65
Over the past year, zero-knowledge technology has developed significantly, and in 2023 we are seeing a remarkable increase in its adoption across the blockchain sector.
In parallel, machine learning (ML) deployments are becoming more intricate. Numerous enterprises now rely on ML-as-a-service providers (Amazon, Google, Microsoft, among others) to run complex, proprietary ML models. As these services proliferate, the models behind them become progressively harder to audit and understand. This raises a vital question: how can consumers of these services trust that the predictions they receive are valid?
ZKML offers a solution: it makes it possible to verify computations that apply public models to private data, or private models to public data.
Zero-knowledge (ZK) proofs are a cryptographic mechanism with which a prover can demonstrate to a verifier that a given statement is true without revealing any information beyond the fact that the statement is true. The field of ZK proofs has made significant strides on several fronts, from research to protocol implementations and real-world applications.
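To make the definition concrete, here is a toy sketch of the Schnorr identification protocol. It is a classic interactive proof in which the prover convinces the verifier that it knows a secret exponent x behind a public value y = g^x mod p, without revealing x. The protocol is zero-knowledge against an honest verifier. The group parameters (P = 2039, Q = 1019, G = 4) and the function names are illustrative assumptions, and the numbers are far too small to be secure. The SNARKs and STARKs used in practice are much more general constructions.

```python
import secrets

# Toy parameters (assumption): P = 2Q + 1 is a safe prime and G = 4 generates
# the subgroup of prime order Q. Real deployments use ~256-bit group orders.
P = 2039
Q = 1019
G = 4


def keygen() -> tuple[int, int]:
    """Return (secret x, public y = G^x mod P)."""
    x = secrets.randbelow(Q - 1) + 1
    return x, pow(G, x, P)


def prover_commit() -> tuple[int, int]:
    """Prover picks a random nonce r and sends the commitment t = G^r."""
    r = secrets.randbelow(Q)
    return r, pow(G, r, P)


def verifier_challenge() -> int:
    """Verifier replies with a random challenge c."""
    return secrets.randbelow(Q)


def prover_respond(x: int, r: int, c: int) -> int:
    """Response s = r + c*x mod Q; the random nonce r masks the secret x."""
    return (r + c * x) % Q


def verify(y: int, t: int, c: int, s: int) -> bool:
    """Accept iff G^s == t * y^c (mod P)."""
    return pow(G, s, P) == (t * pow(y, c, P)) % P


x, y = keygen()
r, t = prover_commit()
c = verifier_challenge()
s = prover_respond(x, r, c)
print(verify(y, t, c, s))  # True
```

The transcript (t, c, s) convinces the verifier because only someone who knows x can answer an unpredictable challenge c. Because r is uniformly random, s on its own reveals nothing about x.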
ZK proofs rely on two primary “primitives,” or building blocks. The first is the ability to produce proofs of computational integrity for a given computation, where validating the proof is substantially cheaper than executing the computation itself. This property is called “succinctness.” The second is the option to conceal specific parts of the computation while still proving that it was carried out correctly. This property is called “zero-knowledge.”
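Succinctness means that checking a claim costs less than redoing the work. A classic non-cryptographic analogy is Freivalds' algorithm, used here purely as an illustration. It checks a claimed matrix product C = AB, the operation that dominates neural-network inference, in O(n^2) time per round instead of the O(n^3) needed to recompute it. It is a randomized check rather than a ZK proof: it hides nothing, and the verifier must see A, B, and C. The function names are hypothetical.

```python
import random


def matmul(a, b):
    """Naive O(n^3) product: the expensive computation the prover performs."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[t] * b[t][j] for t in range(inner)) for j in range(cols)] for row in a]


def matvec(m, v):
    """O(n^2) matrix-vector product."""
    return [sum(x * y for x, y in zip(row, v)) for row in m]


def freivalds_check(a, b, c, rounds=20):
    """Check the claim a @ b == c without recomputing a @ b.

    Each round costs three matrix-vector products. A false claim passes a
    single round with probability at most 1/2, so at most 2**-rounds overall.
    """
    cols = len(b[0])
    for _ in range(rounds):
        r = [random.randint(0, 1) for _ in range(cols)]
        if matvec(a, matvec(b, r)) != matvec(c, r):
            return False
    return True


n = 60
a = [[random.randint(-9, 9) for _ in range(n)] for _ in range(n)]
b = [[random.randint(-9, 9) for _ in range(n)] for _ in range(n)]
c = matmul(a, b)
print(freivalds_check(a, b, c))  # True: a correct product always passes
c[3][7] += 1
print(freivalds_check(a, b, c))  # False, except with probability 2**-20
```

SNARKs and STARKs achieve a far stronger version of the same asymmetry, and they also let the prover keep parts of the computation hidden.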
If you want to learn more about ZK, I recommend the ZKP MOOC.
The State of ZK report, a quarterly publication that examines key developments in the zero-knowledge ecosystem, highlights the trends that have generated the most interest within the ZK community:
The leading use case for ZK is privacy. The zero-knowledge primitive allows specific parts of the computation being validated to remain hidden. This capability is particularly useful for building applications that protect users’ privacy and personal data while still producing cryptographic attestations. Noteworthy initiatives in this area include Semaphore, MACI, Penumbra, and Aztec Network.
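Semaphore, for example, lets a member prove that their identity commitment is one of the leaves of a group’s Merkle tree without revealing which leaf. The sketch below shows only the plain membership check that such a circuit encodes. In the ZK version, the leaf and its authentication path are private witnesses and only the root is public. SHA-256 stands in for the circuit-friendly hash (such as Poseidon) that ZK systems typically use, and the identity values are hypothetical.

```python
import hashlib


def h(left: bytes, right: bytes) -> bytes:
    # SHA-256 as a stand-in for a circuit-friendly hash such as Poseidon.
    return hashlib.sha256(left + right).digest()


def build_tree(leaves: list[bytes]) -> list[list[bytes]]:
    """Return every level, leaves first; assumes len(leaves) is a power of two."""
    levels = [leaves]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([h(prev[i], prev[i + 1]) for i in range(0, len(prev), 2)])
    return levels


def auth_path(levels: list[list[bytes]], index: int) -> list[tuple[bytes, int]]:
    """Sibling hashes from leaf to root, each paired with 1 if the current node is a right child."""
    path = []
    for level in levels[:-1]:
        path.append((level[index ^ 1], index & 1))
        index //= 2
    return path


def verify_membership(root: bytes, leaf: bytes, path: list[tuple[bytes, int]]) -> bool:
    """Recompute the root from the leaf: the check a membership circuit enforces."""
    node = leaf
    for sibling, is_right in path:
        node = h(sibling, node) if is_right else h(node, sibling)
    return node == root


# Hypothetical identity commitments for an 8-member group.
members = [hashlib.sha256(f"identity-{i}".encode()).digest() for i in range(8)]
levels = build_tree(members)
root = levels[-1][0]
print(verify_membership(root, members[5], auth_path(levels, 5)))  # True
```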
ZK for scaling ranks second. Distributed systems such as public blockchains have limited computational capacity because every participating node (computer) must re-execute the computations in each block to validate them. With ZK proofs, these computations can instead be performed off-chain, a ZK proof generated, and only the proof verified on-chain, achieving scalability while maintaining security and decentralization. Examples include Starknet, Scroll, Polygon Zero, Polygon Miden, Polygon zkEVM, and zkSync.
Identity also draws attention, indicating growing interest in applying ZK technology to identity management, including proof-of-personhood protocols that create cryptographic attestations. Notable initiatives in this area include WorldID, Sismo, Clique, and Axiom.
When community members were asked about the most exciting new use cases, ZKML stood out as the most appealing (besides interoperability and zkBridges). The remainder of this article focuses on ZKML as a way to efficiently verify that computations have been executed correctly, which has far-reaching implications beyond blockchains.
Creating zero-knowledge proofs requires significant computational resources, often far more than the original computation. As a result, some computations are impractical to prove because generating the proof takes too long. However, recent progress in cryptography, hardware, and distributed systems has made it possible to generate proofs for increasingly intensive computations, which widens the range of applications for which zero-knowledge proofs can be used. A recent study by the Modulus Labs team, “The Cost of Intelligence,” benchmarks various existing ZK proof systems on a wide range of models of varying sizes.
As AI technology continues to advance, it becomes more challenging to distinguish AI-generated content from human-generated content. However, zero-knowledge cryptography may hold the potential to solve this problem by enabling us to determine whether a particular piece of content was produced by applying a specific model to a given input, without revealing any additional information about the model or the input. In the case of large language models such as GPT-4, the creation of a zero-knowledge circuit representation could provide a means for verifying their outputs.
The zero-knowledge property of these proofs would also let us conceal any sensitive parts of the input or the model if necessary. For example, in the healthcare sector, a user could obtain the result of a model’s inference on their personal data without disclosing that data to any external entity.
ZKML is still a nascent technology, and many use cases have yet to be explored. Below are some of the most obvious ones, which are also highlighted in Worldcoin’s article and Elena Burger’s post.
Validity proofs such as SNARKs and STARKs have the capability to demonstrate that a computation has been executed correctly, and this can be applied to machine learning by verifying ML model inference or that a model generated a particular output based on a specific input. The ability to easily prove and verify that the output is the result of a particular model and input combination allows for the deployment of machine learning models on specialized hardware off-chain, while the ZK proofs can be conveniently verified on-chain.
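In circuit terms, the statement being proven is a relation between public values and a private witness. The sketch below writes that relation out for a hypothetical integer linear classifier. The public inputs are a commitment to the weights, the input x, and the claimed output y, and the witness is the weights themselves. A SNARK or STARK prover would show that it knows weights satisfying `relation` without revealing them, and the verifier would check a short proof instead of running the model. The JSON serialization, the SHA-256 commitment, and all names are assumptions. The code only evaluates the relation in the clear and does not generate a proof.

```python
import hashlib
import json


def commit(weights: dict) -> str:
    """Hash of a canonical serialization, standing in for a model commitment."""
    return hashlib.sha256(json.dumps(weights, sort_keys=True).encode()).hexdigest()


def infer(weights: dict, x: list[int]) -> int:
    """Hypothetical integer linear classifier: argmax of W @ x + b."""
    scores = [sum(w * xi for w, xi in zip(row, x)) + bias
              for row, bias in zip(weights["W"], weights["b"])]
    return max(range(len(scores)), key=scores.__getitem__)


def relation(public: dict, witness: dict) -> bool:
    """R(public; witness): the model matches the commitment and maps x to y."""
    return (commit(witness) == public["commitment"]
            and infer(witness, public["x"]) == public["y"])


model = {"W": [[2, -1, 0], [0, 1, 3]], "b": [1, -2]}
x = [4, 1, 2]
public = {"commitment": commit(model), "x": x, "y": infer(model, x)}
print(public["y"], relation(public, model))  # 0 True
```

Swapping in different weights fails the commitment check, even if they produce the same y. That check is what ties the proof to one specific model.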
Discussions of ZKML typically focus on generating zero-knowledge proofs of a model’s inference step rather than of its training or of the validity of the training data. Training is already highly computationally intensive on its own, and proving it would add substantial further overhead.
Aside from validity proofs, zero-knowledge cryptography can also preserve privacy in machine learning applications. One example is proving that a model achieves a certain accuracy on test data without revealing its weights. Another is privacy-preserving inference: private patient data can be used for medical diagnostics, and the sensitive result, such as a cancer test outcome, can be delivered to the patient without revealing their data to any third party.
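The accuracy claim can be phrased as a relation in the same way. The public statement is a commitment to the weights, a test set, and a minimum number of correct predictions, while the weights stay private. The sketch below uses a hypothetical binary classifier and evaluates the relation in the clear without generating a proof. The salted SHA-256 commitment is an assumption, chosen so that the public commitment does not let anyone brute-force a small weight vector.

```python
import hashlib
import secrets


def commit(weights: list[int], salt: bytes) -> str:
    """Salted (hiding) commitment to the weights."""
    encoded = ",".join(str(w) for w in weights).encode()
    return hashlib.sha256(salt + b"|" + encoded).hexdigest()


def predict(weights: list[int], x: list[int]) -> int:
    """Hypothetical binary classifier: 1 if the dot product is positive."""
    return int(sum(w * xi for w, xi in zip(weights, x)) > 0)


def accuracy_relation(commitment: str, test_set: list, min_correct: int,
                      weights: list[int], salt: bytes) -> bool:
    """Public: commitment, test_set, min_correct. Private witness: weights, salt."""
    if commit(weights, salt) != commitment:
        return False
    correct = sum(predict(weights, x) == y for x, y in test_set)
    return correct >= min_correct


weights, salt = [3, -2, 1], secrets.token_bytes(16)
commitment = commit(weights, salt)
test_set = [([1, 0, 0], 1), ([0, 1, 0], 0), ([1, 1, 1], 1), ([0, 2, 1], 0)]
print(accuracy_relation(commitment, test_set, 3, weights, salt))  # True (4 of 4 correct)
print(accuracy_relation(commitment, test_set, 5, weights, salt))  # False
```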
In cases where companies offer access to ML models through their APIs, it can be difficult for users to know whether the provider is actually offering the model that they claim to be providing, since the API is essentially a black box. Validity proofs that are associated with an ML model API would be valuable in providing transparency to the user, as they could verify which model they are utilizing.
Performing machine learning inference or training in a decentralized way while allowing people to submit data to a public model requires deploying an existing model on-chain or building a new network. Zero-knowledge proofs can be used to compress the model.
To incorporate attestations from external verified parties, such as a digital platform or hardware that can produce a digital signature, into a smart contract running on-chain, one can verify the signature using a zero-knowledge proof and use it as an input in a program. This method can be applied to any digitally attested information, providing a means of verifying authenticity and provenance from a trusted source. Endpoints that generate digital signatures can be verified and used in this way.
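Here is a minimal sketch of the check that would be proven. It assumes a hypothetical device that signs its readings with textbook RSA, and a program that accepts a reading only if the signature verifies under the device’s public key. In a ZK setting, this verification would run inside the circuit, so the on-chain verifier learns that the computation consumed a correctly signed reading. The tiny key (p = 61, q = 53, e = 17) and unpadded RSA are insecure and were chosen only for readability. Real attestations use standardized signature schemes such as ECDSA or EdDSA.

```python
import hashlib

# Textbook RSA with tiny primes: insecure, for illustration only.
P, Q, E = 61, 53, 17
N = P * Q                          # public modulus (3233)
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent, held only by the device


def digest(message: bytes) -> int:
    """Hash the message and reduce it into the RSA domain."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def sign(message: bytes) -> int:
    """Run by the attesting device or platform."""
    return pow(digest(message), D, N)


def verify(message: bytes, signature: int) -> bool:
    """Needs only the public key (N, E); this is the check a circuit would encode."""
    return pow(signature, E, N) == digest(message)


def program(message: bytes, signature: int) -> int:
    """Hypothetical program: consume a reading only if it is correctly attested."""
    if not verify(message, signature):
        raise ValueError("attestation rejected")
    return int(message.decode()) * 2


reading = b"21"
sig = sign(reading)
print(program(reading, sig))  # 42
```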
As advances in cryptography, hardware, and distributed systems make zero-knowledge proofs feasible for increasingly intensive computations, a growing number of projects are exploring ZKML. The illustration below gives a non-exhaustive overview of current projects. Categories may overlap, and the presentation is simplified for clarity. Numerous open-source codebases for building ZKML applications are also available, reflecting growing interest and excitement in the community.
![Overview of ZKML projects](https://miro.medium.com/v2/resize:fit:1400/1*jqZqZ8uHw9tzFS4AFfggxw.jpeg)
Not comprehensive; as of May 10, 2023; strict demarcations are not possible between categories
As ZK technology continues to advance, it is becoming increasingly feasible to prove larger machine learning models on less powerful machines in a shorter period of time. This is due to improvements in specialized hardware, proof system architecture, and more efficient ZK protocol implementations. As a result of these advancements, new ZKML applications and use cases are expected to emerge.
While the big-picture use case of ZKML in Web3 is to enable on-chain organizations to run machine learning models, the fast-paced evolution of ZKML offers potential solutions to intricate problems in multiple fields. In my view, the following use cases could arise in this context:
⁃ Decentralized finance (DeFi): using ZKML to validate yield-maximizing strategies or rebalancing of pools for customers. One example of this is RockyBot
⁃ Gaming: using ZKML to validate betting mechanisms or AI-enhanced players. An example of this is Leela vs the World
⁃ Identity: using ZKML to perform AI analysis on user biometric information while ensuring custody of the data. An example of this is WorldID
⁃ Healthcare: ZKML can be utilized in the medical field for disease prediction by running machine learning models over sensitive medical data while preserving privacy
Although ZKML shows great potential, the field is still in its early stages. One challenge is that accuracy and fidelity may be lost when a model is converted into a circuit. Another limitation is that the parameters and activations of many machine learning models are stored as 32-bit floating-point numbers, which current zero-knowledge proof systems struggle to represent in the finite-field arithmetic of their circuits without significant overhead.
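A common workaround is to quantize floating-point values to fixed-point integers and embed them in the proof system’s prime field. The sketch below does this under assumed parameters: the Mersenne prime 2^61 - 1 as an example modulus and 8 fractional bits, both hypothetical choices. Negative numbers wrap around to p - |v|, and every multiplication doubles the scale factor, so a circuit must include rescaling steps. Rounding at each step is one source of the accuracy loss described above.

```python
# Hypothetical parameters: example field modulus and 8 fractional bits.
P = 2**61 - 1   # Mersenne prime used as the field modulus
SCALE = 1 << 8  # fixed-point scale factor (256)


def to_field(x: float) -> int:
    """Round to fixed point; negatives wrap around to P - |v|."""
    return round(x * SCALE) % P


def from_field(v: int, scale: int = SCALE) -> float:
    """Treat elements above P // 2 as negative, then undo the scale."""
    signed = v - P if v > P // 2 else v
    return signed / scale


def field_dot(ws: list[float], xs: list[float]) -> int:
    """Dot product computed with field arithmetic; the result has scale SCALE**2."""
    return sum(to_field(w) * to_field(x) for w, x in zip(ws, xs)) % P


w = [0.731, -1.25, 0.0042]
x = [1.5, 0.333, -2.0]
exact = sum(a * b for a, b in zip(w, x))
approx = from_field(field_dot(w, x), SCALE * SCALE)
print(approx)               # 0.6728515625
print(abs(approx - exact))  # rounding error of about 0.001
```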
Currently, the field of ZKML is still catching up as zero-knowledge proofs continue to be optimized to handle increasingly complex machine learning models.
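Acknowledgements: the ZKML community’s awesome-zkml repository on GitHub, Worldcoin’s [article](https://worldcoin.org/blog/engineering/intro-to-zkml), and Elena Burger’s post. Many thanks to [Daniel Shorr](https://twitter.com/realDanielShorr) and Diogo Almeida for conversations around this topic.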
If you have a project in ZKML, please reach out!
You can follow me on Twitter and LinkedIn.