ZK Machine Learning

A tutorial and demo by Horace Pan, Francis Ho, and Henri Palacci

Introduction

Smart contracts on the Ethereum blockchain extend the range of what can be defined in code. However, the cost of on-chain computation and the public nature of blockchain operations make it hard to support compute-heavy applications that work on private or sensitive data, such as machine learning.

In this tutorial post, together with the associated repo and demo webapp, we explore how zero-knowledge proofs can help lift these barriers: the computation is performed off-chain, and a proof is produced that it was executed correctly, while the private data stays shielded. The proof can then be verified on-chain at a much smaller computational cost, enabling us to implement on-chain, private machine learning.

For this demo, we focused on implementing a simple convolutional neural network for handwritten digit recognition (MNIST).

Demo

To check out the demo, please follow the instructions in this GitHub repo, or play with the webapp demo at: **https://zkmnist.netlify.app/**

The webapp allows the user to "draw" a digit or to select an image of a digit from examples taken from the MNIST dataset. The neural network then classifies this handwritten digit, outputting a predicted digit together with a zero-knowledge proof that you hold an input image which the ML model classifies as that digit. Finally, this proof can be verified on-chain via a smart contract.

Background

This blog post is written for developers with some understanding of machine learning, but not necessarily any knowledge of blockchain technology.

For learning resources related to smart contracts, see the official Ethereum documentation. For tutorials related to applied ZK and Circom, see the learning resources from 0xPARC.

Why ML + zkSNARK?

In a typical supervised machine learning scenario, an input is provided to an ML model (composed of trained model parameters), and this results in an output that can be used by other entities downstream. With lightweight machine learning frameworks and interoperable formats such as ONNX, we can now perform ML inference on edge devices such as mobile phones or IoT devices without sending the (potentially sensitive) input to centralized servers. This improves scalability and privacy. However:

- There is often a desire to hide the input and/or the model parameters from public view.
  - The input to the ML model may be sensitive and private (e.g., personal financial info, personal biometric info, private video/audio/image data).
  - The model parameters may contain sensitive, private information (e.g., biometric authentication parameters).
- At the same time, the downstream entities that make use of the ML model's output (e.g., on-chain smart contracts) need to be certain that the input was correctly processed by the ML model to yield the claimed output.

ML + zkSNARK protocols can enable a new approach that satisfies these seemingly contradictory demands.
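
To make the two demands concrete, it can help to write down the statement that a proof would have to establish as an ordinary predicate. The sketch below is only an illustration under stated assumptions: `ToyLinearModel` and `relation_holds` are hypothetical names, the model is a tiny integer linear classifier rather than the demo's network, and nothing here is zero-knowledge. It shows only which value would be public (the claimed output) and which would stay private (the input); a zkSNARK would let a prover convince a verifier that the predicate holds without the verifier ever seeing the input.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ToyLinearModel:
    """Hypothetical integer-weight classifier standing in for a trained model."""
    weights: List[List[int]]  # one row of weights per class
    biases: List[int]         # one bias per class

    def predict(self, x: List[int]) -> int:
        # Score each class as a dot product plus bias.
        scores = [
            sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(self.weights, self.biases)
        ]
        # Return the index of the highest score (lowest index wins ties).
        return max(range(len(scores)), key=scores.__getitem__)


def relation_holds(model: ToyLinearModel,
                   claimed_output: int,       # public: what downstream consumers see
                   private_input: List[int],  # private: known only to the prover
                   ) -> bool:
    """The statement to be proved: the model maps the private input to the claimed output."""
    return model.predict(private_input) == claimed_output


if __name__ == "__main__":
    model = ToyLinearModel(weights=[[1, 0], [0, 1]], biases=[0, 0])
    secret_input = [3, 5]
    claimed = model.predict(secret_input)
    print(relation_holds(model, claimed, secret_input))  # True
    print(relation_holds(model, 0, secret_input))        # False
```

In this plain version, checking the predicate requires access to `private_input`. The point of the ML + zkSNARK approach is that the check is replaced by verifying a short proof, so the downstream entity learns that the claimed output is correct without receiving the input.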
