Ant Group open-sources its trusted privacy computing framework "Argot": open and universal
2022-07-05 18:40:00 【Zhiyuan community】
The data circulation industry has entered the era of ciphertext computing, and a trusted privacy computing framework can meet the differing needs of various scenarios.
On July 4, Ant Group announced that it was officially open-sourcing its trusted privacy computing framework, "Argot" (SecretFlow), to developers worldwide.
Argot is a trusted privacy computing framework that Ant Group developed independently over six years, with security and openness as its core design principles. It covers almost all current mainstream privacy computing technologies.
According to the announcement, Argot has built-in virtual devices for ciphertext computing such as MPC, TEE, and homomorphic encryption, and provides multiple federated learning algorithms and differential privacy mechanisms. Through its layered design and out-of-the-box privacy-preserving data analysis, machine learning, and other functions, it lowers the technical threshold for developers, helps privacy computing be applied to AI, data analysis, and other fields, and addresses industry pain points such as privacy protection and data silos.
Having been applied in Ant Group's own large-scale business and in external financial and medical scenarios, Argot balances security and performance. At the launch event, Ant Group introduced a number of Argot's characteristics.
What kind of open source privacy computing framework do we need?
Privacy computing is a new interdisciplinary field involving cryptography, machine learning, hardware, BI analysis, and more. It includes technical routes such as multi-party secure computation (MPC), federated learning (FL), trusted execution environments (TEE), trusted ciphertext computing (TECC), homomorphic encryption, and differential privacy, and it spans many specialized technology stacks.
As a key technology for balancing data security with data circulation, privacy computing allows data to be analyzed and computed on without the provider disclosing the original data, so that data in circulation and integration is "usable but not visible" and "computable but not identifiable".
Practical experience over the past few years has shown the industry that privacy computing has many technical directions, that different scenarios call for different technical solutions, and that the work spans many fields and needs experts from each of them to cooperate. For practitioners the learning curve is steep, and users without a privacy computing background find it hard to use.
In real development work, a privacy computing solution is often a combination of several technical routes, and the process involves a lot of repetition. For example, a developer who wants federated learning builds on framework A; one who wants multi-party secure computation (MPC) builds on framework B; and one who wants trusted hardware must first become familiar with the architecture of the chosen hardware. Real business needs, however, often require several technologies to be used together, which leads to tedious, repetitive development. Each framework is a technical innovation, but together they create the problem of technology silos ("chimneys").
Worse still, in a solution that crosses technical routes, introducing a new underlying technology affects all the work in the layers above and slows technical iteration. Many things on top will certainly change, and users may have to go through the entire deployment again, which makes for a very poor experience.
Current open source privacy computing frameworks, such as TensorFlow Federated (TFF), FATE, FederatedScope, Rosetta, FedLearner, and Primihub, almost all target a single privacy computing route. These frameworks give some support to community research and industrial applications of privacy computing. However, increasingly diverse application requirements in real scenarios, together with the limitations of the technologies themselves, pose new challenges for existing frameworks.
For example, Google, the technology giant that first proposed federated learning and also created TensorFlow, has recently increased its investment in a newer platform, JAX. The move prompted speculation in the industry that TensorFlow will gradually be replaced.
Google responded as follows:
In recent years, we have found that a single general-purpose framework is often not suitable for all scenarios; in particular, the needs of production and of cutting-edge research often conflict.
Solving the problems of open source privacy computing frameworks
Ant Group's Argot responds to this state of the industry and opens a path toward general-purpose privacy computing.
Wang Lei, head of the Argot framework and general manager of Ant Group's privacy intelligent computing department, said that Ant began work on Argot in 2016 as a purely technology-driven, forward-looking investment, an experiment incubated inside the company.
Argot's technology evolved from matrix transformation, to trusted execution environments (TEE), and then to multi-party secure computation, federated learning, and more. Through internal and external application scenarios, its performance has reached the point of supporting large-scale data sets. It also has successful large-scale deployments in finance, healthcare, and other fields, supporting inter-institution data flow at Shanghai Pudong Development Bank and the medical insurance DRG (Diagnosis Related Group) reform at a Grade-A tertiary hospital in Zhejiang. It has received the "Xinghe Case" award from the China Academy of Information and Communications Technology, the CCF Science and Technology Award (Science and Technology Progress Excellence Award), and the "Typical Practice Cases of Data Security" recognition from the China Cyberspace Security Association, and it was selected for the Ministry of Industry and Information Technology's 2021 list of pilot demonstration projects for big data industry development.
After six years of technical accumulation, with a comprehensive technical system and mature deployment experience in place, Argot is now officially open source. What are its advantages?
Argot's design goal is to make it very easy for data scientists and machine learning developers to use privacy computing for data analysis and machine learning modeling, without needing to know the underlying technical details. Its overall architecture is divided into five layers, from bottom to top:
The bottom layer is the resource management layer, which has two main responsibilities. First, for the business delivery team, it hides the differences in underlying infrastructure across institutions, reducing deployment, operation, and maintenance costs. Second, by managing the resources of different institutions in a unified way, it addresses high availability and stability once the business scales up.
Above it is the plaintext/ciphertext computing device and primitive layer. It provides a unified programmable device abstraction: privacy computing technologies such as multi-party secure computation (MPC), homomorphic encryption (HE), and trusted hardware (TEE) are abstracted as ciphertext devices, and single-party local computation is abstracted as plaintext devices. It also provides basic algorithms that do not fit the device abstraction, such as differential privacy (DP) and secure aggregation. Because the design is loosely coupled, new ciphertext computing technologies that appear in the future can be integrated into the framework.
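To make the device abstraction more concrete, here is a minimal sketch in plain Python. It is not Argot's API: the class names `PlaintextDevice` and `SecretSharingDevice`, the methods, and the modulus are all assumptions chosen for illustration. It shows one party computing locally on a plaintext device and then handing the result to a toy ciphertext device based on additive secret sharing, one of the simplest MPC building blocks.

```python
import secrets

MODULUS = 2**61 - 1  # illustrative modulus for the shares (an assumption)


class PlaintextDevice:
    """Hypothetical plaintext device: local computation by a single party."""

    def __init__(self, owner):
        self.owner = owner

    def add(self, a, b):
        return a + b


class SecretSharingDevice:
    """Hypothetical ciphertext device: values live as additive shares."""

    def __init__(self, parties):
        self.parties = list(parties)

    def encrypt(self, value):
        # Split the value into one random share per party; shares sum to it mod MODULUS.
        shares = [secrets.randbelow(MODULUS) for _ in self.parties[:-1]]
        shares.append((value - sum(shares)) % MODULUS)
        return shares

    def add(self, a, b):
        # Each party adds its own two shares locally; no party sees the inputs.
        return [(x + y) % MODULUS for x, y in zip(a, b)]

    def reveal(self, shares):
        # Only combining all shares recovers the plaintext result.
        return sum(shares) % MODULUS


alice = PlaintextDevice("alice")
mpc = SecretSharingDevice(["alice", "bob", "carol"])

income_a = mpc.encrypt(alice.add(5000, 200))  # computed locally, then shared
income_b = mpc.encrypt(7300)
total = mpc.reveal(mpc.add(income_a, income_b))
print(total)
```

Both devices expose the same `add` call, which is the point of the abstraction: the algorithm author chooses where a computation runs, not how the protocol works. A real ciphertext device would also need multiplication, fixed-point encoding, and network communication, all of which this sketch omits.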
Next up is the plaintext/ciphertext hybrid scheduling layer. It gives the upper layers an interface for mixed plaintext and ciphertext programming, together with a unified device scheduling abstraction. The upper-layer algorithm is described as a directed acyclic graph, the logical computation graph, in which each node represents a computation on one device and each edge represents data flow between devices. The distributed framework then splits the logical computation graph further and schedules it onto physical nodes. Here Argot borrows from mainstream deep learning frameworks, which represent a neural network as a computation graph of operators on devices and tensors flowing between devices.
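The logical computation graph described here can be sketched with the Python standard library. The graph contents, the device labels, and the `run` helper below are hypothetical and are not taken from Argot; the sketch only shows nodes tagged with a device being executed in dependency order. It assumes Python 3.9 or later for `graphlib`.

```python
from graphlib import TopologicalSorter

# Hypothetical logical graph: node name -> (device label, function, input nodes).
graph = {
    "x_alice": ("alice", lambda: 3, []),
    "x_bob": ("bob", lambda: 4, []),
    "sum": ("mpc", lambda a, b: a + b, ["x_alice", "x_bob"]),
    "scaled": ("alice", lambda s: s * 10, ["sum"]),
}


def run(graph):
    """Execute nodes in dependency order and record where each one was placed."""
    sorter = TopologicalSorter({name: deps for name, (_, _, deps) in graph.items()})
    results, trace = {}, []
    for name in sorter.static_order():
        device, fn, deps = graph[name]
        # Edges carry data: a node's inputs are the results of its predecessors.
        results[name] = fn(*(results[d] for d in deps))
        trace.append((name, device))
    return results, trace


results, trace = run(graph)
print(results["scaled"], trace)
```

In this sketch the device label only records placement, and every function runs in the same process. A real scheduler would dispatch each node to the backend of its device and could run independent nodes, such as `x_alice` and `x_bob`, in parallel.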
Next is the AI & BI privacy algorithm layer. This layer hides the details of privacy computing technology while keeping its concepts, with the aim of lowering the threshold for developing privacy computing algorithms and improving development efficiency. Developers who need to build such algorithms can design specialized ones for their own scenarios and business characteristics, striking their own balance among security, computational performance, and computational accuracy. Argot itself also provides some general algorithm capabilities at this layer, such as MPC-based LR/XGB/NN, federated learning algorithms, and SQL capabilities.
The top layer is the user interface layer. Argot's goal is not to be an end-to-end product but to let different businesses gain comprehensive privacy computing capabilities by integrating Argot quickly. Argot therefore provides a thin product API layer at the top, plus some atomic front-end and back-end SDKs, to reduce the cost of integrating it.
Integrating the current mainstream privacy computing technologies and allowing them to be assembled flexibly to fit each scenario is Argot's most visible advantage. The underlying point is that, within this framework, developers have many options: they can run experiments and iterate in their own field through Argot, and verify technology faster and at lower cost. Verified technology can in turn be used by developers working in other technical directions. Wang Lei sees Argot as more of a developers' platform that brings together developers with different specialties, in keeping with the spirit of open source.
In detail, the highlights of Argot's first open source release are the modules lit up in the figure:
- MPC device. Supports most of the NumPy API and automatic differentiation, provides LR and NN demos, supports Padé high-precision fixed-point fitting algorithms, and supports the ABY3 and Cheetah protocols. Users can program in the traditional style and develop MPC-based AI algorithms without knowing the MPC protocols;
- HE device. Supports the Paillier homomorphic encryption algorithm and exposes a NumPy programming interface (API) to the upper layers. Users can perform matrix addition or ciphertext matrix multiplication through the NumPy interface, and data can be transferred between the HE device and MPC ciphertext devices;
- Differential privacy security primitives. Implements several differential privacy noise mechanisms, a secure noise generator, and a privacy cost calculator;
- Plaintext/ciphertext mixed programming. Supports a centralized programming model: use @device to mark up a mixed computation graph of plaintext and ciphertext devices, with parallel, asynchronous task scheduling based on the computation graph;
- Data preprocessing. Provides standardization, discretization, and binning for horizontal scenarios, and correlation coefficient matrices and WOE binning for vertical scenarios. Connects seamlessly to existing dataframes, with a user experience consistent with sklearn;
- AI & BI privacy algorithms: multi-party secure computation. Provides the XGBoost algorithm, adds the HESS-LR algorithm, and combines differential privacy to strengthen the privacy protection of split learning;
- AI & BI privacy algorithms: federated learning. Provides federated learning model construction, with gradient aggregation in multiple security modes including SecureAggregation, MPC Aggregation, and PlaintextAggregation. Users only need to supply the participant list and the aggregation method when building the model; the subsequent experience, from data reading and preprocessing to model training, is almost the same as traditional plaintext programming.
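As an illustration of the secure aggregation idea mentioned in the federated learning item, the following sketch uses pairwise masking, a common textbook construction. It is not Argot's SecureAggregation implementation. The function names, the modulus, and the use of a seeded `random.Random` in place of secrets agreed between each pair of parties are all assumptions, and the updates are assumed to be already encoded as integers.

```python
import random

MODULUS = 2**32  # illustrative ring size for integer-encoded updates (an assumption)


def pairwise_masks(parties, length, seed=0):
    """Give each party a mask vector such that all masks cancel when summed."""
    masks = {p: [0] * length for p in parties}
    # Stand-in for pairwise agreed secrets; not cryptographically secure.
    rng = random.Random(seed)
    for i, p in enumerate(parties):
        for q in parties[i + 1:]:
            shared = [rng.randrange(MODULUS) for _ in range(length)]
            for k, r in enumerate(shared):
                masks[p][k] = (masks[p][k] + r) % MODULUS  # p adds the pair's mask
                masks[q][k] = (masks[q][k] - r) % MODULUS  # q subtracts the same mask
    return masks


def secure_sum(updates):
    """Return the element-wise sum while the aggregator sees only masked vectors."""
    parties = sorted(updates)
    length = len(updates[parties[0]])
    masks = pairwise_masks(parties, length)
    masked = {
        p: [(v + m) % MODULUS for v, m in zip(updates[p], masks[p])]
        for p in parties
    }
    total = [sum(column) % MODULUS for column in zip(*masked.values())]
    return total, masked


updates = {"alice": [1, 2, 3], "bob": [10, 20, 30], "carol": [100, 200, 300]}
total, masked = secure_sum(updates)
print(total)
```

Each masked vector looks random on its own, but the masks cancel in the sum, so the aggregator learns only the aggregate. Production protocols additionally handle parties that drop out mid-round and derive the masks from real key agreement, which this sketch does not attempt.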
In short, the main points are:
- For algorithm/model development: using the programming capabilities Argot provides, more algorithms and models can be migrated easily and quickly, with enhanced privacy protection.
- For co-building underlying security: underlying cryptography and security research results can be embedded into Argot, improving the capability, performance, and security of ciphertext devices and carrying over into real business applications.
- Argot will also be updated in subsequent open source releases, gradually lighting up more modules.
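For readers unfamiliar with the Paillier scheme named in the HE device item above, the sketch below shows its additive homomorphism using only the Python standard library. It is a textbook toy, not Argot's implementation: the function names are made up for illustration, the two fixed Mersenne primes are far too small for real use, and the generator is assumed to be g = n + 1. It needs Python 3.9 or later for `math.lcm`.

```python
import math
import secrets


def keygen(p=2**31 - 1, q=2**61 - 1):
    # Small fixed primes keep the sketch readable; real keys use large random primes.
    n = p * q
    lam = math.lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # this shortcut is valid because g = n + 1
    return n, (lam, mu)


def encrypt(n, m):
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1  # random r in [1, n - 1]
        if math.gcd(r, n) == 1:
            break
    # With g = n + 1, g**m mod n**2 equals 1 + m*n.
    return (1 + m * n) % n2 * pow(r, n, n2) % n2


def decrypt(n, key, c):
    lam, mu = key
    return (pow(c, lam, n * n) - 1) // n * mu % n


def add(n, c1, c2):
    # Multiplying ciphertexts adds the underlying plaintexts.
    return c1 * c2 % (n * n)


def scale(n, c, k):
    # Raising a ciphertext to a plaintext power k multiplies the plaintext by k.
    return pow(c, k, n * n)


n, key = keygen()
c_sum = add(n, encrypt(n, 12), encrypt(n, 30))
c_scaled = scale(n, encrypt(n, 7), 6)
print(decrypt(n, key, c_sum), decrypt(n, key, c_scaled))
```

Ciphertext addition and multiplication of a ciphertext by a plaintext are exactly the two operations needed for encrypted matrix addition and for multiplying a plaintext matrix by a ciphertext matrix. Paillier does not support multiplying two ciphertexts together.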