Big news! Ant Group open-sources the trusted privacy computing framework SecretFlow: flexible assembly of mainstream technologies and a developer-friendly layered design


2022-07-06 17:51:00 CSDN information


On July 4, Ant Group announced that it was officially open-sourcing its trusted privacy computing framework SecretFlow ("隐语") for developers worldwide. SecretFlow is released under the Apache-2.0 license, and its code is hosted on both GitHub and Gitee. Its extensible architecture lets a single general-purpose framework uniformly support a range of mainstream privacy computing technologies, including MPC, TEE, FL, HE and DP. These technologies can be combined flexibly to provide different solutions for different application scenarios.


Six years of technical accumulation: SecretFlow tackles the problem of applying privacy computing

SecretFlow was born inside Ant in 2016 as an "experimental project". It took its first steps with matrix transformation techniques, then moved to trusted execution environments (TEE), and later to secure multi-party computation (MPC), federated learning (FL) and more. Along the way it has steadily broadened its technical scope and has been deployed successfully in real-world scenarios in finance, healthcare and other fields.

Privacy computing theory has been developing for more than 40 years. At the application level, however, the industry still faces several obstacles that must be overcome:

  • Privacy computing technologies go in many directions, and each scenario has its own most suitable technical solution;

  • Privacy computing has a steep learning curve, so users without a privacy computing background find it hard to use;

  • Privacy computing spans many fields and requires the cooperation of experts across them.

At this stage, privacy computing is still a relatively new interdisciplinary field. It draws on cryptography, machine learning, databases, trusted hardware and other areas. It also spans several technical routes, including secure multi-party computation (MPC), federated learning (FL), trusted execution environments (TEE) and trusted encrypted computing (TECC). With so many specialized technology stacks involved, it is hard to build a system that is both complete and secure. SecretFlow's design goal is to let data scientists and machine learning developers use privacy computing for data analysis and machine learning modeling easily, without needing to understand the underlying technical details.

So how can one framework meet the different needs of developers working at different levels?

To achieve this goal, SecretFlow provides a device abstraction layer. Privacy computing technologies such as secure multi-party computation (MPC), homomorphic encryption (HE) and trusted execution environments (TEE) are abstracted as ciphertext devices, while single-party computation is abstracted as plaintext devices.


On top of this abstraction, a data analysis or machine learning workflow can be represented as a computation graph. Nodes represent computations on a device, and edges represent data flowing between devices. When data flows between devices of different types, the protocol conversion happens automatically. Here SecretFlow borrows from mainstream deep learning frameworks, which represent a neural network as a computation graph of operators on devices and tensors flowing between devices.
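The graph model can be made concrete with a minimal Python sketch. Everything in it is hypothetical: the Device and Node classes, the device kinds "plain", "mpc" and "he", and the conversions helper are illustrative names, not SecretFlow's API. The sketch only applies the rule described above: any edge whose two endpoints sit on different kinds of device is a place where a protocol conversion has to be inserted.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Device:
    name: str
    kind: str  # hypothetical kinds: "plain", "mpc", "he"


@dataclass
class Node:
    name: str
    device: Device
    inputs: list = field(default_factory=list)  # upstream Nodes (graph edges)


def conversions(nodes):
    """List every edge whose endpoints live on different device kinds."""
    found = []
    for node in nodes:
        for src in node.inputs:
            if src.device.kind != node.device.kind:
                # A real framework would insert the conversion here (secret-share,
                # encrypt, reveal, ...); this sketch only records where it is needed.
                found.append((src.name, node.name, f"{src.device.kind}->{node.device.kind}"))
    return found


alice = Device("alice", "plain")
bob = Device("bob", "plain")
mpc = Device("mpc", "mpc")
he = Device("he", "he")

load_x = Node("load_x", alice)
load_y = Node("load_y", bob)
enc_y = Node("enc_y", he, [load_y])
joint = Node("joint_stats", mpc, [load_x, enc_y])
report = Node("report", alice, [joint])

for edge in conversions([load_x, load_y, enc_y, joint, report]):
    print(edge)
```

In this graph, four edges need conversion: bob's plaintext into HE, HE into MPC, alice's plaintext into MPC, and the MPC result back to plaintext for alice's report.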

SecretFlow breaks this process down into framework layers. From bottom to top, it has carried out the following design and research:


Resource management: this layer has two main responsibilities. First, it serves business delivery teams by hiding differences in the underlying infrastructure of different institutions, which lowers their deployment, operations and maintenance costs. Second, it schedules and manages the resources of different institutions in a unified way, addressing the scale and high-availability requirements of production scenarios.

Plaintext/ciphertext computing devices and primitives layer: this layer provides a unified, programmable device abstraction. Privacy computing technologies such as MPC, HE and trusted hardware (TEE) are abstracted as ciphertext devices, and single-party local computation is abstracted as plaintext devices. The layer also provides basic algorithms that do not fit the device abstraction, such as differential privacy (DP) and secure aggregation. Because the design is loosely coupled, new ciphertext computing technologies can be integrated into the framework as they emerge.

Plaintext/ciphertext hybrid scheduling layer: this layer gives the upper layers an interface for mixed plaintext/ciphertext programming, and it also provides a unified device scheduling abstraction. The upper-layer algorithm is described as a directed acyclic graph, the logical computation graph, in which nodes represent computations on devices and edges represent data flows between devices. The distributed framework then splits the logical graph further and schedules the pieces onto physical nodes.
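The following sketch gives a rough idea of how a logical DAG can be split into schedulable work. It uses Python's standard graphlib module to group tasks into stages; tasks in the same stage do not depend on each other and could run in parallel. The task names and the placement table mapping tasks to hosts are invented for the example. SecretFlow's real scheduler is a distributed system and is not shown here.

```python
from graphlib import TopologicalSorter


def parallel_stages(graph):
    """Group tasks into stages; graph maps each task to the tasks it depends on."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph is not a DAG
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every task whose inputs are finished
        stages.append(ready)
        sorter.done(*ready)
    return stages


# Hypothetical logical graph: task -> set of upstream tasks.
logical = {
    "preprocess_alice": set(),
    "preprocess_bob": set(),
    "train_on_spu": {"preprocess_alice", "preprocess_bob"},
    "evaluate_alice": {"train_on_spu"},
}

# Hypothetical placement of each task onto physical hosts.
placement = {
    "preprocess_alice": "alice-host",
    "preprocess_bob": "bob-host",
    "train_on_spu": "alice-host+bob-host",
    "evaluate_alice": "alice-host",
}

for i, stage in enumerate(parallel_stages(logical)):
    print(i, [(task, placement[task]) for task in stage])
```

The two preprocessing tasks share the first stage, so they can run on their own hosts at the same time. The jointly executed training step has to wait for both.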

AI & BI privacy algorithm layer: this layer hides the technical details of privacy computing while preserving its essence. The aim is to lower the barrier to developing privacy computing algorithms and to improve development efficiency. Developers who need privacy computing algorithms can design specialized ones for their own scenarios and business characteristics. In doing so, they can strike the balance their business needs among security, computational performance and computational accuracy. At this layer, SecretFlow also provides some general-purpose algorithms, such as MPC-based LR/XGB/NN, federated learning algorithms and SQL capabilities.

User interface layer: SecretFlow does not aim to be an end-to-end product. Its goal is to let different businesses quickly gain comprehensive privacy computing capabilities by integrating it. SecretFlow therefore provides a thin product API layer at the top, along with some atomic front-end and back-end SDKs, to reduce the cost of integrating it into a business.


With openness at its core, SecretFlow aims for the best developer experience

Looking back over SecretFlow's layered structure, the framework is built around the core idea of openness. Its abstractions at different levels give each type of developer a good development experience:

At the device layer, well-defined device and protocol interfaces let more devices and protocols be plugged in. This suits developers with backgrounds in cryptography, trusted hardware, hardware acceleration and similar fields. It also helps expand the types and functions of ciphertext computing and keeps improving protocol security and computing performance.

The algorithm layer provides a flexible programming interface for machine learning that suits algorithm developers. They can define their own algorithms in the same way they would with a traditional machine learning framework.

So which modules has SecretFlow opened in this first open-source release, and what functions do they support?

Figure: Open-source modules of SecretFlow V0.6

  • MPC device

The MPC device supports most of the NumPy API as well as automatic differentiation, and it ships LR- and NN-related demos. It supports high-precision fixed-point approximation of functions based on Padé approximants, along with the ABY3 and Cheetah protocols. Users can develop MPC-based AI algorithms in the traditional algorithm programming style without knowing the details of the MPC protocols.
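Two ideas sit underneath an MPC device, and the following sketch shows both for readers new to MPC. The first is fixed-point encoding of real numbers into an integer ring. The second is two-party additive secret sharing, where addition needs no communication. This is a textbook construction chosen for illustration. It is not ABY3 (a three-party protocol) or Cheetah, and the 64-bit ring and 16 fractional bits are assumptions.

```python
import secrets

RING_BITS = 64
MOD = 1 << RING_BITS
FRAC_BITS = 16  # assumed fixed-point precision, for illustration only


def encode(x: float) -> int:
    """Map a real number to a fixed-point element of Z_{2^64}."""
    return round(x * (1 << FRAC_BITS)) % MOD


def decode(v: int) -> float:
    if v >= MOD // 2:  # the upper half of the ring represents negative numbers
        v -= MOD
    return v / (1 << FRAC_BITS)


def share(v: int) -> tuple[int, int]:
    """Split v into two uniformly random shares that sum to v mod 2^64."""
    r = secrets.randbelow(MOD)
    return r, (v - r) % MOD


def reconstruct(s0: int, s1: int) -> int:
    return (s0 + s1) % MOD


a0, a1 = share(encode(3.25))
b0, b1 = share(encode(-1.5))
# Each party adds the shares it holds locally; no messages are exchanged.
c0, c1 = (a0 + b0) % MOD, (a1 + b1) % MOD
print(decode(reconstruct(c0, c1)))  # 1.75
```

Each share on its own is a uniformly random ring element. Multiplication, comparison and nonlinear functions are where real protocols such as ABY3 and Cheetah do their heavy lifting.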

  • HE device

The HE device supports the Paillier homomorphic encryption algorithm and exposes a NumPy programming interface (API) to upper layers. Users can therefore perform ciphertext matrix addition or multiplication through NumPy interfaces. Data can also be converted back and forth between the HE device and the MPC ciphertext device.
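Paillier is additively homomorphic. Multiplying two ciphertexts adds the hidden plaintexts, and raising a ciphertext to a plaintext power multiplies the hidden value by that constant. The toy sketch below uses these two properties to multiply a plaintext matrix by an encrypted vector. It uses tiny hard-coded primes and is insecure. It is not the HE device's implementation, and the helper names are hypothetical.

```python
import math
import secrets

# Toy parameters: small primes, INSECURE, for illustration only.
P, Q = 1789, 1861
N = P * Q
N2 = N * N
G = N + 1
LAM = math.lcm(P - 1, Q - 1)
MU = pow((pow(G, LAM, N2) - 1) // N, -1, N)  # inverse of L(g^lambda mod n^2)


def encrypt(m: int) -> int:
    while True:
        r = secrets.randbelow(N)
        if r > 0 and math.gcd(r, N) == 1:
            break
    return (pow(G, m, N2) * pow(r, N, N2)) % N2


def decrypt(c: int) -> int:
    return ((pow(c, LAM, N2) - 1) // N * MU) % N


def add(c1: int, c2: int) -> int:
    """Enc(m1) * Enc(m2) = Enc(m1 + m2)."""
    return (c1 * c2) % N2


def mul_plain(c: int, k: int) -> int:
    """Enc(m) ** k = Enc(k * m) for a non-negative plaintext k."""
    return pow(c, k, N2)


def mat_vec(A, enc_x):
    """Plaintext matrix times encrypted vector, computed without decrypting."""
    out = []
    for row in A:
        acc = encrypt(0)
        for a, c in zip(row, enc_x):
            acc = add(acc, mul_plain(c, a))
        out.append(acc)
    return out


print(decrypt(add(encrypt(20), encrypt(22))))  # 42
A = [[1, 2], [4, 0]]
enc_x = [encrypt(v) for v in [3, 5]]
print([decrypt(c) for c in mat_vec(A, enc_x)])  # [13, 12]
```

Results live modulo N, so negative or very large values need an encoding layer on top. Multiplying two ciphertexts together is not supported by Paillier alone.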

  • Differential privacy security primitives

This module implements several differential privacy noise mechanisms, a secure noise generator and a privacy cost (budget) calculator.
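A minimal sketch can illustrate two of these pieces. It assumes the Laplace mechanism for a counting query with sensitivity 1. It also assumes a budget tracker that uses basic sequential composition, where the epsilons of successive queries simply add up. The class and function names are hypothetical. Noise is drawn from the operating system's CSPRNG via secrets.SystemRandom. Even so, this naive floating-point sampling is not a hardened secure noise generator, because floating-point Laplace sampling is known to leak information.

```python
import secrets

_rng = secrets.SystemRandom()  # OS-backed CSPRNG


def laplace_noise(scale: float) -> float:
    """The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale)."""
    return _rng.expovariate(1 / scale) - _rng.expovariate(1 / scale)


class BasicAccountant:
    """Tracks total epsilon under basic sequential composition."""

    def __init__(self, budget: float):
        self.budget = budget
        self.spent = 0.0

    def charge(self, epsilon: float) -> None:
        if self.spent + epsilon > self.budget:
            raise RuntimeError("privacy budget exceeded")
        self.spent += epsilon


def laplace_count(true_count: int, epsilon: float, accountant: BasicAccountant,
                  sensitivity: float = 1.0) -> float:
    accountant.charge(epsilon)  # refuse the query before releasing anything
    return true_count + laplace_noise(sensitivity / epsilon)


acct = BasicAccountant(budget=1.0)
print(laplace_count(120, 0.5, acct))  # noisy count, differs on every run
print(laplace_count(80, 0.5, acct))
print(acct.spent)  # 1.0
```

A third query with any positive epsilon would now raise RuntimeError. Tighter accountants, such as advanced composition or Rényi DP, let more queries fit in the same budget.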

  • Mixed plaintext/ciphertext programming

This module supports a centralized programming model. The @device annotation marks up a mixed computation graph of plaintext and ciphertext devices, and tasks are scheduled in parallel and asynchronously based on that graph.
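Annotating functions with the device they should run on can be sketched with a decorator that turns each call into an asynchronous task. The device decorator below is hypothetical and is not SecretFlow's actual @device implementation. Its "spu" function simply runs in plaintext. Passing one call's Future into another call is what forms the edges of the computation graph. Independent calls, such as the two loads, can run in parallel.

```python
import functools
from concurrent.futures import Future, ThreadPoolExecutor

_executor = ThreadPoolExecutor(max_workers=4)
trace = []  # (device name, function name), in submission order


def device(name):
    """Hypothetical annotation: calling the decorated function schedules it and returns a Future."""
    def decorate(fn):
        @functools.wraps(fn)
        def submit(*args):
            trace.append((name, fn.__name__))

            def run():
                # Waiting on upstream Futures is what turns calls into graph edges.
                resolved = [a.result() if isinstance(a, Future) else a for a in args]
                return fn(*resolved)

            return _executor.submit(run)
        return submit
    return decorate


@device("alice")
def load_alice():
    return [1.0, 2.0, 3.0]


@device("bob")
def load_bob():
    return [4.0, 5.0, 6.0]


@device("spu")  # stands in for a ciphertext device; computes in plaintext here
def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


result = dot(load_alice(), load_bob())  # returns a Future immediately
print(result.result())  # 32.0
print(trace)  # [('alice', 'load_alice'), ('bob', 'load_bob'), ('spu', 'dot')]
```

A production scheduler would dispatch a task only once its inputs are ready. This sketch instead blocks a worker thread while it waits, so a deep graph could exhaust the four-thread pool.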

  • Data preprocessing

For horizontal (sample-partitioned) scenarios, this module provides standardization, discretization and binning. For vertical (feature-partitioned) scenarios, it provides correlation coefficient matrices and WOE binning. It connects seamlessly to existing DataFrames and offers a usage experience consistent with sklearn.
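WOE (weight of evidence) binning gives each bin the log-ratio of its share of events to its share of non-events. IV (information value) sums the contributions of all bins. The sketch below shows only the arithmetic on plaintext data held by one party. In the vertical setting, labels and features belong to different parties and the statistics must be computed under protection, which is not shown here. Sign conventions for WOE differ between references; this sketch uses ln(%events / %non-events), and the function name is hypothetical.

```python
import math
from collections import Counter


def woe_iv(bins, labels):
    """Per-bin WOE and total IV; labels are 1 for an event ("bad"), 0 otherwise."""
    events = Counter(b for b, y in zip(bins, labels) if y == 1)
    non_events = Counter(b for b, y in zip(bins, labels) if y == 0)
    total_events = sum(events.values())
    total_non_events = sum(non_events.values())
    woe, iv = {}, 0.0
    for b in sorted(set(bins)):
        # Real implementations smooth zero counts; this sketch would fail on them.
        p_event = events[b] / total_events
        p_non_event = non_events[b] / total_non_events
        woe[b] = math.log(p_event / p_non_event)
        iv += (p_event - p_non_event) * woe[b]
    return woe, iv


bins = ["low"] * 4 + ["high"] * 4
labels = [1, 1, 1, 0] + [1, 0, 0, 0]
woe, iv = woe_iv(bins, labels)
print(woe)
print(iv)
```

In this example, "low" holds 75% of the events and 25% of the non-events, so its WOE is ln 3 ≈ 1.099. "high" gets -ln 3, and the total IV is ln 3.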

  • AI & BI privacy algorithms: secure multi-party computation

This module provides an XGBoost algorithm and adds the HESS-LR algorithm. It also combines split learning with differential privacy to strengthen its privacy protection.

  • AI & BI privacy algorithms: federated learning

This module supports building federated learning models, with gradient aggregation in several security modes, including SecureAggregation and MPC Aggregation. When building a model, users only need to specify the list of participants and the aggregation method. After that, everything from data reading through preprocessing to model training feels almost the same as traditional plaintext programming.
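One common way to build secure aggregation is pairwise masking. Every pair of participants shares a random mask that one adds and the other subtracts. Each masked update therefore looks random to the server, but the masks cancel in the sum. The sketch below is a simplified, hypothetical version of that idea over integers modulo 2^32. It assumes no participant drops out, and it generates the masks centrally rather than deriving them from pairwise key agreement. It is therefore not SecretFlow's SecureAggregation implementation.

```python
import secrets

MOD = 1 << 32  # updates are assumed to be already encoded as integers


def secure_sum(updates):
    """Sum per-party integer vectors; the aggregator only ever sees masked vectors."""
    parties = sorted(updates)
    dim = len(next(iter(updates.values())))
    masks = {p: [0] * dim for p in parties}
    for i, a in enumerate(parties):
        for b in parties[i + 1:]:
            for k in range(dim):
                m = secrets.randbelow(MOD)  # shared secret between parties a and b
                masks[a][k] = (masks[a][k] + m) % MOD
                masks[b][k] = (masks[b][k] - m) % MOD
    # What each party uploads: individually indistinguishable from random.
    masked = [[(u + m) % MOD for u, m in zip(updates[p], masks[p])] for p in parties]
    # The pairwise masks cancel coordinate by coordinate.
    return [sum(col) % MOD for col in zip(*masked)]


print(secure_sum({"alice": [1, 2], "bob": [3, 4], "carol": [5, 6]}))  # [9, 12]
```

Production protocols also secret-share the mask seeds, so that the sum can still be recovered when some participants drop out midway.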

In short, the main benefits are:

For algorithm and model development: SecretFlow's programming capabilities make it easy and fast to migrate more algorithms and models, with stronger privacy protection.

For joint work on the underlying security: results from cryptography and security research can be embedded into SecretFlow. This improves the capability, performance and security of ciphertext devices and turns the research into real business applications.

According to the SecretFlow open-source launch event, later open-source releases will continue to open up more modules step by step.


Built for developers: breaking through technical barriers with its "unique skills"

Back to the practical problem. The market already has many privacy computing frameworks, such as TFE, CrypTen and MP-SPDZ. Some are built on top of AI frameworks (TFE/CrypTen), and others start from secure computing (SPDZ), but both kinds have limitations. The former are often hard to deploy and hard to optimize specifically for security. The latter often require writing a toy AI framework, which makes them costly to learn.

Among the "unique skills" SecretFlow has built up, the ciphertext computing device SPU is one of the highlights of its innovative R&D.

SPU is short for SecretFlow Processing Unit. It serves as SecretFlow's ciphertext computing unit and provides secure computing services to the platform:


In recent years, ciphertext computing (MPC/HE) has made great progress in raw performance, but it still struggles to keep up with the algorithmic demands of AI. Federated learning, for example, implements only some sub-steps of an algorithm with secure computing, giving up some security in exchange for higher performance. When computing power cannot keep up with the algorithm, SecretFlow's answer is mixed plaintext/ciphertext computing, which strikes a balance between security and performance.

SecretFlow provides a very flexible paradigm for mixed plaintext/ciphertext programming. It restricts neither the plaintext engine nor the ciphertext engine. Developers can work in the frameworks they already know, then mark some parts of the code to run on the plaintext engine and other parts to run on SPU. For example:

Note: the MPC device in the figure is implemented by SPU.

By comparison, it is hard for TFE, CrypTen or SPDZ to strike this kind of balance between security and performance.

SPU's deployment mode is also transparent. Without changing a single line of code, existing models can run securely and correctly in any of the deployment scenarios above. Compared with privacy computing frameworks built on AI platforms, the SPU runtime is also very lightweight and does not need a Python runtime, so it is easy to deploy and integrate.

AI developers need no security background to apply existing models securely to multi-party data.

Security developers need no AI background. By implementing only the basic operators of secure computing, they can support a variety of front-end frameworks. They can also deploy and operate easily, trade security off against performance, and find the best deployment plan.

SPU decouples the AI front end from the MPC back end, so any security protocol added to SPU supports a variety of front ends transparently. Some teams have already built on the SecretFlow framework with results. For example, Alibaba Security's Gemini Lab has contributed part of its Cheetah protocol to SecretFlow, together with further optimizations.
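This decoupling can be illustrated with an abstract backend interface. Front-end code is written once against a few basic operators, and any backend that implements those operators can run it unchanged. In the hypothetical sketch below, a plaintext backend and a 64-bit fixed-point ring backend evaluate the same linear model. The ring backend stands in for a secure protocol. None of these classes are SPU's real interfaces.

```python
from abc import ABC, abstractmethod


class Backend(ABC):
    """Hypothetical operator interface that a protocol backend would implement."""

    @abstractmethod
    def encode(self, x: float): ...

    @abstractmethod
    def decode(self, v) -> float: ...

    @abstractmethod
    def add(self, a, b): ...

    @abstractmethod
    def mul(self, a, b): ...


class PlainBackend(Backend):
    def encode(self, x):
        return x

    def decode(self, v):
        return v

    def add(self, a, b):
        return a + b

    def mul(self, a, b):
        return a * b


class FixedPointRingBackend(Backend):
    """Stand-in for a protocol backend: Z_{2^64} arithmetic with 16 fractional bits."""
    MOD, FRAC = 1 << 64, 16

    def _signed(self, v):
        return v - self.MOD if v >= self.MOD // 2 else v

    def encode(self, x):
        return round(x * (1 << self.FRAC)) % self.MOD

    def decode(self, v):
        return self._signed(v) / (1 << self.FRAC)

    def add(self, a, b):
        return (a + b) % self.MOD

    def mul(self, a, b):
        # Multiply as signed values, then truncate the extra fractional bits.
        return ((self._signed(a) * self._signed(b)) >> self.FRAC) % self.MOD


def linear_model(backend: Backend, w, x, bias):
    """Front-end code written once against the Backend interface."""
    acc = backend.encode(bias)
    for wi, xi in zip(w, x):
        acc = backend.add(acc, backend.mul(backend.encode(wi), backend.encode(xi)))
    return backend.decode(acc)


for be in (PlainBackend(), FixedPointRingBackend()):
    print(type(be).__name__, linear_model(be, [0.5, -1.25], [2.0, 4.0], 0.75))  # -3.25
```

Adding a new backend means implementing four methods. The front-end function never changes, which is the property the paragraph above describes.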

Another highlight is that Cheetah, currently the fastest two-party secure computing protocol in the industry, has been contributed to SecretFlow, enabling deep collaboration.

Today, industry demand for privacy computing is dominated by two-party computation. Alice (the data consumer) wants to use Bob's data (the data source) to strengthen her business, but Bob does not want to hand his data over directly. Efficient secure two-party computation (2PC) has therefore become the key to this problem. To address it, Alibaba Security's Gemini Lab developed Cheetah, a secure two-party computation framework that breaks through many underlying 2PC bottlenecks and greatly improves overall two-party performance. Cheetah is more than 5 times faster than the previous best result, Microsoft's CryptFlow2 (CCS '20). It has been accepted at the USENIX Security Symposium, one of the four top international security conferences.
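At the lowest level, computing together without handing over data can be shown with the textbook Beaver-triple multiplication on two-party additive shares. Alice and Bob each hold one share of every value and only ever reveal masked differences. This is a generic construction, not Cheetah, which does not rely on a trusted dealer. The dealer that produces the triple here is an explicit simplifying assumption.

```python
import secrets

MOD = 1 << 64


def share(v):
    """Split v into [Alice's share, Bob's share]."""
    r = secrets.randbelow(MOD)
    return [r, (v - r) % MOD]


def reveal(shares):
    return sum(shares) % MOD


def beaver_triple():
    # ASSUMPTION: a trusted dealer samples a, b and hands out shares of a, b and a*b.
    a, b = secrets.randbelow(MOD), secrets.randbelow(MOD)
    return share(a), share(b), share(a * b % MOD)


def mul(x_sh, y_sh):
    a_sh, b_sh, c_sh = beaver_triple()
    # Both parties open d = x - a and e = y - b; random a and b hide x and y.
    d = reveal([(x - a) % MOD for x, a in zip(x_sh, a_sh)])
    e = reveal([(y - b) % MOD for y, b in zip(y_sh, b_sh)])
    # x*y = c + d*b + e*a + d*e, computed share by share.
    z_sh = [(c + d * b + e * a) % MOD for a, b, c in zip(a_sh, b_sh, c_sh)]
    z_sh[0] = (z_sh[0] + d * e) % MOD  # the public term is added by one party only
    return z_sh


alice_input = 1234  # known only to Alice
bob_input = 5678    # known only to Bob
z = mul(share(alice_input), share(bob_input))
print(reveal(z))  # 7006652
```

The only values ever opened, d and e, are uniformly random, so neither party learns the other's input. Practical 2PC protocols replace the dealer with oblivious transfer or homomorphic encryption.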

The Cheetah implementation in SecretFlow goes beyond what the paper makes public. It includes further optimizations: the public code supports 30-40-bit secret sharing, while the SecretFlow version supports larger secret sharing, such as 64-bit. It also includes some algorithms not disclosed in the paper. Most importantly, the implementation is transparent to SecretFlow's upper-layer business logic, so existing SecretFlow code needs no changes to adapt to it.
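Part of the reason a wider ring matters for fixed-point MPC is that multiplying two values with f fractional bits each yields 2f fractional bits before truncation. The intermediate product therefore needs far more room than its inputs. The sketch below is a plaintext illustration with an assumed 16 fractional bits, not Cheetah's actual parameters. In a signed 32-bit ring, the untruncated product x·y·2^32 only fits when |x·y| < 0.5, while a 64-bit ring handles much larger values.

```python
def fixed_mul(x: float, y: float, ring_bits: int, frac_bits: int = 16) -> float:
    """Multiply two fixed-point values inside Z_{2^ring_bits}, then truncate and decode."""
    mod = 1 << ring_bits
    scale = 1 << frac_bits
    ex = round(x * scale) % mod
    ey = round(y * scale) % mod
    prod = (ex * ey) % mod  # carries 2 * frac_bits fractional bits; may wrap around
    signed = prod - mod if prod >= mod // 2 else prod
    return (signed >> frac_bits) / scale  # drop the extra frac_bits, then decode


for x, y in [(0.25, 1.5), (1.5, 2.0), (100.0, 100.0)]:
    print(x, y, fixed_mul(x, y, 32), fixed_mul(x, y, 64))
```

The 32-bit column reads 0.375, 0.0 and 0.0: only the first product fits. The 64-bit column reads 0.375, 3.0 and 10000.0.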


Future plans for the SecretFlow open-source community

SecretFlow's logical device abstraction gives algorithm developers great flexibility. They can combine devices freely like building blocks and customize the computation on each one to build their own privacy computing algorithms. SecretFlow is open-sourced under Apache-2.0, which allows free download and use. More modules and functions will be opened in the code base over time. The developer documentation already includes some privacy-preserving algorithm examples, such as image classification based on federated learning, which developers can download, run and try out.

Beyond the technology itself, the framework has been strengthened in programmability and extensibility. The SecretFlow open-source community has also been officially launched. Around this community, Ant Group and SecretFlow will work with developers and researchers in many ways to build a privacy computing ecosystem together:

First, popularize privacy computing through articles, videos and other content across many channels, and strengthen communication with developers through open discussion;

Second, work with universities and research institutions on "online teaching" that combines industrial and academic perspectives. This will create more varied activities for developers and build up systematic privacy computing learning materials, shared publicly to help developers grow;

In addition, at the SecretFlow open-source launch event, Ant Group announced a joint "CCF-Ant Privacy Computing Special Research Fund" with the China Computer Federation (CCF). The fund will incubate and support privacy computing researchers through open calls, selection, and support for the in-depth development of innovative and valuable topics, backing frontier research in privacy computing.

Get started now and explore more interesting uses of SecretFlow:

Code:

https://github.com/secretflow

https://gitee.com/secretflow

Documentation:

SecretFlow:https://secretflow.readthedocs.io

SPU:https://spu.readthedocs.io

Copyright notice: this article was written by [CSDN information]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/187/202207060941216727.html