NetFC summary
2022-06-23 19:07:00 【Bachuan Xiaoxiaosheng】
NetFC: Enabling Accurate Floating-point Arithmetic on Programmable Switches
Why NetFC matters
In modern data centers, many data-intensive applications (such as big data analytics, distributed deep learning, image processing, and real-time stream processing) exchange data frequently, so their performance can degrade under heavy network communication overhead. Reducing network communication has therefore become a key factor in accelerating these applications. At the same time, the network itself can now provide computing power, so some computing tasks traditionally performed on hosts can be moved to network devices. Network devices then intercept and process traffic in real time before it reaches the host. In-network computing is attractive for two reasons: (1) packets can be consumed and processed while in transit, which greatly reduces network overhead (such as queuing delay and I/O cost); (2) moving computation into the network reduces the CPU load on servers (for example, in-network gradient aggregation and network telemetry systems).
Challenge
However, the computing power of the network is very limited. Even the most advanced programmable switches support only simple integer arithmetic (such as addition and subtraction).
Traditional floating-point methods cannot be deployed directly on a programmable switch, for three reasons:
- Limited computing power: programmable switches support only a few simple integer operations. Floating-point numbers, and arithmetic such as multiplication and division, are beyond what the switch can do.
- Scarce on-chip memory: the on-chip memory of a switch is very small, so it cannot provide large tables for floating-point operations. Part of that memory must also be reserved for storing and looking up forwarding rules, which makes the problem worse.
- Limited pipeline stages: the switch data plane consists of multiple stages, each of which is a packet processing unit with its own compute and storage resources. The number of stages is small, and two packet processing operations that depend on each other cannot be placed in the same stage.
These limits are an obstacle to in-network application acceleration, because many applications need to handle floating-point data and complex arithmetic (such as multiplication and division). Previous work supported floating-point operations indirectly, in one of two ways. The first converts floating-point numbers to integers through a complex negotiation mechanism on the servers, and does not support floating-point multiplication or division. The second offloads the computation to the switch's local CPU, which introduces significant latency. What is still missing is a scheme that performs floating-point arithmetic on a programmable switch in real time, inside the network, with almost no loss of accuracy.
A look-up table method can be used to support floating-point operations on programmable switches. The simplest approach is to use one table that lists every possible calculation: for a given arithmetic operation, the two operands together form the key, and the stored value is the result.
However, such a table is far too large to install on a programmable switch, because it must enumerate every combination of the two operands (for two 16-bit floating-point operands, about 8 GB of memory is needed).
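The 8 GB figure can be checked with a short calculation. The sketch below is only an illustration: it assumes each entry stores a 16-bit result and that the key itself is not stored, and `naive_table_bytes` is a hypothetical helper name.

```python
def naive_table_bytes(operand_bits: int, result_bits: int) -> int:
    """Bytes needed to store one result for every pair of operands.

    Assumption: only the result is stored per entry (the key is implicit).
    """
    entries = 2 ** (2 * operand_bits)   # every (a, b) combination
    return entries * result_bits // 8   # bits -> bytes


size = naive_table_bytes(operand_bits=16, result_bits=16)
print(size / 2**30, "GiB")  # 2^32 entries * 2 bytes = 8 GiB
```

Each extra operand bit multiplies the number of entries by four, which is why the direct table does not scale.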
Solution
To solve the problem of the oversized table, NetFC uses a divide-and-conquer approach.

Specifically, it uses logarithmic projection and transformation to replace the original large table with several much smaller tables, which are combined using the switch's built-in integer operations (addition and subtraction).
NetFC additionally uses a scaling factor mechanism to improve accuracy. NetFC approximates $\log_2(x)$ with $\lfloor \log_2(x) \rfloor$, which inevitably loses precision because the fractional part of $\log_2(x)$ is dropped. To address this, NetFC multiplies $\log_2(x)$ by a scaling factor $k$ before rounding down, which shifts part of the fraction into the integer part so that it is preserved. The scaling factor is divided back out in a later step to keep the result correct.
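The idea can be sketched in a few lines of Python. This is a minimal illustration, not NetFC's actual implementation: the two `*_lookup` functions stand in for precomputed switch tables, operands are assumed to be positive (sign handling is left out), and all function names are hypothetical.

```python
import math


def log_table_lookup(x: float, k: int) -> int:
    """Stand-in for a precomputed table: x -> floor(k * log2(x))."""
    return math.floor(k * math.log2(x))


def exp_table_lookup(s: int, k: int) -> float:
    """Stand-in for a precomputed table: s -> 2^(s / k).

    Dividing by k here removes the scaling factor again.
    """
    return 2.0 ** (s / k)


def approx_mul(x: float, y: float, k: int) -> float:
    # Multiplication becomes an integer addition of the two scaled logs.
    return exp_table_lookup(log_table_lookup(x, k) + log_table_lookup(y, k), k)


def approx_div(x: float, y: float, k: int) -> float:
    # Division becomes an integer subtraction of the two scaled logs.
    return exp_table_lookup(log_table_lookup(x, k) - log_table_lookup(y, k), k)


print(approx_mul(3.0, 5.0, k=1))     # 8.0: the fractional parts are lost
print(approx_mul(3.0, 5.0, k=1024))  # close to the exact product 15
```

With $k = 1$ the two floors discard most of the information in the operands. With a larger $k$ each floor loses less than $1/k$ in the log domain, so the result stays within a factor of $2^{-2/k}$ of the exact product.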
NetFC also uses a prefix-based lossless compression method to reduce on-chip memory usage. Specifically, a NetFC table may contain many consecutive entries with the same value, so the keys of those entries can be merged.
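One way to picture this merging is as prefix matching: an aligned run of keys that all map to the same value is replaced by a single entry keyed on their shared bit prefix. The sketch below assumes this interpretation; the summary does not spell out NetFC's exact scheme, and `compress` and `lookup` are hypothetical names.

```python
def compress(values, key_bits):
    """Merge aligned runs of equal values into (prefix, prefix_len, value) entries."""
    entries = []

    def walk(lo, size):
        block = values[lo:lo + size]
        if all(v == block[0] for v in block):
            # The whole aligned block shares one value: keep a single prefix entry.
            prefix_len = key_bits - (size.bit_length() - 1)
            entries.append((lo >> (key_bits - prefix_len), prefix_len, block[0]))
        else:
            half = size // 2
            walk(lo, half)
            walk(lo + half, half)

    walk(0, 1 << key_bits)
    return entries


def lookup(entries, key, key_bits):
    """Return the value of the (unique) prefix entry that covers key."""
    for prefix, prefix_len, value in entries:
        if key >> (key_bits - prefix_len) == prefix:
            return value
    raise KeyError(key)


# Example table: floor(log2(x)) for 8-bit integer keys (key 0 mapped to 0 as a placeholder).
KEY_BITS = 8
values = [0] + [x.bit_length() - 1 for x in range(1, 1 << KEY_BITS)]
entries = compress(values, KEY_BITS)
print(len(values), "->", len(entries), "entries")  # 256 -> 8 entries
```

The compression is lossless in the sense that every key still returns exactly the value it had in the full table; only the number of stored entries changes.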

Open questions
Multiple floating point operations
NetFC can support multiple floating-point operations by deploying lookup tables for different operation types one after another. For example, deploying the addition tables followed by the multiplication tables implements an addition followed by a multiplication. This consumes more pipeline stages, so the number of floating-point operations NetFC can perform per packet depends on how many stages are available in the data plane. In addition, the recirculation feature of Barefoot Tofino switches can be used to change the order of the floating-point operations.
32-bit floating-point operations
Because on-chip memory is limited, the current NetFC implementation does not support 32-bit floating-point numbers. In theory, an approximation based on Taylor series could reduce memory consumption enough to support 32-bit floating-point operations. This is left for future work.
Conclusion
In-network computing is an emerging way to reduce network overhead by moving some tasks onto programmable switches. It is constrained, however, by the limited computing power of those switches (for example, the lack of floating-point operations). To address this, the authors designed NetFC, a table-lookup method that performs floating-point operations on the fly inside the network with little loss of precision. NetFC uses prefix-based lossless compression to reduce memory consumption. Experimental results show that NetFC's average accuracy exceeds 99.94% while consuming only 448 KB of memory. The authors also integrated NetFC into Sonata to detect Slowloris attacks, and the detection delay dropped significantly. NetFC is expected to become a cornerstone of in-network computing.