Upgrading computing power through hardware-software co-design, redefining productivity
2022-07-28 11:47:00 【Baidu AI Cloud】

On July 12, the second episode of Baidu AI Cloud's dialogue series “Yunzhi Face to Face” officially went live. Themed “Dual engines of AI-native cloud computing power, redefining enterprise productivity”, the episode was also a joint special session by Baidu AI Cloud and Intel. Zhou Lei, head of IaaS products at Baidu AI Cloud; Wang Peilong, head of IaaS networking at Baidu AI Cloud; Wang Yanpeng, distinguished system architect in Baidu's infrastructure department; Zhang Ran, senior technical manager in Intel's Programmable Solutions Group; and other guests came together to discuss the frontier of computing and how it can empower enterprises.

AI computing has great potential; hardware and software work together to meet it
Looking back over the past decade, computing has always been the core driver of technological development. According to the “White Paper on China's Computing Power Development Index”, released last year by the China Academy of Information and Communications Technology (CAICT), “computing power is productivity, algorithms are the relations of production, and data is the means of production”: computing power, algorithms, and data form the foundation of production in the digital economy era. Growth in computing power keeps raising data-processing capability, and advances in networking drive that progress as well. It is therefore fair to say that computing and networking are the twin engines of computing power development.
Advances in computing make more new scenarios and business formats possible and let them be rolled out at scale. In introducing Baidu Baige, Zhou Lei again described three new trends in cloud computing:
Heterogeneous computing: support for heterogeneous computing power, including CPUs, GPUs, and custom chips for various domains;
Distributed deployment: computing is spread across large data centers and a wide range of edge sites;
Increasingly ubiquitous AI applications: AI is already widely used in finance, smart cities, video, agriculture, and other fields.
As server clusters grow and access bandwidth jumps, network infrastructure faces mounting challenges. Wang Peilong offered his view: network-element infrastructure has gone “from hardware to software, and back to hardware”, and as Moore's Law loses momentum the industry is once again turning its attention to hardware. Baidu AI Cloud is currently “committed to building network infrastructure that combines software and hardware, using DPUs and programmable hardware gateways to offload virtualization overhead from the CPU onto hardware, giving compute and storage higher-bandwidth access and lower network latency”.
The growth in computing power also depends on the underlying technology base. AI's demand for computing power is already several orders of magnitude higher than before and may rise another 1,000-fold in the future. Against this backdrop, Wang Yanpeng stressed that “computer architecture is evolving from the era of general-purpose CPU computing, through the era of parallel GPU computing, toward the era of DSA (domain-specific architecture).” CPUs and GPUs alone cannot meet demand on this scale. More processing units for automotive, AI, and video workloads will appear, and even CPUs will come in ARM, RISC-V, and other architectures, so supporting “one cloud, many cores” is essential.
Meanwhile, the wave of digitalization keeps accelerating, and industrial transformation continues to create remarkable opportunities. In Zhang Ran's view, taking the lead in this shift depends largely on “innovation in technical architecture”. Intel's IPU (infrastructure processing unit) is a new architectural product born of this challenge: it offloads all infrastructure-related overhead from the server to the IPU in order to optimize overall performance. Intel's view of the future data center is also closely aligned with Baidu's: give users stronger security, free up more computing power, increase bandwidth, and reduce latency.
“Baige” races ahead, clouds encircle “Taihang”
Facing exploding demand for computing power and market shifts such as the growing ubiquity of AI applications, Baidu AI Cloud launched the Baidu Baige AI heterogeneous computing platform, which provides industry-leading AI-native cloud computing services. Baidu Baige consists of four parts (AI computing, AI storage, AI acceleration, and AI containers) and offers high performance, high elasticity, and high-speed interconnect. AI storage is based on Optane technology and speeds up training on large training sets by 4x. On the AI computing side, compute instances were comprehensively upgraded this year to support RDMA high-speed networking: the newly released RDMA network-enhanced instances can mount elastic RDMA network cards, giving instances flexible access to the RDMA network. This greatly improves network performance between instances, between GPUs across multiple instances, and between instances and storage, which in turn raises overall performance for AI, HPC, cache databases, big data, and similar scenarios. The Baidu Baige platform implements distributed computing over a multi-machine network interconnect. Its servers are paired with a self-developed DPU that provides a high-bandwidth, low-latency RDMA network and also supports GDR (GPU Direct RDMA), so it can support the massive computing power of large-scale heterogeneous computing clusters.
These core performance optimizations depend on the virtualization architecture that Baidu AI Cloud has rebuilt around its self-developed DPU. Baidu AI Cloud positions DPU 2.0 as a “Cloud Native IO Engine”. The central problem in a cloud architecture is the sharp growth of east-west traffic inside the data center, which puts a heavy load on IO. The key task is therefore to handle the movement, communication, processing, and security of massive IO data in a multi-tenant environment with fine-grained units of computing power and a hardware resource-pool architecture with a decoupled back end. Redefining the boundary between hardware and software, Baidu Taihang DPU 2.0 comprises five key technologies:
Software-defined virtualization, supporting 10,000 virtual devices;
Network hardware acceleration, moving from software forwarding to hardware forwarding;
High-performance RDMA networking, using a self-developed protocol to address flow control, congestion, and related problems;
Hardware acceleration for storage-compute separation, using a very large resource pool to even out the difference between local and remote access;
Cloud-managed hardware channel, ensuring that compute instances of every form share one pool, enabling live migration, hot upgrade, hot plug, and other features, and supporting training of models at the 100-billion-parameter scale.
Baidu and Intel have long worked closely on products and technology and advanced together. Take the latest fifth-generation cloud server instance currently on sale: it uses the latest-generation IceLake CPU 8350c, customized by Intel for Baidu, with a base frequency of 2.6 GHz and a turbo frequency of 3.1 GHz. Compared with the fourth-generation instance, single-core performance is up 20% and whole-machine performance is up 50%, while the price of a single instance is down 5%, so overall price-performance has improved substantially. The fifth-generation instance also supports hot upgrade and downgrade without a restart, allowing computing performance to scale vertically without interrupting users' critical workloads. In addition, Baidu AI Cloud has comprehensively upgraded its programmable hardware gateway on the basis of Intel's Tofino programmable switching chip, raising the bandwidth capacity of a single cluster from hundreds of Gbps to tens of Tbps, cutting the forwarding latency of a single network element from 30 µs to the 1 µs level, and reducing energy consumption per Tbps by more than 90%.
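To put the quoted percentages in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes, purely for illustration, that price-performance is simply performance divided by price, both normalized to the fourth-generation instance; the article does not say how Baidu AI Cloud measures it, and the function names are hypothetical.

```python
# Figures quoted in the article, relative to the fourth-generation instance.
SINGLE_CORE_GAIN = 0.20    # single-core performance +20%
WHOLE_MACHINE_GAIN = 0.50  # whole-machine performance +50%
PRICE_CUT = 0.05           # single-instance price -5%

# Gateway latency figures quoted in the article, in microseconds.
LATENCY_BEFORE_US = 30.0
LATENCY_AFTER_US = 1.0


def price_performance_gain(perf_gain: float, price_cut: float) -> float:
    """Relative gain in performance per unit price.

    Assumption (not from the article): price-performance is
    performance / price, normalized to the previous generation.
    """
    return (1.0 + perf_gain) / (1.0 - price_cut) - 1.0


def reduction_factor(before: float, after: float) -> float:
    """How many times smaller `after` is than `before`."""
    return before / after


if __name__ == "__main__":
    whole = price_performance_gain(WHOLE_MACHINE_GAIN, PRICE_CUT)
    single = price_performance_gain(SINGLE_CORE_GAIN, PRICE_CUT)
    latency = reduction_factor(LATENCY_BEFORE_US, LATENCY_AFTER_US)
    print(f"whole-machine price-performance gain: {whole:.1%}")
    print(f"single-core price-performance gain:   {single:.1%}")
    print(f"gateway forwarding latency reduction: {latency:.0f}x")
```

Under this simplified definition, the 50% whole-machine gain combined with the 5% price cut works out to roughly 58% more performance per unit price, and the drop from 30 µs to about 1 µs is a roughly 30-fold reduction in latency.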
Diversified supply and demand for computing power; technology empowerment never ends
Products and technology must start from customer needs. As cloud business continues to develop, both traditional and emerging industries run into new problems.
First, the supply of and demand for computing power will become more diverse. Most customer needs today relate to AI computing power, in scenarios such as automotive, the metaverse, and video, while demand for low-carbon, green energy keeps rising. Together these have led to more computing architectures. In the future a single computing component will integrate multiple architectural units, and only “one cloud, many cores” can deliver every kind of computing power to customers efficiently.
Second, requirements for security and data compliance keep rising. The autonomous driving industry, for example, has requirements for data collection, desensitization, and annotation, along with strict compliance requirements for data security. This not only means adjusting how computing power is deployed; it also significantly raises the bar for physical-level security protection of the computing power itself. Likewise, the metaverse is built on blockchain, and scenarios such as virtual assets and mixed-reality social interaction will inevitably involve more frequent access to sensitive information about individuals or organizations.
Finally, short-term storage needs for in-process data are hard to meet. In scenarios such as inference in the internet industry and real-time anti-fraud in the financial industry, the growth in data volume and index size over the years has placed greater demands on memory capacity. In life sciences, particularly macromolecular drug synthesis and molecular dynamics, more and more customers report insufficient memory capacity. OOM (out of memory) problems are becoming more common, and the memory wall has become the bottleneck for some applications. GPU memory capacity upgrades are limited by hardware, and upgrades to the transfer rate between AI accelerators cannot deliver both low cost and fast-growing computing power, so an efficient, cost-effective way around the memory wall is needed.
Toward the end, Zhou Lei previewed that in late July Baidu AI Cloud will release a new distributed cloud IaaS product. It is meant for users who need computing deployed locally because of latency, exclusivity, security, or similar factors, and gives them the same experience as the public cloud.
Technological innovation never ends. Looking ahead, only by resolving enterprises' worries about moving to the cloud can we make them dare to move to the cloud, want to move to the cloud, and find it easy to do so, putting digital intelligence into practice and turning technological capability into productivity.



