PyTorch 1.12 released: official GPU acceleration on Apple M1 chips and many bug fixes
2022-07-05 20:21:00 [3D Vision Workshop]
Author: Chen Ping
Source: Jiqizhixin (Heart of the Machine)
PyTorch 1.12 has been officially released. If you have not upgraded yet, now is a good time.
Just a few months after PyTorch 1.11 launched, PyTorch 1.12 is here. The release is made up of more than 3,124 commits since 1.11, contributed by 433 people. Version 1.12 brings significant improvements and fixes many bugs.
The most talked-about feature of the new release is probably PyTorch 1.12's support for the Apple M1 chip.
Back in May of this year, PyTorch officially announced support for GPU-accelerated model training on M1 Macs. Previously, PyTorch training on a Mac could only use the CPU. With the release of PyTorch 1.12, developers and researchers can use the Apple GPU to speed up model training significantly.
Introducing accelerated PyTorch training on Mac
PyTorch GPU training acceleration is implemented with Apple's Metal Performance Shaders (MPS) as the backend. The MPS backend extends the PyTorch framework, providing scripts and capabilities for setting up and running operations on Mac. MPS optimizes compute performance with kernels that are fine-tuned for the unique characteristics of each Metal GPU family. The new device maps machine learning computational graphs and primitives onto the MPS Graph framework and the tuned kernels provided by MPS.
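In user code, the new backend appears as a device named "mps". The following is a minimal sketch, assuming PyTorch 1.12 or later, of selecting that device when it is available and falling back to the CPU otherwise. The helper name pick_device and the tiny linear model are illustrative choices, not part of the release announcement.

```python
import torch


def pick_device() -> torch.device:
    """Return the MPS device if this build and machine support it, else the CPU."""
    # torch.backends.mps does not exist before PyTorch 1.12, hence the getattr.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")


device = pick_device()

# Illustrative model and batch; both are placed on the selected device.
model = torch.nn.Linear(8, 2).to(device)
batch = torch.randn(4, 8, device=device)
output = model(batch)
print(device, output.shape)
```

On a machine without an Apple GPU the same script simply runs on the CPU, so the fallback keeps the code portable.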
Every Mac with Apple silicon has a unified memory architecture, which gives the GPU direct access to the full memory store. According to the PyTorch team, this makes the Mac an excellent platform for machine learning, letting users train larger networks or use larger batch sizes locally. That reduces the costs associated with cloud-based development and the need for additional local GPUs. The unified memory architecture also reduces data retrieval latency, improving end-to-end performance.
As the chart shows, GPU acceleration improves training performance several times over compared with the CPU baseline:
Figure caption: with GPU acceleration, training and evaluation are faster than on the CPU alone.
The chart above shows results of tests conducted by Apple in April 2022 on a Mac Studio system with an Apple M1 Ultra (20-core CPU, 64-core GPU), 128GB of memory, and a 2TB SSD. The tested models are ResNet50 (batch size = 128), HuggingFace BERT (batch size = 64), and VGG16 (batch size = 64). Performance tests are conducted on specific computer systems and reflect the approximate performance of Mac Studio.
Other new features in PyTorch 1.12
Frontend API: TorchArrow
PyTorch has released a new Beta for users to try: TorchArrow, a machine learning preprocessing library for batch data processing. It is high-performance and Pandas-style, with an easy-to-use API that speeds up preprocessing workflows and development.
(Beta) Complex32 and complex convolutions in PyTorch
PyTorch currently has native support for complex numbers, complex autograd, complex modules, and a large number of complex operations (linear algebra and fast Fourier transforms). Many libraries, including torchaudio and ESPNet, already use complex numbers. PyTorch 1.12 extends this functionality further with complex convolutions and the experimental complex32 data type, which enables half-precision FFT operations. Because of bugs in the CUDA 11.3 package, the official recommendation for users who want to work with complex numbers is to use the CUDA 11.6 package.
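As a rough illustration of what a complex convolution computes, the sketch below runs conv1d on complex64 tensors and compares the result with the same operation assembled from four real convolutions, following (a + ib)(c + id) = (ac - bd) + i(ad + bc). It assumes PyTorch 1.12 or later on the CPU. The tensor shapes are arbitrary, and the experimental complex32 type is deliberately not used here.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Complex input (batch=1, channels=2, length=16) and complex kernel (3 filters, width 5).
x = torch.randn(1, 2, 16, dtype=torch.complex64)
w = torch.randn(3, 2, 5, dtype=torch.complex64)

# Native complex convolution.
y = F.conv1d(x, w)

# Reference: the same result built from real-valued convolutions.
real = F.conv1d(x.real, w.real) - F.conv1d(x.imag, w.imag)
imag = F.conv1d(x.real, w.imag) + F.conv1d(x.imag, w.real)
y_ref = torch.complex(real, imag)

print(y.shape, torch.allclose(y, y_ref, atol=1e-4))
```

The comparison is only a sanity check of the arithmetic; it says nothing about how PyTorch implements the operation internally.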
(Beta) Forward-mode automatic differentiation
Forward-mode AD computes directional derivatives (equivalently, Jacobian-vector products) during the forward pass. PyTorch 1.12 significantly improves operator coverage for forward-mode AD.
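A minimal sketch of a Jacobian-vector product with the torch.autograd.forward_ad module follows. The function f and the input values are made up for illustration. The expected value is the hand-derived directional derivative of f(x) = x sin(x), which is (sin(x) + x cos(x)) times the tangent.

```python
import torch
import torch.autograd.forward_ad as fwAD


def f(x: torch.Tensor) -> torch.Tensor:
    # Illustrative elementwise function: f(x) = x * sin(x).
    return x.sin() * x


primal = torch.tensor([0.5, 1.0, 2.0])
tangent = torch.tensor([1.0, 0.0, -1.0])  # direction of differentiation

with fwAD.dual_level():
    # A dual tensor carries the value and its tangent through the forward pass.
    dual_input = fwAD.make_dual(primal, tangent)
    dual_output = f(dual_input)
    jvp = fwAD.unpack_dual(dual_output).tangent

# Hand-derived directional derivative for comparison.
expected = (primal.sin() + primal * primal.cos()) * tangent
print(torch.allclose(jvp, expected))
```

No backward pass is involved: the derivative is produced alongside the forward computation, which is the point of forward mode.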
BetterTransformer
PyTorch now supports multiple CPU and GPU fastpath implementations (BetterTransformer) for the Transformer encoder module, covering TransformerEncoder, TransformerEncoderLayer, and MultiHeadAttention (MHA). In the new release, BetterTransformer is about 2x faster in many common scenarios, depending on the model and input characteristics. The new API is compatible with the previous PyTorch Transformer API: existing models are accelerated if they meet the fastpath execution requirements, and models trained with earlier versions of PyTorch can still be loaded.
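Because the fastpath sits behind the existing API, no new calls are needed. The sketch below builds a small encoder with the standard modules and runs it in inference conditions (eval mode, gradients disabled), which the PyTorch documentation lists among the fastpath requirements. All sizes are arbitrary, and the code does not verify that the fastpath is actually taken, since that depends on the model and inputs.

```python
import torch
from torch import nn

# Standard Transformer encoder built from the existing API; sizes are illustrative.
layer = nn.TransformerEncoderLayer(
    d_model=32, nhead=4, dim_feedforward=64, batch_first=True
)
encoder = nn.TransformerEncoder(layer, num_layers=2)

# Inference conditions: eval mode and no gradient tracking.
encoder.eval()
tokens = torch.randn(2, 10, 32)  # (batch, sequence, features)
with torch.no_grad():
    encoded = encoder(tokens)

print(encoded.shape)
```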
In addition, the new release includes the following updates:
Modules: a new beta feature for module computation is the functional API. The new functional_call() API gives users full control over the parameters used in module computation;
TorchData: DataPipe has improved compatibility with DataLoader. PyTorch now supports AWSSDK-based DataPipes. DataLoader2 has been introduced as a way to manage the interaction between DataPipes and other APIs and backends;
nvFuser: nvFuser is the new, faster default fuser for compiling to CUDA devices;
Matrix multiplication precision: by default, matrix multiplications on float32 data now run in full-precision mode, which is slower but produces more consistent results;
Bfloat16: Bfloat16 offers faster computation for less precise data types, so 1.12 includes new improvements to the Bfloat16 data type;
FSDP API: released as a prototype in 1.11, the FSDP API reaches beta in 1.12 with several improvements.
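Two of the items above can be sketched briefly: functional_call() and the float32 matrix multiplication precision setting. This is an illustration under assumptions rather than a reference. In PyTorch 1.12 the function lives in torch.nn.utils.stateless, and later releases expose it as torch.func.functional_call, so the import below tries both. The linear layer and the zeroed parameters are illustrative.

```python
import torch
from torch import nn

try:
    from torch.func import functional_call  # later PyTorch releases
except ImportError:
    from torch.nn.utils.stateless import functional_call  # PyTorch 1.12

linear = nn.Linear(3, 2)
x = torch.ones(1, 3)

# Call the module with substitute parameters without modifying the module itself.
zero_params = {name: torch.zeros_like(p) for name, p in linear.named_parameters()}
out_zero = functional_call(linear, zero_params, (x,))
out_own = linear(x)  # still uses the module's own parameters

# Inspect the float32 matmul precision, relax it, then restore the previous value.
previous = torch.get_float32_matmul_precision()
torch.set_float32_matmul_precision("high")
torch.set_float32_matmul_precision(previous)

print(out_zero, previous)
```

With all parameters replaced by zeros, the first call returns zeros while the module's stored weights stay untouched, which is the kind of control over parameters the item describes.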
For more details, see: https://pytorch.org/blog/pytorch-1.12-released/