Using ViennaCL, an OpenCL linear algebra library
2022-06-30 05:31:00 【Crazy banana Nicky】
ViennaCL
Introduction
http://viennacl.sourceforge.net/
ViennaCL is a free open-source linear algebra library for computations on many-core architectures (GPUs, MIC) and multi-core CPUs. The library is written in C++ and supports CUDA, OpenCL, and OpenMP (including switches at runtime).
Highlights of the latest 1.7.x release series:
- Fast sparse matrix-matrix multiplications, outperforming CUBLAS and MKL.
- Fine-grained parallel algebraic multigrid preconditioners for CPUs, Xeon Phis, and GPUs.
- Fine-grained parallel incomplete LU factorization preconditioners for CPUs, Xeon Phis, and GPUs.

Download
http://viennacl.sourceforge.net/viennacl-download.html
Use the download link above and pick the version you need. This article uses ViennaCL-1.7.1 (for example ViennaCL-1.7.1.tar.gz).
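Compile
- Extract the downloaded source archive:
unzip ViennaCL-1.7.1.zip
(If you downloaded the .tar.gz archive instead, run tar -xzf ViennaCL-1.7.1.tar.gz.)
Extraction produces the directory shown below. Change into its build subdirectory.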
ViennaCL-1.7.1$ ls
build CL CMakeLists.txt examples libviennacl README viennacl
changelog cmake doc external LICENSE tests
- From the build directory, run cmake-gui ../ and click Configure.
CMake then searches under /usr for the headers and library files (libOpenCL.so) that the OpenCL build requires. If they are not found, install the SDK that comes with your graphics card driver. For an embedded board, select cross-compilation and point CMake at the OpenCL header and library files for the target.
- Once the header path and library path are set, click Generate to produce the Makefile, then run make.
Alternatively, follow build/README.txt.
When the build succeeds, copy the results to the target platform.
Usage
After the build, the examples directory contains many ready-to-run example programs. The output of a few of them is shown below.
benchmarks
#./dense_blas-bench-opencl
----------------------------------------------
Device Info
----------------------------------------------
Vendor: Vivante Corporation
Type: GPU
Available: 1
Max Compute Units: 1
Max Work Group Size: 1024
Global Mem Size: 268435456
Local Mem Size: 32768
Local Mem Type: 2
Host Unified Memory: 1
Benchmark : BLAS
----------------
sCOPY : 2 GB/s
sAXPY : 2 GB/s
sDOT : 4 GB/s
sGEMV-N : 1.97 GB/s
sGEMV-T : 1.48 GB/s
sGEMM-NN : 0.246 GFLOPs/s
sGEMM-NT : 0.246 GFLOPs/s
sGEMM-TN : 0.246 GFLOPs/s
sGEMM-TT : 0.246 GFLOPs/s
----
Benchmark : BLAS
----------------
sCOPY : 0.195 GB/s
sAXPY : 0.196 GB/s
sDOT : 0.171 GB/s
sGEMV-N : 0.0303 GB/s
sGEMV-T : 0.0517 GB/s
sGEMM-NN : 0.00863 GFLOPs/s
sGEMM-NT : 0.234 GFLOPs/s
sGEMM-TN : 0.00868 GFLOPs/s
sGEMM-TT : 0.00863 GFLOPs/s
----
dCOPY : 0.381 GB/s
dAXPY : 0.377 GB/s
dDOT : 0.327 GB/s
dGEMV-N : 0.0596 GB/s
dGEMV-T : 0.101 GB/s
dGEMM-NN : 0.00797 GFLOPs/s
dGEMM-NT : 0.00774 GFLOPs/s
dGEMM-TN : 0.00821 GFLOPs/s
dGEMM-TT : 0.00797 GFLOPs/s
#./opencl-bench-opencl
----------------------------------------------
Device Info
----------------------------------------------
Name: Vivante OpenCL Device VIP8000-OI.8102.0000
Vendor: Vivante Corporation
Type: GPU
Available: 1
Max Compute Units: 1
Max Work Group Size: 1024
Global Mem Size: 268435456
Local Mem Size: 32768
Local Mem Type: 2
Host Unified Memory: 1
----------------------------------------------
----------------------------------------------
## Benchmark :: OpenCL performance
----------------------------------------------
-------------------------------
# benchmarking single-precision
-------------------------------
Time for building scalar kernels: 4e-06
Time for building vector kernels: 1.446
Time for building matrix kernels: 2.98157
Time for building compressed_matrix kernels: 1.88953
Time for 100000 entry accesses on host: 0.004118
Time per entry: 4.118e-08
Result of operation on host: 104839
Time for 100000 entry accesses via OpenCL: 35.0961
Time per entry: 0.000350961
Result of operation via OpenCL: 104839
bandwidth-reduction
#./bandwidth-reduction
-- Generating matrix --
* Unknowns: 262144
* Initial bandwidth: 8192
* Randomly reordered bandwidth: 262051
-- Cuthill-McKee algorithm --
* Reordered bandwidth: 6207
-- Advanced Cuthill-McKee algorithm --
* Reordered bandwidth: 6207
-- Gibbs-Poole-Stockmeyer algorithm --
* Reordered bandwidth: 6207
!!!! TUTORIAL COMPLETED SUCCESSFULLY !!!!
fft
Computing FFT Matrix
m: [4,8]((0,0,1,1,2,2,3,3),(0,1,1,2,2,3,3,4),(1,1,2,2,3,3,4,4),(1,2,2,3,3,4,4,5))
o: [4,8]((0,0,0,0,0,0,0,0),(0,0,0,0,0,0,0,0),(0,0,0,0,0,0,0,0),(0,0,0,0,0,0,0,0))
Done
m: [4,8]((0,0,1,1,2,2,3,3),(0,1,1,2,2,3,3,4),(1,1,2,2,3,3,4,4),(1,2,2,3,3,4,4,5))
o: [4,8]((32,40,-16,0,-8,-8,-9.53674e-07,-16),(-8,0,0,0,0,0,0,0),(0,-8,0,0,0,0,0,0),(-4.76837e-07,-8,0,0,0,0,0,0))
Transpose
m: [4,8]((0,0,1,1,2,2,3,3),(0,1,1,2,2,3,3,4),(1,1,2,2,3,3,4,4),(1,2,2,3,3,4,4,5))
o: [4,8]((0,0,0,1,1,1,1,2),(1,1,1,2,2,2,2,3),(2,2,2,3,3,3,3,4),(3,3,3,4,4,4,4,5))
---------------------
Computing FFT bluestein
input_vec: [16](0,0,1,0,2,0,3,0,4,0,5,0,6,0,7,0)
Done
input_vec: [16](0,0,1,0,2,0,3,0,4,0,5,0,6,0,7,0)
output_vec: [16](28,2.38419e-07,-4,9.65685,-4,4,-4,1.65685,-4,-3.2981e-07,-4,-1.65685,-4,-4,-4,-9.65685)
---------------------
Computing FFT
input_vec: [16](0,0,1,0,2,0,3,0,4,0,5,0,6,0,7,0)
Done
input_vec: [16](0,0,1,0,2,0,3,0,4,0,5,0,6,0,7,0)
output_vec: [16](28,0,-4,9.65685,-4,4,-4,1.65685,-4,0,-4,-1.65685,-4,-4,-4,-9.65685)
---------------------
Computing inverse FFT...
input_vec: [16](0,0,1,0,2,0,3,0,4,0,5,0,6,0,7,0)
output_vec: [16](0,0,1,4.56956e-08,2,-2.78181e-08,3,-1.64905e-07,4,0,5,-7.35137e-08,6,2.78181e-08,7,1.92723e-07)
---------------------
Computing real to complex...
input_vec: [16](0,0,1,0,2,0,3,0,4,0,5,0,6,0,7,0)
output_vec: [16](0,0,0,0,1,0,0,0,2,0,0,0,3,0,0,0)
---------------------
Computing complex to real...
input_vec: [16](0,0,1,0,2,0,3,0,4,0,5,0,6,0,7,0)
output_vec: [16](0,1,2,3,4,5,6,7,2,0,0,0,3,0,0,0)
---------------------
Computing multiply complex
input_vec: [16](0,0,1,0,2,0,3,0,4,0,5,0,6,0,7,0)
input2_vec: [16](0,0,1,0,2,0,3,0,4,0,5,0,6,0,7,0)
Done
output_vec: [16](0,0,1,0,4,0,9,0,16,0,25,0,36,0,49,0)
---------------------
!!!! TUTORIAL COMPLETED SUCCESSFULLY !!!!
Many more examples are built alongside these; they are worth exploring.