Using ViennaCL, an OpenCL linear algebra library
2022-06-30 05:31:00 【Crazy banana Nicky】
ViennaCL
Introduction
http://viennacl.sourceforge.net/
ViennaCL is a free open-source linear algebra library for computations on many-core architectures (GPUs, MIC) and multi-core CPUs. The library is written in C++ and supports CUDA, OpenCL, and OpenMP (including switches at runtime).
Highlights of the latest 1.7.x release series:
- Fast sparse matrix-matrix multiplications, outperforming CUBLAS and MKL.
- Fine-grained parallel algebraic multigrid preconditioners for CPUs, Xeon Phis, and GPUs.
- Fine-grained parallel incomplete LU factorization preconditioners for CPUs, Xeon Phis, and GPUs.

Download
http://viennacl.sourceforge.net/viennacl-download.html
The download link is given above; choose the version you need. This article uses ViennaCL-1.7.1.tar.gz.
Compile
- Extract the downloaded source archive:
unzip ViennaCL-1.7.1.zip
(for the .tar.gz archive, use tar -xzf ViennaCL-1.7.1.tar.gz instead)
The archive extracts to the following layout; change into its build directory:
ViennaCL-1.7.1$ ls
build CL CMakeLists.txt examples libviennacl README viennacl
changelog cmake doc external LICENSE tests
Run cmake-gui ../ from inside build, then click Configure.
CMake automatically searches under /usr for the OpenCL headers and library (libOpenCL.so) needed for the build. If they are not found, install the SDK that comes with your graphics card driver. For an embedded board, choose cross-compilation and point CMake at the OpenCL headers and libraries for the target.
Once the header and library paths are set, click Generate to produce the Makefile, then run make.
Alternatively, follow the instructions in build/README.txt.
When the build succeeds, copy the results to the target platform.
Use
After compilation, the examples directory contains many executable examples ready to run. Some of them are shown below.
benchmarks
#./dense_blas-bench-opencl
----------------------------------------------
Device Info
----------------------------------------------
Vendor: Vivante Corporation
Type: GPU
Available: 1
Max Compute Units: 1
Max Work Group Size: 1024
Global Mem Size: 268435456
Local Mem Size: 32768
Local Mem Type: 2
Host Unified Memory: 1
Benchmark : BLAS
----------------
sCOPY : 2 GB/s
sAXPY : 2 GB/s
sDOT : 4 GB/s
sGEMV-N : 1.97 GB/s
sGEMV-T : 1.48 GB/s
sGEMM-NN : 0.246 GFLOPs/s
sGEMM-NT : 0.246 GFLOPs/s
sGEMM-TN : 0.246 GFLOPs/s
sGEMM-TT : 0.246 GFLOPs/s
----
Benchmark : BLAS
----------------
sCOPY : 0.195 GB/s
sAXPY : 0.196 GB/s
sDOT : 0.171 GB/s
sGEMV-N : 0.0303 GB/s
sGEMV-T : 0.0517 GB/s
sGEMM-NN : 0.00863 GFLOPs/s
sGEMM-NT : 0.234 GFLOPs/s
sGEMM-TN : 0.00868 GFLOPs/s
sGEMM-TT : 0.00863 GFLOPs/s
----
dCOPY : 0.381 GB/s
dAXPY : 0.377 GB/s
dDOT : 0.327 GB/s
dGEMV-N : 0.0596 GB/s
dGEMV-T : 0.101 GB/s
dGEMM-NN : 0.00797 GFLOPs/s
dGEMM-NT : 0.00774 GFLOPs/s
dGEMM-TN : 0.00821 GFLOPs/s
dGEMM-TT : 0.00797 GFLOPs/s
#./opencl-bench-opencl
----------------------------------------------
Device Info
----------------------------------------------
Name: Vivante OpenCL Device VIP8000-OI.8102.0000
Vendor: Vivante Corporation
Type: GPU
Available: 1
Max Compute Units: 1
Max Work Group Size: 1024
Global Mem Size: 268435456
Local Mem Size: 32768
Local Mem Type: 2
Host Unified Memory: 1
----------------------------------------------
----------------------------------------------
## Benchmark :: OpenCL performance
----------------------------------------------
-------------------------------
# benchmarking single-precision
-------------------------------
Time for building scalar kernels: 4e-06
Time for building vector kernels: 1.446
Time for building matrix kernels: 2.98157
Time for building compressed_matrix kernels: 1.88953
Time for 100000 entry accesses on host: 0.004118
Time per entry: 4.118e-08
Result of operation on host: 104839
Time for 100000 entry accesses via OpenCL: 35.0961
Time per entry: 0.000350961
Result of operation via OpenCL: 104839
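The last block compares 100000 element accesses made on the host with the same accesses made one at a time through OpenCL. Dividing the reported times by the number of accesses makes the gap explicit:

```latex
\begin{aligned}
t_{\mathrm{host}}   &= \frac{0.004118\ \mathrm{s}}{100000} = 4.118 \times 10^{-8}\ \mathrm{s},\\
t_{\mathrm{OpenCL}} &= \frac{35.0961\ \mathrm{s}}{100000} \approx 3.510 \times 10^{-4}\ \mathrm{s},\\
\frac{t_{\mathrm{OpenCL}}}{t_{\mathrm{host}}} &= \frac{35.0961}{0.004118} \approx 8.5 \times 10^{3}.
\end{aligned}
```

Both paths return the same result (104839), so the difference is overhead rather than work. Presumably each single-entry access through OpenCL becomes its own small host-device transfer with a per-call cost, which is why copying whole vectors at once is generally preferable to element-wise access. The one-time kernel compilation reported above (about 6.3 s in total) is a separate, upfront cost.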
bandwidth-reduction
#./bandwidth-reduction
-- Generating matrix --
* Unknowns: 262144
* Initial bandwidth: 8192
* Randomly reordered bandwidth: 262051
-- Cuthill-McKee algorithm --
* Reordered bandwidth: 6207
-- Advanced Cuthill-McKee algorithm --
* Reordered bandwidth: 6207
-- Gibbs-Poole-Stockmeyer algorithm --
* Reordered bandwidth: 6207
!!!! TUTORIAL COMPLETED SUCCESSFULLY !!!!
fft
Computing FFT Matrix
m: [4,8]((0,0,1,1,2,2,3,3),(0,1,1,2,2,3,3,4),(1,1,2,2,3,3,4,4),(1,2,2,3,3,4,4,5))
o: [4,8]((0,0,0,0,0,0,0,0),(0,0,0,0,0,0,0,0),(0,0,0,0,0,0,0,0),(0,0,0,0,0,0,0,0))
Done
m: [4,8]((0,0,1,1,2,2,3,3),(0,1,1,2,2,3,3,4),(1,1,2,2,3,3,4,4),(1,2,2,3,3,4,4,5))
o: [4,8]((32,40,-16,0,-8,-8,-9.53674e-07,-16),(-8,0,0,0,0,0,0,0),(0,-8,0,0,0,0,0,0),(-4.76837e-07,-8,0,0,0,0,0,0))
Transpose
m: [4,8]((0,0,1,1,2,2,3,3),(0,1,1,2,2,3,3,4),(1,1,2,2,3,3,4,4),(1,2,2,3,3,4,4,5))
o: [4,8]((0,0,0,1,1,1,1,2),(1,1,1,2,2,2,2,3),(2,2,2,3,3,3,3,4),(3,3,3,4,4,4,4,5))
---------------------
Computing FFT bluestein
input_vec: [16](0,0,1,0,2,0,3,0,4,0,5,0,6,0,7,0)
Done
input_vec: [16](0,0,1,0,2,0,3,0,4,0,5,0,6,0,7,0)
output_vec: [16](28,2.38419e-07,-4,9.65685,-4,4,-4,1.65685,-4,-3.2981e-07,-4,-1.65685,-4,-4,-4,-9.65685)
---------------------
Computing FFT
input_vec: [16](0,0,1,0,2,0,3,0,4,0,5,0,6,0,7,0)
Done
input_vec: [16](0,0,1,0,2,0,3,0,4,0,5,0,6,0,7,0)
output_vec: [16](28,0,-4,9.65685,-4,4,-4,1.65685,-4,0,-4,-1.65685,-4,-4,-4,-9.65685)
---------------------
Computing inverse FFT...
input_vec: [16](0,0,1,0,2,0,3,0,4,0,5,0,6,0,7,0)
output_vec: [16](0,0,1,4.56956e-08,2,-2.78181e-08,3,-1.64905e-07,4,0,5,-7.35137e-08,6,2.78181e-08,7,1.92723e-07)
---------------------
Computing real to complex...
input_vec: [16](0,0,1,0,2,0,3,0,4,0,5,0,6,0,7,0)
output_vec: [16](0,0,0,0,1,0,0,0,2,0,0,0,3,0,0,0)
---------------------
Computing complex to real...
input_vec: [16](0,0,1,0,2,0,3,0,4,0,5,0,6,0,7,0)
output_vec: [16](0,1,2,3,4,5,6,7,2,0,0,0,3,0,0,0)
---------------------
Computing multiply complex
input_vec: [16](0,0,1,0,2,0,3,0,4,0,5,0,6,0,7,0)
input2_vec: [16](0,0,1,0,2,0,3,0,4,0,5,0,6,0,7,0)
Done
output_vec: [16](0,0,1,0,4,0,9,0,16,0,25,0,36,0,49,0)
---------------------
!!!! TUTORIAL COMPLETED SUCCESSFULLY !!!!
Many more examples can be built; feel free to explore them.