当前位置:网站首页>High performance computing framework for image processing
High performance computing framework for image processing
2022-06-12 10:04:00 【zhoukehu91】
| frame | Introduce | |
| GPU | NPP | NVIDIA Performance Primitives,NVIDIA The company aims at GPU Developed GPU Accelerated images 、 video 、 Signal processing library , After installation CUDA Environment will be automatically installed . By calling NPP function , You can dispense with handwriting CUDA Kernel functions , Fast development . |
| CUDA | Compute Unified Device Architecture: NVIDIA Developed a parallel computing platform and programming model . It uses a graphics processor (GPU) Handling capacity of , Can greatly improve computing performance . You need to know its development language first CUDA C, And then develop . | |
| CPU | IPP | Integrated Performance Primitives,Intel High performance multimedia function library , Contains many functions optimized from the bottom , It covers a variety of applications including image processing , Its interface form is the same as NPP The library is similar to . For image processing ,IPP The library functions provided are introduced and referenced Blog . |
| TBB | Threading Building Blocks,Intel Developed for parallel programming based on C++ The framework of language , It's a set C++ Template library . It provides a lot of features , Have a higher level of abstraction than threads , Mainly used for multi-core CPU Multi thread processing acceleration under the platform . | |
Summarized below :
1)NPP and IPP Are provided by the encapsulated library functions , It mainly provides general algorithms . For example, filtering in image processing 、 Color space conversion, etc .NPP be used for GPU Parallel acceleration of the platform ,IPP be used for CPU Multi thread parallel acceleration of the platform .
2)CUDA and TBB Namely GPU and CPU Parallel development framework under the platform . Typically , For image processing for loop ( Pixel by pixel ) Handle ,CUDA You can accomplish multiple tasks by writing kernel functions CUDA Parallel acceleration of cores , and TBB You can accomplish multiple tasks through its specific interface CPU Parallel processing acceleration .
3) In the process of development , use first OpenMP Conduct CPU Image processing acceleration of the platform , But found CPU High occupancy , And the processing speed has not improved . Subsequent use TBB Development , Achieved the expected goal .IPP and TBB You can go from Intel Download on , see here .
Finally, the author uses TBB Accelerated critical code snippets , It mainly completes the color correction of color image , stay Xeon E3-1230 v2 platform (4 The core 8 Threads ) On , The execution speed of the algorithm is significantly improved . The code is as follows :
// Color image color correction
void ColorCorrect8UC3(Mat source, Mat& dst, int nR, int nG, int nB)
{
dst = source.clone();
if ((nR == 100) && (nG == 100) && (nB == 100))
return;
Mat src = source.clone();
if (nR < 0)
nR = 0;
if (nR > 100)
nR = 100;
if (nG < 0)
nG = 0;
if (nG > 100)
nG = 100;
if (nB < 0)
nB = 0;
if (nB > 100)
nB = 100;
int width = src.cols;
int height = src.rows;
unsigned char* pSrc = src.ptr();
unsigned char* pDst = dst.ptr();
//parallel_for coordination blocked_range2d It will be of great help to image processing
//blocked_range2d Parameter description of :
//(y Starting value ,y End value ,y Step value ,x Starting value ,x End value ,x Step value )
tbb::parallel_for(tbb::blocked_range2d<int>(0, height, 1, 0, width, 1),
[&](const tbb::blocked_range2d<int>& r)
{
for (int i = r.rows().begin(); i < r.rows().end(); ++i)
{
for (int j = r.cols().begin(); j < r.cols().end(); ++j)
{
pDst[i*width * 3 + j * 3 + 2] = (unsigned char)(pSrc[i*width * 3 + j * 3 + 2] * nR / 100.0);
pDst[i*width * 3 + j * 3 + 1] = (unsigned char)(pSrc[i*width * 3 + j * 3 + 1] * nG / 100.0);
pDst[i*width * 3 + j * 3 + 0] = (unsigned char)(pSrc[i*width * 3 + j * 3 + 0] * nB / 100.0);
}
}
});
}边栏推荐
- 003:AWS认为什么是数据湖?
- Implementation of fruit mall wholesale platform based on SSM
- [path of system analyst] Chapter 18 security analysis and design of double disk system
- Auto. JS learning notes 7: JS file calls functions and variables in another JS file to solve various problems of call failure
- JVM (III) Virtual machine performance monitoring & fault handling tool
- 古董级MFC/GDI+框架LCD显示控件
- MySQL 4 Database table storage structure & tablespace
- 【云原生 | Kubernetes篇】Kubernetes 网络策略(NetworkPolicy)
- [untitled]
- 7-13 地下迷宫探索(邻接表)
猜你喜欢
![[preview of the open class of Jishu] arm's strongest MCU core cortex-m85 processor helps the innovation of the Internet of things in an all-round way (there is a lottery)](/img/25/c3af3f51c04865820e3bbe2f010098.png)
[preview of the open class of Jishu] arm's strongest MCU core cortex-m85 processor helps the innovation of the Internet of things in an all-round way (there is a lottery)

Papaya Mobile has a comprehensive layout of cross-border e-commerce SaaS papaya orange. What are the opportunities for this new track?

Introduction to on-line circuit simulation and open source electronic hardware design

QQ, wechat chat depends on it (socket)?

Auto. JS learning note 4: after autojs is packaged, most Huawei and other big brand mobile phones cannot be installed? This problem can be solved by using the simulator to remotely sign and package in

markdown_图片并排的方案

How to do industry analysis

7-13 地下迷宫探索(邻接表)
![[cloud native] what exactly does it mean? This article shares the answer with you](/img/82/f268adcbdbe8195a066d065eb560d7.jpg)
[cloud native] what exactly does it mean? This article shares the answer with you

Explanation of the principle of MySQL's leftmost matching principle
随机推荐
Common tree summary
Papaya Mobile has a comprehensive layout of cross-border e-commerce SaaS papaya orange. What are the opportunities for this new track?
HALCON联合C#检测表面缺陷——仿射变换(三)
【926. 将字符串翻转到单调递增】
机器学习之数据处理与可视化【鸢尾花数据分类|特征属性比较】
Differences among list, set and map
markdown_图片并排的方案
Quickly build oncyber io
Transport layer protocol -- TCP protocol
002:数据湖有哪些特征
Explanation of the principle of MySQL's leftmost matching principle
Briefly introduce the difference between threads and processes
2022 pole technology communication - anmou technology ushers in new opportunities for development
UEFI edkii programming learning
5 most common CEPH failure scenarios
2021-09-13
C 语言仅凭自学能到什么高度?
Periodic pains of cross-border e-commerce? Papaya mobile power as an independent station enabler
[preview of the open class of Jishu] arm's strongest MCU core cortex-m85 processor helps the innovation of the Internet of things in an all-round way (there is a lottery)
传输层协议 ——— TCP协议