Computing the L2 (Euclidean) distance with CUDA
2022-06-30 08:31:00 【Wu lele~】
Preface
This tutorial gives CUDA code that computes the Euclidean (L2) distance between a vector A[5] and each of the three rows of a matrix B[3][5].
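As a reading aid, the quantity the kernel is meant to accumulate for row j of B can be written out as follows. The symbols d_j, A_i and B_{j,i} are notation introduced here, not taken from the original post, and the formula assumes the code stores the sum of squared differences without taking a square root.

```latex
d_j^{2} = \sum_{i=0}^{N-1} \left( A_i - B_{j,i} \right)^{2},
\qquad j = 0, 1, \dots, D-1, \quad N = 5, \; D = 3
```

With B stored row by row in a flat array, B_{j,i} corresponds to the element at index j*N + i.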
#include <stdio.h>
#include <stdlib.h>

#define N 5            // length of vector A and of each row of B
#define D 3            // number of rows in B
#define SIZE (N * D)   // total number of elements in B

// One thread per row of B: thread tid accumulates the squared
// differences between A and row tid of B.
__global__ void cpt(const int *da, const int *db, int *dres)
{
    int tid = threadIdx.x;  // tid = 0, 1, 2
    int sum = 0;            // kept in a register
    for (int i = 0; i < N; ++i)
    {
        int diff = da[i] - db[tid * N + i];
        sum += diff * diff;
    }
    dres[tid] = sum;  // sum of squared differences; no square root is taken
}

int main(int argc, char *argv[])
{
    // host memory and initialization
    int *ha, *hb, *hres;
    ha = (int *)malloc(sizeof(int) * N);
    hb = (int *)malloc(sizeof(int) * SIZE);
    hres = (int *)malloc(sizeof(int) * D);
    for (int i = 0; i < N; ++i)
    {
        ha[i] = 1;
    }
    for (int i = 0; i < SIZE; ++i)
    {
        hb[i] = 0;
    }
    for (int i = 0; i < D; ++i)
    {
        hres[i] = 0;
    }

    // device memory and host-to-device copies
    int *da, *db, *dres;
    cudaMalloc((void **)&da, sizeof(int) * N);
    cudaMalloc((void **)&db, sizeof(int) * SIZE);
    cudaMalloc((void **)&dres, sizeof(int) * D);
    cudaMemcpy(da, ha, sizeof(int) * N, cudaMemcpyHostToDevice);
    cudaMemcpy(db, hb, sizeof(int) * SIZE, cudaMemcpyHostToDevice);
    cudaMemcpy(dres, hres, sizeof(int) * D, cudaMemcpyHostToDevice);

    // launch the kernel: one block of D threads
    const dim3 grid_size(1);
    const dim3 block_size(D);
    cpt<<<grid_size, block_size>>>(da, db, dres);

    // copy the results back to the host and print one value per row of B
    cudaMemcpy(hres, dres, sizeof(int) * D, cudaMemcpyDeviceToHost);
    for (int i = 0; i < D; ++i)
    {
        printf("%d\n", hres[i]);  // 5 for each row with the inputs above
    }

    // free memory
    free(ha);
    free(hb);
    free(hres);
    cudaFree(da);
    cudaFree(db);
    cudaFree(dres);
    return 0;
}
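To check the GPU result, it can help to compare it against a plain CPU loop. The following is a minimal sketch of such a reference, assuming the kernel is meant to store the sum of squared differences for each row of B. The function name `squared_l2_reference` and its parameters are hypothetical and do not appear in the original post.

```c
/* CPU reference (illustrative, not from the original post).
 * a   : vector of n elements
 * b   : d rows of n elements, stored row by row in a flat array
 * out : receives d values, out[j] = sum over i of (a[i] - b[j*n + i])^2
 */
static void squared_l2_reference(const int *a, const int *b, int *out,
                                 int n, int d)
{
    for (int j = 0; j < d; ++j) {
        int sum = 0;
        for (int i = 0; i < n; ++i) {
            int diff = a[i] - b[j * n + i];  /* same indexing as tid*N + i */
            sum += diff * diff;
        }
        out[j] = sum;
    }
}
```

Calling it in main as squared_l2_reference(ha, hb, expected, N, D), with expected being a host array of D ints, and comparing expected[i] against hres[i] after the device-to-host copy would flag an indexing mistake in the kernel.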