Computing the L2 (Euclidean) distance with CUDA
2022-06-30 08:31:00 【Wu lele~】
Preface
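This tutorial implements CUDA code that computes the Euclidean (L2) distance between a vector A[5] and each row of a matrix B[3][5]. The kernel accumulates the sum of squared differences and does not take the square root, so the values it stores are squared distances.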
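Written out with zero-based indices as in C, the quantity computed for each row j of B is the standard squared Euclidean distance. The symbols d_j and d_j^2 are notation introduced here for illustration and do not appear in the original post.

```latex
d_j^2 = \sum_{i=0}^{4} \left( A_i - B_{j,i} \right)^2,
\qquad
d_j = \sqrt{d_j^2},
\qquad j = 0, 1, 2
```

With the inputs used in this tutorial (A filled with ones, B filled with zeros), every term of the sum is 1, so each d_j^2 equals 5.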
#include <stdio.h>
#include <stdlib.h>   // malloc, free

#define N 5            // length of vector A and of each row of B
#define D 3            // number of rows in B
#define SIZE (N * D)   // total number of elements in B

// Each thread computes the squared L2 distance between da and one row of db.
__global__ void cpt(const int *da, const int *db, int *dres)
{
    int tid = threadIdx.x;   // tid = 0, 1, 2: row index into db
    int sum = 0;             // accumulated in a register
    for (int i = 0; i < N; ++i)
    {
        int diff = da[i] - db[tid * N + i];
        sum += diff * diff;
    }
    dres[tid] = sum;
}

int main(int argc, char *argv[])
{
    // host memory and assignment
    int *ha, *hb, *hres;
    ha = (int *)malloc(sizeof(int) * N);
    hb = (int *)malloc(sizeof(int) * SIZE);
    hres = (int *)malloc(sizeof(int) * D);
    for (int i = 0; i < N; ++i)
    {
        ha[i] = 1;
    }
    for (int i = 0; i < SIZE; ++i)
    {
        hb[i] = 0;
    }
    for (int i = 0; i < D; ++i)
    {
        hres[i] = 0;
    }

    // device memory and copy
    int *da, *db, *dres;
    cudaMalloc((void **)&da, sizeof(int) * N);
    cudaMalloc((void **)&db, sizeof(int) * SIZE);
    cudaMalloc((void **)&dres, sizeof(int) * D);
    cudaMemcpy(da, ha, sizeof(int) * N, cudaMemcpyHostToDevice);
    cudaMemcpy(db, hb, sizeof(int) * SIZE, cudaMemcpyHostToDevice);
    cudaMemcpy(dres, hres, sizeof(int) * D, cudaMemcpyHostToDevice);

    // set threads and launch the global kernel function:
    // one block of D threads, one thread per row of B
    const dim3 grid_size(1);
    const dim3 block_size(D);
    cpt<<<grid_size, block_size>>>(da, db, dres);

    // copy device to host (this blocking copy also waits for the kernel)
    cudaMemcpy(hres, dres, sizeof(int) * D, cudaMemcpyDeviceToHost);
    for (int i = 0; i < D; ++i)
    {
        printf("%d\n", hres[i]);   // squared distance for row i
    }

    // free memory
    free(ha);
    free(hb);
    free(hres);
    cudaFree(da);
    cudaFree(db);
    cudaFree(dres);
    return 0;
}
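A host-side reference is a simple way to sanity-check the values the kernel writes to dres. The following is a minimal sketch in plain C++ (which nvcc also compiles). The function name `squared_l2_rows` is hypothetical and not part of the original listing, and it assumes B is stored row-major in a flat array, as in the kernel's `tid*N + i` indexing.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not in the original listing): squared L2 distance
// between vector a and every row of the row-major matrix b.
// The row length is taken from a.size(); the row count from b.size().
std::vector<int> squared_l2_rows(const std::vector<int> &a,
                                 const std::vector<int> &b)
{
    const std::size_t n = a.size();
    assert(n > 0 && b.size() % n == 0);   // b must hold whole rows of length n
    const std::size_t rows = b.size() / n;

    std::vector<int> res(rows, 0);
    for (std::size_t row = 0; row < rows; ++row)
    {
        for (std::size_t i = 0; i < n; ++i)
        {
            const int diff = a[i] - b[row * n + i];
            res[row] += diff * diff;      // same accumulation as one GPU thread
        }
    }
    return res;
}
```

For the inputs used in the listing (A of length 5 filled with ones, B of 15 zeros), `squared_l2_rows` returns 5 for each of the three rows, which is the value each entry of hres can be compared against after the copy back from the device.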