
Could you please confirm and help on these?

1) Jetson Xavier uses the Tegra architecture (?). In Tegra devices, the CPU and the GPU (iGPU) share the SoC DRAM memory.

2) Since CUDA applications require modifications to perform efficiently on Tegra systems because of the unified memory, CUDA code for the Jetson Xavier should also be modified for better performance. So I checked the sample code on page 7 of the "CUDA for Tegra" application note (DA-06762-001_v10.2). In that sample, cudaMallocManaged is used instead of the standard cudaMalloc for unified memory allocation, so I expected to see similar modifications in the CUDA samples shipped for Tegra devices.

3) But when I check the "/usr/local/cuda/samples/0_Simple" folder on the Xavier and inspect the matrixMul.cu code, it does not really look modified; it looks as if it were written for a discrete GPU. The nvcc version on my Xavier is "Cuda compilation tools, release 10.2, V10.2.89".

For reference, here is the relevant part of matrixMul.cu (abridged):

/**
 * Run a simple test of matrix multiplication using CUDA
 */
int MatrixMultiply(int argc, char **argv, ...
  // Allocate host memory for matrices A and B
  unsigned int mem_size_A = sizeof(float) * size_A;
  float *h_A = reinterpret_cast<float *>(malloc(mem_size_A));
  unsigned int mem_size_B = sizeof(float) * size_B;
  float *h_B = reinterpret_cast<float *>(malloc(mem_size_B));

  unsigned int mem_size_C = dimsC.x * dimsC.y * sizeof(float);
  float *h_C = reinterpret_cast<float *>(malloc(mem_size_C));
  if (h_C == NULL) {
    fprintf(stderr, "Failed to allocate host matrix C!\n");
    exit(EXIT_FAILURE);
  }

  // Allocate device memory
  checkCudaErrors(cudaMalloc(reinterpret_cast<void **>(&d_A), mem_size_A));
  checkCudaErrors(cudaMalloc(reinterpret_cast<void **>(&d_B), mem_size_B));
  checkCudaErrors(cudaMalloc(reinterpret_cast<void **>(&d_C), mem_size_C));

  // Allocate CUDA events that we'll use for timing
  checkCudaErrors(cudaEventCreate(&start));

  checkCudaErrors(cudaStreamCreateWithFlags(&stream, cudaStreamNonBlocking));

  // Copy host memory to device
  checkCudaErrors(cudaMemcpyAsync(d_A, h_A, mem_size_A, cudaMemcpyHostToDevice, stream));
  checkCudaErrors(cudaMemcpyAsync(d_B, h_B, mem_size_B, cudaMemcpyHostToDevice, stream));

  dim3 grid(dimsB.x / threads.x, dimsA.y / threads.y);

  printf("Computing result using CUDA Kernel.\n");
  ...
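For comparison, here is a minimal sketch of the unified-memory pattern described in the Tegra application note, where a single cudaMallocManaged allocation replaces the malloc/cudaMalloc pair and the cudaMemcpyAsync staging copies. This is my own illustrative example, not code from matrixMul.cu or the application note: the CHECK macro and the scale kernel are hypothetical stand-ins for the sample's checkCudaErrors helper and matrix-multiply kernel.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical error-check helper; the sample's checkCudaErrors comes from helper_cuda.h.
#define CHECK(call)                                                   \
  do {                                                                \
    cudaError_t err_ = (call);                                        \
    if (err_ != cudaSuccess) {                                        \
      fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err_));  \
      exit(EXIT_FAILURE);                                             \
    }                                                                 \
  } while (0)

// Hypothetical kernel standing in for the sample's matrix-multiply kernel.
__global__ void scale(float *data, float factor, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) data[i] *= factor;
}

int main() {
  const int n = 1 << 20;
  float *data;

  // One managed allocation replaces the malloc/cudaMalloc pair and both
  // cudaMemcpyAsync calls: on Tegra the CPU and iGPU share the SoC DRAM,
  // so the same pointer is valid on both sides.
  CHECK(cudaMallocManaged(&data, n * sizeof(float)));
  for (int i = 0; i < n; ++i) data[i] = 1.0f;  // CPU writes directly

  scale<<<(n + 255) / 256, 256>>>(data, 2.0f, n);  // GPU works in place
  CHECK(cudaGetLastError());
  CHECK(cudaDeviceSynchronize());  // make results visible to the CPU

  printf("data[0] = %f\n", data[0]);
  CHECK(cudaFree(data));
  return 0;
}
```

On a discrete GPU this same code still runs, but the managed memory is migrated over PCIe on demand, which is why the application note treats it as a Tegra-specific optimization rather than a universal replacement for the explicit-copy pattern in the shipped sample.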
