Module for computing prefix scans.
Functions
template<typename T> __device__ void scan_dev (T *r, T x, int i, int n)
    Device code for prefix scan of a single thread block.
template<typename T> __global__ void scan_blocks (T *w, T *x, T *blocksum, int inclusive)
    Kernel for doing prefix scan.
See also: the scan example in the NVIDIA CUDA toolkit.
template<typename T> __global__ void mycuda_scan::scan_blocks (T *w, T *x, T *blocksum, int inclusive)
Kernel for doing prefix scan.
Inclusive scan: y(i) = x(0) + x(1) + ... + x(i)
Exclusive scan: y(i) = x(0) + x(1) + ... + x(i-1), with y(0) = 0
Block j scans elements x[j*blocksize], ..., x[(j+1)*blocksize-1].
sharedMemorySize = 2*n*sizeof(float)
n <= blocksize
| w | (out) scanned vector |
| x | (in) vector to be scanned |
| blocksum | (out) Block sums |
| inclusive | (in) flag indicating if scan is to be inclusive or exclusive |
Definition at line 63 of file mycuda_scan.h.
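The block-wise behaviour described above can be checked against a small host-side reference. The following is a minimal sketch in plain C++, not the kernel itself: the names `scan_blocks_reference` and `add_block_offsets` are hypothetical, the handling of a partial last block is an assumption, and the second pass that turns per-block results into a scan of the whole vector is an assumed use of `blocksum`, not something this page documents.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side reference (not the CUDA kernel).
// Block j covers x[j*block_size] ... x[(j+1)*block_size - 1].
// w receives the scan of each block; block_sum[j] receives the total of block j.
template <typename T>
void scan_blocks_reference(std::vector<T>& w, const std::vector<T>& x,
                           std::vector<T>& block_sum, std::size_t block_size,
                           bool inclusive) {
    assert(block_size > 0);
    const std::size_t num_blocks = (x.size() + block_size - 1) / block_size;
    w.assign(x.size(), T{});
    block_sum.assign(num_blocks, T{});
    for (std::size_t j = 0; j < num_blocks; ++j) {
        const std::size_t begin = j * block_size;
        // Assumption: a partial last block is simply scanned up to x.size().
        const std::size_t end = std::min(begin + block_size, x.size());
        T running = T{};  // sum of the elements of block j seen so far
        for (std::size_t i = begin; i < end; ++i) {
            if (inclusive) {
                running += x[i];  // y(i) includes x(i)
                w[i] = running;
            } else {
                w[i] = running;   // y(i) excludes x(i)
                running += x[i];
            }
        }
        block_sum[j] = running;
    }
}

// Assumed second pass: add the exclusive scan of the block sums to every
// element of the matching block, giving a scan of the whole vector.
template <typename T>
void add_block_offsets(std::vector<T>& w, const std::vector<T>& block_sum,
                       std::size_t block_size) {
    assert(block_size > 0);
    T offset = T{};
    for (std::size_t j = 0; j < block_sum.size(); ++j) {
        const std::size_t begin = j * block_size;
        const std::size_t end = std::min(begin + block_size, w.size());
        for (std::size_t i = begin; i < end; ++i) {
            w[i] += offset;
        }
        offset += block_sum[j];
    }
}
```

For x = {1, 2, 3, 4, 5, 6} and a block size of 3, the inclusive per-block result is {1, 3, 6, 4, 9, 15} with block sums {6, 15}; after the offset pass it becomes {1, 3, 6, 10, 15, 21}.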
template<typename T> __device__ void mycuda_scan::scan_dev (T *r, T x, int i, int n)
Device code for prefix scan of a single thread block.
| r | (out) shared memory |
| x | (in) ith element of vector to be scanned |
| i | (in) index of element to be scanned |
| n | (in) size of vector to be scanned |
Definition at line 22 of file mycuda_scan.h.
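A shared-memory requirement of 2*n elements is what a double-buffered scan would need, so the sketch below simulates that scheme sequentially on the host. This is an assumption about the approach, not a description of what `scan_dev` actually does: the function name `scan_block_simulated` is hypothetical, the vector `r` only stands in for the shared-memory buffer, and the inner loop over `i` plays the role of the threads of one block (on a device, a barrier would separate the passes).

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical sequential simulation of a double-buffered inclusive scan
// (Hillis-Steele style). r must hold at least 2*n elements: one half is
// read while the other half is written, then the halves swap roles.
// Returns the offset into r of the half that holds the final result.
template <typename T>
std::size_t scan_block_simulated(std::vector<T>& r, const std::vector<T>& x) {
    const std::size_t n = x.size();
    assert(r.size() >= 2 * n);
    std::size_t in = 0;   // half currently holding valid data
    std::size_t out = n;  // half written during the current pass

    // Each simulated thread i stores its own element x(i).
    for (std::size_t i = 0; i < n; ++i) {
        r[in + i] = x[i];
    }

    // After the pass with a given stride, element i holds the sum of the
    // last min(i + 1, 2 * stride) inputs ending at position i.
    for (std::size_t stride = 1; stride < n; stride *= 2) {
        for (std::size_t i = 0; i < n; ++i) {
            r[out + i] = (i >= stride) ? r[in + i] + r[in + i - stride]
                                       : r[in + i];
        }
        std::swap(in, out);
    }
    return in;
}
```

The scheme takes about log2(n) passes and does not require n to be a power of two. An exclusive result can be obtained from the inclusive one by shifting it right by one element and placing 0 in the first position.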