GPU Workshop Sample Code
mycuda_scan Namespace Reference

Module for computing prefix scans.

Functions

template<typename T >
__device__ void scan_dev (T *r, T x, int i, int n)
 Device code for prefix scan of a single thread block.
 
template<typename T >
__global__ void scan_blocks (T *w, T *x, T *blocksum, int inclusive)
 Kernel for doing prefix scan.
 

Detailed Description

Module for computing prefix scans.

Function Documentation

template<typename T >
__global__ void mycuda_scan::scan_blocks ( T *  w,
T *  x,
T *  blocksum,
int  inclusive 
)

Kernel for doing prefix scan.

Notes

  • Inclusive scan: y(i) = x(0) + x(1) + ... + x(i)

    Exclusive scan: y(i) = x(0) + x(1) + ... + x(i-1), with y(0) = 0

  • Block j scans elements x[j*blocksize], ..., x[(j+1)*blocksize-1]
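
To make the two scan variants concrete, here is a minimal host-side C++ sketch of the definitions above. It is not part of mycuda_scan: the name `scan_reference` is hypothetical, and treating any nonzero `inclusive` value as "inclusive" is an assumption, since this page does not state the flag's encoding.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical host-side reference, not part of mycuda_scan.
// Assumption: a nonzero `inclusive` selects the inclusive scan.
template <typename T>
std::vector<T> scan_reference(const std::vector<T>& x, int inclusive) {
    std::vector<T> y(x.size());
    T running = T();  // holds x[0] + ... + x[i-1] at the top of each iteration
    for (std::size_t i = 0; i < x.size(); ++i) {
        if (inclusive) {
            running += x[i];
            y[i] = running;   // y(i) includes x(i)
        } else {
            y[i] = running;   // y(i) excludes x(i)
            running += x[i];
        }
    }
    return y;
}
```

For x = {3, 1, 4, 1}, this sketch yields {3, 4, 8, 9} for the inclusive scan and {0, 3, 4, 8} for the exclusive scan.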

Requires

  • sharedMemorySize = 2*n*sizeof(float)
  • n <= blocksize

Parameters

  • w (out): scanned vector
  • x (in): vector to be scanned
  • blocksum (out): block sums
  • inclusive (in): flag indicating whether the scan is inclusive or exclusive

Definition at line 63 of file mycuda_scan.h.
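
The following host-side C++ sketch models what a block-wise scan with block sums might look like on the CPU. Everything here is illustrative: the function names, the explicit `blocksize` argument, and the nonzero-means-inclusive convention are assumptions. The second function shows one common way to turn per-block results into a scan of the whole vector, by adding the running total of the preceding block sums to each block. This page does not say whether mycuda_scan does this, so treat it as an assumed follow-up step.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical host-side model of a block-wise scan (not the mycuda_scan API).
// Assumes blocksize > 0 and that a nonzero `inclusive` selects the inclusive scan.
template <typename T>
void scan_blocks_reference(std::vector<T>& w, const std::vector<T>& x,
                           std::vector<T>& blocksum, std::size_t blocksize,
                           int inclusive) {
    const std::size_t nblocks = (x.size() + blocksize - 1) / blocksize;
    w.assign(x.size(), T());
    blocksum.assign(nblocks, T());
    for (std::size_t j = 0; j < nblocks; ++j) {
        // Block j covers x[j*blocksize] .. x[(j+1)*blocksize-1], clipped to the vector length.
        const std::size_t begin = j * blocksize;
        const std::size_t end = std::min(begin + blocksize, x.size());
        T running = T();
        for (std::size_t i = begin; i < end; ++i) {
            if (inclusive) { running += x[i]; w[i] = running; }
            else           { w[i] = running; running += x[i]; }
        }
        blocksum[j] = running;  // total of all elements in block j
    }
}

// Assumed follow-up step: add the sum of all preceding blocks to each block,
// which turns the independent per-block scans into a scan of the full vector.
template <typename T>
void add_block_offsets(std::vector<T>& w, const std::vector<T>& blocksum,
                       std::size_t blocksize) {
    T offset = T();
    for (std::size_t j = 0; j < blocksum.size(); ++j) {
        const std::size_t begin = j * blocksize;
        const std::size_t end = std::min(begin + blocksize, w.size());
        for (std::size_t i = begin; i < end; ++i) w[i] += offset;
        offset += blocksum[j];
    }
}
```

With x = {1, 2, 3, 4, 5, 6}, blocksize = 4, and an inclusive scan, the first function produces w = {1, 3, 6, 10, 5, 11} and blocksum = {10, 11}; after `add_block_offsets`, w becomes {1, 3, 6, 10, 15, 21}.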

template<typename T >
__device__ void mycuda_scan::scan_dev ( T *  r,
T  x,
int  i,
int  n 
)

Device code for prefix scan of a single thread block.

Notes

  • Uses shared memory to perform scan.

Parameters

  • r (out): shared memory
  • x (in): ith element of the vector to be scanned
  • i (in): index of the element to be scanned
  • n (in): size of the vector to be scanned

Definition at line 22 of file mycuda_scan.h.
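
The requirement of 2*n elements of shared memory stated for scan_blocks is consistent with a double-buffered (Hillis-Steele) scan, although this page does not confirm which algorithm scan_dev actually uses. Under that assumption, the sketch below simulates such a scan on the CPU. The name `scan_single_block_sim` is hypothetical; the vector `r` stands in for the shared buffer, the inner loop over `i` stands in for the n threads of one block, and the end of each pass stands in for a block-wide barrier.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical CPU simulation of a double-buffered (Hillis-Steele) inclusive scan.
// This is an assumed algorithm, not the confirmed implementation of scan_dev.
template <typename T>
std::vector<T> scan_single_block_sim(const std::vector<T>& x) {
    const std::size_t n = x.size();
    std::vector<T> r(2 * n);        // two halves of n elements: read buffer and write buffer
    std::size_t in = 0, out = n;    // offsets of the current read and write halves
    for (std::size_t i = 0; i < n; ++i) r[in + i] = x[i];  // "thread" i stores its element
    for (std::size_t stride = 1; stride < n; stride *= 2) {
        for (std::size_t i = 0; i < n; ++i) {
            // Each "thread" adds the partial sum located `stride` positions to its left.
            r[out + i] = (i >= stride) ? r[in + i] + r[in + i - stride] : r[in + i];
        }
        std::swap(in, out);         // the freshly written half becomes the next read half
    }
    return std::vector<T>(r.begin() + in, r.begin() + in + n);
}
```

Reading from one half of the buffer while writing to the other is what lets every simulated thread update in the same pass without overwriting values its neighbours still need. For x = {3, 1, 4, 1, 5}, the sketch returns {3, 4, 8, 9, 14}.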