GPU Workshop Sample Code
mycuda_reduce Namespace Reference

Reduction operators and kernels. More...

Data Structures

class  SUM
 
class  MAX
 
class  MIN
 

Functions

template<typename T , typename R >
__device__ void reduce_dev (T *x, int n, R op)
 Reduces a vector x of length n. More...
 
template<typename T , typename R >
__global__ void reduce (T *xsum, T *x, int n, int stride, R op)
 Reduces an M x N matrix. Given a K x N grid of blocks, returns a K x N array of sums. More...
 

Detailed Description

Reduction operators and kernels.

Function Documentation

template<typename T , typename R >
__global__ void mycuda_reduce::reduce (T *xsum, T *x, int n, int stride, R op)

Reduces an M x N matrix. Given a K x N grid of blocks, returns a K x N array of sums.

Notes

  • Uses dynamic allocation for shared memory; the allocation size is determined by the kernel launch configuration.

Requires

  • blocksize >= 32
  • shared_memory_size = 2*blocksize*sizeof(T)

Definition at line 111 of file mycuda_reduce.h.

template<typename T , typename R >
__device__ void mycuda_reduce::reduce_dev (T *x, int n, R op)

Reduces a vector x of length n.

Notes

  • x should be stored in shared memory for best performance.
  • The result is computed in place and returned in x[0].
  • The reduction operator is implemented as a functor class.

Requires

  • n <= 2*blocksize
  • blocksize >= 32

Definition at line 73 of file mycuda_reduce.h.