GPU Workshop Sample Code
reduce1.cu File Reference

Compute blockwise sums of a vector x of length n.

#include "../include/mycuda.h"


Functions

__device__ void reduce1_dev (float *x, int n)
 __device__ function that performs the actual reduction.
 
__global__ void reduce1 (float *xsum, float *x, int stride)
 Reduction kernel.
 
int main ()
 

Variables

const int blocksize = 256
 

Detailed Description

Compute blockwise sums of a vector x of length n.

Definition in file reduce1.cu.

Function Documentation

int main ( void  )

Definition at line 85 of file reduce1.cu.

__global__ void reduce1 (float *xsum, float *x, int stride)

Reduction kernel.

Given an M x N array x, compute its column sums and store them in xsum.

Notes

  • Uses static shared memory allocation.
  • Each block reduces one column.

Requires

  • gridsize = N
  • blocksize = M

Definition at line 65 of file reduce1.cu.
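
The requirements above (one block per column, one thread per row, static shared memory) can be sketched as follows. This is an illustrative sketch, not the contents of reduce1.cu: the names colsum_sketch, colsums_on_gpu, and MAXM are hypothetical, and the layout is an assumption (column-major storage, with column j starting at x + j*stride). Error checking is omitted for brevity.

```cuda
#include <cassert>
#include <cuda_runtime.h>

const int MAXM = 256;  // assumed upper bound on M (rows per column)

// Hypothetical kernel: block j sums column j of a column-major array.
// Launch with gridsize = N and blocksize = M, where M <= MAXM.
__global__ void colsum_sketch(float *xsum, const float *x, int stride)
{
    __shared__ float col[MAXM];        // static shared memory allocation
    int tid = threadIdx.x;

    // Each thread copies one element of its block's column.
    col[tid] = x[blockIdx.x * stride + tid];
    __syncthreads();

    // In-place tree reduction; n is the same for every thread,
    // so all threads reach each __syncthreads() together.
    int n = blockDim.x;
    while (n > 1) {
        int half = (n + 1) / 2;        // ceil(n/2) elements remain
        if (tid + half < n)
            col[tid] += col[tid + half];
        __syncthreads();
        n = half;
    }

    if (tid == 0)
        xsum[blockIdx.x] = col[0];     // one sum per column
}

// Hypothetical host helper: copies an M x N column-major array to the
// device, runs the kernel, and writes the N column sums into h_sums.
void colsums_on_gpu(const float *h_x, float *h_sums, int M, int N)
{
    assert(M >= 1 && M <= MAXM && N >= 1);
    float *d_x, *d_sums;
    cudaMalloc(&d_x, M * N * sizeof(float));
    cudaMalloc(&d_sums, N * sizeof(float));
    cudaMemcpy(d_x, h_x, M * N * sizeof(float), cudaMemcpyHostToDevice);
    colsum_sketch<<<N, M>>>(d_sums, d_x, M);   // gridsize = N, blocksize = M
    cudaMemcpy(h_sums, d_sums, N * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_x);
    cudaFree(d_sums);
}
```

In this sketch the stride passed to the kernel equals M because the columns are packed contiguously; a padded layout would pass a larger stride.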

__device__ void reduce1_dev (float *x, int n)

__device__ function that performs the actual reduction.

Notes

  • The sum is computed in place and returned in x[0].
  • x should be stored in shared memory for best performance (accessing shared memory is much faster than global memory).
  • Relies on synchronization between the threads of a block, so every thread in the block must call this function.

Requires

  • n <= 2*blocksize

Definition at line 25 of file reduce1.cu.
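
The notes and the n <= 2*blocksize requirement above can be illustrated with a minimal in-place tree reduction. This is a sketch under stated assumptions, not the source of reduce1_dev: the names reduce_sketch_dev, sum_kernel, sum_on_gpu, and BLOCK are hypothetical, and repeated halving is only one common way to implement such a reduction. Error checking is omitted for brevity.

```cuda
#include <cassert>
#include <cuda_runtime.h>

const int BLOCK = 256;  // assumed block size, mirroring blocksize = 256

// Hypothetical device function: sums x[0..n-1] in place, result in x[0].
// Every thread of the block must call it; requires n <= 2*blockDim.x.
__device__ void reduce_sketch_dev(float *x, int n)
{
    int tid = threadIdx.x;
    while (n > 1) {
        int half = (n + 1) / 2;      // ceil(n/2) elements remain after this step
        if (tid + half < n)          // only threads with a partner element add
            x[tid] += x[tid + half];
        __syncthreads();             // finish this step before starting the next
        n = half;
    }
}

// Hypothetical kernel: stages the input in shared memory, then reduces it.
__global__ void sum_kernel(float *out, const float *in, int n)
{
    __shared__ float buf[2 * BLOCK];
    for (int i = threadIdx.x; i < n; i += blockDim.x)
        buf[i] = in[i];              // each thread loads up to two elements
    __syncthreads();

    reduce_sketch_dev(buf, n);

    if (threadIdx.x == 0)
        *out = buf[0];
}

// Hypothetical host helper: returns the sum of h_in[0..n-1].
float sum_on_gpu(const float *h_in, int n)
{
    assert(n >= 1 && n <= 2 * BLOCK);
    float *d_in, *d_out, h_out = 0.0f;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, sizeof(float));
    cudaMemcpy(d_in, h_in, n * sizeof(float), cudaMemcpyHostToDevice);
    sum_kernel<<<1, BLOCK>>>(d_out, d_in, n);
    cudaMemcpy(&h_out, d_out, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return h_out;
}
```

The first step needs floor(n/2) active threads, which is why the sketch caps n at twice the block size. The loop bound n is identical for all threads, so the __syncthreads() call is never reached by only part of the block.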

Variable Documentation

const int blocksize = 256

Definition at line 8 of file reduce1.cu.