CUDA kernel error when increasing thread number

I am developing a CUDA ray-plane intersection kernel.

Suppose my plane (face) struct is:

typedef struct _Face {
    int ID;
    int matID;

    int V1ID;
    int V2ID;
    int V3ID;

    float V1[3];
    float V2[3];
    float V3[3];

    float reflect[3];

    float emmision[3];
    float in[3];
    float out[3];

    int intersects[RAYS];

} Face;

I pasted the whole struct so you can get an idea of its size. RAYS equals 625 in the current configuration. In the following code assume that the faces array holds e.g. 1270 faces (generally, thousands).

Until today I launched my kernel in a very naive way:

const int tpb = 64; //threads per block
dim3 grid = (n +tpb-1)/tpb; // n - face count in array
dim3 block = tpb;
//.. some memory allocation etc.
theKernel<<<grid,block>>>(dev_ptr, n);

and inside the kernel I had a loop:

__global__ void theKernel(Face* faces, int faceCount) {
    int offset = threadIdx.x + blockIdx.x*blockDim.x;
    if(offset >= faceCount)
        return;                   // threads past the last face do nothing
    Face f = faces[offset];
    //..some initialization
    int RAY = -1;
    for(float alpha=0.0f; alpha<=PI; alpha+= alpha_step ){
        for(float beta=0.0f; beta<=PI; beta+= beta_step ){
            //..calculation per ray in (alpha,beta) direction ...
            faces[offset].intersects[RAY] = ...; //some assignment
        }
    }
}

That is about it: I looped through all the directions and updated the faces array. It worked correctly, but was hardly any faster than the CPU code.

So today I tried to optimize the code by launching the kernel with many more threads: instead of 1 thread per face I want 1 thread per ray of each face (so 625 threads work on 1 face). The modifications were simple:

dim3 grid = (n*RAYS +tpb-1)/tpb;  // before launching; RAYS = 625, n = face count

and the kernel itself:

__global__ void theKernel(Face *faces, int faceCount){
    int threadNum = threadIdx.x + blockIdx.x*blockDim.x;

    int offset = threadNum/RAYS;          // face index; RAYS is a global #define
    int rayNum = threadNum - offset*RAYS; // ray index, same as threadNum % RAYS

    if(offset >= faceCount || rayNum != 0)
        return;                           // for now only ray 0 of each face does work

    Face f = faces[offset];
    //initialization and the rest.. again ..
}

This code does not work at all. Why? In theory only the first thread of the 625 per face should do any work, so why does this produce wrong (or hardly any) results?

Kind regards, e.


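The maximum size of a grid in any dimension is 65535 (CUDA Programming Guide, Appendix F). That figure is the x-dimension limit on devices of compute capability below 3.0; from 3.0 onward gridDim.x may be as large as 2^31 - 1, while y and z remain capped at 65535. If your grid was 1000 blocks before the change, it is now 625000 blocks. That is bigger than the limit, so the launch fails with an "invalid configuration argument" error and the kernel does not run at all.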

If you define the grid size as

dim3 grid((n + tpb - 1) / tpb, RAYS);

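then all grid dimensions will be smaller than the limit. You'll also have to change the kernel's indexing: the face index now comes from the x dimension alone (threadIdx.x + blockIdx.x*blockDim.x) and the ray index from blockIdx.y.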

As Heatsink pointed out, you are probably exceeding the available resources. It is a good idea to check for errors after every kernel launch.

Here is the C++ code I use:

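#include <iostream>
#include <cuda_runtime.h>   // cudaError_t, cudaSuccess, cudaGetErrorString

// Prints a readable message if a CUDA runtime call reported an error.
void check_error(const char* str, cudaError_t err_code) {
    if (err_code != ::cudaSuccess)
        std::cerr << str << " -- " << cudaGetErrorString(err_code) << "\n";
}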

Then, when I invoke a kernel:

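my_kernel <<<block_grid, thread_grid >>>(args);
check_error("my_kernel launch", cudaGetLastError());          // launch errors, e.g. too many blocks
check_error("my_kernel execution", cudaDeviceSynchronize());  // errors raised while the kernel runs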
