Understanding GPU architecture (NVIDIA)

Specifications of graphics card:

Device 0: "GeForce GTX 650"

CUDA Driver Version / Runtime Version 6.0 / 6.0

CUDA Capability Major/Minor version number: 3.0

Total amount of global memory: 2048 MBytes (2147287040 bytes)

( 2) Multiprocessors, (192) CUDA Cores/MP: 384 CUDA Cores

GPU Clock rate: 1058 MHz (1.06 GHz)

Memory Clock rate: 2500 Mhz

Memory Bus Width: 128-bit

L2 Cache Size: 262144 bytes

Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)

Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers

Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers

Total amount of constant memory: 65536 bytes

Total amount of shared memory per block: 49152 bytes

Total number of registers available per block: 65536

Warp size: 32

Maximum number of threads per multiprocessor: 2048

Maximum number of threads per block: 1024

Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)

Maximum memory pitch: 2147483647 bytes

Texture alignment: 512 bytes

Concurrent copy and kernel execution: Yes with 1 copy engine(s)

I don't understand this output completely. I tried looking it up on the internet and ended up more confused.

What I do know is that I can launch kernels with blocks of at most 1024 threads, shaped as (1024,1,1), (32,32,1), and so on.

  1. What is the significance of ( 2) Multiprocessors, (192) CUDA Cores/MP: 384 CUDA Cores? How can I use this information for the benefit of the program? If the GPU takes care of it, I guess I shouldn't bother.

  2. What does Max dimension size of a thread block (x,y,z): (1024, 1024, 64) mean, particularly given that the maximum number of threads in a block is 1024?

Answers


  1. What is the significance of ( 2) Multiprocessors, (192) CUDA Cores/MP: 384 CUDA Cores? How can I use this information for the benefit of the program? If the GPU takes care of it, I guess I shouldn't bother.

It is useful because it tells you roughly how powerful your GPU is and lets you compare it with others. Each GPU on the Tesla K10 board, for example, has the same compute capability (3.0) but 8 multiprocessors (SMs) of 192 cores each, giving 1536 cores against your 384. It will obviously deliver significantly more performance.

In most cases there is little you can do with this information yourself. When the number of blocks is of the same order of magnitude as the number of SMs, performance can change sharply with small changes in block count, because in the "tail" of the kernel some SMs may sit idle while others finish their last blocks. However, the way blocks map to SMs is unspecified, so optimisations based on this knowledge may be invalidated at any time by, for example, a driver update.
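To put rough numbers on the "tail", here is a minimal host-side C++ sketch. It assumes a deliberately simplified model: every SM holds a fixed number of blocks at a time and all blocks take equally long. The function names `waves` and `utilisation` and the `blocksPerSM` parameter are made up for illustration, and real block scheduling is unspecified, so treat this as intuition only.

```cpp
#include <cassert>

// Simplified model (an assumption, not documented hardware behaviour):
// every SM holds `blocksPerSM` blocks at once and all blocks take equally long,
// so the grid runs in "waves" of numSMs * blocksPerSM blocks.

// Number of waves needed to run numBlocks blocks.
int waves(int numBlocks, int numSMs, int blocksPerSM) {
    const int slots = numSMs * blocksPerSM;
    assert(numBlocks > 0 && slots > 0);
    return (numBlocks + slots - 1) / slots;  // ceiling division
}

// Fraction of block slots doing useful work, averaged over all waves.
double utilisation(int numBlocks, int numSMs, int blocksPerSM) {
    const int slots = numSMs * blocksPerSM;
    return static_cast<double>(numBlocks) /
           (static_cast<double>(waves(numBlocks, numSMs, blocksPerSM)) * slots);
}
```

Under this model, with the 2 SMs of the GTX 650 and one block per SM, `utilisation(3, 2, 1)` is 0.75: the third block runs alone in a second wave, so 3 blocks take as long as 4 would.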

  2. What does Max dimension size of a thread block (x,y,z): (1024, 1024, 64) mean, particularly given that the maximum number of threads in a block is 1024?

As JackOLantern said in his comment:

... it means that you can have (1024,1,1) or (1,1024,1) blocks, but not (1,1,1024) blocks. Along z, the maximum number of threads is limited to 64.
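A block shape has to satisfy the per-dimension limits and the 1024-thread total at the same time. A minimal host-side C++ sketch of that check follows, with the limits hard-coded from the deviceQuery output in the question. `BlockLimits`, `isValidBlock` and `threadsInBlock` are hypothetical names, not part of the CUDA API; in a real program you would read the limits from the device properties instead.

```cpp
#include <cassert>

// Limits copied from the deviceQuery output in the question
// (GeForce GTX 650, compute capability 3.0).
struct BlockLimits {
    static const unsigned maxX = 1024;        // limit on x alone
    static const unsigned maxY = 1024;        // limit on y alone
    static const unsigned maxZ = 64;          // limit on z alone
    static const unsigned maxThreads = 1024;  // limit on x * y * z
};

// Returns true if a block of shape (x, y, z) satisfies both kinds of limit.
bool isValidBlock(unsigned x, unsigned y, unsigned z) {
    if (x == 0 || y == 0 || z == 0) return false;
    if (x > BlockLimits::maxX || y > BlockLimits::maxY || z > BlockLimits::maxZ) return false;
    // After the checks above the product is at most 1024 * 1024 * 64,
    // which fits comfortably in a 32-bit unsigned.
    return x * y * z <= BlockLimits::maxThreads;
}

// Total number of threads in a block; only meaningful for a valid shape.
unsigned threadsInBlock(unsigned x, unsigned y, unsigned z) {
    assert(isValidBlock(x, y, z));
    return x * y * z;
}
```

So (1024,1,1), (32,32,1) and (4,4,64) all pass, (1,1,1024) fails on the z limit, and (32,32,2) fails on the total even though each dimension is within range.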

