Updated: Dec. 3, 2010, 7:27 p.m.

Compute Unified Device Architecture

Some notes on legoing with CUDA on an ASUS EB1012 running Xubuntu 10.10 64-bit.

Installation is straightforward and well-documented in the Getting Started Guide. Download the code (drivers, Toolkit, SDK) from CUDA 3.2 Downloads, run the installers, compile the SDK samples and you are ready to start legoing!

I highly recommend compiling/installing everything from scratch; it is by far the fastest path to a working environment! One lazy approach is to skim the Getting Started Guide and follow the copy-pastable notes below.

WARNING: If you try to take a "shortcut" by re-using your system's currently installed driver, or by installing a driver via some arbitrary PPA, then you need to keep the following in mind:


Required software:

sudo apt-get install vim lynx g++ ia32-libs libx11-dev libglut3-dev libgl1-mesa-dev libglu-dev libxmu-dev libxi-dev

Acquire the Toolkit, device driver and GPU Computing SDK from NVIDIA's download server:

# Fetch files
wget http://developer.download.nvidia.com/compute/cuda/3_2_prod/drivers/devdriver_3.2_linux_64_260.19.21.run
wget http://developer.download.nvidia.com/compute/cuda/3_2_prod/toolkit/cudatoolkit_3.2.16_linux_64_ubuntu10.04.run
wget http://developer.download.nvidia.com/compute/cuda/3_2_prod/sdk/gpucomputingsdk_3.2.16_linux.run

# Set Execution Rights
chmod +x *.run

# Install (the SDK goes into your home directory, so no sudo)
sudo ./devdriver_3.2_linux_64_260.19.21.run
sudo ./cudatoolkit_3.2.16_linux_64_ubuntu10.04.run
./gpucomputingsdk_3.2.16_linux.run

# Set library Paths
echo '' >> ~/.bashrc
echo 'export PATH="$PATH:/usr/local/cuda/bin"' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/cuda/lib64:/usr/local/cuda/lib"' >> ~/.bashrc

# Fix glut-library
sudo ln -s /usr/lib/libglut.so.3 /usr/lib/libglut.so

# Load driver
sudo modprobe nvidia
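
Once a new shell has picked up the ~/.bashrc changes, a quick sanity check can confirm that the CUDA directories really are on the search paths. This is only a minimal sketch and not part of the original notes: the function names path_contains and check_cuda_paths are made up for illustration, and it assumes the /usr/local/cuda locations used above.

```shell
#!/bin/bash
# Hypothetical helper: succeed if the colon-separated list $1 contains entry $2.
path_contains() {
    case ":$1:" in
        *":$2:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Report whether the CUDA directories from the notes above are on the search paths.
check_cuda_paths() {
    if path_contains "$PATH" "/usr/local/cuda/bin"; then
        echo "PATH ok"
    else
        echo "PATH is missing /usr/local/cuda/bin"
    fi
    if path_contains "${LD_LIBRARY_PATH:-}" "/usr/local/cuda/lib64"; then
        echo "LD_LIBRARY_PATH ok"
    else
        echo "LD_LIBRARY_PATH is missing /usr/local/cuda/lib64"
    fi
}

check_cuda_paths
```

Wrapping the list in colons before matching avoids false positives from partial matches such as /usr/local/cuda/bin-old.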



./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

There is 1 device supporting CUDA

Device 0: "ION"
  CUDA Driver Version:                           3.20
  CUDA Runtime Version:                          3.20
  CUDA Capability Major/Minor version number:    1.1
  Total amount of global memory:                 534052864 bytes
  Multiprocessors x Cores/MP = Cores:            2 (MP) x 8 (Cores/MP) = 16 (Cores)
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       16384 bytes
  Total number of registers available per block: 8192
  Warp size:                                     32
  Maximum number of threads per block:           512
  Maximum sizes of each dimension of a block:    512 x 512 x 64
  Maximum sizes of each dimension of a grid:     65535 x 65535 x 1
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             256 bytes
  Clock rate:                                    1.10 GHz
  Concurrent copy and execution:                 No
  Run time limit on kernels:                     Yes
  Integrated:                                    Yes
  Support host page-locked memory mapping:       Yes
  Compute mode:                                  Default (multiple host threads can use this device simultaneously)
  Concurrent kernel execution:                   No
  Device has ECC support enabled:                No
  Device is using TCC driver mode:               No

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 3.20, CUDA Runtime Version = 3.20, NumDevs = 1, Device = ION


Press <Enter> to Quit...
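
The raw numbers in the deviceQuery listing are easier to read after a little arithmetic. The following is a small sketch, with function names invented for illustration, that converts the reported global memory to MiB and recomputes the core count from the multiprocessor figures.

```shell
#!/bin/bash
# Hypothetical helpers for reading the deviceQuery listing above.

# Convert a byte count to whole MiB (1 MiB = 1048576 bytes), rounding down.
bytes_to_mib() {
    echo $(( $1 / 1048576 ))
}

# Total cores = multiprocessors x cores per multiprocessor.
total_cores() {
    echo $(( $1 * $2 ))
}

echo "Global memory: $(bytes_to_mib 534052864) MiB"   # 534052864 bytes from the listing
echo "Cores:         $(total_cores 2 8)"              # 2 (MP) x 8 (Cores/MP)
```

That works out to 509 MiB of global memory and 16 cores, matching the listing.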


./bandwidthTest Starting...

Running on...

 Device 0: ION
 Quick Mode

 Host to Device Bandwidth, 1 Device(s), Paged memory
   Transfer Size (Bytes)    Bandwidth(MB/s)
   33554432         1240.4

 Device to Host Bandwidth, 1 Device(s), Paged memory
   Transfer Size (Bytes)    Bandwidth(MB/s)
   33554432         795.0

 Device to Device Bandwidth, 1 Device(s)
   Transfer Size (Bytes)    Bandwidth(MB/s)
   33554432         6806.1

[bandwidthTest] - Test results:

Press <Enter> to Quit...
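
To get a feel for what these bandwidth figures mean in practice, the sketch below turns them into transfer times for the 33554432-byte (32 MiB) buffer used by the test. It is an illustration only: the function names are hypothetical, and it assumes that "MB/s" in the listing means 2^20 bytes per second.

```shell
#!/bin/bash
# Hypothetical helpers for the bandwidthTest figures above.
# Assumption: "MB/s" in the listing means 2^20 bytes per second.

# Milliseconds needed to move $1 bytes at $2 MB/s, one decimal place.
transfer_ms() {
    awk -v bytes="$1" -v mbps="$2" 'BEGIN { printf "%.1f\n", bytes / 1048576 / mbps * 1000 }'
}

# Ratio of two bandwidths, two decimal places.
bandwidth_ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

echo "Host to device, 32 MiB: $(transfer_ms 33554432 1240.4) ms"
echo "Device to host, 32 MiB: $(transfer_ms 33554432 795.0) ms"
echo "Device-to-device vs host-to-device: $(bandwidth_ratio 6806.1 1240.4)x"
```

Under that assumption a 32 MiB upload takes roughly 26 ms and a download roughly 40 ms, while copies that stay on the device are about 5.5 times faster than uploads.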