Saturday, October 11, 2014

Run OpenCL on the new Samsung Chromebook 2 in 5(-ish) simple steps

Recently a colleague and friend of mine posted a great tutorial on how to run OpenCL on Samsung's Chromebook in 30 minutes. He has tested this tutorial on the older (Series 3) Chromebook.

I bought myself the newer version, the Samsung Chromebook 2 (11" version). The main difference between these two laptops is that the older Chromebook hosts a Mali-T604 GPU, while the newer model uses a beefier Mali-T628 MP6 chip. The Mali-T604 has 4 cores versus the 6 in the newer chip. The latter is definitely an interesting chip for OpenCL folks, since its 6 cores are split into two separate devices with 4 and 2 cores respectively.
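Because the 6 cores show up as a 4-core and a 2-core device, a multi-device program has to decide how much work each one gets. The following is only a minimal sketch of one possible policy, splitting rows of work in proportion to the core counts. The function name splitRowsByComputeUnits and its parameters are hypothetical and not part of OpenCL or of any code used in this post.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Split `totalRows` rows of work among devices in proportion to their
// compute-unit counts. Illustrative helper, not an OpenCL API.
std::vector<std::size_t> splitRowsByComputeUnits(
    std::size_t totalRows, const std::vector<unsigned>& computeUnits) {
    const std::size_t totalUnits = std::accumulate(
        computeUnits.begin(), computeUnits.end(), std::size_t{0});
    assert(totalUnits > 0 && "at least one device must report compute units");

    std::vector<std::size_t> rows(computeUnits.size(), 0);
    std::size_t assigned = 0;
    for (std::size_t i = 0; i < computeUnits.size(); ++i) {
        // Floor of the proportional share for device i.
        rows[i] = totalRows * computeUnits[i] / totalUnits;
        assigned += rows[i];
    }
    // Hand out the rows lost to rounding, one at a time, starting from
    // device 0 and skipping devices that report no compute units.
    for (std::size_t i = 0; assigned < totalRows; i = (i + 1) % rows.size()) {
        if (computeUnits[i] == 0) continue;
        ++rows[i];
        ++assigned;
    }
    return rows;
}
```

For a 1024-row problem, splitRowsByComputeUnits(1024, {4, 2}) assigns 683 rows to the 4-core device and 341 rows to the 2-core device. A proportional split is just a starting point; measured timings may suggest a different ratio.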

In this post I will present a slightly different way of setting up a working OpenCL environment for the Chromebook 2 (which should also work on the older Chromebook).

Prerequisites

  • A Samsung Chromebook 2 (or any other ARM-based device running ChromeOS)
  • Some free space on your drive (2 to 4 GB)
  • Knowledge of the Arch Linux package manager (pacman)
  • Optional: a microSD card. I will install the development system on the internal SSD, but if you think you are going to need more space for development you can use a microSD card instead.

Step 1: Enable Developer mode


Enter Recovery Mode by holding the ESC and REFRESH (↻ or F3) buttons, and pressing the POWER button. In Recovery Mode, press Ctrl+D and ENTER to confirm and enable Developer Mode.


Step 2: Install chroagh

chroagh is a fork of the crouton project. It is based on the chroot command available in ChromeOS, which makes it possible to spawn lightweight guest OSs. A more technical explanation follows:
What's a chroot?
Like virtualization, chroots provide the guest OS with their own, segregated file system to run in, allowing applications to run in a different binary environment from the host OS. Unlike virtualization, you are not booting a second OS; instead, the guest OS is running using the Chromium OS system. The benefit to this is that there is zero speed penalty since everything is run natively, and you aren't wasting RAM to boot two OSes at the same time. [...] 
While crouton installs Ubuntu by default, chroagh is based on Arch Linux. I personally prefer Arch Linux, but if you feel more confident with Ubuntu feel free to use crouton. The steps to create a chroot follow (more options are available from the project's GitHub page):

  1. Launch a crosh shell (Ctrl+Alt+T; you can paste into the console using Ctrl+Shift+V), then type shell and press Enter to get a full command shell.
  2. Download and extract chroagh:
    $ cd ~/Downloads
    $ wget https://api.github.com/repos/drinkcat/chroagh/tarball -O chroagh.tar.gz
    $ tar xvf chroagh.tar.gz
    $ cd drinkcat-chroagh-*
    
    
  3. Create the rootfs:
    $ sudo sh -e installer/main.sh -r arch -t cli-extra 
The tool will install a minimal Arch system; at some point it will ask for the user name and password of the main user. If everything went fine (it often does), you are ready to start your Arch installation within ChromeOS.

NOTE: If you want to install the chroot to a different location (e.g., an SD/microSD card or USB) then use the -p option to specify a destination folder.

Step 3: Enter-chroot and environment setup

After chroagh finishes installing the base Arch Linux system, we can enter this environment with the following command (from any crosh shell):

$ sudo enter-chroot

You should see the following output:
chronos@localhost ~/Downloads/drinkcat-chroagh-380f361 $ sudo enter-chroot
Entering /mnt/stateful_partition/crouton/chroots/arch...
[motonacciu@localhost ~]$ 

And, like magic, we are now in Arch Linux. At this point you can install a bunch of packages which are going to be useful for OpenCL development.

$ sudo pacman -S gcc vim cmake base-devel git opencl-headers

The next (and final) step is downloading the Mali userspace drivers. They are available from malideveloper.com and are continuously updated. This is the moment where you need to know which Mali device is installed in your Chromebook. In my case the driver marked as MALI-T62x will do the trick. For the older Chromebook the MALI-T604 driver should be used instead. Since we are using the command line, we can download the fbdev version of the drivers:

$ wget http://malideveloper.arm.com/downloads/drivers/binary/r4p0-02rel0/mali-t62x_r4p0-02rel0_linux_1+fbdev.tar.gz
$ tar -xf mali-t62x_r4p0-02rel0_linux_1+fbdev.tar.gz
$ ls fbdev
libEGL.so  libGLESv1_CM.so  libGLESv2.so  libmali.so  libOpenCL.so


Next, we can either edit the ~/.bashrc file to add this folder to LD_LIBRARY_PATH, or copy the libraries into the /usr/lib folder (for this you will need to add your user to the sudoers). A third option is to simply specify the path manually when invoking GCC.

Step 4: Compile and Run your CL program

When you compile your program, make sure the linker can find libmali.so. The libOpenCL.so library is just a wrapper; the only library needed to run a CL program is libmali.so.

Compile your program as follows:
$ g++ -std=c++11 main.cpp -Iinclude -L/home/compute/fbdev -lmali -o clInfo

This is a simple CL program which prints the list of devices. We run it using the following command (you can avoid specifying LD_LIBRARY_PATH if you placed libmali.so in a default location):

$ LD_LIBRARY_PATH=/home/compute/fbdev/:$LD_LIBRARY_PATH ./clInfo
Total number of CL devices: 2

Device 0 info
 - Vendor:            'ARM'
 - Name:              'Mali-T628'
 - Type:              'CL_DEVICE_TYPE_GPU'
 - Max frequency:     '533'
 - Max compute units: '2'
 - Global mem size:   '2097192960'
 - Local mem size:    '32768'
 - Profile:           'FULL_PROFILE'
 - Driver version:    '1.1'
 - Extensions:        'cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_fp64 cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_gl_sharing cl_khr_icd cl_khr_egl_event cl_khr_egl_image cl_arm_core_id cl_arm_printf'

Device 1 info
 - Vendor:            'ARM'
 - Name:              'Mali-T628'
 - Type:              'CL_DEVICE_TYPE_GPU'
 - Max frequency:     '533'
 - Max compute units: '4'
 - Global mem size:   '2097192960'
 - Local mem size:    '32768'
 - Profile:           'FULL_PROFILE'
 - Driver version:    '1.1'
 - Extensions:        'cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_fp64 cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_gl_sharing cl_khr_icd cl_khr_egl_event cl_khr_egl_image cl_arm_core_id cl_arm_printf'
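
The extension list above is what a host program would typically inspect before relying on an optional feature such as double precision (cl_khr_fp64). Below is a minimal sketch of such a check, assuming the extension string has already been obtained from the device (for example through clGetDeviceInfo with CL_DEVICE_EXTENSIONS). hasExtension is a hypothetical helper name, not part of OpenCL and not taken from the clInfo program used here.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Return true if `name` appears as a whole, space-separated token in the
// extension string reported by the device. Comparing whole tokens avoids
// false positives on prefixes (e.g. "cl_khr_int64" must not match
// "cl_khr_int64_base_atomics").
bool hasExtension(const std::string& extensions, const std::string& name) {
    assert(!name.empty() && "extension name must not be empty");
    std::istringstream tokens(extensions);
    std::string token;
    while (tokens >> token) {
        if (token == name) return true;
    }
    return false;
}
```

With the string printed for either Mali-T628 device, hasExtension(ext, "cl_khr_fp64") is true, while a query for an extension that is not listed, such as "cl_khr_fp16", is false.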

Step 5: Done

Yes, that's all. You can now start writing your multi-device OpenCL code for ARM's Mali GPU. Let's verify that everything is in order... shall we?

OpenMP 'matmul_1024x1024' on ARM-CPU   [cores:8] => 16632 msecs
OpenCL 'matmul_1024x1024' on Mali-T628 [cores:2] => 616 msecs
OpenCL 'matmul_1024x1024' on Mali-T628 [cores:4] => 319 msecs
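
To put these timings in perspective, here is a small sketch that turns them into speedups and a scaling efficiency. The function names speedup and scalingEfficiency are my own for illustration; the only inputs are the three measurements printed above.

```cpp
#include <cassert>

// Speedup of a measured run over a baseline run (both in milliseconds).
double speedup(double baselineMs, double measuredMs) {
    assert(baselineMs > 0.0 && measuredMs > 0.0);
    return baselineMs / measuredMs;
}

// Fraction of ideal linear scaling achieved when going from `fewUnits`
// compute units to `manyUnits`; 1.0 means doubling the units halves the time.
double scalingEfficiency(double fewUnitsMs, unsigned fewUnits,
                         double manyUnitsMs, unsigned manyUnits) {
    assert(fewUnits > 0 && manyUnits > 0);
    const double idealSpeedup = static_cast<double>(manyUnits) / fewUnits;
    return speedup(fewUnitsMs, manyUnitsMs) / idealSpeedup;
}
```

With the numbers above, speedup(16632, 616) is 27 for the 2-core device and speedup(16632, 319) is about 52.1 for the 4-core device, both relative to the OpenMP run. scalingEfficiency(616, 2, 319, 4) comes out at roughly 0.97, so going from 2 to 4 cores scales almost linearly for this kernel.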

Stay tuned for more experiments!

C++ <3