Special instructions for ARM big/little systems

The stable version of ATLAS is not well suited to using big/little systems efficiently. The problem is that it can use only one block size and kernel, and one tuned for the little cores will not be optimal for the big cores, and vice versa. If you are only using the serial interface, the best idea is probably to install two versions, one optimized for the big and one optimized for the little CPUs.

In order to install ATLAS to run on only a subset of the cores, you can use the --force-tids flag (see §[*] for details). Unless you use something like taskset to invoke configure, configure may detect the big core as the architecture when you want the little one, and vice versa. Therefore, the easiest way to build a library that uses only the little or only the big cores is to tell configure explicitly which architecture you want to support by passing it the -A flag, as described in §[*]. ATLAS 3.10.3 has configure support for the following ARM32 -A strings: ARMa7, ARMa9, and ARMa15, and for the following ARM64 strings: ARM64a53, ARM64a57, and ARM64xgene1. For instance, to use only the little cores on my 8-core odroid, I would enter:

   ../configure -b 32 -A ARMa7 --force-tids="4 0 1 2 3"

Whereas to use the big cores exclusively, I would enter:

   ../configure -b 32 -A ARMa9 --force-tids="4 4 5 6 7"
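In both commands, the quoted --force-tids argument appears to be the number of threads followed by that many core IDs, so "4 4 5 6 7" means four threads pinned to cores 4 through 7. A minimal sketch of a helper that builds such a string from a list of core IDs; the name force_tids is hypothetical and not part of ATLAS:

```shell
#!/bin/sh
# force_tids: hypothetical helper, not part of ATLAS.
# Prints the thread count followed by the given core IDs, matching the
# "count id id ..." layout of the --force-tids examples in this section.
force_tids() {
  printf '%s' "$#"          # number of core IDs passed in
  for id in "$@"; do
    printf ' %s' "$id"      # each core ID, space separated
  done
  printf '\n'
}
```

For example, `../configure -b 32 -A ARMa9 --force-tids="$(force_tids 4 5 6 7)"` should be equivalent to the big-core command above, and keeps the count from drifting out of sync with the ID list.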

To find out which core IDs belonged to the big cores and which to the little ones, I had to examine /proc/cpuinfo.
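Rather than reading /proc/cpuinfo by eye, one could group the cores by their "CPU part" field. The following is a minimal sketch, not part of ATLAS: the helper name list_core_groups is hypothetical, it assumes each "processor : N" line comes before that core's "CPU part" line, and the table mapping ARM part numbers to -A strings is my assumption, so check it against your own board before trusting it. ARM64xgene1 is not covered.

```shell
#!/bin/sh
# list_core_groups: hypothetical helper, not part of ATLAS.
# Reads /proc/cpuinfo-style text on stdin and prints one line per distinct
# "CPU part" value: the part number, a guessed ATLAS -A string, and a
# --force-tids string ("count id id ...") for the cores with that part.
# Assumes each "processor : N" line precedes that core's "CPU part" line.
list_core_groups() {
  awk '
    /^processor[ \t]*:/ { id = $NF }            # remember current core ID
    /^CPU part[ \t]*:/ {
      part = $NF
      if (!(part in n)) order[++ngroups] = part # keep first-seen order
      n[part]++                                 # cores with this part
      ids[part] = ids[part] " " id              # append this core ID
    }
    END {
      # Assumed Cortex part numbers: 0xc07=A7, 0xc09=A9, 0xc0f=A15,
      # 0xd03=A53, 0xd07=A57; anything else is reported as "unknown".
      arch["0xc07"] = "ARMa7";  arch["0xc09"] = "ARMa9"
      arch["0xc0f"] = "ARMa15"; arch["0xd03"] = "ARM64a53"
      arch["0xd07"] = "ARM64a57"
      for (g = 1; g <= ngroups; g++) {
        p = order[g]
        a = (p in arch) ? arch[p] : "unknown"
        printf "%s %s %d%s\n", p, a, n[p], ids[p]
      }
    }
  '
}
```

On a big/little board, `list_core_groups < /proc/cpuinfo` should print two lines. The second field is a candidate -A string, and everything from the third field on is what goes inside the quotes of --force-tids.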

The more common case is that you want to use all the cores at once to do parallel BLAS calls. Given 3.10's limitations, I would recommend tuning for the little cores. The reason is that 3.10 uses many statically scheduled parallel algorithms, which means the BLAS will run at the speed of the slowest processor. Therefore, it makes sense to get the slow cores to run at peak speed, while the big cores take a performance hit by using the kernel and block factor optimized for the little cores.
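A back-of-the-envelope model of this argument, assuming the work W of one BLAS call is split into p equal, statically assigned pieces and that core i processes work at rate r_i (these symbols are introduced here only for illustration, and real kernels will not follow the model exactly):

```latex
T_{\text{static}} = \max_{1 \le i \le p} \frac{W/p}{r_i} = \frac{W}{p \, \min_i r_i},
\qquad
T_{\text{balanced}} = \frac{W}{\sum_{i=1}^{p} r_i}
```

Under this model the statically scheduled time depends only on the slowest rate, min_i r_i. Tuning for the little cores raises that minimum, while slowing the big cores costs nothing as long as their rate stays at or above the little cores' rate. The ideal dynamically balanced time is shown only for comparison; it is what a scheduler that let the big cores take extra work could approach.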

So, as an example, my odroid system has 4 a7 (little) cores and 4 a9 (big) cores. Assuming I want to use all 8 cores, but with the a7 architectural defaults, the configure line is simply:

   ../configure -b 32 -A ARMa7

R. Clint Whaley 2016-07-28