Handling hyperthreading, SMT, modules, and other horrors

Most modern machines include multiple virtual processors on each core; this basic idea is called hyperthreading by Intel, SMT by IBM, and other names by others. AMD took it to a new level with the dozer architecture, where two integer units share an FPU.

All of these techniques mainly help for codes that are extremely inefficient: by allowing multiple threads that cannot drive the architecture's backend at its maximal rate, you can get the backend running nearer to peak. However, for efficient codes that can drive the bottleneck backend functional units at their maximal rate, these strategies can cause slowdowns that range from slight to catastrophic, depending on the situation. For ATLAS, the main problem is usually that the increased contention on the caches caused by the extra threads tends to thrash the caches.

The only architecture where I have seen the use of these virtual processors yield speedups on most ATLAS operations is the Sun Niagara; I believe the machine I observed speedups on was a T2, but this might be true for any of the T-series.

I recommend that HPC users turn off these virtual processors on all other systems, which is usually done either in the BIOS or by OS calls. If you do not have root, or if you have less optimized applications that are getting speedup from these virtual cores, you can tell ATLAS to use only the real cores if you learn a little about your machine. Unfortunately, ATLAS cannot presently autodetect these features, but if you experiment you can discover which affinity IDs are the separate cores, and tell ATLAS to use only these cores. The general form is to add the following to your usual configure flags:

    --force-tids="# <thread ID list>"

R. Clint Whaley 2016-07-28