ATLAS 3.10.2 errata

ATLAS v3.10.2 released 07/10/14

ATLAS v3.10.0 released 07/10/12
- Errata for the previous stable, ATLAS 3.10.1 (no longer maintained).
- Errata for the previous stable, ATLAS 3.10.0 (no longer maintained).

ATLAS errors:

HERK/SYRK sometimes reads C for beta=0 case, causing NaN propagation
Error in ROTMG causes failures in modern LAPACK tests
SYR2K/HER2K sometimes read C for beta=0 case, causing spurious NaN propagation

ATLAS 3.10.2 Performance:

Performance is terrible for AVX or later SIMD vectorization or large scale parallel architectures. These platforms should probably use the developer release, which can more than double your performance.

Help for particular systems:

Building using OpenMP
Help and architectural defaults for 64-bit Windows
Help and architectural defaults for HARDFP ARM

Known issues:

Search doesn't work with CPU throttling

Noteworthy compiler issues:

Basic info on compilers.
Gcc's violation of the x86 ABI causes seg faults/bus errors when mixed with other compilers
During config, I get a bunch of 'warning: the use of `tmpnam' is dangerous' warnings.
Installing gcc under unix without being root
Make shared fails with something like: ld: cannot find -lgfortran
How about C++ header files for the C interfaces?

System problems & user tips:

How do I improve performance after an install?
Improving ATLAS small case performance via malloc
My performance drops off for very large problems (N > 1500)
Your install dies with "unable to get timings in tolerance"
Installing with a non-default f77 compiler
Installing additional f77 interfaces, or changing the f77 interface after installation
What happens if I install with no Fortran compiler?
How do I link with all these libraries?
My system doesn't have the -f option to cp
How do I restart an install from scratch?
How do I restart an interrupted install?
How do I get rid of all the object files?
When linking ATLAS's testers, I'm getting a bunch of undefined BLAS symbols (eg. dgemm_, dgemv_, etc).
I'm linking with C, and getting missing symbols (such as w_wsfe, do_fio, w_esfe or s_stop).
Problems with linking/missing LAPACK routines on OS X

HERK/SYRK sometimes reads C for beta=0 case, causing NaN propagation

This bug is caused by errors in the packed Level 3 code. Upgrade to ATLAS 3.10.3 to fix it.

Error in ROTMG causes failures in modern LAPACK tests

ATLAS 3.10.2 has a bug in ROTMG. To fix, save this file to ATLAS/src/blas/level1/ATL_lamch.c, and then in BLDdir/src/blas/level1 issue: make lib.

SYR2K/HER2K sometimes read C for beta=0 case, causing spurious NaN propagation

To fix, edit ATLAS/src/threads/blas/level3/ATL_tgemm_rkK.c. Lines 72-73 should presently read:

   if (mb != NB && nb != NB) 
      mmk_bX = mmk = genmm;

Change this to:

   if (mb != NB && nb != NB) {
      if (SCALAR_IS_ZERO(beta)) Mjoin(PATL,gezero)(mb, nb, C, ldc);
      mmk_bX = mmk = genmm;
   }
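
The reason reading C matters here: by BLAS convention beta=0 means C is write-only on input, but in IEEE arithmetic 0*NaN is still NaN, so merely scaling C by beta lets stale NaNs through. The following is a minimal sketch of that distinction, not ATLAS code; the helper names `scale_c_naive`, `scale_c_blas` and `count_nans` are hypothetical, and matrices are assumed column-major with leading dimension `ldc`. The gezero call in the patch above appears to play the role of the zeroing branch.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Naive scaling: always reads C, so 0.0 * NaN stays NaN. */
void scale_c_naive(size_t m, size_t n, double beta, double *C, size_t ldc)
{
   size_t i, j;
   assert(ldc >= m);                 /* column-major leading dimension */
   for (j = 0; j < n; j++)
      for (i = 0; i < m; i++)
         C[i + j*ldc] *= beta;
}

/* BLAS-style scaling: beta == 0 means C is write-only, so zero it unread. */
void scale_c_blas(size_t m, size_t n, double beta, double *C, size_t ldc)
{
   size_t i, j;
   assert(ldc >= m);
   for (j = 0; j < n; j++)
      for (i = 0; i < m; i++)
         C[i + j*ldc] = (beta == 0.0) ? 0.0 : beta * C[i + j*ldc];
}

/* Count the NaN entries left in an m x n column-major matrix. */
size_t count_nans(size_t m, size_t n, const double *C, size_t ldc)
{
   size_t i, j, cnt = 0;
   for (j = 0; j < n; j++)
      for (i = 0; i < m; i++)
         if (isnan(C[i + j*ldc]))
            cnt++;
   return cnt;
}
```

Seeding one entry of C with NaN and calling both routines with beta = 0 leaves the NaN in place only for the naive version.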

Building using OpenMP

In general, this is a bad idea, since OpenMP tends to be much slower than pthreads for ATLAS use. However, if your own application uses OpenMP, sometimes pthread usage can slow down your own threads, making it worthwhile to damage ATLAS performance in order to improve your OpenMP performance. In ATLAS, thread affinity is the main reason pthreads wins against OpenMP, and so OS X (and probably FreeBSD), which don't support real affinity, are the platforms where it makes sense to use OpenMP regardless. To force ATLAS to use OpenMP rather than pthreads, you must add the following flags to your configure line:

   -Si omp 1 -F alg -fopenmp

Help and architectural defaults for 64-bit Windows

When installing the 64-bit libraries under Windows, you cannot use any of the ATLAS assembly files since Windows decided not to use the standard AMD64 assembly ABI. Without architectural defaults a typical Windows install can take most of a day. In order to help with this, we are therefore providing a tarfile containing 64-bit Windows architectural defaults:

Download the tarfile
Untar it in some directory (tar xvf win64_archdef.tar), for example /home/whaley/TEST
Add the following flag to configure (substitute your path for mine):
```
      -Ss ADdir /home/whaley/TEST/WINAD
```

Right now, the tarfile just has defaults for Core2SSE364, which is the only Windows machine that I have access to. It is my hope that Windows users will submit their own architectural defaults to me, so that I can expand this tarfile for the Windows community.

If you have a good install on 64-bit Windows, please create the architectural defaults as outlined in the ATLAS developer guide, and submit them to me, either via e-mail or the patches tracker.

Installing ATLAS on a HARDFP ARM

As part of the unfathomable mass of complexity that is the ARM ecosystem, there are several ABIs. ATLAS natively supports the "softfp" ABI, where floating point arguments are passed through the integer registers. Ubuntu has recently switched from softfp ABI to hardfp, where floating point arguments are instead passed through floating point registers. Thanks to Tom Wallace of Vesperix, ATLAS now supports this ABI as well. To install for a HARDFP ABI system:

Download the architectural defaults tarfile
Untar it in some directory (tar xvf armhardfp_archdef.tar), for example /home/whaley/TEST
Edit the file ATLAS/CONFIG/src/atlcomp.txt and change all appearances of -mfloat-abi=softfp to -mfloat-abi=hard
- In vi, you can do this with the command: :g/=softfp/s//=hard/g

Add the following flags to configure (substitute your path for mine):

      -D c -DATL_ARM_HARDFP=1 -Ss ADdir /home/whaley/TEST/ARMHARDFP

Search doesn't work with CPU throttling

By default, newer Linuxes (and probably other OSes) have CPU throttling turned on even for desktops in order to save power. Since the speed of your CPU is constantly changing, the ATLAS timing results become essentially meaningless. Therefore, to get good performance numbers (and thus a fast ATLAS library), be sure to turn off CPU throttling in your BIOS before installation. If you are using the machine for high performance code a lot, you may want to leave it off. You can also usually turn off CPU throttling in your OS; see the ATLAS install guide (ATLAS/doc/atlas_install.pdf) for further details. Windows users may find these directions helpful.

Unkillable and relentless 'warning: the use of `tmpnam' is dangerous' warnings from gcc.

During configure you will get a lot of warnings of the following form:

/tmp/ccq5b8sE.o(.text+0x852): In function `CmndResults':
config.c: warning: the use of `tmpnam' is dangerous,
better use `mkstemp'

This is normal, and not an error. Let me translate this message out of gnu-speak:

  Hey, idiot, would you stop using that pesky ANSI/ISO C standard and
  use this non-standard routine instead?

For maximal compatibility, ATLAS hews to the ANSI/ISO 9899-1990 standard, and so I cannot make the proposed change. Unfortunately, this warning message is literally immortal: there exists in gcc no flag combination that I can discover that can turn the freaking thing off. So, every time you link one of the config programs that calls this standard routine, the linker outputs this message, even if you turn on the strict ANSI compatibility flag. I reported it as an error to the gcc folks, but they point out it is the linker/glibc people that generate the "warning" and immediately closed the tracker. Still seems wrong to me that the strict ANSI flag with all warning messages turned off insists on printing out a message warning about standard usage, but there appears to be little for me to do about it. Therefore, just ignore the hectoring, and don't worry about these immortal, bogus, annoying and repetitive "warnings".

Basic compiler information

The default compilers for ATLAS are always the freely-available GNU compilers. These are typically the fastest compilers to use with ATLAS. In particular, clang/LLVM presently fails to produce correct code for some operations, and is not performance competitive with gcc for any. Intel's compilers are not freely available, and do not guarantee IEEE compliance, and therefore are not recommended for installing ATLAS (you should be able to link gcc-compiled ATLAS libraries with your own Intel compiler-built application without problems).

The ATLAS architectural defaults were all built with gcc 4.7.0, except on PowerPCG4, where we lost machine access after supporting 4.6.2. Newer versions are likely OK, but earlier ones are usually not. If you use an older or newer version, be sure to run make time after completing your install to ensure that your compiler has not compromised your performance.

What happens if I install with no Fortran compiler?

You enable this by passing --nof77 to configure. In this case, ATLAS will still install correctly, but it will obviously not create the Fortran77 interface libraries. You will not be able to run the testers under the BLDdir/interfaces/ directory, since these testers are written in Fortran. Further, ATLAS expects that you will be comparing against a Fortran77 interface BLAS, and this will obviously not be the case, and so you will need to make the following changes if you want to run any of the ATLAS tester/timers, even the ones written in C:

Edit ATLAS/bin/l1blastst.c and change line 54 from:
```
#define USE_F77_BLAS
```
to:
```
#define USE_L1_REFERENCE
```
Edit ATLAS/bin/l2blastst.c and change line 46 from:
```
#define USE_F77_BLAS
```
to:
```
#define USE_L2_REFERENCE
```
Edit ATLAS/bin/l3blastst.c and change line 46 from:
```
#define USE_F77_BLAS
```
to:
```
#define USE_L3_REFERENCE
```
Edit ATLAS/bin/gemmtst.c and add the following line at the top of the file:
```
#define TRUST_SMALL
```

Installing with a non-default f77 compiler

The only Fortran routines in ATLAS are the Fortran77 interface routines, which do no computation. Therefore, the Fortran77 compiler has absolutely no effect on ATLAS's performance, and so the only reason you should need to use a non-default f77 compiler is if the f77 compiler you wish to use does not interoperate with ATLAS's default compiler.

To install with a non-default f77 compiler, simply override the default fortran compiler and flags from the command line when running config. This can be done by adding the following flags to configure:

   -C if /path/to/f77comp -F if 'f77 compiler flags'

Installing additional f77 interfaces

If you want to install ATLAS so it can be called from multiple, non-interoperable Fortran compilers (or indeed, have already installed with the wrong f77 compiler), you can do this with moderate ease, assuming you know how C and the given F77 compiler(s) interoperate. If you do not know this interoperational information, you must get configure to find it for you. To do this, create a bogus BLDdir directory (e.g., mkdir bogus), and then run configure from it, overriding the default fortran compiler and flags from the command line as described here. You can then look at the generated Make.inc's settings for the macro F2CDEFS and replicate them, along with the new F77 compiler/linker information, into your original Make.inc.

For those users already aware of the information needed for C/F77 interoperation, ATLAS needs three pieces of information in order to correctly handle F77/C interoperation, and this information appears as defines to the C compiler, set in your Make.inc's F2CDEFS.

The first macro controls the name space alterations necessary to make a C routine callable from Fortran77. The options are:

Add_: All F77-callable C routines should be lowercase, and have an underscore suffixed to their names.
Add__: All F77-callable C routines should be lowercase, have an underscore suffixed to their names, and if the F77 name itself possesses an underscore, two underscores should be suffixed.
NoChange: All F77-callable C routines should be lowercase, with no name alteration.
UpCase: All F77-callable C routines should be made uppercase, with no further name alteration.
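
To make the four conventions concrete, here is a small sketch that computes the C symbol name a Fortran77 routine name would map to under each one. It is only an illustration of the rules listed above: the enum and the function `f77_symbol` are hypothetical names, not part of ATLAS, which selects the convention at compile time through the F2CDEFS defines instead.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical enum mirroring the four name-space options. */
enum f77_naming
{
   F77_NAME_ADD1,      /* Add_     */
   F77_NAME_ADD2,      /* Add__    */
   F77_NAME_NOCHANGE,  /* NoChange */
   F77_NAME_UPCASE     /* UpCase   */
};

/*
 * Write the C symbol that the F77 routine `name` maps to under `how`.
 * Returns 0 on success, -1 if `out` (of size outsz) is too small.
 */
int f77_symbol(const char *name, enum f77_naming how, char *out, size_t outsz)
{
   size_t i, len, nunder = 0;

   assert(name != NULL && out != NULL);
   len = strlen(name);
   if (how == F77_NAME_ADD1)
      nunder = 1;
   else if (how == F77_NAME_ADD2)     /* extra underscore if name has one */
      nunder = (strchr(name, '_') != NULL) ? 2 : 1;

   if (len + nunder + 1 > outsz)
      return -1;
   for (i = 0; i < len; i++)
   {
      unsigned char c = (unsigned char) name[i];
      out[i] = (char) ((how == F77_NAME_UPCASE) ? toupper(c) : tolower(c));
   }
   for (i = 0; i < nunder; i++)
      out[len + i] = '_';
   out[len + nunder] = '\0';
   return 0;
}
```

For example, DGEMM maps to dgemm_ under both Add_ and Add__, while a name such as my_sub maps to my_sub_ under Add_ but my_sub__ under Add__.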

The second macro provides a mapping between F77's INTEGER and the appropriate C integral type. Options are:

No definition: Default case where C's int corresponds to F77's INTEGER.
F77_INTEGER=long: F77's INTEGER corresponds to C's long.
F77_INTEGER=short: F77's INTEGER corresponds to C's short.

The third macro deals with F77 string handling. The options are:

StringSunStyle

The string's address is passed at the string's location on the stack, and the string's length is then passed as an F77_INTEGER after all explicit stack arguments.

CrayStyle

Special option for CRAY machines, which uses Cray's fcd (fortran character descriptor) for interoperation.

StringStructPtr

The address of a structure is passed for each Fortran77 string, and the structure is of the form:

      struct {char *cp; F77_INTEGER len;};

StringStructVal

A structure is passed by value for each Fortran77 string, and the structure is of the form:

      struct {char *cp; F77_INTEGER len;};
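
As an illustration of the most common of these options, the sketch below shows how a C routine might receive a Fortran77 string under StringSunStyle combined with Add_. Everything named here is hypothetical (the routine `getuplo_`, the helper `f77_to_cstring`), and it assumes the trailing length is passed by value, which is the usual arrangement but is not stated above. It relies on the fact that Fortran strings are blank-padded and not NUL-terminated.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#ifndef F77_INTEGER
   #define F77_INTEGER int   /* default: C's int matches F77's INTEGER */
#endif

/*
 * Copy a Fortran string (blank-padded, no NUL) into a C string, dropping
 * trailing blanks.  Returns the trimmed length, or -1 if it does not fit.
 */
int f77_to_cstring(const char *fstr, F77_INTEGER flen, char *out, size_t outsz)
{
   F77_INTEGER n = flen;

   assert(fstr != NULL && out != NULL);
   if (flen < 0)
      return -1;
   while (n > 0 && fstr[n-1] == ' ')
      n--;
   if ((size_t) n + 1 > outsz)
      return -1;
   memcpy(out, fstr, (size_t) n);
   out[n] = '\0';
   return (int) n;
}

/*
 * Hypothetical C side of the Fortran call  CALL GETUPLO(UPLO, IFLAG):
 * explicit arguments arrive by reference, and the string length follows
 * them (assumed by value).  Sets IFLAG to 1 for upper, 0 otherwise,
 * or -1 if the string is blank or too long.
 */
void getuplo_(const char *uplo, F77_INTEGER *iflag, F77_INTEGER uplo_len)
{
   char buf[8];

   if (f77_to_cstring(uplo, uplo_len, buf, sizeof buf) < 1)
      *iflag = -1;
   else
      *iflag = (buf[0] == 'U' || buf[0] == 'u') ? 1 : 0;
}
```

Under the StringStructPtr and StringStructVal options, the pointer and length would instead arrive packed in the structure shown above, so only the argument unpacking would change.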

By default, ATLAS builds the F77 interface to the BLAS into the file pointed at by Make.inc's F77BLASlib, and so changing this macro before recompiling the interface will allow you to build multiple F77 interfaces.

For example, say on a Solaris machine I want to build the f77 interface for both Sun's f77 and gfortran. First, I install ATLAS as normal, with the default gfortran compiler. Now, to get an f77 interface lib for Sun's compiler, I edit my ATLAS/Make.SunOS_SunUS2, and I find that ATLAS has detected the C/F77 interface for gfortran as:

   F2CDEFS = -DAdd__ -DStringSunStyle

I then change this to match f77:

   F2CDEFS = -DAdd_ -DStringSunStyle

Now, so that my gfortran interface will not be overwritten, I also change:

   F77BLASlib = $(LIBdir)/libf77blas.a

to:

   F77BLASlib = $(LIBdir)/libsunf77blas.a

If I had built the threaded BLAS, I would make a similar change to PTF77BLASlib.

Finally, I change the f77 compiler/linker information from:

   F77 = /usr/local/bin/gfortran
   F77FLAGS = -O3 -funroll-all-loops

to:

   F77 = /opt/SUNWspro/bin/f77
   F77FLAGS = -dalign -native -xarch=v8plusa -xO5

Now, I cd BLDdir/interfaces/blas/F77/src/, and issue:

   make clean
   make lib

If you are using threads, additionally issue:

   make ptlib

Now, when linking with Sun's f77, I link to -lsunf77blas -latlas, and when linking with gfortran I use -lf77blas -latlas.

You can essentially repeat this process for the LAPACK F77 interface, but change LAPACKlib rather than F77BLASlib, and go to BLDdir/interfaces/lapack/F77/src rather than BLDdir/interfaces/blas/F77/src. Also, LAPACK does not have a separate entry point for threads, so do not issue any of the additional threading instructions.

Finally, in your BLDdir/src/testing directory, issue:

   make clean ; make lib

How do I link with all these libraries?

The user libs created by ATLAS are:

liblapack.a: The serial LAPACK routines provided by ATLAS.
libcblas.a: The ANSI C interface to the BLAS.
libf77blas.a: The Fortran77 interface to the BLAS.
libptlapack.a: The threaded (parallel) LAPACK routines provided by ATLAS.
libptcblas.a: The ANSI C interface to the threaded (SMP) BLAS. This library only appears if you have asked for SMP support.
libptf77blas.a: The Fortran77 interface to the threaded (SMP) BLAS. This library only appears if you have asked for SMP support.
libatlas.a: The main ATLAS library, providing low-level routines for all interface libs.

If you have missing symbols on link, make sure you are linking in all of the libraries you need, and remember that order *is* significant. For instance, a code calling the Fortran77 interface to the BLAS would need:

   -L$(MY_BLDdir)/lib/ -lf77blas -latlas

The full LAPACK library created by merging ATLAS and netlib LAPACK requires both C and Fortran77 interfaces, and thus that serial link line would be:

   -L$(MY_BLDdir)/lib/ -llapack -lf77blas -lcblas -latlas

While the threaded LAPACK link would be:

   -L$(MY_BLDdir)/lib/ -lptlapack -lptf77blas -lptcblas -latlas

Where $(MY_BLDdir) should be replaced by the directory where you have built your ATLAS.

Why am I failing more LAPACK tester cases?

The LAPACK testers have been hand-tuned to work with the reference BLAS, and thus these BLAS almost always produce the fewest failures (though you typically get some failures even with LAPACK's BLAS). If you scope the output files, you can quickly get an idea of which failures are serious by looking at the residuals. Residuals that are of size O(100) are typically not real failures, but merely a result of differing order of flops from what the ref blas do (which is legal and expected). In fact, on many platforms ATLAS achieves noticeably less error than the reference BLAS, but since the tester has been so heavily tuned to the reference BLAS, these more accurate results cause more failures. You can see a simple example of this on most x86 platforms by compiling the reference BLAS with SSE only, which results in 64/32-bit precision. Then compile the same BLAS to use the x87 unit (which has 80-bit precision, though gcc or the code can sometimes drop back to 64/32-bit precision briefly), and you will find you have more errors, despite having at least the same precision in all cases, and usually much more precision.

Real errors have very large residuals (e.g., 10e15). However, even these kinds of errors may be a result of the LAPACK tester being completely tuned to the LAPACK BLAS implementation. For instance, the DGER operation is supposed to do A += alpha * x * y. The reference BLAS perform A += x*(alpha*y). ATLAS does this or A += (alpha*x)*y, whichever is cheaper. This causes the LIN testers to fail quite a few residual checks with size 10e15, even though both are legal. I have modified ATLAS to avoid these spurious GER problems, but not all of the LAPACK testers' reliance on fixed orderings can be fixed.
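
The two GER orderings can be compared on a single matrix element with a sketch like the one below. The function names and the `GER_ROUNDOFF_BOUND` macro are hypothetical and chosen for illustration; the bound of four machine epsilons is an assumption that holds when the element of A starts at zero and nothing overflows or underflows, since each ordering then performs two rounded multiplies.

```c
#include <float.h>

/* Assumed rounding-level bound for the a == 0 case: a few ulps. */
#define GER_ROUNDOFF_BOUND (4.0 * DBL_EPSILON)

/* Reference-BLAS ordering: scale y first, a += x * (alpha * y). */
double ger_ref(double a, double alpha, double x, double y)
{
   double t = alpha * y;
   return a + x * t;
}

/* Alternative ordering: scale x first, a += (alpha * x) * y. */
double ger_alt(double a, double alpha, double x, double y)
{
   double t = alpha * x;
   return a + t * y;
}

/* Relative difference between the two orderings for one element. */
double ger_order_gap(double a, double alpha, double x, double y)
{
   double r = ger_ref(a, alpha, x, y);
   double s = ger_alt(a, alpha, x, y);
   double diff = (r > s) ? r - s : s - r;
   double ar = (r < 0.0) ? -r : r;
   double as = (s < 0.0) ? -s : s;
   double scale = (ar > as) ? ar : as;

   return (scale == 0.0) ? 0.0 : diff / scale;
}
```

With exactly representable inputs the gap is zero, and otherwise it stays at rounding level as long as the existing element of A does not nearly cancel the update. When it does cancel, the relative gap can grow far beyond this bound, which may be one way a fixed-ordering check ends up reporting a huge residual for a legal result.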

Installing gcc under unix without being root

You do not need to be root to install a gcc that will deliver decent performance for ATLAS. Here is the shell script I used to install gcc4.7.0 on all my machines:

#!/bin/sh
# ***************************************************************************
# This script adapted from:
#    http://advogato.org/person/redi/diary/240.html
# For use in getting a standard gcc 4.7 for ATLAS stable 3.10 testing
#
# Assumes M4 is already installed; can get it from package manager on
# any Linux that I know of.
# ***************************************************************************
instd=/home/whaley/local/gcc4.7.0   # change this to your path
bldd=/home/whaley/TEST              # change to your build dir, /tmp OK
np=12                               # set this to your number of cores
# ------------------------------------------------------------
# Get and unpack the files, and stick them in common directory
# ------------------------------------------------------------
#cd ${bldd}
#mkdir GCC4.7.0
#cd GCC4.7.0
#wget http://www.netgull.com/gcc/releases/gcc-4.7.0/gcc-4.7.0.tar.bz2
#wget http://www.netgull.com/gcc/infrastructure/gmp-4.3.2.tar.bz2
#wget http://www.netgull.com/gcc/infrastructure/mpc-0.8.1.tar.gz
#wget http://www.netgull.com/gcc/infrastructure/mpfr-2.4.2.tar.bz2
#  ---------------------------------------------------------
#  Assuming we have all needed packages in current directory
#  ---------------------------------------------------------
bunzip2 -c gmp-4.3.2.tar.bz2 | tar xf -
bunzip2 -c mpfr-2.4.2.tar.bz2 | tar xf -
gunzip -c mpc-0.8.1.tar.gz | tar xf -
bunzip2 -c gcc-4.7.0.tar.bz2 | tar xf -
mv gmp-4.3.2 gcc-4.7.0/gmp
mv mpfr-2.4.2 gcc-4.7.0/mpfr
mv mpc-0.8.1 gcc-4.7.0/mpc
mkdir MyObj
cd MyObj
../gcc-4.7.0/configure --prefix=${instd} --enable-languages=c,fortran
make -j ${np}
make install

Assuming you wget the exact same versions of the tarfiles that I show above, and put them in the common directory, then you should be able to use this shell script directly. Simply copy the above to a text file (say instgcc.sh), set the execute permission (chmod a+x instgcc.sh), and run it (./instgcc.sh).

NOTE: if your default gcc version is a lot older than the new one, there will often be library incompatibilities, which can cause linking, particularly of FORTRAN codes, to fail. If this occurs, you'll need to update your LD_LIBRARY_PATH. For instance, on my desktop lubuntu machine, I added the following line to my .cshrc after installing as above:

setenv LD_LIBRARY_PATH /home/whaley/local/gcc4.7.0/lib64:/home/whaley/local/gcc4.7.0/lib

(change line to export and put it in your .bashrc if you use the bash shell rather than the tcsh shell, as I do).

Post install tuning.

Here are some tips for improving ATLAS performance after an install:

To improve small-case performance, change the way malloc behaves.
Tune CacheEdge, as shown here.

Improving ATLAS small case performance by changing malloc behavior

ATLAS allocates a buffer space for most GEMM calls. When I wrote it, my assumption was that only the first call requires a switch to kernel space to do the allocation, and incurs the unneeded overhead of zeroing out the memory. However, by default Linux (as well as some other OSes, such as OS X) allocates non-trivial sized allocations using mmap, which means that when free is called, the memory is immediately returned to the system. Thus all malloc calls have extremely high overheads.

This is not a big problem if you are doing a large matrix multiply, where the cubic computation disguises this square cost. For small problems, though, the O(N**2) costs are actually dominant, and this type of malloc behavior effectively doubles them (at least). You should be able to change Linux's malloc behavior by setting these environment variables:

   setenv MALLOC_TRIM_THRESHOLD_ -1
   setenv MALLOC_MMAP_MAX_ 0
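
If setting environment variables is inconvenient, glibc exposes what I understand to be the same two tunables through mallopt(), so an application can request the behavior itself before it starts calling ATLAS. This is a glibc-specific sketch, not something ATLAS does for you; the function name `keep_malloc_on_heap` is hypothetical, and on non-glibc systems it deliberately does nothing.

```c
#include <stdlib.h>      /* on glibc this also defines __GLIBC__ */

#ifdef __GLIBC__
   #include <malloc.h>   /* mallopt, M_TRIM_THRESHOLD, M_MMAP_MAX */
#endif

/*
 * Ask glibc's malloc to keep freed memory and to stop using mmap.
 * Returns 1 if both settings were accepted, 0 if either was rejected,
 * and -1 when not built against glibc (nothing is changed).
 */
int keep_malloc_on_heap(void)
{
#ifdef __GLIBC__
   int ok_trim = mallopt(M_TRIM_THRESHOLD, -1);  /* never trim the heap top */
   int ok_mmap = mallopt(M_MMAP_MAX, 0);         /* never allocate via mmap */

   return (ok_trim && ok_mmap) ? 1 : 0;
#else
   return -1;
#endif
}
```

Call it once near the start of main(), before the first GEMM call, so that ATLAS's workspace allocations are served under the new settings.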

Once this is done, malloc should be cheaper, but ATLAS was tuned with the expensive malloc. Therefore, you may be able to get better small-case performance by rerunning the crossover search with these environment variables set (don't do this unless you are going to keep these settings whenever you use this library). You can rerun the search from the ATLAS/tune/blas/gemm/ARCH directory by issuing:

   make sRun_tfc pre=s
   make dRun_tfc pre=d
   make cRun_tfc pre=c
   make zRun_tfc pre=z

This search takes a *loooong* time. Then, to build the changes into the libraries, go to ATLAS/bin/ARCH, and issue:

   make xsl3blastst
   make xdl3blastst
   make xcl3blastst
   make xzl3blastst

Tuning CacheEdge.

CacheEdge is a Level 2 cache blocking parameter. Because its effects are fairly subtle on most machines, the install often sets it wrongly on machines experiencing any kind of load, causing performance to be suboptimal. Tuning CacheEdge can improve performance by as much as 15%, and it can reduce ATLAS's memory usage as well.

In ATLAS/tune/blas/gemm/ARCH, issue make xdfindCE. Then run

   ./xdfindCE -m [N] -n [N] -k [N]

where [N] is replaced by a very large number that is a multiple of your blocking factor. You want to make this number as large as you can stand to wait on, and this varies a great deal from machine to machine. A good guesstimate for most machines might be around 2000.

You want to run this program several times to get a consensus idea of what a good setting would be. If a CacheEdge setting gets performance in the same range as no CacheEdge (CacheEdge of 0 is no CacheEdge in printout of xdfindCE), it is still recommended that you use that setting, since ATLAS with CacheEdge set will use less memory as problem sizes grow.

Once you have gotten an idea of what to set CacheEdge to, you can change it by editing ATLAS/include/ARCH/atlas_cacheedge.h. xdfindCE prints out data in KB, but atlas_cacheedge.h needs bytes, so multiply the xdfindCE result by 1024 to get the number you want to use in atlas_cacheedge.h.

Let's take an example. Say xdfindCE printed out this:

TA  TB       M       N       K   alpha    beta  CacheEdge       TIME    MFLOPS
==  ==  ======  ======  ======  ======  ======  =========  =========  ========

 T   N    1000    1000    1000    1.00    1.00          0      5.470    365.63
 T   N    1000    1000    1000    1.00    1.00         16      5.470    365.63
 T   N    1000    1000    1000    1.00    1.00         32      5.460    366.30
 T   N    1000    1000    1000    1.00    1.00         64      5.470    365.63
 T   N    1000    1000    1000    1.00    1.00        128      5.260    380.23
 T   N    1000    1000    1000    1.00    1.00        256      5.240    381.68

Initial CE=256KB, mflop=381.68


Best CE=256KB, mflop=381.68

So we want to set CacheEdge to 1024*256 = 262144. atlas_cacheedge.h will look something like:

#ifndef ATLAS_CACHEEDGE_H
   #define ATLAS_CACHEEDGE_H
   #define CacheEdge 196608
#endif

If your initial install did not use CacheEdge, line 3 will be missing completely. If you don't have this line, you would simply add it, using the new value of 262144. In the above example, we would simply replace 196608 with 262144.

By successively editing this file and recompiling, for instance ATLAS/bin/ARCH/x[d,s,z,c]mmtst you can tune this value further. Many users expect that they should set CacheEdge to the actual size of their L2 cache. This is only rarely the best setting, mainly because L2 caches are normally combined data/instruction, and so a smaller setting, leaving room for instruction caching, is usually best. On some machines with large L2 caches, things like associativity, or even TLB issues, can make it more efficient to use a very small subset of the available cache.

Once you have set CacheEdge to the value you need, update all libs with the new setting by issuing make xdl3blastst xsl3blastst xcl3blastst xzl3blastst in your ATLAS/bin/ARCH directory.

x[pre]findCE usually takes the smallest CacheEdge setting possible, since this saves memory. For multiprocessor systems, however, it is vital to use as much of the available cache as possible so that the processors spend as little time contending for the bus as possible. Thus, you want to set CacheEdge to the largest value that gives decent results. I usually run xdfindCE a few times to get an idea of ranges, and then try the larger settings by running x[pre]l3blastst_pt. Remember that threaded timings have to use walltime, so make sure any speedup is repeatable before changing CacheEdge.

My performance drops off for very large problems (N > 1500)

A serial performance cliff is usually due to the normal install failing to set CacheEdge to any value, and then eventually ATLAS winds up using memory-saving algorithms that hurt performance. The solution is to set CacheEdge, so we use less workspace, while improving overall performance.

What you want to do is tune CacheEdge, as shown here, but be sure to use very large problem sizes in order to find CacheEdge.

Your install dies with "unable to get timings in tolerance"

This means that ATLAS could not get repeatable timings. There are several things that could cause this to happen. This could occur if the machine is heavily loaded or experiences a sudden surge in usage from another program, for instance. If this is the problem, simply keep restarting the install (as discussed below) until it finishes.

Another problem that could cause this is that ATLAS misdetected the peak of your machine, and is thus using an inadequate timing interval. You can see if this is happening by scoping how long each timing is taking. If it is very quick, and thus unrepeatable, you need to tell ATLAS to pump up the timing granularity. To do this, edit the files BLDdir/include/atlas_?sysinfo.h. Each of these four files will have a quantity called ATL_nkflop. Pump this quantity up by some significant factor until timings are regular. I usually increase it by a factor of 5 or 10. If the individual timings are then too slow, interrupt the process, and decrease the values.

Finally, the Level 1 timings very often display this problem even when the timing interval is sufficient. The most likely explanation of this non-repeatable timing problem involves inadequate cache flushing, but it has not been tracked down for sure. Regardless, the only remedy is to keep restarting the interrupted install until it completes, as explained below.

Most of the time, when an install dies in this way, you can just restart it, as outlined here. If this dies right away in the exact same timing, but without actually running the timing again, it means that the install process kept a record of the bad timing, and is just rereading it. You then need to remove the bogus timing record file. This file will be in the appropriate BLDdir directory under the res/ subdirectory. For instance, if you are dying in the level 1 tuning, the result files are stored in BLDdir/tune/blas/level1/res, and if you are in the gemm tuning they are in BLDdir/tune/blas/gemm/ARCH/res, etc. Your last message from the dying install should give you most info you need to figure out the result directory and the file. Just remove the file (or all the files of that precision, if you cannot figure out the specific file that is bad), and restart the install.

How do I restart an interrupted install?

If your ATLAS install was interrupted, and you have fixed the problem, you can usually safely (there are always exceptions; if the install died in the middle of an ar command, for instance, many systems cannot recover) restart the install by:

Edit your Make.inc and if the INSTFLAGS macro includes the flags -a 1 change them to: -a 0. This tells ATLAS not to recopy the arch defaults over your partially completed results.
Issue "make" from your BLDdir directory.

How do I get rid of all the .o's?

Once you have done make install (and/or manually copied the libraries and include files you want), you can simply delete the entire OBJdir directory.

Make shared fails with something like: `ld: cannot find -lgfortran`

This is due to configure choosing a bad F77SYSLIB (in Make.inc) due to a faulty probe. This macro should include the path where your libgfortran.so can be found. You can fix this problem by manually changing this macro to have the correct path after configure. If you have this problem, simply fix this macro, and then reissue your make shared commands.

What you should set it to will vary by system, as you need to locate the correct library path. You can usually do this using the locate command. For instance, here's what I see on my system:

   drteeth>locate libgfortran.so
   /usr/lib/libgfortran.so.2
   /usr/lib/libgfortran.so.2.0.0
   /usr/lib/gcc/x86_64-linux-gnu/4.2/libgfortran.so
   /usr/lib/gcc/x86_64-linux-gnu/4.2/32/libgfortran.so
   /usr/lib32/libgfortran.so.2
   /usr/lib32/libgfortran.so.2.0.0

My ATLAS install is 64-bit, so I would set (in Make.inc):

   F77SYSLIB = -L/usr/lib/gcc/x86_64-linux-gnu/4.2/ -lgfortran

If you have multiple choices and don't know how to choose, it should work to try them until your shared build works.

Gcc's violation of the x86 ABI causes seg faults/bus errors when mixed with other compilers

Gcc violates the x86-32 ABI by mandating a 16-byte aligned stack, where the ABI mandates a 4-byte aligned stack. Therefore, if gcc is used to compile some routines that may be called from an ABI-compliant compiler, you may (depending on how lucky you are) get a seg fault or a bus error. Therefore, if you plan on mixing gcc with any compiler that does not extend the ABI in the same way, or if you want to use Windows threads, you must tell gcc to turn off this misfeature, which may be done by passing -mpreferred-stack-boundary=2 to all gnu compilers. This problem has been noticed on Windows, but might occur anywhere gcc is not used to generate all of the object files. If you are compiling ATLAS with all gnu compilers, you can just pass:

   -Fa alg -mpreferred-stack-boundary=2

to configure. If you are building ATLAS with some mix of non-gnu and gnu compilers, then you need the above flag added only to the gnu compilers. You can do this at configure time by setting it for individual compilers, as described here, or you can configure as normal, but edit Make.inc before starting the build, and add the flag to all gnu compilers' flags.
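
The Make.inc edit can be done by hand, or scripted. The following is a minimal sketch, assuming the compiler flags live on single lines of the form "MACRO = flags"; append_flag is a hypothetical helper, and the macro names ICCFLAGS and F77FLAGS are used here only as illustrations, so check your own Make.inc for the macros that actually belong to gnu compilers. The demonstration runs on a scratch file rather than a real Make.inc.

```shell
#!/bin/sh
# Sketch only: append_flag is a hypothetical helper, not an ATLAS tool.
# append_flag FILE MACRO FLAG
#   Append FLAG to the end of the "MACRO = ..." line in FILE.
append_flag() {
    file=$1 macro=$2 flag=$3
    sed "s|^\([[:space:]]*${macro}[[:space:]]*=.*\)\$|\1 ${flag}|" "$file" \
        > "$file.tmp" && mv "$file.tmp" "$file"
}

# Demonstration on a scratch file; the macro names are assumptions.
demo=$(mktemp)
printf '   ICCFLAGS = -O2\n   F77FLAGS = -O\n' > "$demo"
append_flag "$demo" ICCFLAGS -mpreferred-stack-boundary=2
cat "$demo"
rm -f "$demo"
```

The helper is not idempotent: running it twice on the same macro appends the flag twice, so apply it once per gnu compiler macro to a freshly configured Make.inc.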

Note that if you add -mpreferred-stack-boundary=2 to the fortran compiler flags, then g77/gfortran will cause the BLAS tester to die with a bus error/seg fault, because gfortran then aligns double precision arrays to 4-byte rather than 8-byte boundaries. ATLAS assumes native alignment on data types, and so the code seg faults. As long as you don't need to call the Fortran interface from a non-gnu compiler, you can solve this problem by simply leaving the -mpreferred-stack-boundary flag off for the Fortran interface. At configure time, you would pass:

   -Fa acg -mpreferred-stack-boundary=2

I reported these related problems (ABI non-conformance and non-native alignment) to the gcc folks, and got a range of opinions, the most startling of which was that whatever they do is the standard. On that link, one of the more helpful replies mentions that the non-native alignment problem should be fixed in gcc 4.4, but that the ABI non-compliance must be maintained in order to match the prior failure to comply with the ABI.

When linking ATLAS's testers, I'm getting a bunch of undefined BLAS symbols (e.g. `dgemm_`, `dgemv_`, etc).

The ATLAS BLAS testers (x[s,d,c,z]l[1,2,3]blastst) expect to compare against a F77 interface BLAS library for performance and testing purposes. You get these missing symbols when your Make.ARCH's BLASlib is left blank, or does not point at a complete BLAS library. If you have a non-ATLAS BLAS built somewhere, point the BLASlib macro at it. If you don't, the easiest fix is probably to grab the Fortran77 reference BLAS tarfile, and build it into the required lib. If you don't want to do this, or don't have access to Fortran77, then you can have ATLAS test against its own C reference as discussed here.

I'm linking with C, and getting missing symbols (such as `w_wsfe`, `do_fio`, `w_esfe` or `s_stop`).

These kinds of symbols are Fortran library calls. The problem is that the C linker does not automatically find the Fortran libraries. The most common fix is to either link using your fortran linker, or to rewrite your code so that Fortran routines are not called. If you know where they are, you can also choose to link in the Fortran libraries explicitly.

Problems with linking/missing LAPACK routines on OS X

OS X has a built-in version of ATLAS, and its libraries use the standard ATLAS library names. They may be less up-to-date and/or include fewer libs than something you install yourself; in particular, if you have a Fortran compiler, you can build a full lapack library, which Apple does not currently provide, and so many users want to install the standard ATLAS. Unfortunately, when searching for libs the compiler looks in the system areas where Apple keeps its ATLAS libs before looking in directories supplied by -L. This means that if you use -L and -l for your linking, you always get Apple's modified ATLAS, rather than the one you installed. There are two fixes for this problem that I know of. First, you can just link to the full name and path, rather than using -L. For instance, change something like:

   gcc -o xtst test.c -L /home/whaley/TEST/ATLAS/build64/lib -lcblas -latlas

to:

   gcc -o xtst test.c /home/whaley/TEST/ATLAS/build64/lib/libcblas.a \
          /home/whaley/TEST/ATLAS/build64/lib/libatlas.a

The only other trick I'm aware of is to rename your ATLAS libraries so that the Apple versions will not override them.
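
A minimal sketch of the renaming approach, assuming static (.a) libraries only: prefix_libs is a hypothetical helper, and the my_ prefix is an arbitrary choice. It copies each library to a prefixed name so that the originals stay in place. The demonstration uses empty stand-in files in a scratch directory.

```shell
#!/bin/sh
# Sketch only: prefix_libs is a hypothetical helper, not an ATLAS tool.
# prefix_libs DIR PREFIX
#   Copy each DIR/libNAME.a to DIR/libPREFIXNAME.a, leaving the originals.
prefix_libs() {
    dir=$1 prefix=$2
    for lib in "$dir"/lib*.a; do
        [ -f "$lib" ] || continue      # glob matched nothing
        base=${lib##*/}                # e.g. libcblas.a
        cp "$lib" "$dir/lib${prefix}${base#lib}"
    done
}

# Demonstration on a scratch directory with empty stand-in files.
demo=$(mktemp -d)
: > "$demo/libcblas.a"
: > "$demo/libatlas.a"
prefix_libs "$demo" my_
ls "$demo"
rm -rf "$demo"
```

With the libraries copied this way, the link line would use the new names, for example -lmy_cblas -lmy_atlas together with the usual -L. Run the helper only once per directory, since a second run would also prefix the copies.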

How about C++ header files for the C interfaces?

Since ATLAS does not provide full OO C++ interfaces, I am reluctant to raise the expectation that it does by providing C++ specific header files. What I have always envisioned is the C++ programmer creating his own include files, such as:

>cat cppblas.h
extern "C" {
#include "cblas.h"
}

>cat cpplapack.h
extern "C" {
#include "clapack.h"
}

If you are a C++ programmer using ATLAS, and think differently, let me know.

How do I restart an install from scratch?

Simply do a rm -rf * in your BLDdir directory, and then reconfigure and build.

My system doesn't have the -f option to cp

If you take the following line, and put it in a file cp you make executable, and then put it in your path before your system cp, it should get rid of the -f option:

/bin/cp `echo $* | sed -e 's/-f / /'`
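
The one-liner above re-splits its arguments on whitespace, so file names containing spaces will not survive it. A sketch of a variant that filters the arguments one at a time instead; cp_nof and REAL_CP are hypothetical names, and the real cp is assumed to be /bin/cp unless REAL_CP says otherwise. It is written as a function so it can be tried out directly; in a wrapper script named cp, the body of the function would be the whole script.

```shell
#!/bin/sh
# Sketch only: cp_nof and REAL_CP are hypothetical names.
# Run the real cp with every standalone -f argument removed, keeping each
# remaining argument intact even if it contains spaces.
cp_nof() {
    for arg in "$@"; do
        shift                                     # drop the oldest argument
        [ "$arg" = "-f" ] || set -- "$@" "$arg"   # re-append unless it is -f
    done
    "${REAL_CP:-/bin/cp}" "$@"
}

# Demonstration in a scratch directory.
demo=$(mktemp -d)
echo data > "$demo/my file"
cp_nof -f "$demo/my file" "$demo/my copy"
ls "$demo"
rm -rf "$demo"
```

Only a standalone -f is removed; combined options such as -rf are passed through unchanged.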