ATLAS 3.6.0 errata


[Home] [Docs] [FAQ] [Errata] [Software] [Support] [Lists] [Developer home] [Timings]

ATLAS v3.6 released 12/22/03 (Merry Christmas!)

ATLAS errors:

Known issues:

Noteworthy compiler issues:

System problems & user tips:



Error in complex gemm for A*A^T.

ATLAS has a special case to detect and exploit when the general C = A*B + C is really C = A*A^T + C. Essentially, in this case you only have one matrix to copy, not two, which means both a workspace and performance win. There is an error in this special case handling code which will cause ATLAS to produce the wrong answer most of the time for this case. The error affects only complex GEMM (CGEMM, ZGEMM), not real. To fix, change line 193 of ATLAS/src/blas/gemm/ATL_cmmJIK.c from:
         else Mjoin(PATL,col2blk_a1)(K, M, A, lda, pA, alpha);
to:
         else Mjoin(PATL,col2blk2_a1)(K, M, A, lda, pA, alpha);
(i.e., col2blk becomes col2blk2).

If you have already built the library before applying this fix, you can force a recompile by doing a make xcl3blastst xzl3blastst in ATLAS/bin after making the fix.
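
A one-token change on a known line, like the one above and several similar fixes further down this page, can be scripted. Here is a minimal sketch, assuming a POSIX sh with sed and grep. The helper name replace_on_line is hypothetical and not part of ATLAS. It refuses to touch the file when the target line does not hold the expected text, since line numbers can differ if your tree has already been edited.

```shell
# Hypothetical helper (not part of ATLAS): replace OLD with NEW on line LINENO
# of FILE, but only if that line actually contains OLD.
# OLD and NEW are assumed to be plain identifiers (no '/' or regex characters).
replace_on_line() {
    file=$1; lineno=$2; old=$3; new=$4
    # Check that the target line holds the text we expect to change.
    if ! sed -n "${lineno}p" "$file" | grep -F -q -- "$old"; then
        echo "line $lineno of $file does not contain '$old'; not changed" >&2
        return 1
    fi
    # Write to a temporary file first so a failed sed cannot truncate FILE.
    sed "${lineno}s/${old}/${new}/" "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}

# Example for the fix above (run from the directory holding ATLAS/):
#   replace_on_line ATLAS/src/blas/gemm/ATL_cmmJIK.c 193 col2blk_a1 col2blk2_a1
```

If the helper reports that the line does not match, open the file and find the line by hand rather than guessing at a new line number.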

Large blocking factors hurt LAPACK performance for small N

If you build the combined lapack/ATLAS library, the present ILAENV always uses ATLAS's GEMM blocking factor for all routines. This causes a slowdown in the cases where the minimum dimension of the calling routine is small compared to NB. For instance, on the Itanium, QR factorization will run at the speed of GEMV until something like N > 500. To fix this problem, save this file over the top of your present ATLAS/interfaces/lapack/F77/src/ilaenv.f file. If you have already built ATLAS and the combined LAPACK/ATLAS library, put this file into your library in a fashion similar to this:
   cd ATLAS/interfaces/lapack/F77/src/
   f77 -O -c ilaenv.f
   ar r /path/to/lapacklib/liblapack.a ilaenv.o

ATLAS threaded performance dies for large problems

On some platforms, if you time large problems, you'll see that ATLAS's threaded library does well, and then suddenly drops to below serial performance. This is probably an error in how we recover from lack of memory in the threaded case, but in the meantime the easy fix is to increase the total amount of memory ATLAS is allowed to allocate. To do this, edit the file ATLAS/include/atlas_lvl3.h, and pump up the macro ATL_MaxMalloc, which is the maximal size (in bytes) ATLAS is allowed to allocate. It is presently set to be 16MB (or less, for older releases); make it as large as you think you can afford.

Search doesn't work with cpu throttling.

By default, newer Linuxes (and probably other OSes) have CPU throttling turned on even for desktops in order to save power. Since the speed of your CPU is constantly changing, the ATLAS timing results become essentially meaningless. Therefore, to get good performance numbers (and thus a fast ATLAS library), be sure to turn off cpu throttling in your BIOS before installation. If you are using the machine for high performance code a lot, you may want to leave it off. You can also usually turn off cpu throttling in your OS, but this varies by OS. The file ATLAS/INSTALL.txt from the latest developer release has further details.

Problems installing on a G5.

3.6 does not have explicit support for the G5, but the developer releases, starting with 3.7.10, have full G5 support, including 32 bit OS X, and 32 and 64 bit Linux support. I recommend you use a developer release for a G5 install.

Apple's cpp prevents install on newer G4 and G5

For newer versions of Apple's modified gcc, you need to add the flag -no-cpp-precomp to all your various C flags. If you still have problems during installation using Apple's modified gcc, your best bet at the moment is to install the newest gcc from gnu and use it instead. Some directions for installing gcc without being root are given here, though note that unlike x86 users, you want the newest gcc.

Misdetection of nregs causes some x86 installs to go awry

Using later gcc compilers, ATLAS's mmsearch may misdetect that you have more than 15 registers available, which can cause it to skip searching the best cases. To fix, insert the following lines at line 824 of ATLAS/tune/blas/gemm/mmsearch.c before your install:
   #if defined(ATL_GAS_x8632) || defined(ATL_GAS_x8664)
      return(0);
   #endif
Note that this problem will not affect you if you are using the architectural defaults.

make kill from topdir removes libraries

The documentation says "make killall" from TOPdir removes all architecture-specific subdirectories, and "make kill" removes all except the libraries. Actually, due to an error, they both kill everything. To fix, edit ATLAS/Make.top, and delete line 215, which is:
        rm -rf lib/$(arch)

Error in UltraSPARC cleanup

There is an error in the cleanup code for the UltraSPARC processors, that can cause segmentation faults when [d,z]GEMM is called with K mod NB = 8. To fix, change line 275 of ATLAS/tune/blas/gemm/CASES/ATL_dmm4x4x8_US.c from
#if (KB != 2)
to:
#if (KB != 8)
before installation. If you have already installed, make the change, and then issue the following commands (substituting your arch string for ARCH, and giving full path where indicated):
   cd ATLAS/tune/blas/gemm/ARCH
   rm [d,z]install
   make dinstall
   make zinstall
   cd ATLAS/bin/ARCH
   make dmmlib zmmlib dl3lib zl3lib

Error in compiler flags for gcc/USIII arch defaults

When using the architectural defaults for gcc/USIII, we need to add additional compiler flags for gcc installs that don't have them on by default. Change line 8 of the two files ATLAS/tune/blas/gemm/CASES/[s,c]cases.flg from:
-O -fomit-frame-pointer -fno-schedule-insns -fno-schedule-insns2
to:
-mcpu=ultrasparc -mtune=ultrasparc -O -fomit-frame-pointer -fno-schedule-insns -fno-schedule-insns2

atlas_prefetch.h won't compile using Sun CC v 5 or earlier.

In order to use prefetch with Sun's cc, ATLAS includes the Sun header file sun_prefetch.h, which did not exist until Sun CC version 6. My guess is that without prefetch on the UltraSparc III, you are better off using gcc to compile your stuff than cc, and that the cc arch defaults are probably not good. However, if you want to use Sun CC anyway, you will need to modify atlas_prefetch.h. I haven't had time to scope this problem myself (and I'm not sure I have access to a machine with an old enough Sun CC anyway), but here's a fix the user who originally found the error mentioned: Change line 50 of ATLAS/include/atlas_prefetch.h from:
#elif defined(__SUNPRO_C) && defined(__sparc)
to:
#elif defined(__SUNPRO_C) && defined(__sparc) && __SUNPRO_CC >= 0x600
If you choose to do this, you will probably want to say no to arch defaults. I would also install using gcc, and compare performance to see which is better.

Assembler renaming problem for Windows machines

This problem primarily affects PIIISSE1 and ATHLONSSE1 installs under Windows, but any Windows user should apply it. To fix, change lines 94-95 of ATLAS/tune/blas/gemm/CASES/ATL_smm6x1x60_sse.c from:
.global ATL_USERMM
ATL_USERMM:
to:
#ifndef Mjoin
   #define Mjoin(pre, nam) my_join(pre, nam)
   #define my_join(pre, nam) pre ## nam
#endif

#if defined(ATL_OS_WinNT) || defined(ATL_OS_Win9x)
   #define ATL_AUSERMM Mjoin(_,ATL_USERMM)
#else
   #define ATL_AUSERMM ATL_USERMM
#endif
.global ATL_AUSERMM
ATL_AUSERMM:
and then install as normal.

String overrun in config for long compiler paths

Thanks to Yozo Hida for noticing that compilers with path length ≥ 64 can cause a memory overwrite in config. To fix, change line 3414 of ATLAS/config.c from:
   char comp[64], cflg[512], ln[512], tnam[256], archdef[256], mmdef[256];
to:
   char comp[512], cflg[512], ln[512], tnam[256], archdef[256], mmdef[256];

Applying a patch to your ATLAS directory

To apply an ATLAS patch to your existing directory tree, save the patch file (we will call it patchfile from here on out) in your ATLAS/ directory, and then issue:
   patch -p1 < patchfile

Unkillable and relentless 'warning: the use of `tmpnam' is dangerous' warnings from gcc.

During config you will get a lot of warnings (and if using a developer release, even more) of the following form:
/tmp/ccq5b8sE.o(.text+0x852): In function `CmndResults':
config.c: warning: the use of `tmpnam' is dangerous,
better use `mkstemp'

This is normal, and not an error. Let me translate this message out of gnu-speak:

  Hey, idiot, would you stop using that pesky ANSI/ISO C standard and
  use this non-standard routine instead?

For maximal compatibility, ATLAS hews to the ANSI/ISO 9899-1990 standard, and so I cannot make the proposed change. Unfortunately, this warning message is literally immortal: there exists in gcc no flag combination that I can discover that can turn the freaking thing off. So, every time you link one of the config programs that calls this standard routine, the linker outputs this message, even if you turn on the strict ANSI compatibility flag. I reported it as an error to the gcc folks, but they point out it is the linker/glibc people that generate the "warning" and immediately closed the tracker. Still seems wrong to me that the strict ANSI flag with all warning messages turned off insists on printing out a message warning about standard usage, but there appears to be little for me to do about it. Therefore, just ignore the hectoring, and don't worry about these immortal, bogus, annoying and repetitive "warnings".

Overriding config's default compiler info from the command line.

For some systems, ATLAS actually has good architectural defaults for more than one set of compilers. ATLAS usually defaults to the best performing set, as explained here. However, if you want to use a secondary compiler, config will fill in the flags and so on if you specify the compiler. Config will print a brief usage message if you make xconfig ; ./xconfig --help.

Examples always help. Here's the config line if I want to use Intel's compilers to build my Pentium 4 library:

   make xconfig ; ./xconfig -m icc -c icc -f ifort
Here's the command to get Intel fortran compiler, but gcc for all C routines:
   make xconfig ; ./xconfig -m gcc -c gcc -f ifort
On UltraSparcs, ATLAS defaults to using Sun's cc/f77. If you want gcc/g77 instead:
   make xconfig ; ./xconfig -m gcc -c gcc -f g77

You can also override the compiler to one that ATLAS does not know about, in which case you are likely to get garbage for your flags unless you override them as well. You will then have to compile without architectural defaults, which is likely to produce slower libraries, and guaranteed to produce a much longer install time. But, hey, feel free.

If you override the flags as well as the compiler, be aware that differing flags using a supported compiler may well decrease the performance of the library.

Windows users have a few windows-specific options, as explained here.

What should I do with my ev7/21364?

During config, say it is an ev6/21264 when prompted for the architecture. These architectural defaults get good performance on the ev7 (the ev7 uses the same core as the ev6).

Testing ATLAS with the LAPACK testers.

It takes a bit of hoop-jumping to get ATLAS to pass the LAPACK testers. First, the LAPACK testers have an error in them that causes them to flag the ATLAS TRSM as bad. This is explained briefly in the LAPACK errata. You can either ignore the incorrect errors generated by lapack in the xlintst? testers, or you can override ATLAS's TRSM in order to make them go away. To override, edit LAPACK's make.inc, and link in the F77 [c,s,d,z]trsm.o in BLASLIB before the ATLAS libs.

After this, you'll still get failures reported (in our last run on Linux_PIIISSE1, we had failures in cnep.out, csep.out, and ced.out), but these failures will happen with stock LAPACK and the F77BLAS.

Threaded code incorrect on alpha systems

The threaded code hangs on both OSF1/Tru64 and Linux OSes running on Dec/Compaq Alphas. The odd thing is that so does the previous stable release, 3.2, which used to run fine on the same machines. Considerable effort was spent attempting to figure this out, to no avail.

Your install dies with "unable to get timings in tolerance"

This means that ATLAS could not get repeatable timings. There are several things that could cause this to happen. This could occur if the machine is heavily loaded or experiences a sudden surge in usage from another program, for instance. If this is the problem, simply keep restarting the install (as discussed below) until it finishes.

Another problem that could cause this is that ATLAS misdetected the peak of your machine, and is thus using an inadequate timing interval. You can see if this is happening by scoping how long each timing is taking. If it is very quick, and thus unrepeatable, you need to tell ATLAS to pump up the timing granularity. To do this, edit the files ATLAS/include/ARCH/atlas_?sysinfo.h. Each of these four files will have a quantity called ATL_nkflop. Pump this quantity up by some significant factor until timings are regular. I usually increase it by a factor of 5 or 10. If the individual timings are then too slow, interrupt the process, and decrease the values.

Finally, the Level 1 timings very often display this problem even when the timing interval is sufficient. The most likely explanation of this non-repeatable timing problem involves inadequate cache flushing, but it has not been tracked down for sure. Regardless, the only remedy is to keep restarting the interrupted install until it completes, as explained below.

Most of the time, when an install dies in this way, you can just restart it, as outlined here. If this dies right away in the exact same timing, but without actually running the timing again, it means that the install process kept a record of the bad timing, and is just rereading it. You then need to remove the bogus timing record file. This file will be in the appropriate architecture directory under the res/ subdirectory. For instance, if you are dying in the level 1 tuning, the result files are stored in /ATLAS/atlas3.4/ATLAS3.4.1/tune/blas/level1/ARCH/res, and if you are in the gemm tuning they are in /ATLAS/atlas3.4/ATLAS3.4.1/tune/blas/gemm/ARCH/res, etc. Your last message from the dying install should give you most info you need to figure out the result directory and the file. Just remove the file (or all the files of that precision, if you cannot figure out the specific file that is bad), and restart the install.
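
When the last message from the dying install does not make the file obvious, one heuristic is to look at what was written last. This is a sketch only: it assumes the bogus timing record is the most recently modified file in the res/ directory, which the install does not guarantee, and newest_result is a hypothetical helper name.

```shell
# Hypothetical helper: print the most recently modified entry in a res/
# directory. Assumption: the bogus timing record was the last file written.
newest_result() {
    resdir=$1
    # ls -t sorts newest first; head keeps only the first name.
    ls -t "$resdir" | head -n 1
}

# Example (substitute your arch string for ARCH); inspect before removing:
#   f=$(newest_result ATLAS/tune/blas/level1/ARCH/res)
#   echo "candidate bad timing file: $f"
```

Look at the file before deleting it. If it does not match the timing named in the error message, fall back to removing all the result files of that precision as described above.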

Install dies in tfc / ?Xover.h is incomplete

Change line 58 of ATLAS/tune/blas/gemm/tfc.c, from:
#define MAXALLOC (3*1024*1024*8)
to:
#define MAXALLOC (8*1024*1024*8)
and then restart your install from scratch.

ATLAS IA64 performance cut by more than 1/4 using Red Hat gcc

Using Red Hat's gcc 2.96-ia64-000717 compiler, ATLAS performance is decreased by almost a factor of five over previous or later compilers. I'm not sure which versions of Red Hat are affected, so if you experience very poor ATLAS performance, the best solution appears to be to install gcc 3.2.

ATLAS build dies on Red Hat 7.0 and/or gcc 2.9[6,7]

Red Hat 7.0 shipped with a version of gcc not supported by GNU (GCC 2.96 and/or 2.97). It contains error(s) causing the ATLAS build to fail. Red Hat has released a patch fixing the problems in the RH7.0 version, available here. If this doesn't work for you, the recommended fix for the problem is to install gcc 2.95.3 or 3.2.

Installing with a non-default f77 compiler

The only Fortran routines in ATLAS are the Fortran77 interface routines, which do no computation. Therefore, the Fortran77 compiler has absolutely no effect on ATLAS's performance, and so the only reason you should need to use a non-default f77 compiler is if the f77 compiler you wish to use does not interoperate with ATLAS's default compiler.

To install with a non-default f77 compiler, simply override the default fortran compiler and flags from the command line when running config as explained here.

Installing additional f77 interfaces

The only Fortran routines in ATLAS are the Fortran77 interface routines, which do no computation. Therefore, the Fortran77 compiler has absolutely no effect on ATLAS's performance, and so the only reason you should need to use a non-default f77 compiler is if the f77 compiler you wish to use does not interoperate with ATLAS's default compiler.

If you want to install ATLAS so it can be called from multiple, non-interoperable Fortran compilers (or indeed, have already installed with the wrong f77 compiler), you can do this with moderate ease, assuming you know how C and the given F77 compiler(s) interoperate. If you do not know this interoperational information, you must get config to find it for you. To do this, in your ATLAS/ directory, run config again overriding the default fortran compiler and flags from the command line as explained here. You will also want to use a different architecture name so as not to overwrite your good Make.ARCH, and tell config not to create all the unneeded architecture subdirectories (since you won't be using this Make.ARCH anyway). You can do this from the command line by adding something like -a F2C_BOGUS -D 0 to the ./xconfig command. You can then look at the generated Make.F2C_BOGUS's F2CDEFS for the appropriate settings, and replicate them, along with the new F77 compiler/linker information, into your original Make.ARCH.

For those users already aware of the information needed for C/F77 interoperation, ATLAS needs three pieces of information in order to correctly handle F77/C interoperation, and this information appears as defines to the C compiler, set in your Make.ARCH's F2CDEFS.

The first macro controls the name space alterations necessary to make a C routine callable from Fortran77. The options are:

Add_
All F77-callable C routines should be lowercase, and have an underscore suffixed to their names.
Add__
All F77-callable C routines should be lowercase, have an underscore suffixed to their names, and if the F77 name itself possesses an underscore, two underscores should be suffixed.
NoChange
All F77-callable C routines should be lowercase, with no name alteration.
UpCase
All F77-callable C routines should be made uppercase, with no further name alteration.
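
To make the four naming options concrete, here is a small sketch that prints the C symbol name a given Fortran77 routine name maps to under each one. f77_symbol is a hypothetical helper written for illustration, not an ATLAS tool, and it simply follows the descriptions above.

```shell
# Print the C symbol name that a Fortran77 routine NAME maps to under one of
# the four naming conventions listed above. f77_symbol is a hypothetical
# helper for illustration, not an ATLAS tool.
f77_symbol() {
    convention=$1; name=$2
    lower=$(printf '%s' "$name" | tr '[:upper:]' '[:lower:]')
    case $convention in
        Add_)     printf '%s_\n' "$lower" ;;
        Add__)    # one underscore, or two if the name itself contains one
                  case $lower in
                      *_*) printf '%s__\n' "$lower" ;;
                      *)   printf '%s_\n' "$lower" ;;
                  esac ;;
        NoChange) printf '%s\n' "$lower" ;;
        UpCase)   printf '%s\n' "$name" | tr '[:lower:]' '[:upper:]' ;;
        *)        echo "unknown convention: $convention" >&2; return 1 ;;
    esac
}

# Examples:
#   f77_symbol Add_  DGEMM     prints dgemm_
#   f77_symbol Add__ my_sub    prints my_sub__  (my_sub is a made-up name)
#   f77_symbol UpCase dgemm    prints DGEMM
```

Comparing these names against the symbols that nm reports for an object file compiled by your Fortran compiler is one way to tell which convention it follows.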

The second macro provides a mapping between F77's INTEGER and the appropriate C integral type. Options are:

No definition
Default case where C's int corresponds to F77's INTEGER.
F77_INTEGER=long
F77's INTEGER corresponds to C's long.
F77_INTEGER=short
F77's INTEGER corresponds to C's short.

The third macro deals with F77 string handling. The options are:

StringSunStyle
The string's address is passed at the string's location on the stack, and the string's length is then passed as an F77_INTEGER after all explicit stack arguments.
CrayStyle
Special option for CRAY machines, which uses Cray's fcd (fortran character descriptor) for interoperation.
StringStructPtr
The address of a structure is passed for each Fortran77 string, and the structure is of the form:
      struct {char *cp; F77_INTEGER len;};
StringStructVal
A structure is passed by value for each Fortran77 string, and the structure is of the form:
      struct {char *cp; F77_INTEGER len;};

By default, ATLAS builds the F77 interface to the BLAS into the file pointed at by Make.ARCH's F77BLASlib, and so changing this macro before recompiling the interface will allow you to build multiple F77 interfaces.

For example, say on a Solaris machine I want to build the f77 interface for both Sun's f77 and g77. First, I install ATLAS as normal, with the default f77 compiler. Now, to get a g77 interface lib, I edit my ATLAS/Make.SunOS_SunUS2, and I find that ATLAS has detected the C/F77 interface for Sun's f77 compiler as:

   F2CDEFS = -DAdd_ -DStringSunStyle
I then change this to match g77:
   F2CDEFS = -DAdd__ -DStringSunStyle
Now, so that my Sun f77 interface will not be overwritten, I also change:
   F77BLASlib = $(LIBdir)/libf77blas.a
to:
   F77BLASlib = $(LIBdir)/libg77blas.a
If I had built the threaded BLAS, I would make a similar change to PTF77BLASlib.

Finally, I change the f77 compiler/linker information from:

   F77 = /opt/SUNWspro/bin/f77
   F77FLAGS = -dalign -native -xarch=v8plusa -xO5
to:
   F77 = /usr/local/bin/g77
   F77FLAGS = -O3 -funroll-all-loops
Now, I cd ATLAS/interfaces/blas/F77/src/SunOS_SunUS2, and issue:
   make clean
   make lib
If you are using threads, additionally issue:
   make ptlib
Now, when linking with Sun's f77, I link to -lf77blas -latlas, and when linking with g77 I use -lg77blas -latlas.

You can essentially repeat this process for the LAPACK F77 interface, but change LAPACKlib rather than F77BLASlib, and go to ATLAS/interfaces/lapack/F77/src/SunOS_SunUS2 rather than ATLAS/interfaces/blas/F77/src/SunOS_SunUS2. Also, LAPACK does not have a separate entry point for threads, so do not issue any of the additional threading instructions.

Finally, in your ATLAS/src/testing/ARCH directory, issue :

   make clean ; make lib

How do I link with all these libraries?

The user libs created by ATLAS are:
liblapack.a
The LAPACK routines provided by ATLAS. If you want a full lapack library, the .o in this lib can be archived into the f77 lapack lib without error.
libcblas.a
The ANSI C interface to the BLAS.
libf77blas.a
The Fortran77 interface to the BLAS.
libptcblas.a
The ANSI C interface to the threaded (SMP) BLAS. This library only appears if you have asked for SMP support.
libptf77blas.a
The Fortran77 interface to the threaded (SMP) BLAS. This library only appears if you have asked for SMP support.
libatlas.a
The main ATLAS library, providing low-level routines for all interface libs.
If you have missing symbols on link, make sure you are linking in all of the libraries you need, and remember that order *is* significant. For instance, a code calling the Fortran77 interface to the BLAS would need:
   -L$(MY_HOME)/ATLAS/lib/$(MY_ARCH)/ -lf77blas -latlas
The full LAPACK library created by merging ATLAS and netlib LAPACK requires both C and Fortran77 interfaces, and thus that link line would be:
   -L$(MY_HOME)/ATLAS/lib/$(MY_ARCH)/ -llapack -lf77blas -lcblas -latlas
If you wish to use threaded BLAS, you simply indicate those interface libs rather than the sequential. The above line for SMP would be:
   -L$(MY_HOME)/ATLAS/lib/$(MY_ARCH)/ -llapack -lptf77blas -lptcblas -latlas
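
Since the order is easy to get wrong, the link lines above can be generated rather than typed. This is a sketch: atlas_link_line is a hypothetical helper, and the C-only case (-lcblas -latlas) is inferred from the library descriptions above rather than quoted from an ATLAS link line.

```shell
# Hypothetical helper: print the ATLAS part of a link line.
#   $1 = directory holding the ATLAS libs
#   $2 = "f77", "c", or "lapack" (which interface the code calls)
#   $3 = "pt" for the threaded (SMP) BLAS; omit for the sequential libs
atlas_link_line() {
    libdir=$1; iface=$2; pt=
    if [ "${3:-}" = "pt" ]; then pt=pt; fi
    case $iface in
        f77)    libs="-l${pt}f77blas -latlas" ;;
        c)      libs="-l${pt}cblas -latlas" ;;
        # The merged LAPACK needs both interfaces; liblapack has no pt variant.
        lapack) libs="-llapack -l${pt}f77blas -l${pt}cblas -latlas" ;;
        *)      echo "unknown interface: $iface" >&2; return 1 ;;
    esac
    # Interface libs always come before -latlas, which they depend on.
    printf '%s\n' "-L$libdir $libs"
}

# Example:
#   atlas_link_line "$HOME/ATLAS/lib/ARCH" lapack pt
```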

Basic compiler information

ATLAS has support for one or more compilers for every platform. In general, we provide gcc/g77 for most supported architectures, since these compilers are freely available. The only exception is IBM AIX platforms where we provide xlc defaults only.

For each platform, ATLAS defaults to using the fastest available compiler. If the two compilers deliver roughly the same ATLAS performance, we then pick the one we think is most standard for users of that platform, and if we are unsure, we pick gcc/g77, since they are freely available. Here's a small table outlining some of ATLAS's present architectural default compiler support:

ARCH       COMP1             COMP2      FASTER?
=========  ================  =========  =======
PIII/P4    2.95 ≤ gcc ≥ 3.1  icc 8.0    SAME
other x86  2.95 ≤ gcc ≥ 3.1  NONE       NA
Itanium 1  gcc 3.x           NONE       NA
Itanium 2  icc 8.0           gcc 3.3    MUCH
Ultra2/5   Sun cc            gcc 3.2    SAME
Ultra III  Sun cc            gcc 3.2    YES
PPCG4      Apple cc          gcc ≥ 3.3  SOME

You can vary the compiler config selects as described here.

In this table, COMP1 is the compiler ATLAS defaults to using if no arguments are passed to config. FASTER indicates if the default choice is faster, or roughly the same speed as the secondary choice. For some archs, we specify ranges of acceptable compilers, where we know they get good performance. For others, we just list what compiler version was used in generating the defaults. If a range is noted, make absolutely sure you use a compiler in this range, as failure to do so may cut your performance in half.

As of this writing, gcc 3.2 or greater works well on all architectures. gcc 2.95 works just as well for x86 archs, if that's what you have installed. For x86 platforms, you will get pretty much the same performance if you choose to install with Intel's icc compiler. On the Itanium 2, however, we highly recommend using icc, as it provides a significantly faster ATLAS library than gcc. We do not have access to an Itanium 1 any longer, so were unable to add icc architectural defaults. It is likely it would get better performance from icc, though.
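
A quick way to check a gcc version against the advice in this paragraph is sketched below. gcc_ok_for_x86 is a hypothetical helper. It takes the conservative reading of the text above (2.95.x, or 3.2 and later), so it rejects 3.1 even though the table's x86 range admits it.

```shell
# Hypothetical helper: succeed if gcc VERSION (e.g. "2.95.3", "3.2") is one
# the paragraph above describes as working well on x86: 2.95.x, or 3.2 and up.
gcc_ok_for_x86() {
    v=$1
    major=${v%%.*}          # text before the first dot
    rest=${v#*.}            # text after the first dot
    minor=${rest%%.*}       # second component, e.g. 95 in 2.95.3
    if [ "$major" -gt 3 ]; then return 0; fi
    if [ "$major" -eq 3 ] && [ "$minor" -ge 2 ]; then return 0; fi
    if [ "$major" -eq 2 ] && [ "$minor" -eq 95 ]; then return 0; fi
    return 1
}

# Example:
#   gcc_ok_for_x86 "$(gcc -dumpversion)" || echo "consider another gcc"
```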

Compilers to avoid

We have already discussed the various gcc versions, so now we turn our attention to other compilers.

For x86, gcc 2.95.x is substantially better than Portland Group's pgcc (the worst compiler to compile ATLAS with), MSVC++, or Watcom C. If you wish to use these compilers with your own code, they interoperate with gcc if the correct flags are chosen. We suggest you compile ATLAS with gcc, and then link with your compiler of choice.

For UltraSparcs, gcc and cc are roughly the same, with cc having a slight edge, particularly for Level 1 and 2.

The Dec/Compaq alpha compiler is very good for most codes, but not so good for compiling ATLAS's matmul kernels. It has some optimizations that consume resources that cannot be turned off; in many cases, ATLAS is actually optimal already, so these optimizations prevent ATLAS from getting good performance.

My system doesn't have the -f option to cp

If you take the following line, and put it in a file cp you make executable, and then put it in your path before your system cp, it should get rid of the -f option:
/bin/cp `echo $* | sed -e 's/-f / /'`
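
The one-liner above splits arguments on whitespace and removes only the first "-f ". If that bites you, here is a sketch of a variant that drops every standalone -f argument and keeps file names with spaces intact. cp_nof is a hypothetical name, and /bin/cp is assumed to be your system cp, as in the line above.

```shell
# Sketch of a wrapper that drops every standalone -f argument and passes
# the rest to the system cp unchanged, preserving arguments with spaces.
# cp_nof is a hypothetical name; /bin/cp is assumed to be the system cp.
cp_nof() {
    for arg do
        shift                       # drop the argument from the front...
        [ "$arg" = "-f" ] && continue
        set -- "$@" "$arg"          # ...and re-append it unless it was -f
    done
    /bin/cp "$@"
}

# Example:
#   cp_nof -f "my file.c" backup/
```

Like the original, this does not handle -f bundled with other options (for example -rf). To use it as a cp replacement, put the loop and the /bin/cp call in an executable file named cp that comes earlier in your path than the system cp.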

How do I install on an Intel Celeron?

Config will ask you what your hardware is. It is recommended that you set it to the underlying type of Pentium your Celeron actually is. For instance, very old Celerons are Pentium II, and the newest are Pentium 4s.

ATLAS fails Level 1 BLAS tester when compiled with gcc 2.95.2 on Compaq/DEC alphas

We have observed this problem, but not yet tracked it down. Since the exact same testers and code work correctly with older gccs (eg, 2.8 or 2.7), we suspect a compiler error. For now, the fix is to install gcc 2.8 or 2.7.

Building a complete LAPACK library

ATLAS does not provide a full LAPACK library. However, there is a simple way to get ATLAS to provide its faster LAPACK routines to a full LAPACK library. ATLAS's internal routines are distinct from LAPACK's, so it is safe to compile ATLAS's LAPACK routines directly into a netlib-style LAPACK library. First, download and install the standard LAPACK library from the LAPACK homepage. Then, in your ATLAS/lib/ARCH directory (where you should have a liblapack.a), issue the following commands:
  mkdir tmp
  cd tmp
  ar x ../liblapack.a
  cp <your LAPACK path & lib> ../liblapack.a
  ar r ../liblapack.a *.o
  cd ..
  rm -rf tmp

Just linking in ATLAS's liblapack.a first will not get you the best LAPACK performance, mainly because LAPACK's untuned ILAENV will be used instead of ATLAS's tuned one. So, if you use any LAPACK routine that is not provided by ATLAS, it is essential that you create this hybrid LAPACK/ATLAS library in order to get the best performance.

How do I restart an install from scratch?

From your ATLAS directory, issue :
   make killall arch=ARCH
   make startup arch=ARCH
   make install arch=ARCH

How do I restart an interrupted install?

If your ATLAS install was interrupted, and you have fixed the problem, you can usually safely (there are always exceptions; if the install died in the middle of an ar command, for instance, many systems cannot recover) restart the install by:

How do I get rid of all the .o's?

ATLAS does not have a working "make clean" that leaves the architecture-specific directory structure in place. Issuing "make kill arch=ARCH" in your ATLAS directory, however, will remove all architecture-specific subdirectories, with the exception of ATLAS/lib/ARCH, along with all related object files. Issuing "make killall arch=ARCH" gets rid of all architectural-specific subdirectories.

Do NOT use the -fno-f2c flag with g77

Haven't tracked this down in a while, but it appears to break quite a few things in fairly non-obvious ways for mixed g77/gcc libs.

What happens if I install with no Fortran compiler?

ATLAS will still install correctly, though it will obviously not create the Fortran77 interface libraries. You will not be able to run the testers under the ATLAS/interfaces/ directory, since these testers are written in Fortran. Further, ATLAS expects that you will be comparing against a Fortran77 interface BLAS, and this will obviously not be the case, and so you will need to make the following changes if you want to run any of the ATLAS tester/timers, even the ones written in C:

My performance drops off for very large problems (N > 1500)

This is usually due to the normal install failing to set CacheEdge to any value, and then eventually ATLAS winds up using memory-saving algorithms that hurt performance. The solution is to set CacheEdge, so we use less workspace, while improving overall performance.

What you want to do is tune CacheEdge, as shown here, but be sure to use very large problem sizes in order to find CacheEdge.

Post install tuning.

Here are some tips to improving ATLAS performance after an install:

Tuning CacheEdge.

CacheEdge is a Level 2 cache blocking parameter; because its effects are fairly subtle on most machines, the install often gets it wrong on machines experiencing any kind of load, causing performance to be suboptimal. CacheEdge can improve performance by as much as 15%, and it can reduce ATLAS's memory usage as well.

In ATLAS/tune/blas/gemm/ARCH, issue make xdfindCE. Run

   ./xdfindCE -m [N] -n [N] -k [N]
where [N] is replaced by a very large number that is a multiple of your blocking factor. You want to make this number as large as you can stand to wait on, and this varies a great deal from machine to machine. A good guesstimate for most machines might be around 2000.

You want to run this program several times to get a consensus idea of what a good setting would be. If a CacheEdge setting gets performance in the same range as no CacheEdge (CacheEdge of 0 is no CacheEdge in printout of xdfindCE), it is still recommended that you use that setting, since ATLAS with CacheEdge set will use less memory as problem sizes grow.

Once you have gotten an idea of what to set CacheEdge to, you can change it by editing ATLAS/include/ARCH/atlas_cacheedge.h. xdfindCE prints out data in KB, but atlas_cacheedge.h needs bytes, so multiply the xdfindCE result by 1024 to get the number you want to use in atlas_cacheedge.h.

Let's take an example. Say xdfindCE printed out this:

TA  TB       M       N       K   alpha    beta  CacheEdge       TIME    MFLOPS
==  ==  ======  ======  ======  ======  ======  =========  =========  ========

 T   N    1000    1000    1000    1.00    1.00          0      5.470    365.63
 T   N    1000    1000    1000    1.00    1.00         16      5.470    365.63
 T   N    1000    1000    1000    1.00    1.00         32      5.460    366.30
 T   N    1000    1000    1000    1.00    1.00         64      5.470    365.63
 T   N    1000    1000    1000    1.00    1.00        128      5.260    380.23
 T   N    1000    1000    1000    1.00    1.00        256      5.240    381.68

Initial CE=256KB, mflop=381.68


Best CE=256KB, mflop=381.68
So we want to set CacheEdge to 1024*256 = 262144. atlas_cacheedge.h will look something like:
#ifndef ATLAS_CACHEEDGE_H
   #define ATLAS_CACHEEDGE_H
   #define CacheEdge 196608
#endif
If your initial install did not use CacheEdge, line 3 will be missing completely. If you don't have this line, you would simply add it, using the new value of 262144. In the above example, we would simply replace 196608 with 262144.
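
The KB-to-bytes step is easy to fumble when you are doing several runs, so here is a sketch that does it for you. It assumes the "Best CE=<n>KB" line has exactly the form shown in the example printout above; best_ce_bytes is a hypothetical helper name.

```shell
# Hypothetical helper: read xdfindCE output on stdin, pick up the
# "Best CE=<n>KB" line, and print the value in bytes for atlas_cacheedge.h.
best_ce_bytes() {
    # Keep only the number from the last "Best CE=" line seen.
    kb=$(sed -n 's/^Best CE=\([0-9][0-9]*\)KB.*/\1/p' | tail -n 1)
    [ -n "$kb" ] || { echo "no 'Best CE=' line found" >&2; return 1; }
    # xdfindCE reports KB; atlas_cacheedge.h wants bytes.
    echo $((kb * 1024))
}

# Example:
#   ./xdfindCE -m 2000 -n 2000 -k 2000 | best_ce_bytes
```

Fed the example printout above, this prints 262144, the value to put in the #define CacheEdge line.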

By successively editing this file and recompiling (for instance, ATLAS/bin/ARCH/x[d,s,z,c]mmtst), you can tune this value further. Many users expect that they should set CacheEdge to the actual size of their L2 cache. This is only rarely the best setting, mainly because L2 caches are normally combined data/instruction, and so a smaller setting, leaving room for instruction caching, is usually best. On some machines with large L2 caches, things like associativity, or even TLB issues, can make it more efficient to use a very small subset of the available cache.

Here are some CacheEdge settings that the ATLAS team has chosen (NOTE: these are out of date!):
Arch     L2 Cache   CacheEdge
======   ========   =========
PPRO     256K       147456
PII      512K       262144
PIII     512K       262144
PIII     256K       249856
P4       256K       131072
Athlon   256K       131072
Athlon   512K       307200

Once you have set CacheEdge to the value you need, update all libs with the new setting by issuing make xdl3blastst xsl3blastst xcl3blastst xzl3blastst in your ATLAS/bin/ARCH directory.

Special hints for setting CacheEdge for multiprocessor machines

CacheEdge turns out to be very important to threaded performance. Unfortunately most of the default CacheEdge settings were obtained on single processor machines. So, you may well be able to see a substantial speedup by changing CacheEdge for your multiprocessor system.

The basic technique for finding CacheEdge is given here. Unfortunately, xdfindCE presently operates only on uniprocessor code, so instead you vary CacheEdge by hand, recompiling and running x[pre]l3blastst_pt each time, until you have a number you are happy with. It is vital to use a large problem. Use the largest problem you can stand to wait on for this many timing runs.

x[pre]findCE usually takes the smallest CacheEdge setting possible, since this saves memory. For multiprocessor systems, however, it is vital to use as much of the available cache as possible so that the processors spend as little time contending for the bus as possible. Thus, you want to set CacheEdge to the largest value that gives decent results. I usually run xdfindCE a few times to get an idea of ranges, and then try the larger settings by running x[pre]l3blastst_pt. Remember that threaded timings have to use walltime, so make sure any speedup is repeatable before changing CacheEdge.
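
The "largest value that gives decent results" rule can be sketched as follows. This is only an illustration: the 2% tolerance, the function name, and the timing numbers are all made up, and you would substitute your own repeated walltime results.

```python
# Sketch only: choose the largest CacheEdge whose performance is "decent".
# The 2% tolerance and the timing numbers below are hypothetical.

def largest_decent_cacheedge(rows, tolerance=0.02):
    """rows: (cacheedge_kb, mflops) pairs from repeated threaded timing runs.

    Returns the largest CacheEdge (in KB) whose MFLOPS is within
    `tolerance` (as a fraction) of the best observed MFLOPS.
    """
    best = max(mflops for _kb, mflops in rows)
    cutoff = best * (1.0 - tolerance)
    return max(kb for kb, mflops in rows if mflops >= cutoff)


# Hypothetical walltime results from x[pre]l3blastst_pt runs.
timings = [(128, 700.0), (256, 710.0), (512, 705.0), (1024, 640.0)]
chosen_kb = largest_decent_cacheedge(timings)
print(chosen_kb, "KB ->", chosen_kb * 1024, "bytes")
```

With a tolerance of zero the function falls back to the single fastest setting, which is closer to what the uniprocessor search would pick.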

Changing ATLAS's maximum buffer space.

Another way to tune ATLAS to your system is to vary the amount of buffer space ATLAS is allowed to allocate. In general, you want to set this as high as you can without causing swapping. If you have a machine with low memory, and you see dramatic slowdowns as the problem size goes up, you should definitely choose a smaller max size. ATLAS defaults to 4MB (except on the x86-64, where we default to 8MB). To vary this value, edit the file:
   ATLAS/include/atlas_lvl3.h
and set ATL_MaxMalloc to the maximum number of bytes that you want ATLAS to allocate internally. After editing, rebuild the libs by issuing:
   make xdl3blastst xcl3blastst xzl3blastst xsl3blastst
in your ATLAS/bin/ARCH directory.

Improving ATLAS small case performance by changing malloc behavior

ATLAS allocates a buffer space for most GEMM calls. When I wrote it, my assumption was that only the first call requires a switch to kernel space to do the allocation, and incurs the unneeded overhead of zeroing out the memory. However, by default Linux (as well as some other OSes, such as OS X) satisfies non-trivially sized allocations using mmap, which means that when free is called, the memory is immediately returned to the system. Thus all malloc calls have extremely high overheads.

This is not a big problem if you are doing a large matrix multiply, where the cubic computation disguises this square cost. For small problems, though, the O(N**2) costs are actually dominant, and this type of malloc behavior effectively doubles them (at least). You should be able to change Linux's malloc behavior by setting these environment variables:

   setenv MALLOC_TRIM_THRESHOLD_ -1
   setenv MALLOC_MMAP_MAX_ 0

Once this is done, malloc should be cheaper, but ATLAS was tuned with the expensive malloc. Therefore, you may be able to get better small-case performance by rerunning the crossover search with these environment variables set (don't do this unless you are going to keep these settings whenever you use this library). You can rerun the search from the ATLAS/tune/blas/gemm/ARCH directory by issuing:

   make sRun_tfc pre=s
   make dRun_tfc pre=d
   make cRun_tfc pre=c
   make zRun_tfc pre=z
This search takes a *loooong* time. Then, to build the changes into the libraries, go to ATLAS/bin/ARCH and issue:
   make xsl3blastst
   make xdl3blastst
   make xcl3blastst
   make xzl3blastst
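
To see why the square cost only matters for small problems, here is a toy model in Python. It is a back-of-the-envelope sketch, not a measurement: it assumes GEMM performs about 2*N**3 flops and lumps allocation, zeroing, and copying into k*N**2 "flop-equivalents", where k is a constant I made up for illustration.

```python
# Toy model only: GEMM does about 2*N**3 flops, while buffer allocation,
# zeroing and copying cost roughly k*N**2 "flop-equivalents".
# k is a made-up constant, not a measured ATLAS number.

def square_cost_fraction(n, k):
    """Fraction of total time spent in the O(N**2) part under the toy model."""
    square = k * n ** 2
    cubic = 2 * n ** 3
    return square / (square + cubic)


for n in (10, 100, 1000):
    print(n, round(square_cost_fraction(n, k=200), 3))
```

Under this model the O(N**2) share shrinks steadily as N grows, so anything that inflates k (such as an expensive malloc) hurts small problems far more than large ones.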

When linking ATLAS's testers, I'm getting a bunch of undefined BLAS symbols (e.g., dgemm_, dgemv_, etc.).

The ATLAS BLAS testers (x[s,d,c,z]l[1,2,3]blastst) expect to compare against a F77 interface BLAS library for performance and testing purposes. You get these missing symbols when your Make.ARCH's BLASlib is left blank, or does not point at a complete BLAS library. If you have a non-ATLAS BLAS built somewhere, point the BLASlib macro at it. If you don't, the easiest fix is probably to grab the Fortran77 reference BLAS tarfile, and build it into the required lib. If you don't want to do this, or don't have access to Fortran77, then you can have ATLAS test against its own C reference as discussed here.

I'm linking with C, and getting missing symbols (such as w_wsfe, do_fio, w_esfe or s_stop).

These kinds of symbols are Fortran library calls. The problem is that the C linker does not automatically find the Fortran libraries. The most common fix is either to link using your Fortran linker, or to rewrite your code so that Fortran routines are not called. If you know where they are, you can also choose to link in the Fortran libraries explicitly.

ATLAS performance is very bad using gcc 3.0 or Red Hat 7.[1-3]'s gcc (2.96-85)

On Athlons, the ATLAS group has confirmed a performance drop of almost a factor of 2 when using gcc 3.0 or 2.96-85. A user has reported a similar (though less severe) drop on a Pentium III. More details on this problem are given here. For now, the solution is to use one of the older gccs. Any of the 2.x series previous to 2.96-85 should do; complete instructions are given here. As far as ATLAS is concerned, gcc 3.0 is more like a new compiler than a new gcc version. This means that if you want to try using gcc 3.0 (which only makes sense on non-x86 platforms, at this point), you will need to say "no" to architectural defaults, since they are unlikely to be optimal. The only other platform aside from the Athlon we've tried gcc 3.0 on is the ev6, where it beats earlier versions of gcc performance-wise, but only if you use different settings than the current architectural defaults.

Installing gcc under unix without being root

You do not need to be root to install a gcc that will deliver decent performance for ATLAS. I include below the exact steps I use to install the C compiler only in my own home area. Changing my home area path (given in the --prefix command to configure) to yours should allow you to do the same. These directions are for x86 users, where ATLAS needs gcc 2.95.3 for decent performance. They work pretty much the same for gcc 3.x, which is needed for best ev5/6 and UltraSparc performance. Note that these directions will install g77 as well. The fortran compiler is not needed for ATLAS performance, so if you want to use a different fortran compiler than this version of g77, simply omit f77 from the --enable-languages step.

Help for building ATLAS under windows

I myself do not use Windows as my primary OS, and often have difficulty getting access to various hardware/software combinations, as well as not being able to afford the software itself. There are people out there who use ATLAS on Windows, and I'm going to provide some links here. These links are not supported by the ATLAS team, and may become out of date, and I cannot warrant their accuracy; however, they may provide help that is not covered here:

Building ATLAS with a non-cygwin compiler

If you want to build ATLAS with a non-cygwin compiler (i.e., a native Windows compiler such as Intel's icl or Compaq's CVF), you will need to perform the following steps:
  1. Set your LIB environment variable correctly
  2. Tell config what compilers and flags to use
  3. Do some post-config Make.ARCH fiddling

Setting your LIB environment correctly

You need to set your LIB variable to be the union of the library directories of all the Windows compilers you will be using. All Windows compilers require the VC libs, so start your string (modifying path & version info appropriately) something like this:
   export LIB="C:/Program Files/Microsoft Developer Studio/VC98/LIB;"

Now, if you want to use Compaq Visual Fortran, you'd add something like:

   export LIB="C:/Program Files/Microsoft Developer Studio/DF98/LIB;"$LIB

If you are using the Intel C compiler, you would need:

   export LIB="C:/Program Files/Intel/CPP/COMPILER80/Ia32/Lib;"$LIB

I can't run the Intel F77 compiler, but you'd need a similar line for its lib directory as well.

Telling config about your windows compilers

Config presently knows how to handle three Windows-specific compilers (in addition to the cygwin compilers g77 & gcc): Compaq's Visual Fortran (cvf), Microsoft Visual C (mvc), and Intel's C compiler (icc). Either Fortran compiler may be used to build the Fortran77 interface, with no effect on performance. ATLAS's F77 compiler is selected through config's -f flag.

ATLAS uses two compilers for C compilation. One compiler compiles the generated matrix multiply kernels (this compiler is called MCC, and is set using config's -m flag). The C compiler that compiles everything else is called CC, and is set using the -c flag.

You can pretty much set any of the three ATLAS compiler macros to any of the supported compilers of the appropriate language. However, right now you cannot choose gcc as the CC compiler unless you also choose g77 as the Fortran compiler. This is a compiler bug, and I believe the gcc folks are already aware of it.

Along with this restriction, it should be noted that mvc is much slower than the other C compilers if it is used for MCC. Finally, you may want to use compilers for which there are architectural defaults. There are arch defaults for gcc for most platforms. The Intel compiler has arch defaults for P4 and PIII. There are arch defaults for only CC on the Pentium 4 for mvc. Are you hopelessly lost yet? All this discussion is just for those who want to understand exactly what's going on. To cut to the chase, we provide a table of the more common scenarios and some examples below.

This table shows some common Windows usages, listed by what compiler you use to perform the link (i.e., what compiler you are using for your own application). For a given Fortran & C interface that you use, the table then shows what each compiler-controlling flag of config should be set to for best performance. If a flag is set to NONE, that means you should not throw that flag at all. If a YOU LINK is set to NONE, that means you don't use that language interface to the BLAS in your application.

      You Link             ./xconfig flags
F77        C           -c      -m      -f
========   ========    ====    ====    =====
g77/NONE   gcc         NONE    NONE    NONE
CVF        MVC         mvc     NONE    cvf
ifort      icc         icc     icc     ifort
g77/NONE   MVC         mvc     NONE    NONE
ifort      gcc/NONE    mvc     NONE    ifort

Therefore, if you are using gcc & g77 as your compilers, you take defaults all the way, and can start the install with:

   cd ATLAS ; make

If you link from mvc only, you probably want:

   cd ATLAS ; make xconfig ; ./xconfig -c mvc

If you link from cvf, you probably want:

   cd ATLAS ; make xconfig ; ./xconfig -c mvc -f cvf

All of these options should get roughly the same performance.
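
The rule "a flag set to NONE is not thrown at all" can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, and it uses nothing beyond the -c, -m, and -f flags described above.

```python
# Sketch only: build an xconfig command line from one row of the table above.
# Only the -c, -m and -f flags described in the text are used; a value of
# None (the table's NONE) means the flag is left off entirely.

def xconfig_command(c=None, m=None, f=None):
    parts = ["./xconfig"]
    for flag, value in (("-c", c), ("-m", m), ("-f", f)):
        if value is not None:
            parts.extend([flag, value])
    return " ".join(parts)


print(xconfig_command(c="mvc"))                      # link from mvc only
print(xconfig_command(c="mvc", f="cvf"))             # link from cvf
print(xconfig_command(c="icc", m="icc", f="ifort"))  # link from ifort and icc
```

The first two calls reproduce the command lines given above for mvc and cvf; the third corresponds to the ifort/icc row of the table.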

You can also vary the flags for each compiler on the command line using -F: the first argument after -F says which compiler's flags to set, and the second argument gives the flags themselves. For instance, if you want to use CVF with Intel icc, but without the nomixed_str_len_arg modifier that ATLAS turns on by default, you would issue:

   make xconfig ; 
   ./xconfig -m icc -c icc -f cvf -F f '-fast -assume:accuracy_sensitive -fltconsistency'
NOTE: ATLAS always calls CVF with /iface:cref. Also, the accuracy and consistency arguments are necessary if the testers are going to pass. Finally, the C interface testers cannot handle CVF without nomixed_str_len_arg, so you will die in the C interface testing (and thus the sanity tests) if you don't keep this flag (which is on by default). Therefore, if you override the default flags and don't specify nomixed, realize that failing the C interface tests comes with the territory, and reflects a tester failing, not necessarily an error in your library.

Post-config Make.ARCH fiddling for windows

After config has run, you need to edit the Make.ARCH by hand and change one or two settings. First, you need to change TOPdir so it has the full cygwin-style path, including the cygdrive letter. For instance, I changed mine from:
   TOPdir = /home/Owner/ATLAS
to:
   TOPdir = /cygdrive/c/cygwin/home/Owner/ATLAS
You'll need to hunt around a bit if you don't remember where you installed cygwin.

Next, if you are using the Intel compilers and have the older Visual Studio 6, you need to add the flag

   -Qvc6
to both your MMFLAGS and CCFLAGS.

Using pthreads under Windows

Right now, the only way to use pthreads under Windows is if you compile with gcc. Due to the aforementioned gcc bug, this currently means that your Fortran compiler must be g77. If this is acceptable, no extra steps are needed: just run config as normal, and say you want to use threads.

The reason ATLAS presently requires gcc to get you threaded support is that the supported Windows compilers don't supply the required include file pthread.h. My MSVC++ is version 6, and I suspect newer versions actually do provide pthreads. If they do, you can try to install, and it may work. If it does, however, you will still have a small problem. I am aware of no wall-timer under Windows, and you need a wall-timer to get meaningful timings of threaded code. ATLAS defaults to calling a cpu timer, which will make the install work, but will not give you any idea how fast your lib is. Therefore, you will need to turn on ATLAS's assembler timer.

To do this, before you do 'make install', change line 110 of ATLAS/makes/Make.sysinfo from:

        $(CC) -c $(CCFLAGS) ATL_walltime.c
to:
        $(BC) -c $(BCFLAGS) ATL_walltime.c

Now, before install, but after config, edit the Make.ARCH created by config, and add the definition -DPentiumCPS=[MyMhz] to the CDEFS macro. For instance, on my 2.5GHz P4, I would change the line:

   CDEFS = $(L2SIZE) $(INCLUDES) $(F2CDEFS) $(ARCHDEFS)
to:
   CDEFS = $(L2SIZE) $(INCLUDES) $(F2CDEFS) $(ARCHDEFS) -DPentiumCPS=2500
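
As a sanity check on the value you pass, here is a short Python sketch that derives the define from a clock speed in GHz and shows how a cycle count turns into seconds. It assumes the assembler timer counts CPU cycles and divides by the clock rate, which is how such timers generally work; I have not reproduced ATLAS's timer code here, and both function names are hypothetical.

```python
# Sketch only: derive the -DPentiumCPS value and show how a cycle count
# turns into seconds. Function names are hypothetical, not ATLAS routines.

def pentium_cps_define(clock_ghz):
    """Return the CDEFS addition for a CPU clocked at clock_ghz GHz."""
    mhz = int(round(clock_ghz * 1000))
    return "-DPentiumCPS=%d" % mhz


def cycles_to_seconds(cycles, mhz):
    """Convert elapsed cycles to seconds for a clock of mhz MHz."""
    return cycles / (mhz * 1.0e6)


print(pentium_cps_define(2.5))
print(cycles_to_seconds(5000000000, 2500))
```

If the MHz value is wrong, every reported time (and hence every MFLOPS figure) is scaled by the same factor, so it is worth checking the clock speed before trusting threaded timings.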

Forcing 3DNow! detection on SSE-enabled Athlon

If ATLAS detects SSE, it will not use 3DNow! instructions even if present due to 3DNow's non-IEEE compliance. We highly recommend that you leave the ATLAS behavior like this, as underflow/overflow (which 3DNow! absolutely does not handle) never seems like a big deal until you get a completely incorrect answer. However, if you are certain that your code never produces under/overflow, and wish to use non-IEEE computations, you can force ATLAS to detect your 3DNow! capabilities by artificially causing the SSE probe to fail.

To do this, before you run config, edit ATLAS/CONFIG/probe_SSE?.c, and change the first executable line to be:

   printf("FAILURE\n"); exit(-1);
Note that this makes sense to do only on 32-bit Athlons: ATLAS's SSE code runs faster than its 3DNow! code on hammer-based machines (e.g., Athlon-64, Opteron), while at the same time being IEEE compliant.

How about C++ header files for the C interfaces?

Since ATLAS does not provide full OO C++ interfaces, I am reluctant to raise the expectation that it does by providing C++ specific header files. What I have always envisioned is the C++ programmer creating his own include files, such as:
>cat cppblas.h
extern "C" {
#include <cblas.h>
}

>cat cpplapack.h
extern "C" {
#include <clapack.h>
}
If you are a C++ programmer using ATLAS, and think differently, let me know.

Missing symbols when linking with g77 on OS X.

When linking AltiVec-enabled code under OS X using g77, I got missing symbols such as:
/usr/bin/ld: Undefined symbols:
restFP
saveFP

Problems with linking/missing LAPACK routines on OS X

OS X has a built-in version of ATLAS, and uses the standard names for its libs. They may be less up-to-date and/or include fewer libs than something you install yourself; in particular, if you have a Fortran compiler, you can build a full lapack library, which Apple does not currently provide, and so many users want to install the standard ATLAS. Unfortunately, when searching for libs the compiler looks in the system areas where Apple keeps its ATLAS libs before looking in directories supplied by -L. This means that if you use -L and -l for your linking, you always get Apple's modified ATLAS, rather than the one you installed. There are two fixes for this problem that I know of. First, you can just link to the full name and path, rather than using -L. For instance, change something like:
   gcc -o xtst test.c -L /home/whaley/TEST/ATLAS/build64/lib -lcblas -latlas
to:
   gcc -o xtst test.c /home/whaley/TEST/ATLAS/build64/lib/libcblas.a \
          /home/whaley/TEST/ATLAS/build64/lib/libatlas.a
The only other trick I'm aware of is to rename your ATLAS libraries so that the Apple versions will not override them.
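
If you have many link lines to convert, the first fix is mechanical enough to script. The following Python sketch (a hypothetical helper of mine, not something shipped with ATLAS) turns -l style names into full archive paths, assuming static libraries named lib<name>.a in a single directory as in the example above.

```python
# Sketch only: turn -l style library names into full archive paths.
# Assumes static archives named lib<name>.a that all live in one directory.
import posixpath


def full_archive_paths(libdir, names):
    """Return the full path of lib<name>.a in libdir for each name, in order."""
    return [posixpath.join(libdir, "lib%s.a" % name) for name in names]


libs = full_archive_paths("/home/whaley/TEST/ATLAS/build64/lib", ["cblas", "atlas"])
print("gcc -o xtst test.c " + " ".join(libs))
```

The order of the names is preserved, which matters because libcblas.a must still come before libatlas.a on the link line.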

Config hangs in compiler search

Particularly on Solaris, ATLAS will sometimes hang in the search for valid compilers. The easiest fix is to do several ctrl-C's (breaking out of config's find call, but not config) until config comes back and asks you for the compiler, and then you enter the full path.

Should I use the newer gcc 4.x rather than 3.x?

Gcc 4.2 and newer are good compilers. The 4.0 and 4.1 series are poor compilers on x86 machines, as discussed in this gcc bugzilla report. If you are using the ATLAS 3.7 series, which is highly recommended and should become the new stable soon, then gcc 4.2 is recommended for all platforms. This is true even on the PowerPC, where this gcc bug costs you some performance. For 3.6, you should probably stick with gcc 3, since that is what the arch defaults config flags are tuned for.

More generally, gcc 4 has produced slower or the same speed ATLAS code on all architectures we've tried, so I recommend using gcc 3.x if possible. If gcc 4 is the only compiler you've got, you will probably be OK if on an x86 arch, as ATLAS's assembly code will insulate you from many harmful effects. If you are on a non-x86 architecture, you should see if your results are in line with those reported here. You may also want to install without arch defaults, and play with flags, and take the best library (def/nodef). All in all, installing gcc 3.4 is not that hard!

If you are on an Itanium, and cannot get gcc 3 to work, you will need to edit ATLAS/tune/blas/gemm/CASES/?cases.flg, and change the compiler flag lines for ATL_mm6x8x8_1p.c and ATL_mm8x8x2.c to:

   -fomit-frame-pointer -O2 -fno-tree-loop-optimize
You should also set all your C compiler lines in your Make.ARCH to these flag values, and not use the architectural defaults.

If you must use gcc 4, note that there seems to be a compiler error (or perhaps an error in my understanding of C that isn't enforced in any other C compiler). You need to move the prototype of ATL_L2GE on lines 67 and 68 of ATLAS/bin/uumtst.c before the start of the function (gcc 4 can't take static func prototypes inside functions anymore).

Avoiding SSE3 for ease of installation

I will be producing a new stable with both optimization and config support for the SSE3 versions of architectures this summer. For right now, the best idea is to fool config into thinking your machine is an earlier version of the architecture that had only SSE2 (unless you are using 3.7.1x, which has SSE3 support for the P4e only). Then, you can use the default compilers, flags, and architectural defaults. For pretty much everyone, this will produce a faster library than anything you build with SSE3 support. Note that you want gcc 3.x, since newer versions of gcc still run slower, as explained here. If you are on a 64-bit architecture, use gcc, not icc, as icc didn't have 64-bit support when I added 64-bit support. This will all be fixed soon.

In order to make config detect only SSE2, rig the SSE3 probe to fail by commenting out line 77 of ATLAS/CONFIG/probe_SSE3.c. Line 77 is:

   if (testv3[0] != 3.0 || testv3[1] != 7.0)

Now, you can just use the compiler (gcc 3.x) and arch defaults, and install should go smoothly.