If you are on a machine that ATLAS identifies as AMD64K10h64SSE3,
then ATLAS has an error in its architectural defaults. To fix this, save
this file as ATLAS/CONFIG/AMD64K10h64SSE3.tgz
before running configure.
S/C GEMM gets wrong answer when M is a multiple of 14
and NB is greater than 84
If the ATLAS search chooses a block factor that exceeds 84, then the file
ATL_smm14x1x84_sseCU.c can cause errors when it is used by ATLAS
to provide M-cleanup. To fix this problem, add the following lines to
the top of ATLAS/tune/blas/gemm/CASES/ATL_smm14x1x84_sseCU.c
#if KB > 84
#error "KB cannot exceed 84!"
#endif
#if (KB/4)*4 != KB
#error "KB must be a multiple of 4!"
#endif
Complex GEMM sometimes reads uninitialized mem when BETA=0
This error can possibly occur when K dominates M and N. The fix is to
change lines 270-271 of ATLAS/src/blas/gemm/ATL_cmmJITcp.c from:
if (SCALAR_IS_ZERO(beta))
Mjoin(PATL,gezero)(M, N, C, ldc);
to:
Mjoin(PATLU,gezero)(M, N, pC, ldpc);
Mjoin(PATLU,gezero)(M, N, pC+ipc, ldpc);
Notice that we removed the SCALAR_IS_ZERO test as well as adding the second
call.
Complex GEMM sometimes reads C when BETA=0
This error can possibly occur when K dominates M and N. The fix is to
save
this file to ATLAS/src/blas/gemm/ATL_gereal2cplx.c. If you
have already built ATLAS, recompile with:
cd BLDdir/src/blas/gemm/
make zlib clib
ATLAS violates the BLAS standard by assigning vector
elements to zero rather than multiplying by zero in
[d,s,c,z]scal
As an optimization, ATLAS detects if the scalar passed to the Level-1 BLAS
routine SCAL is zero, and if it is, it simply assigns 0 to every element
of the vector. However, this violates the BLAS standard, and can have
unforeseen effects for users depending on standard behavior (e.g., users
that expect NaN propagation to work correctly when scaling by zero). So,
if you expect the standard behavior, you must apply this fix (and if you
expect ATLAS's unfixed behavior, stop it, since subsequent releases will
have this bug fixed).
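The difference between the two behaviors can be sketched in a few lines of C. This is only an illustration, not ATLAS's generated SCAL code; the function names scal_multiply, scal_zero_shortcut and count_nan are made up for this example.

```c
#include <math.h>
#include <stddef.h>

/* Standard behavior: multiply every element by alpha, so a NaN in x
 * stays NaN even when alpha is zero (0 * NaN is NaN in IEEE arithmetic). */
void scal_multiply(size_t n, double alpha, double *x)
{
    size_t i;
    for (i = 0; i < n; i++)
        x[i] *= alpha;
}

/* Shortcut of the kind described above: alpha == 0 overwrites x with
 * zeros, which silently discards any NaN that was stored in x. */
void scal_zero_shortcut(size_t n, double alpha, double *x)
{
    size_t i;
    if (alpha == 0.0) {
        for (i = 0; i < n; i++)
            x[i] = 0.0;
        return;
    }
    scal_multiply(n, alpha, x);
}

/* Count the NaN entries left in x after scaling. */
size_t count_nan(size_t n, const double *x)
{
    size_t i, count = 0;
    for (i = 0; i < n; i++)
        if (isnan(x[i]))
            count++;
    return count;
}
```

Scaling a vector that contains one NaN by zero leaves that NaN in place with scal_multiply, while scal_zero_shortcut returns a vector of clean zeros, so code that relies on NaNs to flag bad data loses the signal.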
The original user report
can be found here.
To fix, edit ATLAS/tune/blas/level1/scalsrch.c, and comment out
lines 750-759, which read:
fprintf(fpout, "%sif ( SCALAR_IS_ZERO(alpha) )\n", spc);
fprintf(fpout, "%s{\n", spc);
if (pre == 'c' || pre == 'z')
{
fprintf(fpout, "%s TYPE zero[2] = {ATL_rzero, ATL_rzero};\n", spc);
fprintf(fpout, "%s Mjoin(PATL,set)(N, zero, X, incx);\n", spc);
}
else fprintf(fpout, "%s Mjoin(PATL,set)(N, ATL_rzero, X, incx);\n", spc);
fprintf(fpout, "%s return;\n", spc);
fprintf(fpout, "%s}\n", spc);
I.e., change lines 750-759 of ATLAS/tune/blas/level1/scalsrch.c to:
/* fprintf(fpout, "%sif ( SCALAR_IS_ZERO(alpha) )\n", spc);
fprintf(fpout, "%s{\n", spc);
if (pre == 'c' || pre == 'z')
{
fprintf(fpout, "%s TYPE zero[2] = {ATL_rzero, ATL_rzero};\n", spc);
fprintf(fpout, "%s Mjoin(PATL,set)(N, zero, X, incx);\n", spc);
}
else fprintf(fpout, "%s Mjoin(PATL,set)(N, ATL_rzero, X, incx);\n", spc);
fprintf(fpout, "%s return;\n", spc);
fprintf(fpout, "%s}\n", spc); */
This fix needs to be applied prior to installation; if you have already installed, you will need to reinstall after the fix.
Configure does not correctly ID your P4E architecture
Change line 323 of ATLAS/CONFIG/src/backend/archinfo_x86.c
from:
case 4:
to:
case 4: ; case 6:
Threading doesn't work for Windows/cygwin
ATLAS presently will not autodetect the number of processors under Windows,
so you must pass the '-t [# of processors]' flag to configure to force
threading to be enabled (e.g., I add -t 4 to my Core2Quad configure
command).
The second problem is that cygwin's pthreads does not support setting the scope, and ATLAS requires that it works. To fix this, edit the file ATLAS/src/pthreads/misc/ATL_thread_init.c and change line 99 from:
ATL_assert(!pthread_attr_setscope(ATTR, PTHREAD_SCOPE_SYSTEM));
to:
pthread_attr_setscope(ATTR, PTHREAD_SCOPE_SYSTEM);
ATLAS threaded performance dies for large problems
On some platforms, if you time very large problems, you'll see that ATLAS's
threaded library does well, and then suddenly drops to below serial
performance. This is probably an error in how we recover from lack of
memory in the threaded case, but in the meantime the easy fix is to increase
the total amount of memory ATLAS is allowed to allocate. To do this, edit
the file ATLAS/include/atlas_lvl3.h, and pump up the macro
ATL_MaxMalloc, which is the maximal size (in bytes) ATLAS is allowed to
allocate. It is presently set to be 64MB;
make it as large as you think you can afford.
Search doesn't work with cpu throttling.
By default, newer Linuxes (and probably other OSes) have CPU throttling
turned on even for desktops
in order to save power. Since the speed of your CPU is constantly changing,
the ATLAS timing results become essentially meaningless. Therefore, to get
good performance numbers (and thus a fast ATLAS library), be sure to turn
off cpu throttling in your BIOS before installation. If you are using
the machine for high performance code a lot, you may want to leave it off.
You can also usually turn off CPU throttling in your OS; see
the ATLAS install guide (ATLAS/doc/atlas_install.pdf) for
further details. Windows users may find
these directions helpful.
ATLAS install fails on G4 when using gnu gcc instead of Apple's gcc
In addition to your normal configure flags, pass the following to
configure when using gnu gcc on a system with AltiVec:
-D c -DATL_AVgcc
We have reports that some users (depending on the way their gcc was installed) additionally had to pass the following to configure or configure fails to detect that their machine has the AltiVec unit, though we have not seen this problem in our own testing (again, probably due to the way we install gcc):
--cflags='-mregnames'
atlas_prefetch.h won't compile using Sun CC v 5 or earlier.
In order to use prefetch with Sun's cc, ATLAS includes the Sun header file
sun_prefetch.h, which did not exist until Sun CC version 6.
ATLAS's architectural defaults use gcc anyway, which will almost certainly
get you better performance.
However, if you want to use Sun CC anyway, you will need
to modify atlas_prefetch.h. I haven't had time to scope this problem
myself (and I'm not sure I have access to a machine with an old enough
Sun CC anyway), but here's a fix the user who originally found the error
mentioned: uncomment the version check on line 50 of
SRCdir/include/atlas_prefetch.h, i.e., change that line from:
#elif defined(__SUNPRO_C) && defined(__sparc) /* && __SUNPRO_CC > 0x600 */
to:
#elif defined(__SUNPRO_C) && defined(__sparc) && __SUNPRO_CC >= 0x600
If you do this, you also want to install using gcc, and compare performance to see which is better.
Applying a patch to your ATLAS directory
To apply an ATLAS patch to your existing directory tree, save the patch
file (we will call it patchfile from here on out) in your
SRCdir/ directory, and then issue:
patch -p1 < patchfile
Gcc compiler bug hurts PowerPC Performance.
ATLAS 3.8 assumes gcc 4.2 (most recent gcc stable release at ATLAS release).
Gcc 4.2 has a performance bug that normally cuts performance in half.
ATLAS works around this by throwing the flags:
-fno-schedule-insns -fno-rerun-loop-opt
This makes the performance drop caused by using gcc-4.2 roughly 1-7%, instead
of 50%. If you are using a gcc version prior to gcc-4.2, remove these
flags from all your flag macros in your BLDdir/Make.inc after configure
but before install to get back this performance. If you are using a later
gcc release, you may be able to see if this bug has been fixed by scoping my
original bug report.
Unkillable and relentless
'warning: the use of `tmpnam' is dangerous' warnings from gcc.
During configure you will get a lot of warnings of the following form:
/tmp/ccq5b8sE.o(.text+0x852): In function `CmndResults': config.c: warning: the use of `tmpnam' is dangerous, better use `mkstemp'
This is normal, and not an error. Let me translate this message out of gnu-speak:
Hey, idiot, would you stop using that pesky ANSI/ISO C standard and use this non-standard routine instead?
For maximal compatibility, ATLAS hews to the ANSI/ISO 9899-1990 standard, and so I cannot make the proposed change. Unfortunately, this warning message is literally immortal: there exists in gcc no flag combination that I can discover that can turn the freaking thing off. So, every time you link one of the config programs that calls this standard routine, the linker outputs this message, even if you turn on the strict ANSI compatibility flag. I reported it as an error to the gcc folks, but they point out it is the linker/glibc people that generate the "warning" and immediately closed the tracker. Still seems wrong to me that the strict ANSI flag with all warning messages turned off insists on printing out a message warning about standard usage, but there appears to be little for me to do about it. Therefore, just ignore the hectoring, and don't worry about these immortal, bogus, annoying and repetitive "warnings".
For each platform, ATLAS defaults to using the fastest available compiler.
If the two compilers deliver roughly the same ATLAS performance, we pick
gnu compilers, since they are freely available (assuming they compile
on the platform in question).
The ATLAS architectural defaults were all built with gcc 4.2, except on
IRIX/MIPS, where gcc was outperformed by SGI's cc. If you use
gcc 4.0 or 4.1, then your performance will be cut roughly in half on
all x86 platforms, so
such users should install and use gcc 4.2. On the x86, Gcc 3 gets better
performance than 4.1, but still less than 4.2. On gcc 3 platforms,
we recommend you install gcc 4.2, and do a mixed gcc3/4 build, as described
in the
ATLAS install guide.
ATLAS is built to use gcc-4.2 as the main compiler. Newer versions are
(hopefully) OK, but earlier ones are not. Be sure to run
make time after completing your install to ensure that your compiler
has not compromised your performance.
As of this writing, gcc-4.2 or newer (presumably) is necessary for all x86
archs. 4.1 or earlier
will cut your performance by as much as half, depending on the architecture.
The architectural defaults and compilers flags are unlikely to work for
gcc-3.
For all PowerPC/POWER systems, gcc 4.2 is actually slower than 4.1, as described
here. Gcc 3 versions recent enough to accept all
the flags suggested by ATLAS seem to perform just fine.
To install with a non-default f77 compiler, simply override the
default fortran compiler and flags from the command line when running
config. This can be done by adding the following flags to configure:
If you want to install ATLAS so it can be called from multiple,
non-interoperable Fortran compilers (or indeed, have already installed
with the wrong f77 compiler), you can do this with moderate
ease, assuming you know how C and the given F77 compiler(s) interoperate.
If you do not know this interoperational information, you must get
configure to find it for you. To do this, create a bogus BLDdir directory
(e.g., mkdir bogus), and then run configure from it, overriding
the default fortran compiler and flags from the command line
as described here. You can then look at the
generated Make.inc's settings for the macro F2CDEFS
and replicate them, along with the new F77 compiler/linker information,
into your original Make.inc.
For users who already know how their C and F77 compilers
interoperate:
ATLAS needs three pieces of information in order to correctly handle
F77/C interoperation, and this information appears as defines to the
C compiler, set in your Make.inc's F2CDEFS.
The first macro controls the name space alterations necessary to make a
C routine callable from Fortran77. The options are:
The second macro provides a mapping between F77's INTEGER
and the appropriate C integral type. Options are:
The third macro deals with F77 string handling. The options are:
By default, ATLAS builds the F77 interface to the BLAS into the file pointed
at by Make.inc's F77BLASlib, and so changing this macro before
recompiling the interface will allow you to build multiple F77 interfaces.
For example, say on a Solaris machine I want to build the f77 interface
for both Sun's f77 and gfortran. First, I install ATLAS as normal, with the
default gfortran compiler. Now, to get a f77 interface lib, I edit my
ATLAS/Make.SunOS_SunUS2, and I find that ATLAS has detected the C/F77
interface for gfortran as:
Sun's f77 compiler as:
Finally, I change the f77 compiler/linker information from:
You can essentially repeat this process for the LAPACK F77 interface, but
change LAPACKlib rather than F77BLASlib, and go to
BLDdir/interfaces/lapack/F77/src
rather than
BLDdir/interfaces/blas/F77/src. Also, LAPACK does not
have a separate entry point for threads, so do not issue any of the additional
threading instructions.
Finally, in your BLDdir/src/testing directory, issue:
Real errors have very large residuals (e.g., 10e15). However, even
these kinds of errors may be a result of the LAPACK tester being completely
tuned to the LAPACK BLAS implementation. For instance, the DGER operation
is supposed to do A += alpha * x * y. The reference BLAS perform
A += x*(alpha*y). ATLAS does this or A += (alpha*x)*y,
whichever is cheaper. This causes the LIN testers to fail quite
a few residual checks with size 10e15, even though both are legal. I have
modified ATLAS to avoid these spurious GER problems, but not all of the
LAPACK testers' reliance on fixed orderings can be fixed.
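To make the two GER orderings concrete, here is a small C sketch of both. It is an illustration only, not the reference BLAS or ATLAS code; the names ger_scale_y and ger_scale_x are hypothetical, and A is assumed to be stored column-major with leading dimension lda.

```c
#include <stddef.h>

/* A += x * (alpha * y): scale each y[j] once, then do the rank-1 update. */
void ger_scale_y(size_t m, size_t n, double alpha,
                 const double *x, const double *y, double *A, size_t lda)
{
    size_t i, j;
    for (j = 0; j < n; j++) {
        const double t = alpha * y[j];
        for (i = 0; i < m; i++)
            A[i + j * lda] += x[i] * t;
    }
}

/* A += (alpha * x) * y: same mathematical result, but alpha is applied
 * to x instead, so individual products can round differently. */
void ger_scale_x(size_t m, size_t n, double alpha,
                 const double *x, const double *y, double *A, size_t lda)
{
    size_t i, j;
    for (j = 0; j < n; j++)
        for (i = 0; i < m; i++)
            A[i + j * lda] += (alpha * x[i]) * y[j];
}
```

Both routines compute the same update in exact arithmetic. In floating point the results can differ in the last bits, and a tester written around one fixed ordering may treat that legal difference as a failure.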
This is not a big problem if you are doing a large matrix multiply, where
the cubic computation disguises this square cost. For small problems, though,
the O(N**2) costs are actually dominant, and this type of malloc behavior
effectively doubles them (at least). You should be able to change
Linux's malloc behavior by setting these environment variables:
Once this is done, malloc should be cheaper, but ATLAS was tuned with
the expensive malloc. Therefore, you may be able to get better small-case
performance by rerunning the crossover search with these environment variables
set (don't do this unless you are going to keep these settings whenever you
use this library). You can rerun the search from the
ATLAS/tune/blas/gemm/ARCH directory by issuing:
In ATLAS/tune/blas/gemm/ARCH, issue make xdfindCE. Run
You want to run
this program several times to get a consensus idea of what a good setting
would be. If a CacheEdge setting gets performance in the same range as
no CacheEdge (CacheEdge of 0 is no CacheEdge in printout of xdfindCE),
it is still recommended that you use that setting, since ATLAS with
CacheEdge set will use less memory as problem sizes grow.
Once you have gotten an idea of what to set CacheEdge to, you can change it by
editing ATLAS/include/ARCH/atlas_cacheedge.h. xdfindCE
prints out data in KB, but atlas_cacheedge.h needs bytes, so multiply
the xdfindCE result by 1024 to get the number you want to use in
atlas_cacheedge.h.
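The KB-to-bytes conversion is easy to get wrong by hand, so here is a small C sketch of it. The helper names cacheedge_kb_to_bytes and format_cacheedge_define are invented for this example and are not part of ATLAS.

```c
#include <stdio.h>
#include <stddef.h>

/* Convert a CacheEdge value as printed by xdfindCE (in KB) into the
 * byte count that atlas_cacheedge.h expects. */
unsigned long cacheedge_kb_to_bytes(unsigned long kb)
{
    return kb * 1024UL;
}

/* Write the corresponding "#define CacheEdge <bytes>" line into buf.
 * Returns the number of characters snprintf reports for the line. */
int format_cacheedge_define(char *buf, size_t size, unsigned long kb)
{
    return snprintf(buf, size, "#define CacheEdge %lu",
                    cacheedge_kb_to_bytes(kb));
}
```

For a reported value of 256 KB this gives 262144, so the line written into buf would be "#define CacheEdge 262144".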
Let's take an example. Say xdfindCE printed out this:
By successively editing this file and recompiling (for instance,
ATLAS/bin/ARCH/x[d,s,z,c]mmtst), you can tune this value further.
Many users expect that they should set CacheEdge to the actual size of their
L2 cache. This is only rarely the best setting, mainly because L2 caches
are normally combined data/instruction, and so a smaller setting,
leaving room for instruction caching, is usually best. On some machines
with large L2 caches, things like associativity, or even TLB issues, can
make it more efficient to use a very small subset of the available cache.
Once you have set CacheEdge to the value you need, update all libs
with the new setting by issuing
make xdl3blastst xsl3blastst xcl3blastst xzl3blastst in your
ATLAS/bin/ARCH directory.
The basic technique for finding CacheEdge is given
here. Unfortunately, xdfindCE
presently operates only on uniprocessor code, so what you want
to use instead is varying CacheEdge and iteratively compiling
and running x[pre]l3blastst_pt until you have a number
you are happy with. It is vital to use a large problem. Use
the largest problem you can stand to wait on for this many timing runs.
x[pre]findCE usually takes the smallest CacheEdge setting
possible, since this saves memory. For multiprocessor systems, however,
it is vital to use as much of the available cache as possible so that
the processors spend as little time contending for the bus as possible.
Thus, you want to set CacheEdge to the largest value that gives
decent results. I usually run xdfindCE a few times to get an
idea of ranges, and then try the larger settings by running
x[pre]l3blastst_pt. Remember that threaded timings have
to use walltime, so make sure any speedup is repeatable before changing
CacheEdge.
If you need to add full LAPACK after ATLAS has already been built,
in your BLDdir/lib/ directory (where you should have a
liblapack.a), issue the following commands:
Just linking in ATLAS's liblapack.a first will not get you the best LAPACK
performance, mainly because LAPACK's untuned ILAENV will be used instead
of ATLAS's tuned one. So, if you use any LAPACK routine that is not
provided by ATLAS, it is essential that you create this hybrid LAPACK/ATLAS
library in order to get the best performance.
A serial performance cliff is
usually due to the normal install failing to set CacheEdge to any
value, and then eventually ATLAS winds up using memory-saving algorithms
that hurt performance. The solution is to set CacheEdge, so we use less
workspace, while improving overall performance.
What you want to do is tune CacheEdge, as shown here,
but be sure to use very large problem sizes in order to find CacheEdge.
Another problem that could cause this is that ATLAS misdetected the peak
of your machine, and is thus using an inadequate timing interval. You
can see if this is happening by scoping how long each timing is taking.
If it is very quick, and thus unrepeatable, you need to tell ATLAS to pump
up the timing granularity. To do this, edit the files
BLDdir/include/atlas_?sysinfo.h. Each of these four files
will have a quantity called ATL_nkflop. Pump this quantity up
by some significant factor until timings are regular. I usually increase
it by a factor of 5 or 10. If the individual timings are then too slow,
interrupt the process, and decrease the values.
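As a rough way to judge whether timings are "regular", you can compare the spread of repeated runs against the fastest one. The C sketch below is an assumption about what a reasonable check looks like, not the test ATLAS itself applies; the name timings_repeatable and the relative-spread criterion are hypothetical.

```c
#include <stddef.h>

/* Return 1 if all n timings lie within rel_tol (e.g. 0.05 for 5%) of
 * the fastest run, and 0 otherwise. Non-positive timings are treated
 * as unusable, since they indicate the interval was too short. */
int timings_repeatable(const double *t, size_t n, double rel_tol)
{
    size_t i;
    double tmin, tmax;

    if (n == 0 || t[0] <= 0.0)
        return 0;
    tmin = tmax = t[0];
    for (i = 1; i < n; i++) {
        if (t[i] <= 0.0)
            return 0;
        if (t[i] < tmin)
            tmin = t[i];
        if (t[i] > tmax)
            tmax = t[i];
    }
    return (tmax - tmin) <= rel_tol * tmin;
}
```

If a handful of runs of the same timing fail a check like this, the timing interval is probably too short and ATL_nkflop is a good candidate for an increase.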
Finally, the Level 1 timings very often display this problem even when
the timing interval is sufficient. The most likely explanation of this
non-repeatable timing problem involves inadequate cache flushing, but it
has not been tracked down for sure. Regardless, the only remedy is to keep
restarting the interrupted install until it completes, as explained
below.
Most of the time, when an install dies in this way, you can just
restart it, as outlined here. If this dies
right away in the exact same timing, but without actually running
the timing again, it means that the install process kept a record of
the bad timing, and is just rereading it. You then need to remove the
bogus timing record file. This file will be in the appropriate BLDdir
directory under the res/ subdirectory. For instance, if you are
dying in the level 1 tuning, the result files are stored in
BLDdir/tune/blas/level1/res,
and if you are in the gemm tuning they are in
BLDdir/tune/blas/gemm/ARCH/res, etc. Your
last message from the dying install should give you most of the info you need
to figure out the result directory and the file. Just remove the file
(or all the files of that precision, if you cannot figure out the
specific file that is bad), and restart the install.
For the Intel compilers, the C compiler (icl) seemed to work
out-of-the-box once I had done the MSVC++ environment variables setup
outlined above. To get the Fortran compiler (ifort) rolling,
I had to additionally add:
These same changes can be made to /usr/X11R6/bin/startxwin.bat
if you use that batchfile to start xterms instead of cygwin windows.
If you need to set up the environment after bash has already started running,
things are a little more difficult. Take
this file, modify
the paths to match your system, and then source it from the bash prompt.
In order to use ifort for the F77 interfaces, add the flag
In order to compile the C interface routines with CL (the command
line interface to MSVC++), add the flag:
It may be important to have a library that you can link without linking to
libcygwin.a, which is a GPL (not LGPL) library. To say that
you want to avoid this library, pass the following flag to configure:
Some of the time, depending on how your programming environment is set up,
you also don't need to change anything to use ifort as your
linker. Occasionally,
in order to link using ifort, with the rest of the library
(including the interface routines) compiled with gcc, you need to
explicitly link
to both gcc's libgcc.a (which uses the LGPL, not GPL, license),
and MSVC's libcmt.lib. Unfortunately, the spaces in the typical
windows paths pose a challenge when called from the commandline, and so you
typically need to copy the file over to the cygwin directory (logical links
don't work). Therefore, before modifying your Make.inc, issue a
command like (adjust the paths to your system):
Wait until the ATLAS build is complete before modifying the Make.inc to point
at these libraries (but you must make these changes before doing
make check or make time). First, try to do a
make check before trying these mods, as you may not need to
make them! If this fails,
set your LIBS flag in Make.inc
something like (modifying the paths as necessary):
What you should set it to will vary by system, as you need to locate the
correct library path. You can usually do this using the locate
command. For instance, here's what I see on my system:
Note that if you add -mpreferred-stack-boundary=2 to the
fortran compiler flags, then g77/gfortran will cause the BLAS tester
to die with a bus error/seg fault, because gfortran then aligns double
precision arrays to 4-byte rather than 8-byte boundaries. ATLAS assumes
native alignment on data types, and so the code seg faults. As long as
you don't need to call the Fortran interface from a non-gnu compiler, then
you can solve this problem by simply leaving the
-mpreferred-stack-boundary
flag off for the Fortran interface. At configure time, you would pass:
I reported these related problems (ABI non-conformance and non-native
alignment) to the gcc folks, and got a range of opinions, the most
startling of which was that
whatever
they do is the standard. On that link, one of the more helpful
replies mentions that the non-native alignment problem should be fixed
in gcc 4.4, but that the ABI non-compliance must be maintained in order
to match the prior failure to comply with the ABI.
Basic compiler information
ATLAS has support for one or more compilers for every platform. In general,
we provide gcc/gfortran for most supported architectures, since these compilers
are freely available.
What happens if I install with no Fortran compiler?
You enable this by passing --nof77 to configure. In this case,
ATLAS will still install correctly, but it will obviously not create the
Fortran77 interface libraries. You will not be able to run the
testers under the BLDdir/interfaces/ directory, since these testers
are written in Fortran. Further, ATLAS expects that you will be comparing
against a Fortran77 interface BLAS, and this will obviously not be the
case, and so you will need to make the following changes if you want to
run any of the ATLAS tester/timers, even the ones written in C:
#define USE_F77_BLAS
to:
#define USE_L1_REFERENCE
#define USE_F77_BLAS
to:
#define USE_L2_REFERENCE
#define USE_F77_BLAS
to:
#define USE_L3_REFERENCE
#define TRUST_SMALL
Installing with a non-default f77 compiler
The only Fortran routines in ATLAS are the Fortran77 interface routines,
which do no computation. Therefore, the Fortran77 compiler has absolutely
no effect on ATLAS's performance, and so the only reason you should need
to use a non-default f77 compiler is if the f77 compiler you wish to use
does not interoperate with ATLAS's default compiler.
-C if /path/to/f77comp -F if 'f77 compiler flags'
Installing additional f77 interfaces
struct {char *cp; F77_INTEGER len;};
struct {char *cp; F77_INTEGER len;};
F2CDEFS = -DAdd__ -DStringSunStyle
I then change this to match f77:
F2CDEFS = -DAdd_ -DStringSunStyle
Now, so that my gfortran interface will not be overwritten, I also change:
F77BLASlib = $(LIBdir)/libf77blas.a
to:
F77BLASlib = $(LIBdir)/libsunf77blas.a
If I had built the threaded BLAS, I would make a similar change to
PTF77BLASlib.
F77 = /usr/local/bin/gfortran
F77FLAGS = -O3 -funroll-all-loops
to:
F77 = /opt/SUNWspro/bin/f77
F77FLAGS = -dalign -native -xarch=v8plusa -xO5
Now, I cd BLDdir/interfaces/blas/F77/src/, and issue:
make clean
make lib
If you are using threads, additionally issue:
make ptlib
Now, when linking with Sun's f77, I link to
-lsunf77blas -latlas, and
when linking with g77 I use -lf77blas -latlas
make clean ; make lib
How do I link with all these libraries?
The user libs created by ATLAS are:
If you have missing symbols on link, make sure you are linking in all of the
libraries you need, and remember that order *is* significant.
For instance, a code calling the Fortran77 interface to the BLAS would need:
-L$(MY_HOME)/ATLAS/lib/$(MY_ARCH)/ -lf77blas -latlas
The full LAPACK library created by merging ATLAS and netlib LAPACK requires
both C and Fortran77 interfaces, and thus that link line would be:
-L$(MY_HOME)/ATLAS/lib/$(MY_ARCH)/ -llapack -lf77blas -lcblas -latlas
If you wish to use threaded BLAS, you simply indicate those interface libs
rather than the sequential. The above line for SMP would be:
-L$(MY_HOME)/ATLAS/lib/$(MY_ARCH)/ -llapack -lptf77blas -lptcblas -latlas
Why am I failing more LAPACK tester cases?
The LAPACK testers have been hand-tuned to work with the reference
BLAS, and thus these BLAS almost always produce the least amount of failures
(though you typically get some failures even with LAPACK's BLAS). If you
scope the output files, you can quickly get an idea of which failures are
serious by looking at the residuals. Residuals that are of size O(100) are
typically not real failures, but merely a result of differing order of flops
from what the ref blas do (which is legal and expected). In fact, on many
platforms ATLAS achieves noticeably less error than the reference BLAS,
but since the tester has been so heavily tuned to the reference BLAS, these
more accurate results cause more failures. You can see a simple example of
this on most x86 platforms by compiling the reference BLAS with SSE only,
which results in 64/32-bit precision. Then compile the same BLAS to use the
x87 unit (which has 80-bit precision, though gcc or the code can
sometimes drop back to 64/32-bit precision briefly), and you will find you
have more errors, despite having at least the same precision in all cases,
and usually much more precision.
Installation help for AIX
Under AIX, you need to set the environment variable OBJECT_MODE
appropriately. So, if you want a 64-bit library and you use the bash shell,
you would do export OBJECT_MODE=64, while
setenv OBJECT_MODE 32 would be for 32-bit libraries under
csh.
Installing gcc under unix without being root
You do not need to be root to install a gcc that will deliver decent
performance for ATLAS. I include below the exact steps I use to install
the C compiler only in my own home area. Changing my home area path (given
in the --prefix command to configure) to yours should allow you to do the same.
These directions are for x86 users, where ATLAS needs gcc 4.2.x for decent
performance. They work pretty much the same for other platforms/gcc versions.
Note that these directions will install gfortran as well. The fortran
compiler is not needed for ATLAS performance, so if you want to use a
different fortran compiler than this version of gfortran,
simply omit fortran from the --enable-languages step.
NOTE: if your old gcc version is a lot older than the new one, there
will often be library incompatibilities, which can cause linking,
particularly of FORTRAN codes, to fail. If this occurs, you can
either use the old compiler to do linking, or set the environment
variable LD_LIBRARY_PATH so that your new libraries are seen
before the system libraries. For instance, on the above install where
the system libraries were for gcc 4.1.0, I had to add the following
line to my .cshrc:
setenv LD_LIBRARY_PATH /home/whaley/local/gcc-4.2.1/lib:/home/whaley/local/gcc-4.2.1/lib64
Post install tuning.
Here are some tips for improving ATLAS performance after an install:
Improving ATLAS small case performance by changing malloc behavior
ATLAS allocates a buffer space for most GEMM calls. When I wrote it,
my assumption was that only the first call requires a switch to kernel space
to do the allocation, and incurs the unneeded overhead of zeroing out the
memory. However, by default Linux (as well as some other OSes, such as
OS X) allocates non-trivial sized allocations using mmap, which
means that when free is called, the memory is immediately returned
to the system. Thus all malloc calls have extremely high overheads.
setenv MALLOC_TRIM_THRESHOLD_ -1
setenv MALLOC_MMAP_MAX_ 0
make sRun_tfc pre=s
make dRun_tfc pre=d
make cRun_tfc pre=c
make zRun_tfc pre=z
This search takes a *loooong* time. Then, to build the changes into the
libraries, go to ATLAS/bin/ARCH, and issue:
make xsl3blastst
make xdl3blastst
make xcl3blastst
make xzl3blastst
Tuning CacheEdge.
CacheEdge is a Level 2 Cache blocking parameter; because its effects are
fairly subtle on most machines, it often goes wrong on machines experiencing
any kind of load, causing performance to be suboptimal. CacheEdge can
improve performance by as much as 15%, and it can reduce ATLAS's
memory usage as well.
./xdfindCE -m [N] -n [N] -k [N]
where [N] is replaced by a very large number that is a multiple
of your blocking factor. You want to make this number as large as you
can stand to wait on, and this varies a great deal from machine to machine.
A good guesstimate for most machines might be around 2000.
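Picking a problem size that is a multiple of the blocking factor is simple arithmetic; the C sketch below rounds a target size up to the next multiple. The helper name round_up_to_multiple is made up for this example, and the blocking factor of 84 used in the note after it is only an illustrative value, so substitute the NB your own install selected.

```c
/* Round n up to the nearest multiple of nb (nb must be non-zero).
 * Returns n unchanged if it is already a multiple of nb. */
unsigned long round_up_to_multiple(unsigned long n, unsigned long nb)
{
    return ((n + nb - 1UL) / nb) * nb;
}
```

With a target of about 2000 and an assumed blocking factor of 84, this gives 2016, which would then be the value passed for each of -m, -n and -k.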
TA TB M N K alpha beta CacheEdge TIME MFLOPS
== == ====== ====== ====== ====== ====== ========= ========= ========
T N 1000 1000 1000 1.00 1.00 0 5.470 365.63
T N 1000 1000 1000 1.00 1.00 16 5.470 365.63
T N 1000 1000 1000 1.00 1.00 32 5.460 366.30
T N 1000 1000 1000 1.00 1.00 64 5.470 365.63
T N 1000 1000 1000 1.00 1.00 128 5.260 380.23
T N 1000 1000 1000 1.00 1.00 256 5.240 381.68
Initial CE=256KB, mflop=381.68
Best CE=256KB, mflop=381.68
So we want to set CacheEdge to 1024*256 = 262144. atlas_cacheedge.h
will look something like:
#ifndef ATLAS_CACHEEDGE_H
#define ATLAS_CACHEEDGE_H
#define CacheEdge 196608
#endif
If your initial install did not use CacheEdge, line 3 will be missing
completely. If you don't have this line, you would simply add it, using
the new value of 262144. In the above example, we would simply
replace 196608 with 262144.
Special hints for setting CacheEdge for multiprocessor machines
CacheEdge turns out to be very important to threaded performance.
Unfortunately most of the default CacheEdge settings were obtained on
single processor machines. So, you may well be able to see a substantial
speedup by changing CacheEdge for your multiprocessor system.
Building a complete LAPACK library
ATLAS does not provide a full LAPACK library. However, there is a simple way
to get ATLAS to provide its faster LAPACK routines to a full LAPACK library
provided by netlib.
First, download and install the standard LAPACK library from the
LAPACK homepage.
Then, when doing
the configure step for ATLAS, pass the following flag:
--with-netlib-lapack=[path/to/lapack/lib]
mkdir tmp
cd tmp
ar x ../liblapack.a
cp <your LAPACK path & lib> ../liblapack.a
ar r ../liblapack.a *.o
cd ..
rm -rf tmp
My performance drops off for very large problems (N > 1500)
Note this is for the serial interface. If it is only your parallel
performance that drops off for large problem sizes, you are probably
experiencing this problem.
Your install dies with "unable to get timings in tolerance"
This means that ATLAS could not get repeatable timings. There are several
things that could cause this to happen. This could occur if the machine is
heavily loaded or experiences a sudden surge in usage from another program,
for instance. If this is the problem, simply keep restarting the install
(as discussed below) until it finishes.
How do I restart an interrupted install?
If your ATLAS install was interrupted, and you have fixed the problem,
you can usually safely (there are always exceptions; if the install died
in the middle of an ar command, for instance, many systems cannot recover)
restart the install by:
How do I get rid of all the .o's?
Once you have done make install (and/or manually copied the
libraries and include files you want), you can simply delete the
entire OBJdir directory.
Help for building ATLAS under windows
You must install ATLAS in a directory without spaces in the path. E.g.,
/home/Administrator/ATLAS is fine, but /cygwin/c/My Home Area
is not.
Building ATLAS with a non-cygwin compiler
If you want to build ATLAS with a non-cygwin compiler (i.e., a native windows
compiler such as Intel's icl or ifort), you will need
to perform the following steps (you can skip this if using the gnu compilers
is OK):
You will also need to install as root, as ATLAS builds unix-workalike compiler
wrappers and puts them in /usr/local/bin.
Setting your Windows compiler environment correctly
The easiest way to get your environment variables setup is to modify
your /cygdrive/c/cygwin/cygwin.bat file. In order to be able
to call MSVC++ (cl) from cygwin, I added the following lines
to my cygwin.bat:
chdir c:\Program Files\Microsoft Visual Studio 8\VC
call vcvarsall
Obviously, adapt the path as necessary to point to your MSVC++ install.
You can use the 'find files' function to search for the vcvarsall
file.
chdir c:\Program Files\Intel\Compiler\Fortran\10.0.025\IA32\Bin
call ifortvars
to my cygwin.bat file. In all, here is the
/cygdrive/c/cygwin/cygwin.bat that allows me to use MSVC++, icl,
and ifort:
@echo off
chdir c:\Program Files\Microsoft Visual Studio 8\VC
call vcvarsall
chdir c:\Program Files\Intel\Compiler\Fortran\10.0.025\IA32\Bin
call ifortvars
C:
chdir C:\cygwin\bin
bash --login -i
Telling config about your windows compilers
Presently, ATLAS allows you to replace only the interface compilers with
native Windows compilers (the bulk of the library must be compiled with
gcc).
-C if ifort
to your configure command.
-C ic cl
-Si nocygwin 1
Note that without the cygwin library, you cannot build the threaded BLAS,
since Windows does not provide POSIX threads compliance, so only throw this
flag if you need it.
Post-config Make.inc fiddling for windows compilers
If you are compiling with all gnu compilers, no changes need to be made.
cp /cygdrive/c/Program\ Files/Microsoft\ Visual\ Studio\ 8/VC/lib/libcmt.lib /usr/local/lib/.
LIBS = /cygdrive/c/cygwin/usr/local/lib/libcmt.lib \
/cygdrive/c/cygwin/lib/libcygwin.a
If you built without cygwin, then the line you want is:
LIBS = /cygdrive/c/cygwin/usr/local/lib/libcmt.lib \
/cygdrive/c/cygwin/lib/gcc/i686-pc-mingw32/3.4.4/libgcc.a
and then the ATLAS timers and testers should work. You will probably
need to link
to these libraries yourself when linking to ATLAS externally as well.
Make shared fails with something like:
ld: cannot find -lgfortran
This is due to configure choosing a bad F77SYSLIB
(in Make.inc) due to a faulty probe. This macro should include
the path
where your libgfortran.so can be found. You can fix this problem
by manually changing this macro to have the correct path after configure.
If you have this problem, simply fix this macro, and then reissue your
make shared commands.
drteeth>locate libgfortran.so
/usr/lib/libgfortran.so.2
/usr/lib/libgfortran.so.2.0.0
/usr/lib/gcc/x86_64-linux-gnu/4.2/libgfortran.so
/usr/lib/gcc/x86_64-linux-gnu/4.2/32/libgfortran.so
/usr/lib32/libgfortran.so.2
/usr/lib32/libgfortran.so.2.0.0
My ATLAS install is 64-bit, so I would set (in Make.inc):
F77SYSLIB = -L/usr/lib/gcc/x86_64-linux-gnu/4.2/ -lgfortran
If you have multiple choices and don't know how to choose, it should
work to try them until your shared build works.
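The manual edit to Make.inc can be scripted if you need to repeat it. This is a sketch only: the function name fix_f77syslib is hypothetical, and it assumes Make.inc sets F77SYSLIB on a single line, as in the example above. It checks that the directory you pass really contains libgfortran.so before touching the file.

```shell
#!/bin/sh
# Hypothetical helper: fix_f77syslib <Make.inc> <dir-containing-libgfortran.so>
fix_f77syslib() {
    makeinc=$1
    libdir=$2
    if [ ! -e "$libdir/libgfortran.so" ]; then
        echo "no libgfortran.so in $libdir" >&2
        return 1
    fi
    tmp=$makeinc.tmp.$$
    # Use | as the sed delimiter because the path contains slashes.
    # Leading whitespace before F77SYSLIB is kept as it was.
    sed "s|^\([[:space:]]*F77SYSLIB[[:space:]]*=\).*|\1 -L$libdir -lgfortran|" \
        "$makeinc" > "$tmp" && mv "$tmp" "$makeinc"
}
```

For the 64-bit case above, this would be called as fix_f77syslib Make.inc /usr/lib/gcc/x86_64-linux-gnu/4.2, followed by reissuing the make shared commands. A path containing | or & would need escaping before being handed to sed.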
Problems with gcc 4.2 on Windows
A user has reported that
this gcc bug
can cause problems when ATLAS is installed on Windows using gcc 4.2.
The fix given in the gnu bug report (passing -fno-common) has
been reported to fix the ATLAS install as well. Therefore, if you are
using gcc 4 under windows, you may want to add the following to your
configure command:
-Fa alg -fno-common
Gcc's violation of the x86 ABI causes seg faults/bus
errors when mixed with other compilers
Gcc violates the x86-32 ABI by mandating a 16-byte aligned stack, where the
ABI mandates a 4-byte aligned stack. Therefore, if gcc is used to compile
some routines that may be called from an ABI-compliant compiler, you may
(depending on how lucky you are) get a seg fault or a bus error. Therefore,
if you plan on mixing gcc with any compiler that does not extend the ABI
in the same way, or if you want to use windows threads,
you must tell gcc to turn off this misfeature, which may
be done by passing -mpreferred-stack-boundary=2 to all gnu compilers.
This problem has been noticed on Windows, but might occur anywhere gcc is not
used to generate all of the object files. If you are compiling ATLAS with
all gnu compilers, you can just pass:
-Fa alg -mpreferred-stack-boundary=2
to configure. If you are building ATLAS with some mix of non-gnu and gnu
compilers, then you need the above flag added only to the gnu compilers.
You can do this at configure time by setting it for individual compilers,
as described
here, or you can configure as normal, but edit
Make.inc before starting the build, and add the flag to all gnu
compilers' flags.
-Fa acg -mpreferred-stack-boundary=2
When linking ATLAS's testers, I'm getting a bunch of
undefined BLAS symbols (eg. dgemm_, dgemv_, etc).
The ATLAS BLAS testers (x[s,d,c,z]l[1,2,3]blastst) expect to compare
against a F77 interface BLAS library for performance and testing purposes.
You get these missing symbols when your Make.ARCH's BLASlib
is left blank, or does not point at a complete BLAS library. If you have
a non-ATLAS BLAS built somewhere, point the BLASlib macro at it. If you
don't, probably the easiest fix is to grab the
Fortran77 reference BLAS
tarfile, and build it into the required lib. If you don't want to
do this, or don't have access to Fortran77, then you can have ATLAS
test against its own C reference as discussed
here.
I'm linking with C, and getting missing symbols
(such as w_wsfe, do_fio, w_esfe or
s_stop).
These kinds of symbols are Fortran library calls. The problem is that the
C linker does not automatically find the Fortran libraries. The most
common fix is to either link using your fortran linker, or to rewrite your
code so that Fortran routines are not called. If you know where they are,
you can also choose to link in the Fortran libraries explicitly.
Problems with linking/missing
LAPACK routines on OS X
OS X has a built-in version of ATLAS, whose libraries use the standard names.
They may be less up-to-date and/or include fewer libs than something you install
yourself; in particular, if you have a Fortran compiler, you can build
a full lapack library, which Apple does not currently provide, and so
many users want to install the standard ATLAS.
Unfortunately, when searching for libs the compiler looks in the
system areas where apple keeps its ATLAS libs before looking in
directories supplied by -L. This means that if you use
-L and -l for your linking, you always get Apple's modified
ATLAS, rather than the one you installed. There are two fixes for this
problem that I know of. First, you
can just link to the full name and path, rather than using -L.
For instance, change something like:
gcc -o xtst test.c -L /home/whaley/TEST/ATLAS/build64/lib -lcblas -latlas
to:
gcc -o xtst test.c /home/whaley/TEST/ATLAS/build64/lib/libcblas.a \
/home/whaley/TEST/ATLAS/build64/lib/libatlas.a
The only other trick I'm aware of is to rename your ATLAS libraries so that
the Apple versions will not override them.
How about C++ header files for the C interfaces?
Since ATLAS does not provide full OO C++ interfaces, I am reluctant to raise
the expectation that it does by providing C++ specific header files. What
I have always envisioned is the C++ programmer creating his own include
files, such as:
>cat cppblas.h
extern "C" {
#include <cblas.h>
}
>cat cpplapack.h
extern "C" {
#include <clapack.h>
}
If you are a C++ programmer using ATLAS, and think differently, let me know.
How do I restart an install from scratch?
Simply do a rm -rf * in your BLDdir directory,
and then reconfigure and build.
My system doesn't have the -f option to cp
If you take the following line, and put it in a file cp you make
executable, and then put it in your path before your system cp, it should
get rid of the -f option:
/bin/cp `echo $* | sed -e 's/-f / /'`
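Note that the one-liner above removes only the first -f it finds and re-splits its arguments on whitespace, so file names containing spaces will not survive it. A slightly more careful variant, offered as a sketch under the same assumption that your system cp lives at /bin/cp, filters the argument list one word at a time. The function name cp_without_f is made up for illustration.

```shell
#!/bin/sh
# Hypothetical wrapper body: call cp with every bare -f argument removed.
cp_without_f() {
    for arg in "$@"; do
        shift                      # drop the oldest argument...
        [ "$arg" = "-f" ] && continue
        set -- "$@" "$arg"         # ...and re-append it unless it was -f
    done
    /bin/cp "$@"
}
# When saved as an executable file named cp, finish the script with:
#   cp_without_f "$@"
```

This only drops -f when it appears as a separate word; combined options such as -rf are passed through unchanged.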