Error in ROTMG causes failures in modern LAPACK tests
ATLAS 3.10.2 has a bug in ROTMG. To fix, save this file to ATLAS/src/blas/level1/ATL_lamch.c, and then in BLDdir/src/blas/level1 issue: make lib.
if (mb != NB && nb != NB) mmk_bX = mmk = genmm;
Change this to:
if (mb != NB && nb != NB)
{
   if (SCALAR_IS_ZERO(beta))
      Mjoin(PATL,gezero)(mb, nb, C, ldc);
   mmk_bX = mmk = genmm;
}
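The erratum does not say why the zeroing is needed. One plausible reason, offered here as an assumption: BLAS semantics say that when beta is zero, C is output-only and its old contents must be ignored. A kernel that instead computes beta*C on uninitialized memory can leave NaN behind, because 0.0*NaN (and 0.0*Inf) is NaN. The sketch below is a naive column-major C = alpha*A*B + beta*C. The name naive_gemm_nn is hypothetical, and this is not ATLAS's genmm. It zeroes C explicitly in the beta == 0 case, which is analogous to calling Mjoin(PATL,gezero) before genmm.

```c
#include <assert.h>
#include <math.h>   /* NAN, handy for exercising the beta == 0 path */

/* Naive column-major C = alpha*A*B + beta*C (no transposes), for
 * illustration only. When beta is zero, C is treated as write-only and
 * zeroed explicitly rather than scaled, so garbage (NaN/Inf) in C cannot
 * leak into the result. */
void naive_gemm_nn(int m, int n, int k, double alpha, const double *A, int lda,
                   const double *B, int ldb, double beta, double *C, int ldc)
{
   int i, j, l;

   assert(lda >= m && ldb >= k && ldc >= m);
   for (j = 0; j < n; j++)
   {
      for (i = 0; i < m; i++)   /* ATLAS tests this with SCALAR_IS_ZERO(beta) */
         C[i + j*ldc] = (beta == 0.0) ? 0.0 : beta * C[i + j*ldc];
      for (l = 0; l < k; l++)
      {
         const double t = alpha * B[l + j*ldb];
         for (i = 0; i < m; i++)
            C[i + j*ldc] += A[i + l*lda] * t;
      }
   }
}
```

If you remove the beta == 0.0 branch and start with a NaN-filled C, every entry of the result stays NaN even though B is the identity.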
Building using OpenMP
In general, this is a bad idea, since OpenMP tends to be much slower than pthreads for ATLAS use. However, if your own application uses OpenMP, sometimes pthread usage can slow down your own threads, making it worthwhile to damage ATLAS performance in order to improve your OpenMP performance.
In ATLAS, thread affinity is the main reason pthreads wins against OpenMP, and so OS X (and probably FreeBSD), which don't support real affinity, are the platforms where it makes sense to use OpenMP regardless.
To force ATLAS to use OpenMP rather than pthreads, you must add the following flags to your configure line:
-Si omp 1 -F alg -fopenmp
-Ss ADdir /home/whaley/TEST/WINAD
Right now, the tarfile just has defaults for Core2SSE364, which is the only Windows machine that I have access to. It is my hope that Windows users will submit their own architectural defaults to me, so that I can expand this tarfile for the Windows community.
If you have a good install on 64-bit Windows, please create the architectural defaults as outlined in the ATLAS developer guide, and submit them to me, either via e-mail or the patches tracker.
-D c -DATL_ARM_HARDFP=1 -Ss ADdir /home/whaley/TEST/ARMHARDFP
/tmp/ccq5b8sE.o(.text+0x852): In function `CmndResults': config.c: warning: the use of `tmpnam' is dangerous, better use `mkstemp'
This is normal, and not an error. Let me translate this message out of gnu-speak:
Hey, idiot, would you stop using that pesky ANSI/ISO C standard and use this non-standard routine instead?
For maximal compatibility, ATLAS hews to the ANSI/ISO 9899-1990 standard, and so I cannot make the proposed change. Unfortunately, this warning message is literally immortal: there exists in gcc no flag combination that I can discover that can turn the freaking thing off. So, every time you link one of the config programs that calls this standard routine, the linker outputs this message, even if you turn on the strict ANSI compatibility flag. I reported it as an error to the gcc folks, but they pointed out that it is the linker/glibc people who generate the "warning", and immediately closed the tracker. It still seems wrong to me that the strict ANSI flag with all warning messages turned off insists on printing out a message warning about standard usage, but there appears to be little for me to do about it. Therefore, just ignore the hectoring, and don't worry about these immortal, bogus, annoying and repetitive "warnings".
The ATLAS architectural defaults were all built with gcc 4.7.0, except on PowerPCG4, where we lost machine access after supporting 4.6.2. Newer versions are likely OK, but earlier ones are usually not. If you use an older or newer version, be sure to run make time after completing your install to ensure that your compiler has not compromised your performance.
#define USE_F77_BLAS
to:
#define USE_L1_REFERENCE
#define USE_F77_BLAS
to:
#define USE_L2_REFERENCE
#define USE_F77_BLAS
to:
#define USE_L3_REFERENCE
#define TRUST_SMALL
To install with a non-default f77 compiler, simply override the default fortran compiler and flags from the command line when running config. This can be done by adding the following flags to configure:
-C if /path/to/f77comp -F if 'f77 compiler flags'
If you want to install ATLAS so it can be called from multiple, non-interoperable Fortran compilers (or indeed, have already installed with the wrong f77 compiler), you can do this with moderate ease, assuming you know how C and the given F77 compiler(s) interoperate. If you do not know this interoperation information, you must get configure to find it for you. To do this, create a bogus BLDdir directory (e.g., mkdir bogus), and then run configure from it, overriding the default fortran compiler and flags from the command line as described here. You can then look at the generated Make.inc's setting for the macro F2CDEFS and replicate it, along with the new F77 compiler/linker information, into your original Make.inc.
For those users already aware of the information needed for C/F77 interoperation: ATLAS needs three pieces of information in order to correctly handle F77/C interoperation, and this information appears as defines to the C compiler, set in your Make.inc's F2CDEFS.
The first macro controls the name space alterations necessary to make a C routine callable from Fortran77. The options are:
The second macro provides a mapping between F77's INTEGER and the appropriate C integral type. Options are:
The third macro deals with F77 string handling. The options are:
struct {char *cp; F77_INTEGER len;};
struct {char *cp; F77_INTEGER len;};
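To make the three macros concrete, here is a sketch of a C routine written to be callable from Fortran77 under the conventions used in the Solaris example that follows. Add_ means the Fortran name in lower case with one trailing underscore. StringSunStyle means each string argument's length is passed by value after all the normal arguments. INTEGER is mapped through F77_INTEGER. Every Fortran77 argument is passed by reference, which is why n and incx arrive as pointers. The routine mysum_ and its MODE argument are hypothetical. The C type of the hidden string length is also an assumption, because it varies by compiler (newer gfortran releases use size_t).

```c
#include <assert.h>

#ifndef F77_INTEGER
#define F77_INTEGER int   /* assumed INTEGER mapping; F2CDEFS may override it */
#endif

/*
 * Hypothetical routine callable from Fortran77 as
 *    DOUBLE PRECISION MYSUM
 *    S = MYSUM('A', N, X, INCX)
 * assuming:
 *   Add_           : the linker sees Fortran's MYSUM as mysum_
 *   F77_INTEGER    : INTEGER arguments arrive as F77_INTEGER *
 *   StringSunStyle : MODE's length is appended after all other arguments
 * MODE = 'A' sums absolute values; anything else gives a plain sum.
 * Fortran strings are not NUL-terminated, so only mode[0] is inspected.
 */
double mysum_(const char *mode, const F77_INTEGER *n, const double *x,
              const F77_INTEGER *incx, F77_INTEGER mode_len)
{
   const int absval = mode_len > 0 && (mode[0] == 'A' || mode[0] == 'a');
   double s = 0.0;
   F77_INTEGER i;

   assert(*incx > 0);   /* this sketch only handles positive strides */
   for (i = 0; i < *n; i++)
   {
      const double v = x[i * *incx];
      s += (absval && v < 0.0) ? -v : v;
   }
   return s;
}
```

Under Add__ or a different string-passing style, only the symbol name and the position or type of mode_len would change. The arithmetic stays the same.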
By default, ATLAS builds the F77 interface to the BLAS into the file pointed at by Make.inc's F77BLASlib, and so changing this macro before recompiling the interface will allow you to build multiple F77 interfaces.
For example, say on a Solaris machine I want to build the f77 interface for both Sun's f77 and gfortran. First, I install ATLAS as normal, with the default gfortran compiler. Now, to get an f77 interface lib, I edit my ATLAS/Make.SunOS_SunUS2, and I find that ATLAS has detected the C/F77 interface for gfortran as:
F2CDEFS = -DAdd__ -DStringSunStyle
I then change this to match Sun's f77 compiler:
F2CDEFS = -DAdd_ -DStringSunStyle
Now, so that my gfortran interface will not be overwritten, I also change:
F77BLASlib = $(LIBdir)/libf77blas.a
to:
F77BLASlib = $(LIBdir)/libsunf77blas.a
If I had built the threaded BLAS, I would make a similar change to PTF77BLASlib.
Finally, I change the f77 compiler/linker information from:
F77 = /usr/local/bin/gfortran
F77FLAGS = -O3 -funroll-all-loops
to:
F77 = /opt/SUNWspro/bin/f77
F77FLAGS = -dalign -native -xarch=v8plusa -xO5
Now, I cd BLDdir/interfaces/blas/F77/src/, and issue:
make clean
make lib
If you are using threads, additionally issue:
make ptlib
Now, when linking with Sun's f77, I link to -lsunf77blas -latlas, and when linking with gfortran I use -lf77blas -latlas.
You can essentially repeat this process for the LAPACK F77 interface, but change LAPACKlib rather than F77BLASlib, and go to BLDdir/interfaces/lapack/F77/src rather than BLDdir/interfaces/blas/F77/src. Also, LAPACK does not have a separate entry point for threads, so do not issue any of the additional threading instructions.
Finally, in your BLDdir/src/testing directory, issue:
make clean ; make lib
-L$(MY_BLDdir)/lib/ -lf77blas -latlas
The full LAPACK library created by merging ATLAS and netlib LAPACK requires both C and Fortran77 interfaces, and thus that serial link line would be:
-L$(MY_BLDdir)/lib/ -llapack -lf77blas -lcblas -latlas
While the threaded LAPACK link would be:
-L$(MY_BLDdir)/lib/ -lptlapack -lptf77blas -lptcblas -latlas
Where $(MY_BLDdir) should be replaced by the directory where you have built your ATLAS.
Real errors have very large residuals (e.g., 10e15). However, even these kinds of errors may be a result of the LAPACK tester being completely tuned to the LAPACK BLAS implementation. For instance, the DGER operation is supposed to do A += alpha * x * y^T. The reference BLAS perform A += x*(alpha*y^T). ATLAS does this or A += (alpha*x)*y^T, whichever is cheaper. This causes the LIN testers to fail quite a few residual checks with size 10e15, even though both orderings are legal. I have modified ATLAS to avoid these spurious GER problems, but not all of the LAPACK testers' reliance on fixed orderings can be fixed.
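The ordering issue is easy to reproduce. The sketch below applies the same column-major rank-1 update both ways, x*(alpha*y^T) and (alpha*x)*y^T. It then compares the two results with a relative tolerance rather than bitwise equality. The function names and the tolerance GER_RESID_TOL are hypothetical. Each product is rounded twice, so the two legal orderings should agree to within a few units in the last place, but not necessarily exactly.

```c
#include <assert.h>
#include <float.h>

/* Hypothetical tolerance: two roundings per product, so the two orderings
 * differ by at most about 2*DBL_EPSILON relative to the exact value. */
#define GER_RESID_TOL (4.0 * DBL_EPSILON)

static double dabs(double v) { return v < 0.0 ? -v : v; }

/* Column-major A (m x n) += alpha*x*y^T, scaling y first: x*(alpha*y). */
void ger_order_ref(int m, int n, double alpha, const double *x,
                   const double *y, double *A, int lda)
{
   int i, j;
   assert(lda >= m);
   for (j = 0; j < n; j++)
   {
      const double t = alpha * y[j];
      for (i = 0; i < m; i++)
         A[i + j*lda] += x[i] * t;
   }
}

/* Same update, scaling x first: (alpha*x)*y. */
void ger_order_alt(int m, int n, double alpha, const double *x,
                   const double *y, double *A, int lda)
{
   int i, j;
   assert(lda >= m);
   for (j = 0; j < n; j++)
      for (i = 0; i < m; i++)
         A[i + j*lda] += (alpha * x[i]) * y[j];
}

/* Largest elementwise relative difference between two m x n matrices. */
double max_rel_diff(int m, int n, const double *A, const double *B, int lda)
{
   double worst = 0.0;
   int i, j;
   for (j = 0; j < n; j++)
      for (i = 0; i < m; i++)
      {
         const double a = A[i + j*lda], b = B[i + j*lda];
         const double scale = dabs(a) > dabs(b) ? dabs(a) : dabs(b);
         if (scale > 0.0 && dabs(a - b) / scale > worst)
            worst = dabs(a - b) / scale;
      }
   return worst;
}
```

Scaling y costs n multiplies and scaling x costs m, which is why an implementation may prefer to scale the shorter vector. ger_order_alt recomputes alpha*x[i] inside the loop only for brevity. Precomputing it once gives identical rounding.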
#!/bin/sh
# ***************************************************************************
# This script adapted from:
#   http://advogato.org/person/redi/diary/240.html
# For use in getting a standard gcc 4.7 for ATLAS stable 3.10 testing
#
# Assumes M4 is already installed; can get it from package manager on
# any Linux that I know of.
# ***************************************************************************
instd=/home/whaley/local/gcc4.7.0   # change this to your path
bldd=/home/whaley/TEST              # change to your build dir, /tmp OK
np=12                               # set this to your number of cores
# ------------------------------------------------------------
# Get and unpack the files, and stick them in common directory
# ------------------------------------------------------------
#cd ${bldd}
#mkdir GCC4.7.0
#cd GCC4.7.0
#wget http://www.netgull.com/gcc/releases/gcc-4.7.0/gcc-4.7.0.tar.bz2
#wget http://www.netgull.com/gcc/infrastructure/gmp-4.3.2.tar.bz2
#wget http://www.netgull.com/gcc/infrastructure/mpc-0.8.1.tar.gz
#wget http://www.netgull.com/gcc/infrastructure/mpfr-2.4.2.tar.bz2
# ---------------------------------------------------------
# Assuming we have all needed packages in current directory
# ---------------------------------------------------------
bunzip2 -c gmp-4.3.2.tar.bz2 | tar xf -
bunzip2 -c mpfr-2.4.2.tar.bz2 | tar xf -
gunzip -c mpc-0.8.1.tar.gz | tar xf -
bunzip2 -c gcc-4.7.0.tar.bz2 | tar xf -
mv gmp-4.3.2 gcc-4.7.0/gmp
mv mpfr-2.4.2 gcc-4.7.0/mpfr
mv mpc-0.8.1 gcc-4.7.0/mpc
mkdir MyObj
cd MyObj
../gcc-4.7.0/configure --prefix=${instd} --enable-languages=c,fortran
make -j ${np}
make install
Assuming you wget the exact same versions of the tarfiles that I show above, and put them in the common directory, then you should be able to use this shell script directly. Simply copy the above to a text file (say instgcc.sh), set the execute permission (chmod a+x instgcc.sh), and run it (./instgcc.sh).
NOTE: if your default gcc version is a lot older than the new one, there will often be library incompatibilities, which can cause linking, particularly of FORTRAN codes, to fail. If this occurs, you'll need to update your LD_LIBRARY_PATH. For instance, on my desktop lubuntu machine, I added the following line to my .cshrc after installing as above:
setenv LD_LIBRARY_PATH /home/whaley/local/gcc4.7.0/lib64:/home/whaley/local/gcc4.7.0/lib
(If you use the bash shell rather than the tcsh shell, as I do, change the line to use export and put it in your .bashrc.)
This is not a big problem if you are doing a large matrix multiply, where the cubic computation disguises this square cost. For small problems, though, the O(N**2) costs are actually dominant, and this type of malloc behavior effectively doubles them (at least). You should be able to change Linux's malloc behavior by setting these environment variables:
setenv MALLOC_TRIM_THRESHOLD_ -1
setenv MALLOC_MMAP_MAX_ 0
Once this is done, malloc should be cheaper, but ATLAS was tuned with the expensive malloc. Therefore, you may be able to get better small-case performance by rerunning the crossover search with these environment variables set (don't do this unless you are going to keep these settings whenever you use this library). You can rerun the search from the ATLAS/tune/blas/gemm/ARCH directory by issuing:
make sRun_tfc pre=s
make dRun_tfc pre=d
make cRun_tfc pre=c
make zRun_tfc pre=z
This search takes a *loooong* time. Then, to build the changes into the libraries, go to ATLAS/bin/ARCH, and issue:
make xsl3blastst
make xdl3blastst
make xcl3blastst
make xzl3blastst
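If you cannot rely on those environment variables being set whenever the library is used, glibc also exposes the same two settings programmatically through mallopt() in <malloc.h>. Setting M_TRIM_THRESHOLD to -1 disables trimming, and setting M_MMAP_MAX to 0 disables mmap for large requests. This is a glibc-specific sketch. The function name is hypothetical, and this is not something ATLAS does. It should run before the allocations you care about. The caveat above still applies: ATLAS was tuned with the default, more expensive malloc behavior.

```c
#include <assert.h>
#include <stdlib.h>
#if defined(__GLIBC__)
#include <malloc.h>   /* mallopt, M_TRIM_THRESHOLD, M_MMAP_MAX (glibc only) */
#endif

/* Programmatic equivalent of
 *    setenv MALLOC_TRIM_THRESHOLD_ -1
 *    setenv MALLOC_MMAP_MAX_ 0
 * Returns 1 if both settings were accepted, 0 otherwise (always 0 off glibc). */
int tune_glibc_malloc_for_small_blas(void)
{
#if defined(__GLIBC__)
   /* don't trim freed memory at the top of the heap back to the OS */
   int ok_trim = mallopt(M_TRIM_THRESHOLD, -1);
   /* serve large requests from the heap instead of fresh mmap'd pages */
   int ok_mmap = mallopt(M_MMAP_MAX, 0);
   return ok_trim == 1 && ok_mmap == 1;
#else
   return 0;
#endif
}
```

Call it once at program start-up. If you then rerun the crossover search, make sure the timers see the same malloc settings your application will use.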
In ATLAS/tune/blas/gemm/ARCH, issue make xdfindCE. Run
./xdfindCE -m [N] -n [N] -k [N]
where [N] is replaced by a very large number that is a multiple of your blocking factor. You want to make this number as large as you can stand to wait on, and this varies a great deal from machine to machine. A good guesstimate for most machines might be around 2000.
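Choosing [N] is simple arithmetic: take the largest multiple of your blocking factor that does not exceed the size you are willing to wait for. Below is a minimal sketch. findce_size is a hypothetical helper, and the NB value of 56 used below is only an example, not a recommendation.

```c
#include <assert.h>

/* Largest multiple of nb (your blocking factor, NB) that does not exceed
 * target (the problem size you are willing to wait for). */
int findce_size(int target, int nb)
{
   assert(nb > 0 && target >= nb);
   return (target / nb) * nb;
}
```

For instance, findce_size(2000, 56) gives 1960, which you would pass as ./xdfindCE -m 1960 -n 1960 -k 1960.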
You want to run this program several times to get a consensus idea of what a good setting would be. If a CacheEdge setting gets performance in the same range as no CacheEdge (a CacheEdge of 0 means no CacheEdge in the printout of xdfindCE), it is still recommended that you use that setting, since ATLAS with CacheEdge set will use less memory as problem sizes grow.
Once you have gotten an idea of what to set CacheEdge to, you can change it by editing ATLAS/include/ARCH/atlas_cacheedge.h. xdfindCE prints out data in KB, but atlas_cacheedge.h needs bytes, so multiply the xdfindCE result by 1024 to get the number you want to use in atlas_cacheedge.h.
Let's take an example. Say xdfindCE printed out this:
TA TB      M      N      K  alpha   beta CacheEdge      TIME   MFLOPS
== == ====== ====== ====== ====== ====== ========= ========= ========
 T  N   1000   1000   1000   1.00   1.00         0     5.470   365.63
 T  N   1000   1000   1000   1.00   1.00        16     5.470   365.63
 T  N   1000   1000   1000   1.00   1.00        32     5.460   366.30
 T  N   1000   1000   1000   1.00   1.00        64     5.470   365.63
 T  N   1000   1000   1000   1.00   1.00       128     5.260   380.23
 T  N   1000   1000   1000   1.00   1.00       256     5.240   381.68
Initial CE=256KB, mflop=381.68
Best CE=256KB, mflop=381.68
So we want to set CacheEdge to 1024*256 = 262144. atlas_cacheedge.h will look something like:
#ifndef ATLAS_CACHEEDGE_H
#define ATLAS_CACHEEDGE_H
#define CacheEdge 196608
#endif
If your initial install did not use CacheEdge, line 3 will be missing completely. If you don't have this line, you would simply add it, using the new value of 262144. In the above example, we would simply replace 196608 with 262144.
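Since both the KB-to-bytes conversion and the header layout are mechanical, here is a sketch that formats the contents of atlas_cacheedge.h from an xdfindCE result. The function names are hypothetical and not part of ATLAS. Treating a result of 0 as "omit the #define line" mirrors the statement above that an install without CacheEdge lacks line 3.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* xdfindCE reports CacheEdge in KB; atlas_cacheedge.h wants bytes. */
long cacheedge_bytes(long kb)
{
   assert(kb >= 0);
   return kb * 1024L;
}

/* Write the text of atlas_cacheedge.h into buf (at most len bytes).
 * kb == 0 means "no CacheEdge", so the #define line is left out.
 * Returns snprintf's result: the length the full text needs. */
int format_cacheedge_h(char *buf, size_t len, long kb)
{
   if (kb > 0)
      return snprintf(buf, len,
                      "#ifndef ATLAS_CACHEEDGE_H\n"
                      "#define ATLAS_CACHEEDGE_H\n"
                      "#define CacheEdge %ld\n"
                      "#endif\n", cacheedge_bytes(kb));
   return snprintf(buf, len,
                   "#ifndef ATLAS_CACHEEDGE_H\n"
                   "#define ATLAS_CACHEEDGE_H\n"
                   "#endif\n");
}
```

For the example above, format_cacheedge_h(buf, sizeof buf, 256) yields the same four lines with 262144 in place of 196608 (which is 192 KB).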
By successively editing this file and recompiling, for instance, ATLAS/bin/ARCH/x[d,s,z,c]mmtst, you can tune this value further. Many users expect that they should set CacheEdge to the actual size of their L2 cache. This is only rarely the best setting, mainly because L2 caches are normally combined data/instruction, and so a smaller setting, leaving room for instruction caching, is usually best. On some machines with large L2 caches, things like associativity, or even TLB issues, can make it more efficient to use a very small subset of the available cache.
Once you have set CacheEdge to the value you need, update all libs with the new setting by issuing make xdl3blastst xsl3blastst xcl3blastst xzl3blastst in your ATLAS/bin/ARCH directory.
x[pre]findCE usually takes the smallest CacheEdge setting possible, since this saves memory. For multiprocessor systems, however, it is vital to use as much of the available cache as possible so that the processors spend as little time contending for the bus as possible. Thus, you want to set CacheEdge to the largest value that gives decent results. I usually run xdfindCE a few times to get an idea of ranges, and then try the larger settings by running x[pre]l3blastst_pt. Remember that threaded timings have to use walltime, so make sure any speedup is repeatable before changing CacheEdge.
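The selection rule described above (use the largest CacheEdge that still gives decent results) can be written as a small function. This is a sketch, not something ATLAS provides. pick_largest_ce is a hypothetical name, and what counts as "decent" is an assumption expressed as the tol parameter (e.g., 0.02 for within 2% of the best rate). Because threaded timings use walltime, feed it numbers gathered from repeated runs rather than a single one.

```c
#include <assert.h>
#include <stddef.h>

/* Return the largest CacheEdge (KB) whose MFLOP rate is within a fraction
 * tol of the best rate among the n timing results in ce_kb[]/mflops[]. */
long pick_largest_ce(const long *ce_kb, const double *mflops, size_t n,
                     double tol)
{
   double best = 0.0;
   long pick = -1;
   size_t i;

   assert(n > 0 && tol >= 0.0);
   for (i = 0; i < n; i++)            /* find the best rate observed */
      if (mflops[i] > best)
         best = mflops[i];
   for (i = 0; i < n; i++)            /* largest setting that is "close enough" */
      if (mflops[i] >= (1.0 - tol) * best && ce_kb[i] > pick)
         pick = ce_kb[i];
   return pick;
}
```

Fed the xdfindCE output shown earlier with tol = 0.02, it returns 256, because only the 128 KB and 256 KB settings are within 2% of 381.68 MFLOPS.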
What you want to do is tune CacheEdge, as shown here, but be sure to use very large problem sizes in order to find CacheEdge.
Another problem that could cause this is that ATLAS misdetected the peak of your machine, and is thus using an inadequate timing interval. You can see if this is happening by scoping how long each timing is taking. If it is very quick, and thus unrepeatable, you need to tell ATLAS to pump up the timing granularity. To do this, edit the files BLDdir/include/atlas_?sysinfo.h. Each of these four files will have a quantity called ATL_nkflop. Pump this quantity up by some significant factor until timings are regular. I usually increase it by a factor of 5 or 10. If the individual timings are then too slow, interrupt the process, and decrease the values.
Finally, the Level 1 timings very often display this problem even when the timing interval is sufficient. The most likely explanation of this non-repeatable timing problem involves inadequate cache flushing, but it has not been tracked down for sure. Regardless, the only workaround is to keep restarting the interrupted install until it completes, as explained below.
Most of the time, when an install dies in this way, you can just restart it, as outlined here. If this dies right away in the exact same timing, but without actually running the timing again, it means that the install process kept a record of the bad timing, and is just rereading it. You then need to remove the bogus timing record file. This file will be in the appropriate BLDdir directory under the res/ subdirectory. For instance, if you are dying in the level 1 tuning, the result files are stored in BLDdir/tune/blas/level1/res, and if you are in the gemm tuning they are in BLDdir/tune/blas/gemm/ARCH/res, etc. Your last message from the dying install should give you most of the info you need to figure out the result directory and the file. Just remove the file (or all the files of that precision, if you cannot figure out the specific file that is bad), and restart the install.
What you should set it to will vary by system, as you need to locate the correct library path. You can usually do this using the locate command. For instance, here's what I see on my system:
drteeth>locate libgfortran.so
/usr/lib/libgfortran.so.2
/usr/lib/libgfortran.so.2.0.0
/usr/lib/gcc/x86_64-linux-gnu/4.2/libgfortran.so
/usr/lib/gcc/x86_64-linux-gnu/4.2/32/libgfortran.so
/usr/lib32/libgfortran.so.2
/usr/lib32/libgfortran.so.2.0.0
My ATLAS install is 64-bit, so I would set (in Make.inc):
F77SYSLIB = -L/usr/lib/gcc/x86_64-linux-gnu/4.2/ -lgfortran
If you have multiple choices and don't know how to choose, it should work to try them until your shared build works.
-Fa alg -mpreferred-stack-boundary=2
to configure. If you are building ATLAS with some mix of non-gnu and gnu compilers, then you need the above flag added only to the gnu compilers. You can do this at configure time by setting it for individual compilers, as described here, or you can configure as normal, but edit Make.inc before starting the build, and add the flag to all gnu compilers' flags.
Note that if you add -mpreferred-stack-boundary=2 to the fortran compiler flags, then g77/gfortran will cause the BLAS tester to die with a bus error/seg fault, because gfortran then aligns double precision arrays to 4-byte rather than 8-byte boundaries. ATLAS assumes native alignment on data types, and so the code seg faults. As long as you don't need to call the Fortran interface from a non-gnu compiler, you can solve this problem by simply leaving the -mpreferred-stack-boundary flag off of the Fortran interface. At configure time, you would pass:
-Fa acg -mpreferred-stack-boundary=2
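Because ATLAS assumes natively aligned data, a caller that mixes compilers or stack-boundary flags may want to check its arrays before handing them over. Below is a minimal C99 sketch. ATLAS itself sticks to C89 and does not perform this check. is_aligned and double_array_ok are hypothetical helpers. Taking sizeof(double) as the required alignment is an assumption that matches the 8-byte boundary discussed above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Nonzero if p is a multiple of align bytes (align must be a power of two). */
int is_aligned(const void *p, size_t align)
{
   assert(align != 0 && (align & (align - 1)) == 0);
   return ((uintptr_t) p & (uintptr_t) (align - 1)) == 0;
}

/* Nonzero if a double array sits on the native 8-byte boundary (assumed to
 * be sizeof(double)) rather than the 4-byte boundary that
 * -mpreferred-stack-boundary=2 can produce. */
int double_array_ok(const double *x)
{
   return is_aligned(x, sizeof(double));
}
```

A caller could assert double_array_ok() on each array argument in a debug build to turn a mysterious bus error into an immediate, explained failure.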
I reported these related problems (ABI non-conformance and non-native alignment) to the gcc folks, and got a range of opinions, the most startling of which was that whatever they do is the standard. On that link, one of the more helpful replies mentions that the non-native alignment problem should be fixed in gcc 4.4, but that the ABI non-compliance must be maintained in order to match the prior failure to comply with the ABI.
gcc -o xtst test.c -L /home/whaley/TEST/ATLAS/build64/lib -lcblas -latlas
to:
gcc -o xtst test.c /home/whaley/TEST/ATLAS/build64/lib/libcblas.a \
    /home/whaley/TEST/ATLAS/build64/lib/libatlas.a
The only other trick I'm aware of is to rename your ATLAS libraries so that the Apple versions will not override them.
>cat cppblas.h
extern "C"
{
   #include "cblas.h"
}
>cat cpplapack.h
extern "C"
{
   #include "clapack.h"
}
If you are a C++ programmer using ATLAS, and think differently, let me know.
/bin/cp `echo $* | sed -e 's/-f / /'`