[Home] [Docs] [FAQ] [Errata] [Software] [Install] [Support] [Lists] [Developer home] [Timings]

What is ATLAS?

ATLAS stands for Automatically Tuned Linear Algebra Software. ATLAS is both a research project and a software package. This FAQ describes the software package. ATLAS's purpose is to provide portably optimal linear algebra software. The current version provides a complete BLAS API (for both C and Fortran77), and a very small subset of the LAPACK API. For all supported operations, ATLAS achieves performance on par with machine-specific tuned libraries.

Who uses ATLAS?

ATLAS can be used by anyone needing fast linear algebra routines. ATLAS is used directly by a great many research scientists. Because of the open nature of ATLAS, we have no way of knowing how many users of ATLAS there are. In the following paragraphs, we indicate some of the users that we know about, but this is far from a complete list.

ATLAS is used, or is planned to be used, in the following PSEs:

ATLAS is also included in:

ATLAS may be optionally used by almost any project requiring the BLAS. Here are some projects that we have seen providing the option for using ATLAS:

Additionally, ATLAS is included in some way by the following OS distributions:

What are the academic references for ATLAS?

The academic references for ATLAS are given in bibtex format below. If you want to cite only one paper, the newest (shown first) is probably best, since it references the others. The first two papers contain the bulk of the needed information. Referencing the homepage can help other researchers find the software.

Note that there have been quite a few subsequent papers that discuss ATLAS (with varying degrees of accuracy and detail) written by people not directly involved in ATLAS's production and design. While these papers may be about ATLAS, they are not, obviously, primary sources, and should not be cited as such. If the paper is not authored by Whaley or Petitet, it is not a primary-source ATLAS paper.

AUTHOR      = "R. Clint Whaley and Antoine Petitet",
TITLE       = "Minimizing development and maintenance costs in supporting
               persistently optimized {BLAS}",
JOURNAL     = "Software: Practice and Experience",
VOLUME      = "35",
NUMBER      = "2",
PAGES       = "101--121",
MONTH       = "February",
YEAR        = "2005",
NOTE        = {\verb+}

AUTHOR      = "R. Clint Whaley and Antoine Petitet and Jack J. Dongarra",
TITLE       = "Automated Empirical Optimization of Software and the
               {ATLAS} Project",
JOURNAL     = "Parallel Computing",
VOLUME      = "27",
NUMBER      = "1--2",
PAGES       = "3--35",
YEAR        = "2001",
NOTE        = "Also available as University of Tennessee LAPACK Working
               Note \#147, UT-CS-00-448, 2000
               ({\tt})"

AUTHOR      = "R. Clint Whaley and Jack Dongarra",
TITLE       = "{Automatically Tuned Linear Algebra Software}",
BOOKTITLE   = "Ninth SIAM Conference on Parallel Processing for
               Scientific Computing",
NOTE        = "CD-ROM Proceedings",
YEAR        = "1999"

AUTHOR      = "R. Clint Whaley and Jack Dongarra",
TITLE       = "Automatically Tuned Linear Algebra Software",
BOOKTITLE   = "SuperComputing 1998: High Performance Networking and Computing",
YEAR        = "1998",
NOTE        = "CD-ROM Proceedings. {\bf Winner, best paper in the systems
               URL: \verb+"

AUTHOR      = "R. Clint Whaley and Jack Dongarra",
TITLE       = "{Automatically Tuned Linear Algebra Software}",
INSTITUTION = "University of Tennessee",
YEAR        = "1997",
MONTH       = "December",
NUMBER      = "UT-CS-97-366",
NOTE        = "URL : \verb+"

TITLE       = "ATLAS homepage",
AUTHOR      = "{See homepage for details}",
NOTE        = ""

Does ATLAS run on my platform (OS/hardware)?

ATLAS should produce optimized libraries on almost any platform that has an ANSI/ISO C compiler and some Unix-like command-line tools (e.g., make, cp). ATLAS runs on pretty much all Unix variants (including embedded systems), as well as Windows (Windows users must install the free Cygwin tools).

What software license does ATLAS use (AKA: in what ways and for what purposes am I allowed to use ATLAS)?

ATLAS uses a BSD-style license, without the advertising clause. ATLAS's license is taken almost verbatim from a standard example BSD license. Here is the relevant portion of the license, as taken from an ATLAS source file:
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 *   1. Redistributions of source code must retain the above copyright
 *      notice, this list of conditions and the following disclaimer.
 *   2. Redistributions in binary form must reproduce the above copyright
 *      notice, this list of conditions, and the following disclaimer in the
 *      documentation and/or other materials provided with the distribution.
 *   3. The name of the ATLAS group or the names of its contributers may
 *      not be used to endorse or promote products derived from this
 *      software without specific written permission.
To see the exact license, simply view almost any source file in the ATLAS tarfile (e.g., ATLAS/src/auxil/ATL_lcm.c).

How do I get help/technical support with ATLAS?

Your first resource for help should always be the ATLAS errata file. This file keeps track of all discovered errors in ATLAS, and their workarounds or fixes. It also contains workarounds for common system problems (e.g., compiler errors, non-standard commands), as well as advice necessary to get the best performance on various machines.

If you have downloaded the ATLAS source, your ATLAS/doc directory contains some useful documentation, though it is often less current than the information in the errata and online.

If (and only if) neither of these sources provides the information you need, you can submit a support request to:

Do not, under any circumstances, post your support request to the "bug" tracker. As documented on the tracker itself, this is for developer-confirmed bugs only. All users should use the support or feature request trackers. Things that turn out to be bugs will later be escalated to the bug tracker by the confirming developer.

In addition, please understand that the tone of your support request is important, as described here.

You do not need to create a SourceForge account in order to use the tracker (that persistent plea to "please log in" can be ignored), though it makes things easier if you do. In particular, if you don't log in, you won't be able to attach extra files later (you can attach a file in your initial report, but afterwards the tracker cannot verify that you are the original poster, so it won't allow it). So, if you think you may need to do this kind of thing relatively often, creating an account may be worth it.

Note that you should upload the error_[ARCH].tgz file as well. If the error killed the ATLAS install before it successfully created the error tarfile, create it yourself by issuing the following command from your BLDdir directory:

   make error_report

Note that [ARCH] in the above directions should be replaced by the architecture string ATLAS is using (e.g., Linux_P4SSE1 or SunOS_SunUS4).

What documentation is available (usage info)?

ATLAS's main job is to provide optimized libraries, so most of the documentation is on the appropriate APIs. ATLAS does provide some executables, but these are merely testers and timers for the provided libraries. A very rough description of the operation of these executables is given in ATLAS/doc/TestTime.txt in your ATLAS source directory.

Here are some pointers to ATLAS documentation:

ATLAS stable errata
Errata file for last stable release. Includes many architecture-specific install hints, as well as documenting and providing fixes for all known errors.
ATLAS installation guide
Guide to configuring, installing, and testing (accuracy and efficiency) ATLAS. Also available in the tarfile at ATLAS/doc/atlas_install.pdf.
ATLAS FAQ
This document. Frequently asked questions about ATLAS.
C BLAS API reference
A brief API reference for the ANSI/ISO C interface to the BLAS.
Fortran77 BLAS API reference
A brief API reference for the Fortran77 interface to the BLAS.
Fortran77 and C LAPACK API reference
A brief API reference for the LAPACK routines provided by ATLAS.
Contributor HOWTO [ps] [html]
Paper describing the mechanism for speeding up ATLAS by contributing kernels.
Developer HOWTO [ps] [html]
Paper with information on how to be an ATLAS developer.
LAPACK homepage
All kinds of LAPACK-related info given at the LAPACK homepage.
BLAS homepage
All kinds of BLAS-related info given at the BLAS homepage.

What mailing lists, archives, and so on does ATLAS have?

ATLAS has the following tracker lists:
ATLAS technical support
The place you go once you are sure you have a technical support issue not covered in the other docs. Anyone can post to this list.
ATLAS feature request list
The place to request additions/extensions to ATLAS. Anyone can post to this list.

ATLAS also has various mailing lists and archives. Anyone can sign up for or post to these lists. They are:

ATLAS announcement list [browse] [subscribe/unsubscribe/preferences]
Low-volume list where significant changes in the stable release are announced. Mainly used to announce new stable releases, and perhaps occasional updates about very important features appearing in the developer release. Traffic should be O(10) messages a year.
ATLAS error list [browse] [subscribe/unsubscribe/preferences]
Low traffic list for errors found in the ATLAS stable release. Errors in config or install portions are generally not reported, but those involving getting incorrect results or substantial performance penalties are. The idea is that the people monitoring this list are using a successfully installed ATLAS library, and want to be informed if errors are discovered.
ATLAS developer list [browse] [subscribe/unsubscribe/preferences]
Relatively high volume list for ATLAS developer communication.
ATLAS Performance/results list [browse] [subscribe/unsubscribe/preferences]
Mailing list for reporting performance results from ATLAS.
ATLAS CVS commits list [browse] [subscribe/unsubscribe/preferences]
High-volume list, with a message for each CVS commit.

Can I download a prebuilt binary instead of installing from source?

Unfortunately, we lack the manpower to provide prebuilt binaries.

Can I get ATLAS in rpm or .deb or some other format?

Our only supported format is a compressed tarfile. If you really need .rpm or .deb versions, other parties (e.g., Debian, SuSE) provide them (note, however, that we can't answer questions about ATLAS installed this way, since we don't know much about these packages). ATLAS provided by third parties may not be as up-to-date, or may run slower than ATLAS you compile yourself (e.g., some companies compile only a couple of x86 libraries, so that the same library is used for both P4 and P4E chips, even though ATLAS should tune itself separately for each x86 variant to get maximal performance).

What does the version number of ATLAS mean?

ATLAS version numbers look like: <major number>.<minor number>.<update number>. The meaning of these terms is:
Major number
Major release numbers are changed only when fairly large, sweeping changes are made. Changes in the API are the most likely to cause a major release number to increment. For example, when ATLAS went from supporting only matrix multiply to all the Level 3 BLAS, the major number changed; the same happened when ATLAS went from supporting only Level 3 BLAS to all BLAS.
Minor number
Minor release numbers are changed at each official release. Even numbers represent stable releases, while odd minor numbers are reserved for developer releases. The difference between developer and stable releases is explained in its own question below.
Update number
Update numbers are essentially patches on a particular release. For instance, stable ATLAS releases occur only roughly once per year. As errors are discovered, they are documented in the errata file, so that a user can apply the fixes by hand. When enough errata have built up that it becomes impractical to apply the important ones by hand, we issue an update. So, stable updates are typically bug fixes or important system workarounds, while developer releases often involve new code. A typical number of updates to a stable release might be something like 4. By their very nature, developer release updates usually contain new code, and they can happen relatively rapidly. A developer release may have any number of updates.

So, 3.2.1 would be a stable release with one group of fixes already applied, while 3.3.12 would be the 12th update (13th release) of the 3.3 developer release.

How can I tell what version of ATLAS I have?

For ATLAS version 3.3.6 or newer, you can find out version and build information via the routine ATL_buildinfo. The following complete program will give build information (including version number) when linked against version 3.3.6 or later libatlas.a's:
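   /*
    * Compile, link and run with something like (the file name is illustrative):
    *    gcc -o xprint_buildinfo print_buildinfo.c -L[ATLAS lib dir] -latlas ; ./xprint_buildinfo
    * If the link fails, you are using an ATLAS version older than 3.3.6.
    */
   void ATL_buildinfo(void);
   int main(void) { ATL_buildinfo(); return 0; }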

If you are using an ATLAS version prior to 3.3.6, there is no easy way to find the version information without looking at the source. If you still have the source tree, the easiest approach is to examine pretty much any source file (e.g., ATLAS/src/auxil/ATL_lcm.c); the major and minor version numbers are given in the copyright notice at the top. To find the update number, you would have to consult the actual routines changed by the particular update, as listed in the ATLAS errata file.

If you still have the directory where you built ATLAS, you can get this version information without writing the above program by running:

   cd BLDdir/bin
   make xprint_buildinfo
   ./xprint_buildinfo

What's the difference between stable and developer ATLAS releases?

The vast majority of ATLAS users should download and use only the stable version of ATLAS. Stable versions of ATLAS are released roughly once a year. The most current stable release has an associated errata file, which details all errors found in the release. Stable releases are very well tested, and are much more heavily supported.

Developer releases, on the other hand, are meant to be used, as the name suggests, by ATLAS developers, contributors, and people happy to live on the bleeding edge. Developer releases provide access to the newest ATLAS sources, and may be a simple snapshot of the internal developer tree. As such, they are essentially untested, and may not build, much less run, correctly. So, while they may possess features not available in the current stable release, only the most experienced users should consider using them.

Developer releases are available from the developer site, while stable releases are available from the ATLAS main page. Stable and developer releases are also distinguished by their version numbers, as explained in the version-number question above.

What LAPACK routines does ATLAS provide?

The only way to be sure you have the most up-to-date list is to examine the source in ATLAS/interfaces/lapack/F77/src/. It is pretty much a foregone conclusion that any documentation, this page included, will eventually become out of date. ATLAS 3.6 and 3.8 provide C and Fortran77 interfaces to these routines:

Since LAPACK has no official C API, ATLAS provides its own in ATLAS/interfaces/lapack/C/src/.

What header files does ATLAS provide?

The official header file for the C interface to the BLAS is available as ATLAS/include/cblas.h. The header file for the C interface to LAPACK is ATLAS/include/clapack.h.

How can I get dynamic (.so) libraries rather than ATLAS's default static libraries (.a)?

ATLAS 3.8.0 has prototype support for building dynamic libraries, as described here.

What's the best hardware for running ATLAS/what machine do you recommend I buy for this kind of work?

This is another question that is pretty much impossible to answer generally or keep up to date. I need to update this entry once I get my hands on the new Opteron!

Can I use ATLAS with CLAPACK?

Yes. CLAPACK gives you the option to compile CLAPACK to use the standard C interface to the BLAS, which ATLAS provides. If you run CLAPACK's included BLAS tester, be sure to turn off error-exit tests, since it can't properly test the error exits returned by the CBLAS. ATLAS provides essentially the same testers in ATLAS/interfaces/blas/C/testing, which do correctly test the error exits, if that's important to you.

How well optimized are the various routines in ATLAS?

All of the routines in ATLAS tend to be competitive with the machine-specific versions on most known architectures. However, ATLAS is not just about working well on known architectures; it also tries to be optimal on unknown machines. When it comes to the generality of the optimizations ATLAS uses, there is a definite hierarchy:

I need routine/architecture X optimized, can you do it?

If you have a particular operation and/or architecture you really need optimized, you may want to post a mention of that to the ATLAS feature request tracker. We don't do optimization on request, but when we have to choose the next set of operations to support, user input can certainly influence things. To maximize your chance of swaying us, you'll want to include what percentage of your application time is spent in the particular operation, etc.

A quicker way to get action is to do it yourself. ATLAS is open source, and the developer homepage explains how you can use ATLAS to optimize various operations.

How is ATLAS funded?

ATLAS is presently funded by NSF CAREER OCI-1149303. It was supported in the past by NSF CRI CNS-0551504 and the NSF EPSCoR Cooperative Agreement No. EPS-1003897, with additional support from the Louisiana Board of Regents. For more details, see here.

I originally started ATLAS development when I worked at the Innovative Computing Laboratory at the University of Tennessee. I got enough of it working to convince Jack (Dongarra, of LAPACK and BLAS fame) to give the development go-ahead on my own time. After that, ATLAS was written into a variety of grants, but was never funded (to my knowledge) solely on its own grant at ICL. I believe some of its later development took place under the NSF grant "Linear Algebra Algorithms and Tools for Emerging Computing Environments and User Communities", Grant Number ACI-9813362.

Both Antoine and I (the two full-time ATLAS researchers and developers) left ICL in 2001. After this date, ATLAS work was almost entirely unsupported, which slowed development massively. However, in 2003, Advanced Micro Devices funded a year of my graduate studies in return for some Opteron tuning. This allowed me to spend quite a bit more time on ATLAS than previously, and resulted in the release of ATLAS 3.6.0 that year.

After this, I found very little time to work on ATLAS due to faculty duties until 2006, when I got funding for both research and maintenance, thanks to both NSF and DoD. Details can be found here. ATLAS development has therefore picked up again, and ATLAS 3.8.0 was released in October 2007.

At present, the main support for ATLAS comes from my CAREER award. The government contract was not transferred to LSU, but Tony Castaldo is supported on it until mid-2014.

Who provides infrastructure support?

Obviously, SourceForge provides the ATLAS main page, including CVS services, tracker, etc. Also, netlib provides access for a large part of the mathematical community through ATLAS's original homepage.

As far as machine access for tuning:

Who wrote/contributed to ATLAS?

Note that this question addresses package design and code contribution, not money, infrastructure or testing. R. Clint Whaley founded the ATLAS project. After the initial release, he was joined on the project by Antoine Petitet. Between them, these two individuals are responsible for 95% of the code in ATLAS, along with pretty much all of the design. That is not to say that others have not made substantial contributions, however.

In particular, ATLAS has been designed to allow for outside contribution such that a user can provide only a very small kernel, and thus speed up large portions of the library. Many people have contributed in this manner, and this has resulted in extremely large performance improvements for ATLAS on certain architectures. These contributors (in alphabetical order), and a rough sketch of what they have done, are:

Doug Aberdeen
His work on Emmerald (an SSE-enabled SGEMM), and help on the ATLAS developer mailing list, was the starting point for a lot of people working on SSE-enabled kernels.
Matthew Brett
Initial work and help with getting ATLAS to build dynamic libraries.
Tony Castaldo
Discovered importance of issuing instructions in sets of 4 for PowerPC970, which allowed us to increase dgemm kernel performance from 75 to 82.5% of peak.
Nicholas Coult
Initial version of AltiVec-enabled SGEMM.
Markus Dittrich
Found shell technique allowing configure to pass multiple words as a single flag.
Dean Gaudet
Initial work on using CPUID in configure, Efficeon tuning information, and many informative atlas-devel discussions on related matters.
Kazushige Goto
His ev5/ev6 GEMM was used directly by ATLAS 3.6 and older if the installer answers "yes" to its use during the configuration procedure on an alpha processor. This results in a significant speedup over ATLAS's own GEMM codes, and is the fastest ev5/ev6 implementation we are aware of. Explicit support for alphas was dropped in ATLAS 3.8.0.
Jeff Horner
Initial versions of Level 3 BLAS tester/timer, C interface to the Level 3 BLAS, and non-generated complex matmul code.
Camm Maguire
SSE and SSE2 kernels enabling large speedups for Intel architectures. Also maintains ATLAS (along with other packages) for the Debian distribution.
Tim Mattox and Hank Dietz
Provided an extremely efficient SGEMM 3DNow! kernel for Athlon.
Viet Nguyen and Peter Strazdins
UltraSparc-optimized [D,Z]GEMM kernels.
Julian Ruhe
Incredibly efficient Athlon matmul kernel.
Peter Soendergaard
SSE and 3DNow! kernel work, enabling large speedups on Intel and AMD architectures. Also, translation of Julian's Athlon kernels from NASM to GNU assembler, and extension to all precisions.
Carl Staelin
Initial work on parallelizing ATLAS make.

ATLAS testers

Getting a stable release of ATLAS out the door is a seemingly never-ending task. One of the biggest headaches involves testing ATLAS, which can run on almost any platform, on enough architectures to be confident that it is sound enough to be called stable. We have testers that automate this process, but they typically hog up the entire CPU for more than a day, and no one likes to have to hassle with running them.

For ATLAS 3.8.0, I did all testing myself, though several developers provided me with machine access.

Is ATLAS thread safe?

It should be completely safe to call any ATLAS routine from a threaded code. There are no global variables, or other shared information between routines. Probably the best idea is to say "yes" to threading in config, even if you wish to do the threading yourself. That way, the ATLAS lib will be compiled with the threading flags. Then, simply link to the serial interface so that ATLAS doesn't do the threading. If you want ATLAS to do the threading as well, simply link to the threaded interface.

Can I vary the number of threads ATLAS uses dynamically?

No. The maximum number of threads to use is determined at compile time. ATLAS will never use more than this, but may use fewer if the problem sizes are too small to get speedup from the additional parallelism.

What's the deal with the RHS in the row-major factorization/solves?

Most users are confused by the row-major factorization and related solves, and the right-hand side vectors are probably the biggest source of confusion. The RHS array does not represent a matrix in the mathematical sense; it is instead a pasting together of the various RHS into one array for calling convenience. As such, RHS vectors are always stored contiguously, regardless of whether row- or column-major is chosen. This means that ldb/ldx is always independent of NRHS and dependent on N, regardless of the row/col major setting.

Why don't you speed up/improve ATLAS's search?

There are several questions here, handled in their own sub-questions:
  1. Why don't you improve the type of search?
  2. Why don't you improve the speed of the search?
  3. Why don't you improve the accuracy of the search?
  4. Why don't you search NB > 80?
As you will see by reading each, only the last of these would actually be helpful for ATLAS's main use, and it has stayed on the back burner for quite some time because it's almost always more useful to expand ATLAS's other capabilities. Note that the majority of users should use the provided architectural defaults, thus avoiding the search altogether. The search is there only for exploration by the expert user (in a user-controlled fashion), or to enable a naive user to get an adequate library on a truly new architecture (in its fully automatic mode).

Why don't you improve the type of ATLAS's search?

There has been quite a bit of research on fast search techniques. ATLAS uses a relaxed 1-D line search, where the `relaxed' comes from the fact that interacting transforms are usually handled by restricted 2/3-D searches. This is a very basic search technique, and many people wonder why a more advanced algorithm, such as hill climbing, simulated annealing, or a genetic algorithm, isn't used. The real answer is that it is overkill. Because I understand the transformations ATLAS attempts, and how they interact, I am able to target the relaxed line search appropriately. More advanced techniques are more appropriate when you do not know good starting values for the transforms, and know less about the interactions between optimizations and how to resolve them. The modified line search has some nice properties: it is easily guided by hand by the expert user in order to explore spaces more fully, and it is easy to understand and maintain.

Why don't you improve the speed of ATLAS's search?

I occasionally get suggestions on how to speed up ATLAS's empirical search. I know of a multitude of ways that I could do this. In my view, however, they are not worth the effort/risk at the present time. Most users should use the architectural defaults, skipping the search altogether. The only speed criterion that went into the search design was that it needed to be tolerable. The main purpose of ATLAS is to provide an optimized library, and once the search could produce that in a period of time O(1 day), that seemed good enough. On many architectures the search finishes much faster than that, of course.

There are many places in the search where I could prune things back and have no effect on performance on any known architecture, but since the speed is adequate, additional search options are left on in case an unexpected architectural change is found. I could also utilize more sophisticated sampling techniques, but these would then need to be validated to work on the vast array of machines (the present search having been tested for over seven years, and on innumerable architectures). All this is to say that speeding up the search is not a bad thing, it just is not that helpful to the core usage of ATLAS, and so it is not worth the cost/risk of change at this time. If additional tuning capabilities are added, so that the search time becomes more critical, then of course the search will be updated.

Why don't you improve the accuracy of ATLAS's search?

This is the search problem that I am most tempted to fix. The present search is mainly designed to be usable by an installer with no system privileges, who must install on stock systems that are experiencing unrelated load during the installation. Thus, by default ATLAS uses CPU time for all non-threaded installation decisions, which is extremely inaccurate. This often leads to the search going awry (i.e., failing to find a more optimal kernel), which is why the architectural defaults are so important.

For most systems, you can at least tell ATLAS's configure to use the cycle-accurate wall timers, which will make all timings much more accurate if you are on an unloaded machine.

Ultimately, a search with some real statistics should be built, which would determine whether two timings are statistically different, and also use some statistics to decide how many probes are required to get reliable results. This is the area in which I think I am most likely to improve the search, if I ever have time.

What's the deal with the architectural defaults?

I split this into several separate questions:
  1. Is using the architectural defaults important, rather than doing my own search?
  2. When should I not use architectural defaults?
  3. If I don't use architectural defaults, how can I get better performance?

Is using the architectural defaults important, rather than doing my own search?

The short answer is definitely. As described elsewhere, the search is designed to be used only when architectural defaults are unavailable or have become non-optimal due to a compiler change. To understand this, you need to understand the nature of empirical searches in general.

Empirical searches, when run on real machines experiencing unrelated load, are almost never strictly repeatable, even in the best of cases. The default ATLAS search is far from the best case: the sampling and timing mechanisms are crude, made to work on lowest-common-denominator setups, up to and including embedded systems. So, when run in this mode, the search is designed to give you a library that isn't bad, but is often far from the best.

To get better results (which are then saved as architectural defaults), I usually run the search multiple times, and if necessary, intervene by hand to probe promising transformations. Thus, the architectural defaults can be thought of as the saved results of several installs plus some user intervention. Also, the architectural defaults are synergistic with the default compiler flags, so you want to leave both alone for best results.

When should I not use architectural defaults?

As previously mentioned, architectural defaults are usually the result of several guided installations, and thus represent best-of-breed installs. They can occasionally become a barrier to performance, particularly when a compiler goes through a major release. For instance, ATLAS 3.8.0's architectural defaults are for gcc 4.2; if you are using gcc 5.1, things may have changed enough to require new defaults, and if you are using a bad compiler like gcc 4.1 or an old one like gcc 3, you will almost certainly not want to use the architectural defaults.

The first thing to check is that your library runs about as fast as one built using the architectural defaults. You can determine this with make time, as described here.

If you choose not to use the recommended compiler and architectural defaults, be sure to follow these directions. If you suspect your performance is suboptimal, open up a support request and ask.

If I don't use architectural defaults, how can I get better performance?

First, make sure your defaults are better than the architectural defaults by comparing the timings of a default install against your search install, as described here.

Experiment with different compilers and flags to find combinations that better match both the defaults and your output flags. Be sure to do all the normal post-install tuning, including tuning CacheEdge. Finally, if your install is indeed faster than the architectural defaults, report it.

Why does the search limit NB to 80?

The default ATLAS search limits GEMM's blocking factor, NB, to at most 80. On systems where a larger NB actually blocks for the L2, blocking for the L2 prevents ATLAS from using its multilevel blocking parameter, CacheEdge. In this case, larger blocking factors may result in superior kernel timings (the kernel timings involve no L2 blocking), but if an L1-contained NB is used, similar or superior performance may be obtained in full GEMM with a tuned CacheEdge. In this case, the GEMM speedup is illusory, but the application and small-case GEMM slowdown (discussed below) is quite real. On machines with a large L1, or a very fast L2, GEMM may indeed get an asymptotic speedup from larger blocking factors, but using them is still almost always a bad idea, as outlined below.

There are three main reasons why even a true asymptotic speedup from large blocking factors is a bad idea:

  1. Overall GEMM performance is usually decreased, because more time is spent in cleanup (called when one or more dimensions are not a multiple of NB). For many GEMM calls, very large NBs result in calling nothing but the relatively poorly optimized cleanup.
  2. Applications, which usually call GEMM with varying sizes, most of them of modest dimension, spend an even greater proportion of their time in cleanup code, and thus experience a much greater slowdown than individual GEMM calls.
  3. Many applications have unblocked code, and in some cases, the application's blocking factor may not match GEMM's. In this case, a large NB can guarantee that all GEMM calls go straight to cleanup. Even when this doesn't happen, as NB rises the number of calls to GEMM goes down, and since large-NB GEMM requires larger matrices to reach its asymptotic peak, even recursive factorizations (the best case for large-NB GEMM) need to get extremely high efficiency on the large GEMM calls to overcome the slowdown on the small ones. For fixed-NB cases such as the LAPACK factorizations, a large NB is almost always a loss. This is the main reason that ATLAS is reluctant to use large NB.
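Point (1) can be made concrete with a deliberately simplified model (an assumption for illustration, not how ATLAS actually partitions its cleanup): treat the work where all three dimensions fall in whole NB blocks as kernel work, and everything else as cleanup. The hypothetical function cleanup_fraction below computes the share of an M x N x K GEMM's multiply-adds left to cleanup.

```c
#include <assert.h>

/* Largest multiple of nb that fits in dim. */
static int full_extent(int dim, int nb)
{
    return (dim / nb) * nb;
}

/* Simplified model: fraction of the M*N*K multiply-adds that fall
   outside the region where M, N and K are all whole NB blocks, i.e.
   the work left for cleanup code. Real ATLAS cleanup is more involved;
   this only shows how the fraction grows with NB. */
static double cleanup_fraction(int m, int n, int k, int nb)
{
    assert(m > 0 && n > 0 && k > 0 && nb > 0);
    double full = (double)full_extent(m, nb) * full_extent(n, nb)
                * full_extent(k, nb);
    double total = (double)m * n * k;
    return 1.0 - full / total;
}
```

Under this model, a 200x200x200 GEMM leaves no work to cleanup with NB=40, about 49% with NB=80, about 78% with NB=120, and all of it with NB=240, which is the "nothing but cleanup" case from point (1).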

Note that points (2) and (3) are very important: GEMM is one of the most studied performance kernels in the world, not for its own sake, but because of the wide variety of applications whose performance can be improved by speeding it up. Thus, speeding up GEMM at the expense of application performance is something that only someone interested in benchmarking GEMM (as opposed to building a usable library) would want to do.

Therefore, the ATLAS search limits NB to 80. We occasionally relax this limit (manually, never blindly in the search) when it is absolutely necessary. For instance, on SPARCs, large NB have proven necessary for decent performance, and on the Pentium 4 (not P4E), the floating point unit does not make use of the L1 cache, and so we block for the L2. However, in these cases we first verified that the win was real and substantial, and then hand-tuned the cleanup to ameliorate the effects of large NB as best we could. Even so, these systems can display very bad performance due to point (3) above. Nor do we actually use the NB that gives the best GEMM performance: we increase it only enough to get adequate asymptotic performance. Unless you are tuning for a large GEMM benchmark, you should never increase NB without examining these tradeoffs.

Does ATLAS provide a standard C interface to LAPACK?

The short answer is no, and neither does anyone else. As far as I know, there has been no official standardization of a C interface to LAPACK. In the absence of a standard, each library is free to do things differently. For instance, netlib provides something called clapack, which is the result of running LAPACK through f2c on a particular platform. This means all parameters must be passed by reference, names have an underscore appended, etc. ATLAS does not support this ad hoc interface, though you can use ATLAS to provide the BLAS for netlib clapack, as mentioned here.

I believe most of the vendors provide some C interface to LAPACK; I think most of them just use the same name (e.g., F77's DGESV becomes dgesv), and the scalars become pass-by-value. ATLAS provides a C interface to the LAPACK routines it natively provides, based on the standardized C interface to the BLAS. These routines are prefixed with clapack_, and their prototypes can be found in ATLAS/include/clapack.h. ATLAS natively provides only a handful of LAPACK routines, so if you want to call something not provided there, your best bet is to call the Fortran77 interface. The good news is that, because it is a standard interface, all LAPACK libraries should support it in the same way.
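To make the two calling conventions described above concrete without depending on any particular library, the C below uses a hypothetical routine, my_daxpy_ (computing y := alpha*x + y), written in the f2c/Fortran77 style: trailing underscore, every argument passed by reference. The wrapper my_daxpy then shows the vendor-style convention of dropping the underscore and passing scalars by value. Neither name is a real BLAS or LAPACK entry point; this is only a sketch of the conventions.

```c
#include <assert.h>

/* Hypothetical f2c/Fortran77-style routine: trailing underscore and every
   argument, scalars included, passed by reference. Computes
   y := alpha*x + y. Not an actual BLAS or LAPACK entry point. */
static int my_daxpy_(const int *n, const double *alpha,
                     const double *x, const int *incx,
                     double *y, const int *incy)
{
    int ix = 0, iy = 0;
    assert(*incx > 0 && *incy > 0); /* sketch handles positive strides only */
    for (int i = 0; i < *n; i++, ix += *incx, iy += *incy)
        y[iy] += *alpha * x[ix];
    return 0;
}

/* Vendor-style C wrapper: same name without the underscore, scalars
   passed by value. Taking their addresses bridges to the F77 style. */
static void my_daxpy(int n, double alpha, const double *x, int incx,
                     double *y, int incy)
{
    my_daxpy_(&n, &alpha, x, &incx, y, &incy);
}
```

With x = {1, 2, 3}, y = {1, 1, 1} and alpha = 2, my_daxpy(3, 2.0, x, 1, y, 1) leaves y = {3, 5, 7}. Calling my_daxpy_ directly would require the caller to store 3, 2.0 and the strides in variables first, which is the inconvenience of the f2c-style interface.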

Why are you such a jerk when answering user questions?
AKA: how can I help you feel good about providing me with support?

It is one of the unfortunate realities of open source development that one of the few rewards that it should supply turns out in practice to be a string of disparagement. I am talking, of course, about corresponding with the people who are using the software you have produced and supported, free of charge to them.

I understand why. A user has a problem using/installing/understanding the software, and is understandably frustrated. Only when the frustration has built up to a great degree is he/she motivated to write to the author. At that point, the user emotionally feels that the author has done it to him on purpose, and so, usually without realizing it, attacks the author.

What users should consider is that for an open source developer, support requests constitute 90% of his contact with his users. If all of this contact is negative, it does not lead to a desire on the author's part to provide a lot more support, and in extreme cases, probably causes developers to quit the project.

My guess is that I get 2 or 3 messages per year that have something positive to say about the software that I spend enormous amounts of my life developing, maintaining and supporting, and provide to the user for free. For every positive message, I get many, many insulting or denigrating replies. It always adds to the anger quotient to feel that the user thinks so little of you that he thinks nothing of insulting you while using your software and asking for your help, even though you don't know him, he's not intending to do anything for you, and you are providing this stuff for free.

So, I have added this discourse to the FAQ in the hopes of stimulating users to consider the tone of their messages, not only to me, but to open source projects in general. Remember that the open source developer is not a vendor such as Microsoft or AOL, or even Red Hat, who is in a financial relationship with you, and thus owes you a lot of hand-holding. Instead, the author has given all users of the package a gift, and you are now asking for additional help in using it.

Another thing to keep in mind when you are writing your mail is that it is entirely possible the question you are asking is answered in the documentation. If I get several questions about the same topic, I usually wind up creating a FAQ or errata entry about it. In a perfect world, this would mean you would read the docs and not have to ask. Even when trying assiduously to do so, however, it is easy to miss the relevant doc. So, it is a good idea to couch your language so that the author is not tempted to reply: "RTFM, @ASD@!". If a user sends in a mail saying "Hey, I can't get these files to link, any idea what's wrong?", I don't mind giving him the link to the errata entry that's been there since version 0.0.1 of the software. However, the guy with the more common "Your libraries do not work" style of message simultaneously demotivates me from working on the project, pisses me off, and generates a great desire to return the anger with a good old RTFM diatribe.

Even if you are not going to read through all the docs, if you are submitting a support request, at least take the time to read the FAQ entry on how to submit a support request. Almost half my users send in a message that translates to "I haven't read the docs", and they post it to the "bugs" list, which is reserved for developer-verified bugs.

Let me show the absolute best kind of support request I get:

Subject: Problems with 3.5.8

I've been using it in my chemistry research in order to do XXX for a couple
of years now, and ATLAS is a great piece of software!  However, I'm
now having a problem getting the newest release to work.

I'm already aching to help this guy. Not only has he indicated he appreciates what I've produced, he's given me an idea of what he is using it for (something I am always interested in). He also hasn't prejudged that the software or I am wrong. He's having a problem, which he later describes in detail, including the error report. If it's a bug, I'll tell him so and post a fix. If it's a user error, I won't mind letting him know the fix, and will feel more like helping him again sometime.

As I said, I get maybe one or two messages of this type a year. I get quite a few absolutely neutral messages, which are OK as well. They don't imply that the problem is necessarily in my software or my brain, but rather just report that a problem has been encountered. They go more like:

Subject : matvec problems
Whenever I call matvec with N=200, my install seems to get worse performance
than the reference BLAS.  Any idea why?  I include my timer below.

My Name

This is a simple request for help that allows for the interpretation that there might be a problem with the installation, or perhaps a timer error on the user's part, as well as the idea that ATLAS is screwed up. This is good, because the majority of user requests turn out not to be errors in ATLAS. Nonetheless, here is a more typical phrasing:

Subject : error in matvec
Your matrix-vector product has an error.  For N=200, it is slower than the
reference BLAS!  Please fix this.

Please keep in mind that user error is more common than package error, so keep your message open to this interpretation. Do not use the phrase "there is a bug in your software" (which half my support requests use), unless you are absolutely confident it is a bug, and have verified it by finding the problem in the actual code. Otherwise, the chance is too great that the error is in user understanding, and you have just implied the author screwed something up.

Another thing that users do that drives me crazy is to insist that something done in a way they don't like is a bug, rather than a disagreement between the developer and a user on how things should be done. I'm not sure what the motivation here is, but users will often argue that anything that didn't work the way they expected was a "bug", even if it was documented to work another way. I, like many good software developers, am a bit of a pedant, and insisting on calling your preference a bug in my software often irritates me so much that I cannot objectively determine if your preference is preferable (assuming it's something I have the freedom to change, which it often isn't). Therefore, if you think there are improvements to be made, suggest them as improvements, or things that would be convenient for you, rather than terming them bugs.

Keep in mind that the author of a package probably considers himself more knowledgeable than the majority of his users on issues closely related to his project, and so it may grate upon him to have users tell him that he has done things the wrong way, or doesn't understand how "modern" libraries work, etc. So, I would avoid phrasings like "can you fix the insane way config works?" or "how about doing this in the correct way" (yes, I really get messages like this).

Another important thing is to understand the context the author is working in. If you see a piece of code that is truly horribly written, or works in a particularly awkward way, you may feel justified in arguing that the author has simply done it all wrong, should call it a bug, and fix it! However, the author is not responsible only for the couple hundred lines of code you are examining. In my case, I am responsible for roughly half a million lines of code, as well as continuing development. Therefore, I may actually agree that a particular way of doing things is sub-optimal, but if it is well-tested to work on the enormous number of platforms ATLAS runs under, I will often decide to leave it alone, in order to concentrate on more important concerns. Even in the case where the author agrees with you that the code is a complete POS, a little tact on your part will go a long way. Understanding that the author is not always free to rewrite the section of code of greatest interest to you will go even further.

Well, I am not confident that users will read this, but I am confident that I will use its URL in replying to a lot of future user requests, so I think this time away from development is well spent!