ATLAS FAQ
[Home]
[Docs]
[FAQ]
[Errata]
[Software]
[Install]
[Support]
[Lists]
[Developer home]
[Timings]
ATLAS BASICS:
INSTALL QUESTIONS:
POST-INSTALL QUESTIONS:
What is ATLAS?
ATLAS stands for Automatically Tuned Linear Algebra Software. ATLAS is
both a research project and a software package. This FAQ describes the
software package. ATLAS's purpose is to provide portably optimal linear
algebra software. The current version provides a complete
BLAS API (for both C and Fortran77),
and a very small subset of the
LAPACK API. For all supported
operations, ATLAS achieves performance on par with machine-specific tuned
libraries.
ATLAS can be used by anyone needing fast linear algebra routines. ATLAS
is used directly by a great many research scientists. Because of the
open nature of ATLAS, we have no way of knowing how many users of ATLAS
there are. In the following paragraphs, we indicate some of the users
that we know about, but this is far from a complete list.
ATLAS is used, or is planned to be used, in the following PSEs:
ATLAS is also included in:
ATLAS may be optionally used by almost any project requiring the BLAS.
Here are some projects that we have seen providing the option for using ATLAS:
Additionally, ATLAS is included in some way by the following OS distributions:
The academic references for ATLAS are given in bibtex format below. If
you want to reference one paper only, probably the newest (first shown)
is the best, as it references the others. The first two papers
contain the bulk of the needed information. Referencing the homepage can
help other researchers find the software.
Note that there have been quite a few subsequent
papers that discuss ATLAS (with varying degrees of accuracy and detail)
written by people not directly involved
in ATLAS's production and design. While these
papers may be about ATLAS, they are not, obviously, primary sources,
and should not be cited as such. If the paper is not authored by Whaley
or Petitet, it is not a primary-source ATLAS paper.
@ARTICLE{whaley04,
AUTHOR = "R. Clint Whaley and Antoine Petitet",
TITLE = "Minimizing development and maintenance costs in supporting
persistently optimized {BLAS}",
JOURNAL= "Software: Practice and Experience",
volume = "35",
number = "2",
pages = "101-121",
month = "February",
YEAR = "2005",
NOTE = {\verb+http://www.cs.utsa.edu/~whaley/papers/spercw04.ps+}
}
@ARTICLE{WN147,
AUTHOR = "R. Clint Whaley and Antoine Petitet and Jack J. Dongarra",
TITLE = "Automated Empirical Optimization of Software and the
{ATLAS} Project",
JOURNAL = "Parallel Computing",
VOLUME = "27",
NUMBER = "1--2",
PAGES = "3--35",
YEAR = 2001,
NOTE = "Also available as University of Tennessee LAPACK Working
Note \#147, UT-CS-00-448, 2000
({\tt www.netlib.org/lapack/lawns/lawn147.ps})" }
@inproceedings{atlas_siam,
AUTHOR = {R. Clint Whaley and Jack Dongarra},
TITLE = "{Automatically Tuned Linear Algebra Software}",
BOOKTITLE = "Ninth SIAM Conference on Parallel Processing for
Scientific Computing",
NOTE = "CD-ROM Proceedings",
YEAR = 1999 }
@inproceedings{atlas_sc98,
AUTHOR = "R. Clint Whaley and Jack Dongarra",
TITLE = "Automatically Tuned Linear Algebra Software",
BOOKTITLE = "SuperComputing 1998: High Performance Networking and Computing",
YEAR = "1998",
NOTE = "CD-ROM Proceedings. {\bf Winner, best paper in the systems
category.}\\
URL: \verb+http://www.cs.utsa.edu/~whaley/papers/atlas_sc98.ps+"
}
@techreport{atlas_wn97,
AUTHOR = {R. Clint Whaley and Jack Dongarra},
TITLE = "{Automatically Tuned Linear Algebra Software}",
INSTITUTION = "University of Tennessee",
YEAR = "1997",
MONTH = "December",
NUMBER = "UT-CS-97-366",
NOTE = "URL : \verb+http://www.netlib.org/lapack/lawns/lawn131.ps+"
}
@UNPUBLISHED{atlas-hp,
TITLE = "ATLAS homepage",
AUTHOR = "{See homepage for details}",
NOTE = "http://math-atlas.sourceforge.net/"
}
ATLAS should produce optimized libraries on almost any platform
possessing an ANSI/ISO C compiler, and some Unix-like command-line tools
(eg., make, cp, etc). ATLAS runs on pretty much all Unix variants
(including embedded systems), as well as Windows (Windows users must install
the free cygnus tools).
ATLAS uses a BSD-style license, without the advertising clause. ATLAS's
license is taken almost verbatim from the example given at
opensource.org. Here is the relevant portion of the license,
as taken from an ATLAS source file:
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions, and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
* 3. The name of the ATLAS group or the names of its contributers may
* not be used to endorse or promote products derived from this
* software without specific written permission.
*
* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
* ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
* TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
* PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE ATLAS GROUP OR ITS CONTRIBUTORS
* BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
* CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
* SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
* INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
* CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
* ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
* POSSIBILITY OF SUCH DAMAGE.
To see the exact license, simply edit almost any source
file in the ATLAS tarfile (eg., ATLAS/src/auxil/ATL_lcm.c).
Your first resource should always be
the ATLAS errata file. This file keeps track
of all discovered errors in ATLAS, and their workarounds or fixes. It
also contains workarounds for common system problems (e.g., compiler errors,
non-standard commands, etc.), as well as advice necessary to get
the best performance on various machines.
If you have downloaded the ATLAS source, your ATLAS/doc directory
contains some useful documentation, though it is often more dated than the
info in the errata and online.
If (and only if) neither of these
sources provides the information you need, you can submit a support
request to:
Do not, under any circumstances, post your support request to the "bug"
tracker. As documented on the tracker itself, this is for developer
confirmed bugs only. All users should use the support or feature request
trackers. Things that turn out to be bugs will later be escalated to the bug
tracker by the confirming developer.
In addition, please understand that the tone of your support request is
important, as described here.
You do not need to create a SourceForge account in order to use the tracker
(that persistent plea to "please log in" can be ignored), though it makes
things easier if you do. In particular, if you don't log in, you won't be
able to attach extra files later (you can attach a file in your
initial report, but afterwards the tracker cannot be sure you are the original
poster, so it won't allow it). So, if you
think you may need to do this kind of thing relatively often, creating an
account may be worth doing.
Note that you should upload the error_[ARCH].tgz file as well. If
the error killed the ATLAS install before it successfully created the error
tarfile, create it yourself by issuing the following command from your
BLDdir subdirectory:
make error_report
Note that the [ARCH] of the above directions should be replaced by
your architecture string that ATLAS is using (eg., Linux_P4SSE1
or SunOS_SunUS4, etc).
ATLAS's main job is to provide optimized libraries, so most of the
documentation is on the appropriate APIs. ATLAS does provide some
executables, but these are merely testers and timers for the provided
libraries. A very rough description of the operation of these executables
is given in ATLAS/doc/TestTime.txt in your ATLAS source directory.
Here are some pointers to ATLAS documentation:
- ATLAS stable errata
- Errata file for last stable release. Includes many architecture-specific
install hints, as well as documenting and providing fixes for all known
errors.
- ATLAS
installation guide
- Guide to configuring, installing, and testing (accuracy and efficiency)
ATLAS. Also available in the tarfile at
ATLAS/doc/atlas_install.pdf.
- ATLAS FAQ
- This document. Frequently asked questions about ATLAS.
-
ANSI/ISO C BLAS API reference
- A brief API reference for the ANSI/ISO C interface to the BLAS.
-
Fortran77 BLAS API reference
- A brief API reference for the Fortran77 interface to the BLAS.
-
Fortran77 and C LAPACK API reference
- A brief API reference for the LAPACK routines provided by ATLAS.
- Contributer HOWTO
[ps]
[html]
- Paper describing the mechanism for speeding up ATLAS by contributing kernels.
- Developer HOWTO
[ps]
[html]
- Paper with information on how to be an ATLAS developer.
- LAPACK homepage
- All kinds of LAPACK-related info given at the LAPACK homepage.
- BLAS homepage
- All kinds of BLAS-related info given at the BLAS homepage.
ATLAS has the following tracker lists:
-
ATLAS technical support
- The place you go once you are sure you have a technical support issue not
covered in the other docs. Anyone can post to this list.
-
ATLAS feature request list
- The place to request additions/extensions to ATLAS.
Anyone can post to this list.
ATLAS also has various mail lists and archives. Anyone can sign up or
post to any of them. They are:
- ATLAS announcement list
[browse]
[subscribe/unsubscribe/preferences]
- Low-volume list, where significant changes in the stable release are
announced. Mainly used to announce new stable releases, with perhaps occasional
updates about very important features appearing in the developer release.
Traffic should be O(10) messages a year.
- ATLAS error list
[browse]
[subscribe/unsubscribe/preferences]
- Low traffic list for errors found in the ATLAS stable release. Errors
in config or install portions are generally not reported, but those involving
getting incorrect results or substantial performance penalties are. The idea
is that the people monitoring this list are using a successfully installed
ATLAS library, and want to be informed if errors are discovered.
- ATLAS developer list
[browse]
[subscribe/unsubscribe/preferences]
- Relatively high volume list for ATLAS developer communication.
- ATLAS Performance/results list
[browse]
[subscribe/unsubscribe/preferences]
- Mailing list for reporting performance results from ATLAS.
- ATLAS CVS commits list
[browse]
[subscribe/unsubscribe/preferences]
- High volume traffic list, with a message for each CVS commit.
Unfortunately, we lack the manpower to provide prebuilt binaries.
Can I get ATLAS in rpm or .deb or some other format?
Our only supported format is a compressed tarfile. If you really feel the
need for .rpm or .deb versions, other parties (e.g., Debian, SuSE) provide
them (note that we can't answer questions on ATLAS installed in this way,
however, since we don't know much about these packages). ATLAS provided by third
parties may not be as up-to-date, or may run slower than an ATLAS you compile
yourself (e.g., some companies compile only a couple of x86 libraries, so that they
would use the same library for the P4 and P4E chips, even though
ATLAS should tune itself separately to all the x86 variants for maximal
performance).
ATLAS version numbers look like:
<major number>.<minor number>.<update number>.
The meaning of these terms is:
- Major number
- Major release numbers are changed only when fairly large, sweeping
changes are made. Changes in the API are the most likely to cause a major
release number to increment. For example, when ATLAS went from
supporting only matrix multiply to all the Level 3 BLAS, the major
number changed; the same happened when ATLAS went from supporting only
Level 3 BLAS to all BLAS.
- Minor number
- Minor release numbers are changed at each official release. Even
numbers represent stable releases, while odd minor numbers are reserved
for developer releases. Click here for an
explanation of developer and stable releases.
- Update number
- Update numbers are essentially patches on a particular release.
For instance, stable ATLAS releases only occur roughly once per year.
As errors are discovered, they are recorded in the errata, so that a user can
apply the fixes by hand. When enough errata are built up that it
becomes impractical to apply the important ones by hand, we will issue
an update. So, stable updates are typically bug fixes, or important system
workarounds, while developer releases often involve new code.
A typical number of updates
to a stable release might be something like 4. By their very nature,
each update to the developer release usually contains new code, and they
can happen relatively rapidly. A developer release may have any number of
updates.
So, 3.2.1 would be a stable release, with one group of fixes already applied.
3.3.12 would be the 12th update (13th release) of the associated developer
release.
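The even/odd rule above is easy to check mechanically. The following is only a sketch: atlas_version_is_stable is a hypothetical helper written for this FAQ, not a routine provided by ATLAS, and it assumes the version string has exactly the <major number>.<minor number>.<update number> form described above.

```c
#include <stdio.h>

/*
 * Hypothetical helper (not part of ATLAS): classify a version string of
 * the form <major>.<minor>.<update>.
 * Returns  1 for a stable release (even minor number),
 *          0 for a developer release (odd minor number),
 *         -1 if the string does not contain three numeric fields.
 */
int atlas_version_is_stable(const char *version)
{
    int major, minor, update;

    if (sscanf(version, "%d.%d.%d", &major, &minor, &update) != 3)
        return -1;
    if (major < 0 || minor < 0 || update < 0)
        return -1;
    return (minor % 2 == 0) ? 1 : 0;
}
```

With this sketch, "3.2.1" is classified as stable and "3.3.12" as a developer release, matching the two examples above.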
For ATLAS version 3.3.6 or newer, you can find out version and build
information via the routine
ATL_buildinfo. The following complete program will give build
information (including version number) when linked against version 3.3.6
or later libatlas.a's:
#include <stdlib.h>

/*
 * Compile, link and run with something like:
 *    gcc -o xprint_buildinfo print_buildinfo.c -L[ATLAS lib dir] -latlas ; ./xprint_buildinfo
 * (print_buildinfo.c stands for whatever name you saved this file under).
 * If link fails, you are using ATLAS version older than 3.3.6.
 */
void ATL_buildinfo(void);

int main(void)
{
   ATL_buildinfo();
   exit(0);
}
If you are using an ATLAS version prior to 3.3.6, there is no easy way
to find the version information without looking at the source.
If you have the source tree around, the easiest fix is to examine pretty
much any source file (eg. ATLAS/src/auxil/ATL_lcm.c); the
major and minor
version number will be given in the copyright notice at the top. To find
out the update number, you'd have to consult the actual routines updated
by the particular update, as given in the ATLAS errata file.
If you still have the directory where you built ATLAS around, you
can find this version information without writing the above program:
cd BLDdir/bin
make xprint_buildinfo
./xprint_buildinfo
What's the difference between stable and developer
ATLAS releases?
The vast majority of ATLAS users should download and use only the stable
version of ATLAS. Stable versions of ATLAS are released roughly once
a year. The most current stable release has an associated errata file,
which details all errors found in the release. Stable releases are very
well tested, and are much more heavily supported.
Developer releases, on the other hand, are meant to be used, as the name
suggests, by ATLAS developers, contributors, and people happy to live on
the bleeding edge.
Developer releases are meant to allow access to the newest
ATLAS sources, and may represent a simple snapshot of the internal
developer tree. As such, they are essentially untested, and may not
build, much less run, correctly.
So, while they may possess features not available in the current
ATLAS release, only the most experienced of users should consider utilizing
them.
Developer releases are available from the
developer site,
while stable releases are available from the
ATLAS main page. Stable
and developer releases are also distinguished by their version numbers,
as explained here.
The only way to be sure you have the most up-to-date list is to examine
the source in ATLAS/interfaces/lapack/F77/src/. It is pretty
much a foregone conclusion that any documentation, this page included,
will eventually become out of date. ATLAS 3.6 and 3.8 provide C and Fortran77
interfaces to these routines:
- [S,D,C,Z]GESV
- [S,D,C,Z]GETRF
- [S,D,C,Z]GETRS
- [S,D,C,Z]GETRI
- [S,D,C,Z]TRTRI
- [S,D,C,Z]POSV
- [S,D,C,Z]POTRF
- [S,D,C,Z]POTRS
- [S,D,C,Z]POTRI
- [S,D,C,Z]LAUUM
Since LAPACK has no official C API, ATLAS provides its own in
ATLAS/interfaces/lapack/C/src/.
The official header file for the C interface to the BLAS is available
as ATLAS/include/cblas.h. The header file for the
C interface to LAPACK is ATLAS/include/clapack.h.
ATLAS 3.8.0 has prototype support for building dynamic libraries,
as described here.
What's the best hardware for running ATLAS/what machine
do you recommend I buy for this kind of work?
This is another question that is pretty much impossible to answer
generally or keep up to date. I need to update this entry once I
get my hands on the new Opteron!
Yes. CLAPACK gives you the option to compile CLAPACK to use the standard
C interface to the BLAS, which ATLAS provides. If you run CLAPACK's included
BLAS tester, be sure to turn off error-exit tests, since it can't properly
test the error exits returned by the CBLAS. ATLAS provides essentially the
same testers in ATLAS/interfaces/blas/C/testing, which do
correctly test the error exits, if that's important to you.
All of the routines in ATLAS tend to be competitive with the machine-specific
versions for most known architectures. However, ATLAS is not just about
working well on known architectures, but also tries to be optimal for
unknown machines. When it comes to the generality of the optimizations
ATLAS uses, there is a definite hierarchy:
- Level 3 BLAS are very general in their optimizations, and
will be competitive on almost all cache-based architectures possessing
a decent ANSI/ISO C compiler.
- The provided LAPACK routines utilize a recursive algorithm that should
yield reliably better results than the more common statically blocked algorithms.
- The Level 1 and 2 BLAS are less general, and may not perform as well on
significantly different architectures (eg., IA64).
If you have a particular operation and/or architecture you really need
optimized, you may want to post a mention of that to
the ATLAS feature request tracker.
We don't do optimization on request, but when we have to choose the next
set of operations to support, user input can certainly influence things.
To maximize your chance of swaying us, you'll want to include what percentage
of your application time is spent in the particular operation, etc.
A quicker way to get action is to do it yourself. ATLAS is open source,
and the developer homepage
explains how you can use ATLAS to optimize various operations.
ATLAS is presently funded by NSF CAREER OCI-1149303.
It was supported in the past by NSF CRI CNS-0551504 and the NSF EPSCoR
Cooperative Agreement No. EPS-1003897, with additional support from
the Louisiana Board of Regents. For more details, see
here.
I originally started ATLAS development when I worked at
the Innovative Computer Laboratory at
the University of Tennessee. I got enough of it working to convince Jack
(Dongarra, of LAPACK and BLAS fame)
to give the development go-ahead on my own time. After that, ATLAS was
written into a variety of grants, but was never funded (to my knowledge)
solely on its own grant at ICL. I believe some of its later development
took place under the NSF grant "Linear Algebra Algorithms and Tools for
Emerging Computing Environments and User Communities", Grant Number ACI-9813362.
Both Antoine and I (the two full-time ATLAS researchers and developers)
left ICL in 2001. After this date, ATLAS work was pretty much entirely
unsupported, which slowed development in a massive way.
However, in 2003,
Advanced Micro Devices funded a year of
my graduate studies in return for some Opteron tuning. This
allowed me to spend quite a bit more time on ATLAS than previously,
and resulted in the release of ATLAS 3.6.0 in that year.
After this, I found very little time to work on ATLAS due to faculty duties
until 2006 when I got funding for both research and
maintenance, thanks to both NSF and DoD. Details can be found
here.
ATLAS development has therefore picked up again, and ATLAS 3.8.0 was
released in October of 2007.
At present, the main support for ATLAS comes from my CAREER award. The
government contract was not transferred to LSU, but Tony Castaldo is
supported on it until mid-2014.
Obviously, Sourceforge provides
the ATLAS main page,
including CVS services, tracker, etc. Also
netlib provides
access for a large part of the mathematical community through
ATLAS's original homepage.
As far as machine access for tuning:
- Apple has provided local G4, G5,
CoreDuo, and Core2Duo access for ATLAS development.
- Kate Minola at the University of Maryland, College Park, provided access
to a wide array of machines for
ATLAS 3.8.0, including providing my sole access to Solaris/x86
and AIX/POWER4.
- William Stein gave me access to his
UltraSPARC III platform.
- I have access to a wide variety of architectures through my old
colleagues at ICL.
- Advanced Micro Devices provided
local Opteron access for ATLAS 3.6 development.
- I've used machines through
Sourceforge's compilefarm.
Note that this question addresses package design and code contribution,
not money,
infrastructure or
testing.
R. Clint Whaley founded the
ATLAS project. After the initial release, he was joined on the project
by Antoine Petitet. Between
them, these two individuals are responsible for 95% of the code in ATLAS,
along with pretty much all of the design. That is not to say that others
have not made substantial contributions, however.
In particular, ATLAS has been designed to allow for outside contribution
such that a user can provide only a very small kernel, and thus speed up
large portions of the library. Many people have contributed in this
manner, and this has resulted in extremely large performance improvements
for ATLAS on certain architectures. These contributors (in alphabetic
order), and a rough sketch of what they have done, are:
- Doug Aberdeen
- His work on emmerald (an SSE-enabled SGEMM), and help on the ATLAS developer
mailing list, was the starting point for a lot of people working on
SSE-enabled kernels.
- Matthew Brett
- Initial work and help with getting ATLAS to build dynamic libraries.
- Nicholas Coult
- Initial version of AltiVec-enabled SGEMM.
- Tony Castaldo
- Discovered importance of issuing instructions in sets of 4 for
PowerPC970, which allowed us to increase dgemm kernel performance
from 75 to 82.5% of peak.
- Markus Dittrich
- Found shell technique allowing configure to pass multiple words as a
single flag.
- Dean Gaudet
- Initial work on using CPUID in configure, Efficeon tuning information,
and many informative atlas-devel discussions on related matters.
- Kazushige Goto
- His ev5/ev6 GEMM was used directly by ATLAS 3.6 and older if the installer
answers "yes"
to its use during the configuration procedure on an alpha processor.
This results in a significant speedup over ATLAS's own GEMM codes, and is
the fastest ev5/ev6 implementation we are aware of. Explicit support
for alphas was dropped in ATLAS 3.8.0.
- Jeff Horner
- Initial versions of Level 3 BLAS tester/timer, C interface to the Level 3
BLAS, and non-generated complex matmul code.
- Camm Maguire
- SSE and SSE2 kernels enabling large speedups for Intel architectures.
Also maintains ATLAS (along with other packages) for debian distribution.
- Tim Mattox and Hank Deitz
- Provided an extremely efficient SGEMM 3DNow! kernel for Athlon.
- Viet Nguyen and
Peter Strazdins
- UltraSparc-optimized [D,Z]GEMM kernels.
- Julian Ruhe
- Incredibly efficient Athlon matmul kernel.
- Peter Soendergaard
- SSE and 3DNow! kernel work, enabling large speedups on Intel and AMD archs.
Also, translation of Julian's Athlon kernels from NASM to GNU assembler,
and extension to all precisions.
- Carl Staelin
- Initial work on parallelizing ATLAS make.
ATLAS testers
Getting a stable release of ATLAS out the door is a seemingly never-ending
task. One of the biggest headaches involves testing ATLAS, which can
run on almost any platform, on enough architectures to be confident that
it is sound enough to be called stable. We have testers that automate this
process, but they typically hog up the entire CPU for more than a day, and
no one likes to have to hassle with running them.
For ATLAS 3.8.0, I did all testing myself, though several developers provided
me with machine access.
It should be completely safe to call any ATLAS routine from a threaded code.
There are no global variables, or other shared information between routines.
Probably the best idea is to say "yes" to threading in config, even if you wish
to do the threading yourself. That way, the ATLAS lib will be compiled with
the threading flags. Then, simply link to the serial interface so that ATLAS
doesn't do the threading. If you want ATLAS to do the threading as well,
simply link to the threaded interface.
No. The maximum number of threads to use is determined at compile time.
ATLAS will never use more than this, but may use fewer if the problem sizes
are too small to get speedup from the additional parallelism.
Most users are confused by the row major factorization and related solves.
The right-hand side vectors are probably the biggest source of confusion.
The RHS array does not represent a matrix in the mathematical sense; it is
instead a pasting together of the various RHS into one array for calling
convenience. As such, RHS vectors are always stored contiguously, regardless
of the row/col major that is chosen. This means that ldb/ldx is always
independent of NRHS, and dependent on N, regardless of the row/col major
setting.
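The storage rule above can be sketched in a few lines of C. This is only an illustration under stated assumptions: rhs_offset and rhs_get are hypothetical helper names invented here, not ATLAS routines. They show that element i of right-hand side j lives at the same offset in the RHS array whichever row/col major setting was passed to the solve.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not part of ATLAS): offset of element i of
 * right-hand side j inside the RHS array. Each RHS vector is stored
 * contiguously, so the same formula applies for row- and column-major
 * calls; ldb must be at least N and does not depend on NRHS.
 */
size_t rhs_offset(size_t i, size_t j, size_t ldb)
{
    return j * ldb + i;
}

/* Copy RHS vector j (length n) out of B into the caller's buffer x. */
void rhs_get(const double *B, size_t ldb, size_t n, size_t j, double *x)
{
    size_t i;

    for (i = 0; i < n; i++)
        x[i] = B[rhs_offset(i, j, ldb)];
}
```

For example, with N = 3, ldb = 3 and two right-hand sides packed as {1, 2, 3, 4, 5, 6}, rhs_get with j = 1 copies out {4, 5, 6}.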
There are several questions here, handled in their own sub-questions:
- Why don't you improve the type of search?
- Why don't you improve the speed of the search?
- Why don't you improve the accuracy of the search?
- Why don't you search NB > 80?
As you will see by reading each, only the last of these actually would be
helpful for ATLAS's main use, and it has stayed on the backburner for quite
some time because it's almost always more useful to expand ATLAS's other
capabilities. Note that the majority of users should use the provided
architectural defaults, thus avoiding the search altogether. The search
is there only for exploration by the expert user (in a user-controlled
fashion), or to enable a naive user to get an adequate library on a
truly new architecture (in its fully automatic mode).
There has been quite a bit of research on fast search techniques. ATLAS
uses a relaxed 1-D line search, where the `relaxed' comes from the fact
that interacting transforms are usually handled by restricted 2/3-D searches.
This is a very basic search technique, and many people wonder why a more
advanced algorithm, such as hill climbing, simulated annealing, or
a genetic algorithm, isn't used. The real answer is that it is overkill.
Because I understand the transformations ATLAS attempts, and how they
interact, I am able to target the relaxed line search appropriately.
More advanced techniques are more appropriate when you do not know
good start values for transforms and understand less about the
interactions between optimizations and how to resolve them. The modified line
search has some nice properties: it is easily guided by hand by the
expert user in order to explore spaces more fully, and it is easy to
understand and maintain.
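To make the idea of a 1-D line search concrete, here is a minimal sketch. It is not ATLAS's search code: time_fn, toy_timer and line_search are hypothetical names, the toy timer merely stands in for a real kernel timing, and the restricted 2/3-D searches that make ATLAS's search "relaxed" are left out. Each parameter is swept in turn while the others are held fixed, and the best value found is kept.

```c
#include <stddef.h>

/* Hypothetical callback: returns the measured time for one parameter
 * setting (lower is better). */
typedef double (*time_fn)(const int *params, size_t nparams);

/* Toy stand-in for a real timer: fastest at params = {40, 4}. */
double toy_timer(const int *params, size_t nparams)
{
    double d0 = params[0] - 40;
    double d1 = params[1] - 4;

    (void)nparams;
    return 1.0 + 0.001 * d0 * d0 + 0.01 * d1 * d1;
}

/*
 * Plain 1-D line search: for each parameter p, try every value in
 * [lo[p], hi[p]] with stride step[p] (which must be > 0) while the other
 * parameters stay fixed, and keep the best value found. On return,
 * params holds the best setting seen; the return value is its time.
 */
double line_search(time_fn timer, int *params, const int *lo,
                   const int *hi, const int *step, size_t nparams)
{
    double best = timer(params, nparams);
    size_t p;

    for (p = 0; p < nparams; p++) {
        int best_v = params[p];
        int v;

        for (v = lo[p]; v <= hi[p]; v += step[p]) {
            double t;

            params[p] = v;
            t = timer(params, nparams);
            if (t < best) {
                best = t;
                best_v = v;
            }
        }
        params[p] = best_v;
    }
    return best;
}
```

Starting from {16, 1} with ranges 16..80 (stride 4) and 1..8 (stride 1), the sketch settles on {40, 4} for the toy timer. An expert can guide such a search by hand simply by narrowing lo and hi or changing the start values, which is the property described above.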
I occasionally get suggestions on how to speed up ATLAS's empirical search.
I know of a multitude of ways that I could do this. In my view, however, they
are not worth the effort/risk at the present time. Most users should use the
architectural defaults, skipping the search altogether. The only speed
criterion that went into
the search design was that it needed to be tolerable. The main purpose of
ATLAS is to provide an optimized library, and once the search could produce
that in a period of time O(1 day), that seemed good enough. Many architectures
are much faster than that, of course.
There are many places in the search where I could prune things back and
have no effect on performance on any known architecture, but since the
speed is adequate, additional search options are left on in case an unexpected
architectural change is found. I could also utilize more sophisticated
sampling techniques, but these would then need to be validated to work
on the vast array of machines (the present search having been tested
for over seven years, and on innumerable architectures). All this is to say
that speeding up the search is not a bad thing, it just is not that helpful
to the core usage of ATLAS, and so it is not worth the cost/risk of change
at this time. If additional tuning capabilities are added, so that the
search time becomes more critical, then of course the search will be
updated.
This is the search problem that I am most tempted to fix. The present
search is mainly designed to be usable by an installer with no system
privileges, who must install on stock systems that are experiencing
unrelated load during the installation. Thus, by default ATLAS uses
CPU time for all non-threaded installation decisions, which is extremely
inaccurate. This often leads to the search going awry (i.e., failing
to find a more optimal kernel), which is why the architectural defaults
are so important.
For most systems, you can at least
tell ATLAS's configure to use the cycle-accurate
wall timers, which will make all timings much more accurate if you
are on an unloaded machine.
Ultimately, a search with some real statistics should be built, which
would determine if two timings are statistically different or not, and
also use some statistics to see how many probes are required to get
reliable results. This is the area in which I am most likely to
improve the search, if I ever have time.
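As a rough illustration of what "statistically different" could mean here, the sketch below compares two sets of timings with Welch's t statistic. Nothing like this exists in ATLAS today: timings_differ and its helpers are hypothetical, and the threshold t_crit is an assumption the caller must choose (a value near 2 corresponds to roughly 95% confidence only for large samples; small samples need a larger value).

```c
#include <stddef.h>

/* Sample mean of n timings. */
static double timing_mean(const double *t, size_t n)
{
    double sum = 0.0;
    size_t i;

    for (i = 0; i < n; i++)
        sum += t[i];
    return sum / (double)n;
}

/* Unbiased sample variance of n timings (requires n >= 2). */
static double timing_var(const double *t, size_t n, double mean)
{
    double ss = 0.0;
    size_t i;

    for (i = 0; i < n; i++)
        ss += (t[i] - mean) * (t[i] - mean);
    return ss / (double)(n - 1);
}

/*
 * Hypothetical helper: returns 1 if the means of the two timing samples
 * differ by more than t_crit standard errors (Welch's t statistic),
 * 0 otherwise. The comparison is done on squared values to avoid sqrt.
 * With fewer than two probes per sample no decision is possible, so 0
 * is returned.
 */
int timings_differ(const double *a, size_t na,
                   const double *b, size_t nb, double t_crit)
{
    double ma, mb, se2, diff;

    if (na < 2 || nb < 2)
        return 0;
    ma = timing_mean(a, na);
    mb = timing_mean(b, nb);
    se2 = timing_var(a, na, ma) / (double)na
        + timing_var(b, nb, mb) / (double)nb;
    diff = ma - mb;
    if (se2 == 0.0)
        return diff != 0.0;
    return diff * diff > t_crit * t_crit * se2;
}
```

A search built on such a test would keep probing a kernel until the candidate and the current best are clearly separated, instead of trusting a single noisy CPU-time sample.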
I split this into several separate questions:
- Is using the architectural defaults important, rather
than doing my own search?
- When should I not use architectural defaults?
- If I don't use architectural defaults, how can I
get better performance?
Is using the architectural defaults important, rather
than doing my own search?
The short answer is: definitely. As described elsewhere,
the search is designed to be used only when architectural defaults are
unavailable or have become non-optimal due to compiler change. To understand
this, you need to understand the nature of empirical searches in general.
Empirical searches, when run on real machines experiencing unrelated load,
are almost never strictly repeatable, even in the best of cases. The default
ATLAS search is far from the best case: the sampling and timing mechanisms
are crude, made to work on lowest-common-denominator setups, up to and
including embedded systems. So, when run in this mode, the search is designed
to give you a library that isn't bad, but is often far from the best.
To get better results (which are then saved as architectural defaults),
I usually run the search multiple times, and if necessary, intervene by
hand to probe promising transformations. Thus, the architectural
defaults can be thought of as a save of several installs + some user
intervention. Also, the architectural defaults are synergistic with the
default compiler flags, so you want to leave both alone for best results.
As previously mentioned, architectural defaults are usually the result
of several guided installations, and thus represent best of breed installs.
They can become a barrier to performance occasionally, particularly when
a compiler goes through a major release. For instance, if ATLAS 3.8.0's
architectural
defaults are for gcc 4.2 and you are presently using 5.1, things may
have changed enough to require new defaults. And
if you are using a bad compiler like gcc 4.1 or an old one like gcc 3,
you will almost certainly not want to use the architectural defaults.
The first thing to check is that your library runs about as fast as one
built using the architectural defaults. You can determine
this with make time,
as described here.
If you choose not to use the recommended compiler and architectural defaults,
be sure to follow these directions.
If you suspect your performance
is suboptimal, open up a support request and ask.
First, make sure your defaults are better than the architectural defaults
by comparing the timings of a default install against your search install,
as described here.
Play with different compilers and flags to find settings that better
match both the defaults and your output flags. Be sure to do all the normal
post-install tuning, including tuning
CacheEdge.
Finally, if your install is indeed faster than the arch defaults,
report it.
The default ATLAS search limits GEMM's blocking factor to at most 80.
On systems where larger NB actually blocks for the L2, blocking for the
L2 prevents ATLAS from using its multilevel blocking parameter,
CacheEdge.
In this case, larger blockings may result in superior kernel timings (which
do no L2 blocking), but if an L1-contained NB is used, similar or superior
performance may be obtained in full GEMM with a tuned CacheEdge. In this
case, the GEMM speedup is illusory, but the application and small-case gemm
slowdown (discussed below) is quite real. On machines with large L1, or
very fast L2, GEMM may indeed get an asymptotic speedup from larger blocking
factors, but it is still almost always a bad idea, as outlined below.
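As a rough guide to when a blocking factor stops being L1-contained, the Python sketch below computes the footprint of a single NB x NB block of doubles and compares it to a cache size. The 64 KiB L1 size is a hypothetical value for illustration, and the one-block model is a simplifying assumption: the real kernel's working set also includes parts of the other operands, so this is only an optimistic bound.

```python
def block_bytes(nb, elem_bytes=8):
    # Footprint of one nb x nb block; elem_bytes=8 assumes double precision.
    return nb * nb * elem_bytes

def fits_in_cache(nb, cache_bytes, elem_bytes=8):
    # True when a single block fits in a cache of the given size.
    return block_bytes(nb, elem_bytes) <= cache_bytes

L1_BYTES = 64 * 1024  # hypothetical L1 data cache size, illustration only

for nb in (40, 80, 120, 160):
    print(nb, block_bytes(nb), fits_in_cache(nb, L1_BYTES))
```

Under these assumptions, NB=80 needs 51200 bytes and still fits, while NB=120 needs 115200 bytes and would effectively be blocking for the L2.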
There are three main reasons why even a true asymptotic speedup from large
blocking factors is a bad idea:
- Overall GEMM performance is usually decreased, because more time is
spent in cleanup (called when one or more dimensions are not a multiple of
NB). For many GEMM calls, very large NBs result in calling nothing but
the relatively poorly optimized cleanup.
- Applications, which usually call GEMM with varying sizes,
most of which are of modest dimension, spend an even greater proportion of their
time in cleanup code, and thus experience much greater slowdown than individual
GEMM calls.
- Many applications have unblocked code, and in some cases, the blocking
factor may not necessarily match GEMM's. In this case, a large NB can
guarantee that all GEMM calls go straight to cleanup. Even when this
doesn't happen, as NB rises, the number of calls to GEMM goes down, and
since large-NB GEMM requires larger matrices
to reach its asymptotic peak, even recursive factorizations (the best
case for large-NB GEMM) need to get extremely high efficiency for the
large GEMM cases to overcome the slowdown on the small cases. For
fixed-NB cases as in the LAPACK factorizations, large NB are almost
always a loss.
This is the main reason that ATLAS is reluctant to use large NB.
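The cleanup effect described in the points above can be sketched with some simple arithmetic. The Python function below is a simplified model, not ATLAS's actual dispatch logic: it assumes only the portion of each dimension covered by whole multiples of NB runs in the tuned kernel, and it counts everything else as cleanup. The function name and the problem size are hypothetical.

```python
def full_block_fraction(m, n, k, nb):
    # Fraction of the m*n*k multiply-adds that fall inside full
    # nb x nb x nb blocks; the remainder is treated as cleanup.
    full_m = (m // nb) * nb
    full_n = (n // nb) * nb
    full_k = (k // nb) * nb
    return (full_m * full_n * full_k) / (m * n * k)

for nb in (40, 80, 120, 256):
    frac = full_block_fraction(200, 200, 200, nb)
    print(f"NB={nb:3d}: {frac:.3f} of the work in full blocks")
```

For a 200x200x200 problem, this model puts all of the work in full blocks at NB=40, about half at NB=80, about a fifth at NB=120, and none at NB=256, where every call goes straight to cleanup.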
Note that points (2) & (3) are very important: GEMM is one of the most
studied performance kernels in the world not for its own sake, but due
to the wide variety of applications whose performance can be improved by
speeding it up. Thus, speeding up GEMM at the expense of application
performance is something that only someone interested in benchmarking
GEMM (as opposed to building a usable library) would want to do.
Therefore, the ATLAS search limits NB to 80. We occasionally relax this
limit (manually, never blindly in the search) when it is absolutely necessary.
For instance, on SPARCs, large NB have proven necessary for decent performance,
and on the Pentium 4 (not P4E), the floating point unit does not make use of
the L1 cache, and so we block for the L2. However, in these cases we first
verified that the win is true and substantial, and we then hand-tuned the
cleanup to ameliorate the effects of large NB as best we could. Even so,
these systems can display very bad performance due to point (3) above, and
we do not actually use the best NB for GEMM performance: we
increase it only enough to get adequate asymptotic performance.
Without examining these tradeoffs, you should never increase NB, unless you
are tuning for a large GEMM benchmark.
The short answer is no, and neither does anyone else. As far as I know, there
has been no official standardization of a C interface to LAPACK. In the
absence of a standard, each library is free to do things differently. For
instance, netlib provides something called clapack, which is the result
of running lapack through f2c on a particular platform. This means all
parameters must be passed by reference, names have an underscore appended,
etc. ATLAS does not support this ad hoc interface, though you can use
ATLAS to provide the BLAS for netlib clapack,
as mentioned
here.
I believe most of the vendors provide some C interface to LAPACK; I think
most of them just use the same name (e.g., F77's DGESV becomes
dgesv), and the scalars become pass-by-value. ATLAS provides
a C interface to the LAPACK routines natively provided by ATLAS, which
is based on the standardized C interface to the BLAS. These routines
are prefixed with clapack_, and their prototypes can be found
in ATLAS/include/clapack.h. ATLAS natively provides only a
handful of LAPACK routines, so if you want to call something that is
not provided here, your best bet is to call the Fortran77 interface.
The good news is that, since this is a standard interface, all LAPACK
libraries should support it in the same way.
It is one of the unfortunate realities of open source development
that one of the few rewards that it should supply turns out in practice to
be a string of disparagement. I am talking, of course, about corresponding
with the people who are using the software you have produced and supported,
free of charge to them.
I understand why. A user has a problem using/installing/understanding
the software, and is understandably frustrated. Only when the frustration
has built up to a great degree is he/she motivated to write to the author.
At that point, the user emotionally feels that the author has done it to
him on purpose, and so, usually without realizing it, attacks the author.
What users should consider is that for an open source developer, support
requests constitute 90% of his contact with his users. If all of this
contact is negative, it does not lead to a desire on the author's part
to provide a lot more support, and in extreme cases, probably causes
developers to quit the project.
My guess is I get probably 2 or 3 messages per year that have something
positive to say about the software that I spend enormous amounts of
my life developing, maintaining and supporting, and provide to the
user for free. For every positive message, I get many many insulting
or denigrating replies. It always adds to the anger quotient to feel
that the user thinks so little of you that he thinks nothing of insulting
you while using your software and asking for your help, even though
you don't know him, he's not intending to do anything for you, and you
are providing this stuff for free.
So, I have added this discourse to the FAQ in the hopes of stimulating
users to consider the tone of their messages to me, but also
to open source projects in general. Remember that the open source
developer is not a vendor such as Microsoft or AOL, or even Red Hat, who is in a
financial relationship with you, and thus owes you a lot of
hand-holding. Instead the author has given all users of the package a
gift, and you are now asking for additional help in using it.
Another thing to keep in mind when you are writing your mail is that it
is entirely possible the question you are asking is answered in the
documentation. If I get several questions about the same topic, I usually
wind up creating a FAQ or errata entry about it. In a perfect world, this
would mean you would read the docs and not have to ask it. Even when assiduously
trying to do so, it is easy to miss the relevant doc. So, it is a good
idea to couch your language so that the author is not tempted to reply:
"RTFM, @ASD@!". If a user sends in a mail saying
"Hey, I can't get these files to link, any idea what's wrong?",
I don't mind giving him the link to the errata entry that's been there
since version 0.0.1 of the software. However, the guy with the more
common "Your libraries do not work" style of message simultaneously
demotivates me for work on the project, pisses me off, and generates a great
desire to return the anger to him with a good old RTFM diatribe.
Even if you are not going to read through all the docs, if you are submitting
a support request, at least take the time to read the
FAQ entry on how
to submit a support request. Almost half my users send in a message that
translates to "I haven't read the docs", and they post it to the "bugs"
list, which is reserved for developer-verified bugs.
Let me show the absolute best kind of support request I get:
Subject: Problems with 3.5.8
Hi,
I've been using it in my chemistry research in order to do XXX for a couple
of years now, and ATLAS is a great piece of software! However, I'm
now having a problem getting the newest release to work.
I'm already aching to help this guy. Not only has he indicated he appreciates
what I've produced, he's given me an idea of what he is using it for (something
I am always interested in).
He's also not prejudged that the software or I am wrong. He's having
a problem, which he later describes in detail, including the error report.
If it's a bug, I'll tell him so and post a fix. If it's a user error, I
won't mind letting him know the fix, and will feel more like helping him
again sometime.
As I said, I get maybe one or two messages of this type a year. I get
quite a few absolutely neutral messages, which are OK as well. They don't
imply that the problem is necessarily in my software or my brain, but
rather just report that a problem has been encountered. They go more like:
Subject : matvec problems
Whenever I call matvec with N=200, my install seems to get worse performance
than the reference BLAS. Any idea why? I include my timer below.
Thanks,
My Name
This is a simple request for help, that does allow for the interpretation
that there might be a problem with the installation, or perhaps a timer error
on the user's part, as well as the idea that ATLAS is screwed up. This is
good, because the majority of user requests turn out not to be errors in
ATLAS. Nonetheless, here is a more typical phrasing:
Subject : error in matvec
Your matrix-vector product has an error. For N=200, it is slower than the
reference BLAS! Please fix this.
Please keep in mind that user error is more common than package error, so
keep your message open to this interpretation. Do not use the phrase
"there is a bug in your software" (which half my support requests
use), unless you are absolutely confident it is a bug, and have verified it
by finding the problem in the actual code. Otherwise, the chance is too
great that the error is in user understanding, and you have just implied
the author screwed something up.
Another thing that users do that drives me crazy is to insist that
something done in a way they don't like is a bug, rather than a disagreement
between the developer and a user on how things should be done. I'm not sure
what the motivation here is, but users will often argue that anything
that didn't work the way they expected was a "bug", even if it was
documented to work another way. I, like many good software developers,
am a bit of a pedant, and insisting on calling your preference a bug in my
software often irritates me so much that I cannot objectively determine
if your preference is preferable (assuming it's something I have the
freedom to change, which I often don't).
Therefore, if you think there are improvements to be made, suggest them
as improvements, or things that would be convenient for you, rather
than terming them bugs.
Keep in mind that the author of a package probably considers himself more
knowledgeable than the majority of his users on issues closely related to
his project, and so it may grate upon him
to have users tell him that he has done things the wrong way, or doesn't
understand how "modern" libraries work, etc. So, I would avoid phrasings like
"can you fix the insane way config works?" or
"how about doing this in the correct way" (yes, I really get messages
like this).
Another important thing is to understand the context the author is working in.
If you see a piece of code that is truly horribly written, or works in
a particularly awkward way, you may feel justified in arguing that the
author has simply done it all wrong, should call it a bug, and fix it!
However, the author is not responsible only for the couple hundred lines
of code you are examining. In my case, I am responsible for roughly half
a million lines of code, as well as continuing development. Therefore, I may
actually agree that a particular way of doing things is sub-optimal, but if
it is well-tested to work on the enormous number of platforms ATLAS runs
under, I will often decide to leave it alone, in order to concentrate on
more important concerns. Even in the case where the author agrees with you
that the code is a complete POS, a little tact on your part will go a long
way. Understanding that the author is not always free to rewrite the section
of code of greatest interest to you will go even further.
Well, I am not confident that users will read this, but I am confident that
I will use its URL in replying to a lot of future user requests, so I think
this time away from development is well spent!
Cheers,
Clint