ATLAS Timings



This page indexes some performance results for ATLAS. It will never hold anything like a comprehensive series of timings. In particular, it is mainly for timings that have been put into visual formats, so people can take in the results from a graph at a glance.

Unless otherwise noted, all timings were obtained using the ATLAS timers in ATLAS/bin, and we were flushing at least three times the actual cache size.
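To illustrate what flushing three times the cache size means in practice, here is a minimal sketch of a timing loop that evicts cached data before every run. It is not how the ATLAS timers in ATLAS/bin are implemented; the names flush_cache and time_with_flush, the 64-byte line size, and the cache size passed in are all assumptions made for illustration, and Python's interpreter overhead makes it far cruder than a compiled timer.

```python
import time

def flush_cache(flush_buf, stride=64):
    """Touch one byte per assumed 64-byte cache line so older data is evicted."""
    total = 0
    for i in range(0, len(flush_buf), stride):
        flush_buf[i] = (flush_buf[i] + 1) & 0xFF  # write forces the line into cache
        total += flush_buf[i]
    return total

def time_with_flush(kernel, cache_bytes, reps=3):
    """Best wall-clock time of kernel() over reps runs, flushing before each run."""
    # Flush area is three times the (assumed) cache size, as described above.
    flush_buf = bytearray(3 * cache_bytes)
    best = float("inf")
    for _ in range(reps):
        flush_cache(flush_buf)
        t0 = time.perf_counter()
        kernel()
        best = min(best, time.perf_counter() - t0)
    return best

# Hypothetical usage: time a stand-in kernel, assuming a 1 MB cache.
best_seconds = time_with_flush(lambda: sum(range(100000)), cache_bytes=1 << 20)
```

Without the flush, repeated runs on the same operands would find them already in cache and report optimistic times for small problems.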

A valuable resource for a greater variety of timings is the ATLAS results mail archive.


Typical ATLAS asymptotic DGEMM performance


Once you have an install, it can be helpful to check whether you are getting the expected performance. The following table gives a rough estimate of ATLAS's asymptotic DGEMM performance as a percentage of peak for a variety of systems. Some variance is expected: as CPU Mhz rises, you may expect percent of peak to diminish slightly, unless caches are enlarged or memory bus speed rises in proportion. Similarly, differing models of the same machine get a greater or lesser percent of peak (e.g., some USIII get roughly 87%, rather than the 82% shown below). To get an idea of what this is for your system, run DGEMM in the range of, say, N=1000-2000 (./xdmmtst -N 1000 2000 200), and take the best number. If you are well below the percent of peak listed for a similar platform, make sure you are using the architectural defaults and default flags; if you are and still get poor performance, enter a help request.

ARCH                     ATLAS    COMP            % Peak   PEAK (Gflop)   LINK
2.4Ghz Core2             3.9.5    gcc 4.2.3       89%      9.6            NO
900Mhz Itanium2          3.6.0    icc             90%      3.6            YES
1.6Ghz Opteron           3.6.0    gcc             88%      3.2            YES
1062Mhz UltraSPARC III   3.7.8    gcc 3.3         82%      2.124          NO
600Mhz Athlon            3.5.7    gcc 2.95.3      80%      1.2            YES
2.8Ghz Pentium4E         3.7.3    gcc 3.3.2       77%      5.6            YES
2.6Ghz Pentium4          3.6.0    gcc             77%      5.2            YES
1Ghz PentiumIII          3.7.7    gcc 2.95.3      76%      1              YES
1Ghz Efficeon            3.7.7    gcc 3.2         60%      2              YES
1.8Ghz PPC970FX (G5)     3.7.10   Apple gcc 3.3   69%      7.2            NO
3.0Ghz P4E EM64T         3.7.10   gcc RH 3.2.3    78%      6.0            NO

Note that these numbers reflect asymptotic DGEMM speed only, and having a high percentage does not necessarily make the machine faster for real computational tasks.
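As a rough aid for comparing your own run against the table, the arithmetic can be sketched as follows. This is only an illustration: the function names are hypothetical, the 2 flops per cycle for the Opteron is inferred from the table's peak column (3.2 Gflop at 1.6Ghz), the 5.7 second timing is made up, and the 2*N^3 flop count is the usual convention for an N x N matrix multiply. Check what units your timer actually reports before plugging numbers in.

```python
def peak_gflops(clock_ghz, flops_per_cycle):
    """Theoretical peak in Gflop/s: clock rate times flops completed per cycle."""
    return clock_ghz * flops_per_cycle

def dgemm_gflops(n, seconds):
    """Achieved Gflop/s for an N x N x N matrix multiply, counting 2*N^3 flops."""
    return 2.0 * n**3 / seconds / 1e9

def percent_of_peak(measured_gflops, peak):
    """Measured rate expressed as a percentage of theoretical peak."""
    return 100.0 * measured_gflops / peak

# 1.6Ghz Opteron row: peak of 3.2 Gflop implies 2 flops per cycle (inferred).
peak = peak_gflops(1.6, 2)

# Hypothetical measurement: the best N=2000 DGEMM run took 5.7 seconds.
measured = dgemm_gflops(2000, 5.7)
pct = percent_of_peak(measured, peak)
print(f"{measured:.2f} Gflop/s is {pct:.0f}% of a {peak:.1f} Gflop peak")
```

A result in the neighborhood of the table's figure for a similar platform suggests a healthy install; one far below it is the case where checking the architectural defaults is worthwhile.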


3.9 Developer timings:
Multiprocessor timings comparing new (3.9) and old (3.8) threading subsystems
Timings showing performance of the new threading system on 8- and 4-processor Core2 systems (Linux and Windows, respectively), and a 6-processor SiCortex MIPS node.
Old 3.7 developer timings:
Efficeon and Pentium III timings for ATLAS 3.7.3
Serial [D,S]GEMM and [D,S]LU results.
Pentium 4 and Pentium4E timings for ATLAS 3.7.3
Serial DGEMM and DLU results.
Opteron 64 v 32 bit timings for ATLAS 3.7.1
Serial SGEMM and DGEMM results.
Here are the 3.6 timings:
ATLAS 3.6.0 v 3.4.2 on a 1.6Ghz Opteron
LU, Cholesky and GEMM results.
ATLAS 3.6.0 v 3.4.2 on a 2.6Ghz P4HT
LU, Cholesky and GEMM results, including a graph showing the effects of hyperthreading.
ATLAS 3.6.0 v 3.4.2 on a 900Mhz Itanium 2
LU, Cholesky and GEMM results, including a graph showing the performance bug in TRSM that killed 3.4 performance.
Old 3.5 developer timings:
ATLAS 3.5.6 on a Dual 1.6Ghz Opteron
Serial and dual threaded results for LU and Cholesky factorizations, and matrix multiply (double precision only).
ATLAS 3.5.6 vs. ATLAS 3.4.1 on a 1.7Ghz P4.
Double and single precision real results for LU and matrix multiply. Includes a graph showing the effects of kernel cleanup.
ATLAS 3.5.6 vs. ATLAS 3.4.1 on my 1Ghz PIII laptop
Double and single precision real results for LU and matrix multiply.
ATLAS 3.5.7 vs. ATLAS 3.5.6 Athlon & Opteron
Double precision real results for 3.5.7's improved SYRK and Cholesky on an Athlon and Opteron.
ATLAS 3.5.10 on Opteron
Multiprecision ATLAS 3.5.10 results for GEMM, SYRK, LU, and Cholesky on one processor of a 1.6 Ghz Opteron.
