ATLAS 3.6.0 Opteron Timings

This first graph compares the performance of LU, Cholesky, and matrix multiply on a 1.6 GHz Opteron. We see what we'd expect: 3.6.0 dominates in performance because its DGEMM reaches 88% of peak, and the improved SYRK shows up in the fact that Cholesky runs competitively with LU in 3.6, where it did not in 3.4.

Matrix multiply, LU, & Cholesky on 1.6 GHz Opteron
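To make "percent of peak" concrete, here is a minimal sketch of how a timing is usually turned into MFLOPS and compared against a theoretical peak. The function names are hypothetical, the flop counts are the standard leading-order terms for n x n problems, and the figure of 2 double precision flops per cycle for the Opteron is an assumption made for illustration; none of this is ATLAS's own timer code.

```python
def gemm_flops(n):
    """Leading-order flop count for an n x n matrix multiply."""
    return 2.0 * n ** 3


def lu_flops(n):
    """Leading-order flop count for LU factorization of an n x n matrix."""
    return (2.0 / 3.0) * n ** 3


def cholesky_flops(n):
    """Leading-order flop count for Cholesky factorization of an n x n matrix."""
    return n ** 3 / 3.0


def mflops(flops, seconds):
    """Convert a flop count and a wall-clock time into MFLOPS."""
    return flops / seconds / 1.0e6


def percent_of_peak(achieved_mflops, clock_mhz, flops_per_cycle):
    """Achieved rate as a percentage of theoretical peak.

    flops_per_cycle is an assumption supplied by the caller
    (2.0 is assumed below for double precision on the Opteron).
    """
    peak_mflops = clock_mhz * flops_per_cycle
    return 100.0 * achieved_mflops / peak_mflops


# Under the assumed 2 flops/cycle, a 1.6 GHz part peaks at 3200 MFLOPS,
# so 88% of peak would correspond to roughly 2816 MFLOPS.
print(percent_of_peak(2816.0, 1600.0, 2.0))
```

Since Cholesky does about half the flops of LU on a matrix of the same size, the two can only be compared fairly as rates (MFLOPS), not as raw times.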

This next graph, showing the difference between 64-bit and 32-bit performance, may surprise more people:

32-bit vs 64-bit Opteron Results

The gap is even larger for single precision. There are a couple of reasons for the 64-bit performance lead. The first won't go away: in 64-bit mode, you have access to double the number of integer and floating point registers. I would guess that somewhere between one eighth and one quarter of this gap is due to this. The rest of the gap comes from the fact that I have not applied all the optimization tricks to the 32-bit code that I did to the 64-bit code. Since I am perpetually short of time, I simply didn't do much special tuning for 32-bit mode. In fact, it is using a kernel I wrote for the Pentium 4.

I'm pretty happy with this decision, except for the usual gall: Windows. Under Windows right now, all people have is 32-bit mode, and so they get only the crappy performance. Still, Windows (and Windows/cygnus) should eventually get to 64-bit mode, and then you'll see the better curves again.
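As a rough illustration of why the extra registers in 64-bit mode matter for a matrix multiply kernel, the sketch below uses a toy scalar model of register blocking. It is not the ATLAS kernel and it ignores SIMD, memory operands, and instruction scheduling. The model assumes an m x n block of accumulators is kept in registers, plus m registers for values from A and n for values from B; each inner-loop step then performs 2*m*n flops for m+n loads. The function name and the model itself are assumptions made for illustration.

```python
def best_register_block(num_regs):
    """Return (m, n, flops_per_load) for the best m x n block in the toy model.

    Toy model (an assumption, not how ATLAS chooses its kernels):
      registers needed = m*n accumulators + m values of A + n values of B
      work per step    = 2*m*n flops for m + n loads
    """
    best = None
    for m in range(1, num_regs + 1):
        for n in range(1, num_regs + 1):
            if m * n + m + n > num_regs:
                continue  # block does not fit in the register file
            ratio = 2.0 * m * n / (m + n)
            if best is None or ratio > best[2]:
                best = (m, n, ratio)
    return best


# 8 registers (32-bit mode) versus 16 registers (64-bit mode)
print(best_register_block(8))
print(best_register_block(16))
```

In this model, 8 registers allow at most a 2 x 2 block (2 flops per load), while 16 allow a 3 x 3 block (3 flops per load), so the larger register file lets the kernel do more arithmetic per value fetched. How much of the measured gap this explains on real hardware is a separate question that the model does not answer.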
