ATLAS Opteron Timings

OK, these timings are with ATLAS 3.5.6 on a dual 1.6GHz Opteron. I've been steadily tuning the DGEMM kernel for this processor for some time now, and I've finally got it to where I'm pretty happy with it. These results are for double precision only (I'm not through optimizing the other precisions). So, our first graph shows uniprocessor performance:

DGEMM and Factorizations on 1 Processor of 1.6GHz Opteron

There are several interesting things about this chart. The first is that ATLAS DGEMM gets 88% of theoretical peak! Less cool, though, we see that for small problems the factorizations do not get close to GEMM performance, and it's really bad for Cholesky. Even very large problems are surprisingly far from GEMM, in my opinion. I think I know how to help the small cases (ATLAS has always concentrated on the large problems, but on this monster machine, problems don't get large until at least 1000 :), but I need to do some profiling runs if I'm going to figure out what is going wrong on the large end there.
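
For anyone wanting to redo the percent-of-peak arithmetic, here is a minimal sketch. It is not ATLAS's timer code. The flop counts are the usual conventions for these routines (2N^3 for GEMM, 2N^3/3 for LU, N^3/3 for Cholesky), and the peak figure assumes 2 double precision flops per cycle on the Opteron, which is an assumption of this sketch rather than something stated above. All function names are made up for illustration.

```python
# Sketch: turn a (routine, N, seconds) timing into MFLOPS and percent of peak.
# Nothing here is taken from ATLAS itself; names are hypothetical.

def flop_count(routine, n):
    """Conventional flop counts for square, order-n problems."""
    counts = {
        "dgemm": 2.0 * n**3,           # C = A*B with n x n matrices
        "lu": 2.0 * n**3 / 3.0,        # LU factorization
        "cholesky": n**3 / 3.0,        # Cholesky factorization
    }
    return counts[routine]

def mflops(routine, n, seconds):
    """Achieved rate in MFLOPS for one timed call."""
    return flop_count(routine, n) / seconds / 1e6

def peak_mflops(clock_mhz, flops_per_cycle=2, nproc=1):
    """Theoretical peak. ASSUMPTION: 2 double precision flops per cycle."""
    return clock_mhz * flops_per_cycle * nproc

def percent_of_peak(rate_mflops, peak):
    return 100.0 * rate_mflops / peak

if __name__ == "__main__":
    peak = peak_mflops(1600)           # one 1.6GHz processor
    # Rate implied by the 88% figure quoted in the text, under the assumption above.
    implied = 0.88 * peak
    print(f"assumed peak: {peak:.0f} MFLOPS, 88% of that: {implied:.0f} MFLOPS")
```

Under that assumption the peak works out to 3200 MFLOPS per processor, so 88% corresponds to roughly 2816 MFLOPS. The same `mflops` helper also makes it easy to see why the factorizations are reported on the same axis as GEMM: each routine is charged its own flop count, so the curves are directly comparable.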


Next up are threaded timings using both processors:

DGEMM and Factorizations on 2 Processors of 1.6GHz Opteron

Again, I'm pretty happy with 85% of theoretical peak for good old DGEMM. I'm less happy with this crappy factorization performance, but there you have it.
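
The two DGEMM percentages can be turned into a rough threaded speedup. This is a back-of-the-envelope sketch, not a measurement: it assumes the 88% and 85% figures are taken at comparable problem sizes and that theoretical peak scales linearly with the number of processors. The function names are hypothetical.

```python
# Sketch: speedup implied by percent-of-peak figures on 1 and nproc processors.
# ASSUMPTION: peak scales linearly with processor count, so
#   speedup = (pct_np * nproc * peak_1p) / (pct_1p * peak_1p)

def implied_speedup(pct_peak_1p, pct_peak_np, nproc):
    return nproc * pct_peak_np / pct_peak_1p

def parallel_efficiency(pct_peak_1p, pct_peak_np, nproc):
    """Fraction of ideal (linear) speedup actually obtained."""
    return implied_speedup(pct_peak_1p, pct_peak_np, nproc) / nproc

if __name__ == "__main__":
    s = implied_speedup(88.0, 85.0, 2)
    e = parallel_efficiency(88.0, 85.0, 2)
    print(f"implied DGEMM speedup on 2 processors: {s:.2f}x ({100 * e:.1f}% efficiency)")
```

With the numbers quoted here, that comes to a speedup of roughly 1.93 on two processors for DGEMM.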

