ATLAS Opteron Timings
OK, these timings are with ATLAS 3.5.6 on a dual 1.6GHz Opteron. I've been
steadily tuning the DGEMM kernel for this processor for some time now,
and I've finally got it where I'm pretty happy with it. This
is for double precision only (I'm not through optimizing the other precisions).
So, our first graph shows uniprocessor performance:
[Graph: DGEMM and Factorizations on 1 Processor of 1.6GHz Opteron]
There are several interesting things about this chart. The first is that
ATLAS DGEMM gets 88% of theoretical peak! Less cool, though, we see that
small-problem factorizations do not get close to GEMM performance, and it's
real bad for Cholesky. Even very large problems are surprisingly far from
GEMM in my opinion. I think I know how to help the small cases (ATLAS has
always concentrated on the large problems, but on this monster machine,
problems don't get large until the dimension is at least 1000 :), but I need
to do some profiling runs if I'm going to figure out what is going wrong on the
large end there.
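
For anyone wanting to check the "percent of theoretical peak" arithmetic, here is a rough Python sketch. It is not part of ATLAS or its timers. It assumes the Opteron retires 2 double-precision flops per cycle per processor (the text above does not state this), and it uses the conventional leading-order flop counts for N x N problems (2N^3 for GEMM, 2N^3/3 for LU, N^3/3 for Cholesky). The names `peak_flops` and `percent_of_peak`, and the sample timing, are made up for illustration.

```python
# Sketch: percent of theoretical peak from a measured wall time.
# ASSUMPTION: 2 double-precision flops per cycle per processor; this
# figure is not given in the text and should be checked for your chip.
FLOPS_PER_CYCLE = 2
CLOCK_HZ = 1.6e9  # 1.6GHz, from the text

# Conventional leading-order flop counts for an N x N problem
# (an assumption about the convention, not taken from the ATLAS timers).
FLOP_COUNTS = {
    "dgemm": lambda n: 2.0 * n**3,
    "lu": lambda n: 2.0 * n**3 / 3.0,
    "cholesky": lambda n: n**3 / 3.0,
}


def peak_flops(nproc=1):
    """Theoretical peak in flops/second for nproc processors."""
    return nproc * FLOPS_PER_CYCLE * CLOCK_HZ


def percent_of_peak(routine, n, seconds, nproc=1):
    """Achieved flop rate for one call, as a percentage of peak."""
    achieved = FLOP_COUNTS[routine](n) / seconds
    return 100.0 * achieved / peak_flops(nproc)


if __name__ == "__main__":
    # Hypothetical timing, chosen only to illustrate the arithmetic;
    # it is not a measurement from the graphs.
    n, secs = 1000, 0.71
    print("peak (1 proc): %.2f Gflop/s" % (peak_flops() / 1e9))
    print("DGEMM N=%d in %.2fs: %.1f%% of peak"
          % (n, secs, percent_of_peak("dgemm", n, secs)))
```

Passing `nproc=2` scales the peak for the threaded case, so the same wall time counts for half the percentage.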
Next up are threaded timings using both processors:
[Graph: DGEMM and Factorizations on 2 Processors of 1.6GHz Opteron]
Again, I'm pretty happy with 85% of theoretical peak for good old DGEMM.
I'm less happy with this crappy factorization performance, but there you have
it.