ATLAS 3.5.10 Multiprecision Opteron Timings

OK, ATLAS 3.5.10 was all about getting the other precisions to run as fast as double (which I've been steadily tuning for a while). All of these timings are on one processor of a dual 1.6 GHz Opteron. The following graphs show the performance of GEMM, LU, SYRK, and Cholesky for all four precisions.
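
For reading the graphs, here is a rough sketch of how a wall-clock time turns into a rate and a percentage of peak. It is not ATLAS's timer code. The flop counts are the conventional leading-order ones, the peak figures (2 double or 4 single precision flops per cycle at 1.6 GHz) are my assumption about this Opteron, and the 0.71 second timing is made up purely for illustration.

```python
# Conventional leading-order flop counts (lower-order terms dropped).
# A complex multiply-add costs 8 real flops versus 2 for a real one,
# so complex routines are charged 4x the real count.

def gemm_flops(m, n, k, is_complex=False):
    """C = A*B + C with A m-by-k and B k-by-n."""
    return (8 if is_complex else 2) * m * n * k

def syrk_flops(n, k, is_complex=False):
    """C = A*A^T + C, updating only one triangle of the n-by-n C."""
    return (4 if is_complex else 1) * n * (n + 1) * k

def lu_flops(n, is_complex=False):
    """LU factorization of an n-by-n matrix, about (2/3) n^3."""
    return (4 if is_complex else 1) * 2.0 * n**3 / 3.0

def cholesky_flops(n, is_complex=False):
    """Cholesky factorization of an n-by-n matrix, about (1/3) n^3."""
    return (4 if is_complex else 1) * n**3 / 3.0

# ASSUMPTION: theoretical peak in MFLOPS for a 1.6 GHz Opteron,
# 2 flops/cycle in double precision and 4 flops/cycle in single.
PEAK_MFLOPS = {"s": 6400.0, "c": 6400.0, "d": 3200.0, "z": 3200.0}

def percent_of_peak(flops, seconds, peak_mflops):
    """Convert a flop count and elapsed time into percent of peak."""
    mflops = flops / seconds / 1.0e6
    return 100.0 * mflops / peak_mflops

# HYPOTHETICAL timing, not a measurement: a 1000x1000x1000 dgemm in 0.71 s.
pct = percent_of_peak(gemm_flops(1000, 1000, 1000), 0.71, PEAK_MFLOPS["d"])
print(f"dgemm: {pct:.1f}% of peak")
```

Charging complex routines four times the real flop count is what makes the real and complex curves directly comparable on the same axis.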

[s,c,d,z]GEMM performance on 1 processor of 1.6 GHz Opteron

OK, this is generally what we would like to see. All precisions top out at around 88% of peak. Double precision real and complex are within timer resolution of each other, as we would hope. Single precision complex is not quite as good as single precision real; this is due to the extra shuffling required at the end of the K-loop. I think it could be made to run a little faster, but frankly I got tired of messing with it. With GEMMs that run at the right speed, how about LU:

[s,c,d,z]LU performance on 1 processor of 1.6 GHz Opteron

Looks good. What if you've got symmetric matrices:

[s,c,d,z]SYRK performance on 1 processor of 1.6 GHz Opteron

Well, for large problems the complex cases appear to be a little slower than the real ones. I have a feeling I may have messed up CacheEdge for complex SYRK, so that it is not using the cache as effectively. I need to investigate this. At any rate, the gap is not too large. Last up is Cholesky:

[s,c,d,z]Cholesky performance on 1 processor of 1.6 GHz Opteron
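
Going back to the single precision complex GEMM remark: the kernel itself is not shown on this page, so the following is only one plausible picture of why a complex kernel has lane-combining work after the K-loop that a real kernel does not. It models a complex dot product over interleaved (real, imaginary) storage, with two-element lists standing in for SIMD registers. The function name and accumulator layout are hypothetical, not taken from ATLAS.

```python
def complex_dot_interleaved(a, b):
    """Dot product (no conjugation) of two complex vectors stored as
    [re0, im0, re1, im1, ...]. Returns (real_part, imag_part)."""
    # Each accumulator models one 2-lane SIMD register.
    acc_same = [0.0, 0.0]   # lanes hold sum(ar*br) and sum(ai*bi)
    acc_cross = [0.0, 0.0]  # lanes hold sum(ar*bi) and sum(ai*br)

    # The K-loop: nothing but lane-wise multiply-adds.
    for k in range(0, len(a), 2):
        ar, ai = a[k], a[k + 1]
        br, bi = b[k], b[k + 1]
        acc_same[0] += ar * br
        acc_same[1] += ai * bi
        acc_cross[0] += ar * bi  # uses b with its two lanes swapped
        acc_cross[1] += ai * br

    # After the K-loop: combine across lanes. In a real SIMD kernel
    # this step is where the extra shuffles would be needed.
    real_part = acc_same[0] - acc_same[1]
    imag_part = acc_cross[0] + acc_cross[1]
    return real_part, imag_part

# (1+2i)*(5+6i) + (3+4i)*(7+8i)
result = complex_dot_interleaved([1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0])
print(result)
```

The point of deferring the subtract and add until after the loop is that the loop body stays as plain multiply-adds. The price is a fixed amount of cross-lane work per output element, which a real kernel never pays.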

