You can see the effect of the new SYRK in that Cholesky is actually faster than LU for larger problems in 3.6, whereas 3.4's Cholesky never comes close to LU. Other than this, the only real change is the speed increase in gemm, which of course speeds up the other routines.
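To show why a faster SYRK appears directly in Cholesky timings, here is a minimal pure-Python sketch of a generic right-looking blocked Cholesky factorization (A = L L^T). This is an illustration under stated assumptions, not ATLAS's actual code: ATLAS organizes its factorizations differently, and the function name `cholesky_blocked` and block-size parameter `nb` are hypothetical. Step 3, the symmetric update of the trailing lower triangle, is the SYRK. When n is much larger than nb, it performs most of the flops. In a blocked LU, the analogous trailing update is a general GEMM.

```python
import math


def cholesky_blocked(A, nb=2):
    """Right-looking blocked Cholesky (hypothetical illustration).

    A is a symmetric positive definite matrix given as a list of lists.
    It is overwritten with the lower-triangular factor L (A = L L^T);
    the strict upper triangle is zeroed before returning.
    """
    n = len(A)
    for k in range(0, n, nb):
        kb = min(nb, n - k)  # size of the current diagonal block
        end = k + kb

        # 1. Unblocked Cholesky of the kb x kb diagonal block.
        #    Earlier blocks' contributions were already subtracted by step 3.
        for j in range(k, end):
            s = A[j][j] - sum(A[j][p] ** 2 for p in range(k, j))
            A[j][j] = math.sqrt(s)
            for i in range(j + 1, end):
                A[i][j] = (A[i][j] - sum(A[i][p] * A[j][p] for p in range(k, j))) / A[j][j]

        # 2. Panel solve (TRSM): L21 = A21 * L11^{-T}, column by column.
        for i in range(end, n):
            for j in range(k, end):
                A[i][j] = (A[i][j] - sum(A[i][p] * A[j][p] for p in range(k, j))) / A[j][j]

        # 3. Trailing update (SYRK): A22 -= L21 * L21^T, lower triangle only,
        #    because the symmetric upper half is never read.
        for i in range(end, n):
            for j in range(end, i + 1):
                A[i][j] -= sum(A[i][p] * A[j][p] for p in range(k, end))

    # Clear the (never-updated) strict upper triangle so A holds exactly L.
    for i in range(n):
        for j in range(i + 1, n):
            A[i][j] = 0.0
    return A


if __name__ == "__main__":
    A = [[4, 12, -16], [12, 37, -43], [-16, -43, 98]]
    print(cholesky_blocked(A, nb=2))
```

Because step 3 touches only one triangle of the trailing matrix, it does roughly half the work of the full GEMM update in LU. That is why a well-tuned SYRK is required for Cholesky to match or beat LU, and not just a fast GEMM.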
Just to be cool, here's a graph showing the performance difference from using threads on a uniprocessor hyperthreaded P4:
You can see that HT doesn't start to win until very large problems are reached. Still, I was impressed it could deliver any advantage at all, since ATLAS already keeps the pipelines pretty full.