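This is shown more formally below. Let $M$, $N$, and $K$ be the dimensional arguments to the gemmK and/or cleanup, define $\eta_X = X \bmod N_B$ for $X \in \{M, N, K\}$, and remember that matrix multiplication takes $2MNK$ flops. The flop count for each category is then:

$$
\begin{aligned}
\text{gemmK:}\quad & 2(M-\eta_M)(N-\eta_N)(K-\eta_K) && \Rightarrow 2(N-\eta)^3\\
\text{M cleanup:}\quad & 2\eta_M(N-\eta_N)(K-\eta_K) && \Rightarrow 2\eta(N-\eta)^2\\
\text{N cleanup:}\quad & 2(M-\eta_M)\eta_N(K-\eta_K) && \Rightarrow 2\eta(N-\eta)^2\\
\text{K cleanup:}\quad & 2(M-\eta_M)(N-\eta_N)\eta_K && \Rightarrow 2\eta(N-\eta)^2\\
\text{MN cleanup:}\quad & 2\eta_M\eta_N(K-\eta_K) && \Rightarrow 2\eta^2(N-\eta)\\
\text{MK cleanup:}\quad & 2\eta_M(N-\eta_N)\eta_K && \Rightarrow 2\eta^2(N-\eta)\\
\text{NK cleanup:}\quad & 2(M-\eta_M)\eta_N\eta_K && \Rightarrow 2\eta^2(N-\eta)\\
\text{MNK cleanup:}\quad & 2\eta_M\eta_N\eta_K && \Rightarrow 2\eta^3
\end{aligned}
$$

Note that the simplified equations to the right of $\Rightarrow$ assume the square case, i.e. $M = N = K$, so that $\eta_M = \eta_N = \eta_K = \eta$. The above analysis can now be grouped into the categories of interest:

$$
\begin{aligned}
\text{gemmK:}\quad & 2(N-\eta)^3\\
\text{1D cleanup (M, N, or K):}\quad & 6\eta(N-\eta)^2\\
\text{2D cleanup (MN, MK, or NK):}\quad & 6\eta^2(N-\eta)\\
\text{3D cleanup (MNK):}\quad & 2\eta^3
\end{aligned}
$$

These four terms sum to $2\left((N-\eta)+\eta\right)^3 = 2N^3$, as they must.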
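To make the category split concrete, the following Python sketch partitions the $2MNK$ flops of a matrix multiply into gemmK, 1D, 2D, and 3D cleanup work for a given block factor $N_B$. It assumes, as in this analysis, that the gemmK kernel handles only full $N_B$-sized blocks and that every leftover $X \bmod N_B$ piece is handled by a cleanup kernel. The function name `cleanup_flops` and the value $N_B = 80$ in the usage are hypothetical, illustrative choices, not taken from ATLAS itself.

```python
def cleanup_flops(M, N, K, NB):
    """Split the 2*M*N*K flops of C = A*B (A is M x K, B is K x N) into
    gemmK, 1D, 2D, and 3D cleanup categories for block factor NB.
    Assumes (hypothetically) that the blocked kernel does only full NB-sized
    pieces and every remainder piece goes to a cleanup kernel."""
    # Split each dimension into its full-block part and its remainder.
    rM, rN, rK = M % NB, N % NB, K % NB
    fM, fN, fK = M - rM, N - rN, K - rK
    return {
        "gemmK": 2 * fM * fN * fK,
        # Exactly one dimension is a remainder (M, N, or K cleanup).
        "1D": 2 * (rM * fN * fK + fM * rN * fK + fM * fN * rK),
        # Exactly two dimensions are remainders (MN, MK, or NK cleanup).
        "2D": 2 * (rM * rN * fK + rM * fN * rK + fM * rN * rK),
        # All three dimensions are remainders (MNK cleanup).
        "3D": 2 * rM * rN * rK,
    }


flops = cleanup_flops(1000, 1000, 1000, 80)  # NB = 80 is a hypothetical choice
total = sum(flops.values())
assert total == 2 * 1000 ** 3  # the categories partition all the work
for name, count in flops.items():
    print(f"{name:6s} {count:>14,d}  ({100 * count / total:.3f}%)")
```

For $M = N = K = 1000$ and $N_B = 80$ the remainder is 40 in every dimension, and the 2D and 3D cleanup together account for 9,344,000 of the 2,000,000,000 flops, i.e. under 0.5% of the work.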
With this analysis, we can easily see why it is not important for the user to be able to contribute 2D and 3D cleanup cases. Because every partial dimension in a cleanup is smaller than $N_B$, in the square case the 2D cleanup costs fewer than $6N_B^2N$ flops and the 3D cleanup fewer than $2N_B^3$, so both become negligible next to the $2N^3$ total as $N$ grows. Remember that all of these kernels are for ATLAS's large-case gemm. ATLAS has a separate small-case gemm, which is invoked when the problem is so small that the copy cost is significant compared to the computational cost. So, in the cases where the 2D or 3D cleanup costs are prohibitive, this large-case gemm will probably not be used anyway.
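The claim that 2D and 3D cleanup only matter for small problems can be checked numerically. The sketch below computes, for the square case, the fraction of the $2N^3$ flops that falls into 2D and 3D cleanup, choosing $N$ so that the remainder is $N_B - 1$ (the largest possible) for an increasing number of full blocks. The function name `small_cleanup_fraction` and the block factor $N_B = 80$ are hypothetical choices for illustration only.

```python
def small_cleanup_fraction(N, NB):
    """Fraction of the 2*N**3 flops of a square N x N x N gemm that lands in
    2D and 3D cleanup, assuming every remainder piece has size N mod NB."""
    eta = N % NB
    full = N - eta
    two_d = 6 * eta * eta * full  # three 2D regions, each eta x eta x full
    three_d = 2 * eta ** 3        # the single eta x eta x eta corner
    return (two_d + three_d) / (2 * N ** 3)


NB = 80  # hypothetical blocking factor
for blocks in (1, 4, 16, 64):
    N = blocks * NB + NB - 1  # remainder NB - 1, the largest possible
    print(N, f"{small_cleanup_fraction(N, NB):.4%}")
```

With these assumptions the 2D and 3D share is close to half the work at $N = 159$, but drops below 0.1% by $N = 5199$, which is why these cases matter mainly for problem sizes where the small-case gemm would be the better choice anyway.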
Clint Whaley 2012-07-10