This is shown more formally below. Define mod
, let
M, N, and K be the dimensional arguments to the gemmK and/or cleanup,
and remember that matrix multiplication takes 2MNK flops; we then see
that the flop count for each category is:
Note that the simplified equations on the right assume the square case,
i.e. M = N = K. The above analysis can now be grouped into the
categories of interest as in:
With this analysis, we can easily see why it is not important for the
user to be able to contribute 2D and 3D cleanup cases: remember that
all of these kernels are for ATLAS's large-case gemm. ATLAS has a
separate small-case gemm, which is invoked when the problem is so small
that the copy cost is significant compared to the computational costs.
So, in the cases where the 2D cleanup or 3D cleanup costs are
prohibitive, the large-case gemm will probably not be used anyway.
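
The flop accounting discussed above can be illustrated with a small sketch.
It is not taken from ATLAS itself: the function name cleanup_flops, the
blocking-factor parameter NB, and the rule used to assign flops to each
category (by how many of the three dimensions come from the remainder
after blocking) are all assumptions made for illustration. The only fact
relied on is that a gemm with dimensions M, N, K takes 2MNK flops.

```python
def cleanup_flops(M, N, K, NB):
    """Split the 2*M*N*K flops of a gemm into full-block and cleanup parts.

    NB is a hypothetical blocking factor. Each dimension is divided into a
    part that is a multiple of NB and a remainder; a product of parts is
    charged to the category named by how many remainders it uses.
    """
    # (full-block extent, remainder extent) for each of M, N, K
    parts = [(d - d % NB, d % NB) for d in (M, N, K)]
    counts = [0, 0, 0, 0]  # index = number of dimensions taken from a remainder
    for i in range(2):
        for j in range(2):
            for k in range(2):
                counts[i + j + k] += 2 * parts[0][i] * parts[1][j] * parts[2][k]
    return dict(zip(("full", "1D", "2D", "3D"), counts))


def cleanup_fractions(M, N, K, NB):
    """Return each category's share of the total 2*M*N*K flops."""
    flops = cleanup_flops(M, N, K, NB)
    total = 2 * M * N * K
    return {name: count / total for name, count in flops.items()}
```

Under these assumptions, a square problem with M = N = K = 100 and NB = 40
spends less than 1% of its flops in the 3D category and under 10% in the
2D category, and those shares shrink further as the problem grows. They
become large only for small problems, which is where the small-case gemm
would be expected to take over.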
Clint Whaley 2012-07-10