###

Importance of cleanup

In analyzing the importance of good cleanup for performance, it is necessary
to recognize the various types that can occur. The cleanup that user's can
supply to ATLAS is *one dimensional cleanup*, i.e., only one of the three
possible dimensions is less than . There is also 2 and 3 dimensional
cleanup. To give an idea of the relative importance of various catagories
of computation, it is roughly true that the matmul kernel is a cubic cost,
the one dimensional cleanup is a square cost, the two dimensional cleanup
is a linear cost, and the three dimensional cleanup is .
This is shown more formally below. Define mod , let
be the dimensional arguments to the *gemmK* and/or cleanup,
and remember that matrix multiplication takes flops, and we see
that the flop count for each catagory is:

Note that the simplified equations to the right of
assume the square case, i.e. . The above analysis can now
be grouped into the catagories of interest as in:

**kernel** :
**1D cleanup**:
**2D cleanup**:
**3D cleanup**:

The simplified equations to the right of the above provide a safe
upper bound on cleanup cost by setting
(in reality, , of course).
With this analysis, we can easily see why it is not important for the
user to be able to contribute 2D and 3D cleanup cases: remember that
all of these kernels are for ATLAS's *large-case* gemm. ATLAS has a
seperate small-case gemm, which is invoked when the problem is so small
that the copy cost is significant compared to the computational
costs. So, in the cases where the 2D cleanup or 3D cleanup
costs are prohibitive, this large-case gemm will probably not be used anyway.

Clint Whaley
2012-07-10