Importance of cleanup

In analyzing the importance of good cleanup for performance, it is necessary to recognize the various types that can occur. The cleanup that user's can supply to ATLAS is one dimensional cleanup, i.e., only one of the three possible dimensions is less than $N_B$. There is also 2 and 3 dimensional cleanup. To give an idea of the relative importance of various catagories of computation, it is roughly true that the matmul kernel is a cubic cost, the one dimensional cleanup is a square cost, the two dimensional cleanup is a linear cost, and the three dimensional cleanup is $O(1)$.

This is shown more formally below. Define $M_r = M$ mod $N_B$, let $m, n, k$ be the dimensional arguments to the gemmK and/or cleanup, and remember that matrix multiplication takes $2 M N K$ flops, and we see that the flop count for each catagory is:

Note that the simplified equations to the right of $\Longrightarrow$ assume the square case, i.e. $M = K = N$. The above analysis can now be grouped into the catagories of interest as in:

The simplified equations to the right of the $<$ above provide a safe upper bound on cleanup cost by setting $N_r = N_B$ (in reality, $0 < N_r < N_B$, of course).

With this analysis, we can easily see why it is not important for the user to be able to contribute 2D and 3D cleanup cases: remember that all of these kernels are for ATLAS's large-case gemm. ATLAS has a seperate small-case gemm, which is invoked when the problem is so small that the $2 N^2$ copy cost is significant compared to the $2 N^3$ computational costs. So, in the cases where the $O(N)$ 2D cleanup or $O(1)$ 3D cleanup costs are prohibitive, this large-case gemm will probably not be used anyway.

Clint Whaley 2012-07-10