The generated code handles all cleanup where more
than one dimension is less than the blocking factor. This simplification
allows ATLAS to avoid having to test cases when selecting user
cleanup. Once the matrices in question are larger than the blocking factor,
cleanup with more than one dimension less than the blocking factor rapidly
stops being a
performance factor. Small matrices where this cleanup is a factor are
almost certainly going to be handled by ATLAS's small-case code anyway,
so it seems unlikely that this simplification will hurt performance in
practice. Section 2.7.5 shows this in a more formal way.
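
The rule described above can be sketched in C. This is only an illustration of the text, not ATLAS's selection code: the enum labels and the function classify_block are hypothetical names, and the sketch assumes that user cleanup is a candidate only when exactly one dimension is smaller than the blocking factor.

```c
#include <assert.h>

/* Hypothetical labels; these are not ATLAS identifiers. */
enum cleanup_source {
    NO_CLEANUP,         /* every dimension is a full block */
    USER_CLEANUP_OK,    /* exactly one partial dimension: user cleanup may be considered */
    GENERATED_CLEANUP   /* two or three partial dimensions: generated code only */
};

/* Classify one block of a matrix multiply with extents m, n, k
 * (each between 1 and nb) against the blocking factor nb. */
static enum cleanup_source classify_block(int m, int n, int k, int nb)
{
    assert(nb > 0 && m > 0 && n > 0 && k > 0);
    assert(m <= nb && n <= nb && k <= nb);

    /* Count the dimensions that fall short of a full block. */
    int partial = (m < nb) + (n < nb) + (k < nb);

    if (partial == 0)
        return NO_CLEANUP;
    if (partial == 1)
        return USER_CLEANUP_OK;
    return GENERATED_CLEANUP;
}
```

With a blocking factor of 40, for example, a 7 x 40 x 40 block would be eligible for user cleanup under this reading, while a 7 x 3 x 40 block would always go to the generated code.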

Users need to be very careful when supplying cleanup, because if the user
indicates that a dimension must be a compile-time variable, rather than
a runtime variable, ATLAS will generate up to routines to handle
user cleanup, and since user routines are compiled with all BETA
variants, it is possible to generate
cleanup cases, in addition
to ATLAS's generated cases. It is therefore recommended that the user
supply cleanup that uses runtime arguments whenever possible, and mark
kernels that take compile-time dimensions as not to be used for cleanup.
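
The difference between the two kinds of dimension can be sketched as follows. This is a simplified illustration and not ATLAS's kernel interface: the function names, the macro MB, the row-major layout, and the fixed C += A * B update (which does not model the BETA variants) are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical compile-time dimension, normally set with -DMB=<value>. */
#ifndef MB
#define MB 4
#endif

/* Runtime variant: m is an argument, so one compiled routine serves every
 * partial extent. Computes C += A * B for row-major A (m x k), B (k x n),
 * and C (m x n). */
static void cleanup_runtime_m(int m, int n, int k,
                              const double *a, const double *b, double *c)
{
    assert(m >= 0 && n >= 0 && k >= 0);
    for (int i = 0; i < m; i++) {
        for (int j = 0; j < n; j++) {
            double sum = 0.0;
            for (int p = 0; p < k; p++)
                sum += a[(size_t)i * k + p] * b[(size_t)p * n + j];
            c[(size_t)i * n + j] += sum;
        }
    }
}

/* Compile-time variant: the row count is fixed to MB when the file is
 * compiled, so this object code handles exactly one extent. Covering
 * another extent means compiling the file again with a different MB. */
static void cleanup_fixed_m(int n, int k,
                            const double *a, const double *b, double *c)
{
    for (int i = 0; i < MB; i++) {
        for (int j = 0; j < n; j++) {
            double sum = 0.0;
            for (int p = 0; p < k; p++)
                sum += a[(size_t)i * k + p] * b[(size_t)p * n + j];
            c[(size_t)i * n + j] += sum;
        }
    }
}
```

A fixed bound lets the compiler unroll the outer loop fully, which is the usual attraction of compile-time dimensions. The cost is one compiled copy per extent, which is why the runtime form is the safer default for cleanup.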
Clint Whaley 2012-07-10