Large NB means more time in cleanup

The bad news about choosing a large NB is that applications will spend more of their time in cleanup. Suppose you choose a blocking factor of 120. Because your optimized kernel handles only full NB-sized blocks, many applications, whose problem dimensions are smaller than 120, will never call it at all, and will instead spend all their time in GEMM cleanup. Some applications are statically blocked, and if their own blocking factor is smaller than your NB, they can spend their entire time in cleanup even for large problems.
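To make the effect concrete, the following Python sketch estimates what share of a GEMM's multiply-adds falls outside full NB-sized blocks. It assumes a simplified model in which the optimized kernel handles only complete NB x NB x NB blocks and all remaining work goes to cleanup. The function name `kernel_fraction` and the problem shapes are hypothetical illustrations, not ATLAS code or measurements.

```python
def kernel_fraction(m: int, n: int, k: int, nb: int) -> float:
    """Fraction of the M*N*K multiply-adds that fall in full NB-sized blocks.

    Simplified model (assumption): the optimized kernel handles only full
    NB x NB x NB blocks; every remaining multiply-add goes to cleanup code.
    """
    full_m = (m // nb) * nb  # rows covered by complete blocks
    full_n = (n // nb) * nb  # columns covered by complete blocks
    full_k = (k // nb) * nb  # inner dimension covered by complete blocks
    return (full_m * full_n * full_k) / (m * n * k)


# Illustrative problem shapes (hypothetical, not measurements).
cases = [
    ("square, N=100", 100, 100, 100),
    ("square, N=1000", 1000, 1000, 1000),
    ("statically blocked, K=64", 2000, 2000, 64),
]
for nb in (40, 120):
    for label, m, n, k in cases:
        frac = kernel_fraction(m, n, k, nb)
        print(f"NB={nb:3d}  {label:26s}  cleanup share = {1 - frac:6.1%}")
```

Under this model, a 100x100x100 multiply never reaches the kernel with NB=120, and an application that always passes K=64 stays entirely in cleanup no matter how large M and N grow.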

Therefore, if you must choose a large NB to get adequate GEMM performance, you must pay unusual attention to optimizing cleanup. However, as the next section discusses, even if cleanup ran at the same speed as your best kernel, a large NB would still yield poor performance for many codes.

Clint Whaley 2012-07-10