The Main GEMM Kernel, gemmK

There are actually three gemmK kernels, one for each case of $\beta$, which perform the operations $C \leftarrow A^T B$ ($\beta = 0$), $C \leftarrow A^T B + C$ ($\beta = 1$), and $C \leftarrow A^T B + \beta C$ (arbitrary $\beta$). All input arrays ($A, B, C$) are column-major (don't worry: the same kernels are still used as the performance kernels for the row-major BLAS as well). Additionally, $A^T$ and $B$ are in block-major format, such that $lda = ldb = M = N = K = N_B$.
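
To make the operation concrete, here is a minimal reference sketch in C of what such a kernel computes. It is an illustration only, not the actual kernel interface: the name `ref_gemmK`, the fixed block size `NB = 4`, the use of `double`, and the single routine that branches on `beta` (rather than three separate kernels) are all assumptions made for the example. Because both blocks are column-major with leading dimension `NB`, each entry of $A^T B$ is the dot product of a column of `A` with a column of `B`, so both operands are read contiguously.

```c
#include <assert.h>

#define NB 4  /* hypothetical block size; stands in for N_B */

/*
 * Reference sketch: C <- A^T B + beta*C on one NB x NB block.
 * A and B are column-major blocks with lda = ldb = NB.
 * C is column-major with leading dimension ldc (ldc >= NB).
 */
static void ref_gemmK(const double *A, const double *B,
                      double beta, double *C, int ldc)
{
    assert(ldc >= NB);
    for (int j = 0; j < NB; j++) {          /* column of C and of B */
        for (int i = 0; i < NB; i++) {      /* row of C = column of A */
            double dot = 0.0;
            /* (A^T B)(i,j) = sum over k of A(k,i) * B(k,j) */
            for (int k = 0; k < NB; k++)
                dot += A[k + i * NB] * B[k + j * NB];

            if (beta == 0.0)
                C[i + j * ldc] = dot;               /* C <- A^T B          */
            else if (beta == 1.0)
                C[i + j * ldc] += dot;              /* C <- A^T B + C      */
            else
                C[i + j * ldc] = dot + beta * C[i + j * ldc]; /* general beta */
        }
    }
}
```

In the $\beta = 0$ branch the old contents of `C` are never read, which is one reason that case is worth separating from the general one.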


Clint Whaley 2012-07-10