## Speeding Up GER, GERU, GERC, HER, HER2, SYR and SYR2

All of these routines rely on the GER primitive for their performance. The hand-written primitives tried by ATLAS may be found in
   ATLAS/tune/blas/ger/CASES.


Most of the discussion of the GEMV primitives applies to the GER primitives as well, so I assume you are familiar with the concepts discussed above. As before, the routines to be timed are listed in a kernel description file, <pre>cases.dsc. GER has no transpose case, so this file holds a single list: it first gives the number of GER primitives to search, followed by that many primitive lines describing them.

GER primitive lines are of the form:

<ID> <flag> <Xunroll> <Yunroll> <filename> "<author(s)>"


• <ID>: integer greater than 0 that uniquely identifies this entry
• <flag>: integer flag, currently ignored
• <Xunroll>: unrolling of the loop over the X vector (i.e. the M-loop)
• <Yunroll>: unrolling of the loop over the Y vector (i.e. the N-loop)
• <filename>: name of the C source file for the primitive
• <author(s)>: name(s) of the author(s)
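To make the field layout concrete, here is a minimal sketch of how one primitive line could be split into its six fields. It is not ATLAS's own parser: the `GerCase` struct, the `parse_ger_case` function and the 255-character limits are all assumptions made for illustration, and the sketch further assumes that the filename contains no whitespace and that the author string is enclosed in double quotes.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical record for one GER primitive line (names are illustrative). */
typedef struct {
    int  id;             /* <ID>: must be greater than 0 */
    int  flag;           /* <flag>: currently ignored */
    int  xunroll;        /* <Xunroll>: unrolling of the M-loop */
    int  yunroll;        /* <Yunroll>: unrolling of the N-loop */
    char filename[256];  /* <filename>: assumed to contain no whitespace */
    char authors[256];   /* <author(s)>: text between the double quotes */
} GerCase;

/* Returns 1 if the line matches the six-field form, 0 otherwise. */
int parse_ger_case(const char *line, GerCase *out)
{
    int n;

    assert(line != NULL && out != NULL);
    n = sscanf(line, "%d %d %d %d %255s \"%255[^\"]\"",
               &out->id, &out->flag, &out->xunroll, &out->yunroll,
               out->filename, out->authors);
    /* All six fields must convert, and the ID must be positive. */
    return n == 6 && out->id > 0;
}
```

For example, a made-up line such as `1 0 4 2 my_ger.c "A. Author"` would yield ID 1, an X unrolling of 4 and a Y unrolling of 2; the filename and author in that line are placeholders, not entries from the ATLAS distribution.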

The API for the GER primitive is:

#if defined(SCPLX) || defined(DCPLX)
#ifdef Conj_
ATL_<pre>ger1c_a1_x1_yX
#else
ATL_<pre>ger1u_a1_x1_yX
#endif
#else
ATL_<pre>ger1_a1_x1_yX
#endif
(
const int M,       /* length of X vector */
const int N,       /* length of Y vector */
const SCALAR alpha,/* ignored, assumed to be one */
const TYPE *X,     /* pointer to X vector */
const int incX,    /* ignored, assumed to be one */
const TYPE *Y,     /* pointer to Y vector */
const int incY,    /* increment of Y vector; NOTE: NOT IGNORED */
TYPE *A,           /* pointer to column-major matrix */
const int lda      /* leading dimension of A, or row-stride */
);


Assumptions:

• alpha = 1
• incX = 1
• Column-major storage of A

Clint Whaley 2012-07-10