Speeding Up GER, GERU, GERC, HER, HER2, SYR and SYR2

All of these routines rely on the GER primitive for their performance. The hand-written primitives tried by ATLAS may be found in
   ATLAS/tune/blas/ger/CASES.

Most of the discussion of the GEMV primitives applies to the GER primitives as well, so I assume you have read and are familiar with the concepts discussed above. As before, the routines to be timed are given in a kernel description file, <pre>cases.dsc. GER does not have a transpose case, so this file first lists the number of GER primitives to search, followed by that many primitive lines describing them.

GER primitive lines are of the form:

<ID> <flag> <Xunroll> <Yunroll> <filename> "<author(s)>"
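As a rough illustration of this line format, the sketch below reads one primitive line into a struct. It is not ATLAS's own parser: the names `GerCase` and `parse_ger_case` are hypothetical, and it assumes the first four fields are integers and that the filename contains no whitespace.

```c
#include <stdio.h>

/* Hypothetical container for one GER primitive line; not an ATLAS type. */
typedef struct {
    int id;              /* <ID> */
    int flag;            /* <flag>, assumed here to be an integer */
    int xunroll;         /* <Xunroll> */
    int yunroll;         /* <Yunroll> */
    char filename[256];  /* <filename>, assumed to contain no whitespace */
    char authors[128];   /* <author(s)>, with the surrounding quotes removed */
} GerCase;

/* Parse one primitive line of the form
 *    <ID> <flag> <Xunroll> <Yunroll> <filename> "<author(s)>"
 * Returns 0 if all six fields were read, -1 otherwise.
 * An empty author string ("") is rejected by this sketch. */
int parse_ger_case(const char *line, GerCase *c)
{
    int n = sscanf(line, "%d %d %d %d %255s \"%127[^\"]\"",
                   &c->id, &c->flag, &c->xunroll, &c->yunroll,
                   c->filename, c->authors);
    return (n == 6) ? 0 : -1;
}
```

For example, a made-up line such as `1 0 4 2 ger_example.c "A. N. Author"` would parse to ID 1, flag 0, Xunroll 4, Yunroll 2.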

The API for the ger primitive is:

#if defined(SCPLX) || defined(DCPLX)
   #ifdef Conj_
      ATL_<pre>ger1c_a1_x1_yX
   #else
      ATL_<pre>ger1u_a1_x1_yX
   #endif
#else
   ATL_<pre>ger1_a1_x1_yX
#endif
   (
      const int M,       /* length of X vector */
      const int N,       /* length of Y vector */
      const SCALAR alpha,/* ignored, assumed to be one */
      const TYPE *X,     /* pointer to X vector */
      const int incX,    /* ignored, assumed to be one */
      const TYPE *Y,     /* pointer to Y vector */
      const int incY,    /* increment of Y vector; NOTE: NOT IGNORED */
      TYPE *A,           /* pointer to column-major matrix */
      const int lda      /* leading dimension of A, or row-stride */
   );
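To make the expected behavior concrete, here is a minimal unoptimized sketch of the operation such a primitive performs for real double precision: the rank-1 update A += X * Y^T, with alpha and incX taken to be one as the comments above state. The name `ref_dger1_a1_x1_yX` is hypothetical and is not an ATLAS symbol; the sketch also assumes a positive incY and does no unrolling, so it shows the semantics only and is not a tuned kernel.

```c
#include <stddef.h>

/* Reference sketch of A += X * Y^T for a column-major M x N matrix A.
 * Hypothetical name; real double precision only; assumes incY > 0. */
void ref_dger1_a1_x1_yX(const int M, const int N, const double alpha,
                        const double *X, const int incX,
                        const double *Y, const int incY,
                        double *A, const int lda)
{
    int i, j;

    (void)alpha;  /* ignored: assumed to be one */
    (void)incX;   /* ignored: assumed to be one */

    for (j = 0; j < N; j++) {
        /* incY is honored: element j of Y lives at Y[j*incY] */
        const double yj = Y[(ptrdiff_t)j * incY];
        /* column j of A starts lda elements after column j-1 */
        double *Aj = A + (ptrdiff_t)j * lda;

        /* inner loop walks down a column, which is contiguous in memory */
        for (i = 0; i < M; i++)
            Aj[i] += X[i] * yj;
    }
}
```

The column loop is outermost so that the inner loop touches contiguous memory in A. A tuned primitive would typically unroll both loops, which is presumably what the Xunroll and Yunroll fields of the primitive line describe.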

Assumptions:



Clint Whaley 2012-07-10