There are several assumptions that must hold for a user-supplied GEMV
primitive. First, the loop ordering must be that implied by the `<flag>`
setting the user supplies in the primitive description file, as discussed in
Section .
Each primitive makes assumptions about the arguments it handles,
and these assumptions are reflected in the routine name. The function name of
a GEMV primitive is:

`ATL_<pre>gemv<Trans>_a1_x1_<betanam>_y1`

where:

- `<pre>` is replaced by the precision prefix: `s`, `d`, `c`, or `z`.
- `<Trans>` is replaced by the transpose specifier:
  - `N`: NoTranspose
  - `T`: Transpose
  - `C`: Conjugate Transpose
  - `Nc`: NoTranspose, with conjugation
- `<betanam>` is replaced by the beta specifier this kernel supplies. All GEMV kernels must supply the following beta specifiers and names:
  - `b0`: `beta = 0`
  - `b1`: `beta = 1`
  - `bXi0`: for complex GEMV only, specifies when `beta != 0` and `beta != 1`, but the imaginary component is zero.
  - `bX`: beta is an input variable without known characteristics.
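To make the naming scheme concrete, the following minimal sketch assembles a primitive name from its three variable parts. The helper `gemv_primitive_name` is hypothetical and is not part of ATLAS; it only illustrates how the prefix, transpose specifier, and beta specifier fit into the fixed `_a1_x1_..._y1` pattern.

```c
#include <stdio.h>

/* Hypothetical helper (not part of ATLAS): builds a GEMV primitive name
 * from its precision prefix, transpose specifier, and beta specifier.
 * Returns what snprintf returns: the length of the full name. */
int gemv_primitive_name(char *buf, size_t len, char pre,
                        const char *trans, const char *betanam)
{
   return snprintf(buf, len, "ATL_%cgemv%s_a1_x1_%s_y1", pre, trans, betanam);
}
```

For example, passing `'d'`, `"N"`, and `"b0"` produces `ATL_dgemvN_a1_x1_b0_y1`.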

For a given gemv primitive (either NoTranspose or Transpose), if the cpp macro
`Conj_` is defined, we want the conjugate form of that transpose setting
(i.e., `Nc` or `C`).
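As a rough illustration of how a single source file might serve both the plain and the conjugated form, the sketch below switches the sign of the imaginary part of A on the `Conj_` macro. The macro `A_IMAG` and the function `sketch_cmac` are hypothetical names chosen for this example. The sketch assumes complex values are stored as interleaved (real, imaginary) pairs, which is consistent with TYPE being a real type for the complex precisions.

```c
/* Hypothetical helper, not ATLAS code. Complex values are assumed to be
 * stored as interleaved (real, imaginary) pairs of floats. */
#ifdef Conj_
   #define A_IMAG(ai) (-(ai))   /* use conj(A): flip the imaginary part */
#else
   #define A_IMAG(ai) (ai)      /* use A unchanged */
#endif

/* y += a * x for one complex element of A and one complex element of x. */
void sketch_cmac(const float *a, const float *x, float *y)
{
   const float ar = a[0], ai = A_IMAG(a[1]);
   y[0] += ar * x[0] - ai * x[1];
   y[1] += ar * x[1] + ai * x[0];
}
```

Compiling the same file once with and once without `-DConj_` would then yield the conjugated and unconjugated variants of the inner operation.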

Each file is further compiled with differing cpp settings to generate the various beta cases. The beta macro settings and their meanings are:

| CPP MACRO | MEANING |
|---|---|
| BETA0 | Primitive should provide `y = A*x` |
| BETA1 | Primitive should provide `y = A*x + y` |
| BETAX | Primitive should provide `y = A*x + beta*y` |
| BETAXI0 | For complex only, primitive should provide `y = A*x + beta*y`, where the imaginary component of beta is zero. |
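The sketch below shows one way a single file could produce several beta cases under these macros. It is an illustration only, for the real double-precision NoTranspose case. The name `sketch_dgemvN` and its trimmed argument list are hypothetical, the loop ordering is an arbitrary choice rather than one tied to any `<flag>` setting, and the fallback to `BETAX` when no macro is defined exists only so the fragment compiles standalone.

```c
#include <stddef.h>

/* Illustrative only: compile with -DBETA0, -DBETA1, or -DBETAX.
 * With none defined, this sketch falls back to BETAX. */
#if !defined(BETA0) && !defined(BETA1) && !defined(BETAX)
   #define BETAX
#endif

/* Hypothetical name. Y has length M, X has length N, A is column-major. */
void sketch_dgemvN(const int M, const int N, const double *A, const int lda,
                   const double *X, const double beta, double *Y)
{
   int i, j;

   /* Apply beta to y once, up front; only this step differs per case. */
#if defined(BETA0)
   (void)beta;
   for (i = 0; i < M; i++) Y[i] = 0.0;     /* y is overwritten, never read */
#elif defined(BETA1)
   (void)beta;                             /* y = A*x + y: keep y as is */
#else
   for (i = 0; i < M; i++) Y[i] *= beta;   /* general beta */
#endif

   /* Accumulate A*x one column at a time (alpha, incX, incY all one). */
   for (j = 0; j < N; j++)
   {
      const double xj = X[j];
      const double *Aj = A + (size_t)j * lda;
      for (i = 0; i < M; i++) Y[i] += Aj[i] * xj;
   }
}
```

In the `BETA0` branch the sketch stores zeros instead of multiplying by beta, so that any NaN or Inf already present in `Y` cannot leak into the result.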

In terms of the BLAS API, the GEMV kernels additionally assume:

- `incX = 1`
- `incY = 1`
- Column-major storage of A

Higher level ATLAS routines ensure these assumptions are true before calling the primitive.

Therefore, the routine:

- `ATL_dgemvN_a1_x1_b0_y1` supplies a primitive doing NoTranspose gemv on a column-major array, with `alpha = 1`, `incX = 1`, `beta = 0`, and `incY = 1`.
- `ATL_cgemvNc_a1_x1_bXi0_y1` supplies a primitive doing NoTranspose gemv on a column-major array whose elements should be conjugated before the multiplication, with `alpha = 1`, `incX = 1`, `incY = 1`, and a beta whose imaginary component is zero.

For greater understanding of how these CPP macros are used to compile multiple primitives from one file, examine the provided CASES files.

The API of the primitive is:

    void ATL_<pre>gemv<Trans>_a1_x1_<betanam>_y1
    (
       const int M,         /* length of Y vector */
       const int N,         /* length of X vector */
       const SCALAR alpha,  /* ignored, assumed to be one */
       const TYPE *A,       /* pointer to column-major matrix */
       const int lda,       /* leading dimension of A, or row-stride */
       const TYPE *X,       /* vector to multiply A by */
       const int incX,      /* ignored, assumed to be one */
       const SCALAR beta,   /* value of beta */
       TYPE *Y,             /* output vector */
       const int incY       /* ignored, assumed to be one */
    );

where,

| `<pre>` | s | d | c | z |
|---|---|---|---|---|
| SCALAR | float | double | float* | double* |
| TYPE | float | double | float | double |

Note that the meanings of M and N differ slightly from those used by the Fortran77 API: they give the vector lengths, not the array dimensions.
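The difference matters most in the Transpose case. Since Y has length M and X has length N, computing `y = A^T*x + beta*y` implies that the stored column-major A has N rows and M columns. The following reference-style sketch spells this out. The name `sketch_dgemvT` and its trimmed argument list are hypothetical, and the code is meant only to show the index convention, not to be an optimized kernel.

```c
#include <stddef.h>

/* Hypothetical reference routine, not ATLAS code: y = A^T * x + beta * y,
 * with the primitive's convention that M = length of Y and N = length of X.
 * The stored column-major A therefore has N rows and M columns. */
void sketch_dgemvT(const int M, const int N, const double *A, const int lda,
                   const double *X, const double beta, double *Y)
{
   int i, j;

   for (j = 0; j < M; j++)
   {
      const double *Aj = A + (size_t)j * lda;   /* column j of A */
      double dot = 0.0;

      for (i = 0; i < N; i++) dot += Aj[i] * X[i];
      Y[j] = dot + beta * Y[j];
   }
}
```

For a stored 3x2 matrix, the Fortran77 API would be called with M = 3 and N = 2, whereas this sketch would be called with M = 2 (the length of Y) and N = 3 (the length of X).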

Clint Whaley 2012-07-10