When presented with these options, you may be tempted to ask which cases you should optimize. You might equally well ask: isn't it obvious that the only case that matters is incX = incY = 1, so why bother building in this special-case flexibility at all?
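First, it must be acknowledged that on most systems, bandwidth constraints make it hard to get much out of optimizing any non-unit-stride case (strided access pulls whole cache lines into the cache but uses only part of each one). That doesn't mean it can't be done, though, at least to some degree.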
For an example of why you want some flexibility, consider the COPY routine. Of course, the most optimizable case is incX = incY = 1. However, one major use I make of this routine myself is to copy from noncontiguous storage into unit-stride storage, so that later accesses can be more efficient. This suggests that a special case with arbitrary incX and incY = 1 might be useful for this routine.
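A minimal sketch of the idea in C, assuming positive increments only (the reference BLAS convention for negative increments is omitted). The names `my_dcopy`, `dcopy_unit`, `dcopy_gather` and `dcopy_general` are hypothetical and are not ATLAS's actual kernel interface; the point is only to show a run-time dispatch where the "gather" case (arbitrary incX, incY = 1) gets its own kernel alongside the fully contiguous one.

```c
#include <stddef.h>

/* Hypothetical copy kernels for illustration; positive increments only. */

/* Fully contiguous case: the simplest and most optimizable. */
static void dcopy_unit(int n, const double *x, double *y)
{
    for (int i = 0; i < n; i++)
        y[i] = x[i];
}

/* Gather case: arbitrary incx, unit incy. The writes are contiguous,
 * so only the reads pay for the stride. */
static void dcopy_gather(int n, const double *x, int incx, double *y)
{
    for (int i = 0; i < n; i++)
        y[i] = x[(size_t)i * incx];
}

/* Fully general strided case. */
static void dcopy_general(int n, const double *x, int incx,
                          double *y, int incy)
{
    for (int i = 0; i < n; i++)
        y[(size_t)i * incy] = x[(size_t)i * incx];
}

/* Choose a kernel based on the increments at run time. */
void my_dcopy(int n, const double *x, int incx, double *y, int incy)
{
    if (incx == 1 && incy == 1)
        dcopy_unit(n, x, y);
    else if (incy == 1)
        dcopy_gather(n, x, incx, y);
    else
        dcopy_general(n, x, incx, y, incy);
}
```

For example, `my_dcopy(n, a, lda, col, 1)` would copy one row of a column-major matrix `a` with leading dimension `lda` into the contiguous buffer `col`, after which later operations on `col` can use unit-stride kernels.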
As an example of a non-unit fixed stride, consider conjugating a complex vector. That is essentially a real SCAL with incX = 2 and alpha = -1.0 applied to the imaginary parts, and if this were important to you, you could optimize exactly that case.
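The following C sketch assumes the usual interleaved complex layout (re0, im0, re1, im1, ...), so the imaginary parts start one element in and sit at a stride of 2. `my_dscal`, `my_zconj` and `my_zconj_special` are hypothetical names, and only positive increments are handled. The second function shows what "optimizing that exact case" might look like: with the stride fixed and alpha known to be -1.0, the multiply becomes a plain negation.

```c
#include <stddef.h>

/* Generic real SCAL (hypothetical stand-in): x[i*incx] *= alpha.
 * Positive increments only. */
void my_dscal(int n, double alpha, double *x, int incx)
{
    for (int i = 0; i < n; i++)
        x[(size_t)i * incx] *= alpha;
}

/* Conjugate n interleaved complex numbers by scaling only the
 * imaginary parts: a real SCAL starting at x+1, stride 2, alpha -1. */
void my_zconj(int n, double *x)
{
    my_dscal(n, -1.0, x + 1, 2);
}

/* Hand-specialized version of exactly that case: the stride is a
 * compile-time constant and the multiply is replaced by negation. */
void my_zconj_special(int n, double *x)
{
    for (int i = 0; i < n; i++)
        x[2 * (size_t)i + 1] = -x[2 * (size_t)i + 1];
}
```

Both functions leave the real parts untouched; the specialized one simply removes the run-time stride and alpha from the inner loop.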
Ultimately, one can never foresee when flexibility will be needed, anyway. With the present scheme, a user who knew he was always accessing vectors with increments of 50, 100, or 500 could create special cases for them . . .
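One way such a user might do this, sketched in C under the assumption of positive increments: a macro stamps out a SCAL kernel per fixed increment, so the stride becomes a compile-time constant the compiler can fold into address arithmetic, and a `switch` falls back to a general kernel for every other increment. The macro and function names are hypothetical and do not correspond to any ATLAS interface.

```c
#include <stddef.h>

/* Generate a SCAL kernel whose stride is a compile-time constant.
 * Hypothetical names; positive increments only. */
#define DEFINE_DSCAL_FIXED(INC)                                  \
    static void dscal_inc##INC(int n, double alpha, double *x)   \
    {                                                            \
        for (int i = 0; i < n; i++)                              \
            x[(size_t)i * (INC)] *= alpha;                       \
    }

DEFINE_DSCAL_FIXED(50)
DEFINE_DSCAL_FIXED(100)
DEFINE_DSCAL_FIXED(500)

/* Fallback for any increment without a special case. */
static void dscal_general(int n, double alpha, double *x, int incx)
{
    for (int i = 0; i < n; i++)
        x[(size_t)i * incx] *= alpha;
}

/* Route the known increments to their fixed-stride kernels. */
void my_dscal_dispatch(int n, double alpha, double *x, int incx)
{
    switch (incx) {
    case 50:  dscal_inc50(n, alpha, x);  break;
    case 100: dscal_inc100(n, alpha, x); break;
    case 500: dscal_inc500(n, alpha, x); break;
    default:  dscal_general(n, alpha, x, incx); break;
    }
}
```

Whether a fixed-stride kernel actually beats the general one depends on the machine; given the bandwidth limits on strided access, any gain is likely to be modest.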
All that said, incX = incY = 1 is often the only case where optimization will have a noticeable effect, so that is where I'd concentrate my efforts unless there is some specific reason to do otherwise.
Clint Whaley 2012-07-10