Limiting Factors in a Dot Product Calculation

The dot product is a simple calculation which reduces two vectors to the sum of their element-wise products. The calculation has a variety of applications and is used heavily in neural networks, linear regression and search. What are the constraints on its computational performance? The combination of its computational simplicity and its streaming nature suggests that, for large vectors, memory bandwidth rather than arithmetic is likely to be the limiting factor.
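
A minimal sketch of the calculation follows. It is written in Python for readability, not speed. The names `dot` and `arithmetic_intensity` are illustrative and do not come from any library. The helper assumes 8-byte (double precision) elements, each read from memory exactly once, and counts one multiply and one add per element pair. This makes the ratio of work to memory traffic explicit.

```python
from typing import Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Reduce two equal-length vectors to the sum of their element-wise products."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    total = 0.0
    for x, y in zip(a, b):
        total += x * y  # one multiply and one add per pair of loaded elements
    return total


def arithmetic_intensity(n: int, element_bytes: int = 8) -> float:
    """Floating point operations per byte loaded for a length-n dot product.

    Assumption: every element of both vectors is loaded from memory once,
    and each element pair costs 2 flops (multiply + add).
    """
    flops = 2 * n
    bytes_loaded = 2 * n * element_bytes
    return flops / bytes_loaded


print(dot([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))  # 32.0
print(arithmetic_intensity(1_000_000))        # 0.125
```

At roughly 0.125 flops per byte, the work per load is tiny. A core capable of many flops per cycle will therefore usually spend its time waiting on loads once the vectors no longer fit in cache.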

The conventional wisdom is that the stages of a task should be pipelined, so you don’t need to wait for one stage to complete before the next one starts. It surprises me that you can sometimes do better by performing each stage of a pipeline over a short batch, and this opens up useful optimisation opportunities.
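
The difference between the two approaches is easiest to see in the shape of the loops. Below is a structural sketch comparing two stages applied element by element with the same two stages applied over short batches. The names `fused`, `batched`, `stage1` and `stage2` are hypothetical, and so is the default `batch_size` of 256. Python will not reproduce any performance difference. In a compiled language, the batched form gives each loop a small, uniform body, which can make each stage easier for a compiler to vectorise. Keeping the batch short keeps the intermediate results in cache.

```python
from typing import Callable, List

Stage = Callable[[float], float]


def fused(values: List[float], stage1: Stage, stage2: Stage) -> List[float]:
    """Pipelined style: each element passes through both stages before the next one."""
    return [stage2(stage1(v)) for v in values]


def batched(values: List[float], stage1: Stage, stage2: Stage,
            batch_size: int = 256) -> List[float]:
    """Batched style: run stage1 over a short batch, then stage2 over that batch."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    out: List[float] = []
    for start in range(0, len(values), batch_size):
        chunk = values[start:start + batch_size]
        intermediate = [stage1(v) for v in chunk]    # whole batch through stage 1
        out.extend(stage2(v) for v in intermediate)  # then whole batch through stage 2
    return out


data = [float(i) for i in range(1000)]
# Both styles produce the same result; only the loop structure differs.
print(fused(data, lambda x: 2.0 * x, lambda x: x + 1.0)
      == batched(data, lambda x: 2.0 * x, lambda x: x + 1.0, batch_size=64))  # True
```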
