SPRUI04E July 2015 – January 2023
The C code in Section 6.6.2.1 implements a dot product function. The inner loop is unrolled once to take advantage of the C6000's ability to operate on two 16-bit data items in a single 32-bit register. LDW instructions are used to load two consecutive short values. The linear assembly instructions in Section 6.6.2.2 implement the dotp loop kernel. Section 6.6.2.3 shows the loop kernel determined by the assembly optimizer.
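As a rough illustration of the idea described above (not the listing from Section 6.6.2.1), the following portable C sketch processes two 16-bit elements per iteration by fetching one 32-bit word from each array. The function name `dotp_unrolled` is hypothetical, `memcpy` stands in for the LDW load, and the sketch assumes that `short` is 16 bits wide and that `n` is even.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: dot product of n shorts, two elements per 32-bit load.
 * Assumes short is 16 bits and n is even. */
int dotp_unrolled(const short *a, const short *b, int n)
{
    int sum0 = 0;   /* accumulates products of the low halves  */
    int sum1 = 0;   /* accumulates products of the high halves */

    assert(n % 2 == 0);

    for (int i = 0; i < n; i += 2) {
        uint32_t wa, wb;

        /* One word load per array, standing in for LDW. */
        memcpy(&wa, &a[i], sizeof wa);
        memcpy(&wb, &b[i], sizeof wb);

        /* Both words are split the same way, so the result does not
         * depend on the byte order of the host. */
        sum0 += (int16_t)(wa & 0xFFFFu) * (int16_t)(wb & 0xFFFFu);
        sum1 += (int16_t)(wa >> 16) * (int16_t)(wb >> 16);
    }
    return sum0 + sum1;
}
```

Keeping two separate accumulators mirrors the unrolled form: the two multiplies in each iteration are independent of each other, and the partial sums are combined only once, after the loop.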
For this loop kernel, there are two restrictions associated with the arrays a[ ] and b[ ]:
Bank conflict:
MVK 0, A0
|| MVK 8, B0
LDW *A0, A1
|| LDW *B0, B1
No bank conflict:
MVK 0, A0
|| MVK 4, B0
LDW *A0, A1
|| LDW *B0, B1
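The difference between the two address pairs above can be checked with a small C model. This is only a sketch under an assumed memory layout: four interleaved banks, each 16 bits wide, so that a word access spans two adjacent banks. The constants `NUM_BANKS` and `BANK_BYTES` and both function names are hypothetical, and the actual bank organization varies by device, so consult the device documentation before relying on these numbers.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_BANKS  4u   /* assumption: four interleaved banks */
#define BANK_BYTES 2u   /* assumption: each bank is 16 bits wide */

/* Bitmask of the banks touched by a 4-byte (LDW) access at byte address addr. */
unsigned ldw_bank_mask(uint32_t addr)
{
    unsigned mask = 0;

    assert(addr % 4 == 0);  /* model only word-aligned accesses */

    for (uint32_t off = 0; off < 4; off += BANK_BYTES)
        mask |= 1u << (((addr + off) / BANK_BYTES) % NUM_BANKS);
    return mask;
}

/* Nonzero if two parallel LDW accesses touch at least one common bank. */
int ldw_bank_conflict(uint32_t addr_a, uint32_t addr_b)
{
    return (ldw_bank_mask(addr_a) & ldw_bank_mask(addr_b)) != 0;
}
```

Under these assumptions, `ldw_bank_conflict(0, 8)` is nonzero because both words fall in banks 0 and 1, while `ldw_bank_conflict(0, 4)` is zero because the second word falls in banks 2 and 3.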