SPRUIG3C User Guide | Texas Instruments TI.com.cn

SPRUIG3C January 2018 – August 2019 TDA4VM, TDA4VM-Q1

4.2 Streaming Engines

The SEs provide the most efficient addressing, but are restricted as follows:

- There are only two of them.
- They can only be used for loads (not stores).
- The base address is pre-initialized by the SE_OPEN step; therefore, a given SE cannot be shared between multiple loads with different base addresses.

The migration tool allocates the two SE resources to what it considers to be the two highest-priority loads. Loads in the innermost loop are considered to have the highest priority. The heuristic simply picks the first two loads in the innermost loop; if that loop contains fewer than two loads, it continues with the next outer loop.
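The allocation heuristic described above can be sketched as follows. This is only an illustration, not the migration tool's actual code: the `Load` struct, its `depth` field (larger means more deeply nested), and the function `pick_se_loads` are hypothetical names, and the sketch assumes the loads are listed in program order.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical description of one load in a loop nest (not a tool data structure).
struct Load {
    std::string name;  // label used only for illustration
    int depth;         // loop nesting depth; a larger value is more deeply nested
};

// Returns the names of the loads that would be given a streaming engine.
// Scans the innermost loop first, in program order, then moves outward
// one loop level at a time until both engines are assigned.
std::vector<std::string> pick_se_loads(const std::vector<Load>& loads) {
    constexpr std::size_t kNumStreamingEngines = 2;  // "There are only two of them."
    std::vector<std::string> chosen;

    int max_depth = -1;
    for (const Load& l : loads) max_depth = std::max(max_depth, l.depth);

    for (int d = max_depth; d >= 0 && chosen.size() < kNumStreamingEngines; --d) {
        for (const Load& l : loads) {
            if (l.depth == d && chosen.size() < kNumStreamingEngines) {
                chosen.push_back(l.name);
            }
        }
    }
    return chosen;
}
```

For example, given loads `a` (depth 0), `b` (depth 1), and `c` (depth 0), the sketch selects `b` first because it is in the innermost loop, then `a` as the first load of the next outer loop; `c` would have to use another addressing mechanism.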

For SE-based loads, the migration tool generates the following sequence of steps:

1. In the init() function, the migration tool generates a call to the SE_init() template function in the virtual machine, which returns an SE setup vector. (The ISA spec refers to the setup vector as an SE template; here we use the term setup vector to avoid confusion with the virtual machine’s C++ templates.) The setup vector is saved in the tvals structure for later access by the vloops() function. The setup vector consists of static (compile-time) and dynamic (run-time) values. The static values correspond to flags in the SE setup vector and are determined from the distribution mode and data type; they are passed as template parameters to SE_init(). The dynamic values correspond to stride and trip count values, which are determined from the terms in the Agen expression and the loop trip counts; they are passed as run-time arguments to SE_init().
2. Also in the init() function, the migration tool generates the expression that represents the base address and saves it in another field of the tvals structure.
3. In the vloops() function, outside the outermost loop, the migration tool generates a call to the SE_OPEN() intrinsic, passing it both the setup vector and the base address from the tvals structure.
4. The load instruction in the loop simply uses an __se_ac_<type> intrinsic for the access, which becomes a quasi-register operand, SEn++, that holds the loaded value. As an optimization, the compiler may copy-propagate the SEn++ operand into the instruction where the value is used, thereby eliminating the load instruction altogether.
5. After the loop nest, the migration tool generates a call to the SE_CLOSE() intrinsic.
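The order of operations in the sequence above can be mimicked on a host machine with a small scalar mock. Everything in this sketch is an assumption made for illustration: the `Mock*` types, the `mock_*` functions, and their signatures are hypothetical stand-ins and are not the SE_init(), SE_OPEN(), __se_ac_<type>, or SE_CLOSE() interfaces of the actual toolchain. It models a two-level loop nest reading 32-bit elements one at a time rather than as vectors.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the SE setup vector.
struct MockSetupVector {
    std::size_t elem_bytes;       // static: derived from the data type
    std::size_t inner_count;      // dynamic: inner loop trip count
    std::size_t outer_count;      // dynamic: outer loop trip count
    std::ptrdiff_t outer_stride;  // dynamic: stride (in elements) between outer iterations
};

// Static values arrive as template parameters, dynamic values as run-time arguments.
template <typename Elem>
MockSetupVector mock_se_init(std::size_t inner_count, std::size_t outer_count,
                             std::ptrdiff_t outer_stride) {
    return MockSetupVector{sizeof(Elem), inner_count, outer_count, outer_stride};
}

// Hypothetical stand-in for one streaming engine.
template <typename Elem>
class MockStreamingEngine {
public:
    // Models the open step: the base address is fixed here, once, for the whole stream.
    void open(const MockSetupVector& sv, const Elem* base) {
        sv_ = sv; base_ = base; inner_ = 0; outer_ = 0; open_ = true;
    }
    // Models the SEn++ operand: yields the current element, then advances the stream.
    Elem next() {
        assert(open_ && outer_ < sv_.outer_count);
        const std::ptrdiff_t idx = static_cast<std::ptrdiff_t>(outer_) * sv_.outer_stride +
                                   static_cast<std::ptrdiff_t>(inner_);
        const Elem value = base_[idx];
        if (++inner_ == sv_.inner_count) { inner_ = 0; ++outer_; }
        return value;
    }
    void close() { open_ = false; }

private:
    MockSetupVector sv_{};
    const Elem* base_ = nullptr;
    std::size_t inner_ = 0, outer_ = 0;
    bool open_ = false;
};

// Hypothetical stand-in for the tvals structure: setup vector plus base address.
struct MockTvals {
    MockSetupVector se0_setup;
    const std::int32_t* se0_base;
};

// Steps 1 and 2: build the setup vector and record the base address.
MockTvals mock_init(const std::int32_t* data, std::size_t rows, std::size_t cols,
                    std::ptrdiff_t row_stride) {
    return MockTvals{mock_se_init<std::int32_t>(cols, rows, row_stride), data};
}

// Steps 3 to 5: open outside the loop nest, consume inside it, close afterwards.
std::int64_t mock_vloops(const MockTvals& t) {
    MockStreamingEngine<std::int32_t> se0;
    se0.open(t.se0_setup, t.se0_base);
    std::int64_t sum = 0;
    for (std::size_t i = 0; i < t.se0_setup.outer_count; ++i) {
        for (std::size_t j = 0; j < t.se0_setup.inner_count; ++j) {
            sum += se0.next();  // value consumed directly, with no separate load
        }
    }
    se0.close();
    return sum;
}
```

Calling `mock_init` on a buffer with 2 rows of 3 elements and a row stride of 4, then passing the result to `mock_vloops`, sums only the six streamed elements and skips the padding. Because the base address is fixed when the stream is opened, a second load with a different base address would need its own engine, which is the sharing restriction noted earlier in this section.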