SPRZ536A september   2022  – june 2023 AM69 , AM69A , TDA4AH-Q1 , TDA4AP-Q1 , TDA4VH-Q1 , TDA4VP-Q1

 

  1.   1
  2. 1Modules Affected
  3. 2Nomenclature, Package Symbolization, and Revision Identification
    1. 2.1 Device and Development-Support Tool Nomenclature
    2. 2.2 Devices Supported
    3. 2.3 Package Symbolization and Revision Identification
  4. 3Silicon Revision 1.0 Usage Notes and Advisories
    1. 3.1 Silicon Revision 1.0 Usage Notes
      1.      i2134
    2. 3.2 Silicon Revision 1.0 Advisories
      1.      i2049
      2.      i2062
      3.      i2063
      4.      i2064
      5.      i2065
      6.      i2079
      7.      i2097
      8.      i2102
      9.      i2120
      10.      i2134
      11.      i2137
      12.      i2146
      13.      i2157
      14.      i2159
      15.      i2160
      16.      i2161
      17.      i2163
      18.      i2166
      19.      i2177
      20.      i2189
      21.      i2190
      22.      i2196
      23.      i2197
      24.      i2205
      25.      i2215
      26.      i2216
      27.      i2219
      28.      i2232
      29.      i2234
      30.      i2242
      31.      i2244
      32.      i2245
      33.      i2249
      34.      i2253
      35.      i2271
      36.      i2272
      37.      i2278
      38.      i2279
      39.      i2310
      40.      i2311
      41.      i2312
      42.      i2320
      43.      i2326
      44.      i2351
      45.      i2362
      46.      i2366
      47.      i2371
      48.      i2372
      49.      i2378
      50.      i2381
      51.      i2383
  5.   Trademarks
  6.   Revision History

i2163


UDMAP: UDMA transfers with ICNTs and/or src/dst addr NOT aligned to 64B fail when used in "event trigger" mode

Details:

Note: The following description uses an example a C7x DSP core, but it applies to any other processing cores which can program the UDMA.

For DSP algorithm processing on C6x/C7x, the software often uses UDMA in NavSS or DRU in MSMC. In many cases, UDMA is used instead of DRU, because DRU channels are reserved in many use-cases for C7x/MMA deep learning operations. In a typical DSP algorithm processing, data is DMA'ed block by block to L2 memory for DSP, and DSP operates on the data in L2 memory instead of operating from DDR (through the cache). The typical DMA setup and event trigger for this operation is as below; this is referred to as "2D trigger and wait" in the following example.

For each "frame":

  1. Setup a TR typically 3 or 4 dimension TR.
    1. Set TYPE = 4D_BLOCK_MOVE_REPACKING_INDIRECTION
    2. Set EVENT_SIZE = ICNT2_DEC
    3. Set TRIGGER0 = GLOBAL0
    4. Set TRIGGER0_TYPE = ICNT2_DEC
    5. Set TRIGGER1 = NONE
    6. ICNT0 x ICNT1 is block width x block height
    7. ICNT2 = number of blocks
    8. ICNT3 = 1
    9. src addr = DDR
    10. dst addr = C6x L2 memory
  2. Submit this TR
    1. This TR starts a transfer on GLOBAL TRIGGER0 and transfers ICNT0xICNT1 bytes, then raises an event
  3. For each block do the following:
    1. Trigger DMA by setting GLOBAL TRIGGER0
    2. Wait for the event that indicates that the block is transferred
    3. Do DSP processing
This sequence is a simplified sequence; in the actual algorithm, there can be multiple channels doing DDR to L2 or L2 DDR transfer in a "ping-pong" manner, such that DSP processing and DMA runs in parallel. The event itself is programmed appropriately at the channel OES registers, and the event status check is done using a free bit in IA for UDMA.

When the following conditions occur, the event in step 3.2 is not received for the first trigger:

  • Condition 1: ICNT0xICT1 is NOT a multiple of 64.
  • Condition 2: src or dst is NOT a multiple of 64.
  • Condition 3: ICNT0xICT1 is NOT a multiple of 64 and src/dst address not a multiple of 64
Multiple of 16B or 32B for ICNT0xICNT1 and src/dst addr also has the same issue, where the event is not received. Only alignment of 64B makes it work.

Conditions in which it works:

  • If ICNT0xICNT1 is made a multiple of 64 and src/dst address a multiple of 64, the test case passes.
  • If DRU is used instead of UDMA, then the test passes. You must submit the TR to DRU through the UDMA DRU external channel. With DRU and with ICNTs and src/dst addr unaligned, the user can trigger and get events as expected when TR is programmed such that the number of events and number of triggers in a frame is 1, i.e ICNT2 = 1 in above case or EVENT_SIZE = COMPLETION and trigger is NONE. Then the completion event occurs as expected. This is not feasible to be used by the use-cases in question.
Above is a example for "2D trigger and wait", the same constraint applies for "1D trigger and wait" and "3D trigger and wait":

  • For "1D trigger and wait", ICNT0 MUST be multiple of 64
  • For "3D trigger and wait", ICNT0xICNT1xICNT2 MUST be multiple of 64

Workaround(s):

Set the EOL flag in TR for UDMAP as shown in following example:

  • 1D trigger and wait
    • TR.FLAGS |= CSL_FMK(UDMAP_TR_FLAGS_EOL, CSL_UDMAP_TR_FLAGS_EOL_ICNT0);
  • 2D trigger and wait
    • TR.FLAGS |= CSL_FMK(UDMAP_TR_FLAGS_EOL, CSL_UDMAP_TR_FLAGS_EOL_ICNT0_ICNT1);
  • 3D trigger and wait
    • TR.FLAGS |= CSL_FMK(UDMAP_TR_FLAGS_EOL,CSL_UDMAP_TR_FLAGS_EOL_ICNT0_ICNT1_ICNT2);

There is no performance impact due to this workaround.