SPRUIG3C January 2018 – August 2019 TDA4VM, TDA4VM-Q1

 


Saturating Stores

Like rounding, a VCOP store that includes saturation is translated as if it were two operations: an explicit saturation operation, followed by the store. The saturation operation operates on 32-bit elements regardless of the data type of the store. VCOP supports several forms of saturation: SYMM, ASYMM, 4PARAM, and so on. Fundamentally, all of the forms operate as follows:

x.saturate(min, minset, max, maxset) = (x < min) ? minset 
                                     : (x > max) ? maxset : x; 
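As an illustration only, the general form above can be modeled per element in plain C. This is a minimal sketch of the semantics, not the VCC translation itself; the function name `sat4` is hypothetical and is neither a VCOP nor a C7x intrinsic.

```c
#include <stdint.h>

/* Scalar reference model of the general saturation form.
 * sat4 is a hypothetical name used for illustration only.
 * Operates on 32-bit elements, matching the text above. */
static int32_t sat4(int32_t x, int32_t min, int32_t minset,
                    int32_t max, int32_t maxset)
{
    if (x < min) return minset;   /* below lower bound: use minset */
    if (x > max) return maxset;   /* above upper bound: use maxset */
    return x;                     /* in range: value is unchanged  */
}
```

For example, `sat4(300, 0, 0, 255, 255)` yields 255, while `sat4(-5, 0, -1, 255, 256)` yields the distinct set-to value -1.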

In the typical case, when the saturation bounds are the same as the min/max set-to values (that is, min == minset and max == maxset), the C7x translation is an efficient two-instruction sequence:

VMINW        Vsrc,Vmax,Vdst 
VMAXW        Vdst,Vmin,Vdst 
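The reason two instructions suffice can be seen in a per-lane C model: when the set-to values equal the bounds, saturation reduces to a minimum against the upper bound followed by a maximum against the lower bound. The sketch below assumes signed 32-bit lanes and min <= max; the helper names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical scalar helpers standing in for one vector lane. */
static int32_t min_i32(int32_t a, int32_t b) { return a < b ? a : b; }
static int32_t max_i32(int32_t a, int32_t b) { return a > b ? a : b; }

/* Per-lane model of the two-step sequence:
 * step 1 limits the value from above (minimum with max),
 * step 2 limits the result from below (maximum with min). */
static int32_t clamp_minmax(int32_t x, int32_t min, int32_t max)
{
    int32_t t = min_i32(x, max);
    return max_i32(t, min);
}
```

With bounds 0 and 255, `clamp_minmax` maps -5 to 0, 300 to 255, and leaves 100 unchanged, which matches the general form when min == minset and max == maxset.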

If the saturation bounds fall on power-of-2 boundaries, such as 0 to 255, a single saturation instruction is used:

VGSATUW      Vsrc,Cwidth,Vdst 
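A rough per-lane model of power-of-2 saturation is sketched below. It assumes that the width operand denotes a bit count, so that the range is 0 to 2^bits - 1 (bits = 8 gives the 0 to 255 example). That interpretation of `Cwidth` is an assumption made for illustration and should be checked against the C7x instruction set documentation; the function name is hypothetical.

```c
#include <stdint.h>

/* Hypothetical model: saturate a signed 32-bit value to the
 * unsigned range [0, 2^bits - 1]. Valid for 1 <= bits <= 31.
 * ASSUMPTION: the width operand is a bit count. */
static int32_t sat_unsigned_bits(int32_t x, unsigned bits)
{
    int32_t hi = (int32_t)((UINT32_C(1) << bits) - 1);
    if (x < 0)  return 0;    /* negative values saturate to zero   */
    if (x > hi) return hi;   /* large values saturate to 2^bits-1  */
    return x;
}
```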

If the saturation bounds differ from the min/max set-to values, a less efficient four-instruction sequence is required:

VCMPGTW      Vmin,Vsrc,Pred0 
VSEL         Pred0,Vminset,Vsrc 
VCMPGTW      Vmax,Vsrc,Pred1 
VSEL         Pred1,Vsrc,Vmaxset 

For unsigned vectors (see Section 1.5), unsigned forms of the compares are used.

C7x has a dedicated instruction for saturating to the range of a signed 16-bit value: VSATWH. Therefore, the following statement:

__vptr_s16 out; 
out[Agen] = Vsrc.saturate(); // saturates to (-32768, 32767) 

translates to a single instruction.
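For reference, the effect of saturating a 32-bit element to the signed 16-bit range can be modeled per element in C as follows. This is a sketch of the arithmetic only, with a hypothetical function name; it does not represent how the instruction is invoked.

```c
#include <stdint.h>

/* Hypothetical scalar model: saturate a 32-bit value to the
 * signed 16-bit range [-32768, 32767] before narrowing. */
static int16_t sat_s16(int32_t x)
{
    if (x < INT16_MIN) return INT16_MIN;
    if (x > INT16_MAX) return INT16_MAX;
    return (int16_t)x;   /* in range, so the narrowing is lossless */
}
```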

VCC removes saturations that it determines to have no effect.