SPRAC21A, June 2016 – June 2019: OMAP-L132, OMAP-L138, TDA2E, TDA2EG-17, TDA2HF, TDA2HG, TDA2HV, TDA2LF, TDA2P-ABZ, TDA2P-ACD, TDA2SA, TDA2SG, TDA2SX, TDA3LA, TDA3LX, TDA3MA, TDA3MD, TDA3MV

       

CONTENTS
TDA2xx and TDA2ex Performance
  Trademarks
  1 SoC Overview
    1.1 Introduction
    1.2 Acronyms and Definitions
    1.3 TDA2xx and TDA2ex System Interconnect
    1.4 Traffic Regulation Within the Interconnect
      1.4.1 Bandwidth Regulators
      1.4.2 Bandwidth Limiters
      1.4.3 Initiator Priority
    1.5 TDA2xx and TDA2ex Memory Subsystem
      1.5.1 Controller/PHY Timing Parameters
      1.5.2 Class of Service
      1.5.3 Prioritization Between DMM/SYS PORT or MPU Port to EMIF
    1.6 TDA2xx and TDA2ex Measurement Operating Frequencies
    1.7 System Instrumentation and Measurement Methodology
      1.7.1 GP Timers
      1.7.2 L3 Statistic Collectors
  2 Cortex-A15
    2.1 Level1 and Level2 Cache
    2.2 MMU
    2.3 Performance Control Mechanisms
      2.3.1 Cortex-A15 Knobs
      2.3.2 MMU Page Table Knobs
    2.4 Cortex-A15 CPU Read and Write Performance
      2.4.1 Cortex-A15 Functions
      2.4.2 Setup Limitations
      2.4.3 System Performance
        2.4.3.1 Cortex-A15 Stand-Alone Memory Read, Write, Copy
        2.4.3.2 Results
  3 System Enhanced Direct Memory Access (System EDMA)
    3.1 System EDMA Performance
      3.1.1 System EDMA Read and Write
      3.1.2 System EDMA Results
    3.2 System EDMA Observations
  4 DSP Subsystem EDMA
    4.1 DSP Subsystem EDMA Performance
      4.1.1 DSP Subsystem EDMA Read and Write
      4.1.2 DSP Subsystem EDMA Results
    4.2 DSP Subsystem EDMA Observations
  5 Embedded Vision Engine (EVE) Subsystem EDMA
    5.1 EVE EDMA Performance
      5.1.1 EVE EDMA Read and Write
      5.1.2 EVE EDMA Results
    5.2 EVE EDMA Observations
  6 DSP CPU
    6.1 DSP CPU Performance
      6.1.1 DSP CPU Read and Write
      6.1.2 Code Setup
        6.1.2.1 Pipeline Copy
        6.1.2.2 Pipeline Read
        6.1.2.3 Pipeline Write
        6.1.2.4 L2 Stride-Jmp Copy
        6.1.2.5 L2 Stride-Jmp Read
        6.1.2.6 L2 Stride-Jmp Write
    6.2 DSP CPU Observations
    6.3 Summary
  7 Cortex-M4 (IPU)
    7.1 Cortex-M4 CPU Performance
      7.1.1 Cortex-M4 CPU Read and Write
      7.1.2 Code Setup
      7.1.3 Cortex-M4 Functions
      7.1.4 Setup Limitations
    7.2 Cortex-M4 CPU Observations
      7.2.1 Cache Disable
      7.2.2 Cache Enable
    7.3 Summary
  8 USB IP
    8.1 Overview
    8.2 USB IP Performance
      8.2.1 Test Setup
      8.2.2 Results and Observations
      8.2.3 Summary
  9 PCIe IP
    9.1 Overview
    9.2 PCIe IP Performance
      9.2.1 Test Setup
      9.2.2 Results and Observations
  10 IVA-HD IP
    10.1 Overview
    10.2 H.264 Decoder
      10.2.1 Description
      10.2.2 Test Setup
      10.2.3 Test Results
    10.3 MJPEG Decoder
      10.3.1 Description
      10.3.2 Test Setup
      10.3.3 Test Results
  11 MMC IP
    11.1 MMC Read and Write Performance
      11.1.1 Test Description
      11.1.2 Test Results
    11.2 Summary
  12 SATA IP
    12.1 SATA Read and Write Performance
      12.1.1 Test Setup
      12.1.2 Observations
        12.1.2.1 RAW Performance
        12.1.2.2 SDK Performance
    12.2 Summary
  13 GMAC IP
    13.1 GMAC Receive/Transmit Performance
      13.1.1 Test Setup
      13.1.2 Test Description
        13.1.2.1 CPPI Buffer Descriptors
      13.1.3 Test Results
        13.1.3.1 Receive/Transmit Mode
        13.1.3.2 Receive Only Mode
        13.1.3.3 Transmit Only Mode
    13.2 Summary
  14 GPMC IP
    14.1 GPMC Read and Write Performance
      14.1.1 Test Setup
        14.1.1.1 NAND Flash
        14.1.1.2 NOR Flash
      14.1.2 Test Description
        14.1.2.1 Asynchronous NAND Flash Read/Write Using CPU Prefetch Mode
        14.1.2.2 Asynchronous NOR Flash Single Read
        14.1.2.3 Asynchronous NOR Flash Page Read
        14.1.2.4 Asynchronous NOR Flash Single Write
      14.1.3 Test Results
    14.2 Summary
  15 QSPI IP
    15.1 QSPI Read and Write Performance
      15.1.1 Test Setup
      15.1.2 Test Results
      15.1.3 Analysis
        15.1.3.1 Theoretical Calculations
        15.1.3.2 % Efficiency
    15.2 QSPI XIP Code Execution Performance
    15.3 Summary
  16 Standard Benchmarks
    16.1 Dhrystone
      16.1.1 Cortex-A15 Tests and Results
      16.1.2 Cortex-M4 Tests and Results
    16.2 LMbench
      16.2.1 LMbench Bandwidth
        16.2.1.1 TDA2xx and TDA2ex Cortex-A15 LMbench Bandwidth Results
        16.2.1.2 TDA2xx and TDA2ex Cortex-M4 LMbench Bandwidth Results
        16.2.1.3 Analysis
      16.2.2 LMbench Latency
        16.2.2.1 TDA2xx and TDA2ex Cortex-A15 LMbench Latency Results
        16.2.2.2 TDA2xx and TDA2ex Cortex-M4 LMbench Latency Results
        16.2.2.3 Analysis
    16.3 STREAM
      16.3.1 TDA2xx and TDA2ex Cortex-A15 STREAM Benchmark Results
      16.3.2 TDA2xx and TDA2ex Cortex-M4 STREAM Benchmark Results
  17 Error Checking and Correction (ECC)
    17.1 OCMC ECC Programming
    17.2 EMIF ECC Programming
    17.3 EMIF ECC Programming to Starterware Code Mapping
    17.4 Careabouts of Using EMIF ECC
      17.4.1 Restrictions Due to Non-Availability of Read Modify Write ECC Support in EMIF
        17.4.1.1 Un-Cached CPU Access of EMIF
        17.4.1.2 Cached CPU Access of EMIF
        17.4.1.3 Non CPU Access of EMIF Memory
        17.4.1.4 Debugger Access of EMIF via the Memory Browser/Watch Window
        17.4.1.5 Software Breakpoints While Debugging
      17.4.2 Compiler Optimization
      17.4.3 Restrictions Due to i882 Errata
      17.4.4 How to Find Who Caused the Unaligned Quanta Writes After the Interrupt
    17.5 Impact of ECC on Performance
  18 DDR3 Interleaved vs Non-Interleaved
    18.1 Interleaved versus Non-Interleaved Setup
    18.2 Impact of Interleaved vs Non-Interleaved DDR3 for a Single Initiator
    18.3 Impact of Interleaved vs Non-Interleaved DDR3 for Multiple Initiators
  19 DDR3 vs DDR2 Performance
    19.1 Impact of DDR2 vs DDR3 for a Single Initiator
    19.2 Impact of DDR2 vs DDR3 for Multiple Initiators
  20 Boot Time Profile
    20.1 ROM Boot Time Profile
    20.2 System Boot Time Profile
  21 L3 Statistics Collector Programming Model
  22 Reference
Revision History
IMPORTANT NOTICE

APPLICATION NOTE

TDA2xx and TDA2ex Performance


This application report provides information on TDA2xx and TDA2ex device throughput performance. It describes the TDA2xx and TDA2ex System-on-Chip (SoC) architecture, the data path infrastructure, the constraints that affect throughput, and optimization techniques for achieving optimum system performance. This document also provides information on the maximum possible throughput of the different peripherals on the SoC.

Trademarks

NEON is a registered trademark of ARM Limited.

Cortex and Arm are registered trademarks of Arm Limited.

DesignWare is a registered trademark of Synopsys, Inc.

OMAP is a registered trademark of TI.

1 SoC Overview

1.1 Introduction

TDA2xx and TDA2ex are high-performance infotainment application devices based on the enhanced OMAP™ architecture and integrated in a 28-nm technology. The architecture is designed for advanced graphical HMI and navigation, digital and analog radio, rear-seat entertainment, and multimedia playback, and it provides Advanced Driver Assistance integration capabilities with video analytics support, along with best-in-class CPU performance and video, image, and graphics processing sufficient to support these and other use cases. Figure 1 shows a high-level block diagram of the processors, interfaces, and peripherals and their various operating frequencies.

A. The speeds shown are the highest design targeted, non-binned OPP.
Figure 1. TDA2xx Device Block Diagram

1.2 Acronyms and Definitions

Table 1. Acronyms and Definitions

Term Definition
AESCTR Advanced Encryption Standard Counter Mode
AESECB Advanced Encryption Standard Electronic Code Book
BL Bandwidth Limiter (in L3)
BW Bandwidth (measured in megabytes per second (MB/s))
BWR Bandwidth Regulator (in L3)
CBC-MAC Cipher Block Chaining-Message Authentication Code
CCM CBC-MAC + CTR
CPU/MPU Cortex®-A15
DDR Double Data Rate
DVFS Dynamic Voltage and Frequency Scaling
EVE Embedded Vision Engine
GCM Galois Counter Mode
GMAC Galois Message Authentication Code
LPDDR Low-Power Double Data Rate
NVM Non-Volatile Memory
OCP Open Core Protocol
OTFA On-The-Fly Advanced Encryption Standard
PHY Hard macro that converts single data rate signals to double data rate
QSPI Quad Serial Peripheral Interface
ROM On-Chip ROM Bootloader
SDRAM Synchronous Dynamic Random Access Memory
SoC System On-Chip
Stat Coll Statistics collector in interconnect

1.3 TDA2xx and TDA2ex System Interconnect

The system interconnect and the master-to-slave connections in the TDA2xx and TDA2ex devices are shown in Figure 2. For initiators and peripherals not supported by TDA2ex, the slave and master ports to L3 are tied to a default value; attempting to access a peripheral not present in the device therefore results in an error.

Figure 2. TDA2xx and TDA2ex SoC Interconnect Diagram

The masters and slaves in the system are listed in Table 2 and Table 3, respectively.

Table 2. List of Master Ports in TDA2xx and TDA2ex

Master | Supported Maximum Tag Number | Maximum Burst Size (Bytes) | Type
MPU 32 120 RW
CS_DAP 1 4 RW
IEEE1500_2_OCP 1 4 RW
DMA_SYSTEM RD 4 128 RO
MMU1 33 128 RW
DMA_CRYPTO RD 4 124 RO
DMA_CRYPTO WR 2 124 WR
TC1_EDMA RD 32 128 RO
TC2_EDMA RD 32 128 RO
TC1_EDMA WR 32 128 WR
TC2_EDMA WR 32 128 WR
DMA_SYSTEM WR 2 128 WR
DSP1 CFG 33 128 RW
DSP1 DMA 33 128 RW
DSP1 MDMA 33 128 RW
DSP2 CFG (1) 33 128 RW
DSP2 DMA (1) 33 128 RW
DSP2 MDMA (1) 33 128 RW
CSI2_1 1 128 WR
IPU1 8 56 RW
IPU2 8 56 RW
EVE1 P1 (1) 17 128 RW
EVE1 P2 (1) 17 128 RW
EVE2 P1 (1) 17 128 RW
EVE2 P2 (1) 17 128 RW
EVE3 P1 (1) 17 128 RW
EVE3 P2 (1) 17 128 RW
EVE4 P1 (1) 17 128 RW
EVE4 P2 (1) 17 128 RW
PRUSS1 PRU1 2 128 RW
PRUSS1 PRU2 2 128 RW
PRUSS2 PRU1 2 128 RW
PRUSS2 PRU2 2 128 RW
GMAC_SW 2 128 RW
SATA 2 128 RW
MMC1 1 124 RW
MMC2 1 124 RW
USB3_SS 32 128 RW
USB2_SS 32 128 RW
USB2_ULPI_SS1 32 128 RW
USB2_ULPI_SS2 32 128 RW
GPU P1 16 128 RW
MLB 1 124 RW
PCIe_SS1 16 128 RW
PCIe_SS2 16 128 RW
MMU2 33 128 RW
VIP1 P1 16 128 RW
VIP1 P2 16 128 RW
VIP2 P1 16 128 RW
VIP2 P2 16 128 RW
VIP3 P1 16 128 RW
VIP3 P2 16 128 RW
DSS 16 128 RW
GPU P2 16 128 RW
GRPX2D P1 32 128 RW
GRPX2D P2 32 128 RW
VPE P1 16 128 RW
VPE P2 16 128 RW
IVA 16 128 RW
  1. Not present in TDA2ex.

Table 3. List of L3 Slaves in TDA2xx and TDA2ex

Slave | Tag Number | Maximum Burst Size (Bytes)
GPMC 1 124
GPU 1 8
IVA1 SL2IF 1 16
OCMC_RAM1 1 128
DSS 1 124
IVA1 CONFIG 1 124
IPU1 1 56
AES1 1 4
AES2 1 4
SHA2MD5_1 1 4
DMM P1 32 128
DMM P2 32 128
L4_WKUP 1 124
IPU2 1 56
OCMC_RAM2 (1) 1 128
OCMC_RAM3 (1) 1 128
DSP1 SDMA 1 128
DSP2 SDMA (1) 1 128
OCMC_ROM 1 16
TPCC_EDMA 1 128
PCIe SS1 1 120
VCP1 1 128
L3_INSTR 1 128
DEBUGSS CT_TBR 1 128
QSPI 256 128
VCP2 1 128
TC1_EDMA 1 128
TC2_EDMA 1 128
McASP1 1 128
McASP2 1 128
McASP3 1 128
PCIe SS2 1 120
SPARE_TSC_ADC 1 128
GRPX2D 1 4
EVE1 (1) 16 128
EVE2 (1) 16 128
EVE3 (1) 16 128
EVE4 (1) 16 128
PRUSS1 1 128
PRUSS2 1 128
MMU 1 32 128
MMU 2 32 128
SHA2MD5_2 1 4
L4_CFG 1 124
L4_PER1 P1 1 124
L4_PER1 P2 1 124
L4_PER1 P3 1 124
L4_PER2 P1 1 124
L4_PER2 P2 1 124
L4_PER2 P3 1 124
L4_PER3 P1 1 124
L4_PER3 P2 1 124
L4_PER3 P3 1 124
  1. Not present in TDA2ex.

The L3 high-performance interconnect is based on a Network-on-Chip (NoC) interconnect infrastructure. The NoC uses an internal packet-based protocol for forward (read command, write command with data payload) and backward (read response with data payload, write response) transactions. All exposed interfaces of this NoC interconnect, both for targets and initiators, comply with the OCP IP2.x reference standard.

1.4 Traffic Regulation Within the Interconnect

The interconnect has internal components that can aid in regulating traffic from a specific initiator to a specific target. These components are called bandwidth regulators and bandwidth limiters. Additionally, the initiator IPs can set their respective MFLAG or MREQPRIORITY signals, which are understood by the interconnect and subsequently by the DMM/EMIF to give priority to a given initiator.

The various traffic regulators within the interconnect are set to default values that allow most use cases to work without any tuning. However, if customization is needed for a given use case, it is possible through the programmable parameters explained in the subsequent sections.

1.4.1 Bandwidth Regulators

The bandwidth regulators prevent master NIUs from consuming too much bandwidth of a link or of a slave NIU that is shared between several data flows: packets are then transported at a slower rate. The desired bandwidth value can be programmed into the bandwidth regulator. When the measured bandwidth is below the programmed value, the pressure bit is set to 1, giving priority to this master. When the bandwidth is above the programmed value, the pressure bit is set to 0 and the concerned master has the same weight as the others.

Bandwidth regulators are enabled by default in the interconnect, with a default configuration in which the expected average bandwidth is set to zero. With any amount of traffic, the actual average bandwidth seen is greater than zero and, hence, the lower pressure bits (00b) are applied by default. You need to program the regulators to achieve the desired regulation in case of concurrency. When programmed, the bandwidth regulator overrides the L3 initiator priority set using the device control module.

Bandwidth regulators:

  • Regulate traffic through priority; they do not actually stop traffic.
  • Set low priority by default. When a valid bandwidth other than 0 is programmed, the initiator is given higher priority whenever its measured bandwidth is below the programmed threshold.
  • Do not set an upper limit; if there is no contention, the initiator can still claim its full share.

A bandwidth regulator is available for each of the following IPs:

  • MMU2
  • EVE1, EVE2, EVE3, EVE4 – both TC0 and TC1
  • DSP1, DSP2 MDMA (CPU access port)
  • DSP1, DSP2 EDMA
  • IVA
  • GPU
  • GMAC
  • PCIe
Figure 3. TDA2xx and TDA2ex Bandwidth Regulator Mechanism Illustration

Programming API for the bandwidth regulator is:

set_bw_regulator(int port, unsigned int average_bw, unsigned int time_in_us)
{
    unsigned int base_address = get_bw_reg_base_address(port);
    /* Program the average bandwidth (register granularity of 8.3125 MB/s) */
    WR_REG32(base_address + 0x8, (int)(ceil(average_bw / 8.3125)));
    /* Program the regulation window (average_bw x time_in_us) */
    WR_REG32(base_address + 0xC, (time_in_us * average_bw));
    /* Enable the bandwidth regulator */
    WR_REG32(base_address + 0x14, 0x1);
}

get_bw_reg_base_address(port)
{
    if (port == EVE1_TC0) {
        return L3_NOC_AVATAR__DEBUGSS_CS_DAP_INIT_OCP_L3_NOC_AVATAR_CLK1_EVE1_TC0_BW_REGULATOR;
    }
    ...
}
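As a usage illustration (the port identifier and values are purely illustrative, and average_bw is assumed to be in MB/s with the window in microseconds, consistent with the scaling above):

/* Illustrative only: regulate the EVE1 TC0 port to an average of 1600 MB/s,
 * evaluated over a 10-microsecond window. */
set_bw_regulator(EVE1_TC0, 1600, 10);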

1.4.2 Bandwidth Limiters

The bandwidth limiter regulates the packet flow in the L3_MAIN interconnect by applying flow control when a user-defined bandwidth limit is reached. The next packet is served only after an internal timer expires, thus ensuring that traffic does not exceed the allocated bandwidth. The bandwidth limiter can be used with a watermark mechanism that allows traffic to temporarily exceed the peak bandwidth.

Bandwidth limiter:

  • Limits the bandwidth by flow control (actually stops traffic)
  • Enabled by default with a configuration that does not cap the maximum bandwidth of any initiator; hence, by default, it allows the maximum bandwidth that the CPU can generate without any other initiators. An upper limit can be set by programming the bandwidth limiter (BL) to the desired maximum bandwidth.
  • Sets an actual upper limit, while allowing bursts up to a configured maximum watermark.

Programming API for the bandwidth limiter is:

set_bw_limiter(port, limit_bw)
{
    base_address = get_bw_limiter_base_address(port);
    /* Scale the limit into register units (granularity of 8.3125 MB/s) */
    bandwidth = (int)(limit_bw / 8.3125);
    bandwidth_int  = (bandwidth & 0xFFFFFFE0) >> 5;
    bandwidth_frac = (bandwidth & 0x1F);
    WR_REG32(base_address + 0x8, bandwidth_frac);
    WR_REG32(base_address + 0xC, bandwidth_int);
    /* Watermark set to 0 */
    WR_REG32(base_address + 0x10, 0x0);
    /* Enable the bandwidth limiter */
    WR_REG32(base_address + 0x14, 0x1);
}

get_bw_limiter_base_address(port)
{
    if (port == VPE_P2) {
        return L3_NOC_AVATAR__DEBUGSS_CS_DAP_INIT_OCP_L3_NOC_AVATAR_CLK1_VPE_P2_BW_LIMITER;
    }
    ...
}
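As a usage illustration (port and value are illustrative only, with the limit assumed to be in MB/s):

/* Illustrative only: cap the VPE P2 port at roughly 800 MB/s. */
set_bw_limiter(VPE_P2, 800);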

1.4.3 Initiator Priority

Certain initiators in the system can generate MFLAG signals that give higher priority to the data traffic they initiate. The modules that can generate the MFLAG dynamically are VIP, DSS, EVE, and DSP. A brief discussion of each follows, beginning with the DSS MFLAG.

  • DSS MFLAG
    • DSS has four display read pipes (Graphics, Vid1, Vid2, and Vid3) and one write pipe (WB).
    • DSS drives MFLAG if any read pipe is configured as high priority and the FIFO level of that high-priority display pipe is below the low threshold.
    • The VIDx pipes have 32-KB FIFOs and the GFX pipe has a 16-KB FIFO.
    • The FIFO thresholds are measured in units of 16-byte words.
    • The recommended settings for the high and low thresholds are 75% and 50%, respectively.
    • MFLAG can also be driven high permanently through the force MFLAG configuration of the DISPC_GLOBAL_MFLAG_ATTRIBUTE register.
  • The behavior of setting the MFLAG dynamically is illustrated in Figure 4.

    Figure 4. TDA2xx and TDA2ex DSS Adaptive MFLAG Illustration

    The programming model used to enable dynamic MFLAG is:

    /* Enable MFLAG generation */
    DISPC_GLOBAL_MFLAG_ATTRIBUTE = 0x2;
    /* Set the video pipes as high priority */
    DISPC_VID1_ATTRIBUTES |= (1 << 23);
    DISPC_VID2_ATTRIBUTES |= (1 << 23);
    DISPC_VID3_ATTRIBUTES |= (1 << 23);
    /* Set the graphics pipe as high priority */
    DISPC_GFX_ATTRIBUTES |= (1 << 14);
    /* GFX thresholds: 75% high, 50% low */
    DISPC_GFX_MFLAG_THRESHOLD = 0x03000200;
    /* VIDx thresholds: 75% high, 50% low */
    DISPC_VID1_MFLAG_THRESHOLD = 0x06000400;
    DISPC_VID2_MFLAG_THRESHOLD = 0x06000400;
    DISPC_VID3_MFLAG_THRESHOLD = 0x06000400;
  • DSP EDMA + MDMA
    • EVTOUT[31] and EVTOUT[30] are used for generation of MFLAGs dedicated to the DSP MDMA and EDMA ports, respectively.
    • EVTOUT[31/30] = 1 → Corresponding MFLAG is high.
  • EVE TC0/TC1
    • For EVE port 1 and port 2 (EVE TC0 and TC1), MFlag is driven by evex_gpout[63] and evex_gpout[62], respectively.
    • evex_gpout[63] is connected to DMM_P1 and EMIF.
    • evex_gpout[62] is connected to DMM_P2 and EMIF.
  • VIP/VPE
    • In the VIP/VPE Data Packet Descriptor Word 3, the priority can be set in bits [11:9] (see the sketch following this list).
    • This value is mapped to the OCP Reqinfo bits.
    • 0x0 = highest priority, 0x7 = lowest priority.
    • VIP has a dynamic MFLAG scheme based on internal FIFO status:
      • Based on hardware-set margins to overflow/underflow
      • Enabled by default, with no MMR control
    • Many other IPs drive their MFLAG through the control module registers.
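As a minimal sketch of the VIP/VPE descriptor priority setting described above (the helper name is hypothetical; only bits [11:9] of descriptor word 3 are modified):

#include <stdint.h>

/* Hypothetical helper: set the priority field (bits 11:9) of a VIP/VPE data
 * packet descriptor word 3; 0x0 is the highest priority and 0x7 the lowest. */
static uint32_t vip_vpe_set_desc_priority(uint32_t desc_word3, uint32_t prio)
{
    desc_word3 &= ~(0x7u << 9);          /* clear the 3-bit priority field */
    desc_word3 |= (prio & 0x7u) << 9;    /* insert the new priority value  */
    return desc_word3;
}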

The CTRL_CORE_L3_INITIATOR_PRESSURE_1 to CTRL_CORE_L3_INITIATOR_PRESSURE_4 registers are used for controlling the priority of certain initiators on the L3_MAIN.

  • 0x3 = Highest Priority/Pressure
  • 0x0 = Lowest Priority/Pressure
  • Valid for MPU, DSP1, DSP2, IPU1, PRUSS1, GPU P1, GPU P2
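A minimal read-modify-write sketch for these registers follows; the register address and the 2-bit field offset for a particular initiator are assumptions and must be taken from the device TRM:

#include <stdint.h>

/* Hedged sketch: program one initiator's 2-bit pressure field in a
 * CTRL_CORE_L3_INITIATOR_PRESSURE_x register (0x3 = highest, 0x0 = lowest).
 * The register pointer and field_shift are supplied by the caller from the TRM. */
static void set_l3_initiator_pressure(volatile uint32_t *pressure_reg,
                                      unsigned int field_shift,
                                      uint32_t pressure)
{
    uint32_t val = *pressure_reg;
    val &= ~(0x3u << field_shift);            /* clear the 2-bit pressure field */
    val |= (pressure & 0x3u) << field_shift;  /* write the new pressure value   */
    *pressure_reg = val;
}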

The CTRL_CORE_EMIF_INITIATOR_PRIORITY_1 to CTRL_CORE_EMIF_INITIATOR_PRIORITY_6 registers control the priority of each initiator accessing the two EMIFs. Each 3-bit field in these registers is associated with exactly one initiator. Setting a bit field to 0x0 gives the corresponding initiator the highest priority over the others, and setting it to 0x7 gives it the lowest priority. This feature is useful in case of concurrent accesses to the external SDRAM from several initiators.

In the context of TDA2xx and TDA2ex, the CTRL_CORE_EMIF_INITIATOR_PRIORITY_1 to CTRL_CORE_EMIF_INITIATOR_PRIORITY_6 registers are overridden by the DMM PEG priority; hence, it is recommended to set the DMM PEG priority instead of the control module EMIF_INITIATOR_PRIORITY registers.

The MFLAG influences the priority of the Traffic packets at multiple stages:

  • At the interconnect level, the NTTP packet carries one pressure bit. This bit, when set to 1, gives priority to the concerned packet across all arbitration points; it is set to 0 for all masters by default. The pressure bit can be set to 1 either by the bandwidth regulators (within L3) or directly by the masters using the OCP MFlag. MFLAG-asserted pressure is embedded in the packet, whereas pressure from the bandwidth regulator is a handshake signal between the regulator and the switch.
  • At the DMM level, the MFLAG drives the DMM emergency mechanism: initiators with MFLAG set are classified as higher priority, and a weighted round-robin algorithm arbitrates between the high-priority and the other initiators. Set DMM_EMERGENCY[0] to enable this arbitration scheme; the weight is set in the DMM_EMERGENCY[20:16] WEIGHT field (see the sketch after this list).
  • At the EMIF level, the MFLAGs from all the system initiators are ORed together so that, whenever any system initiator has its MFLAG set, system traffic is given higher priority than MPU traffic.
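A hedged sketch of the DMM emergency programming described above; only the enable bit (bit 0) and the WEIGHT field (bits 20:16) named in the text are touched, and the register pointer is passed in by the caller:

#include <stdint.h>

/* Hedged sketch: enable the MFLAG-based DMM emergency arbitration and set the
 * weighted round-robin WEIGHT field in the DMM_EMERGENCY register. */
static void dmm_set_emergency(volatile uint32_t *dmm_emergency_reg, uint32_t weight)
{
    uint32_t val = *dmm_emergency_reg;
    val |= 0x1u;                          /* bit 0: enable emergency arbitration */
    val &= ~(0x1Fu << 16);                /* clear the WEIGHT field (bits 20:16) */
    val |= (weight & 0x1Fu) << 16;        /* program the round-robin weight      */
    *dmm_emergency_reg = val;
}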

1.5 TDA2xx and TDA2ex Memory Subsystem

The different memory subsystems in TDA2xx and TDA2ex and their interconnection are shown in Figure 5.

Figure 5. TDA2xx and TDA2ex Memory Subsystem Interconnection

NOTE

TDA2ex does not contain EMIF2, OCMC RAM 2 and 3.

The EMIF is the memory controller of the system's main memory (DRAM). Supported DRAM types include DDR2 (400 MHz) and DDR3 (532 MHz); TDA2ex additionally supports DDR3 (666 MHz). The external memory controller supports 32-bit and 16-bit (narrow) modes, page sizes up to 2KB, up to 8 banks, and 128-byte bursts. The bank distribution and the row, column, and bank access pattern are shown in Figure 6.

A. M is the number of columns (as determined by PAGESIZE) minus 1, P is the number of banks (as determined by IBANK) minus 1, and N is the number of rows (as determined by both PAGESIZE and IBANK) minus 1.
Figure 6. DDR Row, Column and Bank Access

All the measurements done in this report use the following configurations, unless mentioned otherwise.

  • DDR3 Memory Part Number: MT41K128M16 16 Meg × 16 × 8 banks
  • CAS write latency = 6
  • CAS Latency = 7
  • SDRAM Data Bus width: 32
  • DDR in non-interleaved mode.
  • Number of banks (P + 1) = 8
  • Number of Columns (M + 1) = 2^10 = 1024
  • Number of Rows (N + 1) = 2^14 = 16384
  • Size of each DDR cell = 16 bits
  • Page size is 1024 cells. With two x16 devices forming the 32-bit bus, the effective page size = (1024 × 16 bits) × 2 = 32768 bits = 4096 bytes = 4KB.
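For reference, a rough upper bound on raw DDR3 bandwidth for this configuration can be sketched as below (assuming a 532-MHz DDR3 clock with data transferred on both clock edges of the 32-bit bus; achievable throughput is lower because of refresh, page misses, and read/write turnarounds):

/* Illustrative theoretical peak bandwidth (MB/s) for the DDR3 configuration
 * above: clock (MHz) x 2 transfers per clock x 4 bytes per transfer. */
static unsigned int ddr3_peak_mb_per_s(unsigned int ddr_clk_mhz)
{
    return ddr_clk_mhz * 2u * 4u;   /* 532 MHz -> 4256 MB/s */
}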

1.5.1 Controller/PHY Timing Parameters

Based on the DRAM data sheet, the timing values specified in units of time must be translated into clock cycles and programmed into the MEMSS MMRs. Parameters such as the refresh rate, DRAM topology, and CAS latencies are also programmed in these MMRs. The MEMSS PHY interface also has an MMR space that must be configured to enable the PHY and to program parameters such as the read latency.
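A minimal sketch of the time-to-cycles translation mentioned above; the actual timing fields and their register layout are device specific and come from the TRM:

#include <stdint.h>

/* Convert a DRAM timing parameter from the data sheet (in nanoseconds) into
 * DDR clock cycles, rounding up, before it is written to a MEMSS timing MMR.
 * ddr_clk_mhz is the DDR interface clock in MHz (for example, 532). */
static uint32_t timing_ns_to_cycles(uint32_t t_ns, uint32_t ddr_clk_mhz)
{
    return (t_ns * ddr_clk_mhz + 999u) / 1000u;   /* ceil(t_ns * f_MHz / 1000) */
}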

1.5.2 Class of Service

For priority escalation, one more level of escalation can be enabled inside the MEMSS using class of service (CoS), with eight priority levels. Two classes of service are implemented in the MEMSS. Each class of service can support up to three master connection IDs, and each of the three masters can be associated with the same or different priority levels. Each class of service has a priority raise counter that can be set to N, which implies N times 16 clocks; this counter value specifies the number of clocks after which the MEMSS momentarily raises the priority of the class-of-service commands (master connections are arbitrated using priority levels within a class of service). Using masks along with the master connection IDs, a maximum of 144 master connection IDs can participate in this class of service.

NOTE

  • A priority raise counter is also available to momentarily raise the priority of the oldest command in the command FIFO. This counter can also be set to N, which implies N times 16 clocks (see the sketch after this note).
  • The read and write execution threshold registers can be programmed with a maximum burst size after which the MEMSS arbitration switches to executing the other type of command (for example, set the write threshold to give read commands a chance to execute).
  • Opening a DRAM page has a cost, so avoiding repeated page closes and opens gives better performance.
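As a small illustration of the N × 16-clock relationship used by these counters, the sketch below converts a desired escalation delay into the counter value N (the function name and inputs are illustrative):

#include <stdint.h>

/* Convert a desired priority-raise delay (in nanoseconds) into the counter
 * value N, where the hardware waits N x 16 EMIF clocks before escalating. */
static uint32_t cos_priority_raise_count(uint32_t delay_ns, uint32_t emif_clk_mhz)
{
    uint32_t clocks = (delay_ns * emif_clk_mhz + 999u) / 1000u;  /* ns -> clocks, rounded up */
    return (clocks + 15u) / 16u;                                 /* clocks -> N (units of 16) */
}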

 
