This application report describes the throughput performance of the TDA2xx and TDA2ex devices, covering the System-on-Chip (SoC) architecture, the data path infrastructure, the constraints that affect throughput, and the optimization techniques available for achieving optimum system performance. It also provides the maximum possible throughput of the different peripherals on the SoC.
TDA2xx and TDA2ex are high-performance infotainment-application devices based on the enhanced OMAP™ architecture and integrated in a 28-nm technology. The architecture is designed for advanced graphical HMI and navigation, digital and analog radio, rear-seat entertainment, and multimedia playback, and it provides Advanced Driver Assistance integration capabilities with video analytics support, along with best-in-class CPU performance and video, image, and graphics processing. Figure 1 shows a high-level block diagram of the processors, interfaces, and peripherals and their operating frequencies.
Term | Definition |
---|---|
AESCTR | Advanced Encryption Standard Counter Mode |
AESECB | Advanced Encryption Standard Electronic Code Book |
BL | Bandwidth Limiter (in L3) |
BW | Bandwidth (measured in megabytes per second (MB/s)) |
BWR | Bandwidth Regulator (in L3) |
CBC-MAC | Cipher Block Chaining-Message Authentication Code |
CCM | CBC-MAC + CTR |
CPU/MPU | Cortex®-A15 |
DDR | Double Data Rate |
DVFS | Dynamic Voltage and Frequency Scaling |
EVE | Embedded Vision Engine |
GCM | Galois Counter Mode |
GMAC | Galois Message Authentication Code |
LPDDR | Low-Power Double Data Rate |
NVM | Non-Volatile Memory |
OCP | Open Core Protocol |
OTFA | On-The-Fly Advanced Encryption Standard |
PHY | Hard macro that converts single data rate signals to double data rate |
QSPI | Quad Serial Peripheral Interface |
ROM | On-Chip ROM Bootloader |
SDRAM | Synchronous Dynamic Random Access Memory |
SoC | System On-Chip |
Stat Coll | Statistics collector in interconnect |
The system interconnect and the master-to-slave connections in the TDA2xx and TDA2ex devices are shown in Figure 2. For initiators and peripherals not supported on TDA2ex, the corresponding slave and master ports to L3 are tied to a default value, which results in errors when software tries to access a peripheral that is not present in the device.
The masters and slaves in the system are listed in Table 2 and Table 3, respectively.
Master | Supported Maximum Tag Number | Maximum Burst Size (Bytes) | Type |
---|---|---|---|
MPU | 32 | 120 | RW |
CS_DAP | 1 | 4 | RW |
IEEE1500_2_OCP | 1 | 4 | RW |
DMA_SYSTEM RD | 4 | 128 | RO |
MMU1 | 33 | 128 | RW |
DMA_CRYPTO RD | 4 | 124 | RO |
DMA_CRYPTO WR | 2 | 124 | WR |
TC1_EDMA RD | 32 | 128 | RO |
TC2_EDMA RD | 32 | 128 | RO |
TC1_EDMA WR | 32 | 128 | WR |
TC2_EDMA WR | 32 | 128 | WR |
DMA_SYSTEM WR | 2 | 128 | WR |
DSP1 CFG | 33 | 128 | RW |
DSP1 DMA | 33 | 128 | RW |
DSP1 MDMA | 33 | 128 | RW |
DSP2 CFG (1) | 33 | 128 | RW |
DSP2 DMA (1) | 33 | 128 | RW |
DSP2 MDMA (1) | 33 | 128 | RW |
CSI2_1 | 1 | 128 | WR |
IPU1 | 8 | 56 | RW |
IPU2 | 8 | 56 | RW |
EVE1 P1 (1) | 17 | 128 | RW |
EVE1 P2 (1) | 17 | 128 | RW |
EVE2 P1 (1) | 17 | 128 | RW |
EVE2 P2 (1) | 17 | 128 | RW |
EVE3 P1 (1) | 17 | 128 | RW |
EVE3 P2 (1) | 17 | 128 | RW |
EVE4 P1 (1) | 17 | 128 | RW |
EVE4 P2 (1) | 17 | 128 | RW |
PRUSS1 PRU1 | 2 | 128 | RW |
PRUSS1 PRU2 | 2 | 128 | RW |
PRUSS2 PRU1 | 2 | 128 | RW |
PRUSS2 PRU2 | 2 | 128 | RW |
GMAC_SW | 2 | 128 | RW |
SATA | 2 | 128 | RW |
MMC1 | 1 | 124 | RW |
MMC2 | 1 | 124 | RW |
USB3_SS | 32 | 128 | RW |
USB2_SS | 32 | 128 | RW |
USB2_ULPI_SS1 | 32 | 128 | RW |
USB2_ULPI_SS2 | 32 | 128 | RW |
GPU P1 | 16 | 128 | RW |
MLB | 1 | 124 | RW |
PCIe_SS1 | 16 | 128 | RW |
PCIe_SS2 | 16 | 128 | RW |
MMU2 | 33 | 128 | RW |
VIP1 P1 | 16 | 128 | RW |
VIP1 P2 | 16 | 128 | RW |
VIP2 P1 | 16 | 128 | RW |
VIP2 P2 | 16 | 128 | RW |
VIP3 P1 | 16 | 128 | RW |
VIP3 P2 | 16 | 128 | RW |
DSS | 16 | 128 | RW |
GPU P2 | 16 | 128 | RW |
GRPX2D P1 | 32 | 128 | RW |
GRPX2D P2 | 32 | 128 | RW |
VPE P1 | 16 | 128 | RW |
VPE P2 | 16 | 128 | RW |
IVA | 16 | 128 | RW |
Slave | Tag Number | Maximum Burst Size (Bytes) |
---|---|---|
GPMC | 1 | 124 |
GPU | 1 | 8 |
IVA1 SL2IF | 1 | 16 |
OCMC_RAM1 | 1 | 128 |
DSS | 1 | 124 |
IVA1 CONFIG | 1 | 124 |
IPU1 | 1 | 56 |
AES1 | 1 | 4 |
AES2 | 1 | 4 |
SHA2MD5_1 | 1 | 4 |
DMM P1 | 32 | 128 |
DMM P2 | 32 | 128 |
L4_WKUP | 1 | 124 |
IPU2 | 1 | 56 |
OCMC_RAM2 (1) | 1 | 128 |
OCMC_RAM3 (1) | 1 | 128 |
DSP1 SDMA | 1 | 128 |
DSP2 SDMA (1) | 1 | 128 |
OCMC_ROM | 1 | 16 |
TPCC_EDMA | 1 | 128 |
PCIe SS1 | 1 | 120 |
VCP1 | 1 | 128 |
L3_INSTR | 1 | 128 |
DEBUGSS CT_TBR | 1 | 128 |
QSPI | 256 | 128 |
VCP2 | 1 | 128 |
TC1_EDMA | 1 | 128 |
TC2_EDMA | 1 | 128 |
McASP1 | 1 | 128 |
McASP2 | 1 | 128 |
McASP3 | 1 | 128 |
PCIe SS2 | 1 | 120 |
SPARE_TSC_ADC | 1 | 128 |
GRPX2D | 1 | 4 |
EVE1 (1) | 16 | 128 |
EVE2 (1) | 16 | 128 |
EVE3 (1) | 16 | 128 |
EVE4 (1) | 16 | 128 |
PRUSS1 | 1 | 128 |
PRUSS2 | 1 | 128 |
MMU 1 | 32 | 128 |
MMU 2 | 32 | 128 |
SHA2MD5_2 | 1 | 4 |
L4_CFG | 1 | 124 |
L4_PER1 P1 | 1 | 124 |
L4_PER1 P2 | 1 | 124 |
L4_PER1 P3 | 1 | 124 |
L4_PER2 P1 | 1 | 124 |
L4_PER2 P2 | 1 | 124 |
L4_PER2 P3 | 1 | 124 |
L4_PER3 P1 | 1 | 124 |
L4_PER3 P2 | 1 | 124 |
L4_PER3 P3 | 1 | 124 |
The L3 high-performance interconnect is based on a Network-on-Chip (NoC) interconnect infrastructure. The NoC uses an internal packet-based protocol for forward (read command, write command with data payload) and backward (read response with data payload, write response) transactions. All exposed interfaces of this NoC interconnect, for both targets and initiators, comply with the OCP IP2.x reference standard.
The interconnect has internal components that aid in regulating the traffic from a specific initiator to a specific target. These components are called bandwidth regulators and bandwidth limiters. Additionally, the initiator IPs can set their respective MFLAG or MREQPRIORITY signals, which are interpreted by the interconnect, and subsequently by the DMM/EMIF, to give priority to a given initiator.
The traffic regulators within the interconnect are set to default values that allow most use cases to work without any tuning. However, if customization is needed for a given use case, it is possible through the programmable parameters explained in the subsequent sections.
The bandwidth regulators prevent master NIUs from consuming too much bandwidth of a link or of a slave NIU that is shared between several data flows: packets are then transported at a slower rate. The desired bandwidth value can be programmed in the bandwidth regulator. When the measured bandwidth is below the programmed value, the pressure bit is set to 1, giving priority to this master. When the bandwidth is above the programmed value, the pressure bit is set to 0 and the concerned master has the same weight as the others.
Bandwidth regulators are enabled in the interconnect by default, with a default configuration in which the expected average bandwidth is set to zero. With any amount of traffic, the actual average bandwidth is greater than zero, so the lower pressure bits (00b) are driven by default. You need to program the regulators to achieve the desired regulation in case of concurrency. If set, the bandwidth regulator discards the L3 initiator priority configured using the device control module.
Bandwidth regulators:
A bandwidth regulator is available for the following IPs:
The programming API for the bandwidth regulator is:
void set_bw_regulator(int port, unsigned int average_bw, unsigned int time_in_us)
{
    unsigned int base_address = get_bw_reg_base_address(port);
    /* Expected average bandwidth, converted from MB/s to register units
       by dividing by 8.3125 */
    WR_REG32(base_address + 0x8, (int)ceil(average_bw / 8.3125));
    /* Amount of data allowed over the regulation window:
       time_in_us x average_bw (MB/s x us = bytes) */
    WR_REG32(base_address + 0xC, time_in_us * average_bw);
    WR_REG32(base_address + 0x14, 0x1);
}
get_bw_reg_base_address(port) {
    if ( port == "EVE1_TC0" ) {
        return L3_NOC_AVATAR__DEBUGSS_CS_DAP_INIT_OCP_L3_NOC_AVATAR_CLK1_EVE1_TC0_BW_REGULATOR;
    }
    ...
}
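A hypothetical invocation of the API above, assuming the port is identified by an enumeration value such as EVE1_TC0 and using a 1000-MB/s target over a 10-µs window purely as example numbers (not recommendations from this document), could look like:

/* Illustrative values only: regulate the EVE1 TC0 initiator port to an
   average of 1000 MB/s, evaluated over a 10-us window. */
set_bw_regulator(EVE1_TC0 /* hypothetical port enum */, 1000, 10);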
The bandwidth limiter regulates the packet flow in the L3_MAIN interconnect by applying flow control when a user-defined bandwidth limit is reached. The next packet is served only after an internal timer expires, thus ensuring that traffic does not exceed the allocated bandwidth. The bandwidth limiter can be used with a watermark mechanism that allows traffic to temporarily exceed the peak bandwidth.
Bandwidth limiter:
The programming API for the bandwidth limiter is:
set_bw_limiter(port, limit_bw) {
    base_address = get_bw_limiter_base_address(port);
    /* Convert the limit from MB/s to register units (divide by 8.3125),
       then split it into an integer part and a 5-bit fractional part */
    bandwidth = int(limit_bw / 8.3125);
    bandwidth_int = ( bandwidth & 0xFFFFFFE0 ) >> 5;
    bandwidth_frac = ( bandwidth & 0x1F );
    WR_REG32(base_address + 0x8, bandwidth_frac);
    WR_REG32(base_address + 0xC, bandwidth_int);
    WR_REG32(base_address + 0x10, 0x0);
    WR_REG32(base_address + 0x14, 0x1);
}
get_bw_limiter_base_address(port) {
    if ( port == "VPE_P2" ) {
        return L3_NOC_AVATAR__DEBUGSS_CS_DAP_INIT_OCP_L3_NOC_AVATAR_CLK1_VPE_P2_BW_LIMITER;
    }
    ...
}
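A hypothetical invocation, with VPE_P2 as the port identifier (matching the lookup above) and a 2000-MB/s cap as a purely illustrative number, could be:

/* Illustrative values only: limit traffic from the VPE P2 port
   to at most 2000 MB/s. */
set_bw_limiter(VPE_P2 /* hypothetical port identifier */, 2000);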
Certain initiators in the system can generate MFLAG signals that provide higher priority to the data traffic initiated by them. The modules that can generate the MFLAG dynamically are VIP, DSS, EVE, and DSP. Following is a brief discussion of the DSS MFLAG.
The behavior of dynamic MFLAG assertion is illustrated in Figure 4.
The programming model used to enable dynamic MFLAG is:
/* Enable MFLAG generation: DISPC_GLOBAL_MFLAG_ATTRIBUTE */
DISPC_GLOBAL_MFLAG_ATTRIBUTE = 0x2;

/* Set the video pipes as high priority: DISPC_VIDx_ATTRIBUTES */
DISPC_VID1_ATTRIBUTES |= (1 << 23);
DISPC_VID2_ATTRIBUTES |= (1 << 23);
DISPC_VID3_ATTRIBUTES |= (1 << 23);

/* Set the graphics pipe as high priority: DISPC_GFX_ATTRIBUTES */
DISPC_GFX_ATTRIBUTES |= (1 << 14);

/* GFX thresholds: 75% HT, 50% LT */
DISPC_GFX_MFLAG_THRESHOLD = 0x03000200;

/* VIDx thresholds: 75% HT, 50% LT */
DISPC_VID1_MFLAG_THRESHOLD = 0x06000400;
DISPC_VID2_MFLAG_THRESHOLD = 0x06000400;
DISPC_VID3_MFLAG_THRESHOLD = 0x06000400;
Many other IPs have their MFLAG driven through the control module registers.
The CTRL_CORE_L3_INITIATOR_PRESSURE_1 to CTRL_CORE_L3_INITIATOR_PRESSURE_4 registers are used for controlling the priority of certain initiators on the L3_MAIN.
The CTRL_CORE_EMIF_INITIATOR_PRIORITY_1 to CTRL_CORE_EMIF_INITIATOR_PRIORITY_6 registers control the SDRAM priority of each initiator accessing the two EMIFs. Each 3-bit field in these registers is associated with exactly one initiator. Setting a field to 0x0 gives the corresponding initiator the highest priority over the others; setting it to 0x7 gives it the lowest priority. This feature is useful in case of concurrent accesses to the external SDRAM from several initiators.
In the context of TDA2xx and TDA2ex, the CTRL_CORE_EMIF_INITIATOR_PRIORITY_1 to CTRL_CORE_EMIF_INITIATOR_PRIORITY_6 settings are overridden by the DMM PEG priority; hence, it is recommended to set the DMM PEG priority instead of the control module EMIF_INITIATOR_PRIORITY registers.
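As an illustration of the 3-bit priority encoding described above, the following sketch shows a generic read-modify-write of one initiator's field; the register address, the field offset, and the helper name are hypothetical and must be taken from the device TRM for the initiator of interest:

/* Hypothetical sketch: program one initiator's 3-bit priority field.
   reg_addr and field_shift depend on the initiator (see the TRM);
   prio ranges from 0x0 (highest priority) to 0x7 (lowest). */
void set_initiator_priority(volatile unsigned int *reg_addr,
                            unsigned int field_shift, unsigned int prio)
{
    unsigned int val = *reg_addr;
    val &= ~(0x7u << field_shift);           /* clear the 3-bit field     */
    val |= (prio & 0x7u) << field_shift;     /* program the new priority  */
    *reg_addr = val;
}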
The MFLAG influences the priority of the Traffic packets at multiple stages:
The different memory subsystems in TDA2xx and TDA2ex and their interconnection is as shown in Figure 5.
NOTE
TDA2ex does not contain EMIF2, OCMC RAM 2 and 3.
The EMIF is the memory controller for the system's main memory (DRAM). Supported DRAM types are DDR2 (400 MHz), DDR3 (532 MHz), and so on; TDA2ex specifically supports DDR3 (666 MHz). The external memory controller supports 32-bit and 16-bit (narrow) modes, page sizes up to 2KB, up to 8 banks, and 128-byte bursts. The bank distribution and the row, column, and bank access pattern are shown in Figure 6.
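To put the throughput figures in this report in context, the theoretical peak bandwidth of one EMIF can be estimated from the DDR clock and the bus width. The sketch below is a back-of-the-envelope calculation under that assumption, not a measured value from this document:

/* Back-of-the-envelope peak bandwidth of one EMIF port:
   DDR transfers two data words per clock, so
   peak MB/s = ddr_clk_mhz x 2 x bus_width_bytes.
   Example: 532 MHz x 2 x 4 bytes (32-bit bus) = 4256 MB/s. */
unsigned int emif_peak_mbps(unsigned int ddr_clk_mhz, unsigned int bus_width_bytes)
{
    return ddr_clk_mhz * 2u * bus_width_bytes;
}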
All the measurements done in this report use the following configurations, unless mentioned otherwise.
Based on the DRAM data sheet, the respective timing values specified in time must be translated into clock cycles and programmed into the MEMSS MMRs. Parameters such as the refresh rate, the DRAM topology, and the CAS latencies are also programmed in these MMRs. The MEMSS PHY interface also has an MMR space that must be configured to enable the PHY and to program parameters such as the read latency.
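The translation from data-sheet timing values to clock cycles is a ceiling division by the DDR clock period; the helper below is a generic sketch (the function name and example numbers are illustrative), not the exact formula of any particular MEMSS register field:

#include <math.h>  /* for ceil() */

/* Convert a DRAM timing parameter given in nanoseconds into DDR clock
   cycles, rounding up so the programmed value never violates the
   data-sheet minimum. Example: 13.75 ns at 532 MHz -> 8 cycles. */
unsigned int ns_to_cycles(double t_ns, double ddr_clk_mhz)
{
    double cycles = t_ns * ddr_clk_mhz / 1000.0;   /* ns x MHz / 1000 = cycles */
    return (unsigned int)ceil(cycles);
}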
For priority escalation, there is one more level of escalation that can be enabled inside the MEMSS using class of service with eight priority levels. Two classes of service (CoS) are implemented in the MEMSS. Each class of service can support up to three master connection IDs, and each of the three masters can be associated with the same or a different priority level. Each class of service has a priority-raise counter that can be set to N, which corresponds to N times 16 clocks. This counter value specifies the number of clocks after which the MEMSS momentarily raises the priority of the class-of-service commands (master connections within a class of service are arbitrated using their priority levels). Using masks along with the master connection IDs, a maximum of 144 master connection IDs can participate in this class-of-service scheme.
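For example, the priority-raise interval corresponding to a counter value N can be converted to time as N x 16 clocks; the sketch below illustrates the arithmetic, assuming the counter runs on the EMIF clock and using an example counter value and clock frequency rather than recommended settings:

/* Priority-raise interval for a class of service: N counts of 16 clocks.
   Example: N = 15 at a 532-MHz EMIF clock -> 240 clocks = ~451 ns. */
double cos_priority_raise_ns(unsigned int n, double emif_clk_mhz)
{
    return (n * 16.0) * 1000.0 / emif_clk_mhz;   /* clocks x (1000/MHz) = ns */
}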
NOTE