SPRACZ2 August 2022 TDA4VM, TDA4VM-Q1

ADVANCE INFORMATION  

  1. Abstract
  2. 1 Introduction
    1. 1.1 Vision Analytics
    2. 1.2 End Equipments
    3. 1.3 Deep learning: State-of-the-art
  3. 2 Embedded edge AI system: Design considerations
    1. 2.1 Processors for edge AI: Technology landscape
    2. 2.2 Edge AI with TI: Energy-efficient and Practical AI
      1. 2.2.1 TDA4VM processor architecture
        1. 2.2.1.1 Development platform
    3. 2.3 Software programming
  4. 3 Industry standard performance and power benchmarking
    1. 3.1 MLPerf models
    2. 3.2 Performance and efficiency benchmarking
    3. 3.3 Comparison against other SoC Architectures
      1. 3.3.1 Benchmarking against GPU-based architectures
      2. 3.3.2 Benchmarking against FPGA based SoCs
      3. 3.3.3 Summary of competitive benchmarking
  5. 4 Conclusion
  6. Revision History
  7. 5 References

TDA4VM processor architecture

Figure 2-3 shows the overall TDA4x block diagram, in which the matrix multiplication accelerator (MMA) provides the acceleration for AI functions. Built on a heterogeneous architecture, the TDA4x System on Chip (SoC) optimizes the entire platform around easy programming on the multi-core Cortex-A72 microprocessor units (MPUs), while offloading compute-intensive tasks such as deep learning, imaging, vision, video, and graphics processing to specialized hardware accelerators and programmable cores. Holistic system-level integration of these cores, using a high-bandwidth interconnect and a smart memory architecture, enables high throughput and high energy efficiency. Advanced integration of the system components yields an optimized system BOM.

Figure 2-3 Block Diagram

As discussed in the previous section, TOPS (tera operations per second) is commonly used to compare deep learning performance. However, actual inference time depends on how efficiently the system architecture moves data through the system. A better performance benchmark is therefore the inference time for a given model at a given input image resolution. A shorter inference time means more images can be processed per second, which results in higher frames per second (FPS). FPS divided by TOPS (FPS/TOPS) thus indicates the efficiency of the deep learning architecture. Similarly, FPS divided by watts (FPS/Watt) is a good benchmark for the energy efficiency of an embedded processor.
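The relationship between inference time, FPS, FPS/TOPS, and FPS/Watt can be illustrated with a minimal Python sketch. The `BenchmarkResult` class and its field names are hypothetical, and the input values are placeholders chosen for easy arithmetic, not measurements from TDA4VM or any other device. The sketch also assumes one image per inference (batch size 1, no pipelining), so FPS is simply the reciprocal of the inference time.

```python
from dataclasses import dataclass


@dataclass
class BenchmarkResult:
    """Measurements for one model at one input resolution (hypothetical fields)."""

    inference_time_ms: float  # time to run inference on one image
    peak_tops: float          # rated peak tera operations per second
    power_watts: float        # average power drawn while running inference

    @property
    def fps(self) -> float:
        # Assumption: one image per inference, so FPS = 1000 ms / inference time.
        return 1000.0 / self.inference_time_ms

    @property
    def fps_per_tops(self) -> float:
        # Architecture efficiency: frames delivered per rated TOPS.
        return self.fps / self.peak_tops

    @property
    def fps_per_watt(self) -> float:
        # Energy efficiency: frames delivered per watt consumed.
        return self.fps / self.power_watts


# Placeholder numbers for illustration only, not measured on any device.
result = BenchmarkResult(inference_time_ms=5.0, peak_tops=4.0, power_watts=10.0)
print(f"FPS:      {result.fps:.1f}")
print(f"FPS/TOPS: {result.fps_per_tops:.1f}")
print(f"FPS/Watt: {result.fps_per_watt:.1f}")
```

Because both ratios normalize FPS by a resource (rated compute or power), two processors with the same peak TOPS can still differ widely on FPS/TOPS if one keeps its accelerator better fed with data.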