SPRADA8, May 2023: AM68A, TDA4VL-Q1

 

  Abstract
  Trademarks
  1 Introduction
  2 AM68A Processor
  3 Edge AI Use Cases on AM68A
    3.1 AI Box
    3.2 Machine Vision
    3.3 Multi-Camera AI
  4 Software Tools and Support
    4.1 Edge AI Software Development Kit (SDK)
    4.2 Edge AI SDK Demonstrations
    4.3 Edge AI Model Zoo
    4.4 Edge AI Studio
  5 Conclusion
  6 References

AM68A Processor

The AM68A is a dual-core Arm® Cortex®-A72 microprocessor designed as a high-performance, highly integrated device that provides significant processing power together with image, video, and graphics processing capability. Compared with the AM62A(2), which targets applications with one or two cameras, the AM68A enables real-time processing of four to eight 2MP cameras with improved AI performance. Figure 2-1 shows the heterogeneous architecture of the AM68A, which includes the following subsystems:

  • A dual-core Arm Cortex A72 microprocessor at 2 GHz provides up to 25K Dhrystone Million Instructions Per Second (DMIPS).
  • Vision Processing Accelerator V3 (VPAC3) performs image processing in the Vision Image Sub-System (VISS) to support raw image sensors through demosaicing, defective pixel correction, auto exposure, auto white balance, chromatic aberration correction (CAC), and so forth. In addition, VPAC3 includes Lens Distortion Correction (LDC), Multi-Scaler (MSC), and Bilateral Noise Filter (BNF) hardware accelerators (HWAs) to accelerate correction of distorted images, downscaling of images into multiple resolutions, and noise filtering, respectively. VPAC3 in the AM68A can process 600 MP per second (MP/s), assuming 20% system overhead.
  • Digital Signal Processing (DSP) and a Matrix Multiplication Accelerator (MMA) are integrated together for DL acceleration as well as traditional computer vision tasks. The AM68A processor has two 512-bit C7x DSPs running at 1 GHz, one of which is tightly coupled with an MMA capable of 4K (64 × 64) 8-bit fixed-point multiply-accumulates per cycle. Running at 1 GHz, the AM68A provides 8 dense Trillion Operations per Second (TOPS).
  • The H.264/H.265 encoder and decoder can encode and decode multiple channels simultaneously, supporting H.264 Baseline, Main, and High Profiles at L5.2 and H.265 Main Profile at L5.1. The encoder and decoder can process 480 MP/s, for example, 8 channels of 2MP at 30 fps.
  • 2x 4-lane MIPI CSI-2 RX ports are included in the AM68A. Two high-resolution (for example, 12MP) cameras can be connected directly to the CSI-2 RX ports and captured and preprocessed by VPAC3. Capturing eight 2MP cameras is possible via MIPI CSI-2 4-to-1 aggregators.
  • BXS-4-64 GPU offers up to 50 Giga Floating-point Operations per Second (GFLOPS) to enable dynamic 2D and 3D rendering for enhanced viewing applications.
  • Display Sub-System (DSS) supports multiple displays with the flexibility to interface with different panel types such as eDP, DSI, and DPI.
  • Improved memory architecture and high-speed interfaces improve the system throughput by enabling high utilization of cores and HWAs. The AM68A supports up to 34 Giga Bytes Per Second (GBps) DDR memory bandwidth.
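The headline figures in the list above follow from simple arithmetic; a quick sanity-check sketch, using only the values quoted in the bullets and counting one multiply-accumulate as two operations:

```python
# Back-of-envelope checks on the AM68A throughput figures quoted above.

# MMA: 4K (64 x 64 = 4096) 8-bit MACs per cycle; one MAC = 2 ops (multiply + add).
macs_per_cycle = 64 * 64
clock_hz = 1_000_000_000           # C7x/MMA clock, 1 GHz
tops = macs_per_cycle * 2 * clock_hz / 1e12
print(tops)                        # 8.192, quoted as "8 dense TOPS"

# Codec budget: 8 channels x 2 MP x 30 fps.
codec_mp_per_s = 8 * 2 * 30
print(codec_mp_per_s)              # 480 MP/s

# VPAC3 at 600 MP/s leaves headroom over the same 8-camera capture load.
print(600 - codec_mp_per_s)        # 120 MP/s to spare
```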
Figure 2-1 AM68A Block Diagram With Subsystems

Deep learning inference efficiency is crucial for the performance of an edge AI system. As the Performance and efficiency benchmarking with TDA4 Edge AI processors application note shows, MMA-based deep learning inference is 60% more efficient than a GPU-based one in terms of FPS per TOPS. Optimized network models for the C7x/MMA are provided by the TI Model Zoo(3), a large collection of DNN models optimized for the C7x/MMA across various computer vision tasks, including popular image classification, 2D and 3D object detection, semantic segmentation, and 6D pose estimation models. Table 2-1 shows the 8-bit fixed-point inference performance on the AM68A for several models from the TI Model Zoo.

Table 2-1 Inference Performance of Classification, Object Detection, and Semantic Segmentation Models on AM68A

Task                    Model                                  Image Resolution   Frame Rate (fps)   Accuracy (%)
Classification          mobileNetV2-tv                         224 × 224          500                70.27(1)
Object detection        ssdLite-mobDet-DSP-coco                320 × 320          218                34.64(2)
Object detection        yolox-nano-lite-mmdet-coco             416 × 416          268                18.96(2)
Semantic segmentation   deeplabv3lite-mobv2-cocoseg21          512 × 512          120                55.47(3)
Semantic segmentation   deeplabv3lite-regnetx800mf-cocoseg21   512 × 512          58                 60.62(3)

(1) TOP-1 accuracy
(2) mAP 50-95
(3) mIoU
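One way to read the frame rates in Table 2-1 is as a time-multiplexed budget: a model benchmarked at N fps on a single stream sustains roughly N/k fps per camera when shared across k cameras, ignoring batching and scheduling overhead. A small illustrative helper (the function name is ours, not from the SDK):

```python
def per_camera_fps(single_stream_fps: float, num_cameras: int) -> float:
    """Approximate per-camera rate when one model time-shares several
    streams, ignoring batching and scheduling overhead."""
    return single_stream_fps / num_cameras

# yolox-nano-lite-mmdet-coco at 268 fps shared across 8 cameras:
print(per_camera_fps(268, 8))   # 33.5 fps per camera
```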

The multicore heterogeneous architecture of the AM68A provides the flexibility to optimize the performance of an edge AI system for various applications by assigning each task to a suitable programmable core or HWA. For example, computationally intense deep learning (DL) inference can run on the MMA with optimized DL models, while vision processing and video encoding and decoding can be offloaded to VPAC3 and the hardware-accelerated video codec, respectively, for the best performance. Other functional blocks can be programmed on the A72 or C7x cores. Section 3 describes in detail how edge AI systems can be built on the AM68A for various industrial (non-automotive) use cases.
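On the Edge AI SDK, this kind of partitioning is typically expressed as a GStreamer pipeline, with each stage mapped to a subsystem. The sketch below builds such a launch string; the specific element names (tiovxisp, tiovxmultiscaler, tidlinferer, and so on) are taken from TI's edgeai-gst-plugins as we understand them and should be treated as assumptions to verify against the SDK release in use.

```python
# Illustrative mapping of an edge AI pipeline onto AM68A subsystems,
# expressed as a GStreamer launch string. Element names are assumptions
# based on TI's edgeai-gst-plugins, not a verified SDK reference.
stages = [
    "v4l2src device=/dev/video2",   # CSI-2 camera capture
    "tiovxisp",                     # raw image processing on VPAC3 VISS
    "tiovxmultiscaler",             # MSC: rescale to the model input size
    "tidlinferer",                  # DL inference on the C7x/MMA
    "tidlpostproc",                 # draw results (Arm/GPU)
    "v4l2h264enc",                  # hardware H.264 encode
    "filesink location=out.h264",   # write the encoded stream to a file
]
pipeline = " ! ".join(stages)
print(pipeline)
```

The same string could be passed to gst-launch-1.0 on the target; each `!` link hands buffers from one subsystem's stage to the next.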