Advanced AI Vision Processing Using AM68A for Industrial Smart Camera Applications

SPRADA8 - May 2023 - AM68A, TDA4VL-Q1

Contents
  • Abstract
  • Trademarks
  • 1 Introduction
  • 2 AM68A Processor
  • 3 Edge AI Use Cases on AM68A
    • 3.1 AI Box
    • 3.2 Machine Vision
    • 3.3 Multi-Camera AI
  • 4 Software Tools and Support
    • 4.1 Edge AI Software Development Kit (SDK)
    • 4.2 Edge AI SDK Demonstrations
    • 4.3 Edge AI Model Zoo
    • 4.4 Edge AI Studio
  • 5 Conclusion
  • 6 Reference

 

Technical White Paper

Advanced AI Vision Processing Using AM68A for Industrial Smart Camera Applications

Abstract

Advances in deep-learning-based artificial intelligence (AI) and embedded processors have made camera-based analytics a crucial technology in many industrial applications, where data from multiple cameras must be processed with high performance at low power and low latency. The AM68A processor provides various ways to optimize the performance of AI applications at the network edge through heterogeneous processing cores and integrated hardware accelerators, and is designed for edge AI with as many as eight cameras. The Edge AI SDK and associated tools make developing AI applications on the AM68A simpler and faster while taking full advantage of the hardware accelerators for vision and AI processing.

Trademarks

Arm® and Cortex® are registered trademarks of Arm Limited.

All trademarks are the property of their respective owners.

1 Introduction

As vision is a primary sense for human beings, machines also use vision to perceive and comprehend the environments around them. Camera sensors provide rich information about the surroundings, and advances in deep-learning-based AI make it possible to analyze enormous and complex visual data with higher accuracy. Therefore, in applications such as machine vision, robotics, surveillance, and home and factory automation, camera-based analytics has become a more powerful and important tool.

Embedded processors (EPs) with AI capability, that is, edge AI processors, are accelerating this trend. An EP can process visual data from multiple cameras into actionable insight by mimicking the eyes and brain of a human. In contrast to cloud-based AI, where deep neural network (DNN) inference runs on central computing devices, edge AI processes and analyzes the visual data on systems (for example, edge AI processors) directly connected to the sensors. Edge AI technology not only makes existing applications smarter but also opens up new applications that require intelligent processing of large amounts of visual data for 2D and 3D perception.

Edge AI is specifically designed for time-sensitive applications. However, processing multiple vision sensors and executing multiple DNN inferences simultaneously at the edge requires a low-power processor, which presents challenges in size, power consumption, and heat dissipation. The sensors and processor must fit in a small form factor and operate efficiently in the harsh environments of factories, farms, and construction sites, as well as inside vehicles or in cameras installed along roads. Moreover, certain equipment, such as mobile machines and robots, requires functionally safe 3D perception. The global market for such edge AI processors was valued at $2.1 billion in 2021 and is expected to reach $5.5 billion by 2028(1).

This paper focuses on the highly integrated AM68A processor and several edge AI use cases, including AI Box, machine vision, and multi-camera AI. Optimizing edge AI systems using the heterogeneous architecture of the AM68A, together with optimized AI models and easy-to-use software, is also discussed.

2 AM68A Processor

The AM68A is built around a dual-core Arm® Cortex®-A72 microprocessor. The processor is designed as a high-performance, highly integrated device providing significant levels of processing power, image and video processing, and graphics capability. Compared with the AM62A(2), which is designed for applications with one or two cameras, the AM68A enables real-time processing of four to eight 2MP cameras with improved AI performance. Figure 2-1 shows the multiple sub-systems of the heterogeneous architecture of the AM68A:

  • A dual-core Arm Cortex A72 microprocessor at 2 GHz provides up to 25K Dhrystone Million Instructions Per Second (DMIPS).
  • Vision Processing Accelerator V3 (VPAC3) performs image processing in the Vision Image Sub-System (VISS) to support raw image sensors through de-mosaic, defective pixel correction, auto exposure, auto white balance, chromatic aberration correction (CAC), and so forth. In addition, VPAC3 includes Lens Distortion Correction (LDC), Multi-Scaler (MSC), and Bilateral Noise Filter (BNF) hardware accelerators (HWAs) to accelerate correction of distorted images, down-scaling of images into multiple resolutions, and noise filtering, respectively. VPAC3 in the AM68A can process 600 MP per second (MP/s) assuming 20% system overhead.
  • Digital Signal Processing (DSP) and Matrix Multiplication Accelerator (MMA) are integrated together for DL acceleration as well as traditional computer vision tasks. The AM68A processor has two 512-bit C7x DSP running at 1 GHz, one of which is tightly coupled with an MMA capable of 4K (64 × 64) 8-bit fixed-point multiply accumulates per cycle. When run at 1 GHz, the AM68A provides 8 dense Trillion Operations per Second (TOPS).
  • The H.264/H.265 encoder and decoder can encode and decode multiple channels simultaneously, supporting H.264 Baseline, Main, and High Profile at Level 5.2 and H.265 Main Profile at Level 5.1. The encoder and decoder can process 480 MP/s, for example, 8 channels of 2MP at 30 fps.
  • 2x 4-lane MIPI CSI-2 RX interfaces are included in the AM68A. Two high-resolution (for example, 12MP) cameras can be connected directly to the CSI-2 RX ports and captured and preprocessed by VPAC3. Capturing eight 2MP cameras is possible via MIPI CSI-2 4-to-1 aggregators.
  • BXS-4-64 GPU offers up to 50 Giga Floating-point Operations per Second (GFLOPS) to enable dynamic 2D and 3D rendering for enhanced viewing applications.
  • Display Sub-System (DSS) supports multiple displays with the flexibility to interface with different panel types such as eDP, DSI, and DPI.
  • Improved memory architecture and high-speed interfaces improve the system throughput by enabling high utilization of cores and HWAs. The AM68A supports up to 34 Giga Bytes Per Second (GBps) DDR memory bandwidth.
Figure 2-1 AM68A Block Diagram With Subsystems
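The headline figures in the list above follow from straightforward arithmetic. A quick sketch can verify them; every constant below is taken from the subsystem descriptions, and nothing is measured:

```python
# Back-of-the-envelope checks for the AM68A figures quoted above.
# All constants come from the subsystem list; none of these are measurements.

# C7x + MMA: 4K (64 x 64 = 4096) 8-bit MACs per cycle at 1 GHz.
# One multiply-accumulate counts as 2 operations.
macs_per_cycle = 64 * 64
clock_hz = 1e9
tops = macs_per_cycle * 2 * clock_hz / 1e12
print(f"MMA peak: {tops:.3f} dense TOPS")    # ~8 TOPS as quoted

# H.264/H.265 codec: 480 MP/s equals 8 channels of 2 MP at 30 fps.
codec_mps = 8 * 2 * 30
print(f"Codec load: {codec_mps} MP/s")

# VPAC3: 600 MP/s (assuming 20% system overhead), so eight 2 MP
# cameras at 30 fps fit inside the ISP budget with headroom.
vpac3_budget = 600
camera_load = 8 * 2 * 30
print(f"VPAC3 headroom: {vpac3_budget - camera_load} MP/s")
```

The same arithmetic explains why eight 2MP cameras is the natural upper bound quoted for the device: the camera load saturates the codec budget exactly and leaves only modest headroom in VPAC3.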

Deep learning inference efficiency is crucial to the performance of an edge AI system. As the Performance and efficiency benchmarking with TDA4 Edge AI processors application note shows, MMA-based deep learning inference is 60% more efficient than a GPU-based equivalent in terms of FPS per TOPS. Optimized network models for the C7xMMA are provided by the TI Model Zoo(3), a large collection of DNN models optimized for the C7xMMA across various computer vision tasks. The models include popular image classification, 2D and 3D object detection, semantic segmentation, and 6D pose estimation models. Table 2-1 shows the 8-bit fixed-point inference performance on the AM68A for several models in the TI Model Zoo.

Table 2-1 Inference Performance of Classification, Object Detection, and Semantic Segmentation Models on AM68A

| Task                  | Model                                | Image Resolution | Frame Rate (fps) | Accuracy (%) |
|-----------------------|--------------------------------------|------------------|------------------|--------------|
| Classification        | mobileNetV2-tv                       | 224 × 224        | 500              | 70.27(1)     |
| Object detection      | ssdLite-mobDet-DSP-coco              | 320 × 320        | 218              | 34.64(2)     |
| Object detection      | yolox-nano-lite-mmdet-coco           | 416 × 416        | 268              | 18.96(2)     |
| Semantic segmentation | deeplabv3lite-mobv2-cocoseg21        | 512 × 512        | 120              | 55.47(3)     |
| Semantic segmentation | deeplabv3lite-regnetx800mf-cocoseg21 | 512 × 512        | 58               | 60.62(3)     |

(1) TOP-1 accuracy
(2) mAP 50-95
(3) mIoU
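The single-stream frame rates in Table 2-1 also bound how many camera streams a given model can serve. The sketch below checks which models keep up with an eight-camera, 30 fps system under the simplifying assumption that inference for all streams is serialized on the single MMA-coupled C7x; this is an illustration, not a measured multi-stream result:

```python
# Which Table 2-1 models can keep up with N cameras at a target frame
# rate, assuming all streams share one serialized MMA inference queue?
# (A simplifying assumption for illustration, not a benchmarked figure.)

MODEL_FPS = {                       # single-stream rates from Table 2-1
    "mobileNetV2-tv": 500,
    "ssdLite-mobDet-DSP-coco": 218,
    "yolox-nano-lite-mmdet-coco": 268,
    "deeplabv3lite-mobv2-cocoseg21": 120,
    "deeplabv3lite-regnetx800mf-cocoseg21": 58,
}

def sustains(model: str, cameras: int, target_fps: int) -> bool:
    """True if the model's aggregate rate covers every stream."""
    return MODEL_FPS[model] >= cameras * target_fps

for name in MODEL_FPS:
    ok = sustains(name, cameras=8, target_fps=30)
    print(f"{name}: {'fits' if ok else 'exceeds budget'} for 8 x 30 fps")
```

Under this assumption, only the lighter models sustain all eight streams at full rate; the heavier segmentation models would need a lower per-stream rate or fewer streams.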

The multicore heterogeneous architecture of the AM68A provides the flexibility to optimize edge AI system performance for various applications by assigning suitable programmable cores or HWAs to particular tasks. For example, computationally intense deep learning (DL) inference can run on the MMA with optimized DL models, while vision processing and video encoding and decoding can be offloaded to VPAC3 and the hardware-accelerated video codec for the best performance. Other functional blocks can be programmed on the A72 or C7x. Section 3 describes in detail how edge AI systems can be built on the AM68A for various industrial (non-automotive) use cases.
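The offloading strategy described above can be summarized as a mapping from pipeline stages to hardware blocks. The sketch below is purely illustrative: the stage names and the plan structure are hypothetical and do not correspond to any SDK interface, but the stage-to-block assignments follow the text:

```python
# Illustrative mapping of a multi-camera edge AI pipeline onto AM68A
# subsystems, following the offloading strategy described above.
# Stage names are hypothetical; this is not an SDK interface.

PIPELINE = [
    # (stage,                 assigned hardware block)
    ("CSI-2 capture",         "CSI-2 RX + 4-to-1 aggregator"),
    ("ISP (de-mosaic, AWB)",  "VPAC3 VISS"),
    ("distortion correction", "VPAC3 LDC"),
    ("multi-scale resize",    "VPAC3 MSC"),
    ("DNN inference",         "C7x + MMA"),
    ("tracking/post-proc",    "Arm Cortex-A72"),
    ("H.265 encode",          "HW codec"),
]

def assignments() -> dict:
    """Return the stage -> hardware block offload plan."""
    return {stage: block for stage, block in PIPELINE}

for stage, block in assignments().items():
    print(f"{stage:24s} -> {block}")
```

The point of the mapping is that the A72 cores stay free for control and application logic because every throughput-heavy stage lands on a dedicated accelerator.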

3 Edge AI Use Cases on AM68A

The popularity of edge AI technology is increasing across many existing and new use cases. The AM6xA scalable processor family is well suited for edge AI owing to its multicore heterogeneous architecture. This section introduces popular edge AI use cases with varying input requirements (for example, resolution and frame rate) and varying task and computation requirements, and describes how the tasks are distributed among the multiple cores and HWAs in the AM68A to maximize performance.

 

Texas Instruments

© Copyright 1995-2025 Texas Instruments Incorporated. All rights reserved.