Advances in deep-learning based artificial intelligence (AI) and embedded processors have made camera-based analytics a crucial technology in many industrial applications where data from multiple cameras must be processed with high performance at low power and low latency. The AM68A processor provides various ways to optimize the performance of AI applications at the network edge with heterogeneous processing cores and powerful integrated hardware accelerators. This processor is designed for edge AI with as many as eight cameras. The Edge AI SDK and tools provided make developing AI applications on the AM68A simpler and faster while taking full advantage of the hardware accelerators for vision and AI processing.
Just as vision is a primary sense for human beings, machines use vision to perceive and comprehend the environments around them. Camera sensors provide rich information about the surroundings, and advances in deep-learning based AI make it possible to analyze enormous volumes of complex visual data with high accuracy. As a result, camera-based analytics has become an increasingly powerful and important tool in applications such as machine vision, robotics, surveillance, and home and factory automation.
Embedded processors with AI capability, that is, edge AI processors, are accelerating this trend. These processors can turn visual data from multiple cameras into actionable insight by mimicking the eyes and brain of a human. In contrast to cloud-based AI, where deep neural network (DNN) inference runs on centralized computing devices, edge AI processes and analyzes visual data on systems directly connected to the sensors, for example, edge AI processors. Edge AI technology not only makes existing applications smarter but also opens up new applications that require intelligent processing of large amounts of visual data for 2D and 3D perception.
Edge AI is particularly suited for time-sensitive applications. However, processing multiple vision sensors and executing multiple DNN inferences simultaneously at the edge requires a low-power processor, which presents challenges in size, power consumption, and heat dissipation. The sensors and processor must fit in a small form factor and operate efficiently in the harsh environments of factories, farms, and construction sites, as well as inside vehicles or cameras installed on the road. Moreover, certain equipment such as mobile machines and robots requires functionally safe 3D perception. The global market for such edge AI processors was valued at $2.1 billion in 2021 and is expected to reach $5.5 billion by 2028(1).
This paper focuses on the highly-integrated AM68A processor and several edge AI use cases, including AI Box, machine vision, and multi-camera AI. Optimizing edge AI systems using the heterogeneous architecture of the AM68A, the optimized AI models, and the easy-to-use software architecture is also discussed.
The AM68A is a dual-core Arm® Cortex®-A72 microprocessor designed as a high-performance, highly-integrated device that provides significant processing power together with image, video, and graphics processing capability. Compared with the AM62A(2), which is designed for applications with one or two cameras, the AM68A enables real-time processing of four to eight 2MP cameras with improved AI performance. Figure 2-1 shows the multiple sub-systems that make up the heterogeneous architecture of the AM68A.
Deep learning inference efficiency is crucial to the performance of an edge AI system. As the Performance and efficiency benchmarking with TDA4 Edge AI processors application note shows, MMA-based deep learning inference is 60% more efficient than a GPU-based one in terms of FPS per TOPS. Optimized network models for the C7xMMA are provided by the TI Model Zoo(3), a large collection of DNN models optimized for the C7xMMA across various computer vision tasks, including popular image classification, 2D and 3D object detection, semantic segmentation, and 6D pose estimation models. Table 2-1 shows the 8-bit fixed-point inference performance on the AM68A for several models in the TI Model Zoo; a minimal inference sketch follows the table.
| Task | Model | Image Resolution | Frame Rate (fps) | Accuracy (%) |
|---|---|---|---|---|
| Classification | mobileNetV2-tv | 224 × 224 | 500 | 70.27(1) |
| Object detection | ssdLite-mobDet-DSP-coco | 320 × 320 | 218 | 34.64(2) |
| Object detection | yolox-nano-lite-mmdet-coco | 416 × 416 | 268 | 18.96(2) |
| Semantic segmentation | deeplabv3lite-mobv2-cocoseg21 | 512 × 512 | 120 | 55.47(3) |
| Semantic segmentation | deeplabv3lite-regnetx800mf-cocoseg21 | 512 × 512 | 58 | 60.62(3) |
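The following is a minimal sketch of running a Model Zoo model on the C7xMMA through the ONNX Runtime interface shipped with the Edge AI SDK. The TIDL execution provider name, the `artifacts_folder` option, and the model and path names are assumptions based on the patterns in TI's edgeai-tidl-tools examples; verify them against the SDK documentation for your release.

```python
# Sketch: offloading a TI Model Zoo ONNX model to the C7xMMA via the
# TIDL execution provider in TI's ONNX Runtime build. Provider name,
# option keys, and paths are assumptions to verify against the Edge AI SDK.
import numpy as np
import onnxruntime as ort

MODEL = "mobileNetV2-tv.onnx"      # hypothetical Model Zoo export
ARTIFACTS = "./tidl_artifacts"     # folder of pre-compiled TIDL artifacts

session = ort.InferenceSession(
    MODEL,
    providers=["TIDLExecutionProvider", "CPUExecutionProvider"],
    provider_options=[{"artifacts_folder": ARTIFACTS}, {}],
)

# Feed a 224x224 input in the layout the model expects (assumed NCHW float32;
# a real application would substitute a correctly pre-processed camera frame).
inp = session.get_inputs()[0]
x = np.random.rand(1, 3, 224, 224).astype(np.float32)
scores = session.run(None, {inp.name: x})[0]
print("top-1 class:", int(np.argmax(scores)))
```

In this workflow the artifacts folder holds the output of the offline TIDL model-compilation step, which is where the 8-bit fixed-point quantization reported in Table 2-1 is applied; the runtime script itself needs no changes to the network definition.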
The multicore heterogeneous architecture of the AM68A provides the flexibility to optimize the performance of an edge AI system for various applications by assigning each task to a suitable programmable core or HWA. For example, computationally intensive deep learning (DL) inference can run on the MMA with optimized DL models, while vision processing and video encoding and decoding can be offloaded to the VPAC3 and the hardware-accelerated video codec for best performance. Other functional blocks can be programmed on the A72 or C7x. Section 3 describes in detail how edge AI systems can be built on the AM68A for various industrial (non-automotive) use cases.
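As a concrete illustration of this task distribution, the sketch below builds a single-camera pipeline with GStreamer, the streaming framework used by the Edge AI SDK. The element names follow TI's edgeai-gst-plugins (tiovxmultiscaler, tiovxdlpreproc, tidlinferer) and the hardware video decoder exposed through V4L2, but the exact properties and caps negotiation are assumptions to verify on the target.

```python
# Sketch: mapping pipeline stages to AM68A accelerators from Python.
# Element names and properties are assumptions based on TI's
# edgeai-gst-plugins; check `gst-inspect-1.0` on the target device.
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

Gst.init(None)
pipeline = Gst.parse_launch(
    "filesrc location=camera0.h264 ! h264parse "
    "! v4l2h264dec "                             # hardware video decoder
    "! tiovxmultiscaler "                        # resize on the VPAC3 multi-scaler
    "! tiovxdlpreproc "                          # tensor pre-processing for the DNN
    "! tidlinferer model=/opt/model_zoo/model "  # DNN inference on the C7xMMA
    "! fakesink sync=false"
)
pipeline.set_state(Gst.State.PLAYING)

# Block until the stream ends or an error occurs, then clean up.
bus = pipeline.get_bus()
bus.timed_pop_filtered(Gst.CLOCK_TIME_NONE,
                       Gst.MessageType.EOS | Gst.MessageType.ERROR)
pipeline.set_state(Gst.State.NULL)
```

Because each stage runs on a dedicated accelerator, the A72 cores remain free for application logic; scaling to four to eight cameras is then largely a matter of replicating the capture and pre-processing branches.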
Edge AI technology is gaining popularity across many existing and new use cases. The AM6xA scalable processor family is well suited for edge AI owing to its multicore heterogeneous architecture. This section introduces popular edge AI use cases with varying input requirements (for example, resolution and frame rate) and varying task and computation requirements, and describes how each task is distributed among the multiple cores and HWAs of the AM68A to maximize performance.