SPRACX2 April 2021 TDA4VM, TDA4VM-Q1

CONTENTS

Trademarks
1 Introduction
2 What is Structure From Motion?
3 Introduction to Occupancy Grid Mapping
4 From Point-Cloud to OG Map
5 Algorithm Flow: SFM-Based OG Mapping
6 Algorithm Flow: SFM-Based OG Mapping on TDA4VM
7 First Example Implementation on TDA4VM
8 Second Example Implementation on TDA4VM
9 References
IMPORTANT NOTICE

APPLICATION NOTE

Hardware Accelerated Structure From Motion on TDA4VM

Trademarks

Jacinto is a trademark of Texas Instruments Incorporated.

All trademarks are the property of their respective owners.

1 Introduction

Whether it is a simple task such as lane assist or blind spot detection, or a more complex task such as autonomous navigation, understanding the surroundings of a vehicle or robot is vital for success, and thereby safety. Vehicles and robots perceive their environment by converting data captured by sensors such as RADARs, LiDARs, and cameras into a format that can be consumed by the vehicle's decision-making engine. Light Detection and Ranging (LiDAR)-based maps tend to be the most accurate; however, they are typically cost prohibitive for most vehicles and robots. Therefore, RADAR- and camera-based solutions tend to be more widely used.

The Structure From Motion (SFM) algorithm is one of the more widely used algorithms for camera-based mapping. By itself, SFM outputs a point-cloud (a set of points extracted from surrounding objects), which must then be consumed by some type of mapping algorithm. The application described in this article feeds the point-cloud to an Occupancy Grid (OG) mapping algorithm to generate a map of the surroundings.

In automotive and robotics applications, the steps of receiving sensor data, converting the data to a usable format, and prescribing actions based on the perceived environment are typically performed on an embedded platform. The Jacinto 7 TDA4x family of high-performance SoCs from Texas Instruments is designed from the ground up to address the varied algorithmic needs of the automotive, industrial, and robotics markets. The Structure From Motion, or SFM, algorithm is one such algorithm around which the device was designed. As a result, the key computational blocks of the algorithm map seamlessly to either hardware accelerators or general-purpose processing cores on the TDA4VM device. This article describes the SFM-based OG mapping algorithm, the TDA4VM device, and how the algorithm maps to the device to enable a high-fidelity, real-time map of the environment, before showing some example implementations on the device with corresponding outputs.

2 What is Structure From Motion?

In Computer Vision, the position of an object with respect to a vehicle is ascertained using images from two cameras, mounted at known, distinct locations, looking at the object in question. Key points on the object are extracted from both images and matched, and then, using a process known as Triangulation, the locations of the points that make up the object are recovered. The process of determining the position of a point in space using two cameras is known in the Computer Vision community as Stereo Vision, or Stereo Depth Estimation, and the set of points generated from all the correspondences between the two images is referred to as a point-cloud. Although Stereo Vision is widely used by the automotive and robotics communities, it comes at a high system cost in terms of both dollars and image-processing requirements, because it requires two high-precision cameras capturing images at a relatively high frequency.
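To make the Triangulation step concrete, the sketch below recovers one 3D point from a matched pixel pair using the standard linear (DLT) method. This is an illustrative NumPy implementation, not the code that runs on TDA4VM; the function name and interface are assumptions for illustration.

```python
import numpy as np

def triangulate_point(P1, P2, x1, x2):
    """Triangulate one 3D point from a matched pixel pair.

    P1, P2 : 3x4 camera projection matrices of the two views.
    x1, x2 : (u, v) pixel coordinates of the same point in each image.
    Stacks the cross-product constraints x × (P X) = 0 from both views
    and takes the null-space vector via SVD (linear/DLT triangulation).
    """
    A = np.array([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]  # homogeneous -> Euclidean coordinates
```

With noise-free correspondences this recovers the point exactly; in practice the matched key points are noisy and the result is a least-squares estimate.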

In contrast, Structure From Motion, or SFM, is an algorithm that can generate a point-cloud from a single camera in motion. As the name implies, in SFM one camera, due to motion, is in two distinct locations at two consecutive time instances, which is effectively the same as placing two cameras in distinct locations, provided the objects in the frame have not moved between the two time instances and the relative motion of the camera is known. Thus, the same theory as in Stereo Vision can be used to generate a point-cloud from just one camera.
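The "virtual stereo" setup described above can be sketched as follows: assuming the camera intrinsics K and the inter-frame motion (R, t) are known, for example from ego-motion estimation, the two projection matrices take the same form as a stereo pair, and each matched feature can then be triangulated as in Stereo Vision. The function name is hypothetical.

```python
import numpy as np

def sfm_projection_matrices(K, R, t):
    """Build the two projection matrices for SFM's 'virtual stereo' pair.

    K    : 3x3 intrinsic matrix (same physical camera at both instants).
    R, t : rotation (3x3) and translation (3,) of the camera between
           the two consecutive time instants (the known ego-motion).
    The first time instant is taken as the reference (identity pose).
    """
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = K @ np.hstack([R, t.reshape(3, 1)])
    return P1, P2
```

With zero motion the two matrices coincide and no depth can be recovered, which reflects the requirement that the camera actually move between the two instants.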

SFM algorithms come in two primary flavors: traditional Computer Vision based and Deep Learning based. Even though both flavors can be executed on TDA4VM, the focus of this document is on the former, an algorithm based on traditional Computer Vision techniques. The point-cloud generated by the SFM algorithm then needs to be used to generate a map of the surroundings, and in the application described here a 2D OG mapping approach is used for the mapping task.

3 Introduction to Occupancy Grid Mapping

Occupancy Grid Maps are a widely used method to represent the environment surrounding a vehicle or robot, because they can be consumed by a variety of ADAS applications ranging from parking, to obstacle identification, to curb detection, and can be stored efficiently. An OG map is a representation of a vehicle's surroundings in the form of a 2-Dimensional (2D) or 3-Dimensional (3D) grid. Each grid cell in the map has a corresponding state, for example occupied, free, or unknown, computed from information received from sensors such as RADARs, cameras, or LiDARs. A simple illustrative example of an OG map is shown in Figure 3-1.
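A minimal sketch of such a grid is shown below, assuming a three-state encoding. The state values, class name, and parameters are illustrative only, not the representation used on TDA4VM.

```python
import numpy as np

# Illustrative cell-state encoding for a three-state occupancy grid.
UNKNOWN, FREE, OCCUPIED = 0, 1, 2

class OccupancyGrid:
    """Minimal 2D occupancy grid: a fixed-size array of cell states."""

    def __init__(self, width_m, height_m, cell_size_m):
        self.cell_size = cell_size_m
        self.rows = int(height_m / cell_size_m)
        self.cols = int(width_m / cell_size_m)
        # Every cell starts out unknown until a sensor observes it.
        self.cells = np.full((self.rows, self.cols), UNKNOWN, dtype=np.uint8)

    def set_state(self, row, col, state):
        """Update one cell, ignoring out-of-bounds indices."""
        if 0 <= row < self.rows and 0 <= col < self.cols:
            self.cells[row, col] = state
```

Storing one byte per cell is what makes OG maps cheap to store and transmit; a 40 m x 40 m map at 0.5 m resolution is only an 80 x 80 array.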

OG maps can either be centered on the vehicle or robot, or centered on an arbitrary frame of reference. The former is known as an ego-centric OG map and the latter as a world-centric OG map. OG maps are further categorized as accumulated or instantaneous, based on whether or not information from previous frames is used.

Figure 3-1 Left: Illustration of the World; Right: Illustration of an OG Map
(Green cells are empty, red cells are occupied, and white cells are unknown)

This application report describes a world-centric 2D OG map constructed using a point-cloud generated from measurements acquired by a monocular camera. This OG map has three states: occupied, free, and unknown. The primary focus of this application report is an instantaneous OG mapping algorithm. Toward the latter part of the article, an accumulated OG mapping application, also included with the Software Development Kit (SDK) that accompanies the TDA4VM device, is briefly described. The next section describes how to take a point-cloud as input and generate an OG map based on the point-cloud and the location of the vehicle or robot.
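The core of that point-cloud-to-map step can be sketched as binning each point's ground-plane coordinates into a grid cell and marking that cell occupied. This is a simplified illustration, with all names and parameters assumed; practical implementations, including the one described later in this report, also trace rays from the camera to each point to mark intervening cells as free.

```python
import numpy as np

def points_to_og(points_xy, origin_xy, cell_size, rows, cols):
    """Mark grid cells hit by point-cloud points as occupied.

    points_xy : (N, 2) array of ground-plane point coordinates in meters,
                in the world frame (height assumed already filtered out).
    origin_xy : world coordinates of the grid's (0, 0) corner.
    Returns a rows x cols uint8 grid (0 = unknown, 2 = occupied);
    free-space ray tracing is omitted for brevity.
    """
    grid = np.zeros((rows, cols), dtype=np.uint8)
    idx = np.floor((points_xy - origin_xy) / cell_size).astype(int)
    # Keep only the points that fall inside the grid extent.
    ok = (idx[:, 0] >= 0) & (idx[:, 0] < rows) \
       & (idx[:, 1] >= 0) & (idx[:, 1] < cols)
    grid[idx[ok, 0], idx[ok, 1]] = 2  # OCCUPIED
    return grid
```

Because the map here is world-centric, the grid origin stays fixed while the vehicle moves through it; an ego-centric variant would instead recenter `origin_xy` on the vehicle each frame.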

 
