
What Are the Most Advanced Positioning Technologies Applied to Self-driving Vehicles?

1 Abstract

Real-time, accurate, and robust localization is critical for autonomous vehicles (AVs) to achieve safe and efficient driving; in particular, real-time performance is essential for an AV to know its current position in time to make decisions. To date, no review paper has quantitatively compared the real-time performance of different localization techniques across hardware platforms and programming languages, or analyzed the relationship between localization method, real-time performance, and accuracy. This paper therefore discusses state-of-the-art localization techniques and analyzes their overall performance in AV applications. For further analysis, the paper first proposes an equivalent comparison method based on the localization algorithm operating capability (LAOC) to compare the relative computational complexity of different localization techniques; it then comprehensively discusses the relationship between methodology, computational complexity, and accuracy. The analysis shows that the maximum difference in computational complexity between localization methods is about 10^7 times, while the difference in accuracy is about 100 times. Vision- and data-fusion-based localization techniques have the potential to improve accuracy by a factor of about 2-5 compared with LiDAR-based localization. LiDAR- and vision-based localization can reduce computational complexity by increasing the efficiency of image alignment methods. Compared with LiDAR- and vision-based localization, data-fusion-based localization can achieve better real-time performance because each individual sensor does not need a complex algorithm to reach its optimal localization potential. V2X technology can improve localization robustness. Finally, potential solutions and future directions for AV localization are discussed based on the quantitative comparison results.

2 Introduction

Autonomous vehicles (AVs) are expected to play a key role in future intelligent transportation systems because of their potential to ensure safe driving, relieve traffic stress, and reduce energy consumption. Research on AVs is now in the road-testing phase. For example, Baidu has tested the Apollo 5.0 system in complex road scenarios, such as curves or intersections without special markings [1]. The Google Waymo project has also completed more than 10 million miles of road tests on public roads in the United States and 7 billion miles in simulation [2]. However, the industry still needs to address several key challenges before AVs can be commercialized. These challenges include a) proposing real-time, accurate, and cost-efficient self-localization solutions; b) implementing real-time and accurate environment-perception models; and c) enabling intelligent, safe, and efficient decision making in complex scenarios. At the same time, the environment perception and decision-making modules rely significantly on real-time and accurate self-localization for safe driving. Self-localization is therefore one of the core elements of an AV. Moreover, safe driving behaviors such as collision avoidance can only be ensured when self-localization achieves millisecond-level real-time performance and centimeter-level accuracy [3]. As a typical approach, map matching algorithms are widely used in many localization solutions equipped with LiDAR [4], radar [5], cameras [6], or V2X [7]. One map matching method uses an existing map to match detected environmental features (e.g., corners and road markings) and thereby obtain vehicle location information. Another is SLAM, used in applications without an a priori map: it achieves vehicle localization while simultaneously constructing an environment model (map) from sequential measurements. Both approaches operate on abstract data extracted from various sensors, such as LiDAR, radar, cameras, or a combination of them.
Sensor-based localization techniques rely on on-board sensors to estimate the absolute or relative position of the vehicle; they have been discussed in detail in a previous review [8]. In many sensor-based localization studies, one sensor is considered the primary localization sensor, and the authors explore an innovative approach based mainly on its measurements, aiming to solve the localization challenges of some specific scenarios. This does not mean that the localization system uses only a single sensor to achieve vehicle localization. As an example, for IMU-based localization, reference [9] proposes an interacting multiple model (IMM) approach that improves localization robustness and integrity in such driving scenarios by using IMU and odometer sensor data to eliminate the system drift caused by global positioning system (GPS) interruptions or GPS signal blockage. Sensor-based localization techniques can guide the deployment of AV localization systems, including how to select sensors, localization algorithms, fusion algorithms, and computational resources that can meet real-time requirements. In addition, a focus on positioning inputs (sensor hardware) gives the reader a better understanding of the advantages and disadvantages of different system deployments in terms of accuracy, real-time performance, robustness, and cost. Therefore, this survey discusses different sensor-based localization techniques starting with in-vehicle sensors, followed by V2X localization techniques, and finally data-fusion-based localization. Figure 1 shows the different self-localization technologies for vehicles, including on-board sensor, V2X, and data-fusion-based technologies. On-board sensor-based localization systems, including active and passive sensor-based techniques, rely on on-board sensors to sense the surrounding environment and then estimate the vehicle position.
V2X-based positioning methods communicate with nodes in the surrounding environment (e.g., neighboring vehicles or infrastructure) to receive their position information; they include vehicle-to-vehicle (V2V) and vehicle-to-infrastructure (V2I) based technologies, which can provide multiple reference coordinates for positioning algorithms. Data fusion is not a direct location sensing method but a post-processing technique whose goal is to fuse measurements from various sensors to obtain better results than any individual sensor.

Active sensor-based positioning actively senses the surroundings through on-board sensors (including LiDAR, radar, and ultrasonic sensors) to estimate the vehicle position. These sensors share the same ranging principle, based on the time-of-arrival (TOA) method; they differ in their signal carriers, i.e., laser, radio, and ultrasound for LiDAR, radar, and ultrasonic sensors, respectively. The differences in signal carrier wavelength lead to significant variations in the cost and accuracy of these sensors. For example, LiDAR usually has the highest cost but the best accuracy, while the opposite is true for ultrasonic sensors [10]-[13]. Passive sensor-based positioning passively receives information about the environment, from which the vehicle position is calculated. The sensors include GPS, IMU, and vision (e.g., monocular or binocular cameras). Based on spatial triangulation, GPS requires three or more satellites in an open-sky area to acquire the vehicle position (2-10 m accuracy). GPS has the advantage of low cost, but in urban environments it often suffers from multipath and non-line-of-sight (NLOS) errors and slow position update rates. The IMU uses a high-frequency sampling rate (>100 Hz) to measure vehicle acceleration and rotational speed; the position and orientation of the vehicle can therefore be derived by dead reckoning from a given initial attitude [14]. Despite its fast position refresh rate and high reliability, the IMU is prone to large cumulative errors. Vision-based localization estimates the vehicle position using images from monocular or binocular cameras as input. This is similar to the human vision system, which determines obstacle locations based on planar triangulation. The rich environmental information in the images can provide satisfactory localization performance under appropriate lighting conditions, but it consumes significant memory and computational resources.
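The TOA ranging principle shared by the active sensors above can be sketched in a few lines; only the propagation speed of the signal carrier changes between LiDAR/radar and ultrasonic sensors. The timing values below are illustrative assumptions, not figures from any cited system.

```python
# Minimal sketch of time-of-arrival (TOA) ranging, the principle shared by
# LiDAR, radar, and ultrasonic sensors; only the carrier's propagation
# speed differs. All numbers are illustrative.

def toa_range(round_trip_time_s: float, signal_speed_m_s: float) -> float:
    """Range = speed * time / 2 (the pulse travels out and back)."""
    return signal_speed_m_s * round_trip_time_s / 2.0

SPEED_OF_LIGHT = 299_792_458.0   # m/s, for LiDAR and radar
SPEED_OF_SOUND = 343.0           # m/s in air at ~20 C, for ultrasonic

# A LiDAR echo after ~66.7 ns and an ultrasonic echo after ~58.3 ms both
# correspond to a target about 10 m away.
lidar_range = toa_range(66.7e-9, SPEED_OF_LIGHT)
ultrasonic_range = toa_range(58.3e-3, SPEED_OF_SOUND)
```

The five-orders-of-magnitude gap in round-trip times is one reason ultrasonic sensors are limited to short-range, low-update-rate applications.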
V2V-based localization refers to autonomous vehicles in a vehicular ad hoc network (VANET) using dedicated short-range communication (DSRC) or long-term evolution (LTE) techniques to determine the positions of other vehicles, thus improving the positioning accuracy of the ego vehicle. V2I-based localization refers to communication between the target vehicle and static infrastructure, using its precisely known position to determine the target vehicle's location. Types of infrastructure include magnetic markers, radio frequency identification (RFID) tags, roadside units (RSUs), and GPS base stations. V2X-based positioning has a wide sensing range (300 m [15]) but may be affected by network latency and urban congestion. Numerous surveys have been published that summarize existing self-localization techniques and comprehensively discuss their advantages and disadvantages as well as the potential applications of each sensor-based approach. However, in evaluating the various localization methods, the latest review papers focus only on the following aspects.

a) Economical and reliable localization techniques [8], where "economical" corresponds to the cost of the localization system and "reliable" corresponds to the localization performance (both accuracy and reliability) achievable in various driving scenarios (e.g., snowy weather).

b) Accuracy, reliability, and availability [16], where "availability" means that the positioning system should remain usable in different environments, such as GPS-based positioning in tunnels or V2V approaches in the presence of communication delays.

c) Robustness and scalability [17], [18], where "robustness" means that the positioning system operates for long periods with low failure rates across different seasons and traffic conditions, and "scalability" corresponds to the ability to handle large-scale autonomous driving.

The real-time performance of self-localization is one of the key metrics for evaluating AVs for safe driving. The above surveys also mention that researchers should carefully consider the computational load and real-time performance of different technologies when designing positioning systems. However, so far, no survey has compared and discussed in depth the real-time performance of different self-localization technologies. Reference [28] examined the AV decision process by comparing it with the response time of driver behavior, and conducted a literature review of the reaction time of AV behavior from the moment an obstacle is perceived to the moment a control action is executed, as shown in Table I. According to computer simulations and practical tests, the reaction time of the entire decision process of an AV typically needs to be reduced to 0.5 s to satisfy safe driving. However, in extreme cases, the detection and recognition module, the planning and decision module, and the execution module together occupy almost the whole 0.5 s, which leaves very limited execution time for the localization module. Therefore, fast real-time localization solutions can free computational resources for other modules of the AV system, such as decision making, to implement complex strategies that ensure safe driving. Currently, the real-time performance of different localization solutions is reported on various hardware platforms and in various programming languages. Directly comparing the figures provided by each self-localization research paper is therefore not meaningful and does not reflect the relative consumption of memory and computational resources in AVs. Nor has any investigation quantified the computational complexity of various localization solutions against the same benchmark, even though this complexity determines the real-time performance and deployment cost of the localization system.
The purpose of this paper is threefold: to investigate existing state-of-the-art localization technologies, focusing on the innovative algorithms or methods of each proposed solution and on overall localization performance in terms of real-time performance, accuracy, and robustness; to propose an equivalence method for quantitatively comparing the relative real-time performance of different localization solutions across various hardware platforms and programming languages; and finally, to summarize existing localization technologies and, based on the quantitative comparison results, discuss potential solutions and future directions for AV localization. Table II summarizes the relationships and differences between this survey and recently published surveys.

3 Active sensor-based localization

LiDAR-based localization

LiDAR-based localization usually requires a pre-constructed reference map to match against point cloud data or LiDAR reflection intensity data. In the absence of an a priori map, SLAM techniques are used to construct a real-time map that is matched against the previously generated map. In AV applications, high-dimensional maps contain rich feature information, which improves position estimation accuracy but reduces storage efficiency and increases processing time [29], [30]. Im et al. [31] built a 1D angular map based on the vertical angles of buildings on both sides of an urban road for matching and localization. They used iterative endpoint fitting to extract vertical-angle features and constructed corner feature maps based on the length and orientation of the vertical lines. They then applied feature matching to the point cloud data to calculate the vehicle position. The method reduces the matching time and map data file size (about 14 KB/km) because less feature information is extracted. However, the maximum horizontal position error reaches 0.46 m; moreover, the method is not applicable to areas without buildings. Reference [32] constructed a 2D occupancy grid map consisting of a dense map of road reflections based on road markers and a probabilistic occupancy grid map based on vertical structure. First, they constructed a 1D extended line map (ELM) by extracting the line features of road markers and corners; these elements contain only the latitude and longitude of the two endpoints of each line. Then, they converted the ELM into a 2D grid map for matching during localization. Compared with [31], [32] added road marker features to improve accuracy, but this increased the ELM data size to 134 KB/km. 2D planar map matching for LiDAR localization is very popular in current research.
For example, Levinson et al. [4] obtained vehicle localization by using a SLAM-style relaxation algorithm to construct a flat ground-reflectivity map free of potentially moving objects, and then using a particle filter (PF) to correlate LiDAR measurements with the map. To further improve robustness, reference [33] used probability maps represented as Gaussian distributions of remittance values instead of maps of fixed infrared remittance values. This allows stationary objects in the map and consistent angular reflectance to be quickly identified by Bayesian inference. Offline SLAM is then used to align overlapping trajectories in previous sequential maps, which lets the localization system continuously learn and improve the maps. Compared with the method of reference [4], reference [33] improves the localization accuracy and robustness of AVs in dynamic urban environments. However, the map data size of both methods increases to about 10 MB per mile. Other related algorithms [29], [35]-[42] can be found in the specific papers. Matching based on 3D maps can achieve more accurate localization because such maps contain the height of environmental objects. Reference [43] constructed a 3D map by extracting road marker features. The system then uses the normal distributions transform (NDT) to handle uncertain information, after which a robust and precise position is derived with a PF. However, the 3D NDT approach may require a large amount of memory to hold the ND voxels (up to 100 MB in total for matching 3D ND voxels [30]), which can push localization times to the second level [44]. Li et al. [45] proposed constructing 3D occupancy grid maps and then using a hybrid filtering framework (i.e., a combination of a cubature Kalman filter and a PF) to compute large-scale outdoor localization while reducing the map data size.
Despite the reduced data size, experiments show that the method maintains stable and reliable localization performance, with a localization error of less than 0.097 m.
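The particle-filter map matching that recurs in these systems ([4], [43], [45]) can be conveyed with a deliberately simplified 1D sketch: particles are weighted by how well a range measurement to a single mapped landmark matches each hypothesis, then resampled. The landmark position, noise levels, and particle count are illustrative assumptions, not values from the cited papers.

```python
# 1D particle-filter localization sketch against one mapped landmark.
# Real systems weight particles by matching full LiDAR scans to a map.
import math
import random

random.seed(0)

TRUE_POS = 5.0     # ground-truth vehicle position (unknown to the filter)
LANDMARK = 0.0     # mapped feature position (known from the prior map)
SIGMA = 0.5        # assumed range-measurement noise (illustrative)

def likelihood(expected_range, measured_range, sigma=SIGMA):
    """Gaussian measurement likelihood used to weight each particle."""
    return math.exp(-((expected_range - measured_range) ** 2) / (2 * sigma**2))

particles = [random.uniform(0.0, 10.0) for _ in range(500)]
for _ in range(10):
    z = abs(TRUE_POS - LANDMARK)                       # range measurement
    weights = [likelihood(abs(p - LANDMARK), z) for p in particles]
    total = sum(weights)
    # importance resampling: particles survive in proportion to their weight
    particles = random.choices(particles, [w / total for w in weights], k=500)
    # small jitter keeps particle diversity after resampling
    particles = [p + random.gauss(0.0, 0.05) for p in particles]

estimate = sum(particles) / len(particles)             # converges near TRUE_POS
```

The weight step is where map representations differ: [4] scores reflectivity agreement, [43] scores NDT voxel likelihoods, but the resample-and-estimate cycle is the same.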

Radar-based localization

Compared with LiDAR- and vision-based localization, radar-based localization can meet real-time performance requirements because of its high memory efficiency and low computational load [46], [47]. However, radar-based SLAM faces the risk of data alignment errors in map matching, as unrealistic features are sometimes extracted, leading to low localization accuracy [5]. The trajectory-oriented extended Kalman filter (EKF)-SLAM technique uses the Fourier-Mellin transform to sequentially align radar images and calculate the vehicle position without matching features, avoiding the risk posed by such features; the disadvantage is that the localization error reaches 13 m on average [46]. Reference [48] extended semi-Markov chains with a Levy process to improve robustness in long-term changing environments, with 83% of estimated position errors below 0.2 m. For rain and snow conditions, [49] built a reference map by modeling the uncertainty of error propagation and then matched radar images against it for reliable localization. Reference [50] proposed a clustering SLAM technique that uses a density-based stream clustering algorithm to cluster radar signals in dynamic environments; an environment scan free of measurement noise is produced for map matching, and a PF computes the vehicle position from the matching results. The map used by this technique is only 200 KB. In addition, references [51] and [52] proposed a joint spatial- and Doppler-based optimization framework to further improve localization speed. The framework represents the reference point cloud with a sparse Gaussian mixture model, a sparse probability density function that reduces computational complexity; the method achieves a localization refresh rate of up to 17 Hz. Reference [47] constructs a reference map from radar scans of the same road, then uses the iterative closest point (ICP) algorithm to match radar images and estimate the vehicle position.
Finally, EKF smoothing is applied. This technique reduces the computational load of map matching because the required mapping data are small. However, the challenge is that matching requires up-to-date data captured by the same sensor model on the same road as the reference. In addition, reference [53] designed a vehicle-mounted localizing ground-penetrating radar (LGPR) system that constructs a subsurface map of the road. The system resists signal interference from adverse weather because its radar is mounted under the chassis to scan the ground. It achieves high accuracy (RMSE of 12.7 cm) and excellent real-time performance (~126 Hz refresh rate). However, the authors also note that the height of the LGPR array needs to be further reduced to fit more passenger cars.
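The ICP alignment used in [47] can be illustrated with a translation-only variant; real implementations also estimate rotation and use k-d trees for the nearest-neighbour search. The point sets below are toy data, not radar scans from the cited work.

```python
# Translation-only ICP sketch: repeatedly pair each scan point with its
# nearest reference point and shift the scan by the mean pairing offset.
import numpy as np

def icp_translation(src, ref, iters=20):
    """Align src to ref, returning the estimated translation of src."""
    src = np.asarray(src, float).copy()
    ref = np.asarray(ref, float)
    offset = np.zeros(2)
    for _ in range(iters):
        # pair each source point with its nearest reference point
        d = np.linalg.norm(src[:, None, :] - ref[None, :, :], axis=2)
        pairs = ref[d.argmin(axis=1)]
        step = (pairs - src).mean(axis=0)   # best translation for this pairing
        src += step
        offset += step
    return offset

# Toy reference "scan" and the same shape observed with an unknown shift.
ref = np.array([[0, 0], [1, 0], [2, 0], [0, 1], [0, 2]], float)
scan = ref + np.array([0.4, -0.3])
t = icp_translation(scan, ref)   # recovers roughly (-0.4, +0.3)
```

The nearest-neighbour pairing is the expensive part, which is why the small map sizes in [47] and the sparse mixture models of [51], [52] translate directly into faster localization.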

Ultrasonic-based localization

Ultrasonic-based localization is widely used for indoor robot localization because ultrasonic sensors are low cost. However, short detection distances and sensitivity to ambient temperature, humidity, and dust limit the wide use of ultrasonic sensors in AV localization [54], [55]. Moussa et al. [56] implemented an ultrasonic-aided navigation solution using the EKF algorithm. The solution uses ultrasonic sensors as the primary localization sensor when GPS is unavailable, limiting the drift of the vehicle position and enhancing the robustness of the system. It achieves excellent real-time performance (~92 Hz refresh rate) but position errors of up to 7.11 m. Jung et al. [13] used ultrasonic sensors, encoders, gyroscopes, and digital magnetic compasses, together with a SLAM approach, to estimate the absolute position of the vehicle. The average position update time of this method is as long as 10.65 s. Moreover, the long SLAM computation may allow cumulative IMU errors to build up in the positioning system before each position update; consequently, the average travel distance over which the position accuracy requirement can be satisfied is only about 5.2 m. In conclusion, ultrasonic-based localization enables a low-cost, low-power localization system, but its accuracy and robustness still cannot meet the requirements of autonomous driving.
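A minimal linear Kalman filter conveys the predict/update cycle behind the EKF used in [56]; the EKF additionally linearizes a nonlinear motion or measurement model around the current estimate. The constant-velocity model, noise values, and simulated measurements below are illustrative assumptions, not parameters from the cited paper.

```python
# 1D constant-velocity Kalman filter fusing noisy position fixes
# (e.g., ranges to a known reference). Illustrative values throughout.
import numpy as np

rng = np.random.default_rng(2)
dt = 0.1
F = np.array([[1.0, dt], [0.0, 1.0]])    # constant-velocity motion model
H = np.array([[1.0, 0.0]])               # we observe position only
Q = np.eye(2) * 1e-4                     # process noise (assumed small)
R = np.array([[0.25]])                   # measurement noise (0.5 m std)

x = np.array([0.0, 0.0])                 # state estimate: [position, velocity]
P = np.eye(2)                            # state covariance

true_pos, true_vel = 0.0, 2.0
for _ in range(200):
    true_pos += true_vel * dt
    # predict: propagate state and covariance through the motion model
    x = F @ x
    P = F @ P @ F.T + Q
    # update: correct with a noisy position measurement
    z = true_pos + rng.normal(0.0, 0.5)
    S = H @ P @ H.T + R                  # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)       # Kalman gain
    x = x + K @ (np.array([z]) - H @ x)
    P = (np.eye(2) - K @ H) @ P
# x now tracks both position and the unmeasured velocity
```

Note how velocity is recovered despite never being measured directly; the same mechanism lets [56] bound position drift from a drifting dead-reckoning source using intermittent ultrasonic ranges.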

Discussion

Accurate and robust feature detection in LiDAR-based map matching can improve the accuracy and robustness of AV localization [57]. For LiDAR-based 1D map matching, the computational load and memory usage of feature alignment are low because the method uses only a few salient lines as features, such as the vertical angles in references [31] and [32]. However, this approach faces challenges where there are no vertical buildings along the roadside. Compared with 1D maps, 2D maps contain rich feature types but require more map storage space. Intensity-based 2D maps can enhance road representation in snowy road scenes. Hybrid-map-based algorithms can reduce memory usage and address the tradeoff between real-time performance and localization accuracy, such as the topological-metric map in reference [38]. 3D-map-based matching algorithms can obtain accurate and robust positions thanks to 3D features; however, they require the largest computational resources of the three, which increases the deployment cost of AV localization systems. Radar is a cost-effective alternative to high-cost LiDAR-based localization, but the low resolution of the environment model obtained by millimeter-wave radar and the lack of object height information make robust and accurate localization difficult. Currently, radar is widely used as an auxiliary positioning sensor to measure the distance between the vehicle and obstacles. The short detection range of ultrasonic sensors (~3 m) means that ultrasonic-based localization is mainly used for short-range applications, such as automatic parking, where several reference targets are located at close distances.

4 Passive sensor-based positioning

GPS-based positioning

GPS can provide a low-cost and efficient positioning solution for AVs. However, GPS is often affected by NLOS reception, multipath, or signal blockage in cities, all of which make reliable vehicle localization challenging [58], [59]. Current mainstream GPS-based positioning improves accuracy and reliability through position correction techniques, including fusing measurements from different sources [60], filtering anomalous signals [61], and map assistance [62]. Reference [63] improved GPS-based positioning by fusing measurements from other sources, including GPS, RFID, and V2V. The authors analyzed the accuracy of the different data sources and filtered out redundant connections, keeping only the connections with the desired accuracy to meet the robustness requirements of a GPS-degraded environment. The position accuracy of the proposed method is about 2.9 m, and its computational complexity is about 0.8% of that of [64]. Reference [61] proposed a GPS anomaly identification and processing framework to improve the robustness of GPS-based positioning; depending on the quality of the original GPS signal, the framework outputs the original GPS position, an estimated GPS position, or a GPS position with anomalous signals removed. Unlike the previous two techniques, Lu et al. [65] improve GPS accuracy by matching against low-precision open-source maps; the limitation of this method is the difficulty of extracting lane-marking features at road intersections. Meanwhile, [66] proposed a global navigation satellite system (GNSS)-based positioning method that removes anomalous GPS signals and combines the result with the terrain height assistance of digital maps. Reference [67] improved GNSS accuracy by matching NLOS signal delays. Nevertheless, the position RMS errors of [66] and [67] are still as high as about 10 m in urban scenarios. In conclusion, achieving reliable and accurate vehicle positioning with a standalone GPS receiver is difficult.
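The least-squares position fix underlying GNSS positioning can be sketched with Gauss-Newton iterations in 2D on noiseless ranges; real receivers solve in 3D with an additional receiver clock-bias unknown, and the anchor geometry below is invented for illustration.

```python
# Gauss-Newton least-squares fix from ranges to known anchors (2D toy).
import numpy as np

def ls_fix(anchors, ranges, x0, iters=10):
    """Solve ||x - a_i|| = r_i for x by iterative linearization."""
    x = np.asarray(x0, float)
    for _ in range(iters):
        diffs = x - anchors                       # (n, 2)
        dists = np.linalg.norm(diffs, axis=1)     # predicted ranges
        J = diffs / dists[:, None]                # Jacobian rows: unit vectors
        r = ranges - dists                        # range residuals
        x = x + np.linalg.lstsq(J, r, rcond=None)[0]
    return x

anchors = np.array([[0.0, 0.0], [100.0, 0.0], [0.0, 100.0]])  # "satellites"
truth = np.array([30.0, 40.0])
ranges = np.linalg.norm(anchors - truth, axis=1)  # noiseless ranges
pos = ls_fix(anchors, ranges, x0=[50.0, 50.0])    # converges to truth
```

Multipath and NLOS corrupt the residual vector `r` with biased, non-Gaussian errors, which is why the anomaly-filtering and correction schemes of [61], [66], and [67] operate before or around this solve.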

IMU-based positioning

IMUs are a component of inertial navigation systems (INS); they measure acceleration and angular rate and are robust against interference [68]. However, an IMU alone cannot be used by autonomous driving systems to calculate positions over long distances because of cumulative error. IMUs are therefore widely used as backup sensors or fusion sources to ensure continuous positioning during short interruptions of the primary positioning sensor [69]. Reference [70] proposed a tightly coupled (TC) scheme based on dead reckoning (DR) to improve accuracy in cities. Reference [71] used a modified TC scheme with anomalous GPS measurement suppression to achieve continuous positioning in GPS-denied environments. Wang et al. [72] proposed a scheme based on a set of autoregressive moving-average prediction models and occupancy grid constraints to further improve positioning accuracy; the scheme also reduces the cumulative error of the DR system and multipath interference on GPS. Other related algorithms [73]-[76] can be found in the specific papers. In addition to DR, pattern recognition on the angular rate signal output by the IMU can also be used to calculate the vehicle position. The principle is to extract the vibration and motion patterns of the vehicle by analyzing the angular rate signal, and then perform pattern matching against a pre-constructed index map for position estimation. The technique has no cumulative error and thus reasonable accuracy (about 5 m); the drawback is susceptibility to measurement noise [68], [77], [78].
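The dead-reckoning propagation at the heart of these schemes is numerical integration of speed and yaw rate from a known initial pose; any sensor bias integrates into unbounded position drift, which is the cumulative-error problem described above. The drive profile below is illustrative.

```python
# Dead reckoning: integrate speed and yaw rate to propagate the pose.
import math

def dead_reckon(x, y, heading, speed, yaw_rate, dt):
    """One DR step: propagate the pose with measured speed and yaw rate."""
    heading += yaw_rate * dt
    x += speed * math.cos(heading) * dt
    y += speed * math.sin(heading) * dt
    return x, y, heading

# Drive a quarter circle: 10 m/s, turning at pi/20 rad/s for 10 s.
x, y, heading = 0.0, 0.0, 0.0
for _ in range(1000):
    x, y, heading = dead_reckon(x, y, heading, 10.0, math.pi / 20, 0.01)
# heading ends near pi/2; x and y end near the turn radius v/omega ≈ 63.7 m.
# A constant gyro bias added to yaw_rate would curve this whole trajectory,
# growing the position error with distance travelled.
```

This is why DR is paired with an absolute source (GPS, map matching) in the TC schemes of [70] and [71]: the absolute fixes periodically reset the integrated drift.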

Vision-based localization

Vision-based localization can usually achieve reasonable accuracy. The prevalence of multicore CPUs and GPUs and their growing parallel image-processing power have alleviated the pressure caused by the high computational complexity of such localization methods [79], [80]. Reference [81] uses four fisheye cameras, a pre-built map, and the current vehicle pose to detect symmetric parking markers within a given range in an autonomous parking scene; the detections are then used as direction markers to match against the pre-constructed map. This method achieves vehicle localization with a parallel position error of 0.3 m and a localization time of 0.04 s. Du et al. [82] developed an improved sequential RANSAC algorithm to efficiently extract lane lines from images for feature matching; they achieved a position error of approximately 0.06 m and a localization period of 0.12 s in scenes with lane lines. Reference [83] constructed a lightweight 3D semantic map based on road landmarks for feature matching and then minimized the residual alignment error to estimate the vehicle position. The map reduces memory usage, so that image matching requires only four iterations. However, the approach still requires further testing in curved-road scenarios. Other related algorithms [6], [84], [85], [86], [87] can be found in the specific papers. Meanwhile, reference [88] developed a topological model to obtain a set of candidate nodes close to the captured image from a reference map. They matched the extracted global features with the candidate nodes, and finally, by correlating the features of the best node with local features in the image, achieved reliable vehicle localization with a position accuracy of 0.45 m. However, this method suffers from illumination sensitivity, which may lead to localization failure.
Reference [89] proposed an extended Hull census transform method for semantic description and feature extraction from a full range of image datasets to construct topological maps. By combining content- and feature-based image retrieval for scene recognition, this work achieves robust localization with about 85.5% confidence under changing luminance and dynamic obstacles by matching the recognition results to the topological map. However, the challenge of this technique is its position refresh cycle of up to 2 s.
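The sequential RANSAC lane extraction of [82] builds on the basic RANSAC line fit: repeatedly sample a minimal point pair, fit a line, and keep the hypothesis with the most inliers, so that gross outliers (shadows, other markings) cannot drag the fit. The point data and thresholds below are illustrative, not from the cited paper.

```python
# Minimal RANSAC line fit: robust to gross outliers among the points.
import random

random.seed(1)

def ransac_line(points, iters=200, tol=0.1):
    """Return ((slope, intercept), inlier_count) of the best-supported line."""
    best, best_inliers = None, -1
    for _ in range(iters):
        (x1, y1), (x2, y2) = random.sample(points, 2)  # minimal sample
        if x1 == x2:
            continue                                    # skip vertical pairs
        m = (y2 - y1) / (x2 - x1)
        b = y1 - m * x1
        inliers = sum(abs(y - (m * x + b)) < tol for x, y in points)
        if inliers > best_inliers:
            best, best_inliers = (m, b), inliers
    return best, best_inliers

# 20 points on the lane line y = 0.5x + 1, plus 5 gross outliers.
pts = [(x, 0.5 * x + 1) for x in range(20)]
pts += [(3, 9), (7, -4), (12, 30), (15, 0), (18, 2)]
(m, b), n = ransac_line(pts)   # recovers slope 0.5, intercept 1, 20 inliers
```

The "sequential" variant in [82] removes the inliers of each accepted line and reruns the loop, extracting multiple lane lines from one image.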

Discussion

In conclusion, passive sensor-based localization techniques show significant advantages for low-cost AV localization. However, an independent passive sensor cannot meet the accuracy and robustness requirements. GPS is often affected by NLOS reception, multipath, or signal blockage in cities, which challenges the consistency and integrity of positioning. GPS-based positioning can be improved by fusing GPS measurements from different sources, filtering defective signals, and map aids. When GPS signals are unavailable, DR systems can provide real-time, consistent vehicle positions. For example, as shown in [9], the DR-based IMM approach reduces system drift and improves positioning robustness and integrity during GPS outages or signal blockage. However, GPS- and IMU-based localization still needs further improvement in accuracy, consistency, and integrity when GPS-IMU signals exhibit long-term anomalies. Vision-based localization can achieve a localization RMSE of 0.14 m, but reasonable localization times typically require GPU acceleration. In addition, the reliability of cameras under insufficient lighting or adverse weather (e.g., fog and rain) still needs further study. The above discussion suggests that data fusion will be the trend for cost-efficient localization solutions that fuse multiple low-cost sensors. Meanwhile, recent studies on sensor fault detection and identification in references [90]-[93] show significant advantages in improving localization robustness, such as IMM-based fault identification and multi-model and fuzzy-logic-based fault detection. Future research should focus on these techniques and on methods for modeling defective data.

5 V2X-based localization

V2V-based localization

V2V-based localization does not require vehicles to be equipped with high-precision sensors to obtain accurate positions in a VANET; instead, reasonable positional accuracy can be achieved by fusing coarse position information from other connected vehicles [94]. The drawback is that an insufficient or non-uniform distribution of participating vehicles on the road may lead to insufficient localization accuracy [95], [96]. Liu et al. [15] proposed a weighted least squares double-difference method based on sharing GPS pseudorange measurements with other vehicles to calculate inter-vehicle distances. They used a distributed position estimation algorithm to fuse the shared data and achieved a positioning accuracy of about 4 m. This solution reduces the effect of random noise and improves the accuracy of the calculated inter-vehicle distances. Reference [97] proposed a Bayesian approach that fuses GPS positions from other vehicles with the target vehicle's GPS position and the inter-vehicle distances; this method can significantly reduce localization uncertainty. To remove the requirement that participating vehicles have a predefined dynamic motion model for data fusion, reference [98] computes a belief about the current position of each vehicle, i.e., a probability with which the vehicle position can be inferred and propagated through the VANET. Angle-of-arrival and TOA techniques then measure inter-vehicle distances to obtain the relative positions of neighboring vehicles. Finally, vehicle positions are estimated by computing a weighted sum over the neighbors' relative positions and beliefs. The position accuracy of this method is about 1.95 m, but the refresh period is up to 1.4 s (with 7 vehicles in the network). Other related algorithms [99]-[103] can be found in the specific papers.
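The weighted-sum fusion in the spirit of [98] can be sketched as an inverse-variance weighted mean of the position estimates reported by neighbours: confident neighbours dominate, poor fixes are down-weighted. The positions, variances, and the inverse-variance weighting rule below are illustrative assumptions, not the exact belief computation of the cited paper.

```python
# Confidence-weighted fusion of neighbour-reported position estimates.
# Each estimate carries a variance; weights are inverse variances.

def fuse(estimates):
    """estimates: list of ((x, y), variance) -> inverse-variance weighted mean."""
    wsum = sum(1.0 / var for _, var in estimates)
    x = sum(px / var for (px, _), var in estimates) / wsum
    y = sum(py / var for (_, py), var in estimates) / wsum
    return x, y

# Target position as inferred by four neighbours (illustrative data).
neighbour_estimates = [
    ((10.2, 5.1), 0.5),   # confident neighbour
    ((10.0, 5.0), 0.5),
    ((12.0, 7.0), 8.0),   # poor GPS fix, heavily down-weighted
    ((9.9, 4.8), 1.0),
]
x, y = fuse(neighbour_estimates)   # lands near (10.1, 5.0)
```

This also illustrates the diminishing return noted in the Discussion below: once several confident neighbours dominate the weights, adding more vehicles barely moves the estimate while increasing communication and computation load.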

V2I-based localization

V2I-based localization infers the vehicle location from the locations of nearby infrastructure. It enables accurate, real-time and robust localization. The advantages of the V2I technique include the highly accurate surveyed locations of the infrastructure, stable data sources independent of time, and low computational complexity. References [104] and [105] proposed magnetic marker-based V2I localization. First, magnetic markers with unique Gaussian pole-array distributions are arranged at regular intervals on the road, and the location and distribution of each marker are stored in a database. Then, while the vehicle is in motion, each marker is detected and its Gaussian distribution is calculated. Finally, the vehicle location is determined by searching the database for that distribution. This method minimizes the effect of distortion and achieves centimeter-level (<10 cm) localization accuracy. RFID technologies, including low-cost RFID readers and RFID tags, are also used for localization: RFID tags are deployed on the road, and vehicles equipped with RFID readers can determine their location from the tags [106], [107]. As for the disadvantages, these technologies require high-density infrastructure and are vulnerable to infrastructure congestion. Other related algorithms [108]-[113] can be found in the respective papers.
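The marker-database lookup described above can be sketched minimally as a nearest-neighbour search, assuming each marker's measured Gaussian field distribution is reduced to a small signature vector. The `MARKER_DB` contents and signature format below are hypothetical:

```python
import math

# hypothetical database: marker field signature -> surveyed (x, y) position
MARKER_DB = [
    ((1.0, 0.5, 0.2), (100.0, 4.5)),
    ((0.8, 0.9, 0.1), (110.0, 4.5)),
    ((0.3, 0.7, 0.6), (120.0, 4.5)),
]

def locate_by_marker(measured, db=MARKER_DB):
    """Return the surveyed position of the marker whose stored field
    signature is closest (Euclidean) to the measured one."""
    _, pos = min(db, key=lambda entry: math.dist(entry[0], measured))
    return pos
```

A real deployment would index the database spatially and gate the search by the vehicle's approximate position, so only nearby markers are compared.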

Discussion

The review of V2X localization techniques shows that neither V2V nor V2I solutions require expensive dedicated hardware. For V2V-based solutions, an adequate and uniform distribution of participating vehicles on the road can improve localization accuracy and robustness; however, an increasing number of vehicles may raise the system's computational overhead without much improvement in accuracy. Efficient clustering architectures that create hierarchical structures between nodes can provide accurate V2V communication services in long-range VANETs, and further research on such architectures can overcome the challenge of accurate information exchange between vehicles. The CMM approach can provide a potential way to eliminate multipath errors between antennas, but the problem of propagation signal delays still needs to be addressed; signal delays for V2X systems are recommended to be within 10 ms [3]. Signal degradation and packet loss can be addressed by optimizing network parameters (e.g., data baud rate, propagation frequency, and antenna power), as discussed in detail in a previous survey [8]. RFID-based V2I systems enable cost-efficient AV localization; however, they require high-density infrastructure and are vulnerable to infrastructure congestion. RFID-based technologies are well suited for applications where AVs travel on fixed routes, such as sightseeing buses at zoos or container handling trucks at ports. Optimizing the relationship between RSU height, propagation angle and transmission power can ensure signal strength and wide network coverage for RSU-based V2I positioning, although signal delay still needs to be further addressed to improve positioning accuracy.

6 Data fusion-based localization

Multi-sensor based data fusion localization

Previous discussions have shown that no independent sensor can meet the accuracy, real-time, and reliability requirements of AV localization, whereas data fusion of multiple sensors shows great potential for accurate, real-time, and reliable self-localization. Reference [114] developed an interacting multiple model (IMM) filter consisting of a vehicle dynamics model and a vehicle kinematics model to achieve cost-efficient AV localization with low-cost sensors. GPS data and in-vehicle sensor data (i.e., wheel speed and steering angle) feed this filter, which weights the appropriate model for data fusion according to the driving scenario. This approach achieves reasonable localization performance on a 32-bit embedded processor. Reference [115] proposes three IMM-based UKF models to fuse low-cost sensor data, such as GPS and inertial sensors. The model reduces most of the uncertainty noise from the inertial sensors, predicts and compensates for positioning errors, and can achieve a position accuracy of 1.18 m during GPS outages. For dynamic maneuvering situations, such as strong acceleration, high-speed turns, and stop-and-go driving, Ndjeng et al [116] showed that an IMM-based positioning system using low-cost sensors (e.g., IMU, odometer, and GPS) outperformed an EKF-based positioning system; their practical experiments indicated that IMM-based localization is more robust to the high variability of vehicle dynamics than the EKF. Other related algorithms [117]-[131] can be found in the respective papers.
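The model-weighting idea at the heart of an IMM filter can be sketched as below. Only the model-probability update is shown; the per-model Kalman filters and state mixing of [114], [115] are omitted, and any concrete transition or likelihood values would be tuning choices:

```python
def imm_update(mu, trans, likelihood):
    """One IMM model-probability update step.
    mu[i]: prior probability of model i,
    trans[i][j]: Markov transition probability P(model j | model i),
    likelihood[j]: measurement likelihood under model j."""
    n = len(mu)
    # mixing: predicted probability of being in model j before the measurement
    c = [sum(trans[i][j] * mu[i] for i in range(n)) for j in range(n)]
    # Bayes update with the per-model measurement likelihoods
    post = [likelihood[j] * c[j] for j in range(n)]
    s = sum(post)
    return [p / s for p in post]
```

For two models (e.g., dynamics vs. kinematics), the filter thus shifts weight toward whichever model currently explains the measurements better, which is what lets it adapt across driving scenarios.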

Map-based data fusion for localization

Map-based data fusion techniques build on multi-sensor measurements and improve localization performance by adding map information. For example, Suhr et al [132] proposed fusing low-cost sensors with digital maps to improve real-time performance. They represent lane and road marking features as a set of key points and use a front-view camera module to process the captured images. This solution reduces memory usage and computational overhead; moreover, its position refresh rate is about 100 Hz and its position accuracy is about 0.5 m. Tsai et al [133] proposed a data-driven motion model without inertial sensors to avoid integration errors. They corrected the GPS position and the camera's lateral distance using a high-definition map and fused both pieces of information; the method reduced the position error by 1/3 compared to pure GPS positioning. Gruyer et al [134], [135] proposed a map-aided data fusion method based on an accurate digital map, GPS, an IMU, and two cameras to obtain the AV lateral position with high accuracy. They first estimated the distances from the vehicle to the road markings on its left and right sides using two side cameras. Then, they used an EKF to estimate the vehicle position from the GPS and IMU measurements. Finally, they combined the previously estimated vehicle positions with the matched line-segment positions obtained from a point-to-line-segment map matching algorithm to further improve localization accuracy and reliability. Other related algorithms [136]-[139] can be found in the respective papers.
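The point-to-line-segment map matching step mentioned for [134], [135] can be sketched roughly as follows: the estimated position is snapped to the closest map segment. The segment coordinates are hypothetical, and the original method further fuses this matched position with the EKF estimate rather than using it directly:

```python
import math

def project_to_segment(p, a, b):
    """Orthogonal projection of point p onto segment ab, clamped to the segment."""
    ax, ay = a; bx, by = b; px, py = p
    vx, vy = bx - ax, by - ay
    length2 = vx * vx + vy * vy
    if length2 == 0.0:
        return a  # degenerate segment
    t = max(0.0, min(1.0, ((px - ax) * vx + (py - ay) * vy) / length2))
    return (ax + t * vx, ay + t * vy)

def match_to_map(p, segments):
    """Point-to-line-segment map matching: snap p to the closest map segment."""
    return min((project_to_segment(p, a, b) for a, b in segments),
               key=lambda q: math.dist(p, q))
```

In a real system the candidate segments would be pre-filtered by the digital map's spatial index around the EKF prediction before projecting.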

Discussion

The analysis shows that techniques based on low-cost multi-sensor (e.g., GPS, IMU, camera, and odometer) data fusion can provide a cost-effective commercial positioning solution for self-driving vehicles. Multi-sensor data fusion techniques that incorporate GPS measurements still need to address GPS integrity issues. IMM-based fusion approaches can reduce much of the uncertainty noise from inertial sensors and improve positioning accuracy and robustness during GPS outages or GPS signal blocking; however, IMM still yields positioning errors in the meter range. By modeling the defect data as intervals, the interval approach enables vehicle localization with high integrity and consistency; its localization RMSE and update time can be about 15 cm and about 170 ms, respectively. Interval technology can therefore provide a potential fusion-based localization solution for the market, although its overall performance in different complex environments still needs further validation before fully autonomous driving. A collaborative approach with map fusion can also lead to accurate and robust localization; for example, reference [136] shows a collaborative approach that enhances localization accuracy and robustness by fusing multiple sensors (e.g., GPS and cameras), SLAM, and maps. In addition, fault detection and identification techniques for different sensors deserve attention to ensure more robust AV localization. In summary, data fusion-based techniques have great potential to balance commercial AV localization performance across economy, real-time performance, accuracy, and robustness.

7 Accuracy and Real-Time Performance Discussion

Related Work on Positioning Performance Evaluation

Real-time, accurate and robust AV positioning is one of the key elements of safe driving. Performance comparisons of different positioning techniques can guide sensor selection and research directions for AV systems. Many works comparing the accuracy and robustness of different localization algorithms have been published. Zhang et al [138] theoretically analyzed the convergence and consistency of RI-EKF-SLAM and compared its localization performance with SO(3)-EKF-SLAM. Zhang et al also used 1D, 2D and 3D simulations to compare RI-EKF-based SLAM and optimization-based SLAM in terms of accuracy and consistency [140]. In addition, Mourllion et al [141] demonstrated the performance of Kalman filter variants, such as the EKF, UKF, and first- and second-order divided difference filters (DD1 and DD2), in the prediction step of vehicle localization. Gruyer et al [142] compared these KF variants using accuracy-, filter uncertainty- and consistency-based criteria as well as multisensor experimental measurements of the overall localization process (prediction and correction steps). Ndjeng et al [116] evaluated the accuracy and robustness of IMM-based and EKF-based low-cost localization systems in dynamic maneuver scenarios. To date, few works have compared the real-time performance of localization. References [6] and [149] compared the localization time of the same solution on CPU- and GPU-based platforms, and reference [143] ran filtering algorithms on CPUs and GPUs to compare their execution times. However, these comparisons only run the same algorithm on various platforms, whereas different localization solutions are reported on different hardware platforms and programming languages, so their real-time performance cannot be compared directly. In addition, the overall solution localization time is affected by the data extraction and search steps, the core localization algorithm execution, and map storage and update (if a map is used).
In order to compare the real-time performance of different solutions quickly without actual tests, this paper first assumes that the localization times reported in the different research papers refer to the complete localization solution and not only to the core algorithms. Secondly, it assumes that the code of each solution makes full use of all computational resources. The localization times of different solutions can then be converted to the same benchmark based on the hardware computing power and the programming language execution efficiency, after which the real-time performance of the different solutions can be compared approximately and quantitatively.

Equivalent comparison method

The discussion of different localization techniques shows that AV localization relies heavily on CPUs and GPUs as hardware platforms and on MATLAB and C/C++ as programming languages. Different hardware has very different computational capability; for example, GPUs are 52 times faster than CPUs when processing LiDAR 3D point cloud data with filtering algorithms [143]. As for programming languages, C/C++ is a compiled language that is translated into machine code before execution, whereas MATLAB is an interpreted language in which each line of code must be read and interpreted at run time, which makes it much slower than compiled languages [144], [145]. Therefore, when comparing the real-time performance of different positioning techniques, the hardware and programming language used must be taken into account. As a first step, the equivalent conversion factors of the localization algorithm operating capability (LAOC) within CPU/GPU families and between CPUs and GPUs must be determined; the CPUs/GPUs considered are those of the hardware platforms used by the different localization techniques. In this paper, single-precision floating-point (SPFP) peak performance is used to determine the LAOC equivalent conversion relationship for the GPU/CPU families, since positioning algorithms typically involve SPFP operations. For the CPU family, the SPEC CPU2006 benchmark [146] compares the compute-intensive performance of different CPUs at the hardware level, which depends on factors such as the processor, memory architecture, and bus; it allows a comprehensive evaluation and comparison of the hardware performance of different CPUs [147]. Therefore, the LAOC equivalent conversion relationship within the CPU family is based on SPECfp2006 [148], which gives the relative peak floating-point operations per second (FLOPS) performance of CPUs.
For normalization, the minimum relative peak FLOPS performance considered in this paper is used as the baseline, and its LAOC equivalent conversion factor is set to 1. The LAOC equivalent conversion factor for each CPU is then computed as the ratio of its relative peak FLOPS performance to that of the baseline, as shown in Table III.
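This normalization can be sketched as a one-liner over a table of relative peak FLOPS figures; the platform names and values below are hypothetical, not entries from Table III:

```python
def laoc_factors(flops):
    """Normalize each platform's (relative) peak FLOPS by the family
    minimum, so the slowest platform gets an LAOC factor of exactly 1."""
    base = min(flops.values())
    return {name: f / base for name, f in flops.items()}

# hypothetical relative peak FLOPS figures for three CPUs
factors = laoc_factors({"cpu_a": 10.0, "cpu_b": 25.0, "cpu_c": 5.0})
```

Here `cpu_c` becomes the baseline with factor 1.0, and the other factors express how many times faster each platform is.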

For the GPU family, the factors that determine the FLOPS figure include the frequency f, the number of cores N, and the single-precision fused multiply-add operations per cycle per core (FMA). The FMA value can be found on the official website of the selected GPU. The theoretical single-precision peak performance can then be estimated as FLOPS_peak = f × N × FMA.
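As a rough sketch, the theoretical peak can be computed as below, where `fma_flops_per_cycle` is the single-precision FLOPs each core contributes per cycle (a fused multiply-add counting as two operations). The example GPU figures are hypothetical:

```python
def gpu_peak_sp_flops(freq_hz, n_cores, fma_flops_per_cycle):
    """Theoretical single-precision peak performance:
    frequency x cores x per-core FLOPs per cycle
    (one fused multiply-add = 2 floating-point operations)."""
    return freq_hz * n_cores * fma_flops_per_cycle

# hypothetical GPU: 1.5 GHz, 2048 cores, one FMA (2 FLOPs) per cycle per core
peak = gpu_peak_sp_flops(1.5e9, 2048, 2)  # -> 6.144e12 FLOPS (6.144 TFLOPS)
```

Real GPUs rarely sustain this figure because of memory bandwidth and transfer overhead, which is why the paper treats it as a relative, not absolute, capability measure.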

Since the theoretical peak, with the same data transfer and copying overhead, can represent the actual SPFP computational power of the GPU, the conversion relationship within the GPU family is based on it. For normalization, the minimum peak FLOPS performance considered in this paper is defined as the baseline, with an LAOC equivalent conversion factor of 1; the LAOC equivalent conversion factor for each GPU is then computed as the ratio of its peak FLOPS performance to that of the baseline, as shown in Table IV.

For LAOC equivalence between CPU and GPU, Charmette et al [6], [149] have performed representative work comparing CPU and GPU computational performance in positioning applications. In this paper, the conversion factors between CPU and GPU are based on the findings of their recent study [6], which show that, for the same method, positioning on the GPU is about 45 times faster than on the CPU. The authors mention that only one core of a dual-core CPU is used for localization; therefore, this paper takes the peak FLOPS performance of the CPU in [6] to be half that of the full dual-core CPU, as shown in Table III. From this, the LAOC equivalent conversion factor between CPU and GPU in [6] is determined. The paper takes C/C++ as the programming language benchmark, with its LAOC equivalent conversion factor set to 1, and sets MATLAB's factor according to its execution efficiency relative to C/C++. Finally, the paper selects the lowest peak FLOPS performance as the hardware benchmark and C/C++ as the programming language benchmark. The positioning times reported on different hardware and programming languages must be converted to this benchmark for comparison: T_C = T_R × h × s, where T_R is the reported positioning time, T_C is the converted time, and h and s are the hardware and programming language LAOC equivalent conversion factors, respectively.
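Assuming, as the text describes, that a reported time is scaled to the common benchmark by a hardware factor h and a language factor s, the conversion can be sketched as below. All the numeric values are illustrative, not figures from Table V:

```python
def to_benchmark_time(t_reported, h, s=1.0):
    """Scale a reported localization time to the common benchmark.
    h: hardware LAOC factor (platform peak FLOPS / baseline peak FLOPS),
    s: programming language factor (1 for the C/C++ baseline)."""
    return t_reported * h * s

# hypothetical: the same solution timed on two platforms
gpu_equiv = to_benchmark_time(2.0, 50.0)   # 2 ms on a GPU 50x the baseline
cpu_equiv = to_benchmark_time(90.0, 1.1)   # 90 ms on a CPU 1.1x the baseline
```

If the equivalence method is sound, the two converted times should come out close to each other, which is exactly the validation check the next subsection performs.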

Method validation

In this paper, reference [29] is used to evaluate the proposed LAOC-based equivalent comparison method. Reference [29] compares the localization times of the same solution on CPU- and GPU-based platforms. The reported and converted times T_R and T_C for the CPU and GPU, and the LAOC equivalent conversion factors h and s for hardware and software, are listed in Table V. Table V shows that the difference in positioning time before conversion is due to the different hardware platforms (CPU and GPU). The post-conversion positioning times are considerably higher because the hardware benchmark has the lowest peak FLOPS performance (with the same programming language). In addition, the conversion results show that the CPU- and GPU-based localization times are similar after conversion. This is expected, because Solution A and Solution B are the same solution implemented on different hardware platforms. Therefore, the LAOC-based equivalent comparison method is reasonable and can be used to approximately and quantitatively compare different localization solutions. Table VI summarizes the relative computational complexity of the different positioning techniques calculated using equation (2).

Discussion

1) Accuracy and real-time performance: This section quantitatively compares the computational complexity and position errors of all the localization techniques discussed above. Figure 2 shows that, among LiDAR-based localization methods, the 3D map-based approach outperforms the 2D map-based approach in accuracy because it contains richer feature information. However, the 3D map-based technique increases memory usage and computational load, resulting in higher algorithmic computational complexity. Moreover, although the accuracy differences between 2D map-based techniques are small, their computational complexity varies widely across approaches. For example, the computational complexity of the 2D GMM matching technique in [29] is about 2000 times that of the combination of multilayer RANSAC alignment and 2D map matching in [42]. Compared to LiDAR-based localization, radar- and ultrasonic-based localization have lower computational complexity because they emit low-density electromagnetic waves. The computational complexity and position errors of radar localization lie between those of LiDAR and ultrasonic localization; although the combination of particle swarm optimization and grid map matching achieves reasonable localization performance, the method requires rigorous sensor deployment. Due to the low accuracy of ultrasonic sensors, ultrasonic-based techniques achieve a position accuracy of only about 10 m.

Figure 3 shows that, for pure GPS positioning under open sky, the GPS receiver can output position information at 1 Hz with an accuracy of 2-10 m, regardless of the vehicle operating system. Compared to other sensor-based positioning, IMU-based technology has the lowest computational complexity and a fast position refresh rate, but its cumulative error results in a positioning error of about 1 m after only 10 minutes of driving. In vision-based localization, the rich environmental information contained in images makes the computational complexity similar to that of the LiDAR-based approach. However, vision cannot accurately measure the range of surrounding objects due to image quality and lens distortion challenges; as a result, its localization accuracy is lower than that of LiDAR-based techniques. In addition, its computational complexity decreases with the dimensionality of the reference map, while its positional accuracy does not vary much.

As shown in Figure 4, the real-time performance of V2X-based localization is better compared to LIDAR and vision-based localization, but its accuracy is unsatisfactory due to the challenges of signal delay or insufficient participating nodes.

Figure 5 shows that data fusion-based techniques can achieve a balance in terms of accuracy and real-time performance compared to other sensor-based localization. This is because it leverages the strengths of each sensor to reduce the impact of the other sensors’ weaknesses, and each individual sensor does not require the development of complex algorithms to achieve its optimal localization potential.

In summary, the computational complexity of the different sensor-based localization techniques differs by a factor of about 10^7 at most, while the position error differs by a factor of about 100. Table VII summarizes the performance of different sensor technologies in terms of accuracy and real-time performance.

2) Application scenario: For safe driving, AV applications require a position error of less than 30 cm [3] and a position update interval of less than 100 ms [151]. The analysis shows that LiDAR-, vision- and data fusion-based localization has the potential to meet the accuracy requirement. LiDAR- and vision-based techniques rely on powerful processors, such as high-performance GPUs and multicore CPUs, to meet the real-time requirement. Data fusion-based techniques fusing multiple low-cost sensors (e.g., cameras, GPS, IMUs, and on-board sensors) are less computationally complex than LiDAR- and vision-based techniques. In summary, fusion techniques have considerable potential to achieve cost-effective autonomous positioning. Furthermore, Table VII can guide the choice of localization solution in different scenarios. For urban environments, where pedestrians and vehicles are heavily involved in traffic, positioning accuracy and real-time requirements are the highest among common driving environments. Although LiDAR-based, vision-based, and LiDAR- or vision-based data fusion technologies may increase hardware deployment cost to achieve real-time performance, they can deliver precise localization. Freeway and suburban scenarios have fewer pedestrians and vehicles around the AV, so the accuracy requirements may be lower than in urban environments. However, AVs there need long-range detection sensors to sense surrounding obstacles and high-frequency position output for high-speed driving. Therefore, localization technologies offering long-range sensing and real-time performance, such as data fusion-, radar-, and V2V-based technologies, may be a suitable option.
AVs used as city buses or sightseeing buses travel at lower speeds due to fewer obstacles in the dedicated lanes, so accuracy and real-time performance requirements are lower than in the above cases. In this case, low-cost data fusion, V2I and radar-based localization technologies may be the preferred solution. In automated parking scenarios, the detection distance and positioning real-time performance do not need to be as high as in the above-mentioned applications. Therefore, low-cost ultrasonic and radar technologies may be the most promising options.

8 Conclusion

This paper has reviewed the latest self-localization techniques based on active sensors, passive sensors, V2X, and data fusion, and quantitatively compared their accuracy and computational complexity. Compared to 1D map and 3D map matching methods, LiDAR-based 2D map matching methods show the most promise for balancing the localization performance of commercial AVs across cost, accuracy, real-time performance, and robustness. However, LiDAR-based localization is more expensive than other sensor-based localization (e.g., radar-, vision-, and V2X-based localization). In addition, the real-time performance of LiDAR-based (2D) solutions may be limited by system computing power and require powerful CPU/GPU acceleration, which increases AV deployment cost; further improvements are needed to reduce the localization update time on low-cost processors. Passive sensor-based localization solutions show significant advantages in deployment cost. The challenge is that, for typical passive sensors such as GPS receivers and IMUs, the integrity and consistency of localization still make the technology difficult to apply to AVs. Vision-based localization can provide high-accuracy vehicle location but may require GPU acceleration to process large amounts of image data; camera reliability in poor lighting or inclement weather also needs to be further addressed. V2X technology can provide a cost-effective AV positioning solution given adequate signal strength and network coverage in a VANET. RFID-based technology is well suited for fixed-route AV applications, such as tour buses at zoos and container handling trucks at ports. However, signal latency and packet loss in V2X systems need further optimization to improve positioning accuracy and consistency.
Compared to other sensor-based localization solutions, data fusion-based technologies have the greatest potential to balance the economy, real-time performance, accuracy, and robustness of commercial AV localization. For example, interval theory-based techniques can achieve vehicle positioning with high integrity and consistency by fusing low-cost sensor data (e.g., GPS, IMU, and odometer). Further research and validation of the technology under changing environments and various driving conditions (e.g., long-distance driving) is critical before commercialization. In addition, the comparative analysis of real-time and accuracy performance has shown a maximum difference of about 100 times in position error between different sensor-based localization technologies. LiDAR-, vision- and data fusion-based localization techniques have the potential to meet the accuracy requirement (less than 30 cm) for safe AV driving. LiDAR-based techniques achieve the best positioning accuracy among sensor-based techniques, and different LiDAR-based methods achieve similar positional accuracy. High-dimensional map matching or intensity-based matching methods can reduce the position error by about 2-3 times, but can increase the computational complexity by about 20-2000 times. Vision- and data fusion-based localization has the potential to improve position accuracy by about 2-5 times compared to LiDAR-based localization. In terms of real-time performance, the maximum variation in computational complexity between different sensor-based technologies is about 10^7 times, leaving far more room for improvement than accuracy. IMU-, ultrasonic-, multi-sensor fusion- and radar-based self-localization can meet the real-time requirement for safe driving (<100 ms) with low-cost processors, while LiDAR- and vision-based localization can run in real time only on powerful processors.
However, IMU-, ultrasonic- and radar-based technologies have insufficient positioning accuracy and are often used as secondary positioning solutions in AVs. LiDAR-based techniques have the highest computational complexity, varying by up to about 2000 times across methods; work on improving LiDAR image alignment methods can improve their real-time localization performance. Vision-based localization has a computational complexity similar to the LiDAR-based approach, varying by up to about 1000 times across methods; improving the efficiency and accuracy of capturing image correlations can improve both accuracy and real-time performance. In addition, matching low-dimensional features can reduce computational complexity but has no substantial impact on accuracy. Compared to LiDAR- and vision-based localization, data fusion-based localization achieves better real-time performance because each individual sensor does not require complex algorithms to reach its optimal localization potential; it also achieves the best balance between accuracy and real-time performance. In conclusion, LiDAR-, vision- and data fusion-based technologies still have considerable room for improvement in real-time performance. The discussion shows that no single sensor can meet all the localization requirements of autonomous driving. Compared to single-sensor technologies, data fusion-based technologies will be the research focus for achieving cost-efficient AV self-localization. In addition to traditional fused information sources such as GPS and IMU, V2X will be a promising solution, mainly due to its excellent robustness to lighting and weather and its wide detection range (~300 m), which can increase the number of data sources and improve their stability. However, the trade-off between accuracy, real-time performance, and robustness still needs further research.
In addition, future research needs to focus on sensor fault detection and identification techniques and on defect data modeling methods to ensure robust and consistent AV localization. With the rise of emerging methods such as machine learning and deep learning, the performance of map-based localization can be enhanced, since artificial intelligence algorithms have great potential to learn features automatically. We refer the reader to the recent survey by Fayyad et al [152], which provides a comprehensive overview of deep learning-based localization.

9 Reference

[1] Real-Time Performance-Focused Localization Techniques for Autonomous Vehicle: a Review
