Abstract
Self-driving technology companies and the research community are rapidly adopting machine learning longitudinal motion planning (mMP) for autonomous vehicles (AVs). This paper reviews the current state of the art in mMP, with an exclusive focus on its impact on traffic congestion. The paper identifies the availability of congestion scenarios in current datasets, and summarizes the required features for training mMP. For learning methods, the major methods in both imitation learning and non-imitation learning are surveyed. The emerging technologies adopted by some leading AV companies, such as Tesla, Waymo, and Comma.ai, are also highlighted. It is found that: (i) the AV industry has mostly focused on the long-tail problem related to safety and has overlooked the impact on traffic congestion, (ii) the current public self-driving datasets do not include enough congestion scenarios, and mostly lack the input features/output labels needed to train mMP, and (iii) although the reinforcement learning approach can integrate congestion mitigation into the learning goal, the major mMP method adopted by industry is still behavior cloning, whose capability to learn a congestion-mitigating mMP remains to be seen. Based on the review, the study identifies the research gaps in current mMP development. Some suggestions for congestion mitigation in future mMP studies are proposed: (i) enrich data collection to facilitate learning under congestion, (ii) incorporate non-imitation learning methods to integrate traffic efficiency into a safety-oriented technical route, and (iii) integrate domain knowledge from traditional car-following theory to improve the string stability of mMP.
Self-driving cars are around the corner, quite literally. And yet, despite numerous studies ( 1 – 7 ) on the potential impacts of autonomous vehicles (AVs) and connected and autonomous vehicles (CAVs) on traffic flow, a reliable car-following (CF) model describing the longitudinal dynamics of AVs is still lacking. This makes evaluating the impact of AVs on traffic flow challenging. Recent empirical experiments reveal that the existing longitudinal control systems on level-2 AVs are string unstable ( 8 – 10 ), which means that small perturbations (e.g., speed fluctuations) tend to grow as they propagate upstream through a platoon, and eventually lead to full stop-and-go motions. Those empirical findings are surprising, and indicate that AVs might cause even more traffic congestion than human drivers. The results also contrast with the successful design and implementation of string-stable longitudinal controllers in the literature, including both adaptive cruise control (ACC) and cooperative adaptive cruise control (CACC) algorithms ( 11 – 15 ). It is conjectured here that the gap between practice and theory may result from: (i) the longitudinal control of level-2 AVs, also known as ACC, not factoring string stability into its design; (ii) in real-world scenarios, some other issues (e.g., safety, efficiency, comfort, or user acceptability) being weighted more heavily than string stability, so that the controller sacrifices string stability to satisfy other performance metrics ( 16 ); and (iii) the hardware (sensing devices and actuators) not being capable of realizing the string-stable control command. The noisy and choppy measurements, and the slow-response actuators installed on economy cars, require the control command to be heavily filtered before being applied to the vehicle (otherwise the vehicle would behave in an undesirably jerky manner), which makes string stability unachievable.
Given the undesired string unstable ACC, it is possible that current AV systems might induce more instability than human drivers, which could induce more traffic congestion and emissions. From a traffic perspective, there is a critical need for a deeper understanding of AVs’ longitudinal behaviors to predict their impact on traffic congestion.
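To make the notion of string stability concrete, the following minimal sketch checks whether a follower amplifies speed perturbations from its leader. It assumes a hypothetical linear constant-time-headway law, a = k_s (gap - s0 - T v) + k_v (v_lead - v), which is not the controller of any vehicle discussed in this paper; the gains `k_s`, `k_v` and the headway `T` are illustrative values. For this law, the leader-to-follower speed transfer function is G(s) = (k_v s + k_s) / (s^2 + (k_v + k_s T) s + k_s), and the platoon is string stable when |G(jω)| ≤ 1 at every frequency.

```python
def follower_gain(omega, k_s, k_v, headway):
    """Magnitude |G(j*omega)| of the leader-to-follower speed transfer function
    for the assumed linear law a = k_s*(gap - s0 - headway*v) + k_v*(v_lead - v)."""
    s = complex(0.0, omega)
    numerator = k_v * s + k_s
    denominator = s * s + (k_v + k_s * headway) * s + k_s
    return abs(numerator / denominator)


def max_gain(k_s, k_v, headway, omega_max=5.0, steps=5000):
    """Largest gain found on a uniform frequency grid in (0, omega_max] rad/s."""
    return max(
        follower_gain(omega_max * i / steps, k_s, k_v, headway)
        for i in range(1, steps + 1)
    )


def is_string_stable(k_s, k_v, headway):
    """Closed-form test for this specific law (positive gains assumed):
    |G(j*omega)| <= 1 for all omega  <=>  2*k_v*T + k_s*T**2 >= 2."""
    return 2.0 * k_v * headway + k_s * headway ** 2 >= 2.0


if __name__ == "__main__":
    for k_s, k_v, headway in [(0.2, 0.8, 1.5), (0.5, 0.1, 1.0)]:
        print(k_s, k_v, headway,
              is_string_stable(k_s, k_v, headway),
              round(max_gain(k_s, k_v, headway), 3))
```

Under these assumptions, a gain above 1 at any frequency means that a perturbation at that frequency grows from vehicle to vehicle, which is the mechanism behind the stop-and-go motions described above.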
Meanwhile, the current AV technology is fast evolving thanks to the recent advancements in computer vision and machine learning. Notably, we are witnessing a fundamental shift from the traditional radar-based ACC, which relies solely on radar ( 17 ), to camera-included advanced driver-assistance systems (ADAS). The transition is reasonable and expected, because the traditional radar-based ACC has limited functionality owing to its pure reliance on the radar sensor and hard-coded human-crafted rules. Additionally, the inherent structure of radar-based ACC may lead to issues such as: (i) inability to adapt to variable speed limits, respond to the ambient traffic proactively, or predict upcoming incidents, (ii) inability to navigate in stop-and-go traffic because of the limitations in detecting slow-moving or stationary objects, and (iii) the need for additional human effort in examining and tuning the hard-coded CF rules to alleviate traffic oscillations.
This shift from radar to cameras can be game-changing because vision opens the gate for incorporating more machine learning methods, such as mMP, for planning. The leading company, Tesla, is famous for its camera-based autonomy solution, and its latest full self-driving (FSD) function features “traffic-aware cruise control” ( 18 ). Starting from May 2021, Tesla completely abandoned radar on new releases of its FSD software ( 19 ). Although FSD’s cruise control demonstrates multiple intelligent features, there is no reliable evidence to show whether its longitudinal motion planning is powered by neural networks or by the traditional rule-based ACC with extra augmentations. Recently, many other automakers have been catching up and have started to integrate cameras into the longitudinal control module. A brief summary is given in Table 1. In general, those automakers adopt a similar ADAS, which adds a camera for lane keeping and collision avoidance, and enables low-speed cruise control in stop-and-go traffic where a single radar often fails. General Motors (GM) and Nissan seem to be slightly different. Instead of using radar, GM’s current ACC function reportedly uses only a camera ( 20 ), and its upcoming Super Cruise ( 21 ) would be a hands-free function using LiDAR maps of highways. Nissan ( 22 ) has delivered level-3 autonomous driving using a complex suite of sensors similar to that of Tesla. Nissan also claims to be the first automaker to incorporate a three-dimensional high-definition map.
Latest Advanced Driver-Assistance System Technologies from Major Automakers in 2020
On the other hand, although there are hundreds of AV automakers, there are far fewer AV service providers. The major ADAS service providers with their major customers and collaborators are summarized in Figure 1. More detailed information on service providers of ADAS and other AV technologies is provided in the Appendix. This indicates that, despite the many different brands of AVs, their impacts on traffic flow are likely to be similar.

Main suppliers and customers of advanced driver-assistance systems (ADAS).
While the level-2 market AVs are proprietary and no explicit knowledge about their longitudinal control methods is available, the self-driving technology companies/institutions have been more transparent and have exhibited a clear goal of achieving and adopting mMP. Waymo published its feature-engineering mMP approach in Bansal et al. ( 30 ). Remarkably, an end-to-end mMP model was recently open-sourced by Comma.ai, an aftermarket self-driving company which retrofits regular cars with a mono-camera phone. A similar self-driving service is offered by Mobileye ( 31 ), part of Intel, which helps regular cars to function as AVs with only a single camera device. Similarly, many other self-driving technology companies have published datasets which indicate a move toward mMP methods for longitudinal autonomy. Readers are referred to Scale ( 32 ) for a full list of those public self-driving datasets, which can be filtered by data type, traffic scenario diversity, and annotation. On the other hand, a plethora of research papers (33–37) have been published that accomplish mMP using different learning approaches.
With all that being said, it is highly possible that mMP will be the future of AVs, for both level-2 commercial vehicles and the higher-level FSD cars according to the definition of the Society of Automotive Engineers. As its impacts on traffic congestion are essential and have not yet received enough attention, an in-depth review is necessary to understand the state-of-the-art mMP methods, with the purpose of promoting more traffic-friendly AVs in the long run.
There already exist some review works on AV planning algorithms in the literature, but their focus is not related to traffic congestion or mMP methods. For example, Babak et al. ( 38 ) is limited to the engineering perspective only, focusing on sensors and embedded systems for AVs. Tesla ( 17 ), Katrakazas et al. ( 39 ), and Paden et al. ( 40 ), in the robotics literature, discussed the traditional motion planning approaches like graph search, trajectory optimization, and optimal control methods, which are out of the scope of this study. Quite a few reviews focused on the rule-based AV control, especially for CACC ( 41 ). Attempts to consolidate more relevant studies on mMP of AVs are available. Ni et al. ( 42 ) introduced the development of AVs and basics of deep learning methods, as well as summarizing recent research on theories and applications of deep learning for AVs. However, they aimed to identify challenges and solutions in learning algorithms and took an overview from the vehicle perspective. A summary or discussion from the system perspective, such as the impact of mMP on traffic congestion, has not been presented. Similar conclusions can be drawn from the reviews by Schwarting et al. ( 43 ) and Yurtsever et al. ( 44 ). To the best of the authors’ knowledge, the only work that overviews learning-based AV control methods from artificial intelligence (AI) in the field of transportation engineering is Di and Shi ( 45 ). Nonetheless, that survey was focused primarily on how to deal with interactions between AVs and human-driven vehicles, especially by reference to academic works.
Compared with the existing review papers on AV control, the aim of this study is to provide a comprehensive outlook to consolidate the existing knowledge base of upcoming mMP of AVs and their impacts on traffic congestion. Specifically, this review paper aims to answer the following questions:
Data: Do existing self-driving datasets contain congested scenarios? Do they include the necessary features/labels to train a congestion-mitigating mMP?
Learning method: What are the potential strengths and weaknesses of the typical learning methods in their impact on traffic congestion?
Domain knowledge: How could expert knowledge of traffic flow help the AI community build the congestion-mitigating mMP?
To this end, the paper is organized as follows. The second section introduces available open datasets for AV development; the third section summarizes learning methods for AV control; the fourth section discusses the major limitations and challenges arising from these previous works; the fifth section proposes how to utilize traffic domain knowledge to leverage current mMP, and the final section presents the discussion and outlook based on this review work.
Datasets for mMP
A typical framework in modern autonomous driving systems is shown in Figure 2. Among those pillars, the mMP in this paper falls into the driving policy/path planning module. Following the pipeline, the related components of training data, model input and output are reviewed, as well as the learning methods for mMP.

Fixed modules in modern autonomous driving systems.
Available Open Datasets
Two recent studies (47, 48) provided good reviews of the existing open datasets, which covered data scale, contents (camera or LiDAR, object annotation), road scenarios (urban streets or highway), weather conditions and test vehicle type. Most of the current open datasets are designed to assist computer vision development, and even leave out some information (e.g., acceleration, trajectory data) required to mimic human driving. From the perspective of traffic congestion, Table 2 summarizes the datasets that include the position information necessary for learning mMP. This paper also pays specific attention to the driving scenarios and traffic conditions, which are closely related to the mMP model.
Open Datasets and Simulators for Training Autonomous Driving Systems
Note: H = highway; U = urban; T = trajectory data; C = camera data; Y = yes; N = no; ACC = adaptive cruise control.
Among the currently existing datasets, nuScenes ( 54 ) and HighD ( 62 ) have shown some consideration of congestion. The nuScenes dataset collected data from Boston and Singapore, two cities that are known for their dense traffic and highly challenging driving (242 km traveled at an average of 16 km/h). The HighD dataset was recorded at six different locations near Cologne, Germany. However, the authors are not aware of any studies that have used nuScenes or HighD to train an autonomous driving system. Recently, more AV companies from the industry, like Waymo and Lyft, have released some open datasets. Waymo’s dataset ( 49 ) does not provide direct information on trajectory; one needs to derive it using kinematics information. Lyft’s dataset ( 65 ) does not cover congestion scenarios. Tesla has not revealed any plan to publish its dataset yet, but the authors conjecture that with its large deployment of vehicle fleets it would be highly possible to gain enough congestion data. Remarkably, the L3Pilot dataset will record the autonomous driving behavior and the trajectories of 13 OEM autonomous driving systems, which includes 1,000 drivers and 100 cars in various driving conditions (i.e., different weather and traffic conditions) across 10 countries in Europe. The comprehensive coverage and enriched features of the L3Pilot dataset can significantly enhance the research on autonomous driving. However, the L3Pilot autonomous project is still ongoing and the corresponding dataset is not yet available to the public. Thus, the current overall situation indicates the lack of consideration of congestion in both academia and industry.
Next-Generation Simulation (NGSIM), an open dataset consisting of two-dimensional trajectories, has been widely used in CF studies for decades. Different from the datasets from the AV industry, the traffic density in NGSIM often varies significantly and covers the full range of states from free flow to traffic jams. It also exhibits a high degree of vehicle interaction near traffic bottlenecks like on-ramps or off-ramps. The diversity of driving scenarios and the interaction among vehicles make NGSIM especially valuable for learning driving behaviors under congestion. However, it does not provide any image or LiDAR data compatible with sensors for AVs. Moreover, the OpenACC dataset ( 64 ) provides the highway trajectory data of multiple vehicles driven by different commercial ACC systems. However, similar to the NGSIM dataset, the OpenACC dataset does not provide any video data or vehicle sensor recordings that can be leveraged in end-to-end mMP.
A general issue in those open datasets is that it is unclear whether those miles were driven by human drivers, traditional ACC controllers, or new mMP models. It becomes a major limitation when researchers attempt to reverse-engineer those current mMP models or simply use the data for training. It might also explain why the applications of those open datasets to transportation studies are still very limited. Overall, the current datasets from the self-driving industry are very limited for analyzing the impact of mMP method on traffic congestion. While more and more commercial ACC products are expected to be equipped with mMP in the future, it would be beneficial for research purposes if car companies were to share their driver data.
Simulator Datasets
Because it is costly to collect data from the real world, high-fidelity driving simulators have also been developed to train AVs. CARLA ( 66 ) and TORCS ( 67 ) might be the most popular open-source simulators for autonomous driving research. Related studies based on those simulators can be found in Chen et al. ( 68 ), Panwai et al. ( 69 ), Codevilla et al. ( 70 ), Mirowski et al. ( 71 ), and Tan et al. ( 72 ). CARLA can define diverse sensor suites and is also able to generate congested traffic scenarios. A specific method of transferring driving policies from simulations to the real world was shown in Müller et al. ( 73 ). Note that those simulators also make the reinforcement learning (RL) method feasible by providing an interactive environment for agents to learn.
Both academia and industry have been using simulator datasets to test AV software and hardware. For example, in academia, developers from CMU and MIT used the TORCS and Talos simulators, respectively, to test their algorithms in simulation before porting them to the vehicle for practical road tests (74, 75). Recently, researchers have used simulated LiDAR data to develop and test algorithms for AV off-road ground navigation using the MSU autonomous vehicle simulator ( 76 ). To supply the critical events and corner cases for the evaluations of AVs efficiently and effectively, Feng et al. ( 77 ) leveraged RL algorithms to generate naturalistic adversarial critical events in CARLA to test the safety performance of AVs. In industry, simulator datasets have been used by car manufacturers not only to eliminate modeling errors and validate control systems for AVs ( 78 ), but also to evaluate the powertrain performance and analyze the energy consumption of AVs (79, 80). Waymo (79, 81) and Uber ( 82 ) developed simulator platforms to generate realistic scenarios from their real-world datasets to improve the safety and performance of AVs.
For the impact on traffic congestion, similar simulation-based methods can be adopted to generate more driving scenarios related to the traffic efficiency, besides the safety-oriented experiments. However, even though the simulation-based method is efficient in generating supplementary data, simulating realistic behavior of human drivers in a complex traffic environment remains a difficult task, because surrogate models used in simulation will inevitably induce model bias and over-simplified behaviors. The simulation environment constructed with such a surrogate model can lead to undesired and biased performance measure of AVs. Alternatively, we could use a simple CF model known to be string stable to train a string-stable mMP. Despite the potential benefits of simulator datasets, studies incorporating them to develop a congestion-mitigation mMP model have not been reported. Some studies from the transportation research domain might be close (83, 84), using a simple traffic simulator to train a single AV to stabilize mixed traffic. However, the studies using more high-fidelity driving simulator data to investigate string-stable mMP models have not been found.
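The idea of using a simple string-stable CF model as a data generator can be sketched as follows. This is a minimal illustration, not a method from the cited studies: it assumes a hypothetical linear constant-time-headway law with illustrative gains (`K_S`, `K_V`, `HEADWAY`, `S0`), a leader that briefly slows down, and explicit Euler integration. It records (state, action) pairs that a learned planner could be trained on, together with each vehicle's largest speed deviation.

```python
# Illustrative parameters of an assumed linear car-following law (hypothetical values).
S0, HEADWAY, K_S, K_V = 2.0, 1.5, 0.2, 0.8


def cf_accel(gap, speed, lead_speed):
    """Acceleration command: spacing error term plus relative speed term."""
    return K_S * (gap - S0 - HEADWAY * speed) + K_V * (lead_speed - speed)


def simulate_platoon(n_followers=5, v_eq=20.0, dip=5.0, dt=0.1, horizon=120.0):
    """Simulate a platoon whose leader slows by `dip` m/s between t = 10 s and 15 s.

    Returns (samples, max_dev): samples are ((gap, speed, lead_speed), accel)
    pairs from all followers; max_dev[i] is vehicle i's largest |speed - v_eq|.
    """
    n = n_followers + 1
    gap_eq = S0 + HEADWAY * v_eq           # equilibrium spacing at speed v_eq
    pos = [-i * gap_eq for i in range(n)]  # vehicle 0 is the leader
    vel = [v_eq] * n
    max_dev = [0.0] * n
    samples = []
    for step in range(round(horizon / dt)):
        t = step * dt
        vel[0] = v_eq - dip if 10.0 <= t < 15.0 else v_eq  # prescribed leader speed
        acc = [0.0] * n
        for i in range(1, n):
            gap = pos[i - 1] - pos[i]
            acc[i] = cf_accel(gap, vel[i], vel[i - 1])
            samples.append(((gap, vel[i], vel[i - 1]), acc[i]))
        for i in range(n):
            max_dev[i] = max(max_dev[i], abs(vel[i] - v_eq))
        for i in range(n):                 # explicit Euler update from time-t states
            pos[i] += vel[i] * dt
            vel[i] += acc[i] * dt
    return samples, max_dev


if __name__ == "__main__":
    samples, max_dev = simulate_platoon()
    print(len(samples), [round(d, 2) for d in max_dev])
```

With these particular gains the speed deviation shrinks from vehicle to vehicle, so the generated demonstrations come from a perturbation-damping expert. Whether a network trained on such data inherits that property is exactly the open question raised above.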
Learning Method
Behavior Cloning
A simple yet effective learning method for mMP is to map model inputs to outputs directly, which can be represented as a function mapping from the input features
(i) End-to-end mMP. The end-to-end learning approach behaves similarly to a black box, which takes in the raw video data and outputs the longitudinal vehicle control command (e.g., speed, acceleration, throttle response). Even though the end-to-end approach has the advantages of self-optimizing and requiring less manual effort in implementation, it faces challenges in capturing and processing crucial features from raw video frames. Specifically, the video data in traffic congestion would contain multiple clusters and pose great difficulty to image processing and feature extraction. In addition, the congestion data may contain undesired noise or be too random for neural networks to learn, which might trigger under-fitting or over-fitting issues. The strategies reported in the existing literature rely solely on two categories of neural networks: convolutional neural networks (CNN) and recurrent neural networks (RNN). For instance, Kim and Canny ( 85 ), Bojarski et al. ( 86 ), Chen and Huang ( 87 ), and Sharma et al. ( 35 ) utilized deep CNNs concatenated with multiple fully connected layers to predict the vehicle steering wheel angles, which demonstrated a decent performance in real-world driving scenarios. Researchers have also worked on predicting the vehicle longitudinal command. Considering the spatial-temporal characteristics and the memory effect of vehicle longitudinal trajectories, deep CNNs augmented with long short-term memory (LSTM) or gated recurrent units (GRU) ( 52, 88, 89 ) are applied to selectively forget or remember historical frame features, improving the prediction accuracy of vehicle longitudinal commands (i.e., speed, acceleration).
(ii) Mid-level learning. The mid-level learning method is more interpretable than the end-to-end learning approach because of its explicit hierarchical structure. The first stage of mid-level learning extracts the useful CF features (e.g., inter-vehicle spacing, relative speed, lane position) using computer vision algorithms; the second stage then fits the CF model with a specific neural network. Remarkably, Zhou et al. ( 90 ) showcased the effectiveness of an RNN-based CF model in capturing the traffic oscillation characteristics, which suggests including an RNN (e.g., LSTM, GRU) in the deep neural network to fit the CF behavior in congested traffic conditions. Moreover, some studies (91–94) have demonstrated that performance in predicting the states of ego vehicles can be improved by arranging the kinematic information of multiple neighboring vehicles in Laplacian-like feature matrices or tensors and applying a graph convolution network to capture the inter-dependency and social pooling of the data. This finding indicates that features of higher dimension, organized in a connected structure, might lead to higher accuracy. It is therefore also important to evaluate those hand-crafted features with regard to model accuracy and parsimony, such that a trade-off can be achieved between model complexity and accuracy.
(iii) Mixed (hybrid) learning approach. As including more useful features in the tensor can boost the prediction performance, some studies have also included another sub-task (e.g., semantic segmentation, image augmentation) to extract those useful features in the training process or incorporated other information (e.g., vehicle kinematic states, ambient traffic information) into the end-to-end learning to improve the model accuracy. For instance, George et al. ( 34 ), Yang et al. ( 95 ), Hsu et al. ( 96 ), and Li et al. ( 97 ) pooled the vehicle kinematic information with the features obtained from video frames using concatenating layer to enhance the prediction of steering angle and acceleration. Xu et al. ( 52 ) conducted a semantic segmentation aside of the longitudinal and lateral end-to-end learning, and added the loss function of semantic segmentation to the driving loss function of end-to-end learning to reinforce the prediction accuracy. It was found that the simultaneous learning of semantic segmentation could outperform both the end-to-end and mid-level learning methods.
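The direct input-to-output mapping that behavior cloning learns can be illustrated with a deliberately small mid-level example. This is a sketch under stated assumptions, not any cited method: the "expert" is a hypothetical linear CF law (`expert_accel`, with made-up gains), the features are spacing, relative speed, and ego speed, and the cloned policy is a linear model fitted by ordinary least squares rather than a neural network.

```python
import random


def expert_accel(spacing, rel_speed, speed, k_s=0.2, k_v=0.8, s0=2.0, headway=1.5):
    """Hypothetical expert: linear car-following law used to label demonstrations."""
    return k_s * (spacing - s0 - headway * speed) + k_v * rel_speed


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b_i] for row, b_i in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def fit_behavior_cloning(states, actions):
    """Least-squares fit of accel ~ w0 + w1*spacing + w2*rel_speed + w3*speed."""
    X = [[1.0, s, dv, v] for s, dv, v in states]
    n = 4
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(n)] for i in range(n)]
    Xty = [sum(row[i] * a for row, a in zip(X, actions)) for i in range(n)]
    return solve_linear(XtX, Xty)


# Synthetic demonstrations: (spacing [m], relative speed [m/s], ego speed [m/s]).
rng = random.Random(0)
states = [(rng.uniform(5, 60), rng.uniform(-5, 5), rng.uniform(0, 30))
          for _ in range(500)]
actions = [expert_accel(*state) for state in states]
weights = fit_behavior_cloning(states, actions)

if __name__ == "__main__":
    print([round(w, 4) for w in weights])
```

Because the demonstrations here are noise-free and the model class contains the expert, the fit recovers the expert's coefficients. Real congestion data are neither, which is where the generalization concerns discussed in this review arise.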
Remarkably, the behavior cloning (BC) method has gained wide popularity within the industry. Waymo’s research paper ( 30 ) reported that, even with 30 million examples and mid-level input and output for motion planning, a pure BC method is not sufficient to train a safe AV. To tackle this, they synthesized more “corner” cases by adding perturbations to the normal driving data. However, it is conjectured here that this might not make much difference, since longitudinal motion planning under normal driving scenarios is not strengthened by “corner” cases. Although Tesla has not published any official research documents on its motion planning technology, from its investor conference event in April 2019 ( 98 ), one could speculate that Tesla most probably adopts the BC method as well, and that the supervised learning model is evolving with the large deployment of vehicle fleets on the roads. Currently, Tesla is adopting a feature-engineering approach rather than an end-to-end method. Evidence can be found in the videos ( 99 ) on the Autopilot official website, in which entities such as vehicles, traffic lights, or cones are all labeled and annotated separately. Moreover, at the Scaled Machine Learning conference in February 2020 ( 100 ), Tesla revealed the neural network architecture applied in the FSD, from which it appears that Tesla is applying a HydraNet to pool different neural networks which conduct different perception and prediction tasks (e.g., labeling, annotation, semantic segmentation, per-pixel depth prediction) but share the same backbone. Correspondingly, the HydraNet fuses the information from all cameras and radars to create a bird’s-eye view for navigating the vehicle.
IRL and GAIL
Another pipeline of imitation learning is to recover the implicit reward function of human driving using inverse reinforcement learning (IRL). IRL defines the cost function of a trajectory
where
Noticing the immense computational cost in recovery of the true reward policy, Ho and Ermon ( 107 ) found that human driving behaviors can be mimicked directly using generative adversarial imitation learning (GAIL) without discovering a cost function first. GAIL trains the self-driving policy
In a recent work, GAIL was applied to the task of autonomous driving on highway scenario using NGSIM dataset ( 108 ). The result shows that the recurrent GAIL is surprisingly able to capture many desirable properties consistent with real trajectories. Bhattacharyya et al. ( 109 ) extended GAIL to multi-agent learning for highly interactive driving cases. Although the methodology of GAIL is sound, there do not appear to be more follow-up studies from the academic community or industry.
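For orientation, the GAIL training objective as formulated by Ho and Ermon can be written as the saddle-point problem below. The notation is an assumption made for this illustration and may differ from the symbols used elsewhere in this paper: $\pi$ is the learned driving policy, $\pi_E$ the expert (human) policy, $D$ a discriminator over state-action pairs $(s, a)$, $H(\pi)$ the causal entropy of the policy, and $\lambda \ge 0$ a regularization weight.

```latex
\[
\min_{\pi} \; \max_{D} \;
\mathbb{E}_{\pi}\big[\log D(s, a)\big]
+ \mathbb{E}_{\pi_E}\big[\log\big(1 - D(s, a)\big)\big]
- \lambda H(\pi)
\]
```

The discriminator is trained to tell policy rollouts from expert demonstrations, while the policy is trained to make its rollouts indistinguishable from them, so no explicit cost function has to be recovered first.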
Reinforcement Learning (RL)
The success of imitation learning largely depends on the availability and distribution of labeled data, which are costly to collect. To circumvent this problem, another stream in mMP is working on the non-imitation method, RL, which follows a pipeline as shown in Figure 3. Since RL methods need expert-designed reward functions, they can be designed according to the basic driving rules for autonomous driving, such as gaining faster speed and avoiding collisions. Pan et al. ( 110 ) used RL to train an autonomous driving policy with a pre-defined reward function encouraging higher speed and penalizing crashes. In more recent work, Chen et al. ( 111 ) implemented several deep RL methods and showed good driving performance with dense surrounding traffic. Guo et al. ( 112 ) used the RL method to learn the longitudinal motion planning for AVs to reduce fuel consumption as well as to maintain acceptable travel time. Shalev-Shwartz et al. ( 113 ) applied multi-agent RL in a highly interactive merging case to generate a set of feasible trajectories and then feed a hand-designed cost function to the trajectory planner to select the most smooth and safe trajectory, which makes the longitudinal motion planning no longer a pure BC process. DeepTraffic ( 114 ), a simulation and deep RL environment developed by MIT, has also shown the success of RL in navigating AVs on a congested seven-lane highway. Other similar studies based on RL and traffic simulators can be found in Sallab et al. ( 115 ), Kendall et al. ( 116 ), and Liang et al. ( 117 ).
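A hand-designed reward of the kind described above can be sketched in a few lines. This is a minimal illustration, not the reward of any cited study: the function name `driving_reward`, the weights, and the speed cap are all hypothetical, and the acceleration penalty is an added assumption showing one way a smoothness (and hence congestion-related) term could enter the learning goal.

```python
def driving_reward(speed, accel, collided,
                   v_max=30.0, crash_penalty=100.0, smooth_weight=0.1):
    """Per-step reward: encourage higher speed, penalize crashes and harsh accelerations.

    speed    -- ego speed in m/s
    accel    -- ego acceleration in m/s^2
    collided -- True if the ego vehicle crashed during this step
    """
    if collided:
        return -crash_penalty
    speed_term = min(max(speed, 0.0), v_max) / v_max  # normalized to [0, 1]
    smooth_term = smooth_weight * accel ** 2          # assumed smoothness penalty
    return speed_term - smooth_term


if __name__ == "__main__":
    print(driving_reward(30.0, 0.0, False))  # cruising at the speed cap
    print(driving_reward(15.0, 2.0, False))  # slower and accelerating hard
    print(driving_reward(30.0, 0.0, True))   # crash
```

The relative size of the terms is a design choice: a large speed weight with no smoothness penalty rewards aggressive driving, which is one reason the resulting policy depends so heavily on how the reward is crafted.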

Architecture of training autonomous driving in simulation using reinforcement learning (RL).
It is worth noting that, in the RL context, the model input also plays an important role because it directly determines the state space that an RL agent can observe. Chen et al. ( 111 ) reduced the state complexity through feature representation based on the raw image, which makes the problem more tractable and computationally efficient. Despite some studies using RL to stabilize mixed traffic in a loop ( 118 ) or near the merging areas ( 83 ), there does not yet appear to have been any success in learning a string-stable mMP for single AVs.
In summary, the evolution of the major motion planning methods is shown in Figure 4, which depicts the transition from traditional rule-based methods to the state-of-the-art machine learning methods. Note that most learning methods fall into the category of BC, and although many alternative learning methods to BC have been proposed in the literature, the leading AV companies still stick to BC (30, 98). Here RL is not considered as a BC method since RL does not directly learn from expert demonstration. It does not require large amounts of data, but it does require a high-fidelity simulator. Also, the performance of RL heavily depends on the human-designed reward functions that govern the training process and resulting policies.

Trend of major learning methods for mMP of autonomous vehicles (AV).
Limitations of mMP
Based on the previous review, this section will discuss the current limitations of mMP research with regard to its impact on traffic congestion.
Systematic Lack of Training Data
Datasets that can completely cover regular driving scenarios are still unavailable, let alone the “corner” cases that threaten the robustness of mMP. No driving data were found for multi-lane highways, on-ramp and off-ramp bottlenecks, or generally congested traffic conditions. Since most neural network methods cannot generalize well to unseen situations, the authors believe that the incompleteness of datasets might lead to biased or even unpredictable CF behaviors. Issues arising from such biased datasets were also discussed by Codevilla et al. ( 119 ).
Incomplete Feature Representation
While perception modules can extract human-interpretable features as model inputs for mMP, those hand-selected features may not fully capture all the influencing factors for driving decisions. For example, the specific location information might be totally ignored in model input. From the industry, no information has been revealed about whether the localization results are incorporated into motion planning. While human drivers respond to different locations with varying driving behaviors, such as the “relaxation” phenomenon discovered by Laval and Leclercq ( 120 ), we still do not know whether mMP will react differently in traffic bottlenecks, such as on-ramps or off-ramps.
Codevilla et al. ( 70 ) and Sauer et al. ( 121 ) conditioned the BC with high-level command input for intersections. The included high-level commands are able to resolve ambiguities in the mapping from single image input to low-level commands (steering and speed). It is argued here that in highway driving, such ambiguities will also arise between the exiting and non-exiting vehicles ( 120 ). Thus it would be worthwhile to incorporate driving intention in motion planning for AVs. However, only Tesla ( 98 ) has reported a related project to infer the lane change intention of leading vehicles and integrate it for motion planning.
Limitations in Learning Algorithms
According to Kuefler et al. ( 108 ), the BC method has been successfully used to produce driving policies for simple scenarios such as CF on freeways. However, Wheeler et al. ( 122 ) and Lefèvre et al. ( 123 ) reported different results when applying BC to nuanced states with little or no experience, showing that BC can only produce accurate predictions up to a few seconds. Their results indicate that BC usually demands large amounts of training data, and becomes inaccurate when generalized to unseen experiences. Remarkably, when LSTM or GRU are included in the neural network to retrofit the longitudinal command, these two types of RNN could also face difficulties in transfer learning, posing challenges in generalizing the model. Moreover, the stop-and-go speed profile and the fluctuated and choppy acceleration triggered by congestion appreciably contribute to the difficulties of retrofitting the vehicle longitudinal command, entailing a more intelligent neural network model to capture the fluctuations and discontinuities in the vehicle CF model during congestion. The poor data distribution generated from driver heterogeneity in congestion also contributes to the randomness of the model trained by BC, which casts extra doubt on the generalization of a BC model. Thus, BC could significantly suffer from the scarcity of training data and can be biased because of poor data distribution.
Although IRL and GAIL can circumvent some of the issues with the BC method, they still succumb to the pitfalls of imitation learning methods. Chen et al. ( 111 ) summarized three major issues with imitation learning: (i) it needs a huge amount of expert driving data collected in the real world and in real time, which can be costly and time-consuming; (ii) it can only learn driving skills that are demonstrated in the dataset, which might lead to serious issues when unseen experiences arise during testing; and (iii) since human expert drivers act as the supervision for learning, it is impossible for an imitation learning policy to exceed human-level performance. From the traffic flow perspective, it is argued here that both BC and other deep imitation learning methods will be cumbersome, especially with incomplete datasets lacking the important driving scenarios mentioned above. According to Gao et al. ( 124 ), both BC and IRL algorithms implicitly assume that the demonstrations are complete, meaning that the action for each demonstrated state is fully observable and available. This assumption does not hold for the mMP problem.
The existence of limitations with imitation learning methods highlights the potential of non-imitation methods like RL in learning a “better” driving policy to reduce congestion and improve overall traffic efficiency. It is not easy to achieve, though. The major issue with using the RL method is the dependence on a reward function, which must be hand-crafted based on engineering experience and has to be applicable to all driving scenarios ( 125 ). RL methods might cause undesirable driving behaviors by directly transferring their driving policy learned in non-congestion states. Besides, it is argued here that adopting RL transforms the problem of mMP from imitating human demonstrations to searching for a policy that complies with a hand-crafted reward rule. Also, it should be pointed out that RL requires high-fidelity simulation platforms, which must be able to model accurately the appearance of the environment, the physics of vehicles, and the behavior of other participants ( 98 ). Especially important is the modeling of vehicle dynamics to represent the effects of gravity accurately, which has been found to be a key factor in reproducing empirical traffic flow instabilities ( 126 ).
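To make the reward-design issue concrete, the following is a minimal sketch of a hand-crafted per-step reward for a car-following agent. It is not taken from any of the cited studies: the three terms (safety, efficiency, comfort), the weights, the desired time gap, and the time-to-collision threshold are all assumptions chosen for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardWeights:
    """Illustrative weights only; they are not calibrated values from any study."""
    safety: float = 1.0
    efficiency: float = 0.5
    comfort: float = 0.1


def longitudinal_reward(gap, speed, lead_speed, accel,
                        weights=RewardWeights(),
                        desired_time_gap=1.5, ttc_threshold=4.0):
    """Per-step reward for a car-following agent (higher is better, maximum 0)."""
    # Safety: penalize a time-to-collision (TTC) below the assumed threshold.
    closing_speed = speed - lead_speed
    ttc = gap / closing_speed if closing_speed > 0.0 else math.inf
    safety = -max(0.0, (ttc_threshold - ttc) / ttc_threshold)

    # Efficiency: penalize deviation from the assumed desired time gap.
    time_gap = gap / max(speed, 0.1)
    efficiency = -abs(time_gap - desired_time_gap) / desired_time_gap

    # Comfort: penalize harsh acceleration or braking.
    comfort = -accel ** 2

    return (weights.safety * safety
            + weights.efficiency * efficiency
            + weights.comfort * comfort)
```

In this sketch the time-gap term behaves poorly near standstill, because the gap divided by a near-zero speed becomes very large. That is one small instance of why a single hand-crafted rule is hard to apply to both free-flow and congested states.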
In summary, RL seems to be the only hope to develop “optimal policies” that could potentially outperform human drivers. Despite the difficulty in designing a good reward function, and the requirement of a more realistic traffic environment, it is believed that the “trial-and-error” principle in RL is worth borrowing. Note that Tesla already seems to be working in this direction, and it is able to use the natural traffic environment to test its algorithms and collect ground-truth data. Again, it remains unknown whether Tesla has considered the congestion impact in its development program.
Traffic Domain Knowledge
Overall, the current research on mMP is devoting most of its efforts to the long tail safety problem, while its impact on congestion has been almost completely ignored. Through the above review, the major limitations in current datasets and learning methods have been identified, and now some potential future studies are proposed which aim at equipping the learning process with related traffic domain knowledge to fill in the research gap.
Here the main intellectual achievements in traditional CF theory that learning approaches may find worth incorporating are summarized. Concerning the impact on traffic congestion, the most important human CF properties might include: memory and prediction, randomness, and string stability.
Memory and Prediction
For memory and prediction, LSTM, a type of RNN, has been adopted by mMP studies (30, 52, 127 ) to address the impact of memory on future speed choice. Lefèvre et al. ( 123 ) conducted a comparative evaluation of parametric and non-parametric approaches for speed prediction during highway driving. Their study showed that the CF models can perform well for short-term speed prediction, but deep neural networks behave better for long-term prediction. To evaluate the relative performance of different learning methods on the same dataset, Kuefler et al. ( 108 ) compared the GAIL and BC methods using the same two-dimensional trajectories from NGSIM. Their work demonstrated that BC has the best short-horizon performance, and GAIL outperforms other methods including CF models for long-horizon tasks.
Traditional CF models have long recognized the merits of introducing memory to improve prediction. Studies have also attempted to modify the traditional CF models based on their original form. Lee ( 128 ) revised the linear GHR model (129, 130) to account for the relative speed over a period of time.
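For reference, a memory-based linear CF model of this kind is commonly written in the form below. The notation is an assumption of this sketch and may differ from that of the cited work: $a_n(t)$ is the acceleration of follower $n$, $v_{n-1}(s) - v_n(s)$ is its relative speed to the leader at an earlier time $s$, and $M(\cdot)$ is a memory (weighting) function.

```latex
a_n(t) = \int_{0}^{t} M(t - s)\,\bigl[v_{n-1}(s) - v_n(s)\bigr]\,\mathrm{d}s
```

Under this notation, choosing $M(u) = \lambda\,\delta(u - \tau)$, with a sensitivity $\lambda$ and a reaction delay $\tau$, gives the memoryless linear form $a_n(t) = \lambda\,[v_{n-1}(t - \tau) - v_n(t - \tau)]$.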
It appears that mMP is able to imitate human driving with such memory and prediction property. For example, AVs will decelerate in advance when realizing potential decelerating or cut-in behaviors ahead of them. Notably, Elon Musk ( 98 ) also mentioned that Tesla can even predict a curving path that cannot be seen by humans because of road geometry or limited sight distance. The prediction power of mMP might outperform human behavior. Tesla has also demonstrated that its prediction can be used to infer the intention of other vehicles, such as cut-in behaviors which will be incorporated in AVs’ motion planning. It is conjectured such prediction can improve traffic stability, because AVs can predict disruptive lane changes and prepare to decelerate first, instead of abrupt deceleration without any prediction. Those studies and new technologies pertinent to memory and prediction help to demonstrate the potential of AVs to dampen future traffic congestion.
String Stability and Safety
The literature has shown that most CF models are unable to reproduce string stability behavior consistent with empirical human driving data. These models are all deterministic, including stimulus-response models ( 130 ), optimal velocity models ( 133 ), the intelligent driver model (IDM) ( 134 ), the full velocity difference model (FVDM) ( 135 ), safe-distance models ( 136 ), desired-headway models ( 137 ), and psycho-physical models (138, 139).
Sun et al. ( 140 ) conducted a comprehensive review on the methods for stability analysis and their applicability to CF models. They classified the traditional CF models into three categories: basic CF models, time-delayed CF models, and cooperative CF models, based on the assumption of a connected environment ( 140 ). Common methods in the literature for string stability analysis have also been reviewed in detail. However, those methods applicable for traditional CF models do not apply to mMP because of its lack of explicit mathematical formulations.
More importantly, Sun et al. pointed out some inconsistencies between the results of analytical methods and numerical simulations, which may result from some of the major assumptions or relaxations: (i) since the methods for string stability analysis are mostly based on linear equations, the non-linear CF models are approximated, which causes certain numerical errors; (ii) the platoon is always assumed to remain in equilibrium before a small perturbation is added when analyzing string stability, which goes against real traffic conditions where different driving regimes need to be considered; and (iii) the methods of linear stability analysis are only suitable for small perturbations and cannot capture the non-linear effects caused by large perturbations such as hard braking. Those studies indicate that the string stability of mMP will be hard to capture analytically because of the non-linear neural network architectures, so reasonable methods should depend on numerical studies. Therefore, to analyze the string stability of mMP, one has to approximate those proprietary “black boxes” with traditional CF models or a separate neural network, and then conduct numerical simulations for further analysis.
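A numerical string stability check of this kind can be sketched as follows. The simulator accepts any acceleration function of gap, speed, and leader speed, so a calibrated CF model or a surrogate network could be plugged in. The linear surrogate, its gains, the braking profile of the leader, and the use of the peak speed deviation as the amplification measure are all assumptions of this sketch rather than a method from the reviewed studies.

```python
import numpy as np


def make_linear_cf(kp, kd, time_gap, standstill_gap=5.0):
    """Return a surrogate CF law: accel = kp * spacing error + kd * relative speed."""
    def accel(gap, speed, lead_speed):
        return kp * (gap - standstill_gap - time_gap * speed) + kd * (lead_speed - speed)
    return accel


def peak_speed_deviations(accel_fn, eq_gap, n_followers=5, v0=20.0, dt=0.1, horizon=120.0):
    """Simulate a platoon behind a braking leader; return each vehicle's peak |v - v0|."""
    n = n_followers + 1
    x = -eq_gap * np.arange(n, dtype=float)  # leader at index 0, followers behind it
    v = np.full(n, v0)
    peak = np.zeros(n)
    for step in range(int(horizon / dt)):
        t = step * dt
        # Leader brakes from v0 to v0 - 5 m/s between t = 5 s and t = 7 s, then holds.
        v[0] = v0 - 2.5 * min(max(t - 5.0, 0.0), 2.0)
        gaps = x[:-1] - x[1:]
        a = accel_fn(gaps, v[1:], v[:-1])
        peak = np.maximum(peak, np.abs(v - v0))
        x += v * dt                              # explicit Euler step on positions
        v[1:] = np.maximum(v[1:] + a * dt, 0.0)  # explicit Euler step on speeds, no reversing
    return peak


def amplification(accel_fn, eq_gap):
    """Peak speed deviation of the last follower relative to that of the leader."""
    peak = peak_speed_deviations(accel_fn, eq_gap)
    return peak[-1] / peak[0]


# Two hypothetical parameter sets; the equilibrium gap is standstill_gap + time_gap * v0.
ratio_damped = amplification(make_linear_cf(0.5, 1.0, 1.5), eq_gap=5.0 + 1.5 * 20.0)
ratio_oscillatory = amplification(make_linear_cf(1.0, 0.1, 0.6), eq_gap=5.0 + 0.6 * 20.0)
```

A ratio above one indicates that the disturbance grows as it travels upstream through the platoon, while a ratio at or below one indicates that it is damped. Unlike a linear analysis, the same loop can be rerun with larger perturbations or with a platoon that does not start in equilibrium.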
Moreover, safety (collision prevention) is another significant issue in mMP (in fact, it could be weighted the most in AV control design). In congested traffic, with the randomness and disturbances induced by human drivers, abrupt braking could be inevitable to guarantee safety, which could consequently jeopardize the string stability performance. Under this circumstance, how mMP will trade off collision avoidance and the smoothness of traffic in congestion remains to be analyzed and researched. A feasible direction could be to make collision avoidance a local-level safety objective while using string stability as a system-level safety objective, with mMP iteratively optimizing these two objectives. Specifically, the local safety objective monitors the immediate safety status of the ego vehicle, preventing collisions with adjacent vehicles during driving tasks. The system-level safety objective could be evaluated as a long-term target, whose focus will be the smoothness (string stability) of the traffic. The reason is that smooth traffic can alleviate the fluctuations of acceleration and force vehicles to operate closer to equilibrium, which can further prevent collisions in the surrounding traffic. Correspondingly, a specific boundary function is needed to monitor the safety status during AV operation. Beyond the boundary, mMP can resort to optimizing the system-level performance to alleviate traffic oscillation, while within the boundary, the value of local safety will override the system-level string stability concern. Therefore, the smoothness of traffic can be an essential criterion of how well AVs fit in the traffic from a long-term perspective, while collision prevention is the critical function for AVs to operate safely over a short time span. These two objectives need to be carefully balanced so that AVs can scale up and benefit the traffic system.
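The boundary idea above can be sketched as a simple switching rule. The choice of time-to-collision and a minimum gap as the boundary function, the threshold values, and the function names are hypothetical; the text does not prescribe a specific boundary.

```python
import math


def time_to_collision(gap, speed, lead_speed):
    """Seconds until contact if both vehicles keep their speeds; inf when not closing."""
    closing_speed = speed - lead_speed
    return gap / closing_speed if closing_speed > 0.0 else math.inf


def plan_acceleration(gap, speed, lead_speed, smooth_accel, safe_accel,
                      ttc_boundary=4.0, min_gap=5.0):
    """Pick the collision-avoidance command inside the boundary, else the smooth one."""
    inside_boundary = (gap < min_gap
                       or time_to_collision(gap, speed, lead_speed) < ttc_boundary)
    if inside_boundary:
        # Local safety dominates; never brake more gently than the smooth command.
        return "local_safety", min(safe_accel, smooth_accel)
    return "system_string_stability", smooth_accel
```

Here `smooth_accel` stands for the command of a planner tuned for string stability and `safe_accel` for an emergency braking command. A hard switch like this one can itself introduce abrupt decelerations, so a practical design would likely blend the two commands near the boundary.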
Randomness
Laval et al. ( 141 ) showed that stochastic errors during the acceleration process are the cause of stop-and-go waves. They developed a parsimonious family of CF models that are able to reproduce most traffic instabilities, including traffic oscillations and capacity drop, based on stochastic processes to describe drivers’ desired accelerations. It was found that this component is crucial for capturing realistic formation and propagation of traffic oscillations. This is probably the simplest CF model that captures driver random errors while accelerating and produces realistic traffic oscillations. Follow-up models that incorporate human error have also been formulated within this family (142, 143) and also for other well-known CF models ( 144 ).
To the best of the authors’ knowledge, the stochastic property of mMP has not been well addressed or used for analyzing traffic congestion. It is not advisable, however, to add stochastic components to these methods because it will result in exacerbated traffic oscillations. On the contrary, one should try to minimize this error as much as possible, which should have a positive effect on congestion.
Connections Between CF Models and Neural Networks
While most mMP methods do not show a direct relationship with traditional CF models, it was revealed that a mathematical equivalence between mMP and CF models can be found under simple settings ( 145 ). A linear CF model will become interchangeable with a deep neural network given the same input and output. For equivalence in a real AV system, Xu et al. ( 52 ) showed that an mMP network can be replaced with a traditional CF model given speed and distance extracted from sensor data. It is argued here that mMP and CF models are mathematically equivalent if the mid-level methods generate position/distance-based learning affordances (features) as model input for the mMP module. Since CF models take position and speed as inputs and output acceleration, mMP boils down to a similar problem, which maps the position or speed of surrounding cars to ego-vehicle acceleration. However, such equivalence does not apply when the output of mMP becomes a predicted trajectory over a few seconds.
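The simplest case of this equivalence can be checked directly: a linear CF law is the same function as a single linear neuron acting on the features gap, speed, and leader speed. The specific linear law and its parameter values below are assumptions for illustration and are not taken from the cited studies.

```python
import numpy as np


def linear_cf_accel(gap, speed, lead_speed, k_gap=0.2, k_speed=0.6,
                    time_gap=1.5, standstill_gap=5.0):
    """Linear CF law: respond to the spacing error and to the relative speed."""
    return (k_gap * (gap - standstill_gap - time_gap * speed)
            + k_speed * (lead_speed - speed))


def cf_as_linear_layer(k_gap=0.2, k_speed=0.6, time_gap=1.5, standstill_gap=5.0):
    """Weights and bias of one linear neuron on the inputs [gap, speed, lead_speed]."""
    weights = np.array([k_gap, -(k_gap * time_gap + k_speed), k_speed])
    bias = -k_gap * standstill_gap
    return weights, bias


# Compare both forms on random feature vectors [gap (m), speed (m/s), lead speed (m/s)].
rng = np.random.default_rng(0)
features = rng.uniform([5.0, 0.0, 0.0], [60.0, 30.0, 30.0], size=(1000, 3))
weights, bias = cf_as_linear_layer()
network_output = features @ weights + bias
model_output = linear_cf_accel(features[:, 0], features[:, 1], features[:, 2])
max_difference = np.max(np.abs(network_output - model_output))
```

The two outputs agree up to floating-point rounding. Stacking further layers with identity activations would still collapse to one such affine map, so in this setting it is the non-linear activations that let a network depart from a linear CF model.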
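The mathematical connection between mMP and the CF models should result from the approximation power of neural networks, which has been discussed rigorously in the literature. Kolmogorov ( 146 ) proved a general theorem stating that any real-valued continuous function $f(x_1, \dots, x_n)$ defined on $[0, 1]^n$, $n \geq 2$, can be represented in the form

$$f(x_1, \dots, x_n) = \sum_{q=0}^{2n} \Phi_q\left( \sum_{p=1}^{n} \phi_{q,p}(x_p) \right),$$

where $\Phi_q$ and $\phi_{q,p}$ are continuous functions of a single variable.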
Discussion and Outlook
This survey serves as a preliminary study to investigate the impact of AVs on traffic congestion in the future. It found that mMP is rapidly developing based on the efforts of leading technology companies like Tesla and Comma.ai. Although mMP has not yet been widely applied, most automakers have already equipped their latest car models with enough hardware (sensors) to make mMP possible in the near future. Through the review it was also found that the AV industry has been mostly focusing on the long tail problem caused by “corner errors” related to safety, while the impact of AVs on traffic efficiency is almost ignored. In detail, none of the existing public datasets provides sufficient data that can be applied to the training of a congestion-mitigation mMP, and the major learning approach for mMP adopted by the industry is still BC. Although some non-imitation methods such as RL have been proposed, there has not been noticeable success in training a congestion-mitigation or string-stable mMP for AVs in the existing literature, let alone in its implementation in industry.
Research is needed to understand better the characteristics of mMP and their impact on traffic congestion. The authors suggest the following research directions.
Analyzing the Impact of AV by Approximation and Retrofitting
Since the current AV technologies are sealed as “black boxes,” the only way to understand their behavior and impact is to approximate them with surrogate models. Given a certain level of equivalence between CF models and mMP, one can try to approximate a proprietary mMP by calibrating specific CF models. Similarly, in light of the universal approximation power of neural networks, it is also possible to find surrogate deep neural network models for currently unknown mMP models. Therefore, given a trained mMP, there are two different approaches to understanding its characteristics: calibrating a parameterized CF model, or training a deep neural network as an approximation. Both methods will pave the way for further studies to analyze the impact of mMP on safety and string stability in traffic congestion.
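The first approach, calibrating a parameterized CF model to observed behavior, can be sketched with ordinary least squares. The linear CF form, the parameter names, and the synthetic "black box" that generates the data are all assumptions of this sketch; a real study would use logged gaps, speeds, and accelerations from the AV itself.

```python
import numpy as np


def fit_linear_cf(gap, speed, lead_speed, accel):
    """Least-squares fit of accel = kp*(gap - s0 - h*speed) + kd*(lead_speed - speed)."""
    design = np.column_stack([gap, speed, lead_speed - speed, np.ones_like(gap)])
    coef, *_ = np.linalg.lstsq(design, accel, rcond=None)
    kp, kd = coef[0], coef[2]
    return {"kp": kp, "kd": kd,
            "time_gap": -coef[1] / kp, "standstill_gap": -coef[3] / kp}


def hypothetical_black_box(gap, speed, lead_speed):
    """Stand-in for a proprietary planner, used only to generate example data."""
    return 0.5 * (gap - 5.0 - 1.5 * speed) + 1.0 * (lead_speed - speed)


# Synthetic observations with a small amount of measurement noise.
rng = np.random.default_rng(0)
gap = rng.uniform(5.0, 60.0, 500)
speed = rng.uniform(0.0, 30.0, 500)
lead_speed = speed + rng.normal(0.0, 2.0, 500)
accel = hypothetical_black_box(gap, speed, lead_speed) + rng.normal(0.0, 0.05, 500)
params = fit_linear_cf(gap, speed, lead_speed, accel)
```

Because the stand-in planner is itself linear, the fit recovers its parameters closely. A real mMP is unlikely to be linear, so the residual of such a fit would indicate how much behavior the surrogate fails to capture and whether a non-linear CF model or a surrogate network is needed.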
Data Enrichment for Congestion-Oriented Research
Based on this investigation, there is insufficient data suitable for researching autonomous driving mMP in congested traffic. Most existing data are biased toward emerging autonomous driving tasks such as object detection or safety issues in corner cases. Therefore, it is recommended that industry and academic institutions place more emphasis on the collection of data from AVs (not human-driven vehicles) in congestion, and potentially publish the data for further insights.
Incorporating Expert Knowledge from Traffic Domains
For future development of mMP it is advisable that planning agencies create incentives for the AV industry to put more emphasis on the impact of AVs on traffic congestion, rather than only focusing on the long tail problem of “corner errors.” Relevant expert knowledge from traffic domains is worth noting, including but not limited to the properties of string stability revealed by traditional CF studies, impact of memory and prediction, the stochastic accelerations, and the equivalence between CF models and neural networks.
Conclusion
The paper has mainly surveyed and discussed mMP for AVs, while leaving aside some other important factors, including connectivity and the cooperation between the AV industry and transportation agencies. The authors believe the emerging technology of connectivity also provides a great opportunity to benefit traffic, as more real-time data enable AVs to execute traffic-friendly control algorithms. Additionally, cooperation between the AV industry and transportation agencies is essential for improving the performance of AVs in congested traffic, and for providing incentives for a smooth transition from human-driven vehicles to AVs.
Appendix. Major AV Technology Suppliers and Customers
A detailed graph showing the relations of major suppliers and customers in AV technology is included (Figure 5). A full table is shared via the link: https://wwc20.github.io/AV-technique-suppliers/
Author Contributions
The authors confirm contribution to the paper as follows: study conception and design: J. Laval, H. Zhou, S. Peeta; data collection: H. Zhou, W. Wu, A. Zhou, Y. Wang, Z. Qing; analysis and interpretation of results: H. Zhou, A. Zhou, Y. Wang, Z. Qing, W. Wu; draft manuscript preparation: H. Zhou, A. Zhou, J. Laval, Y. Wang, Z. Qing, W. Wu. All authors reviewed the results and approved the final version of the manuscript.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This study is supported by NSF CPS grant #1932451 and #1826162.
