Trajectory-Based Spatiotemporal Multi-Task Multi-Graph Network for Traffic State Prediction

Abstract

As a critical application in intelligent transportation systems, traffic state prediction still faces various challenges, such as a limited ability to exploit multi-source data and to model spatiotemporal network correlations. Therefore, we propose a trajectory-based multi-task multi-graph convolutional network (Tr-MTMGN), a novel spatiotemporal deep learning framework for traffic state prediction on a citywide scale. This method first mines the underlying information from vehicle trajectories and designs a multi-graph convolution block to model spatial correlations. Subsequently, a multi-head self-attention layer is integrated into the multi-task learning framework to capture the temporal dependencies of the traffic state. The proposed model was evaluated on field data collected in Zhangzhou, China, and demonstrated superior performance when compared with existing state-of-the-art baselines.

Keywords

data and data science, artificial intelligence, data analytics, deep learning, neural networks, big data

Traffic state prediction is the process of mining traffic patterns and predicting traffic trends. As an important component of the intelligent transportation system, accurate traffic state prediction can help ease congestion, provide decision-making support for traffic managers, and help drivers plan their trips.

Predicting the future traffic state is challenging owing to the complex spatiotemporal correlations in traffic state data. Much research has adopted deep learning methods for this task. Recent works often apply graph convolutional network (GCN) models to capture heterogeneous spatial correlations and use sequence models to capture non-linear temporal dependence ( 1 – 3 ). Although significant progress has been made in improving predictive accuracy, the following issues have not been fully resolved.

First, most existing methods consider spatial distance correlation only and ignore trajectories when modeling spatial features. Trajectories can instructively provide knowledge about the diffusion of traffic flow and behavioral driving choices. An example is illustrated in Figure 1. Detectors C and E are more relevant to Detector A compared with Detector D because of the large volume of traffic passing through them (the arrows indicate the direction of traffic flow, and thicker lines indicate greater flow). Moreover, previous methods generally predicted only one type of traffic state parameter, ignoring the interaction between different traffic state parameters.

Figure 1.

An example of different spatial correlations among loop detectors.

To address the above shortfalls, this study proposes a trajectory-based multi-task multi-graph convolutional network (Tr-MTMGN) to predict traffic speed and volume simultaneously for arterial networks over both short-term and long-term horizons. The main contributions of this paper are summarized as follows:

Using the skip-gram model, a trajectory transition relevance matrix is constructed by investigating the transition patterns of the trajectory and traffic flow diffusion among loop detectors.

A multi-graph convolution block is designed to fuse various relevance graphs and model the spatial correlations of the traffic state.

A multi-head self-attention layer is integrated into the multi-task learning framework to capture the temporal dependencies of the traffic state.

Accordingly, extensive experiments were conducted to evaluate the performance of the proposed model on a field-collected dataset in Zhangzhou, China, and a significant improvement was found compared with other state-of-the-art baselines.

Literature Review

Traffic state prediction has been developed for many years domestically and abroad and can be divided into model-driven and data-driven methods.

The model-driven approach begins with the relationship between traffic volume, speed, and density, and includes the queuing-theory model, the cell transmission model, and the micro basic graph model ( 4 – 6 ). However, these traffic state prediction models must be established based on detailed prior knowledge. Otherwise, it is difficult to describe the complex traffic state data features.

The data-driven method predicts and evaluates the traffic state based on the statistical regularity of data, and is divided into parametric and non-parametric models. Parametric models include linear regression models and Kalman filtering models ( 7 – 10 ). These models have a simple algorithm and convenient calculation because they presuppose a regression function the parameters of which are determined using the original data. However, owing to their static dependence on a singular assumption, it is difficult for these models to fit the dynamic non-linear trend of the traffic state. Non-parametric models overcome this limitation effectively and learn the statistical regularity by adaptively fitting a function using sufficient historical data. Examples of non-parametric models include K-nearest neighbor models, support vector regression models, fuzzy logic models, and Bayesian network models ( 11 – 16 ). Unlike the traditional parametric model, these models do not rely on a static hypothesis and can thus capture the dynamic non-linear characteristics of traffic data. However, these methods are only applicable for predicting the traffic state on a single road or small-scale experimental area, and it is therefore difficult for them to perform effectively around a complex urban road network.

In recent years, deep learning models have attracted considerable attention because of their good predictive performance and strong data fitting ability. Research on traffic state prediction methods based on deep learning has mainly focused on the design of deep learning models for the extraction of spatiotemporal characteristics from traffic data. Early traffic prediction methods based on deep learning mostly combined convolutional neural networks (CNNs) and recurrent neural networks (RNNs) to model the spatial and temporal correlations of the traffic state ( 17 – 19 ). These studies usually divide the experimental area into grids, which cannot reflect the connectivity of road sections and the topology of the road network.

Therefore, some researchers proposed graph neural networks (GNNs) as alternatives to CNNs to extract the features of non-Euclidean data. Yu et al. proposed a spatiotemporal graph convolution network (STGCN) to extract spatiotemporal features from traffic data by stacking a spatiotemporal convolution block (ST-Conv block) composed of a GCN and a gated CNN ( 20 ). Guo et al. proposed an attention-based spatial-temporal graph convolutional network (ASTGCN) that integrates the attention mechanism ( 21 ). Geng et al. predicted the demand for ride-hailing in urban areas and proposed a spatiotemporal multi-graph convolutional network (ST-MGCN) ( 22 ). Lv et al. proposed a deep learning framework for traffic flow prediction, known as temporal multi-graph convolutional network (T-MGCN), by identifying several semantic correlations between roads and coding them into multiple graphs ( 23 ). Chen et al. proposed multi-range attentive bicomponent GCN (MRA-BGCN), which constructs a node graph and an edge graph based on the distance between nodes and the interaction patterns of different edges, respectively ( 24 ). Lu et al. proposed a spatiotemporal adaptive gated graph convolution neural network (STAGCN) to selectively extract spatial correlations in multilayer GCNs and correct the information bias caused by predefined spatial correlation ( 2 ). Zhang et al. proposed a deep-learning model: dynamic spatial-temporal convolutional network (DSTCN), which can generate dynamic graphs based on data to represent dynamic spatial relationships between road segments ( 25 ). Li et al. designed the fusion graph to extract various spatial correlations between different regions by fusing spatial graph, semantic graph, and spatial-semantic graph ( 26 ). Li et al. designed the dynamic adjacency matrix generated from a hyper-network to describe the dynamic characteristics of road networks more effectively ( 27 ).

The above methods attempt to accurately describe the spatial dependence of the traffic state. However, only a few studies have considered the trajectory information of vehicles. Trajectories imply the patterns of traffic flow transition and diffusion, which are of great reference value for the prediction of traffic state.

Methods

This section elaborates on the proposed Tr-MTMGN model. The architecture of Tr-MTMGN comprises four modules: an input layer, a multi-graph convolution layer, a multi-task multi-head self-attention layer, and an output layer, as shown in Figure 2. The input layer aggregates the traffic speed and volume data of each loop detector. The multi-graph convolution layer captures multiple spatial dependencies based on historical trajectories. The multi-task multi-head self-attention layer, based on a soft-parameter sharing strategy, extracts temporal patterns of speed and volume. The output layer produces multi-step predictions of speed and volume. These modules are described in detail below.

Figure 2.

The architecture of trajectory-based multi-task multi-graph convolutional network (Tr-MTMGN).

Preliminaries

Definition 1 (loop detector network graph): In this paper, we define the loop detector network as a graph $G = (V, E, A)$ . Many roads carry traffic in both directions, and the traffic states of the two opposite directions often differ; therefore, $V = \{v_{i}\}_{i = 1, 2, \dots, N}$ is a set of nodes in which each node $v_{i}$ represents one direction of a loop detector. $E = \{e_{ij}\}_{i, j = 1, 2, \dots, N}$ is a set of edges, and each edge $e_{ij}$ denotes the correlation strength between $v_{i}$ and $v_{j}$ . $A \in R^{N \times N}$ is the weighted adjacency matrix of the graph, where a larger $A_{v_{i}, v_{j}}$ means that the two nodes are more strongly correlated. We construct three types of graph: the weighted topological graph $G_{top} = (V, E_{top}, A_{top})$ , trajectory transition graph $G_{tra} = (V, E_{tra}, A_{tra})$ , and pattern similarity graph $G_{sim} = (V, E_{sim}, A_{sim})$ . The details of the multi-graph construction are given in the section on the construction of multiple loop detector graphs.

Definition 2 (traffic state): The traffic state is measured by the average speed recorded by a loop detector within one timestep of 5 min (hereinafter referred to as the speed) and the number of vehicles passing through the loop detector within that timestep (hereinafter referred to as the volume). The speed and volume are defined as $X_{t}^{s} \in R^{N}$ and $X_{t}^{v} \in R^{N}$ , respectively, at timestep $t$ , where $N$ is the number of nodes.

Problem (traffic state prediction): The traffic state prediction problem can be regarded as finding a function $f(\cdot)$ that, at time $t$ , maps the traffic states of the past $P$ time steps to those of the next $Q$ time steps, given the road graphs $G_{top}, G_{tra}, G_{sim}$ . The process of urban traffic state prediction is formulated as Equation 1.

[X_{t - P + 1}^{s}, \dots, X_{t}^{s}; X_{t - P + 1}^{v}, \dots, X_{t}^{v}; G_{top}, G_{tra}, G_{sim}] \xrightarrow{f(\cdot)} [X_{t + 1}^{s}, \dots, X_{t + Q}^{s}; X_{t + 1}^{v}, \dots, X_{t + Q}^{v}]

(1)
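As an illustration of Equation 1, the following minimal sketch shows how training samples could be cut from the speed and volume series with a sliding window. The function name `make_windows` and the toy values are hypothetical and not taken from the paper; only the window lengths $P$ and $Q$ follow the text.

```python
def make_windows(speed, volume, P=12, Q=12):
    """Build (history, future) samples in the sense of Equation 1.

    speed, volume: lists of length T; each item holds the N node readings
    of one 5-min time step. Returns a list of
    ((speed_hist, volume_hist), (speed_future, volume_future)).
    """
    if len(speed) != len(volume):
        raise ValueError("speed and volume must cover the same time steps")
    samples = []
    # t is the index of the last observed step: history is t-P+1..t,
    # and the target is t+1..t+Q.
    for t in range(P - 1, len(speed) - Q):
        history = (speed[t - P + 1 : t + 1], volume[t - P + 1 : t + 1])
        future = (speed[t + 1 : t + Q + 1], volume[t + 1 : t + Q + 1])
        samples.append((history, future))
    return samples


# Toy data: 30 time steps and 2 nodes (values are made up).
toy_speed = [[40.0 + t, 35.0 + t] for t in range(30)]
toy_volume = [[10 + t, 12 + t] for t in range(30)]
windows = make_windows(toy_speed, toy_volume, P=12, Q=12)
```

A series of $T$ time steps yields $T - P - Q + 1$ samples under this scheme; the graphs $G_{top}$, $G_{tra}$, and $G_{sim}$ are shared by all samples and are therefore not part of the window.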

Construction of Multiple Loop Detector Graphs

We modeled three types of correlation among loop detectors graphically as follows: 1) weighted topological graph $G_{top} = (V, E_{top}, A_{top})$ , which encodes the vehicles’ average driving distance between nodes; 2) trajectory transition graph $G_{tra} = (V, E_{tra}, A_{tra})$ , which encodes the vehicles’ transition pattern between nodes; and 3) pattern similarity graph $G_{sim} = (V, E_{sim}, A_{sim})$ , which encodes the similarity of traffic state between nodes. The process of constructing multiple graphs is as follows:

1. Trajectory transition graph $G_{tra}$

In an urban road network, the traffic state of a road segment has a strong correlation with the upstream and downstream road segments. For example, the congestion of a road section may be caused by its upstream section, where numerous vehicles moving downstream could appear. Trajectories provide information about vehicles’ origin and destination (O-D), as well as drivers’ choices of routes, which implies the dependence on upstream and downstream flows. Thus, we plan to study trajectory information to infer the laws of traffic flow transition and diffusion.

Node2vec is a widely used graph embedding method. It is based on a random walk strategy and obtains a series of node sequences by adjusting the parameters that balance breadth-first and depth-first search ( 28 ). The skip-gram or continuous bag-of-words (CBOW) model is then employed to obtain the embedding vector of each node. Some traffic state prediction studies adopt node2vec for position embedding ( 29 ). However, sequences based on random walks cannot reflect real traffic situations. Therefore, inspired by node2vec, we used the skip-gram model to capture the patterns of trajectory transition and traffic flow diffusion based on real trajectories instead of random walks.

First, we used the hidden Markov model (HMM) map-matching method to match the GPS trajectories to road sections and generate the vehicle paths ( 30 ). Next, the paths were converted into sequences of loop detectors according to the loop-link correspondence table. Analogous to a text corpus in natural language processing, a sequence of loop detectors can be regarded as a piece of text and each loop detector as a word in that text. We then fed these loop-detector sequences into the skip-gram model and trained it ( 31 ). The sliding window size, a hyperparameter, controls the order of the neighbors considered. We used a negative sampling strategy to reduce the computational complexity ( 32 ). Finally, for each loop detector, we obtained the corresponding embedding vector that encodes the trajectory transition information from the trained hidden layer parameters. Figure 3 demonstrates the process of generating the embedding vectors of the trajectory transition features.

Figure 3.

The process of capturing the patterns of trajectory transition.
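To make the corpus analogy concrete, the sketch below generates the (center, context) training pairs that a skip-gram model would see for a given sliding window size. It covers only pair generation; training the embeddings with negative sampling would be delegated to a word2vec implementation and is not shown. The function name `skipgram_pairs` and the detector labels are hypothetical.

```python
def skipgram_pairs(sequences, window=2):
    """Return (center, context) detector pairs within `window` positions."""
    pairs = []
    for seq in sequences:
        for i, center in enumerate(seq):
            lo = max(0, i - window)
            hi = min(len(seq), i + window + 1)
            for j in range(lo, hi):
                if j != i:  # a detector is not its own context
                    pairs.append((center, seq[j]))
    return pairs


# Hypothetical loop-detector sequences derived from two map-matched paths.
paths = [["A", "C", "E"], ["A", "D"]]
pairs = skipgram_pairs(paths, window=1)
```

With `window=1` only directly consecutive detectors are paired; a larger window also pairs detectors that are two or more hops apart along the same path.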

In this study, cosine similarity was adopted to construct the trajectory transition adjacency matrix. The greater the similarity of the embedding vectors of the two nodes, the greater the number of vehicles on the road between them, which indicates that the two nodes are upstream and downstream, and the traffic states of the two nodes have a strong correlation. The process of constructing the trajectory transition adjacency matrix can be formulated as Equation 2:

τ_{ij} = \frac{e_{i} \cdot e_{j}}{‖ e_{i} ‖ ‖ e_{j} ‖} = \frac{\sum_{m = 1}^{M} e_{i}^{m} e_{j}^{m}}{\sqrt{\sum_{m = 1}^{M} {(e_{i}^{m})}^{2}} \sqrt{\sum_{m = 1}^{M} {(e_{j}^{m})}^{2}}}

(2)

where

$e_{i}$ = the embedding vector of loop detector $i$ that encodes the trajectory transition information,

$e_{j}$ = the embedding vector of loop detector $j$ that encodes the trajectory transition information,

$e_{i}^{m}$ = the $m$ th element of the embedding vector $e_{i}$ ( $m = 1, \dots, M$ , where $M$ is the embedding dimension), and

$τ_{ij}$ = an element of the trajectory transition adjacency matrix.
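Equation 2 can be sketched in a few lines of plain Python. The function names and the toy embeddings are hypothetical; treating a zero vector as uncorrelated with everything is an assumption added here to avoid division by zero, not something stated in the paper.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two embedding vectors (Equation 2)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # assumption: a zero vector carries no transition information
    return dot / (norm_u * norm_v)


def transition_adjacency(embeddings):
    """Build the N x N matrix whose (i, j) entry is tau_ij."""
    n = len(embeddings)
    return [
        [cosine_similarity(embeddings[i], embeddings[j]) for j in range(n)]
        for i in range(n)
    ]


# Toy 2-dimensional embeddings for three loop detectors (made-up values).
toy_embeddings = [[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]]
A_tra = transition_adjacency(toy_embeddings)
```

The resulting matrix is symmetric with ones on the diagonal, so any directionality in the trajectory data has to be carried by the embeddings themselves.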

2. Weighted topological graph $G_{top}$

We assume that the closer two loop detectors are to each other, the stronger the correlation between them. The weighted topological structure of a road network can be represented as a directed graph $G_{top} = (V, E_{top}, A_{top})$ . Previous studies generally used the Euclidean distance to calculate the correlation between nodes but ignored the topology of the road network. This study instead uses the average length $l_{ij}$ of all vehicle paths between nodes $v_{i}$ and $v_{j}$ to calculate the degree of correlation. The weight $ξ_{ij}$ decays exponentially with the square of $l_{ij}$ , scaled by the parameter $σ$ , and is set to zero beyond 1 km, as defined by Equation 3.

ξ_{ij} = \begin{cases} \exp \left( - \frac{l_{ij}^{2}}{σ^{2}} \right), & l_{ij} \leq 1 \text{ km} \\ 0, & \text{otherwise} \end{cases}

(3)
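A minimal sketch of Equation 3 follows. The paper does not report the value of $σ$, so the 0.5 km used in the example is purely an illustrative assumption, and the function name `topological_weight` is hypothetical.

```python
import math


def topological_weight(l_ij_km, sigma_km, cutoff_km=1.0):
    """Thresholded Gaussian-kernel weight of Equation 3.

    l_ij_km: average path length between two nodes, in km.
    sigma_km: scale parameter (its value is not given in the paper).
    """
    if l_ij_km <= cutoff_km:
        return math.exp(-(l_ij_km ** 2) / (sigma_km ** 2))
    return 0.0


# Illustrative values only: sigma = 0.5 km is an assumption.
w_near = topological_weight(0.5, sigma_km=0.5)
w_far = topological_weight(1.5, sigma_km=0.5)
```

The hard cutoff keeps $A_{top}$ sparse, since every node pair whose average path length exceeds 1 km contributes a zero entry.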

3. Pattern similarity graph $G_{sim}$

The regularity of the traffic flow diffusion and the transition between loop detectors vary dynamically according to the real-time traffic state. This study extracts the real-time non-linear change pattern of the traffic state to study the dynamic change regularity of the correlation between loop detectors. We designed a different approach to model the nonlinear patterns of traffic and account for dynamic correlations among loop detectors, which consists of two steps.

First, the traffic state tensor $X \in R^{N \times (2 \times P)}$ that is fed into the model in real time is embedded into a high-dimensional vector for each node through two fully connected layers. This process is shown in Equation 4:

e_{i}^{sim} = F C_{2} (F C_{1} (x_{i}))

(4)

where

$x_{i}$ = the real-time traffic state feature tensor of node $i$ , and

$F C_{1}$ and $F C_{2}$ = two different fully connected layers.

Second, the obtained high-dimensional vectors are used to calculate the pairwise dot-product similarity between each node. We used Softmax to normalize the similarity and obtain the elements of the pattern similarity adjacency matrix $A_{sim} \in R^{N \times N}$ . The calculation process can be formulated as Equation 5.

μ_{ij} = Softmax \left( {(e_{i}^{sim})}^{T} e_{j}^{sim} \right), \quad i, j \in \{1, \dots, N\}

(5)

where

$μ_{ij}$ = an element of the adjacency matrix $A_{sim}$ of the pattern similarity graph.
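The second step (Equation 5) can be sketched as follows, taking the node embeddings $e_{i}^{sim}$ as given. The paper does not state along which axis Softmax is applied; normalizing each row, so that the weights of every node sum to one, is an assumption made here. The function names and toy embeddings are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax of one list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def pattern_similarity(embeddings):
    """Pairwise dot products followed by a row-wise softmax (assumed axis)."""
    scores = [
        [sum(a * b for a, b in zip(e_i, e_j)) for e_j in embeddings]
        for e_i in embeddings
    ]
    return [softmax(row) for row in scores]


# Toy embeddings for three nodes (made-up values standing in for FC outputs).
A_sim = pattern_similarity([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
```

In the model itself these embeddings come from the two fully connected layers of Equation 4, so $A_{sim}$ changes with every input window.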

It is noteworthy that the parameters of two fully connected layers are co-trained with Tr-MTMGN. The pattern similarity graph is a dynamic graph that adaptively changes with the input data. In this study, it will be used to adjust the weight of the adjacency matrix of trajectory transition graph.

Trajectory-Based Multi-Task Multi-Graph Convolutional Network (Tr-MTMGN)

Module a: Input Layer

The goal of the input layer is to aggregate the speed and volume data of each loop detector. We aggregate the speed tensor $X^{s} \in R^{N \times P}$ and volume tensor $X^{v} \in R^{N \times P}$ of the historical $P$ time steps into the traffic state feature tensor $X \in R^{N \times (2 \times P)}$ through concatenation. Then, $X$ is transformed to $H^{0} \in R^{N \times d_{model}}$ using a fully connected layer.

Module b: Multi-Graph Convolutional Layer

GCN is an efficient operation to extract node features through the graph’s structural information. Kipf and Welling proposed a GCN that simplifies ChebNet by a first-order approximation (Equation 6) as follows ( 33 , 34 ):

Y = \tilde{A} XW

(6)

where

$\tilde{A}$ = the normalized adjacency matrix with self-loop,

$X \in R^{N \times D_{in}}$ = the input graph signals for $N$ nodes with $D_{in}$ channels,

$Y \in R^{N \times D_{out}}$ = the output, and

$W \in R^{D_{in} \times D_{out}}$ = the learnable parameter matrix.
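The following sketch instantiates Equation 6 with plain Python lists. It uses the symmetric normalization $\tilde{A} = D^{-1/2} (A + I) D^{-1/2}$ of Kipf and Welling and assumes non-negative edge weights so that all degrees are positive. The helper names and the toy graph are hypothetical.

```python
import math


def matmul(A, B):
    """Plain-Python matrix product of two lists of lists."""
    return [
        [sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
        for i in range(len(A))
    ]


def normalize_adjacency(A):
    """Return D^{-1/2} (A + I) D^{-1/2}, the normalized adjacency with self-loops."""
    n = len(A)
    A_hat = [[A[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in A_hat]  # positive because of the self-loops
    return [
        [A_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
        for i in range(n)
    ]


def gcn_layer(A, X, W):
    """One graph convolution without activation: Y = A_tilde X W."""
    return matmul(matmul(normalize_adjacency(A), X), W)


# Two connected nodes with one input and one output channel (toy values).
Y = gcn_layer([[0.0, 1.0], [1.0, 0.0]], [[1.0], [3.0]], [[2.0]])
```

In this toy graph every entry of $\tilde{A}$ equals 0.5, so each node receives the average of the two input signals before the weight matrix is applied.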

To fuse the spatial correlations of the weighted topological graph $G_{top}$ , trajectory transition graph $G_{tra}$ , and pattern similarity graph $G_{sim}$ , a multi-graph convolution block based on GCN is designed in this study.

Because the adjacency matrix of the trajectory transition graph is fixed, it cannot by itself describe how the diffusion and transition of the traffic state change over time. To adjust the adjacency matrix of the trajectory transition graph dynamically, we fuse it with the pattern similarity graph, which is updated in real time, to generate the compound adjacency matrix $A_{t \& s}$ , which can be formulated as Equation 7.

A_{t \& s} = A_{tra} ⊙ A_{sim}

(7)

where

$⊙$ = the Hadamard (element-wise) product.

The adjacency matrix of the weighted topological graph and the compound adjacency matrix are then fed into two separate GCNs. The outputs of the two GCNs are aggregated as the final result of the multi-graph convolution block. Candidate aggregation functions include sum, average, max-pooling, and concatenation. We select concatenation because it can prevent information from being lost ( 2 ). The structure of the $l$ th multi-graph convolution block is shown in Figure 4. The calculation formula is expressed as Equation 8:

H^{l} = concat (H_{top}^{l}, H_{t \& s}^{l}) = concat (ReLU (A_{top} H^{l - 1} W_{1}), ReLU (A_{t \& s} H^{l - 1} W_{2}))

(8)

where

$H^{l} \in R^{N \times (2 d_{model})}$ = the output of the $l$ th multi-graph convolution block,

$H^{l - 1} \in R^{N \times d_{model}}$ = the output of the $(l - 1)$ th multi-graph convolution block,

$W_{1}$ and $W_{2}$ = the graph convolution parameter matrix, and

$concat$ = the concatenation operation.

Figure 4.

The structure of multi-graph convolution block.

After fusion, we obtain $H^{l} \in R^{N \times (2 d_{model})}$ , which is the output of the multi-graph convolution block. The receptive field can be expanded by stacking convolution blocks, where the output of the last multi-graph convolution block is defined as $H_{spatial} \in R^{N \times (2 d_{model})}$ .
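Equations 7 and 8 can be traced on a toy example with the sketch below, which follows Equation 8 as written (the adjacency matrices are used directly, without further normalization). All helper names and numerical values are hypothetical and chosen only so that the arithmetic is easy to follow.

```python
def matmul(A, B):
    """Plain-Python matrix product of two lists of lists."""
    return [
        [sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
        for i in range(len(A))
    ]


def hadamard(A, B):
    """Element-wise product of two equally shaped matrices (Equation 7)."""
    return [[a * b for a, b in zip(row_a, row_b)] for row_a, row_b in zip(A, B)]


def relu(M):
    return [[max(0.0, x) for x in row] for row in M]


def multi_graph_block(A_top, A_tra, A_sim, H_prev, W1, W2):
    """One multi-graph convolution block (Equation 8)."""
    A_ts = hadamard(A_tra, A_sim)
    H_top = relu(matmul(matmul(A_top, H_prev), W1))
    H_ts = relu(matmul(matmul(A_ts, H_prev), W2))
    # Concatenate along the feature axis: N x d and N x d give N x 2d.
    return [row_top + row_ts for row_top, row_ts in zip(H_top, H_ts)]


# Toy example with N = 2 nodes and d_model = 1 (made-up values).
H1 = multi_graph_block(
    A_top=[[1.0, 0.0], [0.0, 1.0]],
    A_tra=[[1.0, 0.5], [0.5, 1.0]],
    A_sim=[[0.5, 0.5], [0.5, 0.5]],
    H_prev=[[1.0], [2.0]],
    W1=[[-1.0]],
    W2=[[2.0]],
)
```

The first output column comes from the topological branch, where ReLU clips the negative activations to zero, and the second from the compound trajectory-similarity branch; concatenation keeps both visible to later layers.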

Module c: Multi-Task Multi-Head Self-Attention Layer

Given the different variation patterns of speed and volume in the temporal dimension, we introduce a multi-task learning framework based on the soft-parameter sharing approach to capture the temporal dependencies of speed and volume. Although long short-term memory (LSTM) and gated recurrent unit (GRU) networks have been widely used in time-series prediction tasks, RNN-based models still have several limitations, such as time-consuming training and unstable gradients. Meanwhile, the attention mechanism has achieved outstanding results in time-series prediction. Therefore, we utilize a multi-head self-attention mechanism to extract the features in the temporal dimension, as shown in Figure 5 ( 35 ).

Figure 5.

The structure of multi-head self-attention mechanism.

First, we feed the output of the multi-graph convolution layer, $H_{spatial}$ , into a fully connected layer to change the shape into $R^{P \times N}$ , making the temporal dimension equal to $P$ . For brevity, the reshaped tensor is denoted as $X \in R^{P \times N}$ . We treat $X$ as a sequence: $X = (x_{1}, x_{2}, \dots, x_{r}, \dots, x_{p})$ .

Second, for head $i$ , we multiply $X$ by $W_{i}^{Q}$ , $W_{i}^{K}$ , and $W_{i}^{V}$ to map $X$ into three subspaces: query subspace $Q_{i} = (q_{1, i}, q_{2, i}, \dots, q_{r, i}, \dots, q_{p, i}) \in R^{P \times N}$ , key subspace $K_{i} = (k_{1, i}, k_{2, i}, \dots, k_{r, i}, \dots, k_{p, i}) \in R^{P \times N}$ , and value subspace $V_{i} = (v_{1, i}, v_{2, i}, \dots, v_{r, i}, \dots, v_{p, i}) \in R^{P \times N}$ . The map is computed using Equation 9:

Q_{i} = X W_{i}^{Q}, \quad K_{i} = X W_{i}^{K}, \quad V_{i} = X W_{i}^{V}

(9)

Third, we use scaled dot-product attention to calculate the attention score ${\hat{ω}}_{i}$ in head $i$ via Equation 10:

{\hat{ω}}_{i} = softmax (ω_{i}) = softmax \left( \frac{Q_{i} K_{i}^{T}}{\sqrt{d}} \right) .

(10)

Fourth, we derive the output of head $i$ , $B_{i} = (b_{1}, b_{2}, \dots, b_{r}, \dots, b_{p}) \in R^{P \times N}$ , using Equation 11:

B_{i} = V_{i} {\hat{ω}}_{i}^{T} .

(11)

Finally, we concatenate all outputs of each head and project again to obtain the final values, as formulated in Equation 12:

H_{temporal} = Concat (B_{1}, \dots, B_{h}) W^{m}

(12)
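The sketch below walks through Equations 9 to 11 for a single head. It uses the standard scaled dot-product form, in which the row-normalized weights multiply the values (weights times $V_{i}$), and it takes $d$ to be the key dimension; both choices are assumptions, since Equation 11 is written in a transposed layout and $d$ is not defined in the text. The concatenation and projection of Equation 12 are omitted, and all names and values are hypothetical.

```python
import math


def matmul(A, B):
    """Plain-Python matrix product of two lists of lists."""
    return [
        [sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
        for i in range(len(A))
    ]


def transpose(M):
    return [list(col) for col in zip(*M)]


def softmax_rows(M):
    out = []
    for row in M:
        m = max(row)
        exps = [math.exp(x - m) for x in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def attention_head(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over the P time steps."""
    Q, K, V = matmul(X, Wq), matmul(X, Wk), matmul(X, Wv)
    d = len(K[0])  # assumption: d is the key dimension
    scores = [[s / math.sqrt(d) for s in row] for row in matmul(Q, transpose(K))]
    weights = softmax_rows(scores)  # P x P, each row sums to one
    return matmul(weights, V), weights


# Toy sequence with P = 3 time steps and N = 2 nodes (made-up values).
X_seq = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
B, attn = attention_head(X_seq, identity, identity, identity)
```

Each row of `attn` states how strongly one time step attends to every other time step, which is what allows the layer to weight past observations differently.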

Module d: Output Layer

Once the high-dimensional spatiotemporal features of speed and volume have been obtained, we make multi-step predictions using two separate fully connected networks, one per task, each with two hidden layers. We set the number of output channels of the last fully connected layer to the prediction length $Q$ to predict the speed or volume of the next $Q$ time steps.

Loss Function of Multi-Task Learning Mode

We aim to minimize the prediction error with regard to the speed and volume. In the multi-task framework (multi-head self-attention layer), we penalize the squared L2 norm of the difference between the parameters of the two task-specific networks to encourage parameter similarity. The loss function is defined by Equation 13:

Loss = \frac{1}{2} (mse_{s} + mse_{v}) + α ‖ θ_{s} - θ_{v} ‖_{F}^{2},

(13)

where

$mse_{s}$ = the mean squared error loss of the speed prediction task,

$mse_{v}$ = the mean squared error loss of the volume prediction task,

$‖ \cdot ‖_{F}^{2}$ = the squared Frobenius (L2) norm,

$α$ = the variable that adjusts the similarity of the two sets of parameters (here we set $α$ as 0.01),

$θ_{s}$ = the parameter of the multi-task multi-head self-attention layer for speed, and

$θ_{v}$ = the parameter of the multi-task multi-head self-attention layer for volume.
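A minimal numerical sketch of Equation 13 is given below, with the task parameters flattened into plain lists. The function names and toy numbers are hypothetical; only $α = 0.01$ is taken from the text.

```python
def mse(pred, target):
    """Mean squared error of two equally long lists."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def multitask_loss(pred_s, true_s, pred_v, true_v, theta_s, theta_v, alpha=0.01):
    """Average task loss plus the soft parameter-sharing penalty (Equation 13)."""
    # Squared L2 distance between the flattened parameters of the two branches.
    penalty = sum((a - b) ** 2 for a, b in zip(theta_s, theta_v))
    return 0.5 * (mse(pred_s, true_s) + mse(pred_v, true_v)) + alpha * penalty


# Toy numbers: mse_s = 2, mse_v = 4, squared parameter distance = 1.
loss = multitask_loss(
    pred_s=[1.0, 2.0], true_s=[1.0, 4.0],
    pred_v=[3.0], true_v=[1.0],
    theta_s=[1.0, 0.0], theta_v=[0.0, 0.0],
)
```

With these numbers the loss is $0.5 \times (2 + 4) + 0.01 \times 1 = 3.01$, which shows that at $α = 0.01$ the penalty acts as a mild regularizer relative to the prediction errors.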

Datasets

In this section, we describe the comprehensive urban traffic dataset of Zhangzhou, China, on which the performance of the proposed Tr-MTMGN is evaluated. The dataset includes traffic state data from loop detectors and trajectory data from floating vehicles. The road map and the distribution of loop detectors are presented in Figure 6.

Figure 6.

The distribution of selected loop detectors on Zhangzhou dataset.

Traffic state data contain the average speed and volume per minute, which are collected by loop detectors installed in the middle of roads. We selected 67 loop detectors with a missing-data rate below 20% and used linear interpolation to fill the missing values. The number of nodes in this study was 134 because each detector records data for both travel directions. Traffic state data were collected from May 1 to May 31, 2017. We aggregated the data every 5 min; therefore, each node contained 288 data points per day.
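The preprocessing described above can be sketched as follows. The paper states only that linear interpolation and 5-min aggregation were used; copying the nearest value into leading or trailing gaps, and averaging speed while summing volume within each 5-min bin, are assumptions made for this illustration. The function names are hypothetical.

```python
def interpolate_missing(values):
    """Fill None entries by linear interpolation between known neighbours.

    Leading and trailing gaps copy the nearest known value (assumption).
    """
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("no observed values to interpolate from")
    filled = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = values[right]
        elif right is None:
            filled[i] = values[left]
        else:
            frac = (i - left) / (right - left)
            filled[i] = values[left] + frac * (values[right] - values[left])
    return filled


def aggregate(values, step=5, how="mean"):
    """Aggregate per-minute readings into bins of `step` minutes."""
    chunks = [values[i : i + step] for i in range(0, len(values) - step + 1, step)]
    if how == "sum":
        return [sum(c) for c in chunks]
    return [sum(c) / len(c) for c in chunks]


# One day of per-minute readings gives 1440 / 5 = 288 five-minute points.
one_day = aggregate([30.0] * 1440, step=5, how="mean")
```

For example, `interpolate_missing([10.0, None, 14.0])` fills the gap with 12.0, and `len(one_day)` matches the 288 data points per day mentioned in the text.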

GPS data from floating vehicles (i.e., taxis and buses) in the urban area were collected from April 21–30, with approximately 15 s between two consecutive GPS samples. We cleansed the vehicle trajectories as follows: 1) duplicate records were eliminated; 2) a trajectory was split wherever the GPS reading was off for over 10 min (e.g., if the GPS device was turned off or was malfunctioning); 3) a trajectory was split wherever the vehicle stayed on the same road segment for over 10 min (e.g., if the taxi had arrived at the end of a trip); 4) a trajectory was split wherever no path existed between two consecutive GPS points (e.g., if a GPS point was outside the digital road network we had drawn); and 5) GPS points with extreme speed were removed (i.e., if the speed between two consecutive GPS points exceeded 80 km/h).
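Three of the five cleansing rules (duplicates, long gaps, and extreme speeds) can be sketched without a road network, as shown below. Rules 3 and 4 need map-matched segments and are omitted. Representing each GPS point as a tuple of time in seconds and planar coordinates in km is a simplifying assumption, and the function name and sample points are hypothetical.

```python
import math

MAX_GAP_S = 10 * 60      # rule 2: split after more than 10 min without a reading
MAX_SPEED_KMH = 80.0     # rule 5: drop points implying more than 80 km/h


def clean_trajectory(points):
    """Split and filter one vehicle's time-sorted GPS points.

    points: list of (t_seconds, x_km, y_km). Returns a list of sub-trajectories.
    """
    segments, current = [], []
    for p in points:
        if current and p == current[-1]:
            continue  # rule 1: drop duplicate records
        if current:
            t0, x0, y0 = current[-1]
            dt = p[0] - t0
            if dt > MAX_GAP_S:
                segments.append(current)  # rule 2: start a new sub-trajectory
                current = [p]
                continue
            if dt > 0:
                speed_kmh = math.hypot(p[1] - x0, p[2] - y0) / (dt / 3600.0)
                if speed_kmh > MAX_SPEED_KMH:
                    continue  # rule 5: discard the implausible point
        current.append(p)
    if current:
        segments.append(current)
    return segments


# Made-up points: one duplicate, one 216 km/h jump, and one gap of about 16 min.
raw = [(0, 0.0, 0.0), (0, 0.0, 0.0), (15, 0.1, 0.0),
       (30, 1.0, 0.0), (45, 0.3, 0.0), (1000, 0.4, 0.0)]
cleaned = clean_trajectory(raw)
```

Speeds are measured against the last retained point, so a single outlier does not also cause its successor to be discarded.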

Experimental Results and Comparison

Experiment Setting

Z-score normalization was applied for data preprocessing. Of the traffic state data, 70% were used for training, 20% for testing, and the remaining 10% for validation ( 29 ). We implemented our model using the PyTorch deep learning framework. During the training phase, the batch size was set to 32, the number of epochs to 100, and the learning rate to 0.001, and the Adam optimizer was used to train the model. The state data from the previous hour ( $P = 12$ ) were used to predict the state for the next hour ( $Q = 12$ ). The sliding window size of the skip-gram model was 2, and $d_{model}$ was 32. The number of multi-graph convolution blocks was set to 2, and the number of heads in the multi-task multi-head self-attention layer was 2. The mean absolute error (MAE) and root mean square error (RMSE) were adopted for the evaluation:

MAE:

MAE (y, \hat{y}) = \frac{1}{N} \sum_{i = 1}^{N} | {\hat{y}}_{i} - y_{i} | .

(14)

RMSE:

RMSE (y, \hat{y}) = \sqrt{\frac{1}{N} \sum_{i = 1}^{N} {({\hat{y}}_{i} - y_{i})}^{2}} .

(15)
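Equations 14 and 15 translate directly into code. The sketch below uses plain lists, with hypothetical function names and made-up values.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error (Equation 14)."""
    return sum(abs(p - t) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error (Equation 15)."""
    return math.sqrt(sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Made-up observations and predictions.
y = [1.0, 2.0, 3.0]
y_hat = [2.0, 2.0, 5.0]
mae_value = mae(y, y_hat)
rmse_value = rmse(y, y_hat)
```

Because RMSE squares the errors before averaging, the single error of 2 dominates it more than it dominates MAE, which is why the two metrics can rank models differently.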

Comparison with Baseline

Several studies have shown that deep learning approaches outperform traditional time series methods and machine learning models. Thus, we compared Tr-MTMGN with the following deep learning models (the configuration of the baseline models is shown in Table 1):

FNN: feed-forward neural network with two hidden layers

LSTM: long short-term memory neural network with one layer

CNN: convolutional neural network with two layers

GCN: graph convolutional neural network with two layers

CNN-LSTM: a combined model composed of CNN and LSTM

GCN-LSTM: a combined model composed of GCN and LSTM

STAGCN: spatiotemporal adaptive gated graph convolution network ( 2 )

We independently repeated the experiments three times and reported the average results. Table 2 shows the comparison of different approaches for 10, 30, and 60 min speed and volume prediction. The results reveal the following:

1) The spatiotemporal combined models (CNN-LSTM, GCN-LSTM, STAGCN, and Tr-MTMGN) are better than the single models (LSTM, CNN, and GCN) because the combined models can capture features of the traffic state from both the spatial and temporal dimensions.

2) Tr-MTMGN outperforms all baselines in both tasks, meaning the proposed multi-graph convolution layer based on trajectories and multi-task multi-head self-attention layer in the Tr-MTMGN model can accurately mine the complex spatiotemporal features of the traffic state.

3) The accuracy of speed prediction is higher than that of volume prediction because the trend of speed change is smoother on urban roads. Traffic volume is influenced by intersection signal timing, traffic congestion, and other factors, resulting in greater fluctuations in data, which makes it more difficult to predict.

Table 1.

The Configuration of the Deep Learning Baselines on Zhangzhou Datasets

Model	Configuration
FNN	class FNN(nn.Module): def __init__(self, linear1_in_dim=12, linear1_out_dim=64, linear2_in_dim=64, linear2_out_dim=1):
LSTM	class LSTM(nn.Module): def __init__(self, in_dim=134, hidden_size=64, num_layer=1):
CNN	class CNN(nn.Module): def __init__(self, Conv2d1_in_channels=1, Conv2d1_out_channels=64, Conv2d2_in_channels=64, Conv2d2_out_channels=64, kernel_size=3):
CNN-LSTM	class CNN-LSTM(nn.Module): def __init__(self, Conv2d1_in_channels=1, Conv2d1_out_channels=64, Conv2d2_in_channels=64, Conv2d2_out_channels=1, kernel_size=3, LSTM_in_dim=134, LSTM_hidden_size=64, _LSTM_num_layer=1):
GCN-LSTM	class GCN-LSTM(nn.Module): def __init__(self, node_num=134, graph_dim=64, GCN_layer=2, LSTM_in_dim=134, LSTM_hidden_size=64, _LSTM_num_layer=1):
STAGCN	class STAGCN(nn.Module): def __init__(self, node_num=134, seq_len=12, pred_len=12, graph_dim=16, tcn_dim=10, atten_head=4):

Note: CNN = convolutional neural network with two layers; CNN-LSTM = a combined model composed of CNN and long short-term memory neural network with one layer (LSTM); FNN = feed-forward neural network with two hidden layers; GCN = graph convolutional neural network with two layers; GCN-LSTM = a combined model composed of GCN and long short-term memory neural network with one layer (LSTM); LSTM = long short-term memory neural network with one layer; STAGCN = spatiotemporal adaptive gated graph convolution network.

Table 2.

Prediction Performance of Trajectory-Based Multi-Task Multi-Graph Convolutional Network (Tr-MTMGN) and Other Baselines

Model	MAE, speed (10/30/60 min)	MAE, volume (10/30/60 min)	RMSE, speed (10/30/60 min)	RMSE, volume (10/30/60 min)
FNN	3.912/3.936/4.121	6.523/6.736/7.563	7.857/8.326/8.833	10.803/11.369/11.826
LSTM	3.897/3.917/3.964	6.478/6.675/7.455	8.075/8.465/8.712	10.922/11.252/11.739
CNN	3.812/3.886/3.973	6.454/6.691/7.461	7.457/7.674/8.004	9.981/10.368/10.862
GCN	3.793/3.845/3.909	6.428/6.683/7.314	7.012/7.358/7.713	9.872/10.026/10.778
CNN-LSTM	3.762/3.831/3.884	6.373/6.721/7.166	6.982/7.203/7.655	9.375/9.946/10.684
GCN-LSTM	3.757/3.821/3.876	6.368/6.701/7.158	6.951/7.147/7.608	9.287/9.741/10.602
STAGCN	3.754/3.792/3.845	6.361/6.673/7.149	6.944/7.131/7.519	9.269/9.724/10.589
Tr-MTMGN	3.661/3.678/3.795	6.059/6.303/6.926	5.064/5.113/5.355	8.962/9.160/10.405

Effect of Components

To further investigate the effect of spatial and temporal dependence modeling, we evaluated the variants of Tr-MTMGN by removing or replacing the components from the model.

First, to investigate the effect of the multi-graph structure, we evaluated Tr-MTMGN with different adjacency matrix settings, described as follows:

Topology-GN: The graph convolution operation is performed using only the weighted topological graph.

Transition-GN: The graph convolution operation is performed using only the trajectory transition graph.

Similarity-GN: The graph convolution operation is performed using only the state similarity graph.

The MAE of the above variants is compared in Figure 7. The following phenomena can be observed:

1) Tr-MTMGN outperforms all variants in the two tasks. This indicates that the weighted topological graph, trajectory transition graph, and state similarity graph constructed in our study jointly capture the spatial correlations of the traffic state from multiple perspectives.

2) Topology-GN performs poorly in both tasks, indicating that the spatial correlation of the traffic state is not strongly related to the topological distance in the road network.

3) Transition-GN has a higher MAE than Similarity-GN on the speed prediction task, whereas the opposite holds for volume prediction. This indicates that the trajectory transition graph, derived from floating vehicle trajectories, encodes how traffic flow transfers through the urban road network, so it plays a more important role in the volume prediction task. The state similarity graph depicts the similarity of the changing trends of traffic states at different locations and is therefore more advantageous for capturing the correlation of traffic speed.
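The single-graph variants can be pictured as restricting a multi-graph convolution to one adjacency matrix. The NumPy sketch below is a minimal illustration under stated assumptions: it uses the symmetric normalization of a standard GCN layer, fuses the graphs by a plain sum, and runs on random toy matrices. The function names, the fusion rule, and the sizes are hypothetical and are not taken from the Tr-MTMGN implementation.

```python
import numpy as np


def normalize_adjacency(adj):
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2} of a standard GCN layer."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def multi_graph_conv(x, graphs, weights, use=None):
    """One graph-convolution layer summed over the selected graphs.

    x:       (num_nodes, in_dim) node features
    graphs:  dict name -> (num_nodes, num_nodes) adjacency matrix
    weights: dict name -> (in_dim, out_dim) weight matrix
    use:     graph names to include; None means all graphs
    """
    names = list(graphs) if use is None else list(use)
    out = sum(normalize_adjacency(graphs[n]) @ x @ weights[n] for n in names)
    return np.maximum(out, 0.0)  # ReLU activation


# Toy data: random symmetric adjacency matrices stand in for the three graphs.
rng = np.random.default_rng(0)
num_nodes, in_dim, out_dim = 5, 3, 4
graph_names = ("topology", "transition", "similarity")
graphs = {}
for name in graph_names:
    a = rng.random((num_nodes, num_nodes))
    graphs[name] = (a + a.T) / 2
weights = {n: rng.normal(size=(in_dim, out_dim)) for n in graph_names}
x = rng.normal(size=(num_nodes, in_dim))

full = multi_graph_conv(x, graphs, weights)                              # all three graphs
topology_only = multi_graph_conv(x, graphs, weights, use=["topology"])   # Topology-GN analogue
print(full.shape, topology_only.shape)
```

Passing a single name in `use` mirrors the ablation setting, while the default combines all three graphs. A directed transition graph would need a row-wise rather than symmetric normalization, which this sketch does not cover.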

We then compared Tr-MTMGN with the following variants to demonstrate the effect of the multi-head self-attention mechanism:

MGN-LSTM: The self-attention mechanism is replaced with LSTM.

MGN-GRU: The self-attention mechanism is replaced with GRU.

The results of these models are shown in Figure 8. We can see that:

1) Tr-MTMGN is still the best at speed and volume prediction tasks, which shows that it can effectively capture the temporal correlation, since the multi-head self-attention mechanism introduced in this study can adaptively assign different weights to the traffic state at different time steps in the input data.

2) MGN-GRU is only slightly better than MGN-LSTM at long-term speed prediction but considerably better at volume prediction. This may be because GRU has fewer parameters than LSTM and is therefore easier to train.
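As a rough illustration of how self-attention assigns a separate weight to every input time step, the sketch below implements scaled dot-product multi-head self-attention in NumPy. The projection matrices are random, the sequence length, model width, and head count are hypothetical values, and the output projection and any task-specific layers are omitted; it is not the Tr-MTMGN implementation.

```python
import numpy as np


def softmax(z, axis=-1):
    """Numerically stable softmax along `axis`."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_self_attention(x, w_q, w_k, w_v, num_heads):
    """x: (seq_len, d_model); w_q, w_k, w_v: (d_model, d_model)."""
    seq_len, d_model = x.shape
    d_head = d_model // num_heads

    def split(h):
        # (seq_len, d_model) -> (num_heads, seq_len, d_head)
        return h.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)

    q, k, v = split(x @ w_q), split(x @ w_k), split(x @ w_v)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # (heads, seq, seq)
    attn = softmax(scores, axis=-1)                       # weights over time steps
    out = attn @ v                                        # (heads, seq, d_head)
    # Concatenate the heads back into (seq_len, d_model).
    return out.transpose(1, 0, 2).reshape(seq_len, d_model), attn


rng = np.random.default_rng(0)
seq_len, d_model, num_heads = 12, 8, 4  # hypothetical sizes
x = rng.normal(size=(seq_len, d_model))
w_q, w_k, w_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))

out, attn = multi_head_self_attention(x, w_q, w_k, w_v, num_heads)
print(out.shape, attn.shape)
```

Each row of `attn[h]` sums to one and gives the weight that head `h` places on every input time step when encoding a given step. Unlike LSTM or GRU, all time steps are compared directly rather than through a recurrent state.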

Figure 7.

The mean absolute error (MAE) of speed prediction (left) and volume prediction (right) under different configurations of adjacency matrix.

Figure 8.

The mean absolute error (MAE) of speed prediction (left) and volume prediction (right) under different temporal feature capturing models.

Effect of Multi-Task Learning

To test whether multi-task learning can improve the performance of each task, we compared Tr-MTMGN with the following variants:

Tr-MGN: It is the single-task version of Tr-MTMGN. Tr-MGN consists of a multi-graph convolutional network and a multi-head self-attention mechanism, which is similar to the structure of Tr-MTMGN.

Tr-MTMGN(HS): It uses the hard parameter sharing mechanism in both the spatial and the temporal feature capturing layer, as shown in Figure 9a.

Tr-MTMGN(SS): It uses the soft parameter sharing mechanism in both the spatial and the temporal feature capturing layer, as shown in Figure 9b.

The results of the models above are displayed in Figure 10. Accordingly, the following conclusions can be drawn:

1) The single-task variant is the weakest at predicting both speed and volume because it cannot exploit the latent correlation between the different types of data.

2) Tr-MTMGN(SS) outperforms Tr-MTMGN(HS) because the former distinguishes the temporal and spatial features of speed and volume through a soft-parameter sharing strategy. However, the larger number of parameters makes the model harder to train.

3) Tr-MTMGN achieves the best results for both tasks because the speed and volume inputs are correlated, and learning them jointly improves the performance of the model. Moreover, Tr-MTMGN applies the soft-parameter sharing strategy only in the multi-head self-attention layer. It therefore improves on Tr-MTMGN(HS) by capturing the task-specific temporal features of speed and volume, while using fewer parameters than Tr-MTMGN(SS). These results support the effectiveness of the multi-task learning framework proposed in this study.
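The difference between hard and soft parameter sharing can be sketched in a few lines. The example below is a toy, assuming a single linear layer per task and an L2 penalty between the task-specific weights as the soft-sharing coupling (one common formulation, not necessarily the mechanism shown in Figure 9b). All names and sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
in_dim, hidden = 8, 16  # hypothetical layer sizes

# Hard sharing: one weight matrix serves both the speed and the volume task.
hard_params = {"shared": rng.normal(size=(in_dim, hidden))}

# Soft sharing: each task keeps its own weights; a penalty keeps them close.
soft_params = {
    "speed": rng.normal(size=(in_dim, hidden)),
    "volume": rng.normal(size=(in_dim, hidden)),
}


def count_params(params):
    """Total number of scalar weights in a parameter dictionary."""
    return sum(w.size for w in params.values())


def soft_sharing_penalty(params, strength=1e-2):
    """Squared L2 distance between the task-specific weights, scaled by `strength`."""
    diff = params["speed"] - params["volume"]
    return strength * float(np.sum(diff ** 2))


print("hard sharing parameters:", count_params(hard_params))
print("soft sharing parameters:", count_params(soft_params))
print("soft sharing penalty:", soft_sharing_penalty(soft_params))
```

In this toy setting, soft sharing doubles the parameter count of the layer it is applied to, which is consistent with the trade-off described above: restricting soft sharing to one layer keeps task-specific capacity where it is needed while limiting the growth in parameters.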

We also compared the computation time of Tr-MGN, Tr-MTMGN(HS), Tr-MTMGN(SS), and Tr-MTMGN. The results are shown in Table 3. We can observe that:

1) Although the computation time of Tr-MGN is the shortest, it has the highest error of all methods.

2) The computation time of Tr-MTMGN(SS) is the longest because the model has a large number of parameters and therefore a higher computational cost.

3) The proposed Tr-MTMGN has a moderate computation time, slightly higher than that of Tr-MTMGN(HS) and lower than that of Tr-MTMGN(SS). However, the model has the best prediction accuracy, as shown in Figure 10. A slight increase in computation time is an acceptable price for better prediction accuracy.

Figure 9.

The multi-task learning variants based on different parameter sharing mechanisms: a version of Tr-MTMGN using the hard parameter sharing mechanism in both the spatial and the temporal feature capturing layer (Tr-MTMGN[HS]) (left) and a version of Tr-MTMGN using the soft parameter sharing mechanism in both the spatial and the temporal feature capturing layer (Tr-MTMGN[SS]) (right).

Figure 10.

The mean absolute error (MAE) of speed prediction (left) and volume prediction (right) under different multi-task learning variants.

Table 3.

The Computation Time of Different Multi-Task Learning Variants

Model	Training time (s/epoch)	Inference time (s)
Tr-MGN	119.8	17.5
Tr-MTMGN(HS)	134.7	24.7
Tr-MTMGN(SS)	143.7	29.6
Tr-MTMGN	137.6	26.0

Note: Tr-MGN = the single-task version of Tr-MTMGN; Tr-MTMGN = trajectory-based multi-task multi-graph convolutional network; Tr-MTMGN(HS) = a version of Tr-MTMGN using the hard parameter sharing mechanism in both the spatial and the temporal feature capturing layer; Tr-MTMGN(SS) = a version of Tr-MTMGN using the soft parameter sharing mechanism in both the spatial and the temporal feature capturing layer.
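The time differences in Table 3 are easier to compare as relative overheads. The sketch below expresses each model's training and inference time as a percentage change from Tr-MTMGN(HS). The values are copied from Table 3; the choice of reference model and the helper name `overhead_vs` are illustrative assumptions.

```python
# (training s/epoch, inference s) copied from Table 3.
table3 = {
    "Tr-MGN": (119.8, 17.5),
    "Tr-MTMGN(HS)": (134.7, 24.7),
    "Tr-MTMGN(SS)": (143.7, 29.6),
    "Tr-MTMGN": (137.6, 26.0),
}


def overhead_vs(reference, model):
    """Percentage change in (training, inference) time of `model` relative to `reference`."""
    ref_train, ref_infer = table3[reference]
    train, infer = table3[model]
    return (
        100.0 * (train - ref_train) / ref_train,
        100.0 * (infer - ref_infer) / ref_infer,
    )


for name in table3:
    train_pct, infer_pct = overhead_vs("Tr-MTMGN(HS)", name)
    print(f"{name}: training {train_pct:+.1f}%, inference {infer_pct:+.1f}%")
```

On these numbers, Tr-MTMGN costs about 2% more training time per epoch than Tr-MTMGN(HS), whereas Tr-MTMGN(SS) costs about 7% more.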

Result Visualization

We visualized the results to illustrate the performance of Tr-MTMGN intuitively. Figure 11 shows a heat map of the 10 min prediction and the ground truth for all nodes on May 25, 2017.

Figure 11.

Heat map of the 10 min prediction and the ground truth.

To further illustrate the performance of our model, we randomly selected the No. 95 loop detector and plotted the multi-step predictions and ground truth, as shown in Figure 12. As expected, our model accurately reflected the spatiotemporal distribution of future traffic states.

Figure 12.

Visualization of the traffic state prediction results for node No. 95.

This paper proposes a novel spatiotemporal deep learning framework, Tr-MTMGN, for traffic state prediction on a citywide scale. Specifically, the approach mines trajectory information to construct multiple spatial graphs and designs a multi-graph convolution block based on the GCN. A multi-head self-attention layer is then integrated into the multi-task learning framework with a soft-parameter sharing strategy to capture the temporal dependencies of speed and volume. Experiments on a real traffic dataset from Zhangzhou, China, showed that the proposed framework outperforms existing state-of-the-art baselines. However, our model has two limitations. First, although the model can be extended to accommodate different scenarios, a new set of parameters must be retrained for each one. Second, although the model achieves good results, the parameter sharing mechanisms and the loop detector graphs proposed in this study could be made more efficient. In the future, we will design more intelligent parameter sharing mechanisms and optimize the loss functions to balance predictive accuracy and computational cost. Furthermore, exploiting the underlying spatiotemporal relevancies in real-time GPS data or high-resolution connected vehicle trajectory data might help capture the spatiotemporal features of the traffic state more accurately and improve prediction precision.

Footnotes

Author Contributions

The authors confirm contribution to the paper as follows: study conception and design: J. Fang, M. Xu, W. Chen; data collection: J. Fang, W. Chen, M. Xu; analysis and interpretation of results: J. Fang, W. Chen, M. Xu; draft manuscript preparation: J. Fang, W. Chen, M. Xu, Y. Liu, T. Bi. All authors reviewed the results and approved the final version of the manuscript.

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the National Natural Science Foundation of China (grant entitled “Connected vehicle big data driven expressway multi-objective coordinated control fusing deep learning and traffic flow model,” award number 71901070).

ORCID iDs

Jie Fang

Mengyun Xu

Yuxuan Liu

Ting Bi

References

1. Park, Lee, Bahng, Tae, Jin, Kim, Choo. ST-GRAT: A Novel Spatio-Temporal Graph Attention Networks for Accurately Forecasting Dynamically Changing Road Speed. Proc., 29th ACM International Conference on Information & Knowledge Management, Association for Computing Machinery, New York, NY, 2020, pp. 1215–1224.

2. Gan, Jin, Zhang. Spatiotemporal Adaptive Gated Graph Convolution Network for Urban Traffic Flow Forecasting. Proc., 29th ACM International Conference on Information & Knowledge Management, Association for Computing Machinery, New York, NY, 2020, pp. 1025–1034.

3. Zhang, Huang, Xia, Dai, Zhang, Zheng. Traffic Flow Forecasting with Spatial-Temporal Graph Diffusion Network. Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 35, No. 17, 2021, pp. 15008–15015.

4. X. Y., Liu, H. Y., J. Q. Analysis of Subway Station Capacity with the Use of Queueing Theory. Transportation Research Part C: Emerging Technologies, Vol. 38, 2014, pp. 28–43.

5. Wei, Cao, Sun. Total Unimodularity and Decomposition Method for Large-Scale Air Traffic Cell Transmission Model. Transportation Research Part B: Methodological, Vol. 53, 2013, pp. 1–16.

6. Sha. Impacts of Traffic Management Measures on Urban Network Microscopic Fundamental Diagram. Journal of Transportation Systems Engineering and Information Technology, Vol. 13, No. 2, 2013, pp. 185–190.

7. Sun, Zhang, Ran. Interval Prediction for Traffic Time Series Using Local Linear Predictor. Proc., 7th International IEEE Conference on Intelligent Transportation Systems (IEEE Cat. No. 04TH8749), Washington, WA, IEEE, New York, 2004, pp. 410–415.

8. Dudek. Pattern-Based Local Linear Regression Models for Short-Term Load Forecasting. Electric Power Systems Research, Vol. 130, 2016, pp. 139–147.

9. Van Hinsbergen, C. P., Schreiter, Zuurbier, F. S., Van Lint, J. W. C., Van Zuylen, H. J. Localized Extended Kalman Filter for Scalable Real-Time Traffic State Estimation. IEEE Transactions on Intelligent Transportation Systems, Vol. 13, No. 1, 2011, pp. 385–394.

10. Ojeda, L. L., Kibangou, A. Y., De Wit, C. C. Adaptive Kalman Filtering for Multi-Step Ahead Traffic Flow Prediction. Proc., 2013 American Control Conference, Washington, D.C., IEEE, New York, 2013, pp. 4724–4729.

11. Zhang, X. L. Short-Term Traffic Flow Forecasting Based on K-Nearest Neighbors Non-Parametric Regression. Journal of Systems Engineering, Vol. 24, No. 2, 2009, pp. 178–183.

12. C. H., J. M., Lee, D. T. Travel-Time Prediction with Support Vector Regression. IEEE Transactions on Intelligent Transportation Systems, Vol. 5, No. 4, 2004, pp. 276–281.

13. Yao, Z. S., Shao, C. F., Gao, Y. L. Research on Methods of Short-Term Traffic Forecasting Based on Support Vector Regression. Journal of Beijing Jiaotong University, Vol. 30, No. 3, 2006, pp. 19–22.

14. Smola, A. J., Schölkopf. A Tutorial on Support Vector Regression. Statistics and Computing, Vol. 14, No. 3, 2004, pp. 199–222.

15. Yin, Wong, C. K. Urban Traffic Flow Prediction Using a Fuzzy-Neural Approach. Transportation Research Part C: Emerging Technologies, Vol. 10, No. 2, 2002, pp. 85–98.

16. Sun, Zhang. A Bayesian Network Approach to Traffic Flow Forecasting. IEEE Transactions on Intelligent Transportation Systems, Vol. 7, No. 1, 2006, pp. 124–132.

17. Hou, Edara. Network Scale Travel Time Prediction Using Deep Learning. Transportation Research Record: Journal of the Transportation Research Board, 2018. 2672: 115–123.

18. Liu, Zheng, Feng, Chen. Short-Term Traffic Flow Prediction with Conv-LSTM. Proc., 2017 9th International Conference on Wireless Communications and Signal Processing (WCSP), Nanjing, China, IEEE, New York, 2017, pp. 1–6.

19. Yao, Tang, Wei, Zheng. Revisiting Spatial-Temporal Similarity: A Deep Learning Framework for Traffic Prediction. Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33, 2019, pp. 5668–5675.

20. Yin, Zhu. Spatio-Temporal Graph Convolutional Networks: A Deep Learning Framework for Traffic Forecasting. arXiv Preprint arXiv:.04875, 2017.

21. Guo, Lin, Feng, Song, Wan. Attention Based Spatial-Temporal Graph Convolutional Networks for Traffic Flow Forecasting. Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33, 2019, pp. 922–929.

22. Geng, Wang, Zhang, Yang, Liu. Spatiotemporal Multi-Graph Convolution Network for Ride-Hailing Demand Forecasting. Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33, 2019, pp. 3656–3663.

23. Hong, Chen, Zhu. Temporal Multi-Graph Convolutional Network for Traffic Flow Prediction. IEEE Transactions on Intelligent Transportation Systems, Vol. 22, No. 6, 2020, pp. 3337–3348.

24. Chen, Xie, Cao, Gao, Feng. Multi-Range Attentive Bicomponent Graph Convolutional Network for Traffic Forecasting. Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 34, 2020, pp. 3529–3536.

25. Zhang, Kan, Zhang, Cao, Zhao. Dynamic Spatial–Temporal Convolutional Networks for Traffic Flow Forecasting. Transportation Research Record: Journal of the Transportation Research Board, 2023. https://doi.org/10.1177/03611981231159407.

26. Jin, Huang, Cui, Huang, Qiao, Yoo. DMGF-Net: An Efficient Dynamic Multi-Graph Fusion Network for Traffic Prediction. ACM Transactions on Knowledge Discovery from Data, Vol. 17, No. 7, 2023, pp. 1–19.

27. Feng, Yan, Jin, Yang, Sun, Jin. Dynamic Graph Convolutional Recurrent Network for Traffic Prediction: Benchmark and Solution. ACM Transactions on Knowledge Discovery from Data, Vol. 17, No. 1, 2023, pp. 1–21.

28. Grover, Leskovec. node2vec: Scalable Feature Learning for Networks. Proc., 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, Association for Computing Machinery, New York, NY, 2016, pp. 855–864.

29. Zheng, Fan, Wang. Gman: A Graph Multi-Attention Network for Traffic Prediction. Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 34, 2020, pp. 1234–1241.

30. Newson, Krumm. Hidden Markov Map Matching Through Noise and Sparseness. Proc., 17th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, Seattle, WA, Association for Computing Machinery, New York, NY, 2009, pp. 336–343.

31. Mikolov, Chen, Corrado, Dean. Efficient Estimation of Word Representations in Vector Space. arXiv Preprint arXiv:1301.3781, 2013.

32. Goldberg, Levy. word2vec Explained: Deriving Mikolov et al.’s Negative-Sampling Word-Embedding Method. arXiv Preprint arXiv:1402.3722, 2014.

33. Kipf, T. N., Welling. Semi-Supervised Classification with Graph Convolutional Networks. arXiv Preprint arXiv:.02907, 2016.

34. Defferrard, Bresson, Vandergheynst. Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering. Advances in Neural Information Processing Systems, Vol. 29, 2016, pp. 3844–3852.

35. Vaswani, Shazeer, Parmar, Uszkoreit, Jones, Gomez, A. N., Kaiser, Polosukhin. Attention Is All You Need. Advances in Neural Information Processing Systems, Vol. 30, 2017, pp. 6000–6010.