LaneGCN Paper Notes
Learning Lane Graph Representations for Motion Forecasting
paper link:
https://arxiv.org/abs/2007.13732
PPT:
https://www.cs.toronto.edu/~byang/slides/LaneGCN.pdf
Architecture
Lane Graph + Actor Map:
- construct a lane graph from vectorized map data to preserve the map structure and avoid information loss
LaneGCN:
- extends graph convolutions with multiple adjacency matrices and along-lane dilation to capture the complex topology and long-range dependencies of the lane graph
- exploits a fusion network consisting of four types of interactions: actor-to-lane, lane-to-actor, actor-to-actor, and lane-to-lane
- represents both actors and lanes as nodes in the graph, uses a 1D CNN and LaneGCN to extract the features for the actor and lane nodes respectively, and then exploits spatial attention and another LaneGCN to model the four types of interactions
Difference between VectorNet and LaneGCN:
- VectorNet uses a vanilla graph network with undirected full connections; LaneGCN uses a connected lane graph that follows the map topology, and proposes task-specific multi-type and dilated graph operators.
- VectorNet uses polyline-level nodes for interactions; LaneGCN uses polyline segments as map nodes to capture higher resolution.
Lane Graph Representations for Motion Forecasting
ActorNet: Extracting Traffic Participant Representations
Each trajectory is represented as a sequence of displacements $\{\Delta p_{-(T-1)}, \ldots, \Delta p_{-1}, \Delta p_0\}$, where $\Delta p_t$ is the 2D displacement from time step $t-1$ to $t$, and $T$ is the trajectory size.
For trajectories with sizes smaller than $T$, we pad them with zeros. We add a binary $1 \times T$ mask to indicate if the element at each step is padded or not and concatenate it with the trajectory tensor, resulting in an input tensor of size $3 \times T$.
A 1D CNN is used to process the trajectory input for its effectiveness in extracting multi-scale features and its efficiency in parallel computing. The output of ActorNet is a temporal feature map, whose element at $t = 0$ is used as the actor feature. The network has 3 groups/scales of 1D convolutions.
Each group consists of 2 residual blocks, with the stride of the first block as 2. We then use a Feature Pyramid Network (FPN) to fuse the multi-scale features, and apply another residual block to obtain the output tensor. For all layers, the convolution kernel size is 3 and the number of output channels is 128. Layer Normalization and the Rectified Linear Unit (ReLU) are used after each convolution.
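As a concrete reference, here is a minimal PyTorch sketch of such an ActorNet. It is an illustrative simplification, not the released implementation: the group widths, norm choice, and FPN details are assumptions, `Res1d` is a hypothetical block name, and the first group keeps stride 1 so the FPN output matches the input length.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Res1d(nn.Module):
    """1D residual block: two 3-tap convolutions plus a (possibly strided) skip connection."""
    def __init__(self, c_in, c_out, stride=1):
        super().__init__()
        self.conv1 = nn.Conv1d(c_in, c_out, 3, stride=stride, padding=1)
        self.conv2 = nn.Conv1d(c_out, c_out, 3, padding=1)
        self.norm1 = nn.GroupNorm(1, c_out)  # GroupNorm(1, C): a LayerNorm-like norm over channels
        self.norm2 = nn.GroupNorm(1, c_out)
        self.skip = (nn.Conv1d(c_in, c_out, 1, stride=stride)
                     if stride != 1 or c_in != c_out else nn.Identity())

    def forward(self, x):
        out = F.relu(self.norm1(self.conv1(x)))
        out = self.norm2(self.conv2(out))
        return F.relu(out + self.skip(x))

class ActorNet(nn.Module):
    """Three groups of residual blocks at three temporal scales, fused top-down FPN-style."""
    def __init__(self, c_in=3, c=128):
        super().__init__()
        self.g1 = nn.Sequential(Res1d(c_in, 32), Res1d(32, 32))          # length T
        self.g2 = nn.Sequential(Res1d(32, 64, stride=2), Res1d(64, 64))  # length T/2
        self.g3 = nn.Sequential(Res1d(64, c, stride=2), Res1d(c, c))     # length T/4
        self.lat1 = nn.Conv1d(32, c, 1)  # lateral 1x1 convs for the FPN fusion
        self.lat2 = nn.Conv1d(64, c, 1)
        self.out = Res1d(c, c)

    def forward(self, x):                # x: M x 3 x T
        f1 = self.g1(x)
        f2 = self.g2(f1)
        f3 = self.g3(f2)
        # top-down fusion: upsample and add lateral connections
        y = F.interpolate(f3, scale_factor=2, mode='linear', align_corners=False) + self.lat2(f2)
        y = F.interpolate(y, scale_factor=2, mode='linear', align_corners=False) + self.lat1(f1)
        y = self.out(y)                  # M x 128 x T temporal feature map
        return y[:, :, -1]               # the feature at the last step (t = 0) is the actor feature
```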
MapNet: Extracting Structured Map Representation
General Architecture:
- part 1: building a lane graph from vectorized map data;
- part 2: applying our novel LaneGCN to the lane graph to output the map features.
Map Data:
In this paper, we adopt a simple form of vectorized map data as our representation of HD maps. Specifically, the map data is represented as a set of lanes and their connectivity. Each lane contains a centerline, i.e., a sequence of 2D BEV points, which are arranged following the lane direction (see Fig. 3, top). For any two lanes which are directly reachable, 4 types of connections are given: predecessor, successor, left neighbour and right neighbour.
Lane Graph Construction:
We first define a lane node as the straight line segment formed by any two consecutive points (grey circles in Fig. 3) of the centerline. The location of a lane node is the averaged coordinates of its two end points. Following the connections between lane centerlines, we also derive 4 connectivity types for the lane nodes, i.e., predecessor, successor, left neighbour and right neighbour.
We denote the lane nodes with $V \in \mathbb{R}^{N \times 2}$, where $N$ is the number of lane nodes and the $i$-th row of $V$ is the BEV coordinates of the $i$-th node. We represent the connectivity with 4 adjacency matrices $\{A_i\}_{i \in \{\text{pre}, \text{suc}, \text{left}, \text{right}\}}$, with $A_i \in \mathbb{R}^{N \times N}$.
We denote $A_{i,jk}$ as the element in the $j$-th row and $k$-th column of $A_i$. Then $A_{i,jk} = 1$ if node $k$ is an $i$-type neighbour of node $j$.
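A small numpy sketch of how such an adjacency matrix can be assembled from (u, v) index pairs; the `build_adjacency` helper and the toy 4-node lane are illustrative:

```python
import numpy as np

def build_adjacency(n_nodes, u, v):
    """A[j, k] = 1 iff node k is the given-type neighbour of node j."""
    A = np.zeros((n_nodes, n_nodes), dtype=np.float32)
    A[u, v] = 1.0
    return A

# Toy lane with 4 nodes in driving order: 0 -> 1 -> 2 -> 3.
A_suc = build_adjacency(4, u=np.array([0, 1, 2]), v=np.array([1, 2, 3]))
A_pre = A_suc.T  # the predecessor relation is the transpose of the successor relation
print(A_suc)     # ones on the superdiagonal, as described for a single lane below
```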
LaneConv Operator:
Node Feature: Each lane node corresponds to a straight line segment of a centerline. To encode all the lane node information, we need to take into account both the shape (size and orientation) and the location (the coordinates of the center) of the corresponding line segment. We parameterize the node feature as follows:

$$x_i = \mathrm{MLP}_{\text{shape}}\left(v_i^{\text{end}} - v_i^{\text{start}}\right) + \mathrm{MLP}_{\text{loc}}(v_i)$$

where $\mathrm{MLP}$ indicates a multi-layer perceptron and the two subscripts refer to shape and location, respectively. $v_i$ is the location of the $i$-th lane node, i.e., the center between its two end points, $v_i^{\text{start}}$ and $v_i^{\text{end}}$ are the BEV coordinates of the node's starting and ending points, and $x_i$ is the $i$-th row of the node feature matrix $X$, denoting the input feature of the $i$-th lane node.
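A minimal PyTorch sketch of this node feature, Eq. (1); the MLP layer sizes are illustrative:

```python
import torch
import torch.nn as nn

class NodeFeature(nn.Module):
    """x_i = MLP_shape(v_i_end - v_i_start) + MLP_loc(v_i)  (Eq. 1)."""
    def __init__(self, c=128):
        super().__init__()
        self.mlp_shape = nn.Sequential(nn.Linear(2, c), nn.ReLU(), nn.Linear(c, c))
        self.mlp_loc = nn.Sequential(nn.Linear(2, c), nn.ReLU(), nn.Linear(c, c))

    def forward(self, v_start, v_end):  # both N x 2
        v = 0.5 * (v_start + v_end)     # node location: center of the segment
        return self.mlp_shape(v_end - v_start) + self.mlp_loc(v)  # N x C
```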
LaneConv: To aggregate the topology information of the lane graph at a larger scale, we design the following LaneConv operator:

$$Y = XW_0 + \sum_{i \in \{\text{pre}, \text{suc}, \text{left}, \text{right}\}} A_i X W_i$$

where $A_i$ and $W_i$ are the adjacency and the weight matrices corresponding to the $i$-th connection type, respectively. Since we order the lane nodes from the start to the end of the lane, $A_{\text{suc}}$ and $A_{\text{pre}}$ are matrices obtained by shifting the identity matrix (diagonal 1) one step towards the upper right (non-zero superdiagonal) and lower left (non-zero subdiagonal), respectively. $A_{\text{suc}}$ and $A_{\text{pre}}$ propagate information from the forward and backward neighbours, whereas $A_{\text{left}}$ and $A_{\text{right}}$ allow information to flow from the cross-lane neighbours. It is not hard to see that our LaneConv builds on top of the general graph convolution and encodes more geometric (e.g., connection type/direction) information. As shown in our experiments, this improves over the vanilla graph convolution.
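Eq. (2) translates almost directly into code. A dense-matrix sketch (in practice the adjacency matrices would be sparse):

```python
import torch
import torch.nn as nn

class LaneConv(nn.Module):
    """Y = X W_0 + sum_i A_i X W_i over i in {pre, suc, left, right}  (Eq. 2)."""
    def __init__(self, c=128, n_types=4):
        super().__init__()
        self.w0 = nn.Linear(c, c, bias=False)
        self.wi = nn.ModuleList(nn.Linear(c, c, bias=False) for _ in range(n_types))

    def forward(self, x, adj):  # x: N x C; adj: list of 4 (N x N) matrices in a fixed type order
        y = self.w0(x)
        for A, w in zip(adj, self.wi):
            y = y + A @ w(x)    # aggregate i-type neighbours with a type-specific weight
        return y
```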
Dilated LaneConv:
Functionality: the model needs to capture the long-range dependency along the lane direction for accurate prediction.
The $k$-dilation LaneConv operator is defined as follows:

$$Y = XW_0 + A_{\text{pre}}^k X W_{\text{pre},k} + A_{\text{suc}}^k X W_{\text{suc},k}$$

where $A_{\text{pre}}^k$ is the $k$-th matrix power of $A_{\text{pre}}$. This allows us to directly propagate information along the lane for $k$ steps, with $k$ a hyperparameter. Since $A_{\text{pre}}^k$ is highly sparse, one can efficiently compute it using sparse matrix multiplication. Note that the dilated LaneConv is only used for predecessor and successor, as the long-range dependency is mostly along the lane direction.
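A sketch of the sparse matrix-power trick (the helper name is illustrative):

```python
import torch

def dilated_adjacency(A, k):
    """A^k for a sparse adjacency: A^k[j, m] > 0 iff node m is exactly k hops from node j."""
    A = A if A.is_sparse else A.to_sparse()
    out = A
    for _ in range(k - 1):
        out = torch.sparse.mm(out, A)  # stays sparse; cheap because A is highly sparse
    return out

# Eq. (3), sketched with dense weights w0, w_pre, w_suc (each C x C):
# y = x @ w0 + torch.sparse.mm(dilated_adjacency(A_pre, k), x) @ w_pre \
#            + torch.sparse.mm(dilated_adjacency(A_suc, k), x) @ w_suc
```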
LaneGCN:
With Eq. (2) and Eq. (3), we get a multi-scale LaneConv operator with $C$ dilation sizes as follows:

$$Y = XW_0 + \sum_{c=1}^{C} \left( A_{\text{pre}}^{k_c} X W_{\text{pre},k_c} + A_{\text{suc}}^{k_c} X W_{\text{suc},k_c} \right)$$

where $k_c$ is the $c$-th dilation size. We denote this multi-scale layer $\mathrm{LaneConv}(k_1, \ldots, k_C)$.
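A dense sketch of Eq. (4); the dilation sizes here are illustrative, not necessarily those used in the paper:

```python
import torch
import torch.nn as nn

class MultiScaleLaneConv(nn.Module):
    """Y = X W_0 + sum_c (A_pre^{k_c} X W_pre,kc + A_suc^{k_c} X W_suc,kc)  (Eq. 4)."""
    def __init__(self, c=128, dilations=(1, 2, 4, 8)):  # dilation sizes are illustrative
        super().__init__()
        self.dilations = dilations
        self.w0 = nn.Linear(c, c, bias=False)
        self.w_pre = nn.ModuleList(nn.Linear(c, c, bias=False) for _ in dilations)
        self.w_suc = nn.ModuleList(nn.Linear(c, c, bias=False) for _ in dilations)

    def forward(self, x, A_pre, A_suc):  # x: N x C; A_*: dense N x N for simplicity
        y = self.w0(x)
        for k, wp, ws in zip(self.dilations, self.w_pre, self.w_suc):
            y = y + torch.matrix_power(A_pre, k) @ wp(x)  # k steps backward along the lane
            y = y + torch.matrix_power(A_suc, k) @ ws(x)  # k steps forward along the lane
        return y
```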
Fusion Net
Four types of fusion modules:
- A2L: introduces real-time traffic information to lane nodes, such as blockage or usage of the lanes.
- L2L: updates lane node features by propagating the traffic information over the lane graph. -> LaneGCN
- L2A: fuses updated map features with real-time traffic information back to the actors.
- A2A: handles the interactions between actors and produces the output actor features, which are then used by the prediction header for motion forecasting.
We implement L2L using another LaneGCN, which has the same architecture as the one used in our MapNet (see Section 3.2). In the following we describe the other three modules in detail. We exploit a spatial attention layer for A2L, L2A and A2A. The attention layer applies to each of the three modules in the same way. Taking A2L as an example, given an actor node $i$, we aggregate the features from its context lane nodes $j$ as follows:

$$y_i = x_i W_0 + \sum_j \phi\left(\mathrm{concat}(x_i, \Delta_{ij}, x_j) W_1\right) W_2$$

where $x_i$ is the feature of the $i$-th node, $W$ is a weight matrix, $\phi$ is the composition of layer normalization and ReLU, and $\Delta_{ij} = \mathrm{MLP}(v_j - v_i)$, where $v$ denotes the node location.
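A sketch of Eq. (5) as message passing over precomputed (i, j) pairs. The edge construction is omitted; context nodes j are typically those within a distance threshold of node i, and the order of the layer norm and ReLU inside $\phi$ is assumed here:

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """y_i = x_i W_0 + sum_j phi(concat(x_i, Delta_ij, x_j) W_1) W_2  (Eq. 5)."""
    def __init__(self, c=128):
        super().__init__()
        self.dist = nn.Sequential(nn.Linear(2, c), nn.ReLU())  # Delta_ij = MLP(v_j - v_i)
        self.w0 = nn.Linear(c, c, bias=False)
        self.w1 = nn.Linear(3 * c, c, bias=False)
        self.norm = nn.LayerNorm(c)
        self.w2 = nn.Linear(c, c, bias=False)

    def forward(self, x_dst, v_dst, x_src, v_src, edges):
        # edges: E x 2 long tensor of (i, j) pairs; j are context nodes of destination node i
        i, j = edges[:, 0], edges[:, 1]
        delta = self.dist(v_src[j] - v_dst[i])
        msg = self.w2(torch.relu(self.norm(
            self.w1(torch.cat([x_dst[i], delta, x_src[j]], dim=1)))))
        return self.w0(x_dst).index_add(0, i, msg)  # sum messages over context nodes j
```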
Prediction Header
Taking the after-fusion actor features as input, a multi-modal prediction header outputs the final motion forecasts. For each actor, it predicts $K$ possible future trajectories and their confidence scores.
The header has two branches, a regression branch to predict the trajectory of each mode and a classification branch to predict the confidence score of each mode.
For the $m$-th actor, we apply a residual block and a linear layer in the regression branch to regress the $K$ sequences of BEV coordinates:

$$O_{m,\text{reg}} = \left\{ \left( p_{m,1}^{k}, \ldots, p_{m,T}^{k} \right) \right\}_{k \in [0, K-1]}$$

where $p_{m,i}^{k}$ is the predicted $m$-th actor's BEV coordinates of the $k$-th mode at the $i$-th time step. For the classification branch, we apply an MLP to $p_{m,T}^{k} - p_{m,0}$, i.e., the offset of each mode's final predicted position from the actor's current position, to get $K$ distance embeddings. We then concatenate each distance embedding with the actor feature, and apply a residual block and a linear layer to output $K$ confidence scores, $O_{m,\text{cls}} = (c_{m,0}, \ldots, c_{m,K-1})$.
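A sketch of such a two-branch header. `PredictionHeader` is an illustrative name, and $K = 6$ modes with $T = 30$ future steps follow the Argoverse setting used in the data walkthrough below:

```python
import torch
import torch.nn as nn

class PredictionHeader(nn.Module):
    """Regression branch: K trajectories per actor; classification branch: K confidences."""
    def __init__(self, c=128, K=6, T=30):
        super().__init__()
        self.K, self.T = K, T
        self.reg = nn.Sequential(nn.Linear(c, c), nn.ReLU(), nn.Linear(c, K * T * 2))
        self.dist_emb = nn.Sequential(nn.Linear(2, c), nn.ReLU())
        self.cls = nn.Sequential(nn.Linear(2 * c, c), nn.ReLU(), nn.Linear(c, 1))

    def forward(self, actors, ctrs):  # actors: M x C, ctrs: M x 2 (current positions)
        M = actors.shape[0]
        traj = self.reg(actors).view(M, self.K, self.T, 2)  # per-mode offsets
        traj = traj + ctrs[:, None, None, :]                # back to BEV coordinates
        # distance embedding: each mode's endpoint relative to the current position
        emb = self.dist_emb(traj[:, :, -1] - ctrs[:, None, :])              # M x K x C
        pair = torch.cat([emb, actors[:, None, :].expand(-1, self.K, -1)], dim=2)
        conf = self.cls(pair).squeeze(-1)                                    # M x K
        return traj, conf
```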
Learning
We use the sum of the classification and regression losses to train the model:

$$L = L_{\text{cls}} + \alpha L_{\text{reg}}$$

where $\alpha = 1.0$.
For classification, we use the max-margin loss:

$$L_{\text{cls}} = \frac{1}{M(K-1)} \sum_{m=1}^{M} \sum_{k \neq \hat{k}} \max\left(0, c_{m,k} + \epsilon - c_{m,\hat{k}}\right)$$

where $\epsilon$ is the margin, $M$ is the total number of actors, and $\hat{k}$ is the positive mode, i.e., the mode whose final-step prediction is closest to the ground truth. For regression, we apply the smooth $\ell_1$ loss on all predicted time steps of the positive mode:

$$L_{\text{reg}} = \frac{1}{MT} \sum_{m=1}^{M} \sum_{t=1}^{T} \mathrm{reg}\left(p_{m,t}^{\hat{k}} - p_{m,t}^{*}\right)$$

where $p_{m,t}^{*}$ is the ground truth BEV coordinates at time step $t$, $\mathrm{reg}(x) = \sum_i d(x_i)$, $x_i$ is the $i$-th element of $x$, and $d$ is the smooth $\ell_1$ loss defined as:

$$d(x_i) = \begin{cases} 0.5\, x_i^2 & \text{if } \lVert x_i \rVert < 1 \\ \lVert x_i \rVert - 0.5 & \text{otherwise} \end{cases}$$

where $\lVert x_i \rVert$ denotes the $\ell_1$ norm of $x_i$.
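A sketch of the combined loss, assuming the positive mode is selected by final displacement error as above; the margin value is illustrative, and `smooth_l1_loss` here averages over elements rather than normalizing by exactly $MT$:

```python
import torch
import torch.nn.functional as F

def lanegcn_loss(traj, conf, gt, eps=0.2, alpha=1.0):
    """traj: M x K x T x 2, conf: M x K, gt: M x T x 2. eps and alpha are illustrative."""
    M, K, T, _ = traj.shape
    # positive mode k_hat: minimum final-step displacement error
    fde = torch.norm(traj[:, :, -1] - gt[:, None, -1], dim=-1)   # M x K
    k_hat = fde.argmin(dim=1)                                    # M
    # max-margin classification loss over the K-1 negative modes
    c_hat = conf.gather(1, k_hat[:, None])                       # M x 1
    margin = (conf + eps - c_hat).clamp(min=0.0)                 # M x K
    margin.scatter_(1, k_hat[:, None], 0.0)                      # drop the positive mode
    l_cls = margin.sum() / (M * (K - 1))
    # smooth-L1 regression loss on the positive mode only
    best = traj[torch.arange(M), k_hat]                          # M x T x 2
    l_reg = F.smooth_l1_loss(best, gt)
    return l_cls + alpha * l_reg
```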
Neural Network Layout
Data Process And Network Construction
Take the official sample 2645.csv from the dataset as an example.
agent node:
- data['city']: city name
- data['trajs'] = [agt_traj] + ctx_trajs: trajectory points (agent + context vehicles; 13 actors in this sample)
- data['steps'] = [agt_step] + ctx_steps: the positions of these points in the raw data
- data['feats'] = feats: (13 × 20 × 3) the first 20 observed frames of each trajectory, plus one dimension flagging whether a point exists at that step
- data['ctrs'] = ctrs: (13 × 2) current center points
- data['orig'] = orig: the AGENT's current coordinates
- data['theta'] = theta: the AGENT's heading angle
- data['rot'] = rot: (2 × 2) rotation matrix
- data['gt_preds'] = gt_preds: (13 × 30 × 2) ground-truth trajectories for the following 30 frames
- data['has_preds'] = has_preds: (13 × 30) flags indicating whether each of the following 30 trajectory points exists
lane node:
- graph['ctrs'] = np.concatenate(ctrs, 0): center coordinates of the lane nodes
- graph['num_nodes'] = num_nodes: number of lane nodes
- graph['feats'] = np.concatenate(feats, 0): lane node direction vectors
- graph['turn'] = np.concatenate(turn, 0): lane node turn flags
- graph['control'] = np.concatenate(control, 0): lane node has_traffic_control flags
- graph['intersect'] = np.concatenate(intersect, 0): lane node is_intersection flags
- graph['pre'] = [pre]: pre['u'] and pre['v'], where v is the pre of u; these describe relations between lane nodes
- graph['suc'] = [suc]: suc['u'] and suc['v'], where v is the suc of u; these describe relations between lane nodes
- graph['lane_idcs'] = lane_idcs: the lane index of each lane node, e.g. 0 0 0 ... 0 1 1 1 ... 1 ... 83 83 83 ... 83
- graph['pre_pairs'] = pre_pairs: pairs describing relations between lanes (not lane nodes)
- graph['suc_pairs'] = suc_pairs: pairs describing relations between lanes
- graph['left_pairs'] = left_pairs: pairs describing relations between lanes
- graph['right_pairs'] = right_pairs: pairs describing relations between lanes

In general, for X['u'] and X['v'] with X one of pre/suc/left/right, v is the X-type neighbour of u (see the toy sketch below).
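To make the u/v convention concrete, here is a hand-built toy example with two lanes (lane 1 the successor of lane 0), mirroring the preprocessing fields above; it is illustrative, not the actual preprocessing code:

```python
import numpy as np

# Two centerlines; each pair of consecutive points forms one lane node.
centerlines = [np.array([[0., 0.], [1., 0.], [2., 0.]]),   # lane 0 -> 2 lane nodes
               np.array([[2., 0.], [3., 0.], [4., 0.]])]   # lane 1 -> 2 lane nodes

ctrs, feats, lane_idcs = [], [], []
for lane_id, cl in enumerate(centerlines):
    ctrs.append((cl[:-1] + cl[1:]) / 2.0)   # node center: midpoint of consecutive points
    feats.append(cl[1:] - cl[:-1])          # node direction vector
    lane_idcs.append(np.full(len(cl) - 1, lane_id))

ctrs, feats = np.concatenate(ctrs), np.concatenate(feats)
lane_idcs = np.concatenate(lane_idcs)       # [0 0 1 1]

# Within a lane, node i+1 follows node i; across lanes, the first node of the
# successor lane follows the last node of the current lane.
suc = {'u': [0, 1, 2], 'v': [1, 2, 3]}      # v is the suc of u
pre = {'u': [1, 2, 3], 'v': [0, 1, 2]}      # v is the pre of u
```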
Net structure
- ActorNet
  - input: M x 3 x 20, output: M x 128 x 20 (the feature at the last time step, M x 128, is used as the actor feature)
- MapNet (explanation: aggregates each node v's features onto node u, i.e. an index_add onto the center nodes)
  - input: N x 4, output: N x 128
- A2M (actor-to-map, i.e. A2L above)
  - input: N x 128, output: N x 128
- M2M (i.e. L2L)
  - input: N x 128, output: N x 128
- M2A (i.e. L2A)
  - input: N x 128 (plus the M x 128 actor features), output: M x 128
- A2A
  - input: M x 128, output: M x 128
- Prediction Header
  - input: M x 128
  - MLP Regression
  - MLP Classification

A shape walkthrough of the whole forward pass is sketched below.
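Putting the modules together, a schematic (comment-only) shape walkthrough of one forward pass; the call signatures are illustrative, not the actual code:

```python
# M actors, N lane nodes; module names follow the list above.
# actors = actor_net(trajs)          # M x 3 x 20 -> M x 128 (last step of M x 128 x 20)
# nodes  = map_net(graph)            # N x 4      -> N x 128
# nodes  = a2m(nodes, actors)        # N x 128    -> N x 128  (actors -> map)
# nodes  = m2m(nodes, graph)         # N x 128    -> N x 128  (LaneGCN over the lane graph)
# actors = m2a(actors, nodes)        # M x 128    -> M x 128  (map -> actors)
# actors = a2a(actors)               # M x 128    -> M x 128  (actor interactions)
# traj, conf = pred_header(actors)   # M x 128    -> M x K x 30 x 2 and M x K
```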
ref link: https://zhuanlan.zhihu.com/p/447129428

