(1) NVIDIA (2) University of Toronto (3) Vector Institute (4) Simon Fraser University
Abstract
A longstanding challenge for self-driving development is simulating dynamic
driving scenarios seeded from recorded driving logs. In pursuit of this functionality,
we apply tools from discrete sequence modeling to model how vehicles,
pedestrians and cyclists interact in driving scenarios. With a simple data-driven
tokenization scheme, we discretize trajectories to centimeter-level resolution using
a small vocabulary. We then model the multi-agent sequence of discrete motion
tokens with a GPT-like encoder-decoder that is autoregressive in time and takes
into account intra-timestep interaction between agents. Scenarios sampled from
our model exhibit state-of-the-art realism; our model tops the Waymo Sim Agents
Benchmark, surpassing prior work along the realism meta metric by 3.3% and
along the interaction metric by 9.9%. We ablate our modeling choices in full autonomy
and partial autonomy settings, and show that the representations learned
by our model can quickly be adapted to improve performance on nuScenes. We
additionally evaluate the scalability of our model with respect to parameter count
and dataset size, and use density estimates from our model to quantify the saliency
of context length and intra-timestep interaction for the traffic modeling task.
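
To make the idea of discrete motion tokens concrete, the following is a minimal sketch and not the paper's implementation. It assumes 2D positions only (no heading) and uses a hypothetical fixed grid of displacement templates as a stand-in for the data-driven vocabulary described above. The function names grid_vocabulary, tokenize, and detokenize are illustrative. Each timestep is mapped to the template that lands closest to the next recorded position, measured from the reconstructed position so that quantization error does not accumulate.

```python
import math


def grid_vocabulary(max_step, bins):
    """Hypothetical vocabulary: a bins x bins grid of (dx, dy) displacements.

    This is an assumption for illustration; the abstract describes a
    data-driven vocabulary instead of a fixed grid. Requires bins >= 2.
    """
    axis = [-max_step + 2 * max_step * i / (bins - 1) for i in range(bins)]
    return [(dx, dy) for dx in axis for dy in axis]


def tokenize(trajectory, vocab):
    """Greedily map each step of a trajectory to the index of the nearest template."""
    x, y = trajectory[0]
    tokens = []
    for tx, ty in trajectory[1:]:
        # Pick the template whose endpoint is closest to the recorded position.
        tok = min(
            range(len(vocab)),
            key=lambda i: math.hypot(x + vocab[i][0] - tx, y + vocab[i][1] - ty),
        )
        tokens.append(tok)
        # Advance from the reconstructed position, not the recorded one.
        x, y = x + vocab[tok][0], y + vocab[tok][1]
    return tokens


def detokenize(start, tokens, vocab):
    """Rebuild a trajectory by applying the chosen templates in order."""
    x, y = start
    points = [(x, y)]
    for tok in tokens:
        x, y = x + vocab[tok][0], y + vocab[tok][1]
        points.append((x, y))
    return points


# Usage: tokenize a short path and reconstruct it from its tokens.
vocab = grid_vocabulary(max_step=1.0, bins=3)
path = [(0.0, 0.0), (1.0, 0.0), (2.0, 1.0), (2.0, 2.0)]
tokens = tokenize(path, vocab)
reconstructed = detokenize(path[0], tokens, vocab)
```

A sequence model would then be trained to predict the next token for each agent. That part is outside the scope of this sketch.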
@inproceedings{
philion2024trajeglish,
title = {Trajeglish: Traffic Modeling as Next-Token Prediction},
author = {Jonah Philion and Xue Bin Peng and Sanja Fidler},
booktitle = {The Twelfth International Conference on Learning Representations},
year = {2024},
url = {https://openreview.net/forum?id=Z59Rb5bPPP}
}