Trajeglish: Traffic Modeling as Next-Token Prediction

International Conference on Learning Representations (ICLR 2024)

Jonah Philion (1, 2, 3)    Xue Bin Peng (1, 4)    Sanja Fidler (1, 2, 3)

(1) NVIDIA    (2) University of Toronto    (3) Vector Institute    (4) Simon Fraser University



Abstract

A longstanding challenge for self-driving development is simulating dynamic driving scenarios seeded from recorded driving logs. In pursuit of this functionality, we apply tools from discrete sequence modeling to model how vehicles, pedestrians and cyclists interact in driving scenarios. Using a simple data-driven tokenization scheme, we discretize trajectories to centimeter-level resolution using a small vocabulary. We then model the multi-agent sequence of discrete motion tokens with a GPT-like encoder-decoder that is autoregressive in time and takes into account intra-timestep interaction between agents. Scenarios sampled from our model exhibit state-of-the-art realism; our model tops the Waymo Sim Agents Benchmark, surpassing prior work along the realism meta metric by 3.3% and along the interaction metric by 9.9%. We ablate our modeling choices in full autonomy and partial autonomy settings, and show that the representations learned by our model can quickly be adapted to improve performance on nuScenes. We additionally evaluate the scalability of our model with respect to parameter count and dataset size, and use density estimates from our model to quantify the saliency of context length and intra-timestep interaction for the traffic modeling task.
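To make the idea of discretizing a trajectory with a small vocabulary concrete, here is a minimal sketch. It is not the tokenizer from the paper. It assumes a hypothetical vocabulary built as a uniform grid of 2D displacements, and it ignores heading and agent-local frames. The names `build_vocabulary`, `tokenize`, and `detokenize`, as well as the grid size and step range, are illustrative assumptions. With this coarse grid the resolution is several centimeters, not the resolution reported in the paper.

```python
import numpy as np


def build_vocabulary(max_step=0.5, bins=7):
    """Hypothetical vocabulary: a uniform grid of (dx, dy) displacements in meters."""
    axis = np.linspace(-max_step, max_step, bins)
    dx, dy = np.meshgrid(axis, axis, indexing="ij")
    return np.stack([dx.ravel(), dy.ravel()], axis=1)  # shape (bins * bins, 2)


def tokenize(trajectory, vocab):
    """Greedily map a (T, 2) trajectory to T - 1 token ids.

    Each token is chosen relative to the position reconstructed so far,
    so quantization error is corrected at the next step instead of
    accumulating over time.
    """
    state = np.asarray(trajectory[0], dtype=float)
    tokens = []
    for target in trajectory[1:]:
        residual = target - state
        idx = int(np.argmin(np.linalg.norm(vocab - residual, axis=1)))
        tokens.append(idx)
        state = state + vocab[idx]
    return np.array(tokens, dtype=int)


def detokenize(start, tokens, vocab):
    """Rebuild a (T, 2) trajectory from a start position and token ids."""
    steps = vocab[tokens]
    return np.vstack([start, start + np.cumsum(steps, axis=0)])
```

Calling `tokenize` on a recorded trajectory yields a sequence of integer ids, which is the kind of input a next-token model consumes; `detokenize` maps sampled ids back to positions.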

Paper: [PDF]       Webpage: [Link]       Preprint: [arXiv]

Bibtex

@inproceedings{
	philion2024trajeglish,
	title = {Trajeglish: Traffic Modeling as Next-Token Prediction},
	author = {Jonah Philion and Xue Bin Peng and Sanja Fidler},
	booktitle = {The Twelfth International Conference on Learning Representations},
	year = {2024},
	url = {https://openreview.net/forum?id=Z59Rb5bPPP}
}