Spline-based Transformers

Published in Europen Conference on Computer Vision (ECCV), 2024

Abstract

We introduce Spline-based Transformers, a novel class of Transformer models that eliminate the need for positional encoding. Inspired by workflows using splines in computer animation, our Spline-based Transformers embed an input sequence of elements as a smooth trajectory in latent space. Overcoming drawbacks of positional encoding such as sequence length extrapolation, Spline-based Transformers also provide a novel way for users to interact with transformer latent spaces by directly manipulating the latent control points to create new latent trajectories and sequences. We demonstrate the superior performance of our approach in comparison to conventional positional encoding on a variety of datasets, ranging from synthetic 2D to large-scale real-world datasets of images, 3D shapes, and animations.

Project Page

Bibtex:

@InProceedings{ChandranSerifi2024,
    author={Chandran, Prashanth
    and Serifi, Agon
    and Gross, Markus
    and B{\"a}cher, Moritz},
    editor={Leonardis, Ale{\v{s}}
    and Ricci, Elisa
    and Roth, Stefan
    and Russakovsky, Olga
    and Sattler, Torsten
    and Varol, G{\"u}l},
    title={Spline-Based Transformers},
    booktitle={Computer Vision -- ECCV 2024},
    year={2025},
    publisher={Springer Nature Switzerland},
    pages={1--17},
    isbn={978-3-031-73016-0}
}