Quantised Global Autoencoder: A Holistic Approach to Representing Visual Data
Date
2025
Publisher
The Eurographics Association
Abstract
Quantised autoencoders usually split images into local patches, each encoded by one token. This representation is potentially inefficient, as the same number of tokens is spent per region, regardless of the visual information content in that region. To mitigate the uneven distribution of information content, modern architectures provide an adaptive discretisation or add an attention mechanism to the autoencoder to infuse global information into the local tokens. Despite these improvements, tokens are still associated with a local image region. In contrast, our method is inspired by spectral decompositions, which transform an input signal into a superposition of global frequencies. Taking the data-driven perspective, we train an encoder that produces a combination of tokens that are then decoded jointly, going beyond the simple linear superposition of spectral decompositions. We achieve this global description with an efficient transpose operation between features and channels, and we demonstrate how our global and holistic representation improves compression and can boost downstream tasks like generation.
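The contrast the abstract draws between local patch tokens and global spectral components can be illustrated with a small sketch. It is not the paper's learned encoder: a 2D discrete Fourier transform stands in for a generic spectral decomposition, and the 8×8 random image, the 4×4 patch size, and the helper name `patch_tokens` are assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.random((8, 8))  # toy greyscale image (illustrative assumption)


def patch_tokens(img, p=4):
    """Split an image into non-overlapping p x p patches, one row per patch.

    Hypothetical helper: it mimics the 'one token per local patch' layout
    without any learned encoder or quantisation.
    """
    h, w = img.shape
    return img.reshape(h // p, p, w // p, p).transpose(0, 2, 1, 3).reshape(-1, p * p)


# Perturb a single pixel and see which parts of each representation react.
perturbed = image.copy()
perturbed[3, 5] += 1.0

# Local view: only the patch containing the pixel changes.
local = patch_tokens(image)
local_perturbed = patch_tokens(perturbed)
patches_changed = int((np.abs(local_perturbed - local).sum(axis=1) > 0).sum())

# Spectral view: every DFT coefficient is a sum over all pixels,
# so the same single-pixel change reaches every coefficient.
coeffs = np.fft.fft2(image)
coeffs_perturbed = np.fft.fft2(perturbed)
all_coeffs_changed = bool((np.abs(coeffs_perturbed - coeffs) > 1e-9).all())

# The image is recovered as a linear superposition of global frequencies.
reconstruction_ok = bool(np.allclose(np.fft.ifft2(coeffs).real, image))

print(patches_changed, all_coeffs_changed, reconstruction_ok)
```

In this toy setting one of the four patch rows changes, whereas all 64 spectral coefficients do. According to the abstract, the method replaces the fixed linear superposition with learned tokens that are decoded jointly.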
Description
CCS Concepts: Computing methodologies → Image compression; Machine learning algorithms
@inproceedings{10.2312:vmv.20251231,
booktitle = {Vision, Modeling, and Visualization},
editor = {Egger, Bernhard and Günther, Tobias},
title = {{Quantised Global Autoencoder: A Holistic Approach to Representing Visual Data}},
author = {Elsner, Tim and Usinger, Paula and Czech, Victor and Kobsik, Gregor and He, Yanjiang and Lim, Isaak and Kobbelt, Leif},
year = {2025},
publisher = {The Eurographics Association},
ISBN = {978-3-03868-294-3},
DOI = {10.2312/vmv.20251231}
}