Quantised Global Autoencoder: A Holistic Approach to Representing Visual Data

Date
2025
Publisher
The Eurographics Association
Abstract
Quantised autoencoders usually split images into local patches, each encoded by one token. This representation is potentially inefficient, as the same number of tokens is spent per region, regardless of the visual information content in that region. To mitigate this uneven distribution of information content, modern architectures provide an adaptive discretisation or add an attention mechanism to the autoencoder to infuse global information into the local tokens. Despite these improvements, tokens are still associated with a local image region. In contrast, our method is inspired by spectral decompositions, which transform an input signal into a superposition of global frequencies. Taking a data-driven perspective, we train an encoder that produces a combination of tokens that are then decoded jointly, going beyond the simple linear superposition of spectral decompositions. We achieve this global description with an efficient transpose operation between features and channels and demonstrate how our global and holistic representation improves compression and can boost downstream tasks like generation.
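
The following is a minimal, hypothetical PyTorch sketch of the axis-transposed quantisation idea described in the abstract. The class name GlobalTokenQuantiser, all tensor shapes, and the codebook size are assumptions for illustration; the paper's actual architecture, training losses, and joint decoder are not reproduced. The sketch only shows the shift from quantising per-patch feature vectors of shape (N, C) to quantising per-channel vectors of shape (C, N), so that each token describes a pattern spanning the whole image.

import torch
import torch.nn as nn

class GlobalTokenQuantiser(nn.Module):
    """Hypothetical sketch: quantise along the transposed (channel) axis so each token is global."""

    def __init__(self, spatial_dim=16 * 16, codebook_size=1024):
        super().__init__()
        # Each codebook entry lives in the spatial dimension, so a single token
        # encodes a pattern covering all image positions rather than one patch.
        self.codebook = nn.Embedding(codebook_size, spatial_dim)

    def forward(self, features):
        # features: (B, C, H, W) feature map from any convolutional encoder
        b, c, h, w = features.shape
        local = features.reshape(b, c, h * w).transpose(1, 2)   # (B, N, C): usual per-patch tokens
        tokens = local.transpose(1, 2)                          # (B, C, N): one token per channel
        # squared Euclidean distances to all codebook entries (standard VQ lookup)
        dists = (tokens.pow(2).sum(-1, keepdim=True)
                 + self.codebook.weight.pow(2).sum(-1)
                 - 2 * tokens @ self.codebook.weight.t())       # (B, C, K)
        indices = dists.argmin(dim=-1)                          # (B, C): one code index per global token
        quantised = self.codebook(indices)                      # (B, C, N)
        # straight-through estimator so gradients still reach the encoder
        quantised = tokens + (quantised - tokens).detach()
        return quantised.reshape(b, c, h, w), indices

In a full model, the quantised map would be passed to a decoder that reconstructs the image jointly from all tokens, and the usual VQ codebook and commitment losses would be added during training; those standard components are omitted here.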
Description

CCS Concepts: Computing methodologies → Image compression; Machine learning algorithms

        
@inproceedings{10.2312:vmv.20251231,
  booktitle = {Vision, Modeling, and Visualization},
  editor    = {Egger, Bernhard and Günther, Tobias},
  title     = {{Quantised Global Autoencoder: A Holistic Approach to Representing Visual Data}},
  author    = {Elsner, Tim and Usinger, Paula and Czech, Victor and Kobsik, Gregor and He, Yanjiang and Lim, Isaak and Kobbelt, Leif},
  year      = {2025},
  publisher = {The Eurographics Association},
  ISBN      = {978-3-03868-294-3},
  DOI       = {10.2312/vmv.20251231}
}