DViTGAN: Training ViTGANs with Diffusion

Date
2024
Publisher
The Eurographics Association
Abstract
Recent research indicates that injecting instance noise via diffusion can effectively stabilize GAN training for image generation. Although ViTGAN, built on the Vision Transformer, offers certain performance advantages over traditional GANs, it still suffers from unstable training and insufficiently detailed generated images. In this paper, we therefore propose DViTGAN, a novel model that leverages a diffusion model to generate instance noise that facilitates ViTGAN training. Specifically, we employ forward diffusion to progressively generate noise following a Gaussian mixture distribution, and we inject the generated noise into the discriminator's input images. The generator incorporates the discriminator's feedback by backpropagating through the forward diffusion process, improving its performance. In addition, we observe that the ViTGAN generator lacks positional information, which weakens its context-modeling ability and slows convergence. To address this, we introduce Fourier embedding and relative positional encoding to enhance the model's expressive power. Experiments on multiple popular benchmarks demonstrate the effectiveness of the proposed model.
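The forward-diffusion noise injection the abstract describes can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the linear variance schedule, the function names, and the single-Gaussian noise are all assumptions (the paper uses a Gaussian mixture distribution), but the core idea is the same — diffuse both real and generated images to a sampled timestep before feeding them to the discriminator.

```python
import numpy as np

def make_alpha_bar(T=1000, beta_start=1e-4, beta_end=0.02):
    # Linear variance schedule (an illustrative assumption); the
    # cumulative product of (1 - beta_t) gives alpha_bar_t.
    betas = np.linspace(beta_start, beta_end, T)
    return np.cumprod(1.0 - betas)

def diffuse(x0, t, alpha_bar, rng):
    # Forward diffusion q(x_t | x_0): shrink the image toward zero and
    # add Gaussian noise, with the mix controlled by alpha_bar_t.
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

# In training, a timestep t would be sampled per batch and applied to
# both real and generated images before the discriminator sees them,
# so the discriminator's feedback backpropagates through this process.
```

Because the diffused sample is a differentiable function of `x0`, the generator can receive gradients through this noise injection when the same transform is written in an autodiff framework.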
CCS Concepts: Computing methodologies → Collision detection; Hardware → Sensors and actuators; PCB design and layout

        
@inproceedings{10.2312:pg.20241305,
  booktitle = {Pacific Graphics Conference Papers and Posters},
  editor    = {Chen, Renjie and Ritschel, Tobias and Whiting, Emily},
  title     = {{DViTGAN: Training ViTGANs with Diffusion}},
  author    = {Tong, Mengjun and Rao, Hong and Yang, Wenji and Chen, Shengbo and Zuo, Fang},
  year      = {2024},
  publisher = {The Eurographics Association},
  ISBN      = {978-3-03868-250-9},
  DOI       = {10.2312/pg.20241305}
}