CoordGAN: Self-Supervised Dense Correspondences Emerge from GANs

(CVPR 2022)

¹University of California San Diego, ²Nvidia
(* Work done while an intern at Nvidia)

A structure-texture disentangled GAN with a dense correspondence map. Each row shows the same structure code with different texture codes.

Combining structure codes learned from real images with textures learned from art images to generate images with diverse art styles.

Abstract

Recent advances show that Generative Adversarial Networks (GANs) can synthesize images with smooth variations along semantically meaningful latent directions, such as pose, expression, and layout. While this indicates that GANs implicitly learn pixel-level correspondences across images, few studies have explored how to extract them explicitly. In this work, we introduce Coordinate GAN (CoordGAN), a structure-texture disentangled GAN that learns a dense correspondence map for each generated image. We represent the correspondence maps of different images as warped coordinate frames transformed from a canonical coordinate frame; that is, the correspondence map, which describes the structure (e.g., the shape of a face), is controlled via a transformation. Hence, finding correspondences boils down to locating coordinates of different correspondence maps that are transformed from the same coordinates in the canonical frame. In CoordGAN, we sample a transformation to represent the structure of a synthesized instance, while an independent texture branch is responsible for rendering appearance details orthogonal to the structure. Our approach can also extract dense correspondence maps for real images by adding an encoder on top of the generator. We quantitatively demonstrate the quality of the learned dense correspondences through segmentation mask transfer on multiple datasets. We also show that the proposed generator achieves better structure and texture disentanglement compared to existing approaches.
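The correspondence lookup described in the abstract can be illustrated with a toy sketch. It assumes that each pixel of a correspondence map stores a 2D coordinate and that two pixels correspond when their stored coordinates are closest. The helper names (`canonical_grid`, `warp`, `find_match`) and the hand-written shift transformation are hypothetical stand-ins for the learned components and are not taken from the CoordGAN code.

```python
import math


def canonical_grid(height, width):
    """Canonical coordinate frame: an (x, y) pair in [-1, 1] for every pixel."""
    return [
        [(2 * c / (width - 1) - 1, 2 * r / (height - 1) - 1) for c in range(width)]
        for r in range(height)
    ]


def warp(grid, transform):
    """Apply a coordinate transformation to every entry of the grid.

    In CoordGAN the transformation is sampled and learned; here it is a
    plain Python function chosen by hand (an assumption for illustration).
    """
    return [[transform(x, y) for (x, y) in row] for row in grid]


def find_match(map_a, pixel_a, map_b):
    """Return the (row, col) of map_b whose coordinate is nearest to map_a[pixel_a]."""
    row_a, col_a = pixel_a
    target = map_a[row_a][col_a]
    best, best_dist = None, math.inf
    for r, row in enumerate(map_b):
        for c, coord in enumerate(row):
            dist = math.dist(target, coord)
            if dist < best_dist:
                best, best_dist = (r, c), dist
    return best


grid = canonical_grid(5, 5)
map_a = warp(grid, lambda x, y: (x, y))        # identity warp
map_b = warp(grid, lambda x, y: (x - 0.5, y))  # toy shift by one pixel column
match = find_match(map_a, (2, 1), map_b)
print(match)  # (2, 2)
```

In this toy setup the second map is shifted by exactly one column, so pixel (2, 1) of the first image matches pixel (2, 2) of the second. A brute-force search is used only for clarity; it is not meant to reflect how the lookup is implemented at scale.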

Identity-preserved Texture Swapping

Each row shows images generated with the same structure code but different texture codes. Correspondence maps are controlled by structure codes.

Structure Swapping

Each row shows images generated with the same texture code but different structure codes. Correspondence maps are controlled by structure codes.

Semantic Label Propagation

In each row, given one reference image along with its semantic labels, the proposed approach predicts its correspondence map and propagates its segmentation mask to other query images.
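Label propagation of this kind can be sketched as a nearest-neighbor transfer, assuming the correspondence maps of the reference and query images are already available as grids of 2D coordinates. The function name `propagate_labels` and the tiny hand-made maps below are hypothetical; how the maps are predicted for real images (the encoder) is outside this sketch.

```python
import math


def propagate_labels(ref_map, ref_labels, query_map):
    """Assign each query pixel the label of the reference pixel whose
    correspondence-map coordinate is nearest to the query pixel's coordinate."""
    # Flatten the reference into (coordinate, label) pairs.
    ref_points = [
        (coord, ref_labels[r][c])
        for r, row in enumerate(ref_map)
        for c, coord in enumerate(row)
    ]
    query_labels = []
    for row in query_map:
        out_row = []
        for coord in row:
            _, label = min(ref_points, key=lambda p: math.dist(p[0], coord))
            out_row.append(label)
        query_labels.append(out_row)
    return query_labels


# Toy 2x3 example: the query map is the reference map flipped horizontally.
ref_map = [
    [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0)],
    [(0.0, 1.0), (0.5, 1.0), (1.0, 1.0)],
]
ref_labels = [
    [0, 1, 1],
    [0, 0, 1],
]
query_map = [row[::-1] for row in ref_map]
query_labels = propagate_labels(ref_map, ref_labels, query_map)
print(query_labels)  # [[1, 1, 0], [1, 0, 0]]
```

Because the query map is a mirrored copy of the reference map, the propagated mask comes out mirrored as well, which is the behavior one would expect from a correct correspondence-based transfer.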

BibTeX

@article{mu2022coordgan,
  author  = {Mu, Jiteng and De Mello, Shalini and Yu, Zhiding
             and Vasconcelos, Nuno and Wang, Xiaolong and Kautz, Jan and Liu, Sifei},
  title   = {CoordGAN: Self-Supervised Dense Correspondences Emerge from GANs},
  journal = {arXiv preprint arXiv:2203.16521},
  year    = {2022}
}