
When Model Knowledge meets Diffusion Model: Diffusion-assisted Data-free Image Synthesis with Alignment of Domain and Class

1Korea University 2Korea Institute of Science and Technology
3KAIST 4Kyung Hee University
*Equal Contribution
ICML 2025

Abstract

Open-source pre-trained models hold great potential for diverse applications, but their utility declines when their training data is unavailable. Data-Free Image Synthesis (DFIS) aims to generate images that approximate the learned data distribution of a pre-trained model without accessing the original data. However, existing DFIS methods produce samples that deviate from the training data distribution due to the lack of prior knowledge about natural images. To overcome this limitation, we propose DDIS, the first Diffusion-assisted Data-free Image Synthesis method that leverages a text-to-image diffusion model as a powerful image prior, improving synthetic image quality. DDIS extracts knowledge about the learned distribution from the given model and uses it to guide the diffusion model, enabling the generation of images that accurately align with the training data distribution. To achieve this, we introduce Domain Alignment Guidance (DAG) that aligns the synthetic data domain with the training data domain during the diffusion sampling process. Furthermore, we optimize a single Class Alignment Token (CAT) embedding to effectively capture class-specific attributes in the training dataset. Experiments on PACS and ImageNet demonstrate that DDIS outperforms prior DFIS methods by generating samples that better reflect the training data distribution, achieving SOTA performance in data-free applications.

Overview of DDIS

We propose Diffusion-assisted Data-free Image Synthesis (DDIS), which guides a T2I diffusion model to generate images that are closely aligned with the training-set distribution. Our approach tackles the misalignment problem that arises when directly substituting the training set with images synthesized by a T2I diffusion model.

The goal of DDIS is to generate images that approximate the training-set distribution learned by \( f^*_\theta \) using a text-to-image diffusion model. First, we construct a prompt \( \mathbf{y} \) from the Class Alignment Token (CAT) and the class label \( c \) provided with the model. Second, we apply Domain Alignment Guidance (DAG) to the noisy latent \( z_t \) at each time step \( t \), aligning image features with the BN-layer statistics stored in \( f^*_\theta \). Finally, we feed the final image \( \hat{x}_0 \), obtained from the guided latent \( \tilde{z}_0 \), into \( f^*_\theta \) and optimize the CAT embedding with a cross-entropy loss so that it encodes features specific to the target class. (As in the figure above, DDIS can successfully synthesize the “tiger cat” class in the “art” domain.)
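The sketch below illustrates these two components in PyTorch. It is a minimal, illustrative rendering and not the paper's exact implementation: `predict_x0`, `vae_decode`, the gradient-based guidance form, and `guidance_scale` are assumptions standing in for a standard latent-diffusion sampler.

```python
# Minimal sketch (assumptions noted) of the two DDIS components:
#  - Domain Alignment Guidance (DAG): steer the noisy latent so that features of the
#    decoded image match the BatchNorm running statistics stored in f*_theta.
#  - Class Alignment Token (CAT): a single token embedding updated with cross-entropy
#    on the classifier's prediction for the final image.
# `predict_x0`, `vae_decode`, and the guidance form/scale are placeholders, not the paper's code.
import torch
import torch.nn.functional as F


def bn_alignment_loss(classifier: torch.nn.Module, images: torch.Tensor) -> torch.Tensor:
    """Distance between batch feature statistics and the BN running statistics of f*_theta."""
    losses = []

    def make_hook(bn):
        def hook(module, inputs, output):
            feat = inputs[0]                                  # features entering this BN layer
            mean = feat.mean(dim=(0, 2, 3))
            var = feat.var(dim=(0, 2, 3), unbiased=False)
            losses.append(F.mse_loss(mean, bn.running_mean) +
                          F.mse_loss(var, bn.running_var))
        return hook

    handles = [m.register_forward_hook(make_hook(m))
               for m in classifier.modules()
               if isinstance(m, torch.nn.BatchNorm2d)]
    classifier(images)                                        # hooks populate `losses`
    for h in handles:
        h.remove()
    return torch.stack(losses).sum()


def dag_step(z_t, t, predict_x0, vae_decode, classifier, guidance_scale=1.0):
    """One guided denoising step: nudge z_t along the BN-alignment gradient (assumed form)."""
    z_t = z_t.detach().requires_grad_(True)
    x0_hat = vae_decode(predict_x0(z_t, t))                   # one-step x0 estimate, then decode
    loss = bn_alignment_loss(classifier, x0_hat)
    grad = torch.autograd.grad(loss, z_t)[0]
    return z_t - guidance_scale * grad                        # domain-aligned latent


def cat_loss(classifier, x0_hat, target_class):
    """Cross-entropy on the final image; its gradient flows back into the CAT embedding."""
    labels = torch.full((x0_hat.size(0),), target_class, device=x0_hat.device)
    return F.cross_entropy(classifier(x0_hat), labels)
```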


Quantitative Results of DDIS

We compare image quality against existing DFIS methods to evaluate whether the generated images approximate the distribution of the training set used to train the given model. For this evaluation, we use (1) Inception Score (IS) and Fréchet Inception Distance (FID), and (2) Precision and Recall (P&R), to rigorously measure the fidelity and diversity of the synthetic images.
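For reference, IS and FID can be computed with off-the-shelf tooling. The snippet below is only an illustration using `torchmetrics`; it is not necessarily the exact evaluation pipeline used in the paper, and the data loaders are placeholders.

```python
# Illustrative IS/FID computation with torchmetrics (assumed tooling, not the paper's pipeline).
import torch
from torchmetrics.image.fid import FrechetInceptionDistance
from torchmetrics.image.inception import InceptionScore

fid = FrechetInceptionDistance(feature=2048, normalize=True)   # normalize=True: float images in [0, 1]
inception = InceptionScore(normalize=True)


def evaluate(real_loader, fake_loader):
    """real_loader / fake_loader yield float image batches of shape (B, 3, H, W) in [0, 1]."""
    for real in real_loader:
        fid.update(real, real=True)
    for fake in fake_loader:
        fid.update(fake, real=False)
        inception.update(fake)
    is_mean, is_std = inception.compute()
    return {"FID": fid.compute().item(), "IS": is_mean.item()}
```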


Qualitative Results of DDIS

Qualitative comparison with various DFIS methods on the PACS (Art, Cartoon) and Style-Aligned datasets. Since the dataset used to train the classifier is unknown, existing DFIS methods must explore an extremely large image search space, which leads to generated images with artifacts that fail to capture the training dataset's domain properties. DDIS leverages DAG during diffusion sampling and optimizes the CAT embedding to synthesize images that accurately reflect the training dataset's domain and class attributes, producing images that closely resemble the original data.

[Qualitative comparison figures for the Art, Cartoon, and Style-Aligned domains (CFG and PAG sampling).]

Applications of DDIS

DDIS aims to enhance the utility of a given model by generating samples that approximate the distribution of its training data. Accordingly, we conduct experiments on Data-Free Knowledge Distillation (DFKD) and Data-Free Model Pruning using synthetic images, without direct access to the training data. Specifically, we synthesize 2,800 images for the PACS dataset and 100k images for ImageNet-1k for these experiments; a minimal DFKD sketch follows the results below.

[Application results: Data-Free Knowledge Distillation and Data-Free Model Pruning (CFG and PAG sampling).]
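As a concrete illustration of the DFKD setting, the sketch below trains a student to mimic the pre-trained teacher on DDIS-synthesized images only. The Hinton-style KL distillation loss, optimizer, and hyperparameters are assumptions for illustration, not the exact recipe used in these experiments.

```python
# Minimal data-free knowledge distillation sketch: the student only ever sees
# DDIS-synthesized images; model classes, the synthetic loader, and all
# hyperparameters below are placeholders, not the paper's exact settings.
import torch
import torch.nn.functional as F


def distill(teacher, student, synthetic_loader, epochs=100, lr=0.1, T=4.0):
    teacher.eval()
    opt = torch.optim.SGD(student.parameters(), lr=lr, momentum=0.9, weight_decay=5e-4)
    for _ in range(epochs):
        for images in synthetic_loader:                      # DDIS-generated images, no real data
            with torch.no_grad():
                t_logits = teacher(images)
            s_logits = student(images)
            # Soft-label KL distillation (Hinton et al.); temperature-scaled logits.
            loss = F.kl_div(F.log_softmax(s_logits / T, dim=1),
                            F.softmax(t_logits / T, dim=1),
                            reduction="batchmean") * (T * T)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return student
```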