Overview of DDIS
We propose Diffusion-assisted Data-free Image Synthesis (DDIS), which guides a text-to-image (T2I) diffusion model to generate images that are closely aligned with the training-set distribution. Our approach tackles the misalignment problem that arises when the training set is directly substituted with images synthesized by a T2I diffusion model.
The goal of DDIS is to generate images that approximate the training-set distribution learned by \( f^*_\theta \) using a T2I diffusion model. First, we construct prompts \( \mathbf{y} \) from a Class Alignment Token (CAT) and the class label \( c \) provided with the model. Second, we apply Domain Alignment Guidance (DAG) to the noisy latent \( z_t \) at each time step \( t \), aligning image features with the BN layer statistics stored in \( f^*_\theta \). Finally, we decode the final image \( \hat{x}_0 \) from the guided latent \( \tilde{z}_0 \), forward it to \( f^*_\theta \), and optimize the CAT embedding with a Cross-Entropy loss so that it encodes features specific to the target class. (As in the figure above, DDIS can successfully synthesize the “tiger cat” class in the “art” domain.)
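Below is a minimal PyTorch sketch of the two signals described above: (i) a DAG-style loss that matches feature statistics of the synthesized image to the BatchNorm running statistics stored in \( f^*_\theta \) and guides the latent with its gradient, and (ii) the Cross-Entropy term used to optimize the CAT embedding. The diffusion denoiser, decoder, and prompt embedding here are hypothetical placeholders (not the actual T2I pipeline); only the guidance and optimization logic reflects the description above.

```python
# Sketch only: resnet18 stands in for the pretrained model f*_theta, and the
# decode / denoise_step placeholders stand in for the T2I diffusion components.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet18

f_theta = resnet18(num_classes=10).eval()            # stand-in for f*_theta
for p in f_theta.parameters():
    p.requires_grad_(False)

# --- Domain Alignment Guidance: match per-BN-layer feature statistics ----------
bn_losses = []

def register_bn_hooks(model):
    def hook(module, inputs, _):
        x = inputs[0]
        mu = x.mean(dim=(0, 2, 3))
        var = x.var(dim=(0, 2, 3), unbiased=False)
        bn_losses.append(F.mse_loss(mu, module.running_mean)
                         + F.mse_loss(var, module.running_var))
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            m.register_forward_hook(hook)

register_bn_hooks(f_theta)

def dag_loss(image):
    """Sum of statistic-matching losses collected by the BN hooks."""
    bn_losses.clear()
    f_theta(image)
    return torch.stack(bn_losses).sum()

# --- Hypothetical diffusion components (placeholders, not the real T2I model) ---
decode = nn.Sequential(nn.Upsample(scale_factor=8), nn.Conv2d(4, 3, 3, padding=1))

def denoise_step(z_t, t, prompt_emb):                 # placeholder reverse step
    return z_t - 0.01 * torch.randn_like(z_t)

# Learnable Class Alignment Token embedding appended to the class-label prompt.
cat_embedding = nn.Parameter(torch.randn(1, 768))
target_class = torch.tensor([3])                      # class label c given with the model

z_t = torch.randn(1, 4, 28, 28)
for t in reversed(range(50)):
    z_t = z_t.detach().requires_grad_(True)
    x_t = decode(z_t)                                 # rough decode used only for guidance
    g = torch.autograd.grad(dag_loss(x_t), z_t)[0]    # gradient of the DAG loss w.r.t. z_t
    z_t = denoise_step(z_t - 1.0 * g, t, cat_embedding)

# Final image -> f*_theta, then the Cross-Entropy term for the CAT embedding.
x_0 = decode(z_t.detach())
ce_loss = F.cross_entropy(f_theta(x_0), target_class)
# In the actual method this gradient reaches the CAT embedding through the
# differentiable T2I pipeline; with the placeholders above it is illustrative only.
```

In the real pipeline the guidance scale, the number of denoising steps, and the optimizer for the CAT embedding are hyperparameters of the method; the values above are arbitrary choices for illustration.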