Latent Image Exploration

Latent space exploration is the process of sampling a point in latent space and then incrementally changing that latent representation. A common application is generating animations: each sampled point is fed to a decoder, and the decoded image is stored as a frame of the final animation.
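As a rough illustration of this idea, the sketch below takes a small random walk through a 16-dimensional latent space and decodes every point into a frame. The names `latent_walk` and `toy_decode` are hypothetical, and `toy_decode` is only a stand-in (a fixed random projection followed by `tanh`), not a real image decoder; in practice a trained decoder would take its place.

```python
import numpy as np

LATENT_DIM = 16  # assumed toy dimensionality, chosen only for illustration

# Fixed random weights so the stand-in decoder is deterministic.
_W = np.random.default_rng(0).normal(size=(64, LATENT_DIM))


def toy_decode(latent):
    """Hypothetical stand-in for a decoder: latent vector -> 8x8 'image' in [-1, 1]."""
    return np.tanh(_W @ latent).reshape(8, 8)


def latent_walk(start, n_steps, step_size, rng):
    """Return an array of shape (n_steps + 1, dim): a random walk beginning at `start`."""
    points = [start]
    for _ in range(n_steps):
        # Each step is a small Gaussian perturbation of the previous point.
        step = rng.normal(size=start.shape) * step_size
        points.append(points[-1] + step)
    return np.stack(points)


rng = np.random.default_rng(42)
start = rng.normal(size=LATENT_DIM)       # sample an initial latent point
path = latent_walk(start, n_steps=30, step_size=0.05, rng=rng)
frames = [toy_decode(z) for z in path]    # one decoded frame per latent point
```

Because each step is small, consecutive frames differ only slightly, which is what makes the sequence usable as an animation.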

Generative image models learn a "latent manifold" of the visual world: a low-dimensional vector space where each point maps to an image. Going from such a point on the manifold back to a displayable image is called "decoding". In a Stable Diffusion model, this is handled by the "decoder" model.

This latent manifold of images is continuous and interpolative, meaning that:

1. Moving a little on the manifold only changes the corresponding image a little (continuity).

2. For any two points A and B on the manifold (i.e. any two images), it is possible to move from A to B via a path where each intermediate point is also on the manifold (i.e. is also a valid image). These intermediate points are called "interpolations" between the two starting images.
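A path between two latent points can be sketched with plain NumPy. The example below shows linear interpolation and, as an alternative that is often used for Gaussian-distributed latents, spherical interpolation. The function names and the 16-dimensional toy vectors are assumptions made for illustration; the text above does not prescribe a particular interpolation scheme.

```python
import numpy as np


def lerp(a, b, t):
    """Linear interpolation: t=0 gives a, t=1 gives b."""
    return (1.0 - t) * a + t * b


def slerp(a, b, t, eps=1e-7):
    """Spherical interpolation along the arc between the directions of a and b."""
    a_unit = a / np.linalg.norm(a)
    b_unit = b / np.linalg.norm(b)
    # Angle between the two vectors; clip guards against rounding errors.
    omega = np.arccos(np.clip(np.dot(a_unit, b_unit), -1.0, 1.0))
    if np.sin(omega) < eps:
        # Nearly parallel vectors: fall back to linear interpolation.
        return lerp(a, b, t)
    return (np.sin((1.0 - t) * omega) * a + np.sin(t * omega) * b) / np.sin(omega)


def interpolation_path(a, b, n_points, fn=slerp):
    """Return n_points latents from a to b (inclusive), shape (n_points, dim)."""
    return np.stack([fn(a, b, t) for t in np.linspace(0.0, 1.0, n_points)])


rng = np.random.default_rng(0)
point_a = rng.normal(size=16)  # toy stand-in for the latent of image A
point_b = rng.normal(size=16)  # toy stand-in for the latent of image B
path = interpolation_path(point_a, point_b, n_points=5)
```

Decoding each row of `path` would give the sequence of interpolated images between A and B.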

Stable Diffusion isn't just an image model, though; it is also a natural language model. It has two latent spaces: the image representation space learned by the encoder used during training, and the prompt latent space, which is learned using a combination of pretraining and training-time fine-tuning.

Introduction to how Stable Diffusion works. post