Friday, September 23, 2022

Stable Diffusion Based Image Compression

Matthias Bühlmann (via Hacker News):

These examples make it quite evident that compressing these images with Stable Diffusion results in vastly superior image quality at smaller file sizes compared to JPG and WebP. This quality comes with some important caveats which must be considered, as I will explain in the evaluation section, but at first glance, this is a very promising option for aggressive lossy image compression.

[…]

The main algorithm of Stable Diffusion, which generates new images from short text descriptions, operates on this latent space representation of images. It starts with random noise in the latent space representation and then iteratively de-noises this latent space image by using the trained U-Net, which in simple terms outputs predictions of what it thinks it “sees” in that noise, similar to how we sometimes see shapes and faces when looking at clouds. When Stable Diffusion is used to generate images, this iterative de-noising step is guided by the third ML model, the text encoder, which gives the U-Net information about what it should try to see in the noise. For the experimental image codec presented here, the text encoder is not needed.
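To make the shape of that loop concrete, here is a toy sketch in plain Python. It is not Stable Diffusion's sampler: `toy_denoiser` is a hypothetical stand-in for the trained U-Net and simply "knows" the clean latent, whereas the real network has to infer it from what it learned during training. The step count and step size are likewise arbitrary assumptions. Only the structure (start from random noise, repeatedly subtract predicted noise) mirrors the description above.

```python
import random


def toy_denoiser(latent, target):
    """Hypothetical stand-in for the U-Net: predicts the noise present in `latent`.

    This toy version cheats by comparing against a known clean `target`.
    """
    return [x - t for x, t in zip(latent, target)]


def denoise(target, steps=50, step_size=0.2, seed=0):
    """Start from Gaussian noise and iteratively remove a fraction of the predicted noise."""
    rng = random.Random(seed)
    latent = [rng.gauss(0.0, 1.0) for _ in target]  # pure noise in "latent space"
    for _ in range(steps):
        predicted_noise = toy_denoiser(latent, target)
        latent = [x - step_size * n for x, n in zip(latent, predicted_noise)]
    return latent


target = [0.5, -1.0, 0.25, 2.0]  # made-up 4-value "clean latent"
result = denoise(target)
max_error = max(abs(r - t) for r, t in zip(result, target))
print(max_error)
```

Each pass shrinks the remaining noise by a constant factor here, so the latent converges on the target; in the real model the prediction is imperfect and the schedule is more involved.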

[…]

To use Stable Diffusion as an image compression codec, I investigated how the latent representation generated by the VAE could be efficiently compressed.
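As a rough illustration of what compressing a latent can involve, the sketch below applies uniform 8-bit quantization to a list of floats and reports the size arithmetic. It is a sketch under assumptions, not necessarily the scheme the article settles on: the function names are made up, the value range of -4 to 4 is a guess, and the 64×64×4 latent shape for a 512×512 RGB image is the Stable Diffusion v1 layout as I understand it.

```python
def quantize(latent, lo, hi, levels=256):
    """Map floats in [lo, hi] to integers 0..levels-1 (one byte each for 256 levels).

    Values outside [lo, hi] are clamped. The range is an assumption, not a measured one.
    """
    scale = (levels - 1) / (hi - lo)
    return bytes(min(levels - 1, max(0, round((x - lo) * scale))) for x in latent)


def dequantize(data, lo, hi, levels=256):
    """Approximate inverse of quantize(); the error is at most half a quantization step."""
    step = (hi - lo) / (levels - 1)
    return [lo + b * step for b in data]


latent = [-3.2, -0.5, 0.0, 0.7, 2.9]  # made-up latent values
packed = quantize(latent, -4.0, 4.0)
restored = dequantize(packed, -4.0, 4.0)

# Size arithmetic, assuming a 512x512 RGB source and a 64x64x4 latent.
rgb_bytes = 512 * 512 * 3             # uncompressed 8-bit RGB
latent_float_bytes = 64 * 64 * 4 * 4  # latent stored as 32-bit floats
latent_8bit_bytes = 64 * 64 * 4       # latent after 8-bit quantization
print(rgb_bytes, latent_float_bytes, latent_8bit_bytes)
```

Under those assumptions the quantized latent is 48 times smaller than the raw pixels, before any further entropy coding is applied.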

1 Comment


This is incredibly clever, and it gives me even more of a glimpse of the world of fungible content we're flying headlong into. I'm defining that as what you can see at the intersection of deep fakes and AI-generated content, where the quality of an image has nothing to do with its truth value, as it has up until now. We've transitioned from analog formats, which fuzz up when distorted, to digital formats, which self-destruct or at least artifact heavily when distorted, to AI formats, which will lose content and meaning when distorted but otherwise look perfectly realistic, because a mind like ours is processing them to look realistic.

I do think it's funny that we're using 5-gigabyte AI programs to compress and decompress KB-sized images, though.
