{"id":37123,"date":"2022-09-23T16:36:12","date_gmt":"2022-09-23T20:36:12","guid":{"rendered":"https:\/\/mjtsai.com\/blog\/?p=37123"},"modified":"2022-09-23T16:36:51","modified_gmt":"2022-09-23T20:36:51","slug":"stable-diffusion-based-image-compression","status":"publish","type":"post","link":"https:\/\/mjtsai.com\/blog\/2022\/09\/23\/stable-diffusion-based-image-compression\/","title":{"rendered":"Stable Diffusion Based Image Compression"},"content":{"rendered":"<p><a href=\"https:\/\/matthias-buehlmann.medium.com\/stable-diffusion-based-image-compresssion-6f1f0a399202\">Matthias B&uuml;hlmann<\/a> (via <a href=\"https:\/\/news.ycombinator.com\/item?id=32907494\">Hacker News<\/a>):<\/p>\n<blockquote cite=\"https:\/\/matthias-buehlmann.medium.com\/stable-diffusion-based-image-compresssion-6f1f0a399202\">\n<p>These examples make it quite evident that compressing these images with Stable Diffusion results in vastly superior image quality at a smaller file sizes compared to JPG and WebP. This quality comes with some important caveats which must be considered, as I will explain in the evaluation section, but at first glance, this is a very promising option for aggressive lossy image compression.<\/p>\n<p>[&#8230;]<\/p>\n<p>The main algorithm of Stable Diffusion, which generates new images from short text descriptions, operates on this latent space representation of images. It starts with random noise in the latent space representation and then iteratively de-noises this latent space image by using the trained U-Net, which in simple terms outputs predictions of what it thinks it &ldquo;sees&rdquo; in that noise, similar to how we sometimes see shapes and faces when looking at clouds. When Stable Diffusion is used to generate images, this iterative de-noising step is guided by the third ML model, the text encoder, which gives the U-Net information about what it should try to see in the noise. For the experimental image codec presented here, the text encoder is not needed.<\/p>\n<p>[&#8230;]<\/p>\n<p>To use Stable Diffusion as an image compression codec, I investigated how the latent representation generated by the VAE could be efficiently compressed.<\/p>\n<\/blockquote>\n\n<p>Previously:<\/p>\n<ul>\n<li><a href=\"https:\/\/mjtsai.com\/blog\/2022\/09\/23\/midjourney-and-stable-diffusion\/\">Midjourney and Stable Diffusion<\/a><\/li>\n<li><a href=\"https:\/\/mjtsai.com\/blog\/2017\/06\/22\/h-265hevc-and-heif\/\">H.265\/HEVC and HEIF<\/a><\/li>\n<li><a href=\"https:\/\/mjtsai.com\/blog\/2016\/07\/15\/lepton-image-compression\/\">Lepton Image Compression<\/a><\/li>\n<li><a href=\"https:\/\/mjtsai.com\/blog\/2014\/11\/25\/jpeg-image-compression\/\">JPEG Image Compression<\/a><\/li>\n<\/ul>","protected":false},"excerpt":{"rendered":"<p>Matthias B&uuml;hlmann (via Hacker News): These examples make it quite evident that compressing these images with Stable Diffusion results in vastly superior image quality at a smaller file sizes compared to JPG and WebP. This quality comes with some important caveats which must be considered, as I will explain in the evaluation section, but at [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"apple_news_api_created_at":"2022-09-23T20:36:14Z","apple_news_api_id":"328d9863-2ec7-4a94-9728-be77e622e688","apple_news_api_modified_at":"2022-09-23T20:36:54Z","apple_news_api_revision":"AAAAAAAAAAAAAAAAAAAAAA==","apple_news_api_share_url":"https:\/\/apple.news\/AMo2YYy7HSpSXKL535iLmiA","apple_news_coverimage":0,"apple_news_coverimage_caption":"","apple_news_is_hidden":false,"apple_news_is_paid":false,"apple_news_is_preview":false,"apple_news_is_sponsored":false,"apple_news_maturity_rating":"","apple_news_metadata":"\"\"","apple_news_pullquote":"","apple_news_pullquote_position":"","apple_news_slug":"","apple_news_sections":"\"\"","apple_news_suppress_video_url":false,"apple_news_use_image_component":false,"footnotes":""},"categories":[2],"tags":[1351,357,619,2281],"class_list":["post-37123","post","type-post","status-publish","format-standard","hentry","category-technology","tag-artificial-intelligence","tag-compression","tag-graphics","tag-stable-diffusion"],"apple_news_notices":[],"_links":{"self":[{"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/posts\/37123","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/comments?post=37123"}],"version-history":[{"count":2,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/posts\/37123\/revisions"}],"predecessor-version":[{"id":37125,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/posts\/37123\/revisions\/37125"}],"wp:attachment":[{"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/media?parent=37123"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/categories?post=37123"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/tags?post=37123"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}