{"id":42070,"date":"2024-02-09T19:29:44","date_gmt":"2024-02-10T00:29:44","guid":{"rendered":"https:\/\/mjtsai.com\/blog\/?p=42070"},"modified":"2024-02-09T19:29:44","modified_gmt":"2024-02-10T00:29:44","slug":"mllm-guided-image-editing-mgie","status":"publish","type":"post","link":"https:\/\/mjtsai.com\/blog\/2024\/02\/09\/mllm-guided-image-editing-mgie\/","title":{"rendered":"MLLM-Guided Image Editing (MGIE)"},"content":{"rendered":"<p><a href=\"https:\/\/www.theverge.com\/2024\/2\/7\/24065125\/apple-generative-ai-image-editing-mgie-open-source-model\">Emilia David<\/a>:<\/p>\n<blockquote cite=\"https:\/\/www.theverge.com\/2024\/2\/7\/24065125\/apple-generative-ai-image-editing-mgie-open-source-model\"><p>Apple researchers <a href=\"https:\/\/arxiv.org\/pdf\/2309.17102.pdf\">released a new model<\/a> that lets users describe in plain language what they want to change in a photo without ever touching photo editing software.<\/p><p>The MGIE model, which Apple worked on with the University of California, Santa Barbara, can crop, resize, flip, and add filters to images all through text prompts. <\/p><p>MGIE, which stands for MLLM-Guided Image Editing, can be applied to simple and more complex image editing tasks like modifying specific objects in a photo to make them a different shape or come off brighter. The model blends two different uses of multimodal language models. First, it learns how to interpret user prompts. Then it &ldquo;imagines&rdquo; what the edit would look like (asking for a bluer sky in a photo becomes bumping up the brightness on the sky portion of an image, for example).<\/p><\/blockquote>\n\n<p><a href=\"https:\/\/appleinsider.com\/articles\/24\/02\/07\/apple-throws-its-hat-into-the-ai-generated-image-ring\">Amber Neely<\/a>:<\/p>\n<blockquote cite=\"https:\/\/appleinsider.com\/articles\/24\/02\/07\/apple-throws-its-hat-into-the-ai-generated-image-ring\">\n<p>MGIE is open-source and available on GitHub for anyone to try. 
The <a href=\"https:\/\/github.com\/apple\/ml-mgie\">GitHub page<\/a> allows users to snag the code, data, and pre-trained models. <\/p>\n<\/blockquote>\n\n<p>Previously:<\/p>\n<ul>\n<li><a href=\"https:\/\/mjtsai.com\/blog\/2023\/12\/29\/apples-ferret-mllm\/\">Apple&rsquo;s Ferret MLLM<\/a><\/li>\n<li><a href=\"https:\/\/mjtsai.com\/blog\/2022\/12\/02\/stable-diffusion-with-core-ml-on-apple-silicon\/\">Stable Diffusion With Core ML on Apple Silicon<\/a><\/li>\n<\/ul>","protected":false},"excerpt":{"rendered":"<p>Emilia David: Apple researchers released a new model that lets users describe in plain language what they want to change in a photo without ever touching photo editing software. The MGIE model, which Apple worked on with the University of California, Santa Barbara, can crop, resize, flip, and add filters to images all through text prompts. [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"apple_news_api_created_at":"2024-02-10T00:29:46Z","apple_news_api_id":"d226ddd1-6ff2-48e4-98a7-dd13a78eea8a","apple_news_api_modified_at":"2024-02-10T00:29:47Z","apple_news_api_revision":"AAAAAAAAAAD\/\/\/\/\/\/\/\/\/\/w==","apple_news_api_share_url":"https:\/\/apple.news\/A0ibd0W_ySOSYp90Tp47qig","apple_news_coverimage":0,"apple_news_coverimage_caption":"","apple_news_is_hidden":false,"apple_news_is_paid":false,"apple_news_is_preview":false,"apple_news_is_sponsored":false,"apple_news_maturity_rating":"","apple_news_metadata":"\"\"","apple_news_pullquote":"","apple_news_pullquote_position":"","apple_news_slug":"","apple_news_sections":"\"\"","apple_news_suppress_video_url":false,"apple_news_use_image_component":false,"footnotes":""},"categories":[2],"tags":[38,1351,619,74],"class_list":["post-42070","post","type-post","status-publish","format-standard","hentry","category-technology","tag-apple","tag-artificial-intelligence","tag-graphics","tag-opensource"],"apple_news_notices":[],"_links":{"self":[{"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/posts\/42070","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/comments?post=42070"}],"version-history":[{"count":1,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/posts\/42070\/revisions"}],"predecessor-version":[{"id":42071,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/posts\/42070\/revisions\/42071"}],"wp:attachment":[{"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/media?parent=42070"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/categories?post=42070"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/tags?post=42070"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}