Syntha AI
New Generative AI Open-Source Models for Image Editing
Prompt-to-Prompt, InstructPix2Pix. How to detect AI-generated text.
Generative AI models such as Stable Diffusion and DALL·E 2 enable not only image generation but also image editing. The process generally works as follows: you start with the image you want to modify, draw a mask over the region to change, and provide a text prompt to your generative model. The model then alters only the area under the mask according to your prompt.
![](https://media.beehiiv.com/cdn-cgi/image/fit=scale-down,format=auto,onerror=redirect,quality=80/uploads/asset/file/31428508-a1fe-4223-bf57-a118845116f1/dsyntha_robot_editing_the_painting_it_created_vibrant_neon_colo_88d8b925-10f0-4e7e-9667-c55b45cb157c.png)
Editing with Stable Diffusion lets you obtain images like the one below. All you need to do is define a mask around the object on the bench and run the model with the right prompt. The code for Stable Diffusion image editing is available in the runwayml/stable-diffusion repository on GitHub.
![](https://media.beehiiv.com/cdn-cgi/image/fit=scale-down,format=auto,onerror=redirect,quality=80/uploads/asset/file/1e7280b5-12e2-4d4b-aabc-2e8055557a3d/image.png)
Image source: https://github.com/runwayml/stable-diffusion
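Conceptually, the mask confines the edit: pixels under the mask come from the model's output, while everything else is copied from the original image. Here is a toy NumPy sketch of that compositing step — not the actual diffusion code, just an illustration of why the unmasked area stays untouched:

```python
import numpy as np

# Toy 4x4 grayscale "image" and a binary mask covering the top-left 2x2 block.
original = np.zeros((4, 4))
generated = np.ones((4, 4))   # stand-in for the model's inpainted output
mask = np.zeros((4, 4))
mask[:2, :2] = 1.0

# Only the masked region takes pixels from the generated image.
edited = mask * generated + (1.0 - mask) * original

print(edited[:2, :2].sum())   # 4.0 -- masked area was changed
print(edited[2:, 2:].sum())   # 0.0 -- unmasked area is untouched
```

Real inpainting pipelines do this blending in latent space at every denoising step, but the principle is the same: the prompt can only affect what the mask exposes.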
Recent advancements in image editing have made it a powerful and useful tool in real-world applications. Today, I'd like to describe two new AI models with available code that can be used in existing projects or to create new startups.
Prompt-to-Prompt Image Editing with Cross-Attention Control
Existing generative AI methods, such as DALL·E and Stable Diffusion, support image editing when you provide a mask on the image. Google's recent work, Prompt-to-Prompt, lets users change an image by adjusting only its prompt.
![](https://media.beehiiv.com/cdn-cgi/image/fit=scale-down,format=auto,onerror=redirect,quality=80/uploads/asset/file/ccc4f990-d1ff-4bb8-9af6-5a285ca31290/image.png)
Image source: https://prompt-to-prompt.github.io/
To use this model, you provide a pair of prompts with some differences between them. Prompt-to-Prompt then generates a pair of images that reflect the difference between the prompts.
The model can also adjust the weights of specific words in the prompt, strengthening or weakening their influence on the resulting image.
![Source: https://prompt-to-prompt.github.io/](https://media.beehiiv.com/cdn-cgi/image/fit=scale-down,format=auto,onerror=redirect,quality=80/uploads/asset/file/b80eb2a1-564d-4637-896f-f8f069661a6a/99_imagen_results_web-04.png)
Image source: https://prompt-to-prompt.github.io/
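Since the method keys its edits to where the two prompts differ, a small helper for locating the swapped words is handy when building a UI around it. The `diff_words` function below is a hypothetical utility for illustration, not code from the official repository:

```python
def diff_words(prompt_a: str, prompt_b: str):
    """Return (index, word_a, word_b) tuples where same-length prompts differ."""
    a, b = prompt_a.split(), prompt_b.split()
    if len(a) != len(b):
        raise ValueError("word-swap edits assume prompts of equal length")
    return [(i, wa, wb) for i, (wa, wb) in enumerate(zip(a, b)) if wa != wb]

swaps = diff_words("a photo of a cat riding a bike",
                   "a photo of a dog riding a bike")
print(swaps)  # [(4, 'cat', 'dog')]
```

In the paper, exactly these positions are where the cross-attention maps are swapped, while the attention for the unchanged words is reused to keep the rest of the image stable.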
How to use it
If you use Stable Diffusion in your project, you can offer users another type of image editing. It can be implemented as Software-as-a-Service (SaaS) or a plugin for Photoshop or other design tools.
If you find a prompt and its accompanying image in a public image database, such as StockAI or LexicaArt, you can use this method to modify it.
You can also turn this method into an image constructor. For example, if a user needs a generated image, instead of writing prompts from scratch they can start from a predefined image and a prompt like “A [white] [cat] sitting on a [beach]”. Letting users change the words in square brackets to whatever they want allows them to easily create images of “A black bird sitting on a tree” or “A green frog sitting on a stone”.
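One way to implement such a constructor is a simple template filler. The sketch below uses named slots (e.g. `[color]`) instead of the bracketed default words shown above; `fill_template` is a hypothetical helper for illustration:

```python
import re

def fill_template(template: str, values: dict) -> str:
    """Replace each [slot] in the template with the user's chosen word."""
    return re.sub(r"\[(\w+)\]", lambda m: values[m.group(1)], template)

template = "A [color] [animal] sitting on a [place]"
prompt = fill_template(template, {"color": "black", "animal": "bird", "place": "tree"})
print(prompt)  # A black bird sitting on a tree
```

The filled-in string becomes the second prompt of the Prompt-to-Prompt pair, with the template's default wording serving as the first.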
Project page | Code | Paper
InstructPix2Pix: Learning to Follow Image Editing Instructions
While the Prompt-to-Prompt method only works with generated images, the new InstructPix2Pix method can act like ChatGPT for any image, including real photographs. To modify an image, all you need is the initial image and a human-style request, such as "make it evening" or "add two cats on the road".
![](https://media.beehiiv.com/cdn-cgi/image/fit=scale-down,format=auto,onerror=redirect,quality=80/uploads/asset/file/c47591c6-8435-4dbe-99ab-c540e412ebc5/image.png)
Image source: https://www.timothybrooks.com/instruct-pix2pix/
Unlike ChatGPT, this model does not retain context between requests. However, you can use a modified image as the input for subsequent modifications.
![Source: https://www.timothybrooks.com/instruct-pix2pix/](https://media.beehiiv.com/cdn-cgi/image/fit=scale-down,format=auto,onerror=redirect,quality=80/uploads/asset/file/6e7fb422-f2c0-43f5-9e62-4a445f7ffa3b/abbey.jpg)
Image source: https://www.timothybrooks.com/instruct-pix2pix/
How to use it
Implement a chatbot-like image editor that remembers the edit history and supports sequences of edits and rollbacks.
This model might also be a useful plugin for Photoshop or other design applications and services.
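The edit-history bookkeeping such an editor needs can be sketched as below, with `apply_edit` standing in for an InstructPix2Pix model call. This is an illustrative session class, not part of the InstructPix2Pix codebase:

```python
class EditSession:
    """Tracks a chain of instruction edits so the user can roll back."""

    def __init__(self, image):
        self.history = [image]        # history[0] is the original image
        self.instructions = []

    def edit(self, instruction, apply_edit):
        # apply_edit(image, instruction) stands in for a model inference;
        # each result is appended so earlier states stay recoverable.
        new_image = apply_edit(self.history[-1], instruction)
        self.history.append(new_image)
        self.instructions.append(instruction)
        return new_image

    def rollback(self):
        """Undo the most recent edit and return the previous image."""
        if len(self.history) > 1:
            self.history.pop()
            self.instructions.pop()
        return self.history[-1]

# Toy usage: "images" are strings and the "model" just annotates them.
session = EditSession("photo")
session.edit("make it evening", lambda img, ins: f"{img}+[{ins}]")
session.edit("add two cats", lambda img, ins: f"{img}+[{ins}]")
print(session.history[-1])   # photo+[make it evening]+[add two cats]
print(session.rollback())    # photo+[make it evening]
```

Because the model takes the previous output as its next input, this stack of intermediate images is all you need to offer ChatGPT-style multi-turn editing on top of a context-free model.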
Project page | Code | Paper
News of the week
OpenAI released a model that can identify AI-generated text. It uses a fine-tuned version of GPT, which predicts one of five labels: "very unlikely", "unlikely", "unclear if it is", "possibly", or "likely" AI-generated.
![](https://media.beehiiv.com/cdn-cgi/image/fit=scale-down,format=auto,onerror=redirect,quality=80/uploads/asset/file/74d61bdd-bea9-4edd-9c62-a4edc9aa6c0b/image.png)
Tool of the week
SceneryAI is an AI-powered image editing tool. It allows you to select the area of an image you want to edit, provide a prompt, and view the results. The service costs $19 per month and provides unlimited renders.
![https://sceneryai.com/images/home/header-bg-22.png](https://media.beehiiv.com/cdn-cgi/image/fit=scale-down,format=auto,onerror=redirect,quality=80/uploads/asset/file/47ab6f7d-3a63-4ba3-ad0c-2eaca50b94cc/header-bg-22.png)
That is all for today. If you found this post useful, please share it with your friends and colleagues!