New Generative AI Open-Source Models for Image Editing
Prompt-to-Prompt, InstructPix2Pix. How to detect AI-generated text.
Generative AI models, such as Stable Diffusion and DALL·E 2, enable not only image generation but also image editing. Generally, the process works as follows: start with an image you want to modify, draw a mask on it, and provide a text prompt to your generative model. The model then alters only the area under the mask according to your prompt.
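To make the role of the mask concrete, here is a toy sketch in plain Python. It only illustrates the contract described above (pixels under the mask come from the model, everything else stays untouched); a real model does far more than this. The function name `composite` and the nested-list image layout are assumptions made for illustration, not part of Stable Diffusion or DALL·E 2.

```python
def composite(original, generated, mask):
    """Combine two same-sized images pixel by pixel.

    Takes the pixel from `generated` where the mask is 1 (the area the
    user painted) and from `original` everywhere else.
    Images are nested lists: rows of pixel values (hypothetical layout).
    """
    return [
        [g if m else o for o, g, m in zip(o_row, g_row, m_row)]
        for o_row, g_row, m_row in zip(original, generated, mask)
    ]


# Tiny 2x2 "images": 1 = original pixel, 9 = pixel proposed by the model.
original = [[1, 1], [1, 1]]
generated = [[9, 9], [9, 9]]
mask = [[0, 1], [0, 0]]  # only the top-right pixel is masked

edited = composite(original, generated, mask)
```

Only the masked pixel changes in `edited`; the three unmasked pixels keep their original values.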
Editing with Stable Diffusion allows you to obtain images like the one below. All you need to do is define a mask around the object on the bench and run the model with a suitable prompt. The code for Stable Diffusion image editing is available on GitHub.
Image source: https://github.com/runwayml/stable-diffusion
Recent advancements in image editing have made it a powerful and useful tool in real-world applications. Today, I'd like to describe two new AI models with available code that can be used in existing projects or to create new startups.
Prompt-to-Prompt Image Editing with Cross-Attention Control
Existing generative AI methods, such as DALL·E and Stable Diffusion, edit an image through a user-supplied mask. Google's recent work, Prompt-to-Prompt, lets users change the image just by adjusting its prompt.
Image source: https://prompt-to-prompt.github.io/
To use this model, you provide a pair of prompts that differ in some words. Prompt-to-Prompt then generates a pair of images that reflect the difference between the prompts.
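An application built on this idea needs to know which words differ between the two prompts. A minimal sketch using Python's standard `difflib` module is below. The helper name `prompt_diff` and the whitespace word splitting are assumptions for illustration; the actual project works on model tokens rather than whitespace-separated words.

```python
import difflib


def prompt_diff(source_prompt, target_prompt):
    """List the word-level changes between two prompts.

    Returns (operation, source_words, target_words) tuples for every
    span that is not identical in both prompts.
    """
    src = source_prompt.split()
    tgt = target_prompt.split()
    matcher = difflib.SequenceMatcher(a=src, b=tgt)
    changes = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op != "equal":
            changes.append((op, src[i1:i2], tgt[j1:j2]))
    return changes


changes = prompt_diff(
    "a photo of a cat riding a bicycle",
    "a photo of a cat riding a car",
)
```

Here `changes` holds a single replacement of "bicycle" with "car", which is the part of the image a user would expect to change.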
The model can also change the weight of specific words in the prompt, strengthening or weakening their effect on the image.
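The post does not say how word weights are changed. Given the method's name (cross-attention control), one plausible sketch is below, assuming that weighting a word means scaling that word's column in a cross-attention map. The `attention` layout (one row per image location, one column per prompt token) and the function name `reweight_token` are assumptions for illustration, not the project's code.

```python
def reweight_token(attention, token_index, scale):
    """Return a copy of `attention` with one token's column scaled.

    `attention` is a nested list: one row per image location, one
    column per prompt token (hypothetical layout). Only the column at
    `token_index` is multiplied by `scale`; the input is not modified.
    """
    return [
        [w * scale if j == token_index else w for j, w in enumerate(row)]
        for row in attention
    ]


# Two image locations attending to two prompt tokens.
attention = [[0.5, 0.5], [0.25, 0.75]]

# Double the influence of the second token (index 1).
stronger = reweight_token(attention, token_index=1, scale=2.0)
```

A `scale` above 1 would strengthen the word's influence in this sketch, and a value between 0 and 1 would weaken it.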
Image source: https://prompt-to-prompt.github.io/
How to use it
If you use Stable Diffusion in your project, you can offer users another type of image editing. It can be implemented as Software-as-a-Service (SaaS) or a plugin for Photoshop or other design tools.
If you find a prompt and an accompanying image in a public image database, such as StockAI or LexicaArt, you can use this method to modify it.
One can turn this method into an image constructor. For example, if a user needs a generated image, instead of writing a prompt from scratch, they can start from a predefined image and prompt like “A [white] [cat] sitting on a [beach]”. Letting users change the words in square brackets to whatever they want allows them to easily create images of “A black bird sitting on a tree” or “A green frog sitting on a stone”.
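The prompt side of such an image constructor can be sketched in a few lines of Python. The function name `fill_template` and the rule that a missing or `None` replacement keeps the default word are assumptions made for this example; the resulting prompt would then be passed, together with the original prompt, to the editing model.

```python
import re

# A slot is any text wrapped in square brackets, e.g. "[white]".
SLOT = re.compile(r"\[([^\[\]]+)\]")


def fill_template(template, replacements):
    """Fill bracketed slots in order, left to right.

    A replacement of None, or a missing replacement, keeps the slot's
    default word. The brackets themselves are always removed.
    """
    values = iter(replacements)

    def substitute(match):
        value = next(values, None)
        return match.group(1) if value is None else value

    return SLOT.sub(substitute, template)


TEMPLATE = "A [white] [cat] sitting on a [beach]"

original_prompt = fill_template(TEMPLATE, [])
edited_prompt = fill_template(TEMPLATE, ["black", "bird", "tree"])
```

With these inputs, `original_prompt` is "A white cat sitting on a beach" and `edited_prompt` is "A black bird sitting on a tree", giving the pair of prompts the method needs.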
Project page | Code | Paper
InstructPix2Pix: Learning to Follow Image Editing Instructions
While the Prompt-to-Prompt method only works with generated images, the new InstructPix2Pix method works like ChatGPT for any image, including real photos. To modify an image, all you need is the initial image and a natural-language request, such as "make it evening" or "add two cats on the road".
Image source: https://www.timothybrooks.com/instruct-pix2pix/
Unlike ChatGPT, this model does not retain context. However, one could use a modified image as input for subsequent modifications.
Image source: https://www.timothybrooks.com/instruct-pix2pix/
How to use it
Implement an image editor that works like a chatbot: it remembers the edit history and lets users chain edits and roll them back.
This model might be a useful plugin for Photoshop or other design applications or services.
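The chat-style editor with edit history and rollback suggested above could be structured roughly as follows. This is a sketch under stated assumptions: `EditSession` and `fake_edit` are hypothetical names, and `fake_edit` is a stand-in that works on strings so the example runs without a model. In a real application, the edit function would call InstructPix2Pix with the current image and the instruction.

```python
from typing import Any, Callable, List, Tuple

# Hypothetical signature: (current image, instruction) -> edited image.
EditFn = Callable[[Any, str], Any]


class EditSession:
    """Keeps every intermediate image so edits can be chained and undone."""

    def __init__(self, image: Any, edit_fn: EditFn) -> None:
        self._edit_fn = edit_fn
        # The first entry is the untouched original and is never removed.
        self._history: List[Tuple[str, Any]] = [("<original>", image)]

    @property
    def current(self) -> Any:
        return self._history[-1][1]

    @property
    def instructions(self) -> List[str]:
        return [instruction for instruction, _ in self._history[1:]]

    def apply(self, instruction: str) -> Any:
        """Edit the latest image, since the model itself keeps no context."""
        result = self._edit_fn(self.current, instruction)
        self._history.append((instruction, result))
        return result

    def rollback(self, steps: int = 1) -> Any:
        """Undo the last `steps` edits, stopping at the original image."""
        if steps < 0:
            raise ValueError("steps must be non-negative")
        keep = max(1, len(self._history) - steps)
        del self._history[keep:]
        return self.current


def fake_edit(image: str, instruction: str) -> str:
    # Stand-in for a model call; strings play the role of images here.
    return f"{image} + {instruction}"


session = EditSession("photo", fake_edit)
session.apply("make it evening")
session.apply("add two cats on the road")
after_undo = session.rollback()  # back to the "make it evening" result
```

Storing each intermediate result makes rollback a cheap list operation, at the cost of keeping all intermediate images in memory.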
Project page | Code | Paper
News of the week
OpenAI released a model that can identify AI-generated text. It uses a fine-tuned version of GPT, which predicts one of five labels: "very unlikely", "unlikely", "unclear if it is", "possibly", or "likely" AI-generated.
Tool of the week
SceneryAI is an AI-powered image editing tool. It allows you to select the area of an image you want to edit, provide a prompt, and view the results. The service costs $19 per month and provides unlimited renders.
That is all for today. If you found this post useful, please share it with your friends and colleagues!