DeepFloyd IF: generate images with text

New model by Stability AI + DeepFloyd lab

Denis Volkhonskiy
May 04, 2023

If you've ever used generative AI models for images, you know they are bad at rendering text on objects in an image. If you want to generate a restaurant with a sign, good luck. Models like Stable Diffusion, Midjourney, and DALL-E 2 will heavily distort the text you ask for. The new model DeepFloyd IF, from Stability AI and the DeepFloyd research lab, can finally generate images with text. Correct text. Let’s dive into it.

Let's imagine that I decided to open a restaurant called Syntha AI. What should it look like? I asked two models: Midjourney (left) and DeepFloyd IF (right).

While Midjourney generated a really cool image, the text on the sign is not what I expected. DeepFloyd IF did the job really well.

DeepFloyd IF examples

Here are some examples of generated images.

Image source: https://github.com/deep-floyd/IF

You can find more examples here or try the free demo on Hugging Face.

How DeepFloyd IF works

Image source: https://github.com/deep-floyd/IF

IF consists of three diffusion stages that are applied sequentially, each conditioned on the input text:

Text → 64x64 image
Text + 64x64 image → 256x256 image
Text + 256x256 image → 1024x1024 image
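
To make the cascade concrete, here is a minimal sketch of how the three stages could be chained with the Hugging Face diffusers library. It is an illustration under assumptions, not the authors' reference code: the checkpoint names, the use of the Stable Diffusion x4 upscaler as the third stage, and the helper names `Stage`, `STAGES`, `upscale_factors` and `generate` are choices made for this example. Running `generate` needs a GPU, the `torch` and `diffusers` packages, and may require accepting the model license on Hugging Face first.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    checkpoint: str  # Hugging Face Hub id (assumed for this sketch)
    out_size: int    # side length of the square output, in pixels


# The three stages of the cascade, in the order they are applied.
STAGES = (
    Stage("I", "DeepFloyd/IF-I-XL-v1.0", 64),
    Stage("II", "DeepFloyd/IF-II-L-v1.0", 256),
    # Assumption: a general-purpose x4 upscaler stands in for the last stage.
    Stage("III", "stabilityai/stable-diffusion-x4-upscaler", 1024),
)


def upscale_factors(stages=STAGES):
    """Return how much each stage enlarges the output of the previous one."""
    return [b.out_size // a.out_size for a, b in zip(stages, stages[1:])]


def generate(prompt: str, seed: int = 0):
    """Run text -> 64x64 -> 256x256 -> 1024x1024 and return a PIL image."""
    # Heavy imports are deferred so the module loads without a GPU stack.
    import torch
    from diffusers import DiffusionPipeline

    s1, s2, s3 = STAGES
    stage_1 = DiffusionPipeline.from_pretrained(
        s1.checkpoint, variant="fp16", torch_dtype=torch.float16
    )
    # Stage II reuses the text embeddings from stage I, so it needs no encoder.
    stage_2 = DiffusionPipeline.from_pretrained(
        s2.checkpoint, text_encoder=None, variant="fp16", torch_dtype=torch.float16
    )
    stage_3 = DiffusionPipeline.from_pretrained(
        s3.checkpoint, torch_dtype=torch.float16
    )
    for pipe in (stage_1, stage_2, stage_3):
        pipe.enable_model_cpu_offload()  # keep GPU memory usage manageable

    generator = torch.manual_seed(seed)
    prompt_embeds, negative_embeds = stage_1.encode_prompt(prompt)

    # Text -> 64x64 image (kept as a tensor for the next stage).
    image = stage_1(
        prompt_embeds=prompt_embeds,
        negative_prompt_embeds=negative_embeds,
        generator=generator,
        output_type="pt",
    ).images
    # Text + 64x64 image -> 256x256 image.
    image = stage_2(
        image=image,
        prompt_embeds=prompt_embeds,
        negative_prompt_embeds=negative_embeds,
        generator=generator,
        output_type="pt",
    ).images
    # Text + 256x256 image -> 1024x1024 image.
    image = stage_3(
        prompt=prompt, image=image, generator=generator, noise_level=100
    ).images
    return image[0]
```

Calling `generate("a restaurant with a sign that says 'Syntha AI'")` would download the checkpoints on first use, so expect a long wait and a large disk footprint. `upscale_factors()` simply confirms that each upscaling stage enlarges the image four times per side.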

Basically, the model is very similar to Google’s Imagen, which is … not available even as a demo or API.

What DeepFloyd IF can do

Besides text-to-image generation, IF is capable of the following tasks.

Image-to-image translation with DeepFloyd IF

This is helpful when you want to change the style of your image. It works with any image, not just generated ones. For example, you can ask the model to restyle an image as an oil painting, an origami figure, or an abstract drawing.

Image source: https://github.com/deep-floyd/IF

Image super-resolution with DeepFloyd IF

If you want to increase the resolution of your image, consider using super-resolution mode. This is possible thanks to the second and third stages of the model, which take a smaller image plus the text and produce a larger one.

Image source: https://github.com/deep-floyd/IF

Image Inpainting with DeepFloyd IF

Inpainting is when you have a missing part in the image and want to draw something there. For example, you may want to remove people from your landscape photo and inpaint the background.

Image source: https://github.com/deep-floyd/IF
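
The "missing part" is usually described by a mask: a grid the same size as the image that marks which pixels should be redrawn. The sketch below is a toy illustration of that idea only, not how IF fills the region. It assumes images are plain nested lists of pixel values, and the function name `composite` is made up for this example.

```python
def composite(original, generated, mask):
    """Keep original pixels where mask is 0 and take generated pixels where it is 1.

    All three arguments are lists of rows with identical dimensions.
    """
    return [
        [gen if m else orig for orig, gen, m in zip(orig_row, gen_row, mask_row)]
        for orig_row, gen_row, mask_row in zip(original, generated, mask)
    ]


# A 2x2 "photo" where only the top-right pixel is marked for inpainting.
photo = [[1, 1], [1, 1]]
filled = [[9, 9], [9, 9]]  # stand-in for what a model would draw
mask = [[0, 1], [0, 0]]

result = composite(photo, filled, mask)
```

Everything outside the mask stays untouched, which is why inpainting can remove people from a landscape without altering the rest of the photo.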

How to use DeepFloyd IF

While DeepFloyd IF is currently available for research purposes only, the authors promise to release the model for commercial use later. So now is a good moment to learn how to use it. Here are some use cases:

Create a web service that generates logos with AI. It is now possible to add a company name to a generated logo.
Create a service that automatically generates videos based on song lyrics. Here is an example:

Create an API or web service for image inpainting and super-resolution. This functionality can also be available as a plugin for existing tools like Canva or Photoshop.

Well, the text is not always 100% correct.

Thank you for reading,
Denis