
Generate image from sketch with ControlNet

ControlNet. ChatGPT API. How to host Stable Diffusion

Usually, if you want to generate an image, you would probably use some kind of diffusion model: DALL·E 2, Stable Diffusion, or Midjourney. These models are great at generating images from text prompts. All you need to do is write a prompt like “A man standing on a boat”, and the model will produce a corresponding image.

The downside of these methods is the lack of control they give you as a user: you can change the input text prompt, and nothing more.

But what if you would like to generate an image with a person in a specific location? Or what if you want a particular landscape with an exact arrangement of forest, sky, and lake? The only thing you can do is add more information to the prompt: “a painting of a human standing in the upper right corner of the image”. That is not very precise.

It would be much better if you could just draw a sketch of the image you want and use it together with a prompt. The sketch would drive the generative model to put the person exactly where you want, while the text prompt is responsible for the visuals of the image.

In today’s post, I would like to introduce ControlNet — a new AI method which does exactly what I described: it controls image generation with an image and a text prompt.

ControlNet: Adding Conditional Control to Text-to-Image Diffusion Models

This is exactly what the new method, ControlNet, does: it allows you to control Stable Diffusion with both an image and a text prompt. Many different things can be used as the input image, which we discuss below. It is easier to show than to tell. Here, for example, is image generation based on the edges of a deer:

Image source: https://arxiv.org/pdf/2302.05543v1.pdf. The bottom-left image is the input. The four images on the right were generated by ControlNet.

ControlNet can also generate new images based on existing ones: all you need to do is extract the edges from an existing image. The process looks like this:

Real photo → edge detection (simple Computer Vision algorithm) → Use detected edges to control the images generated by Stable Diffusion
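For readers who want to try this, here is a minimal sketch of that pipeline using the Hugging Face diffusers library. The checkpoint IDs are my assumption based on the publicly released lllyasviel models, and the file names are placeholders — this is not the exact setup from the paper.

```python
# Sketch: real photo -> edge detection (Canny) -> ControlNet-guided generation.
import cv2
import numpy as np
import torch
from PIL import Image
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel

# 1. Real photo -> edge detection with a classic Computer Vision algorithm.
photo = cv2.imread("deer.jpg")
gray = cv2.cvtColor(photo, cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(gray, 100, 200)
edges = np.stack([edges] * 3, axis=-1)  # single channel -> 3-channel image
edge_image = Image.fromarray(edges)

# 2. Use the detected edges to control Stable Diffusion via ControlNet.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

# The edges fix the layout; the prompt decides the visuals.
result = pipe("a deer in a snowy forest at dawn", image=edge_image).images[0]
result.save("generated_deer.png")
```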

Now let’s look at some examples of what ControlNet can do.

Edges to image

Image source: https://arxiv.org/pdf/2302.05543v1.pdf. Input edge images are in the left column.

Human pose to image

With a human body pose, we can follow a process similar to the one for edges. We can use any pose estimation algorithm to obtain a human skeleton (see the left column). The skeleton of a human body is a set of key points: shoulders, elbows, knees, etc. Based on this skeleton and a text prompt, ControlNet can create an image of a human in the same pose as the skeleton.

Image source: https://arxiv.org/pdf/2302.05543v1.pdf. Input skeletons are in the left column.
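Here is a sketch of the pose variant, assuming the controlnet_aux package for skeleton extraction and the public OpenPose-conditioned checkpoint (again, checkpoint IDs and file names are my assumptions):

```python
# Sketch: photo -> pose skeleton -> ControlNet-guided generation.
import torch
from PIL import Image
from controlnet_aux import OpenposeDetector
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel

# Extract a skeleton (key points: shoulders, elbows, knees, ...) from a photo.
openpose = OpenposeDetector.from_pretrained("lllyasviel/ControlNet")
skeleton = openpose(Image.open("person.jpg"))

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-openpose", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

# The skeleton fixes the pose; the prompt decides everything else.
image = pipe("an astronaut dancing on the moon", image=skeleton).images[0]
```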

Segmentation to image

Another interesting application of ControlNet is image generation based on a segmented image. A segmentation AI model (independent of ControlNet) can be applied to any image to assign a label to every pixel. For indoor scenes, for example, labels could be a sofa, a table, a chair, a TV set, etc.

Such a segmentation map is then used as the input to ControlNet.

Image source: https://arxiv.org/pdf/2302.05543v1.pdf. Input segmentation maps are in the left column.
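The pipeline is the same as above with the segmentation-conditioned checkpoint swapped in. This sketch assumes you already have a color-coded segmentation map (e.g. produced by an off-the-shelf segmentation model); the checkpoint ID is again my assumption:

```python
# Sketch: segmentation map -> ControlNet-guided generation.
import torch
from PIL import Image
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel

seg_image = Image.open("seg_map.png")  # per-pixel labels rendered as colors

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-seg", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

# Each labeled region (sofa, table, TV set, ...) keeps its place in the output.
image = pipe("a modern living room, photorealistic", image=seg_image).images[0]
```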

You can find more examples in the ControlNet paper. The code for the model is also available on GitHub. Thanks to Hugging Face, you don’t even need to code to try ControlNet: a demo is available to play with.

How to use ControlNet

  • Create new image editors based on this method.

  • Use the new Photoshop plugin with ControlNet inside.

New Photoshop plugin in action:

News of the week

OpenAI released an API for ChatGPT and Whisper. While a lot of people are familiar with ChatGPT, those who are not can read my quick overview of this fantastic tool. It is also a good moment to read my post about 12 startup ideas one can implement with ChatGPT.

Whisper, which was previously open-sourced, is now also available through the API. Its main purpose is speech recognition, i.e. converting voice into text. This can be useful in applications that are controlled by voice.
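Here is a minimal sketch of calling both new APIs with the openai Python package as it works at the time of writing; the prompt, file name, and API key are placeholders:

```python
# Sketch: calling the new ChatGPT and Whisper APIs.
import openai

openai.api_key = "sk-..."  # your API key

# ChatGPT: chat completion with the gpt-3.5-turbo model.
chat = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Suggest a startup idea using ChatGPT."}],
)
print(chat.choices[0].message.content)

# Whisper: transcribe speech to text.
with open("voice_note.mp3", "rb") as audio_file:
    transcript = openai.Audio.transcribe("whisper-1", audio_file)
print(transcript.text)
```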

Tool of the week

Evoke is a service that allows you to run AI models in the cloud. It is a young startup that currently supports Stable Diffusion.

I see such a service being useful for people who want to build a web service or mobile app but don’t want to spend time on AI deployment (which can be quite involved). The API can also be used with no-code services such as Bubble.

Alternatives to this service are Hugging Face and Replicate.

Tweet of the week

Thank you for reading my newsletter. I would greatly appreciate any feedback you have: just reply to this email.

If you like my newsletter, feel free to share it on Twitter or just send a direct link to this post: https://syntha.beehiiv.com/p/generate-image-from-sketch-with-controlnet.

See you next Thursday.