How to generate video from text / image / another video

Hey, this is Denis. Today, I am going to cover Stable Animation, a new method for video generation made by Stability AI, the creators of the famous Stable Diffusion.

Generating videos with AI is a challenging task. Each video comprises a sequence of frames, each of which must be a realistic image. Moreover, all frames should be connected in a logical sequence. While there are some effective AI methods for video generation that I covered in my previous articles here and here, we are still far from perfection. With this release, Stability AI takes one more step towards it.

What do we know about Stability AI? They created Stable Diffusion (SD) — one of the best open-source methods for image generation. Since SD is open-source, we now have thousands of new Generative AI methods based on it.

Recently, Stability AI released Stable Animation. Although most of Stability AI's products are open-source, this one is only available via a paid API / SDK.

Stable Animation works in three different modes: text-to-video, image-to-video, and video-to-video. Here are some examples of the generated videos:

Credit to @glitchedelic from Stable Diffusion Discord.

Credit to @Siyris from Stable Diffusion Discord.

Why Stable Animation is cool:

  1. It supports resolutions up to 1024x1024, thanks to upscaling from lower-resolution videos.

  2. There is no limit on the length of generated videos. By comparison, Gen-2 is currently limited to 4 seconds.

My take on Stable Animation and why it is important:

  • It creates new opportunities for generative AI startups. Video generation is a much less competitive area because it is harder, requires more resources, and still produces lower-quality results. However, video content is much more popular than text content.

  • It is available as a Python SDK (see the guide). The SDK calls the paid API, so generation runs on Stability AI's side and you don’t need to set up a server with a GPU.
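To give a feel for what using the SDK could look like, here is a minimal text-to-video sketch. Treat it as an illustration, not as the official recipe: the module paths, the `Animator` and `AnimationArgs` names, the `max_frames` setting, and the host address are written from my recollection of the stability_sdk package and may differ in the version you install, so check the guide first. The `keyframe_prompts` helper and the `STABILITY_KEY` environment variable are my own hypothetical conventions.

```python
import os


def keyframe_prompts(prompts, frames_per_prompt):
    """Map each prompt to the frame index where it starts (hypothetical helper)."""
    return {i * frames_per_prompt: p for i, p in enumerate(prompts)}


def render_frames(prompts_by_frame, out_dir="frames", max_frames=48):
    """Render frames through the hosted API and save them as PNG files.

    Assumption: the names below match the installed stability_sdk version.
    The import is lazy so the helper above works without the SDK.
    """
    from stability_sdk import api
    from stability_sdk.animation import AnimationArgs, Animator

    # Assumed host; the API key is read from a hypothetical env variable.
    context = api.Context("grpc.stability.ai:443", os.environ["STABILITY_KEY"])

    args = AnimationArgs()
    args.max_frames = max_frames  # assumed setting for the total frame count

    animator = Animator(
        api_context=context,
        animation_prompts=prompts_by_frame,
        negative_prompt="",
        args=args,
    )

    os.makedirs(out_dir, exist_ok=True)
    # Assumption: render() yields one image object per frame with a save() method.
    for idx, frame in enumerate(animator.render()):
        frame.save(os.path.join(out_dir, f"frame_{idx:05d}.png"))


if __name__ == "__main__":
    prompts = keyframe_prompts(
        ["a photo of a cute cat", "a photo of a cute dog"], frames_per_prompt=24
    )
    print(prompts)
    # Uncomment once the SDK is installed and STABILITY_KEY is set (paid API):
    # render_frames(prompts)
```

All the heavy lifting happens on the API side, so this runs fine on a laptop without a GPU. The saved frames can then be assembled into a video with any tool you like.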