Generate 3D worlds from text

Text to games. Point-E by OpenAI. DreamFusion.

We have already seen significant progress in generative AI for text, images, videos, and programming code. Today, I am going to share a couple of methods for another data domain: 3-dimensional (3D) data.

Why generate 3D data? Because it is in constant demand in game development and film production. Most modern games are 3D games that require a huge number of assets: character models, backgrounds, and countless props. Every table, cup, or bird you see in a 3D game is a 3D model. The same goes for films: although the final picture is 2D, visual effects pipelines rely heavily on 3D models. That is why 3D generation is an important topic.

3D data generation is much more complicated than image generation. Firstly, there is usually far less data available for training. The internet is full of images, but capturing 3D data typically requires special sensors, such as LiDAR. Some iPhone models include a LiDAR sensor, but it is not very accurate; high-quality LiDAR units can cost thousands of USD.

Secondly, neural networks for 3D data are usually more sophisticated and have more parameters, which makes them harder and more expensive to train than AI models for images.

Now let’s look at some methods for text-to-3D generation. One of them does not require any 3D data for training at all.

Point-E: A System for Generating 3D Point Clouds from Complex Prompts

The Point-E model was created by OpenAI, and in this case OpenAI is actually open: the model is open-sourced on GitHub. It transforms text into a 3D data format called a point cloud, which represents a shape as a set of points with (x, y, z) coordinates.
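To make the data format concrete, here is a minimal sketch of a point cloud as an (N, 3) array of XYZ coordinates, with optional per-point RGB colors (Point-E's outputs also carry colors). The `make_point_cloud` helper and the sphere shape are illustrative, not part of Point-E's API.

```python
import numpy as np

def make_point_cloud(n_points=1024, seed=0):
    """Sample a toy point cloud on the surface of a unit sphere."""
    rng = np.random.default_rng(seed)
    xyz = rng.normal(size=(n_points, 3))
    xyz /= np.linalg.norm(xyz, axis=1, keepdims=True)  # project onto the sphere
    rgb = rng.uniform(size=(n_points, 3))              # hypothetical per-point colors
    return xyz, rgb

xyz, rgb = make_point_cloud()
print(xyz.shape)  # → (1024, 3): one (x, y, z) triple per point
```

Downstream tools usually convert such point sets into meshes or render them directly as splats.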

There is also an extension to Point-E that adds AND and NOT operators to the text prompt. These act like logical operations on prompts and change the resulting generated point cloud accordingly.
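One common way to realize such operators in diffusion models is to compose the per-prompt guided noise predictions: add the guidance direction for each AND prompt and subtract it for each NOT prompt. The sketch below illustrates that idea with dummy arrays; the function name, the `scale` parameter, and the shapes are assumptions for illustration, not the extension's actual API.

```python
import numpy as np

def composed_eps(eps_uncond, prompt_eps, ops, scale=3.0):
    """Combine per-prompt noise predictions; ops[i] is +1 (AND) or -1 (NOT)."""
    eps = eps_uncond.copy()
    for e, sign in zip(prompt_eps, ops):
        # classifier-free-guidance-style direction, added or subtracted
        eps += sign * scale * (e - eps_uncond)
    return eps

eps_u = np.zeros((16, 3))        # dummy unconditional prediction
eps_a = np.ones((16, 3))         # prediction conditioned on prompt A
eps_b = np.full((16, 3), -0.5)   # prediction conditioned on prompt B
out = composed_eps(eps_u, [eps_a, eps_b], ops=[+1, -1])  # "A AND NOT B"
print(out.shape)  # → (16, 3)
```

The composed prediction then drives the usual diffusion sampling loop in place of a single-prompt prediction.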

DreamFusion: Text-to-3D using 2D Diffusion

DreamFusion is another interesting model for 3D data generation from text prompts. The cool feature of this model is that it doesn’t require any 3D training data. Instead, it only needs a pretrained 2D image generator (like Stable Diffusion): it renders a 3D scene from random viewpoints and optimizes the scene so that every render looks plausible to the 2D model.
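The optimization trick behind this is Score Distillation Sampling (SDS): noise a render, ask the frozen 2D diffusion model to predict that noise, and use the gap between predicted and injected noise as a gradient on the 3D parameters. The toy sketch below reduces this to its simplest form; the identity "renderer" and the hand-written "denoiser" are stand-ins for a NeRF and a real diffusion model, not DreamFusion's actual components.

```python
import numpy as np

rng = np.random.default_rng(0)
params = rng.normal(size=(8, 8))   # stand-in for the 3D scene parameters

def render(p):
    return p                       # trivial "renderer" for the sketch

def denoiser(noisy, target=1.0):
    # Stand-in for a pretrained 2D diffusion model: it "believes" the clean
    # image should be all `target`, so its noise estimate is whatever must
    # be removed to get there.
    return noisy - target

lr, sigma = 0.1, 0.5
for step in range(200):
    noise = rng.normal(size=params.shape) * sigma
    noisy_render = render(params) + noise
    eps_pred = denoiser(noisy_render)
    grad = eps_pred - noise        # SDS gradient (timestep weighting omitted)
    params -= lr * grad

print(round(float(params.mean()), 2))  # → 1.0: params converge to the denoiser's target
```

In the real method the gradient flows through a differentiable volume renderer into NeRF weights, but the update rule is the same shape: nudge the 3D scene until the 2D model stops "correcting" its renders.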

News of the week

Stability AI launches StableLM: an open-source Large Language Model

Why is StableLM good news? Because open-source projects like this drive the development of AI. Stable Diffusion, Stability AI’s image model, has made many projects possible: people can take such models and build their own on top of them. So having an open-source competitor to OpenAI’s GPT is great for the industry.

StableLM is available for both research and commercial usage. The models themselves are still in training; Stability AI plans to train several models with different numbers of parameters. The most recently trained model (not the biggest one) is available as a demo.

ChatGPT competitor by HuggingFace

HuggingFace launches Chat, an open-source ChatGPT competitor. It is based on Meta’s LLaMA model.

Tools of the week

Spline AI is a tool that generates and edits 3D objects and scenes from text prompts.

Blockade lets you generate 3D worlds with AI. There is a demo to try; here is my “cyberpunk mario” world.

Tweet of the week