
Segment Anything by Meta: The Future of Image Segmentation

Segmentation 101. Segment Anything by Meta

Hey, this is Denis. Today, we will discuss the segmentation task and the Segment Anything model by Meta. I will explain the general principles of how it works, show some examples of how to use it, and explore the services that can be developed from it.

Reading time: 5 minutes

Image Segmentation 101

Image Segmentation is a Computer Vision task where an AI model predicts a class label for each pixel of an image. For example, it can label each pixel as "human" or "background". You may have noticed this in Zoom calls, where the program replaces your background with a different one.

Image source: https://tenor.com/
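To make the per-pixel idea concrete, here is a tiny, self-contained sketch. The mask and images are hardcoded toy data rather than the output of a real model; it only shows how a "human vs. background" mask drives background replacement:

```python
import numpy as np

# A segmentation mask assigns a class label to every pixel.
# Here: 1 = "human", 0 = "background" (tiny 4x4 example for illustration).
mask = np.array([
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
])

# Pretend these are RGB images of the same size (H, W, 3).
frame = np.random.randint(0, 256, size=(4, 4, 3), dtype=np.uint8)  # original frame
new_background = np.zeros_like(frame)                              # replacement background

# Background replacement (the "Zoom effect"): keep pixels where the mask
# says "human", take the new background everywhere else.
composited = np.where(mask[..., None] == 1, frame, new_background)
print(composited.shape)  # (4, 4, 3)
```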

Segmentation is used in many different areas: background replacement in video calls, photo editing, medical imaging, and autonomous driving are common examples.

Segment Anything by Meta

Meta's new Segment Anything (SAM) model shows great promise for segmenting objects. It produces high-quality segmentation masks and can even segment object classes it has never seen during training.

How Segment Anything Works

The model works in several modes (a short code sketch follows the list):

  • Segment everything mode: The model generates a mask for every object it can find in the image.

  • Click-to-segment mode: Click on an object in an image, and the model segments that object.

  • Bounding box mode: Draw a bounding box, and the model segments the objects inside.

  • Text-based mode: This mode is not yet released, but the model will highlight objects based on text prompts.
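As a rough illustration, here is a minimal sketch of how these modes map onto the official segment_anything Python package. The image path, click coordinates, and box coordinates are made up for the example, and the checkpoint file name assumes you have downloaded the ViT-H weights from the SAM repository:

```python
import cv2
import numpy as np
from segment_anything import sam_model_registry, SamPredictor, SamAutomaticMaskGenerator

# Assumes the ViT-H checkpoint has been downloaded from the official SAM repository.
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")

# Load an image as an RGB numpy array (the path is a placeholder).
image = cv2.cvtColor(cv2.imread("photo.jpg"), cv2.COLOR_BGR2RGB)

# 1) "Segment everything" mode: generate masks for all objects in the image.
mask_generator = SamAutomaticMaskGenerator(sam)
all_masks = mask_generator.generate(image)  # list of dicts: "segmentation", "bbox", "area", ...
print(f"found {len(all_masks)} masks")

# 2) Prompted modes share a single predictor that embeds the image once.
predictor = SamPredictor(sam)
predictor.set_image(image)

# Click-to-segment: one foreground click at pixel (x=500, y=300).
click_masks, click_scores, _ = predictor.predict(
    point_coords=np.array([[500, 300]]),
    point_labels=np.array([1]),  # 1 = foreground click, 0 = background click
)

# Bounding-box mode: segment whatever sits inside the box (x0, y0, x1, y1).
box_masks, box_scores, _ = predictor.predict(box=np.array([100, 100, 400, 400]))
```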

The model can generate multiple levels of masks for the same object. For example, if you click on a person's head, the model can highlight just the head on the first level and the entire body on the second level.
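Continuing the hypothetical predictor setup from the sketch above, this is how the multi-level behavior shows up in code: with multimask_output=True the predictor returns several candidate masks at different levels of granularity, each with its own quality estimate.

```python
# For an ambiguous click (e.g. on a person's head), multimask_output=True returns
# several candidate masks (roughly sub-part / part / whole object), each with a score.
masks, scores, _ = predictor.predict(
    point_coords=np.array([[500, 300]]),
    point_labels=np.array([1]),
    multimask_output=True,
)

for level, (mask, score) in enumerate(zip(masks, scores)):
    print(f"level {level}: {mask.sum()} pixels, predicted quality {score:.3f}")

# A common heuristic is to keep the candidate with the highest predicted score.
best_mask = masks[np.argmax(scores)]
```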

Segment Anything Dataset

The dataset used to train the SAM model consists of 11 million images and 1 billion segmentation masks. Labeling that much data manually is nearly impossible, so it is worth looking at how Meta did it.

The dataset was collected in three stages:

  1. Assisted-manual stage: In the first stage, they manually labeled 120k images with 4.3M masks. These images were used to train the initial version of the SAM model.

  2. Semi-automatic stage: In the second stage, they used the SAM model to pre-label images so annotators could focus only on hard-to-segment objects. In total, they labeled 5.9M masks for 180k images at this stage.

  3. Fully automatic stage: Finally, the model was retrained on the labeled images from the first two stages, making it significantly more accurate, and was then used to label all remaining images. As a result, ~97% of all images were labeled fully automatically.

You can explore the dataset here and even download it, although for research purposes only.

How to use Segment Anything to Inpaint Anything

Segment Anything lets you build services for automatic image editing driven by clicks or text prompts. You can use it to detect objects and inpaint them with Stable Diffusion. If you are not familiar with inpainting, check my recent post on this topic. Here's what you can do:

  1. Edit images based on where the user clicks. Use Inpaint Anything for this purpose.

  2. Grounded-SAM combines Grounding DINO with Segment Anything and Stable Diffusion. Grounding DINO detects objects from text prompts; combined with Segment Anything, this makes it possible to predict a mask from a text prompt. The mask can then be passed to Stable Diffusion, which inpaints it (see the sketch after this list).
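As a sketch of this idea (not the actual Inpaint Anything or Grounded-SAM code): take a mask predicted by SAM and feed it to the Stable Diffusion inpainting pipeline from the diffusers library. The box is hardcoded here; in Grounded-SAM it would come from Grounding DINO given a text prompt. The prompt, paths, and the commonly used runwayml/stable-diffusion-inpainting checkpoint are assumptions for illustration, not the author's setup.

```python
import numpy as np
import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

# Assume `image` (RGB numpy array) and `predictor` are set up as in the SAM sketch above.
# In Grounded-SAM, this box would come from Grounding DINO given a text prompt such as
# "a dog"; here it is hardcoded for illustration.
box = np.array([100, 100, 400, 400])
masks, _, _ = predictor.predict(box=box, multimask_output=False)
mask = masks[0]  # boolean (H, W) mask of the detected object

# Convert the image and mask to PIL for the inpainting pipeline.
pil_image = Image.fromarray(image)
pil_mask = Image.fromarray((mask * 255).astype(np.uint8))

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting",
    torch_dtype=torch.float16,
).to("cuda")

# Replace the masked object with whatever the text prompt describes.
result = pipe(
    prompt="a wooden bench",
    image=pil_image,
    mask_image=pil_mask,
).images[0]
result.save("inpainted.png")
```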

Another option is to build a labeling service on top of SAM. Again, take a look at Grounded-SAM and its examples. It can be used for semi-automatic labeling, which significantly speeds up the annotation process.
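One possible sketch of what such a labeling service could do (my illustration, not how Grounded-SAM implements it): convert the masks produced by SamAutomaticMaskGenerator into COCO-style annotations, using pycocotools, so that annotators only need to review them and assign class labels.

```python
import numpy as np
from pycocotools import mask as mask_utils

# Assume `all_masks` comes from SamAutomaticMaskGenerator.generate(image) as above.
# Each generated mask becomes a COCO-style annotation that a human annotator only
# needs to review and assign a class label to.
annotations = []
for ann_id, m in enumerate(all_masks):
    binary_mask = m["segmentation"].astype(np.uint8)          # (H, W) boolean -> uint8
    rle = mask_utils.encode(np.asfortranarray(binary_mask))   # COCO run-length encoding
    rle["counts"] = rle["counts"].decode("utf-8")             # make it JSON-serializable
    annotations.append({
        "id": ann_id,
        "segmentation": rle,
        "bbox": list(m["bbox"]),   # SAM reports boxes in XYWH format
        "area": m["area"],
        "category_id": None,       # left for the human annotator to fill in
    })
```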

Segment Anything Website | Demo | Code

This is all for today. Thank you for reading.