SAM V2 Video

SAM V2 Video, by Meta AI, enables promptable segmentation of objects in videos.

API

The model is available through an HTTP API that can be called from the programming language of your choice. The example below uses Python.

POST

import requests
import base64

# Use this function to convert an image file from the filesystem to base64
def image_file_to_base64(image_path):
    with open(image_path, 'rb') as f:
        image_data = f.read()
    return base64.b64encode(image_data).decode('utf-8')

# Use this function to fetch an image from a URL and convert it to base64
def image_url_to_base64(image_url):
    response = requests.get(image_url)
    image_data = response.content
    return base64.b64encode(image_data).decode('utf-8')

api_key = "YOUR_API_KEY"
url = "https://api.segmind.com/v1/sam-v2-video"

# Request payload
data = {
  "input_video": "https://segmind-resources.s3.amazonaws.com/input/650fd563-ca86-4c29-8bce-165b6e23bd41-couple_dance.mp4",
  "prompt": "man",
  "overlay_mask": True
}

headers = {'x-api-key': api_key}

response = requests.post(url, json=data, headers=headers)
response.raise_for_status()  # Stop here on a 4xx/5xx response
print(response.content)  # The response body is the generated output as raw bytes

RESPONSE

image/jpeg

HTTP Response Codes

200 - OK: Image Generated

401 - Unauthorized: User authentication failed

404 - Not Found: The requested URL does not exist

405 - Method Not Allowed: The requested HTTP method is not allowed

406 - Not Acceptable: Not enough credits

500 - Server Error: Server had some issue with processing
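The codes listed above can be turned into readable errors on the client side. The following is a minimal sketch, not part of the official client: the names STATUS_MEANINGS, describe_status, and check_response are hypothetical helpers introduced here for illustration, and the messages simply restate the list above.

```python
# Status codes and meanings, restated from the list above.
STATUS_MEANINGS = {
    200: "OK: image generated",
    401: "Unauthorized: user authentication failed",
    404: "Not Found: the requested URL does not exist",
    405: "Method Not Allowed: the requested HTTP method is not allowed",
    406: "Not Acceptable: not enough credits",
    500: "Server Error: server had some issue with processing",
}


def describe_status(status_code):
    """Return a readable description for a documented status code."""
    return STATUS_MEANINGS.get(
        status_code, f"Undocumented status code {status_code}"
    )


def check_response(status_code):
    """Hypothetical helper: raise RuntimeError unless the status is 200."""
    if status_code != 200:
        raise RuntimeError(describe_status(status_code))
    return True
```

With the request example above, this could be called as check_response(response.status_code) right after requests.post(...).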

Attributes

input_video (str) *

Input video

prompt (str) *

Text prompt for processing the video

overlay_mask (bool) *

Whether to overlay a mask on the video

coordinates (str, default: 1)

Coordinates for image selection (optional): Provide either a prompt or coordinates. If a prompt is provided, coordinates will be ignored. For a single coordinate, use the format [834,74]. For multiple coordinates, use [[839,74], [844,20], ...].

remove_coordinates (str, default: 1)

Coordinates to be removed (optional). The format is similar to that of coordinates.
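Because coordinates and remove_coordinates are string attributes, point lists need to be serialized before they go into the payload. The sketch below shows one way to do that with the formats quoted above. format_coordinates and build_payload are hypothetical helper names, the compact serialization without spaces is an assumption, and leaving out prompt when coordinates are supplied is also an assumption: the attribute list marks prompt with *, while the coordinates description says to provide either one.

```python
import json


def format_coordinates(points):
    """Serialize (x, y) points into the documented string format.

    One point becomes "[834,74]"; several become "[[839,74],[844,20]]".
    """
    points = [[int(x), int(y)] for x, y in points]
    if not points:
        raise ValueError("at least one (x, y) point is required")
    value = points[0] if len(points) == 1 else points
    # Compact separators are an assumption, chosen to match "[834,74]".
    return json.dumps(value, separators=(",", ":"))


def build_payload(input_video, points, remove_points=None, overlay_mask=True):
    """Build a request payload that uses coordinates instead of a prompt.

    Assumption: the API accepts a payload without "prompt" in this case.
    """
    payload = {
        "input_video": input_video,
        "coordinates": format_coordinates(points),
        "overlay_mask": overlay_mask,
    }
    if remove_points:
        payload["remove_coordinates"] = format_coordinates(remove_points)
    return payload
```

The resulting dictionary can be passed as the json argument of requests.post in place of the prompt-based payload.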

To keep track of your credit usage, you can inspect the response headers of each API call. The x-remaining-credits property will indicate the number of remaining credits in your account. Ensure you monitor this value to avoid any disruptions in your API usage.
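Reading the credit header can be sketched as follows. This is a minimal example under assumptions: remaining_credits and credits_low are hypothetical helper names, the header value is assumed to be numeric, and the threshold of 10 is an arbitrary illustration rather than a documented limit.

```python
def remaining_credits(headers):
    """Return the x-remaining-credits value as a float, or None if absent.

    Accepts any mapping of header names to values, such as the headers
    attribute of a requests response.
    """
    lowered = {str(name).lower(): value for name, value in headers.items()}
    raw = lowered.get("x-remaining-credits")
    if raw is None:
        return None
    try:
        return float(raw)  # Assumption: the header carries a numeric value
    except (TypeError, ValueError):
        return None


def credits_low(headers, threshold=10.0):
    """Return True when the remaining credits fall below the threshold."""
    credits = remaining_credits(headers)
    return credits is not None and credits < threshold
```

After a call such as response = requests.post(...), remaining_credits(response.headers) returns the current balance if the header is present.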

SAM V2 Video

Overview

The Segment Anything V2 (SAM V2) Video model segments objects in videos using a text prompt. It can identify and segment objects it has never encountered before: given an input video, SAM V2 outputs the segmented objects, which makes it useful across a range of industries and applications.

Model Description

Capabilities

SAM V2 generalizes zero-shot, so it can segment objects in videos in real time even if it has never seen them before.
It offers interactive, promptable object segmentation, so the segmentation can be steered and refined with prompts.
The model supports creative applications, improves tools for visual data annotation, and enhances computer vision systems.

Creator

SAM V2 was developed and released by the Meta AI Research team, previously known as Facebook AI.

Training Data Info

SAM V2 was trained on the SA-V dataset, which consists of about 51,000 real-world videos and over 600,000 masklets (spatio-temporal masks).
This extensive dataset enabled significant performance improvements in the new model version.

Technical Architecture

SAM V2 is an architectural progression from the original SAM, extending its capabilities from images to video.
The model incorporates a memory mechanism for propagating masks across video frames, which enables the creation of masklets.
This mechanism comprises a memory encoder, a memory bank, and a memory attention module, which together store object information and previous interactions to provide consistent masklet predictions over time.
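To make the three memory components concrete, here is a toy sketch in plain Python. It is not SAM V2's implementation: ToyMemoryBank, encode_memory, and attend are hypothetical names, features are short lists of floats rather than learned embeddings, and the bounded first-in-first-out bank and softmax-weighted attention are simplifying assumptions chosen only to show how stored memories can condition the current frame.

```python
import math
from collections import deque


class ToyMemoryBank:
    """Bounded store of per-frame memories; the oldest entry is dropped first."""

    def __init__(self, max_frames=3):
        self.entries = deque(maxlen=max_frames)

    def add(self, memory):
        self.entries.append(memory)


def encode_memory(frame_features, mask_score):
    """Toy memory encoder: scale frame features by the predicted mask score."""
    return [value * mask_score for value in frame_features]


def attend(query, memories):
    """Toy memory attention: softmax-weighted average of stored memories.

    Weights come from the dot product between the current frame's features
    (the query) and each stored memory.
    """
    if not memories:
        return list(query)  # Nothing stored yet, so the query passes through
    scores = [sum(q * m for q, m in zip(query, memory)) for memory in memories]
    peak = max(scores)  # Subtract the maximum for numerical stability
    weights = [math.exp(score - peak) for score in scores]
    total = sum(weights)
    return [
        sum(w * memory[i] for w, memory in zip(weights, memories)) / total
        for i in range(len(query))
    ]
```

In this sketch, each processed frame would be passed through encode_memory and added to the bank, and the next frame's features would be conditioned with attend(features, list(bank.entries)) before a mask is predicted.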

Strengths

SAM V2 improves video segmentation performance and reduces interaction time by a factor of three.
It generalizes well to unfamiliar objects and can segment them in both videos and images.
The model improves the efficiency and accuracy of visual data annotation and performs well at real-time object tracking.
It has the potential to significantly advance creative tools and computer vision technologies.

How to Use

Step-by-Step Guide to Using SAM V2 Video

Upload the Video: Click on the "Click or Drag-n-Drop" area within the "Input Video" section to upload your video file (.mp4).

Enter the Prompt: In the "Prompt" text box, enter a descriptive word or phrase that represents the object you want to segment. For example, if you want to segment a "suit," type "suit" in this box.
Optional: Enter Coordinates: If you have specific coordinates for the segmentation, enter them in the "Coordinates (optional)" field. This can aid in precise location-based segmentation.
Optional: Remove Coordinates: If you need to remove certain coordinates or specific parts from the segmentation, enter those coordinates in the "Remove Coordinates (optional)" field.
Use Advanced Parameters: Click on the "Advanced Parameters" dropdown to access additional settings. Advanced parameters include wiring options for mask prediction, adjusting the segmentation algorithm sensitivity, and improving video segmentation through memory mechanism configurations among other things.
Overlay Mask: Ensure the "Overlay Mask" checkbox is selected if you want the segmented mask to be displayed over the video.

Use Cases

Automated Video Editing: SAM V2 can facilitate automatic and precise video editing by isolating and tracking objects in a video sequence, bypassing the need for manual intervention.
Content Moderation: The model can analyze and segment objects in videos quickly, assisting in real-time monitoring and content moderation on social media platforms.
Interactive Multimedia: For creating interactive multimedia content, the model's ability to segment objects dynamically in live video feeds proves beneficial.
Surveillance Systems: SAM V2 can significantly improve surveillance systems' efficiency by enabling real-time tracking and segmentation of objects.
Virtual Backgrounds: SAM V2 can create virtual backgrounds in video conferences by segmenting the user from their background in real time.
Visual Data Annotation: The model speeds up the creation of training datasets for AI models by accelerating visual data annotation through precise, automated segmentation.
