import requests
import base64

# Use this function to convert an image file from the filesystem to base64
def image_file_to_base64(image_path):
    with open(image_path, 'rb') as f:
        image_data = f.read()
    return base64.b64encode(image_data).decode('utf-8')

# Use this function to fetch an image from a URL and convert it to base64
def image_url_to_base64(image_url):
    response = requests.get(image_url)
    image_data = response.content
    return base64.b64encode(image_data).decode('utf-8')

api_key = "YOUR_API_KEY"
url = "https://api.segmind.com/v1/sam-v2-video"

# Request payload
data = {
    "input_video": "https://segmind-resources.s3.amazonaws.com/input/650fd563-ca86-4c29-8bce-165b6e23bd41-couple_dance.mp4",
    "prompt": "man",
    "overlay_mask": True
}

headers = {'x-api-key': api_key}

response = requests.post(url, json=data, headers=headers)
print(response.content)  # The response contains the segmented output video
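Since the endpoint returns a video rather than text, printing the raw bytes is rarely useful in practice. A small helper like the following can save the response body to disk instead; the output filename and the assumption that the result is an MP4 are illustrative, not part of the documented API:

```python
def save_response(response, out_path="segmented_output.mp4"):
    """Write the returned video bytes to a file, or raise on an API error.

    Assumes an MP4 result (matching the MP4 input above); adjust the
    extension if the API returns a different container format.
    """
    if response.status_code == 200:
        with open(out_path, "wb") as f:
            f.write(response.content)
        return out_path
    raise RuntimeError(f"Request failed ({response.status_code}): {response.text}")
```

After the `requests.post` call above, `save_response(response)` would replace the `print` line.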
input_video: Input video.
prompt: Text prompt for processing the video.
overlay_mask: Whether to overlay a mask on the video.
Coordinates (optional): Coordinates for image selection. Provide either a prompt or coordinates; if a prompt is provided, coordinates will be ignored. For a single coordinate, use the format [834,74]. For multiple coordinates, use [[839,74], [844,20], ...].
Remove coordinates (optional): Coordinates to be removed; the format is the same as for Coordinates.
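A coordinate-based request can be built as a variant of the payload in the example above. The JSON key names `coordinates` and `remove_coordinates` below are assumed from the parameter descriptions (the documented payload only shows `input_video`, `prompt`, and `overlay_mask`); verify them against the live API reference before use:

```python
def build_payload(points, remove_points=None):
    """Build a coordinate-based request body (no text prompt).

    A single point may be given as [x, y]; multiple points as
    [[x1, y1], [x2, y2], ...]. Key names for the coordinate fields
    are assumptions based on the parameter list above.
    """
    payload = {
        "input_video": "https://segmind-resources.s3.amazonaws.com/input/650fd563-ca86-4c29-8bce-165b6e23bd41-couple_dance.mp4",
        "coordinates": points,          # points on the object to segment
        "overlay_mask": True,
    }
    if remove_points:
        payload["remove_coordinates"] = remove_points  # points to exclude
    return payload

payload = build_payload([[839, 74], [844, 20]], remove_points=[[120, 40]])
# requests.post(url, json=payload, headers={"x-api-key": api_key}) as above
```

Note that the payload deliberately omits `prompt`: per the parameter description, coordinates are ignored whenever a prompt is present.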
To keep track of your credit usage, you can inspect the response headers of each API call. The x-remaining-credits property will indicate the number of remaining credits in your account. Ensure you monitor this value to avoid any disruptions in your API usage.
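Reading that header after each call can be wrapped in a small helper. The low-credit threshold below is an arbitrary value chosen for this sketch, not an API constant:

```python
LOW_CREDIT_THRESHOLD = 100  # assumed threshold for this sketch

def check_credits(headers, threshold=LOW_CREDIT_THRESHOLD):
    """Return the remaining credits from response headers, warning when low.

    `headers` is any mapping, e.g. `response.headers` from requests.
    Returns None if the header is absent.
    """
    value = headers.get("x-remaining-credits")
    if value is None:
        return None
    credits = int(value)
    if credits < threshold:
        print(f"Warning: only {credits} credits remaining")
    return credits
```

After the request in the example above, `check_credits(response.headers)` would report the balance.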
The Segment Anything V2 (SAM V2) video model is an advanced AI model that excels at object segmentation in videos using text prompts. This innovative model has the remarkable ability to identify and segment any object, even if it has never encountered it before. Simply input a video, and SAM V2 outputs segmented objects, demonstrating its versatility across industries and applications.
SAM V2 is highly capable of zero-shot generalization, enabling it to segment objects in real time within videos, even objects it has never seen before.
It has the distinct feature of offering interactive, prompt-able object segmentation, adding a layer of dynamic customization to the segmentation process.
The model is a significant aid in creative applications, improving tools for visual data annotation, and enhancing computer vision systems.
The rigorous training of SAM V2 made use of the comprehensive SA-V dataset, which consists of about 51,000 videos captured in the real world and over 600,000 masklets, which are spatio-temporal masks.
This extensive dataset allowed for significant fine-tuning and enhancement in the performance of the new model version.
SAM V2 represents an architectural progression from the original SAM, extending its capabilities from image segmentation to video.
The model incorporates a memory mechanism for mask propagation across video frames, enabling the creation of masklets.
It has a memory mechanism comprising a memory encoder, a memory bank, and a memory attention module, which collectively store object information and previous interactions to provide consistent masklet predictions across timeframes.
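As a conceptual illustration only (not the actual SAM V2 implementation), the memory mechanism can be sketched as a rolling bank of encoded frame/mask pairs that each new frame's prediction attends to; all names and the bank capacity below are hypothetical:

```python
class MemoryBank:
    """Rolling store of encoded (frame, mask) information."""

    def __init__(self, capacity=7):
        self.capacity = capacity
        self.entries = []

    def add(self, encoded):
        self.entries.append(encoded)
        if len(self.entries) > self.capacity:
            self.entries.pop(0)  # keep only the most recent frames

def segment_video(frames, encode, predict_mask):
    """Propagate a mask across frames to form a masklet.

    `predict_mask(frame, memories)` stands in for memory attention;
    `encode(frame, mask)` stands in for the memory encoder.
    """
    bank = MemoryBank()
    masklet = []
    for frame in frames:
        mask = predict_mask(frame, bank.entries)  # condition on stored memories
        bank.add(encode(frame, mask))             # store this frame's result
        masklet.append(mask)
    return masklet
```

The point of the sketch is the data flow: each prediction sees the accumulated object memory, which is what keeps masklets consistent across timeframes.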
SAM V2 offers a remarkable improvement in its video segmentation performance, reducing interaction time by a factor of three.
It generalizes effectively, segmenting unfamiliar objects in any given video or image.
The model significantly enhances the efficiency and accuracy of visual data annotation, and excels in real-time object tracking.
It has the potential to significantly advance creative tools and computer vision technologies by harnessing its capabilities.
Enter the Prompt: In the "Prompt" text box, enter a descriptive word or phrase that represents the object you want to segment. For example, if you want to segment a "suit," type "suit" in this box.
Optional: Enter Coordinates: If you have specific coordinates for the segmentation, enter them in the "Coordinates (optional)" field. This can aid in precise location-based segmentation.
Optional: Remove Coordinates: If you need to remove certain coordinates or specific parts from the segmentation, enter those coordinates in the "Remove Coordinates (optional)" field.
Use Advanced Parameters: Click on the "Advanced Parameters" dropdown to access additional settings. These include options for mask prediction, adjusting the segmentation algorithm's sensitivity, and configuring the memory mechanism used for video segmentation, among other things.
Overlay Mask: Ensure the "Overlay Mask" checkbox is selected if you want the segmented mask to be displayed over the video.
Automated Video Editing: SAM V2 can facilitate automatic and precise video editing by isolating and tracking objects in a video sequence, bypassing the need for manual intervention.
Content Moderation: The model can analyze and segment objects in videos quickly, assisting in real-time monitoring and content moderation on social media platforms.
Interactive Multimedia: For creating interactive multimedia content, the model's ability to segment objects dynamically in live video feeds proves beneficial.
Surveillance Systems: SAM V2 can significantly improve surveillance systems' efficiency by enabling real-time tracking and segmentation of objects.
Virtual Backgrounds: SAM V2 can create virtual backgrounds in video conferences by segmenting the user from their background in real-time.
Visual Data Annotation: The model enables efficient dataset training for AI models by accelerating the process of annotating visual data through precise, automated segmentation.