# Segmind API — Model Directory

> Segmind provides serverless GPU inference APIs for 200+ generative AI models, including image generation, video generation, audio, LLMs, and more. Pricing is pay-per-use, with no infrastructure to manage.

- **Base URL**: `https://api.segmind.com/v1/{model_slug}`
- **Authentication**: Bearer token (API key)
- **Docs**: https://docs.segmind.com
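
The Base URL and Authentication entries above can be combined as in the minimal sketch below. It assumes the API key travels in an `Authorization: Bearer <key>` header (following the "Bearer token" note), that endpoints accept a JSON `POST` body, and that a `prompt` field exists; the helper name and the payload field are hypothetical, and each model's real parameters are listed in its llms.txt. The request is only constructed here, not sent.

```python
import json
import urllib.request

BASE_URL = "https://api.segmind.com/v1/{model_slug}"  # pattern from this directory


def build_request(model_slug: str, api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a request for one model endpoint.

    Assumptions, not confirmed by this directory: POST method, JSON body,
    and an "Authorization: Bearer <key>" header.
    """
    url = BASE_URL.format(model_slug=model_slug)
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


# "prompt" is a hypothetical parameter name used only for illustration.
req = build_request("video-loop", "YOUR_API_KEY", {"prompt": "example"})
print(req.full_url)
```

Passing the resulting object to `urllib.request.urlopen` would send it; check the model's llms.txt first for the parameters and response format it actually expects.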

For detailed API documentation, parameters, pricing, and code examples for any model, fetch:
  `https://www.segmind.com/models/{slug}/llms.txt`
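
As a small illustration of the URL pattern above, the following sketch builds the per-model documentation URL from a slug and, optionally, downloads it. The function names are hypothetical, and the fetch helper assumes the file is served as UTF-8 text without authentication.

```python
import urllib.request

DOCS_URL = "https://www.segmind.com/models/{slug}/llms.txt"  # pattern from this directory


def llms_txt_url(slug: str) -> str:
    """Return the documentation URL for a model slug."""
    return DOCS_URL.format(slug=slug)


def fetch_llms_txt(slug: str, timeout: float = 10.0) -> str:
    """Download a model's llms.txt (assumes a UTF-8 text response)."""
    with urllib.request.urlopen(llms_txt_url(slug), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


print(llms_txt_url("flashvsr"))
```

The slug is the value in the Slug column of the tables below, for example `flashvsr` or `sync.so-react-1`.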

---

## videoToVideo

| Model | Slug | Description | Modality | Avg Cost | Avg Latency | API Docs |
| --- | --- | --- | --- | --- | --- | --- |
| Bria Increase Video Resolution | bria-increase-video-resolution | Transform your videos with AI-powered upscaling and seamless background removal for professional quality. | videoToVideo | $1.0636303030303031 | 192.3339s | [llms.txt](https://www.segmind.com/models/bria-increase-video-resolution/llms.txt) |
| Bria Remove Video Background | bria-remove-video-background | Bria Video AI enhances videos up to 8K while seamlessly removing backgrounds for professional quality content. | videoToVideo | $2.1502712328767126 | 41.1538s | [llms.txt](https://www.segmind.com/models/bria-remove-video-background/llms.txt) |
| Bria Video Eraser | bria-erase-video | Effortlessly remove unwanted objects from videos while preserving audio and reconstructing backgrounds seamlessly. | videoToVideo | $0.3192 | 120.7891s | [llms.txt](https://www.segmind.com/models/bria-erase-video/llms.txt) |
| Esrgan Video Upscaler | esrgan-video-upscaler | ESRGAN Video Upscaler: Experience sharper, clearer 4K videos with ESRGAN. This AI-powered video upscaler boosts resoluti | videoToVideo | $0.32178350502693026 | 157.28935s | [llms.txt](https://www.segmind.com/models/esrgan-video-upscaler/llms.txt) |
| FlashVSR | flashvsr | FlashVSR enhances video quality in real-time, processing high-resolution content | videoToVideo | $1.1762287625 | 163.0629s | [llms.txt](https://www.segmind.com/models/flashvsr/llms.txt) |
| Heygen Video Translate | heygen-video-translate | Transforms videos into multiple languages with natural voice and lip-sync, enhancing global engagement. | videoToVideo | $0.4962166875 | 171.24743s | [llms.txt](https://www.segmind.com/models/heygen-video-translate/llms.txt) |
| Kling 2.6 Pro Motion Control | kling-2.6-pro-motion-control | Transform static images into lifelike animations by extracting motion from videos with precision and ease. | videoToVideo | $1.7538774683544305 | 596.74961s | [llms.txt](https://www.segmind.com/models/kling-2.6-pro-motion-control/llms.txt) |
| Kling 2.6 Standard Motion Control | kling-2.6-standard-motion-control | Kling Motion Control enables precise motion transfer from videos to custom characters, preserving identity and movement  | videoToVideo | $0.9665384615384617 | 608.2322s | [llms.txt](https://www.segmind.com/models/kling-2.6-standard-motion-control/llms.txt) |
| Kling O1 Video 2 Video Edit | kling-o1-video-to-video-edit | Kling Video O1 revolutionizes video editing through natural language commands for seamless, high-quality content creatio | videoToVideo | $1.2999646153846152 | 255.42274s | [llms.txt](https://www.segmind.com/models/kling-o1-video-to-video-edit/llms.txt) |
| Kling O1 Video 2 Video Reference | kling-o1-video-to-video-reference | Kling Omni Video O1 generates visually coherent videos from references, ensuring identity preservation in every frame. | videoToVideo | $1.4236173913043475 | 239.7181s | [llms.txt](https://www.segmind.com/models/kling-o1-video-to-video-reference/llms.txt) |
| Kling O3 Video To Video Edit | kling-o3-video2video-edit | Edit any video with text — swap backgrounds, inject characters, and restyle scenes using Kling O3's AI video-to-video mo | videoToVideo | $2.560833333333333 | 210.53405s | [llms.txt](https://www.segmind.com/models/kling-o3-video2video-edit/llms.txt) |
| Kling O3 Video To Video Reference | kling-o3-video2video-reference | Transform any video with AI — swap characters, change styles, and edit scenes using reference images and natural languag | videoToVideo | $1.7935135135135138 | 236.48518s | [llms.txt](https://www.segmind.com/models/kling-o3-video2video-reference/llms.txt) |
| LTX Retake Video | ltx-retake-video | Retake enables precise edits in video segments, maintaining continuity while enhancing dialogue and emotional delivery. | videoToVideo | $0.7911607142857143 | 38.5012s | [llms.txt](https://www.segmind.com/models/ltx-retake-video/llms.txt) |
| Multi Video Merge | multi-video-merge | Multi Video Merge | videoToVideo | $0.032872341353383454 | 97.83184s | [llms.txt](https://www.segmind.com/models/multi-video-merge/llms.txt) |
| Pixverse Lipsync | pixverse-lipsync | PixVerse Lipsync expertly synchronizes lip movements to audio for flawless video content creation. | videoToVideo | $0.31236277056277056 | 112.43345s | [llms.txt](https://www.segmind.com/models/pixverse-lipsync/llms.txt) |
| Runway Gen4 Aleph | runway-gen4-aleph | Runway Aleph revolutionizes video editing with intelligent automation for seamless object and environment manipulation. | videoToVideo | $1.125 | 171.79606s | [llms.txt](https://www.segmind.com/models/runway-gen4-aleph/llms.txt) |
| Sam V2 Video | sam-v2-video | SAM v2 Video by Meta AI allows promptable segmentation of objects in videos. | videoToVideo | $0.05689024780269058 | 37.56451s | [llms.txt](https://www.segmind.com/models/sam-v2-video/llms.txt) |
| Sam3 Video | sam3-video | SAM 3 Video excels in real-time video segmentation and tracking of diverse objects using natural language and prompts. | videoToVideo | $0.134495616 | 117.44927s | [llms.txt](https://www.segmind.com/models/sam3-video/llms.txt) |
| Sync.so Lipsync 2 Pro | sync.so-lipsync-2-pro | Lipsync-2-Pro seamlessly synchronizes lips in videos for instant, high-quality multilingual content creation. | videoToVideo | $1.016127329842932 | 233.44394s | [llms.txt](https://www.segmind.com/models/sync.so-lipsync-2-pro/llms.txt) |
| Sync.so React 1 | sync.so-react-1 | React-1 transforms video performances by editing actors' emotions with unmatched precision and realism. | videoToVideo | $1.9782141935483872 | 347.64718s | [llms.txt](https://www.segmind.com/models/sync.so-react-1/llms.txt) |
| Topaz Labs Video Upscale | topaz-video-upscale | Topaz Video AI upscales, enhances, denoises, stabilizes, and increases frame rates in video footage, transforming low-qu | videoToVideo | $1.6247801967213114 | 215.56449s | [llms.txt](https://www.segmind.com/models/topaz-video-upscale/llms.txt) |
| Video Audio Merge | video-audio-merge | Effortlessly merge audio and video with our intuitive Video Audio Merge model. Create stunning multimedia content with p | videoToVideo | $0.0017612948109939177 | 20.09168s | [llms.txt](https://www.segmind.com/models/video-audio-merge/llms.txt) |
| Video Captioner | video-captioner | With Video Captioner create accurate, customizable subtitles for your videos effortlessly. | videoToVideo | $0.038302799028654695 | 68.54137s | [llms.txt](https://www.segmind.com/models/video-captioner/llms.txt) |
| Video Concatenate | video-concatenate | Effortlessly merge videos with customized layouts, spacing, and audio for seamless content creation. | videoToVideo | $0.0007790461538461538 | 31.6725s | [llms.txt](https://www.segmind.com/models/video-concatenate/llms.txt) |
| Video Loop | video-loop | Effortlessly loop videos for engaging social media & storytelling with our Video Loop. | videoToVideo | $0.0009493365992414667 | 8.0558s | [llms.txt](https://www.segmind.com/models/video-loop/llms.txt) |
| Wan 2.7 Video Editing | wan2.7-videoedit | Alibaba's Wan 2.7 video editing model. Edit existing videos using text instructions. Supports style transfer, content mo | videoToVideo | $0.625 | 403.73499s | [llms.txt](https://www.segmind.com/models/wan2.7-videoedit/llms.txt) |

## Image-to-Video Generation

| Model | Slug | Description | Modality | Avg Cost | Avg Latency | API Docs |
| --- | --- | --- | --- | --- | --- | --- |
| AI Face Swap (image and video) | ai-face-swap | AI Face Swap: Effortlessly replace faces online. Fine-tune swaps with advanced controls for age, gender, and resolution. | Image-to-Video Generation | $0.10309356862457172 | 29.51768s | [llms.txt](https://www.segmind.com/models/ai-face-swap/llms.txt) |
| Bytedance HuMo: Human-Centric Video Generation | bytedance-humo | HuMo generates high-quality, human-centric videos from text, images, and audio with unparalleled control and precision. | Image-to-Video Generation | $5 | - | [llms.txt](https://www.segmind.com/models/bytedance-humo/llms.txt) |
| Cog videoX Image To Video | cog-video-5b-i2v | CogVideoX image-to-video is a cutting-edge AI model that converts static images into dynamic, high-quality videos. Perfe | Image-to-Video Generation | $0.3556148971819012 | 355.73991s | [llms.txt](https://www.segmind.com/models/cog-video-5b-i2v/llms.txt) |
| Easy Animate | easy-animate | Easy Animate is a state-of-the-art image-to-animation model that converts static images into dynamic animations with remar | Image-to-Video Generation | $0.7981060098555375 | 208.63524s | [llms.txt](https://www.segmind.com/models/easy-animate/llms.txt) |
| Google Veo 2 Image To Video | veo-2-image2video | Discover Google Veo 2, an AI-powered image-to-video model with 4K resolution, realistic motion, and cinematic effects fo | Image-to-Video Generation | $3.957629085982983 | 40.58279s | [llms.txt](https://www.segmind.com/models/veo-2-image2video/llms.txt) |
| Hailuo 02 Fast | hailuo-02-fast | Transform any static image into a captivating, high-quality video clip effortlessly. | Image-to-Video Generation | $0.16129074265267174 | 79.67235s | [llms.txt](https://www.segmind.com/models/hailuo-02-fast/llms.txt) |
| Hailuo 2.3 | hailuo-2.3 | Hailuo 2.3 creates hyper-realistic videos from text prompts with fluid character movements and advanced expressive detai | Image-to-Video Generation | $0.5298571428571429 | 141.39697s | [llms.txt](https://www.segmind.com/models/hailuo-2.3/llms.txt) |
| Hailuo 2.3 Fast | hailuo-2.3-fast | Transform text and images into professional-quality videos at lightning speed. | Image-to-Video Generation | $0.3859139784946236 | 114.16883s | [llms.txt](https://www.segmind.com/models/hailuo-2.3-fast/llms.txt) |
| Hallo | hallo | Hallo lets you create portrait videos from single images. | Image-to-Video Generation | $0.42187299876373635 | 303.87085s | [llms.txt](https://www.segmind.com/models/hallo/llms.txt) |
| Heygen Avatar IV | heygen-avatar-iv | Transform a single photo into a lifelike talking avatar with customizable speech and gestures. | Image-to-Video Generation | $2.216007071428572 | 195.47641s | [llms.txt](https://www.segmind.com/models/heygen-avatar-iv/llms.txt) |
| Higgsfield Image 2 Video | higgsfield-image2video | Transform static images into dynamic, motion-rich videos with unparalleled control and creative depth. | Image-to-Video Generation | $0.6420145379023884 | 154.53481s | [llms.txt](https://www.segmind.com/models/higgsfield-image2video/llms.txt) |
| Higgsfield Speech 2 Video | higgsfield-speech2video | Transform images and audio into dynamic, lip-synced videos for engaging digital content. | Image-to-Video Generation | $1.9714583333333333 | 290.66342s | [llms.txt](https://www.segmind.com/models/higgsfield-speech2video/llms.txt) |
| HyperSwap: Video Faceswap by FaceFusion Labs | video-faceswap-by-facefusion-labs | Hyperswap enables realistic face swapping in videos using a single identity image, preserving natural expressions and li | Image-to-Video Generation | $0.08653758041958041 | 56.52677s | [llms.txt](https://www.segmind.com/models/video-faceswap-by-facefusion-labs/llms.txt) |
| InfiniteTalk | infinite-talk | Animate images and videos with full-body motion perfectly synchronized to audio — beyond lip sync. | Image-to-Video Generation | $0.4601996661915766 | 305.5446s | [llms.txt](https://www.segmind.com/models/infinite-talk/llms.txt) |
| Kling 2 | kling-2 | Kling 2.0 is an advanced AI video generator (5 and 10 seconds) that creates cinematic, dynamic videos from text or image | Image-to-Video Generation | $2.2962962962962963 | 305.2509s | [llms.txt](https://www.segmind.com/models/kling-2/llms.txt) |
| Kling 2.1 AI Video Generator | kling-2.1 | Kling 2.1 offers hyper-realistic video generation with improved motion, sharper 1080p visuals, and instant restyling cap | Image-to-Video Generation | $0.8993846776057319 | 135.86779s | [llms.txt](https://www.segmind.com/models/kling-2.1/llms.txt) |
| Kling 2.5 Turbo | kling-2.5-turbo | Kling AI 2.5 Turbo generates fluid, cinematic videos from text and images, enhancing content creation and storytelling. | Image-to-Video Generation | $0.5664200792602381 | 134.33973s | [llms.txt](https://www.segmind.com/models/kling-2.5-turbo/llms.txt) |
| Kling 2.6 | kling-2.6 | Transforms still images into immersive, cinematic videos with synchronized audio in seconds. | Image-to-Video Generation | $1.073478561549101 | 122.79903s | [llms.txt](https://www.segmind.com/models/kling-2.6/llms.txt) |
| Kling 3.0 Pro Image-to-Video | kling-3-pro-image2video | Kling 3.0 generates high-quality animated videos from images with dynamic motion and optional audio. | Image-to-Video Generation | $1.7839805825242727 | 297.76991s | [llms.txt](https://www.segmind.com/models/kling-3-pro-image2video/llms.txt) |
| Kling 3.0 Standard Image-to-Video | kling-3-standard-image2video | Transform starting images into cinematic 1080p videos with controlled motion and optional audio. | Image-to-Video Generation | $1.24840625 | 153.30395s | [llms.txt](https://www.segmind.com/models/kling-3-standard-image2video/llms.txt) |
| Kling AI 1.6 Image to Video | kling-1.6-image2video | Kling AI 1.6 Image-to-Video is a powerful AI tool that transforms static images into captivating, animated videos. Creat | Image-to-Video Generation | $1.0531424581005626 | 289.44462s | [llms.txt](https://www.segmind.com/models/kling-1.6-image2video/llms.txt) |
| Kling AI Image to Video | kling-image2video | Kling AI Image-to-Video is a powerful AI tool that transforms static images into captivating, animated videos. Create hi | Image-to-Video Generation | $0.8644570085582906 | 317.55773s | [llms.txt](https://www.segmind.com/models/kling-image2video/llms.txt) |
| Kling Avatar V2 Standard | kling-v2-standard-avatar | Transforms images and audio into lifelike video avatars with synchronized lip movement. | Image-to-Video Generation | $2.8101888749999997 | 507.53268s | [llms.txt](https://www.segmind.com/models/kling-v2-standard-avatar/llms.txt) |
| Kling bloombloom | kling-bloombloom | Kling AI transforms text and images into dynamic, high-quality video content with realistic motion and sound. | Image-to-Video Generation | $0.98 | 193.06476s | [llms.txt](https://www.segmind.com/models/kling-bloombloom/llms.txt) |
| Kling dizzydizzy | kling-dizzydizzy | Kling DizzyDizzy transforms static content into dynamic, high-resolution videos, enhancing engagement and storytelling f | Image-to-Video Generation | $0.98 | 199.27494s | [llms.txt](https://www.segmind.com/models/kling-dizzydizzy/llms.txt) |
| Kling Expansion | kling-expansion | Unleash dynamic visuals with Kling Expansion! Effortlessly inflate and stretch elements for surreal and captivating effe | Image-to-Video Generation | $0.95 | 118.23583s | [llms.txt](https://www.segmind.com/models/kling-expansion/llms.txt) |
| Kling fuzzyfuzzy | kling-fuzzyfuzzy | Transform your photos instantly into adorable, plush-toy-like visuals with Kling fuzzyfuzzy effect. | Image-to-Video Generation | $0.9635294117647061 | 132.86003s | [llms.txt](https://www.segmind.com/models/kling-fuzzyfuzzy/llms.txt) |
| Kling Heart Gesture | kling-heart-gesture | Express affection visually with Kling AI's heart gesture effect! Input two portraits and instantly create heartwarming v | Image-to-Video Generation | $0.945 | 208.38736s | [llms.txt](https://www.segmind.com/models/kling-heart-gesture/llms.txt) |
| Kling Hug | kling-hug | Create heartwarming videos instantly with Kling hug effect! Generate tender embracing animations. | Image-to-Video Generation | $0.98 | 206.59058s | [llms.txt](https://www.segmind.com/models/kling-hug/llms.txt) |
| Kling Kiss | kling-kiss | Create a heartfelt video in seconds with Kling kiss effect! Input two portraits and instantly generate a kissing animati | Image-to-Video Generation | $1.0006434316353892 | 235.845s | [llms.txt](https://www.segmind.com/models/kling-kiss/llms.txt) |
| Kling O1 Image 2 Video | kling-o1-image-to-video | Transforms static images into dynamic, physics-driven animations for creative storytelling. | Image-to-Video Generation | $0.911967461430575 | 142.44166s | [llms.txt](https://www.segmind.com/models/kling-o1-image-to-video/llms.txt) |
| Kling O1 Reference Image 2 Video | kling-o1-reference-image-to-video | Kling Omni Video O1 transforms static images into dynamic, identity-preserving cinematic videos. | Image-to-Video Generation | $1.3417517976031959 | 198.08886s | [llms.txt](https://www.segmind.com/models/kling-o1-reference-image-to-video/llms.txt) |
| Kling O3 Image To Video | kling-o3-image2video | Kling O3 transforms static images into cinematic videos with precise motion control, multi-segment prompts, and optional | Image-to-Video Generation | $1.7911574074074073 | 197.43504s | [llms.txt](https://www.segmind.com/models/kling-o3-image2video/llms.txt) |
| Kling Squish | kling-squish | Transform your visuals with Kling AI squish effect! Easily compress and distort images/videos for playful, exaggerated e | Image-to-Video Generation | $0.9672727272727268 | 120.28359s | [llms.txt](https://www.segmind.com/models/kling-squish/llms.txt) |
| Kling V1 Pro AI Avatar | kling-v1-pro-ai-avatar | Kwaivgi Kling V1 AI Avatar Pro creates dynamic avatars with synchronized speech and realistic expressions for engaging c | Image-to-Video Generation | $4.475115916666667 | 732.24765s | [llms.txt](https://www.segmind.com/models/kling-v1-pro-ai-avatar/llms.txt) |
| Kling V1 Standard AI Avatar | kling-v1-standard-ai-avatar | Kwaivgi Kling V1 generates lifelike AI avatars with precise lip-sync for engaging multimedia presentations. | Image-to-Video Generation | $2.0203893023255812 | 403.17807s | [llms.txt](https://www.segmind.com/models/kling-v1-standard-ai-avatar/llms.txt) |
| Kling V2 Pro Avatar | kling-v2-pro-avatar | Transform image and audio into engaging avatar-driven videos for dynamic communication. | Image-to-Video Generation | $5.285169729729731 | 779.46664s | [llms.txt](https://www.segmind.com/models/kling-v2-pro-avatar/llms.txt) |
| Live Portrait | live-portrait | Live Portrait animates static images using a reference driving video through implicit key point based framework, bringin | Image-to-Video Generation | $0.05500106602341842 | 36.04155s | [llms.txt](https://www.segmind.com/models/live-portrait/llms.txt) |
| Live Portrait video to video | live-portrait-video-to-video | Experience the magic of Live Portrait’s Video-to-Video Model! Transform your static images into dynamic videos seamlessl | Image-to-Video Generation | $0.27947058954138704 | 74.44037s | [llms.txt](https://www.segmind.com/models/live-portrait-video-to-video/llms.txt) |
| LTX 2 Fast | ltx-2-fast | LTX-2-Fast is a cutting-edge text-to-video AI model by Lightricks, designed for fast video generation directly from text | Image-to-Video Generation | $0.5418243293269229 | 46.61618s | [llms.txt](https://www.segmind.com/models/ltx-2-fast/llms.txt) |
| LTX 2 Pro | ltx-2-pro | LTX-2 Pro is a video generation model by Lightricks that produces videos from text or image inputs. | Image-to-Video Generation | $0.6267278331034484 | 69.73345s | [llms.txt](https://www.segmind.com/models/ltx-2-pro/llms.txt) |
| LTX Video | ltx-video | LTX-Video is the first DiT-based video generation model capable of generating high-quality videos in real-time. It produ | Image-to-Video Generation | $0.05747610571519036 | 54.98544s | [llms.txt](https://www.segmind.com/models/ltx-video/llms.txt) |
| Luma Image-to-Video | luma-img-2-video | With Luma's Dream Machine, transform your static images into dynamic videos. It offers high-fidelity video generation, r | Image-to-Video Generation | $0.9472327964861006 | 60.84224s | [llms.txt](https://www.segmind.com/models/luma-img-2-video/llms.txt) |
| Luma Modify Video | modify-video | Transform videos seamlessly with high-fidelity generative edits while preserving original actor performances. | Image-to-Video Generation | $0.6006692185185185 | 164.37768s | [llms.txt](https://www.segmind.com/models/modify-video/llms.txt) |
| Luma Ray flash 2 (720p) | ray-flash-2-720p | Generate stunning 720p videos from text with the Luma ray-flash-2-720p model. Faster & cheaper than Ray 2, offering real | Image-to-Video Generation | $0.4267687997750281 | 60.70832s | [llms.txt](https://www.segmind.com/models/ray-flash-2-720p/llms.txt) |
| Luma Ray Image to Video | luma-ray-img-2-video | With Luma's Ray2 image-to-video, transform your static images into cinematic dynamic videos. | Image-to-Video Generation | $1.6 | 125.19429s | [llms.txt](https://www.segmind.com/models/luma-ray-img-2-video/llms.txt) |
| Minimax (Hailuo) Video-01-live | minimax-ai-live | Create stunning animations with Minimax (Hailuo) video-01-live, an AI image-to-video model perfect for Live2D, anime, an | Image-to-Video Generation | $0.625 | 167.32884s | [llms.txt](https://www.segmind.com/models/minimax-ai-live/llms.txt) |
| MiniMax AI (Hailuo) | minimax-ai | With Video-01 by MiniMax, create high-definition videos at 720p resolution and 25fps, featuring cinematic camera movemen | Image-to-Video Generation | $0.6009696074594835 | 177.55037s | [llms.txt](https://www.segmind.com/models/minimax-ai/llms.txt) |
| Minimax Hailuo 2 | minimax-hailuo-2 | Generate breathtaking 1080P cinematic videos from text or images with ultra-realistic motion and physics. | Image-to-Video Generation | $0.3875 | 174.24651s | [llms.txt](https://www.segmind.com/models/minimax-hailuo-2/llms.txt) |
| Motion Control SVD | motionctrl-svd | Motion Control SVD is an innovative deep learning framework that breathes life into static images. By intelligently mana | Image-to-Video Generation | $0.08928216191843767 | 62.73815s | [llms.txt](https://www.segmind.com/models/motionctrl-svd/llms.txt) |
| Muscle Surge | muscle-surge | Instantly add muscle and strength to your videos with Pixverse Muscle Surge effect! | Image-to-Video Generation | $0.41835106382978726 | 45.50548s | [llms.txt](https://www.segmind.com/models/muscle-surge/llms.txt) |
| OVI Image To Video | ovi-i2v | Ovi I2V generates synchronized video and audio from text prompts, creating engaging multimedia content effortlessly. | Image-to-Video Generation | $0.24981727088122607 | 41.92037s | [llms.txt](https://www.segmind.com/models/ovi-i2v/llms.txt) |
| Pixverse 4.5 Effects | pixverse-4.5-effects | PixVerse 4.5 transforms photos and text into stunning animated videos for impactful storytelling and marketing. | Image-to-Video Generation | $0.3975694444444444 | 46.91432s | [llms.txt](https://www.segmind.com/models/pixverse-4.5-effects/llms.txt) |
| Pixverse 4.5 Transition | pixverse-4.5-transition | PixVerse 4.5 transforms still images into dynamic, captivating videos with seamless transitions. | Image-to-Video Generation | $0.5813148788927336 | 57.82971s | [llms.txt](https://www.segmind.com/models/pixverse-4.5-transition/llms.txt) |
| Pixverse 4.5 Video | pixverse-4.5-video | Pixverse 4.5 transforms static images and text into dynamic, engaging videos for captivating social media content. | Image-to-Video Generation | $0.5827814569536424 | 45.92084s | [llms.txt](https://www.segmind.com/models/pixverse-4.5-video/llms.txt) |
| Pixverse 5 Extend | pixverse-5-extend | PixVerse Extend creates seamless, AI-generated video continuations that enhance storytelling and viewer engagement. | Image-to-Video Generation | $0.6569148936170213 | 93.2064s | [llms.txt](https://www.segmind.com/models/pixverse-5-extend/llms.txt) |
| Pixverse 5 Transition | pixverse-5-transition | PixVerse v5 generates seamless, immersive video transitions that elevate storytelling and visual content creation. | Image-to-Video Generation | $0.5 | 72.51908s | [llms.txt](https://www.segmind.com/models/pixverse-5-transition/llms.txt) |
| Pixverse 5 Video | pixverse-5-video | PixVerse V5 generates cinematic videos from text and images with stunning realism and precision. | Image-to-Video Generation | $0.6396604938271605 | 69.81873s | [llms.txt](https://www.segmind.com/models/pixverse-5-video/llms.txt) |
| Pixverse Image to Video | pixverse-image2video | Animate your photos effortlessly with Pixverse Image to Video AI! Upload, add motion prompts and styles. | Image-to-Video Generation | $0.5467236467236467 | 41.08404s | [llms.txt](https://www.segmind.com/models/pixverse-image2video/llms.txt) |
| Pixverse Transition | pixverse-transition | PixVerse V4 transforms static images and text into dynamic, visually stunning videos for creators across various industr | Image-to-Video Generation | $0.8727272727272727 | 57.65282s | [llms.txt](https://www.segmind.com/models/pixverse-transition/llms.txt) |
| Pixverse V6 | pixverse-v6 | Generate stunning AI videos up to 15s with native audio, cinematic camera controls, and image-to-video support via PixVe | Image-to-Video Generation | $0.5704166666666666 | 82.6618s | [llms.txt](https://www.segmind.com/models/pixverse-v6/llms.txt) |
| Runway Gen 4 Turbo | runway-gen4-turbo | Generate videos faster and cheaper with Runway Gen-4 Turbo! Create high-quality text, image, and combined video generati | Image-to-Video Generation | $0.774324903557651 | 38.00399s | [llms.txt](https://www.segmind.com/models/runway-gen4-turbo/llms.txt) |
| Runway Gen Alpha Turbo Image to Video | runway-gen3-alphaturbo | Runway Gen-3 AlphaTurbo is a cutting-edge AI tool that transforms static images into dynamic videos with exceptional fid | Image-to-Video Generation | $0.5561636957067537 | 27.3851s | [llms.txt](https://www.segmind.com/models/runway-gen3-alphaturbo/llms.txt) |
| SadTalker | sadtalker | Audio-based Lip Synchronization for Talking Head Video | Image-to-Video Generation | $0.17849439716260992 | 109.17932s | [llms.txt](https://www.segmind.com/models/sadtalker/llms.txt) |
| Seedance 1.0 lite i2v | seedance-v1-lite-image-to-video | Seedance 1.0 transforms text and images into engaging 720p dynamic videos with cinematic storytelling. | Image-to-Video Generation | $0.10694209778708141 | 39.57189s | [llms.txt](https://www.segmind.com/models/seedance-v1-lite-image-to-video/llms.txt) |
| Seedance 1.0 Pro | seedance-pro | Seedance Pro transforms text and images into engaging 720p dynamic videos with cinematic storytelling. | Image-to-Video Generation | $0.3506496098278032 | 62.10605s | [llms.txt](https://www.segmind.com/models/seedance-pro/llms.txt) |
| Seedance 1.0 Pro Fast | seedance-1.0-pro-fast | Seedance 1.0 Pro Fast creates cinematic-quality videos from text and images at unprecedented speed. | Image-to-Video Generation | $0.24658977411376745 | 48.67694s | [llms.txt](https://www.segmind.com/models/seedance-1.0-pro-fast/llms.txt) |
| Seedance 1.5 Pro | seedance-1.5-pro | Seedance 1.5 Pro generates synchronized video and audio for dynamic storytelling and immersive content creation. | Image-to-Video Generation | $0.4143735858055358 | 99.37376s | [llms.txt](https://www.segmind.com/models/seedance-1.5-pro/llms.txt) |
| Sora 2 | sora-2 | Sora 2 transforms detailed text descriptions into stunning, dynamic videos within seconds. | Image-to-Video Generation | $1.0151650312221234 | 178.44405s | [llms.txt](https://www.segmind.com/models/sora-2/llms.txt) |
| Sora 2 Pro | sora-2-pro | Sora 2 Pro generates cinematic-quality videos from text and images with unmatched realism and detail. | Image-to-Video Generation | $3.1467576791808867 | 403.10792s | [llms.txt](https://www.segmind.com/models/sora-2-pro/llms.txt) |
| Stable Video Diffusion | svd | Takes image as input and returns a video. | Image-to-Video Generation | $0.16589559029599096 | 29.6271s | [llms.txt](https://www.segmind.com/models/svd/llms.txt) |
| Tooncrafter | tooncrafter | Create videos from illustrated input images | Image-to-Video Generation | $0.12288504435102474 | 108.41335s | [llms.txt](https://www.segmind.com/models/tooncrafter/llms.txt) |
| V Express | v-express | V-Express lets you create portrait videos from single images. | Image-to-Video Generation | $0.26351073672376873 | 196.28656s | [llms.txt](https://www.segmind.com/models/v-express/llms.txt) |
| Veo 3.1 | veo-3.1 | Transform static images into dynamic, high-quality videos with synchronized audio and precise creative control. | Image-to-Video Generation | $2.1533773489080756 | 110.56481s | [llms.txt](https://www.segmind.com/models/veo-3.1/llms.txt) |
| Veo 3.1 Fast | veo-3.1-fast | Transforms static images into dynamic 1080p videos with synchronized audio and natural motion. | Image-to-Video Generation | $0.8599728322390767 | 99.41785s | [llms.txt](https://www.segmind.com/models/veo-3.1-fast/llms.txt) |
| Veo 3.1 Lite | veo-3.1-lite | Generate high-quality AI videos with audio from text or images using Google's most affordable video model. | Image-to-Video Generation | $0.9027777777777778 | 47.14036s | [llms.txt](https://www.segmind.com/models/veo-3.1-lite/llms.txt) |
| Video Faceswap | videofaceswap | Video Faceswap is a powerful tool for creators, filmmakers, and meme enthusiasts. With this innovative technology, you  | Image-to-Video Generation | $0.41188478458625727 | 184.88331s | [llms.txt](https://www.segmind.com/models/videofaceswap/llms.txt) |
| Video Frame Interpolation | video-frame-interpolation | FILM synthesizes smooth, high-quality intermediate frames for fluid motion in videos with significant movement. | Image-to-Video Generation | $5 | - | [llms.txt](https://www.segmind.com/models/video-frame-interpolation/llms.txt) |
| Video Stitch | video-stitch | Revolutionize your video editing with the Video Stitch Model. Seamlessly stitch clips, add captivating audio, and create | Image-to-Video Generation | $0.0028857760868437454 | 30.38754s | [llms.txt](https://www.segmind.com/models/video-stitch/llms.txt) |
| Video Tryon | video-tryon | Video Tryon is Segmind’s next-generation AI video model for instant virtual try-on, allowing users to visualize any outf | Image-to-Video Generation | $1.6059995424855487 | 205.12761s | [llms.txt](https://www.segmind.com/models/video-tryon/llms.txt) |
| Video Watermark Remover | video-watermark-remover | Remove watermarks from videos instantly with AI. Upload, process, and download clean videos in just seconds — no manual  | Image-to-Video Generation | $0.8275678268292683 | 194.82s | [llms.txt](https://www.segmind.com/models/video-watermark-remover/llms.txt) |
| Vidu Q1 Reference to Video | vidu-q1-reference-to-video | Vidu AI reference to video transforms text and images into dynamic, high-quality videos effortlessly. | Image-to-Video Generation | $0.5 | 120.68598s | [llms.txt](https://www.segmind.com/models/vidu-q1-reference-to-video/llms.txt) |
| Vidu Template | vidu-template | Transform static images into captivating videos using diverse motion templates effortlessly. | Image-to-Video Generation | $0.0625 | 125.76909s | [llms.txt](https://www.segmind.com/models/vidu-template/llms.txt) |
| Wan 2.1 480p image to video | wan2.1-i2v-480p | Create high-quality 480p videos with excellent visual quality and a broad spectrum of motion from static images. | Image-to-Video Generation | $0.5302190763528138 | 53.38121s | [llms.txt](https://www.segmind.com/models/wan2.1-i2v-480p/llms.txt) |
| Wan 2.1 720p image to video | wan2.1-i2v-720p | Create high-quality 720p videos with excellent visual quality and a broad spectrum of motion from static images. | Image-to-Video Generation | $1.4477480769691775 | 148.71608s | [llms.txt](https://www.segmind.com/models/wan2.1-i2v-720p/llms.txt) |
| Wan 2.2 Image to Video Fast | wan-2.2-i2v-fast | Transforms simple text prompts into breathtaking cinematic-quality videos in minutes. | Image-to-Video Generation | $0.0872760296401044 | 52.64982s | [llms.txt](https://www.segmind.com/models/wan-2.2-i2v-fast/llms.txt) |
| Wan 2.2 Image to Video Flash | wan-2.2-i2v-flash | Transform a single image and text prompt into a coherent, dynamic video. | Image-to-Video Generation | $0.17272985244040864 | 68.10579s | [llms.txt](https://www.segmind.com/models/wan-2.2-i2v-flash/llms.txt) |
| Wan 2.5 Image to Video | wan-2.5-i2v | Wan2.5-Preview creates stunning, high-resolution videos with flawless audio synchronization from multiple inputs. | Image-to-Video Generation | $0.8588397896076353 | 177.58862s | [llms.txt](https://www.segmind.com/models/wan-2.5-i2v/llms.txt) |
| Wan 2.6 Image To Video | wan-2.6-i2v | Wan 2.6 transforms text and images into high-quality videos with precise audio sync, perfect for engaging content creati | Image-to-Video Generation | $1.1879453426553672 | 138.55541s | [llms.txt](https://www.segmind.com/models/wan-2.6-i2v/llms.txt) |
| Wan 2.6 Text To Video | wan-2.6-t2v | Transforms text and audio into high-quality cinematic videos with seamless storytelling and synchronization. | Image-to-Video Generation | $1.015625 | 183.96646s | [llms.txt](https://www.segmind.com/models/wan-2.6-t2v/llms.txt) |
| Wan 2.7 Image to Video | wan2.7-i2v | Animate any image into a cinematic video up to 1080P and 15 seconds with audio sync, first/last frame control, and multi | Image-to-Video Generation | $0.65625 | 415.22486s | [llms.txt](https://www.segmind.com/models/wan2.7-i2v/llms.txt) |
| Wan 2.7 Reference to Video | wan2.7-r2v | Generate character-consistent videos from reference images with multi-subject support and voice cloning up to 1080P. | Image-to-Video Generation | $0.78125 | 194.21016s | [llms.txt](https://www.segmind.com/models/wan2.7-r2v/llms.txt) |
| Wan Animate | wan-animate | Wan-Animate seamlessly animates characters and replaces subjects in videos, ensuring fluid realism and environmental con | Image-to-Video Generation | $1.54193124486692 | 411.90289s | [llms.txt](https://www.segmind.com/models/wan-animate/llms.txt) |
| Wan Scail | scail | SCAIL generates professional-quality character animations from reference images and motion videos with exceptional pose  | Image-to-Video Generation | $1.929277171612903 | 429.39176s | [llms.txt](https://www.segmind.com/models/scail/llms.txt) |
| Wan Video Effects | video-effects | Transform your videos with diverse video effects. Start creating captivating videos today. | Image-to-Video Generation | $0.524892328095238 | 126.12837s | [llms.txt](https://www.segmind.com/models/video-effects/llms.txt) |
| Warmth of Jesus | warmth-of-jesus | Experience the viral "Warmth of Jesus" effect on PixVerse! Transform your images into heartwarming videos of Jesus embra | Image-to-Video Generation | $0.3907894736842105 | 50.30539s | [llms.txt](https://www.segmind.com/models/warmth-of-jesus/llms.txt) |

## Text-to-Video Generation

| Model | Slug | Description | Modality | Avg Cost | Avg Latency | API Docs |
| --- | --- | --- | --- | --- | --- | --- |
| Cog Video X 5B | cog-video-5b-t2v | CogVideo is a groundbreaking AI model that turns text into high-quality videos. Create realistic scenes, animations, and | Text-to-Video Generation | $0.35555723622004365 | 229.23271s | [llms.txt](https://www.segmind.com/models/cog-video-5b-t2v/llms.txt) |
| Google Veo 2 | veo-2 | Create stunning, realistic videos with Veo 2, Google's state-of-the-art AI video generation model. Experience enhanced q | Text-to-Video Generation | $4.2576142889376225 | 39.87097s | [llms.txt](https://www.segmind.com/models/veo-2/llms.txt) |
| Google Veo 3 | veo-3 | Veo 3 revolutionizes video creation with advanced text-to-video generation and realistic audio synthesis for cinematic c | Text-to-Video Generation | $5.199440229759413 | 144.76896s | [llms.txt](https://www.segmind.com/models/veo-3/llms.txt) |
| Hunyuan Video | hunyuan-video | Hunyuan AI Video is a new, state-of-the-art AI video generator that creates high-quality videos from text descriptions. | Text-to-Video Generation | $1.4642586306485037 | 211.32017s | [llms.txt](https://www.segmind.com/models/hunyuan-video/llms.txt) |
| Kling 3.0 Pro Text-to-Video | kling-3-pro-text2video | Kling 3.0 generates cinematic 1080p videos with realistic audio and structured storytelling. | Text-to-Video Generation | $3.3413333333333335 | 308.71771s | [llms.txt](https://www.segmind.com/models/kling-3-pro-text2video/llms.txt) |
| Kling 3.0 Standard Text-to-Video | kling-3-standard-text2video | Kling 3.0 creates stunning 1080p cinematic videos from simple text prompts with realistic motion and audio. | Text-to-Video Generation | $1.609263157894737 | 174.80365s | [llms.txt](https://www.segmind.com/models/kling-3-standard-text2video/llms.txt) |
| Kling AI 1.6 Text to Video | kling-1.6-text2video | Kling AI 1.6 Text-to-Video is a cutting-edge AI tool that transforms text into stunning, lifelike videos. Create profess | Text-to-Video Generation | $0.6886923076923093 | 304.07888s | [llms.txt](https://www.segmind.com/models/kling-1.6-text2video/llms.txt) |
| Kling AI Text to Video | kling-text2video | Kling AI Text-to-Video is a cutting-edge AI tool that transforms text into stunning, lifelike videos. Create professiona | Text-to-Video Generation | $0.42746298124383053 | 320.05708s | [llms.txt](https://www.segmind.com/models/kling-text2video/llms.txt) |
| Kling O3 Text-to-Video | kling-o3-text2video | Generate cinematic AI videos up to 15 seconds with native audio, multi-shot control, and physics-accurate motion via API | Text-to-Video Generation | $2.39 | 162.1328s | [llms.txt](https://www.segmind.com/models/kling-o3-text2video/llms.txt) |
| LTX-2-19B I2V | ltx-2-19b-i2v | LTX-2 generates synchronized 4K audio-video content efficiently and realistically in a single pass. | Text-to-Video Generation | $0.4547708059322034 | 113.3083s | [llms.txt](https://www.segmind.com/models/ltx-2-19b-i2v/llms.txt) |
| LTX-2-19B T2V | ltx-2-19b-t2v | LTX-2 generates synchronized video and audio from multiple input types, revolutionizing multimedia content creation. | Text-to-Video Generation | $0.419758021686747 | 105.04678s | [llms.txt](https://www.segmind.com/models/ltx-2-19b-t2v/llms.txt) |
| Luma Ray Text to Video | luma-ray-txt-2-video | Luma Ray2 text-to-video creates realistic, coherent videos from your text prompts. | Text-to-Video Generation | $1.6 | 114.21931s | [llms.txt](https://www.segmind.com/models/luma-ray-txt-2-video/llms.txt) |
| Luma Text-to-Video | luma-txt-2-video | Luma Video (Text to Video) is an advanced AI model that turns text prompts into captivating videos. Designed for creator | Text-to-Video Generation | $0.9470967741935521 | 62.45225s | [llms.txt](https://www.segmind.com/models/luma-txt-2-video/llms.txt) |
| Minimax AI Director | minimax-ai-director | Minimax video-01-director: Create high-quality videos and control camera movements precisely using text prompts. | Text-to-Video Generation | $0.625 | 154.51132s | [llms.txt](https://www.segmind.com/models/minimax-ai-director/llms.txt) |
| Mochi 1 | mochi-1 | Mochi 1 is a cutting-edge, open-source AI model that transforms text prompts into stunning, high-fidelity videos. Create | Text-to-Video Generation | $0.2664455583242058 | 180.10768s | [llms.txt](https://www.segmind.com/models/mochi-1/llms.txt) |
| Pixverse Text to Video | pixverse-text2video | Effortlessly create captivating videos from text with Pixverse text to video AI! Customize style, duration, and more. | Text-to-Video Generation | $0.4281746031746032 | 44.22598s | [llms.txt](https://www.segmind.com/models/pixverse-text2video/llms.txt) |
| Seedance 1.0 lite t2v | seedance-v1-lite-text-to-video | Seedance V1 Lite transforms text into high-quality videos, streamlining content creation for diverse applications. | Text-to-Video Generation | $0.1977912772585669 | 47.02101s | [llms.txt](https://www.segmind.com/models/seedance-v1-lite-text-to-video/llms.txt) |
| Veo 3 Fast | veo-3-fast | Veo 3 Fast rapidly creates high-quality, 8-second videos with synchronized audio for diverse content needs. | Text-to-Video Generation | $1.6274891774891773 | 80.33263s | [llms.txt](https://www.segmind.com/models/veo-3-fast/llms.txt) |
| Wan 2.2 Text to Video Fast | wan-2.2-t2v-fast | Wan2.2 transforms text and images into high-quality video clips with cinematic flair. | Text-to-Video Generation | $0.09861429665379666 | 96.3707s | [llms.txt](https://www.segmind.com/models/wan-2.2-t2v-fast/llms.txt) |
| Wan 2.5 Text to Video | wan-2.5-t2v | Wan2.5-Preview generates synchronized multimedia content, merging text, image, video, and audio seamlessly. | Text-to-Video Generation | $0.8021737153716216 | 213.44395s | [llms.txt](https://www.segmind.com/models/wan-2.5-t2v/llms.txt) |
| Wan 2.7 Text to Video | wan2.7-t2v | Generate cinematic 1080P videos from text with audio sync, multi-shot control, and 15-second duration via Wan 2.7. | Text-to-Video Generation | $0.78125 | 328.31482s | [llms.txt](https://www.segmind.com/models/wan2.7-t2v/llms.txt) |
| Wan_2.1 Text to Video | wan2.1-t2v | Create visually impressive videos with varied, lifelike motion from text prompts using Wan2.1. | Text-to-Video Generation | $0.8552705145173747 | 104.81362s | [llms.txt](https://www.segmind.com/models/wan2.1-t2v/llms.txt) |
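
Each slug in these tables plugs into the two URL templates given at the top of this directory. The following is a minimal sketch, assuming those templates are used verbatim; the helper names `endpoint_url` and `docs_url` are hypothetical and not part of any Segmind SDK.

```python
from urllib.parse import quote

# URL templates as stated in this directory's header.
API_BASE = "https://api.segmind.com/v1"
DOCS_BASE = "https://www.segmind.com/models"


def endpoint_url(slug: str) -> str:
    """Build the inference endpoint URL for a model slug (hypothetical helper)."""
    return f"{API_BASE}/{quote(slug, safe='')}"


def docs_url(slug: str) -> str:
    """Build the per-model llms.txt URL for a model slug (hypothetical helper)."""
    return f"{DOCS_BASE}/{quote(slug, safe='')}/llms.txt"


if __name__ == "__main__":
    print(endpoint_url("wan-2.5-t2v"))
    print(docs_url("wan-2.5-t2v"))
```

Request parameters differ per model, so the per-model `llms.txt` is the place to check before sending anything to the endpoint.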

## Text-to-Image Generation

| Model | Slug | Description | Modality | Avg Cost | Avg Latency | API Docs |
| --- | --- | --- | --- | --- | --- | --- |
| Background Eraser | background-eraser | Background Eraser helps in flawless background removal with exceptional accuracy. | Text-to-Image Generation | $0.0006150026367029045 | 0.79085s | [llms.txt](https://www.segmind.com/models/background-eraser/llms.txt) |
| Bria 3.2 Text to Image | bria-text-to-image | Bria 3.2 AI transforms natural language into stunning visuals for diverse creative applications — with Base, Fast, and H | Text-to-Image Generation | $0.03890776699029126 | 21.90631s | [llms.txt](https://www.segmind.com/models/bria-text-to-image/llms.txt) |
| Bria Vector Graphics | bria-text-to-vector-graphics | Bria Vision enables high-quality text-to-image and text-to-vector graphic generation for versatile commercial use. | Text-to-Image Generation | $0.039275362318840594 | 17.91585s | [llms.txt](https://www.segmind.com/models/bria-text-to-vector-graphics/llms.txt) |
| Chroma | chroma | Chroma is an open-source, 8.9B parameter text-to-image model (based on FLUX.1-schnell) designed for diverse and uncensor | Text-to-Image Generation | $0.05539948845537652 | 53.20123s | [llms.txt](https://www.segmind.com/models/chroma/llms.txt) |
| Colossus Lightning SDXL | sdxl1.0-colossus-lightning | Colossus Lightning SDXL is a lightning-fast text-to-image generation model. It can generate high-quality 1024px images i | Text-to-Image Generation | $0.007343366528711855 | 3.41171s | [llms.txt](https://www.segmind.com/models/sdxl1.0-colossus-lightning/llms.txt) |
| Copax Timeless SDXL | sdxl1.0-timeless | The SDXL model is the official upgrade to the v1.5 model. The model is released as open-source software. | Text-to-Image Generation | $0.01030139854655408 | 5.44407s | [llms.txt](https://www.segmind.com/models/sdxl1.0-timeless/llms.txt) |
| Cyber Realistic | sd1.5-cyberrealistic | The most versatile photorealistic model that blends various models to achieve amazingly realistic images. | Text-to-Image Generation | $0.002858053399280493 | 1.4938s | [llms.txt](https://www.segmind.com/models/sd1.5-cyberrealistic/llms.txt) |
| DreamShaper Lightning SDXL | sdxl1.0-dreamshaper-lightning | DreamShaper Lightning SDXL is a lightning-fast text-to-image generation model. It can generate high-quality 1024px image | Text-to-Image Generation | $0.006107785901437544 | 3.17882s | [llms.txt](https://www.segmind.com/models/sdxl1.0-dreamshaper-lightning/llms.txt) |
| Dreamshaper SDXL | sdxl1.0-dreamshaper | The SDXL model is the official upgrade to the v1.5 model. The model is released as open-source software. | Text-to-Image Generation | $0.011029928226258858 | 6.46014s | [llms.txt](https://www.segmind.com/models/sdxl1.0-dreamshaper/llms.txt) |
| Dynavis Lightning SDXL | sdxl1.0-dyanvis-lightning | Dynavis Lightning SDXL is a lightning-fast text-to-image generation model. It can generate high-quality 1024px images in | Text-to-Image Generation | $0.006758980029127977 | 3.81324s | [llms.txt](https://www.segmind.com/models/sdxl1.0-dyanvis-lightning/llms.txt) |
| Edge of Realism | sd1.5-edgeofrealism | This model corresponds to the Stable Diffusion Edge of Realism checkpoint for detailed images at the cost of a super det | Text-to-Image Generation | $0.003357485946572224 | 1.62107s | [llms.txt](https://www.segmind.com/models/sd1.5-edgeofrealism/llms.txt) |
| Epic Realism | sd1.5-epicrealism | This model corresponds to the Stable Diffusion Epic Realism checkpoint for detailed images at the cost of a super detail | Text-to-Image Generation | $0.00355876770097237 | 1.66511s | [llms.txt](https://www.segmind.com/models/sd1.5-epicrealism/llms.txt) |
| Fast Flux.1 Schnell | fast-flux-schnell | Fast Flux.1 Schnell by Segmind is an optimized text-to-image model designed for developers needing faster image generati | Text-to-Image Generation | $0.005470555605652221 | 2.44462s | [llms.txt](https://www.segmind.com/models/fast-flux-schnell/llms.txt) |
| Flux.1 Pro | flux-pro | Flux Pro is a state-of-the-art image generation model with top-of-the-line prompt following, visual quality, image detail and  | Text-to-Image Generation | $0.06720185128362427 | 20.42139s | [llms.txt](https://www.segmind.com/models/flux-pro/llms.txt) |
| Flux Dev Finetuned | flux-dev-finetuned | Flux is a 12 billion parameter rectified flow transformer capable of generating images from text descriptions | Text-to-Image Generation | $0.03397461959219857 | 22.17098s | [llms.txt](https://www.segmind.com/models/flux-dev-finetuned/llms.txt) |
| Flux Realism Lora with Upscale | flux-realism-lora | Flux Realism Lora with upscale, developed by XLabs AI, is a cutting-edge model designed to generate realistic images fro | Text-to-Image Generation | $0.054015464527838154 | 38.20951s | [llms.txt](https://www.segmind.com/models/flux-realism-lora/llms.txt) |
| Flux-1.1 Pro Ultra | flux-1.1-pro-ultra | Create stunning visuals effortlessly with Flux 1.1 Pro Ultra. Experience unparalleled image quality and speed. | Text-to-Image Generation | $0.07491926231285341 | 13.97069s | [llms.txt](https://www.segmind.com/models/flux-1.1-pro-ultra/llms.txt) |
| flux-pro-1.1 | flux-1.1-pro | Flux Pro 1.1 is a cutting-edge image generation tool offering exceptional speed, quality, and customization. Ideal for d | Text-to-Image Generation | $0.04994153013171826 | 14.09401s | [llms.txt](https://www.segmind.com/models/flux-1.1-pro/llms.txt) |
| Flux.1 Dev | flux-dev | Flux Dev is a 12 billion parameter rectified flow transformer capable of generating images from text descriptions | Text-to-Image Generation | $0.019617479861145433 | 20.4215s | [llms.txt](https://www.segmind.com/models/flux-dev/llms.txt) |
| Flux.1 Schnell | flux-schnell | Flux Schnell is a state-of-the-art text-to-image generation model engineered for speed and efficiency. | Text-to-Image Generation | $0.00786255360689121 | 10.15551s | [llms.txt](https://www.segmind.com/models/flux-schnell/llms.txt) |
| GPT Image 1 | gpt-image-1 | Create high-quality AI-generated images from text prompts using OpenAI's GPT Image 1 model. Ideal for product design, co | Text-to-Image Generation | $0.17874917050800562 | 48.89941s | [llms.txt](https://www.segmind.com/models/gpt-image-1/llms.txt) |
| GPT Image 1 Mini | gpt-image-1-mini | GPT Image 1 Mini generates high-quality images from text descriptions, empowering efficient visual content creation. | Text-to-Image Generation | $0.037120349445005044 | 42.04229s | [llms.txt](https://www.segmind.com/models/gpt-image-1-mini/llms.txt) |
| GPT Image 1.5 | gpt-image-1.5 | GPT-Image-1.5 creates stunning, photorealistic images with exceptional detail and precision for professional application | Text-to-Image Generation | $0.16952920190274842 | 39.13016s | [llms.txt](https://www.segmind.com/models/gpt-image-1.5/llms.txt) |
| Ideogram 2a Text To Image | ideogram-2a-txt-2-img | Create captivating designs, realistic images & innovative logos with Ideogram 2a text-to-image. | Text-to-Image Generation | $0.05 | 12.37334s | [llms.txt](https://www.segmind.com/models/ideogram-2a-txt-2-img/llms.txt) |
| Ideogram 3.0 | ideogram-3 | Ideogram 3.0 revolutionizes content creation with photorealistic text-to-image generation and diverse aesthetic styles. | Text-to-Image Generation | $0.06538634610439696 | 10.29754s | [llms.txt](https://www.segmind.com/models/ideogram-3/llms.txt) |
| Ideogram Text To Image | ideogram-txt-2-img | Ideogram Text to Image: Turn your ideas into stunning visuals instantly with this powerful AI tool. Create captivating d | Text-to-Image Generation | $0.1 | 21.63901s | [llms.txt](https://www.segmind.com/models/ideogram-txt-2-img/llms.txt) |
| Ideogram Turbo Text To Image | ideogram-turbo-txt-2-img | Create stunning images in seconds with Ideogram Turbo Text to Image. Fast AI model for quick ideation & text rendering. | Text-to-Image Generation | $0.063 | 12.83101s | [llms.txt](https://www.segmind.com/models/ideogram-turbo-txt-2-img/llms.txt) |
| Imagen 3 | imagen | Imagen 3 is Google DeepMind's highest quality text-to-image model. Generates detailed images with enhanced lighting, div | Text-to-Image Generation | $0.06 | 8.15673s | [llms.txt](https://www.segmind.com/models/imagen/llms.txt) |
| Imagen 4 | imagen-4 | Imagen 4 is Google’s most advanced AI image generation model, creating detailed, photorealistic or abstract images from  | Text-to-Image Generation | $0.06 | 11.36519s | [llms.txt](https://www.segmind.com/models/imagen-4/llms.txt) |
| Juggernaut Final | sd1.5-juggernaut | The most versatile photorealistic model that blends various models to achieve amazingly realistic images. | Text-to-Image Generation | $0.0030132355729215647 | 1.67589s | [llms.txt](https://www.segmind.com/models/sd1.5-juggernaut/llms.txt) |
| Juggernaut Lightning Flux | juggernaut-lightning-flux | Juggernaut Lightning Flux: Blazing fast (<300ms!) & powerful inference with enhanced visuals. | Text-to-Image Generation | $0.009214037923531239 | 6.07997s | [llms.txt](https://www.segmind.com/models/juggernaut-lightning-flux/llms.txt) |
| Juggernaut Lightning SDXL | sdxl1.0-juggernaut-lightning | Juggernaut Lightning SDXL is a lightning-fast text-to-image generation model. It can generate high-quality 1024px images | Text-to-Image Generation | $0.004175213716823214 | 3.18446s | [llms.txt](https://www.segmind.com/models/sdxl1.0-juggernaut-lightning/llms.txt) |
| Juggernaut Pro Flux | juggernaut-pro-flux | Juggernaut Pro FLUX: Create stunningly realistic AI images with unprecedented detail and sharpness. | Text-to-Image Generation | $0.012206799634988915 | 7.64268s | [llms.txt](https://www.segmind.com/models/juggernaut-pro-flux/llms.txt) |
| Kling V3 Text to Image | kling-3-text2image | Generate photorealistic, print-ready images from text using Kuaishou's Kling V3 — with native 2K output and character co | Text-to-Image Generation | $0.035 | 50.94996s | [llms.txt](https://www.segmind.com/models/kling-3-text2image/llms.txt) |
| Luma Photon Flash Text to Image | luma-photon-flash-txt-2-img | Luma Photon flash is a powerful and fast text-to-image model offering high-quality visuals with unmatched speed and prec | Text-to-Image Generation | $0.0025 | 15.91076s | [llms.txt](https://www.segmind.com/models/luma-photon-flash-txt-2-img/llms.txt) |
| Luma Photon Text to Image | luma-photon-txt-2-img | Luma Photon is a powerful AI-driven text-to-image model offering high-quality visuals with unmatched speed and precision | Text-to-Image Generation | $0.01875 | 18.88061s | [llms.txt](https://www.segmind.com/models/luma-photon-txt-2-img/llms.txt) |
| Nano Banana | nano-banana | Gemini Image Editor preserves authentic subject identity while enabling seamless image editing and manipulation. | Text-to-Image Generation | $0.036471582946911545 | 14.27837s | [llms.txt](https://www.segmind.com/models/nano-banana/llms.txt) |
| NewReality Lightning SDXL | sdxl1.0-newreality-lightning | NewReality Lightning SDXL is a lightning-fast text-to-image generation model. It can generate high-quality 1024px images | Text-to-Image Generation | $0.00633556246667513 | 3.03505s | [llms.txt](https://www.segmind.com/models/sdxl1.0-newreality-lightning/llms.txt) |
| NightVis Lightning SDXL | sdxl1.0-nightvis-lightning | NightVis Lightning SDXL is a lightning-fast text-to-image generation model. It can generate high-quality 1024px images i | Text-to-Image Generation | $0.007701533149756405 | 4.27571s | [llms.txt](https://www.segmind.com/models/sdxl1.0-nightvis-lightning/llms.txt) |
| Playground V2.5 | playground-v2.5 | Playground V2.5 is a diffusion-based text-to-image generative model, designed to create highly aesthetic images based on | Text-to-Image Generation | $0.003721384771350553 | 4.01389s | [llms.txt](https://www.segmind.com/models/playground-v2.5/llms.txt) |
| ProtoVision Lightning SDXL | sdxl1.0-protovis-lightning | ProtoVision Lightning SDXL is a lightning-fast text-to-image generation model. It can generate high-quality 1024px image | Text-to-Image Generation | $0.0064859676707384765 | 3.63305s | [llms.txt](https://www.segmind.com/models/sdxl1.0-protovis-lightning/llms.txt) |
| Pruna P Image | p-image | p-image generates high-quality images from text prompts in seconds, optimizing for speed and fidelity. | Text-to-Image Generation | $0.005 | 6.35283s | [llms.txt](https://www.segmind.com/models/p-image/llms.txt) |
| Qwen Image | qwen-image | Qwen-Image revolutionizes image generation and editing with seamless multilingual text integration and photorealistic de | Text-to-Image Generation | $0.12120781276923083 | 28.40645s | [llms.txt](https://www.segmind.com/models/qwen-image/llms.txt) |
| Qwen Image 2512 | qwen-image-2512 | Qwen-Image-2512 generates highly realistic images from text descriptions, excelling in human depiction and environmental | Text-to-Image Generation | $0.01380454237459788 | 19.53968s | [llms.txt](https://www.segmind.com/models/qwen-image-2512/llms.txt) |
| Qwen Image Fast | qwen-image-fast | Qwen-Image expertly generates stunning images with complex text integration, especially for Chinese typography. | Text-to-Image Generation | $0.017947974647207814 | 5.53839s | [llms.txt](https://www.segmind.com/models/qwen-image-fast/llms.txt) |
| RealDream Lightning | sdxl1.0-realdream-lightning | RealDream is a sophisticated image generation model utilizing SDXL Lightning architecture. It creates incredibly realist | Text-to-Image Generation | $0.002366139557714861 | 3.0253s | [llms.txt](https://www.segmind.com/models/sdxl1.0-realdream-lightning/llms.txt) |
| Realdream Pony V9 | sdxl1.0-realdream-pony-v9 | Real Dream Pony V9 is an advanced image generation model based on the Stable Diffusion XL (SDXL) architecture, excelling | Text-to-Image Generation | $0.0074068245707692975 | 5.24434s | [llms.txt](https://www.segmind.com/models/sdxl1.0-realdream-pony-v9/llms.txt) |
| Realism Lightning SDXL | sdxl1.0-realism-lightning | Realism Lightning SDXL is a lightning-fast text-to-image generation model. It can generate high-quality 1024px images in | Text-to-Image Generation | $0.00719040746579686 | 4.69101s | [llms.txt](https://www.segmind.com/models/sdxl1.0-realism-lightning/llms.txt) |
| Realistic Vision | sd1.5-realisticvision | This model corresponds to the Stable Diffusion Realistic Vision checkpoint for detailed images at the cost of a super de | Text-to-Image Generation | $0.00250588440791257 | 1.45097s | [llms.txt](https://www.segmind.com/models/sd1.5-realisticvision/llms.txt) |
| Realvis Lightning SDXL | sdxl1.0-realvis-lightning | Realvis Lightning SDXL is a lightning-fast text-to-image generation model. It can generate high-quality 1024px images in | Text-to-Image Generation | $0.0050072260365039345 | 3.04129s | [llms.txt](https://www.segmind.com/models/sdxl1.0-realvis-lightning/llms.txt) |
| Realvis SDXL | sdxl1.0-realvis | The SDXL model is the official upgrade to the v1.5 model. The model is released as open-source software. | Text-to-Image Generation | $0.010331557161411509 | 4.92935s | [llms.txt](https://www.segmind.com/models/sdxl1.0-realvis/llms.txt) |
| Recraft V3 | recraft-v3 | Recraft V3, the latest iteration of Recraft AI, offers a significant advancement in AI-driven image generation. This sta | Text-to-Image Generation | $0.05 | 15.10057s | [llms.txt](https://www.segmind.com/models/recraft-v3/llms.txt) |
| Recraft V3 Svg | recraft-v3-svg | Recraft V3 SVG generates high-quality, customizable vector graphics with precision and ease. Perfect for logos, infograp | Text-to-Image Generation | $0.1 | 17.66325s | [llms.txt](https://www.segmind.com/models/recraft-v3-svg/llms.txt) |
| Reliberate | sd1.5-reliberate | This model corresponds to the Stable Diffusion Reliberate checkpoint for detailed images at the cost of a super detailed | Text-to-Image Generation | $0.0032581483488384323 | 1.81634s | [llms.txt](https://www.segmind.com/models/sd1.5-reliberate/llms.txt) |
| Samaritan 3D XL | sdxl1.0-samaritan-3d | Samaritan 3D XL leverages the robust capabilities of the SDXL framework, ensuring high-quality, detailed 3D character re | Text-to-Image Generation | $0.008573312925110004 | 4.13346s | [llms.txt](https://www.segmind.com/models/sdxl1.0-samaritan-3d/llms.txt) |
| Samaritan Lightning SDXL | sdxl1.0-samaritan-lightning | Samaritan Lightning SDXL is a lightning-fast text-to-image generation model. It can generate high-quality 1024px images  | Text-to-Image Generation | $0.007016612648451129 | 4.37849s | [llms.txt](https://www.segmind.com/models/sdxl1.0-samaritan-lightning/llms.txt) |
| Seedream 3.0 t2i | seedream-v3-text-to-image | Seedream V3 generates high-resolution, bilingual images in seconds, enhancing creative workflows and marketing effective | Text-to-Image Generation | $0.0375 | 5.80374s | [llms.txt](https://www.segmind.com/models/seedream-v3-text-to-image/llms.txt) |
| Seedream 5.0 Lite: Text-to-Image | seedream-v5-lite-text-to-image | Generate high-quality, instruction-following images with Seedream 5.0 Lite, Segmind's fast multimodal text-to-image mode | Text-to-Image Generation | $0.035 | 35.85173s | [llms.txt](https://www.segmind.com/models/seedream-v5-lite-text-to-image/llms.txt) |
| Segmind-Vega | segmind-vega | The Segmind-Vega Model is a distilled version of the Stable Diffusion XL (SDXL), offering a remarkable 70% reduction in  | Text-to-Image Generation | $0.0022489604890494736 | 2.35742s | [llms.txt](https://www.segmind.com/models/segmind-vega/llms.txt) |
| Segmind-VegaRT | segmind-vega-rt-v1 | Segmind-VegaRT is a distilled consistency adapter for Segmind-Vega that allows reducing the number of inference steps to o | Text-to-Image Generation | $0.0020003857514397307 | 1.68497s | [llms.txt](https://www.segmind.com/models/segmind-vega-rt-v1/llms.txt) |
| Simple Vector Flux Lora | Simple_Vector_Flux | Flux is a 12 billion parameter rectified flow transformer capable of generating images from text descriptions | Text-to-Image Generation | $0.0337997876486014 | 36.40649s | [llms.txt](https://www.segmind.com/models/Simple_Vector_Flux/llms.txt) |
| SSD-1B | ssd-1b | SSD-1B efficiently generates high-quality, diverse images from text prompts in real-time. | Text-to-Image Generation | $0.004170655368377178 | 2.82059s | [llms.txt](https://www.segmind.com/models/ssd-1b/llms.txt) |
| Stable Diffusion 3 Medium Text to Image | stable-diffusion-3-medium-txt2img | Stable Diffusion is a type of latent diffusion model that can generate images from text. It was created by a team of res | Text-to-Image Generation | $0.04100084691679047 | 7.37145s | [llms.txt](https://www.segmind.com/models/stable-diffusion-3-medium-txt2img/llms.txt) |
| Stable Diffusion 3.5 Large Text to Image | stable-diffusion-3.5-large-txt2img | Stable Diffusion 3.5 Large offers exceptional customizability, efficient performance on consumer hardware, and diverse i | Text-to-Image Generation | $0.013810415239805474 | 17.49587s | [llms.txt](https://www.segmind.com/models/stable-diffusion-3.5-large-txt2img/llms.txt) |
| Stable Diffusion 3.5 Turbo Text to Image | stable-diffusion-3.5-turbo-txt2img | Stable Diffusion 3.5 Turbo offers exceptional customizability, efficient performance on consumer hardware, and diverse i | Text-to-Image Generation | $0.0034972688164893605 | 4.83609s | [llms.txt](https://www.segmind.com/models/stable-diffusion-3.5-turbo-txt2img/llms.txt) |
| Stable Diffusion XL 1.0 | sdxl1.0-txt2img | The SDXL model is the official upgrade to the v1.5 model. The model is released as open-source software. | Text-to-Image Generation | $0.007976265828682173 | 6.26606s | [llms.txt](https://www.segmind.com/models/sdxl1.0-txt2img/llms.txt) |
| Wan 2.7 Image Generation | wan2.7-image | Generate stunning 2K images, edit with precision, and render multilingual text using Alibaba's Wan 2.7 AI model via API. | Text-to-Image Generation | $0.0375 | 25.14311s | [llms.txt](https://www.segmind.com/models/wan2.7-image/llms.txt) |
| Wan 2.7 Image Generation Pro | wan2.7-image-pro | Wan 2.7 Pro generates 4K images with chain-of-thought reasoning, multilingual text rendering, and multi-reference consis | Text-to-Image Generation | $0.0375 | 51.52104s | [llms.txt](https://www.segmind.com/models/wan2.7-image-pro/llms.txt) |
| WildCard Lightning SDXL | sdxl1.0-wildcard-lightning | WildCard Lightning SDXL is a lightning-fast text-to-image generation model. It can generate high-quality 1024px images i | Text-to-Image Generation | $0.005026386990885329 | 3.41086s | [llms.txt](https://www.segmind.com/models/sdxl1.0-wildcard-lightning/llms.txt) |
| Yamer's Realistic SDXL | sdxl1.0-yamers-realistic | The SDXL model is the official upgrade to the v1.5 model. The model is released as open-source software. | Text-to-Image Generation | $0.008997079657151756 | 5.70235s | [llms.txt](https://www.segmind.com/models/sdxl1.0-yamers-realistic/llms.txt) |
| Z Image Turbo | z-image-turbo | Z-Image-Turbo generates photorealistic images in under one second with bilingual text support for global applications. | Text-to-Image Generation | $0.030726346022187007 | 6.6892s | [llms.txt](https://www.segmind.com/models/z-image-turbo/llms.txt) |
| Zavychroma SDXL | sdxl1.0-zavychroma | The SDXL model is the official upgrade to the v1.5 model. The model is released as open-source software. | Text-to-Image Generation | $0.010168409796996963 | 6.03496s | [llms.txt](https://www.segmind.com/models/sdxl1.0-zavychroma/llms.txt) |
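
The rows in these tables share a fixed seven-column layout, so they can be read programmatically, for example to compare average cost or latency across models. The following is a minimal sketch, assuming no cell contains a literal `|`; the `ModelRow` type and `parse_row` helper are hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass


@dataclass
class ModelRow:
    name: str
    slug: str
    modality: str
    avg_cost_usd: float
    avg_latency_s: float


def parse_row(line: str) -> ModelRow:
    """Parse one table row of this directory (hypothetical helper).

    Assumes the seven columns used throughout: Model, Slug, Description,
    Modality, Avg Cost, Avg Latency, API Docs.
    """
    cells = [c.strip() for c in line.strip().strip("|").split("|")]
    if len(cells) != 7:
        raise ValueError(f"expected 7 columns, got {len(cells)}")
    name, slug, _description, modality, cost, latency, _docs = cells
    return ModelRow(
        name=name,
        slug=slug,
        modality=modality,
        avg_cost_usd=float(cost.lstrip("$")),    # "$0.0041..." -> 0.0041...
        avg_latency_s=float(latency.rstrip("s")),  # "2.82059s" -> 2.82059
    )


if __name__ == "__main__":
    sample = (
        "| SSD-1B | ssd-1b | SSD-1B efficiently generates high-quality, diverse "
        "images from text prompts in real-time. | Text-to-Image Generation | "
        "$0.004170655368377178 | 2.82059s | "
        "[llms.txt](https://www.segmind.com/models/ssd-1b/llms.txt) |"
    )
    row = parse_row(sample)
    print(row.slug, round(row.avg_cost_usd, 4), row.avg_latency_s)
```

Rounding the cost for display, as in the last line, hides the floating-point noise visible in some of the averaged values.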

## Text Generation (LLM)

| Model | Slug | Description | Modality | Avg Cost | Avg Latency | API Docs |
| --- | --- | --- | --- | --- | --- | --- |
| Claude 3 Haiku | claude-3-haiku | Claude 3 Haiku, the fastest and most cost-effective model LLM from Anthropic, delivers instant responses and image analy | Text Generation (LLM) | $0.005566252838241568 | 4.3314s | [llms.txt](https://www.segmind.com/models/claude-3-haiku/llms.txt) |
| Claude 3 Opus | claude-3-opus | Claude 3 Opus is an LLM pushing the limits of language understanding. It excels at complex tasks, generates human-qualit | Text Generation (LLM) | $0.08819386837881217 | 25.017s | [llms.txt](https://www.segmind.com/models/claude-3-opus/llms.txt) |
| Claude 3.5 Sonnet | claude-3.5-sonnet | Claude 3.5 Sonnet represents a significant advancement in AI language models, combining speed, accuracy, and visual reas | Text Generation (LLM) | $0.03664767260936965 | 11.20302s | [llms.txt](https://www.segmind.com/models/claude-3.5-sonnet/llms.txt) |
| Claude 4 Sonnet | claude-4-sonnet | Claude 4 excels in advanced coding and multi-step reasoning, transforming complex tasks into manageable solutions. | Text Generation (LLM) | $0.04962686607142856 | 11.78458s | [llms.txt](https://www.segmind.com/models/claude-4-sonnet/llms.txt) |
| Claude 4.5 Sonnet | claude-4.5-sonnet | Claude Sonnet 4.5 empowers developers with advanced coding and reasoning for complex software solutions. | Text Generation (LLM) | $0.04580613434811383 | 24.01055s | [llms.txt](https://www.segmind.com/models/claude-4.5-sonnet/llms.txt) |
| DeepSeek Chat | deepseek-chat | DeepSeek V3 combines cutting-edge AI technology with practical usability. Featuring a 671B parameter architecture, enhan | Text Generation (LLM) | $0.0012660532244438188 | 34.30531s | [llms.txt](https://www.segmind.com/models/deepseek-chat/llms.txt) |
| DeepSeek R1 | deepseek-reasoner | DeepSeek-R1 is a cutting-edge AI reasoning model that combines reinforcement learning with supervised fine-tuning. Excel | Text Generation (LLM) | $0.04687947576301616 | 67.33222s | [llms.txt](https://www.segmind.com/models/deepseek-reasoner/llms.txt) |
| Gemini 2 Flash | gemini-2-flash-image-generation | With Gemini 2 Flash, create consistent visuals, edit images conversationally, and render text accurately. | Text Generation (LLM) | $0.05485099295774647 | 8.36087s | [llms.txt](https://www.segmind.com/models/gemini-2-flash-image-generation/llms.txt) |
| Gemini 2.5 Flash | gemini-2.5-flash | Gemini 2.5 Flash uniquely combines multimodal processing with transparent reasoning for advanced, real-world application | Text Generation (LLM) | $0.003855663990825688 | 11.18217s | [llms.txt](https://www.segmind.com/models/gemini-2.5-flash/llms.txt) |
| Gemini 2.5 PRO | gemini-2.5-pro | Gemini 2.5 Pro excels at complex multimodal reasoning, seamlessly analyzing diverse data types for advanced problem-solv | Text Generation (LLM) | $0.02542934067164179 | 27.37359s | [llms.txt](https://www.segmind.com/models/gemini-2.5-pro/llms.txt) |
| Gemini 3 Pro | gemini-3-pro | Gemini 3 Pro autonomously processes multimodal inputs for complex reasoning and problem-solving tasks. | Text Generation (LLM) | $0.0030073446729380884 | 31.66276s | [llms.txt](https://www.segmind.com/models/gemini-3-pro/llms.txt) |
| Gemini Flash | gemini-1.5-flash | Gemini 1.5 Flash is a game-changer for developers and enterprises seeking a speedy and cost-effective large language mod | Text Generation (LLM) | $0.0013412948740531973 | 4.27627s | [llms.txt](https://www.segmind.com/models/gemini-1.5-flash/llms.txt) |
| Gemini PRO | gemini-1.5-pro | Gemini 1.5 Pro represents a significant leap in large language model technology, offering exceptional understanding and | Text Generation (LLM) | $0.01684745850533808 | 11.21759s | [llms.txt](https://www.segmind.com/models/gemini-1.5-pro/llms.txt) |
| GPT 4 | gpt-4 | GPT-4 outperforms both previous large language models and as of 2023, most state-of-the-art systems (which often have be | Text Generation (LLM) | $0.028924545905172412 | 13.14806s | [llms.txt](https://www.segmind.com/models/gpt-4/llms.txt) |
| GPT 4 turbo | gpt-4-turbo | GPT-4 outperforms both previous large language models and as of 2023, most state-of-the-art systems (which often have be | Text Generation (LLM) | $0.005316457025226017 | 9.89624s | [llms.txt](https://www.segmind.com/models/gpt-4-turbo/llms.txt) |
| GPT 4o | gpt-4o | GPT-4o (“o” for “omni”) is our most advanced model. It is multimodal (accepting text or image inputs and outputting text | Text Generation (LLM) | $0.007468990349237066 | 6.01159s | [llms.txt](https://www.segmind.com/models/gpt-4o/llms.txt) |
| GPT 5 | gpt-5 | GPT-5 automates complex coding tasks with integrated tools for seamless software development and deployment. | Text Generation (LLM) | $0.041768300143266476 | 61.73189s | [llms.txt](https://www.segmind.com/models/gpt-5/llms.txt) |
| GPT 5 Mini | gpt-5-mini | GPT-5 Mini delivers rapid, high-quality AI responses across text, images, and files for efficient applications. | Text Generation (LLM) | $0.008644852099737532 | 46.85817s | [llms.txt](https://www.segmind.com/models/gpt-5-mini/llms.txt) |
| GPT 5 Nano | gpt-5-nano | GPT-5 Nano delivers ultra-fast responses for real-time applications, enhancing efficiency in developer tools and AI inte | Text Generation (LLM) | $0.0010103536938004356 | 30.37253s | [llms.txt](https://www.segmind.com/models/gpt-5-nano/llms.txt) |
| GPT 5.1 | gpt-5.1 | GPT-5.1 delivers precise, actionable code review feedback, enhancing developer workflows and code quality. | Text Generation (LLM) | $0.007256262593516209 | 9.17103s | [llms.txt](https://www.segmind.com/models/gpt-5.1/llms.txt) |
| GPT 5.2 | gpt-5.2 | ChatGPT 5.2 delivers advanced reasoning and multimodal intelligence for precise and reliable AI solutions. | Text Generation (LLM) | $0.01250300519320699 | 20.16582s | [llms.txt](https://www.segmind.com/models/gpt-5.2/llms.txt) |
| GPT 5.4 | gpt-5.4 | GPT-5.4 is OpenAI's most powerful model — frontier reasoning, coding, computer use, and 1M token context in one API. | Text Generation (LLM) | $0.010345472022684309 | 4.66803s | [llms.txt](https://www.segmind.com/models/gpt-5.4/llms.txt) |
| GPT 5.4 Mini | gpt-5.4-mini | GPT-5.4 Mini: OpenAI's fastest efficient model for coding, computer use, and high-volume agentic AI workflows. | Text Generation (LLM) | $0.000542280413936738 | 5.59299s | [llms.txt](https://www.segmind.com/models/gpt-5.4-mini/llms.txt) |
| GPT 5.4 Nano | gpt-5.4-nano | GPT-5.4 Nano delivers flagship-class AI for classification, extraction, and high-volume API workloads at the lowest cost | Text Generation (LLM) | $0.006243802782378271 | 5.2462s | [llms.txt](https://www.segmind.com/models/gpt-5.4-nano/llms.txt) |
| Grok 2 | grok-2 | Grok-2, xAI's latest language model, boasts superior reasoning, coding, and chat capabilities, outperforming many popula | Text Generation (LLM) | $0.0046698497191011235 | 8.51163s | [llms.txt](https://www.segmind.com/models/grok-2/llms.txt) |
| Grok 2 Vision | grok-2-vision | Grok-2, xAI's latest language model with vision understanding. | Text Generation (LLM) | $0.0038650394875659397 | 5.03739s | [llms.txt](https://www.segmind.com/models/grok-2-vision/llms.txt) |
| Kimi K2 Instruct 0905 | kimi-k2-instruct-0905 | Kimi K2 Instruct 0905 excels in deep contextual understanding and complex code generation with an extensive 262K token c | Text Generation (LLM) | $0.0010261428571428572 | 2.20809s | [llms.txt](https://www.segmind.com/models/kimi-k2-instruct-0905/llms.txt) |
| Llama 3 70b | llama-v3-70b-instruct | Meta developed and released the Meta Llama 3 family of large language models (LLMs), a collection of pretrained and inst | Text Generation (LLM) | $0.0010269687791539328 | 2.38864s | [llms.txt](https://www.segmind.com/models/llama-v3-70b-instruct/llms.txt) |
| Llama 3 8b | llama-v3-8b-instruct | Meta developed and released the Meta Llama 3 family of large language models (LLMs), a collection of pretrained and inst | Text Generation (LLM) | $0.0006315522434414855 | 1.33946s | [llms.txt](https://www.segmind.com/models/llama-v3-8b-instruct/llms.txt) |
| Llama 3.1 405b | llama-v3p1-405b-instruct | Meta developed and released the Meta Llama 3 family of large language models (LLMs), a collection of pretrained and inst | Text Generation (LLM) | $0.006355001054333312 | 6.60671s | [llms.txt](https://www.segmind.com/models/llama-v3p1-405b-instruct/llms.txt) |
| Llama 3.1 70b | llama-v3p1-70b-instruct | Meta developed and released the Meta Llama 3 family of large language models (LLMs), a collection of pretrained and inst | Text Generation (LLM) | $0.0019101056600713594 | 2.70658s | [llms.txt](https://www.segmind.com/models/llama-v3p1-70b-instruct/llms.txt) |
| Llama 3.1 8b | llama-v3p1-8b-instruct | Meta developed and released the Meta Llama 3 family of large language models (LLMs), a collection of pretrained and inst | Text Generation (LLM) | $0.00002702281348010712 | 1.45468s | [llms.txt](https://www.segmind.com/models/llama-v3p1-8b-instruct/llms.txt) |
| Llama 4 Maverick Instruct Basic | llama4-maverick-instruct-basic | Llama 4 Maverick Instruct Basic is a 400B parameter powerhouse with 128 experts for unparalleled text and image understa | Text Generation (LLM) | $0.0010211187845303867 | 2.47979s | [llms.txt](https://www.segmind.com/models/llama4-maverick-instruct-basic/llms.txt) |
| Llama 4 Scout Instruct Basic | llama4-scout-instruct-basic | Unlock powerful multimodal AI with Llama 4 Scout basic, a 17 billion active parameters model offering leading text & ima | Text Generation (LLM) | $0.0007081910655579192 | 2.68258s | [llms.txt](https://www.segmind.com/models/llama4-scout-instruct-basic/llms.txt) |
| Mixtral 8x22b | mixtral-8x22b-instruct | Mistral MoE 8x22B Instruct v0.1 model with Sparse Mixture of Experts. Fine tuned for instruction following. | Text Generation (LLM) | $0.0005056373593842511 | 2.73773s | [llms.txt](https://www.segmind.com/models/mixtral-8x22b-instruct/llms.txt) |
| Mixtral 8x7b | mixtral-8x7b-instruct | Mistral MoE 8x7B Instruct v0.1 model with Sparse Mixture of Experts. Fine tuned for instruction following. | Text Generation (LLM) | $0.0002638041493775934 | 1.8292s | [llms.txt](https://www.segmind.com/models/mixtral-8x7b-instruct/llms.txt) |
| O4 Mini | o4-mini | OpenAI o4-mini enhances decision-making by processing text and images with advanced reasoning capabilities. | Text Generation (LLM) | $0.006826813091158328 | 12.00599s | [llms.txt](https://www.segmind.com/models/o4-mini/llms.txt) |
| OpenAI o1-mini | o1-mini | o1-mini by OpenAI provides high-performance reasoning and coding capabilities. Ideal for developers and businesses seeki | Text Generation (LLM) | $0.08289552217591577 | 30.12678s | [llms.txt](https://www.segmind.com/models/o1-mini/llms.txt) |
| OpenAI o1-preview | o1-preview | o1-preview by OpenAI is a powerful AI model that can tackle complex problems with exceptional accuracy and efficiency. | Text Generation (LLM) | $0.34304200481254693 | 51.96905s | [llms.txt](https://www.segmind.com/models/o1-preview/llms.txt) |
| OpenAI o3 | o3 | OpenAI o3: frontier reasoning model that solves complex coding, math, science, and visual tasks with human-expert accura | Text Generation (LLM) | $0.0062050000000000004 | 13.39558s | [llms.txt](https://www.segmind.com/models/o3/llms.txt) |
| OpenAI o3 Mini | o3-mini | OpenAI o3-mini: a cost-efficient reasoning model that excels at coding, math, and science with STEM-leading accuracy. | Text Generation (LLM) | $0.0015286499999999999 | 3.68581s | [llms.txt](https://www.segmind.com/models/o3-mini/llms.txt) |
| QVQ Max | qvq-max | Visual reasoning model with always-on chain-of-thought — solves math diagrams, charts, and complex visual problems step- | Text Generation (LLM) | $0.0075935 | 27.5905s | [llms.txt](https://www.segmind.com/models/qvq-max/llms.txt) |
| Qwen 3 Coder Flash | qwen3-coder-flash | Fast, affordable code AI by Alibaba with 1M token context — ideal for high-volume generation, autocomplete, and agentic  | Text Generation (LLM) | $0.0010888 | 6.31215s | [llms.txt](https://www.segmind.com/models/qwen3-coder-flash/llms.txt) |
| Qwen 3 Coder Plus | qwen3-coder-plus | Qwen3 Coder Plus generates, debugs, and refactors code across entire repositories with 1M token context. | Text Generation (LLM) | $0.00121255 | 3.93104s | [llms.txt](https://www.segmind.com/models/qwen3-coder-plus/llms.txt) |
| Qwen 3 Max | qwen3-max | Access Qwen 3 Max API — Alibaba Cloud's 1T-parameter LLM with 262K context, hybrid reasoning, and built-in tool use for  | Text Generation (LLM) | $0.0012642 | 6.94008s | [llms.txt](https://www.segmind.com/models/qwen3-max/llms.txt) |
| Qwen 3 VL Flash | qwen3-vl-flash | Fast, affordable vision-language API with 262K context for OCR, visual QA, and multimodal document analysis. | Text Generation (LLM) | $0.00019149999999999997 | 3.97372s | [llms.txt](https://www.segmind.com/models/qwen3-vl-flash/llms.txt) |
| Qwen 3 VL Plus | qwen3-vl-plus | Alibaba's Qwen3 VL Plus processes images and text — powerful visual QA, document parsing, and chart analysis with 262K c | Text Generation (LLM) | $0.0011242000000000001 | 8.91876s | [llms.txt](https://www.segmind.com/models/qwen3-vl-plus/llms.txt) |
| Qwen 3.5 Flash | qwen3.5-flash | Fast multimodal AI with 1M context — process text, images, and video with Qwen 3.5 Flash via API. | Text Generation (LLM) | $0.00109004 | 58.12938s | [llms.txt](https://www.segmind.com/models/qwen3.5-flash/llms.txt) |
| Qwen 3.5 Plus | qwen3.5-plus | Alibaba Cloud's native multimodal AI with 1M context, image/video input, and built-in tool use for developers. | Text Generation (LLM) | $0.001731 | 10.81471s | [llms.txt](https://www.segmind.com/models/qwen3.5-plus/llms.txt) |
| Qwen Flash | qwen-flash | Qwen Flash: Alibaba Cloud's fastest, lowest-cost LLM with 1M context for high-volume chat, classification, and summariza | Text Generation (LLM) | $0.00020250000000000002 | 4.73149s | [llms.txt](https://www.segmind.com/models/qwen-flash/llms.txt) |
| Qwen Plus | qwen-plus | Qwen Plus: Alibaba Cloud mid-tier LLM with 1M context for summarization, content generation, and enterprise chatbots. | Text Generation (LLM) | $0.00014566666666666667 | 2.12671s | [llms.txt](https://www.segmind.com/models/qwen-plus/llms.txt) |
| Qwen2 VL 72B Instruct | qwen2-vl-72b-instruct | Qwen2-VL-72B-Instruct is a state-of-the-art multimodal model excelling in image and video understanding, with advanced c | Text Generation (LLM) | $0.005773370372061172 | 7.14185s | [llms.txt](https://www.segmind.com/models/qwen2-vl-72b-instruct/llms.txt) |
| QWEN2-VL-7B-Instruct | qwen2-vl-7b-instruct | The Qwen2-VL-7B-Instruct is a cutting-edge vision-language model with 7 billion parameters, offering advanced capabiliti | Text Generation (LLM) | $0.0012699288110324266 | 36.34406s | [llms.txt](https://www.segmind.com/models/qwen2-vl-7b-instruct/llms.txt) |
| Qwen2.5-VL 32B Instruct | qwen2p5-vl-32b-instruct | Qwen2.5-VL processes text and images seamlessly for advanced multimodal instruction and reasoning. | Text Generation (LLM) | $0.0026978558645024695 | 7.5514s | [llms.txt](https://www.segmind.com/models/qwen2p5-vl-32b-instruct/llms.txt) |
| QwQ Plus | qwq-plus | QwQ Plus delivers deep chain-of-thought reasoning for math, code, and logic with 131K context. | Text Generation (LLM) | $0.009051875000000001 | 69.18748s | [llms.txt](https://www.segmind.com/models/qwq-plus/llms.txt) |

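The Slug column is what ties a row to its endpoints: the directory header gives the inference URL pattern as `https://api.segmind.com/v1/{model_slug}` and the per-model documentation as `https://www.segmind.com/models/{slug}/llms.txt`. The sketch below only builds those two URLs from a slug. The function names are hypothetical, and request parameters and the exact authentication header are deliberately left out, since they should be taken from each model's `llms.txt`.

```python
from urllib.parse import quote

# URL patterns as stated in the directory header.
API_BASE = "https://api.segmind.com/v1"
DOCS_BASE = "https://www.segmind.com/models"


def inference_url(slug: str) -> str:
    """Return the inference endpoint for a model slug (hypothetical helper)."""
    # quote() leaves letters, digits, '.', '-' and '_' untouched, so slugs
    # such as "claude-3.5-sonnet" pass through unchanged.
    return f"{API_BASE}/{quote(slug, safe='')}"


def docs_url(slug: str) -> str:
    """Return the llms.txt documentation URL for a model slug."""
    return f"{DOCS_BASE}/{quote(slug, safe='')}/llms.txt"


if __name__ == "__main__":
    for slug in ("claude-3.5-sonnet", "gpt-5.4-mini", "qwen3-max"):
        print(inference_url(slug))
        print(docs_url(slug))
```

Percent-encoding the slug is a defensive choice: every slug listed here is already URL-safe, so the output matches the links in the API Docs column.
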
## Image-to-Image Transformation

| Model | Slug | Description | Modality | Avg Cost | Avg Latency | API Docs |
| --- | --- | --- | --- | --- | --- | --- |
| AI Product Photo Editor | ai-product-photo-editor | AI Product Photo Editor leverages advanced image-based ML techniques to generate high-quality product visuals using text | Image-to-Image Transformation | $0.02224141808072486 | 15.50389s | [llms.txt](https://www.segmind.com/models/ai-product-photo-editor/llms.txt) |
| AI Product Photography  | ai-product-photography | Elevate your product imagery with our AI-powered photography model. Create stunning, professional-quality photos that bo | Image-to-Image Transformation | $0.06465815218287853 | 11.64498s | [llms.txt](https://www.segmind.com/models/ai-product-photography/llms.txt) |
| Aura Flow | aura-flow | The largest fully open-source flow-based generation model, capable of text-to-image generation. | Image-to-Image Transformation | $0.11701278422174842 | 79.15338s | [llms.txt](https://www.segmind.com/models/aura-flow/llms.txt) |
| Automatic Mask Generator | automatic-mask-generator | Automatic Mask Generator is a powerful tool that automates the creation of precise masks for inpainting | Image-to-Image Transformation | $0.0015543570436195773 | 1.6942s | [llms.txt](https://www.segmind.com/models/automatic-mask-generator/llms.txt) |
| Background Removal | bg-removal | This model removes the background from any image. | Image-to-Image Transformation | $0.0020975703032975646 | 1.65848s | [llms.txt](https://www.segmind.com/models/bg-removal/llms.txt) |
| Background Removal V2 | bg-removal-v2 | This model removes the background from any image. | Image-to-Image Transformation | $0.0008842893537857474 | 0.69179s | [llms.txt](https://www.segmind.com/models/bg-removal-v2/llms.txt) |
| Bria Blur Background | bria-blur-background | Bria AI Image Editing API v2 enables precise and context-aware image manipulation for stunning visual outcomes. | Image-to-Image Transformation | $0.053043478260869574 | 16.51626s | [llms.txt](https://www.segmind.com/models/bria-blur-background/llms.txt) |
| Bria Enhance Image | bria-enhance-image | Bria AI creates precise, high-quality image enhancements and manipulations for diverse creative applications. | Image-to-Image Transformation | $0.040445344129554674 | 23.66673s | [llms.txt](https://www.segmind.com/models/bria-enhance-image/llms.txt) |
| Bria Erase Foreground | bria-erase-foreground | Seamlessly removes foreground subjects and regenerates backgrounds for flawless image editing. | Image-to-Image Transformation | $0.04272727272727273 | 12.00002s | [llms.txt](https://www.segmind.com/models/bria-erase-foreground/llms.txt) |
| Bria Eraser | bria-eraser | Bria AI’s Eraser model seamlessly enhances and modifies images with advanced generative capabilities, ensuring flawless  | Image-to-Image Transformation | $0.04176470588235295 | 14.03854s | [llms.txt](https://www.segmind.com/models/bria-eraser/llms.txt) |
| Bria Expand Image | bria-expand-image | Bria Expand enables precise image manipulation and enhancement with generative AI, trained exclusively on licensed data  | Image-to-Image Transformation | $0.03899860917941582 | 13.5721s | [llms.txt](https://www.segmind.com/models/bria-expand-image/llms.txt) |
| Bria Generate Background | bria-replace-background | Transform images through advanced background editing and generative content creation for diverse applications. | Image-to-Image Transformation | $0.04142857142857143 | 18.72365s | [llms.txt](https://www.segmind.com/models/bria-replace-background/llms.txt) |
| Bria Generative Fill | bria-gen-fill | Bria AI enables precise generative image editing for seamless creative enhancements and transformations. | Image-to-Image Transformation | $0.03787878787878789 | 18.44121s | [llms.txt](https://www.segmind.com/models/bria-gen-fill/llms.txt) |
| Bria Increase Resolution | bria-increase-resolution | Seamlessly upscale and manipulate images while preserving the highest fidelity and safety standards. | Image-to-Image Transformation | $0.03670477417277192 | 12.55062s | [llms.txt](https://www.segmind.com/models/bria-increase-resolution/llms.txt) |
| Bria Lifestyle Product Shot by Text | bria-lifestyle-shot-by-text | Transform isolated product images into dynamic lifestyle scenes with AI-driven contextual realism. | Image-to-Image Transformation | $0.03860759493670887 | 25.28842s | [llms.txt](https://www.segmind.com/models/bria-lifestyle-shot-by-text/llms.txt) |
| Bria Product Cutout | bria-product-cutout | Automates precise product cutouts and background removal for professional eCommerce imagery at scale. | Image-to-Image Transformation | $0.04 | 10.92597s | [llms.txt](https://www.segmind.com/models/bria-product-cutout/llms.txt) |
| Bria Product Packshot | bria-product-packshot | Transform product photos into professional, market-ready images with intelligent enhancements and background removal. | Image-to-Image Transformation | $0.04096774193548387 | 16.96101s | [llms.txt](https://www.segmind.com/models/bria-product-packshot/llms.txt) |
| Bria Product Shadow | bria-product-shadow | Bria Product Shadow enhances product images with realistic shadows for professional eCommerce presentations. | Image-to-Image Transformation | $0.038253968253968255 | 8.05132s | [llms.txt](https://www.segmind.com/models/bria-product-shadow/llms.txt) |
| Bria Reimagine | bria-reimagine | Bria AI Reimagine transforms reference images into detailed, styled visuals with creative flexibility. | Image-to-Image Transformation | $0.04113924050632911 | 13.27925s | [llms.txt](https://www.segmind.com/models/bria-reimagine/llms.txt) |
| Bria RMBG 2.0 | bria-remove-background | Effortlessly extract backgrounds with unmatched precision, powered by models trained exclusively on licensed data for sa | Image-to-Image Transformation | $0.01793483556638247 | 10.85834s | [llms.txt](https://www.segmind.com/models/bria-remove-background/llms.txt) |
| Caricature Style | caricature-style | Transform everyday photos into lively, whimsical caricature illustrations that highlight individual features with playfu | Image-to-Image Transformation | $0.09831688117066295 | 53.41868s | [llms.txt](https://www.segmind.com/models/caricature-style/llms.txt) |
| Clarity Upscaler | clarity-upscaler | High resolution creative image Upscaler and Enhancer. A free Magnific alternative.  | Image-to-Image Transformation | $0.019288352431025734 | 18.27937s | [llms.txt](https://www.segmind.com/models/clarity-upscaler/llms.txt) |
| ClarityAI Creative Upscaler | clarityai-creative-upscaler | Clarity AI intelligently enhances image resolution, preserving fine details for stunning visual clarity. | Image-to-Image Transformation | $0.40625 | 132.33269s | [llms.txt](https://www.segmind.com/models/clarityai-creative-upscaler/llms.txt) |
| ClarityAI Crystal Upscaler | clarityai-crystal-upscaler | Clarity AI intelligently upscales images up to 200x while enhancing detail and visual quality. | Image-to-Image Transformation | $0.5055080116533139 | 34.74371s | [llms.txt](https://www.segmind.com/models/clarityai-crystal-upscaler/llms.txt) |
| ClarityAI Flux Upscaler | clarityai-flux-upscaler | Clarity AI transforms low-resolution images into stunning high-quality visuals with unmatched detail preservation. | Image-to-Image Transformation | $1.6839788732394365 | 323.32283s | [llms.txt](https://www.segmind.com/models/clarityai-flux-upscaler/llms.txt) |
| Codeformer | codeformer | CodeFormer is a robust face restoration algorithm for old photos or AI-generated faces. | Image-to-Image Transformation | $0.012377074267612947 | 5.91812s | [llms.txt](https://www.segmind.com/models/codeformer/llms.txt) |
| Consistent Character | consistent-character | Create images of a given character in different poses  | Image-to-Image Transformation | $0.08459392088342917 | 61.36668s | [llms.txt](https://www.segmind.com/models/consistent-character/llms.txt) |
| Consistent Character With Pose | consistent-character-with-pose | Create images of a given character in different poses | Image-to-Image Transformation | $0.029430206793344512 | 30.9992s | [llms.txt](https://www.segmind.com/models/consistent-character-with-pose/llms.txt) |
| ControlNet Canny | sd1.5-controlnet-canny | This model corresponds to the ControlNet conditioned on Canny edges. | Image-to-Image Transformation | $0.003699646868397279 | 4.67214s | [llms.txt](https://www.segmind.com/models/sd1.5-controlnet-canny/llms.txt) |
| ControlNet Depth | sd1.5-controlnet-depth | This model corresponds to the ControlNet conditioned on Depth estimation. | Image-to-Image Transformation | $0.009929268607788793 | 12.66626s | [llms.txt](https://www.segmind.com/models/sd1.5-controlnet-depth/llms.txt) |
| ControlNet Openpose | sd1.5-controlnet-openpose | This model corresponds to the ControlNet conditioned on Human Pose Estimation. | Image-to-Image Transformation | $0.006113047046971417 | 10.00272s | [llms.txt](https://www.segmind.com/models/sd1.5-controlnet-openpose/llms.txt) |
| ControlNet Scribble | sd1.5-controlnet-scribble | This model corresponds to the ControlNet conditioned on Scribble images. | Image-to-Image Transformation | $0.0029691219916376447 | 3.81035s | [llms.txt](https://www.segmind.com/models/sd1.5-controlnet-scribble/llms.txt) |
| ControlNet Soft Edge | sd1.5-controlnet-softedge | This model corresponds to the ControlNet conditioned on Soft Edge. | Image-to-Image Transformation | $0.002842183973209189 | 3.43761s | [llms.txt](https://www.segmind.com/models/sd1.5-controlnet-softedge/llms.txt) |
| ESRGAN | esrgan | ESRGAN is an Image Super-Resolution (upscaler) model that enhances images with stunning, high-quality upscaling while pre | Image-to-Image Transformation | $0.004672196869391752 | 4.91004s | [llms.txt](https://www.segmind.com/models/esrgan/llms.txt) |
| Expression Editor | expression-editor | Expression Editor uses reference images to accurately generate new images with desired expressions. Perfect for digital  | Image-to-Image Transformation | $0.001551200900608203 | 2.31222s | [llms.txt](https://www.segmind.com/models/expression-editor/llms.txt) |
| Face Detailer | face-detailer | Restore characters' faces to their original glory with Face Detailer. Enhance facial details, eliminate distortion, and  | Image-to-Image Transformation | $0.014886074230234112 | 16.32846s | [llms.txt](https://www.segmind.com/models/face-detailer/llms.txt) |
| face-to-many | face-to-many | Turn a face into 3D, emoji, pixel art, video game, claymation or toy | Image-to-Image Transformation | $0.024690129277301704 | 22.49207s | [llms.txt](https://www.segmind.com/models/face-to-many/llms.txt) |
| face-to-sticker | face-to-sticker | Turn a face into a sticker | Image-to-Image Transformation | $0.08656035076613267 | 65.40722s | [llms.txt](https://www.segmind.com/models/face-to-sticker/llms.txt) |
| Faceswap | sd2.1-faceswapper | Take a picture/gif and replace the face in it with a face of your choice. You only need one image of the desired face. N | Image-to-Image Transformation | $0.022486081654239 | 36.77019s | [llms.txt](https://www.segmind.com/models/sd2.1-faceswapper/llms.txt) |
| Faceswap V2 | faceswap-v2 | Take a picture/gif and replace the face in it with a face of your choice. You only need one image of the desired face. N | Image-to-Image Transformation | $0.00426395007171632 | 3.18214s | [llms.txt](https://www.segmind.com/models/faceswap-v2/llms.txt) |
| Faceswap V3 | faceswap-v3 | Face Swap V3 is a cutting-edge tool that empowers you to seamlessly swap faces in images. With customizable features and | Image-to-Image Transformation | $0.005809813651717728 | 4.28387s | [llms.txt](https://www.segmind.com/models/faceswap-v3/llms.txt) |
| Faceswap V3 Multifaceswap | faceswap-v3-multifaceswap | Faceswap V3 Multifaceswap enables realistic face swapping in images, preserving lighting and expressions for professiona | Image-to-Image Transformation | $0.007570338279007598 | 6.60584s | [llms.txt](https://www.segmind.com/models/faceswap-v3-multifaceswap/llms.txt) |
| Flux 2 Flex | flux-2-flex | FLUX.2 generates high-quality, photorealistic images with consistent style using multiple references for professional wo | Image-to-Image Transformation | $0.17143308080808078 | 48.78401s | [llms.txt](https://www.segmind.com/models/flux-2-flex/llms.txt) |
| Flux 2 Max | flux-2-max | FLUX.2 Max generates photorealistic images with unparalleled consistency and contextual awareness for professional conte | Image-to-Image Transformation | $0.22035175879396982 | 53.76674s | [llms.txt](https://www.segmind.com/models/flux-2-max/llms.txt) |
| Flux 2 Pro | flux-2-pro | FLUX.2 generates photorealistic images while ensuring consistency across multiple assets using reference images. | Image-to-Image Transformation | $0.07273068357862124 | 26.14867s | [llms.txt](https://www.segmind.com/models/flux-2-pro/llms.txt) |
| Flux Canny Dev | flux-canny-dev | Open-weight edge-guided image generation. Control structure and composition using Canny edge detection. | Image-to-Image Transformation | $0.03125 | 20.67341s | [llms.txt](https://www.segmind.com/models/flux-canny-dev/llms.txt) |
| Flux Canny Pro | flux-canny-pro | Professional edge-guided image generation. Control structure and composition using Canny edge detection | Image-to-Image Transformation | $0.0624808908416325 | 24.3348s | [llms.txt](https://www.segmind.com/models/flux-canny-pro/llms.txt) |
| Flux Controlnets | flux-controlnet | Flux ControlNets is a collection of models that gives you precise control over image generation. By integrating ControlN | Image-to-Image Transformation | $0.0468823807162098 | 48.66877s | [llms.txt](https://www.segmind.com/models/flux-controlnet/llms.txt) |
| Flux Depth Dev | flux-depth-dev | Open-weight depth-aware image generation. Edit images while preserving spatial relationships. | Image-to-Image Transformation | $0.03125 | 16.03968s | [llms.txt](https://www.segmind.com/models/flux-depth-dev/llms.txt) |
| Flux Depth Pro | flux-depth-pro | Professional depth-aware image generation. Edit images while preserving spatial relationships. | Image-to-Image Transformation | $0.062476530217028384 | 25.437s | [llms.txt](https://www.segmind.com/models/flux-depth-pro/llms.txt) |
| Flux Fill Dev | flux-fill-dev | Open-weight inpainting model for editing and extending images. Guidance-distilled from FLUX.1 Fill Dev | Image-to-Image Transformation | $0.04999999999999996 | 16.65608s | [llms.txt](https://www.segmind.com/models/flux-fill-dev/llms.txt) |
| Flux Fill Pro | flux-fill-pro | Professional inpainting and outpainting model with state-of-the-art performance. Edit or extend images with natural, sea | Image-to-Image Transformation | $0.06233389033942559 | 23.54781s | [llms.txt](https://www.segmind.com/models/flux-fill-pro/llms.txt) |
| Flux Inpaint | flux-inpaint | Flux Inpainting is a powerful image editing tool designed to effortlessly edit and enhance your images. It's perfect for | Image-to-Image Transformation | $0.02437914111346206 | 28.40185s | [llms.txt](https://www.segmind.com/models/flux-inpaint/llms.txt) |
| Flux Ipadapter | flux-ipadapter | Flux IP Adapter is a cutting-edge AI model that lets you create stunning, customized images. With its advanced style  | Image-to-Image Transformation | $0.07432834362068964 | 76.08154s | [llms.txt](https://www.segmind.com/models/flux-ipadapter/llms.txt) |
| Flux Kontext Max | flux-kontext-max | FLUX.1 Kontext [max] transforms textual descriptions into stunning, high-fidelity images with seamless typography integr | Image-to-Image Transformation | $0.10000000000000044 | 24.34517s | [llms.txt](https://www.segmind.com/models/flux-kontext-max/llms.txt) |
| Flux Kontext Pro | flux-kontext-pro | FLUX.1 Kontext Pro transforms text prompts into high-quality, customized images with remarkable efficiency and precision | Image-to-Image Transformation | $0.04999999999999988 | 21.7493s | [llms.txt](https://www.segmind.com/models/flux-kontext-pro/llms.txt) |
| Flux Krea Dev | flux-krea-dev | FLUX.1 Krea generates stunning, photorealistic images with fine-tuned aesthetic control for diverse creative application | Image-to-Image Transformation | $0.03194389243243245 | 24.82919s | [llms.txt](https://www.segmind.com/models/flux-krea-dev/llms.txt) |
| Flux Pulid | flux-pulid | Flux PuLID: Customize AI-generated images with your unique identity. Seamlessly integrate faces into text-to-image model | Image-to-Image Transformation | $0.036171757384809905 | 13.08964s | [llms.txt](https://www.segmind.com/models/flux-pulid/llms.txt) |
| Flux Redux Dev | flux-redux-dev | Open-weight image variation model. Create new versions while preserving key elements of your original. | Image-to-Image Transformation | $0.03125 | 15.84067s | [llms.txt](https://www.segmind.com/models/flux-redux-dev/llms.txt) |
| Flux Redux Schnell | flux-redux-schnell | Fast, efficient image variation model for rapid iteration and experimentation. | Image-to-Image Transformation | $0.003750000000000002 | 7.67963s | [llms.txt](https://www.segmind.com/models/flux-redux-schnell/llms.txt) |
| Flux-2 Klein-4b | flux-2-klein-4b | FLUX.2 [klein] delivers photorealistic image generation and editing with sub-second latency on consumer hardware. | Image-to-Image Transformation | $0.03203665434985968 | 10.80266s | [llms.txt](https://www.segmind.com/models/flux-2-klein-4b/llms.txt) |
| Flux-2 Klein-9b | flux-2-klein-9b | FLUX.2 [klein] enables ultra-fast, photorealistic image generation on consumer GPUs, transforming creative workflows. | Image-to-Image Transformation | $0.043033694823529416 | 15.79396s | [llms.txt](https://www.segmind.com/models/flux-2-klein-9b/llms.txt) |
| Flux.1 Image To Image | flux-img2img | Flux Image-To-Image model by Black Forest Labs is an advanced deep learning tool designed for transforming images based  | Image-to-Image Transformation | $0.026468713979663272 | 24.99767s | [llms.txt](https://www.segmind.com/models/flux-img2img/llms.txt) |
| FLUX.1 Kontext [dev] | flux-kontext-dev | FLUX.1 Kontext [dev] creates coherent and editable images by integrating text and visual cues for iterative design. | Image-to-Image Transformation | $0.0400015965855202 | 10.8091s | [llms.txt](https://www.segmind.com/models/flux-kontext-dev/llms.txt) |
| Font Sheet Generator | font-sheet-generator | Transforms images into unique, custom font sets in minutes, revolutionizing typography design. | Image-to-Image Transformation | $0.09303416666666668 | 32.89264s | [llms.txt](https://www.segmind.com/models/font-sheet-generator/llms.txt) |
| Fooocus | fooocus | Fooocus enables high-quality image generation effortlessly, combining the best of Stable Diffusion and Midjourney. | Image-to-Image Transformation | $0.06183306475135813 | 20.21113s | [llms.txt](https://www.segmind.com/models/fooocus/llms.txt) |
| Fooocus Outpainting | focus-outpaint | Fooocus Outpainting transforms ordinary images into extraordinary works of art by seamlessly expanding their boundaries. | Image-to-Image Transformation | $0.02453780321242954 | 15.6497s | [llms.txt](https://www.segmind.com/models/focus-outpaint/llms.txt) |
| GPT Image 1 Edit | gpt-image-1-edit | Edit and compose images using natural language with GPT Image 1 Edit, OpenAI’s powerful inpainting and multi-reference e | Image-to-Image Transformation | $0.14111645422665856 | 57.95774s | [llms.txt](https://www.segmind.com/models/gpt-image-1-edit/llms.txt) |
| GPT Image 1 Edit Mini | gpt-image-1-edit-mini | GPT Image 1 Mini generates and edits high-quality images seamlessly from text and visual inputs. | Image-to-Image Transformation | $0.02749860253716044 | 42.56088s | [llms.txt](https://www.segmind.com/models/gpt-image-1-edit-mini/llms.txt) |
| GPT Image 1.5 Edit | gpt-image-1.5-edit | Transform image edits with precision using natural language instructions for seamless creative workflows. | Image-to-Image Transformation | $0.21767954163976774 | 52.46691s | [llms.txt](https://www.segmind.com/models/gpt-image-1.5-edit/llms.txt) |
| HiDream-I1 (Fast) | hidream-l1-fast | HiDream-I1 is a next-generation, open-source image generative foundation model designed for text-to-image synthesis, esp | Image-to-Image Transformation | $0.015212709738372097 | 10.42825s | [llms.txt](https://www.segmind.com/models/hidream-l1-fast/llms.txt) |
| Higgsfield Text 2 Image Soul | higgsfield-text2image-soul | SOUL AI transforms text into stunning, customizable visuals with unparalleled style control and precision. | Image-to-Image Transformation | $0.2175015913430935 | 40.25632s | [llms.txt](https://www.segmind.com/models/higgsfield-text2image-soul/llms.txt) |
| HyperSwap Image Faceswap by FaceFusion Labs | hyperswap-image-faceswap-by-facefusion-labs | Hyperswap enables high-quality, natural face swapping built for real production use. | Image-to-Image Transformation | $0.1 | 9.68355s | [llms.txt](https://www.segmind.com/models/hyperswap-image-faceswap-by-facefusion-labs/llms.txt) |
| Ideogram 2a Image to Image | ideogram-2a-img-2-img | Ideogram Image to Image: Transform your images with ease! Enhance, modify, or create entirely new visuals using advanced | Image-to-Image Transformation | $0.05 | 10.11784s | [llms.txt](https://www.segmind.com/models/ideogram-2a-img-2-img/llms.txt) |
| Ideogram 3 Reframe | ideogram-3-reframe | Ideogram 3.0's Reframe effortlessly adapts images to diverse formats, enhancing visual content creation for any platform | Image-to-Image Transformation | $0.041633565044687196 | 11.30249s | [llms.txt](https://www.segmind.com/models/ideogram-3-reframe/llms.txt) |
| Ideogram 3 Remix | ideogram-3-remix | Ideogram 3 Remix enables versatile image transformation, enhancing creativity through customizable design iterations. | Image-to-Image Transformation | $0.07462871287128714 | 10.54744s | [llms.txt](https://www.segmind.com/models/ideogram-3-remix/llms.txt) |
| Ideogram 3 Replace Background | ideogram-3-replace-background | Effortlessly replace backgrounds in images, enhancing visual storytelling and creativity with precision and speed. | Image-to-Image Transformation | $0.09151278409090909 | 14.03688s | [llms.txt](https://www.segmind.com/models/ideogram-3-replace-background/llms.txt) |
| Ideogram Character | ideogram-character | Achieve perfect character consistency across multiple generations from a single reference image. | Image-to-Image Transformation | $0.23212718544935806 | 20.00061s | [llms.txt](https://www.segmind.com/models/ideogram-character/llms.txt) |
| Ideogram Image To Image | ideogram-img-2-img | Ideogram Image to Image: Transform your images with ease! Enhance, modify, or create entirely new visuals using advanced | Image-to-Image Transformation | $0.10 | 12.67675s | [llms.txt](https://www.segmind.com/models/ideogram-img-2-img/llms.txt) |
| Ideogram Reframe | ideogram-reframe | Transform your images with Ideogram Reframe! Easily reframe square images to your chosen resolution. | Image-to-Image Transformation | $0.10 | 23.34343s | [llms.txt](https://www.segmind.com/models/ideogram-reframe/llms.txt) |
| Ideogram Turbo Image To Image | ideogram-turbo-img-2-img | Transform images instantly with Ideogram Turbo Image to Image! Fast AI for quick edits & creative remixes. | Image-to-Image Transformation | $0.063 | 11.0242s | [llms.txt](https://www.segmind.com/models/ideogram-turbo-img-2-img/llms.txt) |
| IDM VTON | idm-vton | Best-in-class clothing virtual try-on in the wild | Image-to-Image Transformation | $0.04427530074310439 | 10.72702s | [llms.txt](https://www.segmind.com/models/idm-vton/llms.txt) |
| illusion-diffusion-hq | illusion-diffusion-hq | Monster Labs QrCode ControlNet on top of SD Realistic Vision v5.1 | Image-to-Image Transformation | $0.04254646581057341 | 60.14s | [llms.txt](https://www.segmind.com/models/illusion-diffusion-hq/llms.txt) |
| Image Superimpose | superimpose | Superimpose model lets you create captivating visuals by seamlessly overlaying one image on top of another. It stream | Image-to-Image Transformation | $0.000516212389380531 | 0.64096s | [llms.txt](https://www.segmind.com/models/superimpose/llms.txt) |
| Image Superimpose V2 | superimpose-v2 | Superimpose V2 elevates image editing! Seamlessly layer images with background removal, precise positioning, and flexibl | Image-to-Image Transformation | $0.0018924620997221812 | 2.24128s | [llms.txt](https://www.segmind.com/models/superimpose-v2/llms.txt) |
| Infinite You | infinite-you | InfiniteYou generates high-fidelity portraits preserving identity while aligning with creative text prompts. | Image-to-Image Transformation | $0.18616816406779663 | 164.55799s | [llms.txt](https://www.segmind.com/models/infinite-you/llms.txt) |
| Inpaint Mask Maker | inpaint-mask-maker | Real-Time Open-Vocabulary Object Detection | Image-to-Image Transformation | $0.004548485514777526 | 8.12156s | [llms.txt](https://www.segmind.com/models/inpaint-mask-maker/llms.txt) |
| Insta Depth | insta-depth | InstantID aims to generate customized images with various poses or styles from only a single reference ID image while en | Image-to-Image Transformation | $0.05214489399238382 | 15.17349s | [llms.txt](https://www.segmind.com/models/insta-depth/llms.txt) |
| InstantID | instantid | InstantID aims to generate customized images with various poses or styles from only a single reference ID image while en | Image-to-Image Transformation | $0.025133247161953393 | 7.73541s | [llms.txt](https://www.segmind.com/models/instantid/llms.txt) |
| IP-adapter Canny XL | ip-sdxl-canny | IP Adapter XL Canny is built on the SDXL framework. This model integrates the IP Adapter and Canny edge preprocessor to  | Image-to-Image Transformation | $0.011347903971991587 | 14.2099s | [llms.txt](https://www.segmind.com/models/ip-sdxl-canny/llms.txt) |
| IP-adapter Depth XL | ip-sdxl-depth | IP Adapter Depth XL is built on the SDXL framework. This model integrates the IP Adapter and Depth preprocessor to offer | Image-to-Image Transformation | $0.015825335794022326 | 21.32385s | [llms.txt](https://www.segmind.com/models/ip-sdxl-depth/llms.txt) |
| IP-adapter Openpose XL | ip-sdxl-openpose | IP Adapter XL Openpose is built on the SDXL framework. This model integrates the IP Adapter and Openpose preprocessor to | Image-to-Image Transformation | $0.013783720446761 | 15.4171s | [llms.txt](https://www.segmind.com/models/ip-sdxl-openpose/llms.txt) |
| IPAdapter Style Transfer | style-transfer | Style & Composition Transfer with Stable Diffusion IP Adapter  | Image-to-Image Transformation | $0.025668324416977605 | 16.64547s | [llms.txt](https://www.segmind.com/models/style-transfer/llms.txt) |
| Kling O1 | kling-o1 | Kling O1 transforms video creation and editing into a seamless, AI-driven experience for content creators. | Image-to-Image Transformation | $0.035 | 59.52196s | [llms.txt](https://www.segmind.com/models/kling-o1/llms.txt) |
| Kling V3 Image 2 Image | kling-3-image2image | Transform any image into photorealistic, production-ready visuals with Kling V3's Visual Chain-of-Thought reasoning. | Image-to-Image Transformation | $0.035 | 67.85523s | [llms.txt](https://www.segmind.com/models/kling-3-image2image/llms.txt) |
| Kolors | kolors | Kolors is a cutting-edge text-to-image model that bridges language and visual art. Transform your textual ideas into pho | Image-to-Image Transformation | $0.09548960481352992 | 84.60946s | [llms.txt](https://www.segmind.com/models/kolors/llms.txt) |
| Lifestyle Product Shot by Image | bria-lifestyle-shot-by-image | Transforms ordinary product images into stunning, marketing-ready visuals for eCommerce success. | Image-to-Image Transformation | $0.031020408163265307 | 20.98771s | [llms.txt](https://www.segmind.com/models/bria-lifestyle-shot-by-image/llms.txt) |
| Magic Eraser | magic-eraser | LaMA Object Removal: AI Magic Eraser | Image-to-Image Transformation | $0.00015957072997849228 | 0.78123s | [llms.txt](https://www.segmind.com/models/magic-eraser/llms.txt) |
| material-transfer | material-transfer | Transfer a material from an image to a subject | Image-to-Image Transformation | $0.251970550235849 | 164.02658s | [llms.txt](https://www.segmind.com/models/material-transfer/llms.txt) |
| Minimax-image-01 | image-01 | Generate high-fidelity images from text with precise control & stunning quality with Minimax Image-01. | Image-to-Image Transformation | $0.012521120694168152 | 36.2434s | [llms.txt](https://www.segmind.com/models/image-01/llms.txt) |
| Multi Image Kontext Max | multi-image-kontext-max | FLUX.1 Kontext [max] creates stunning, photorealistic images from text prompts and input images seamlessly. | Image-to-Image Transformation | $0.08792099773890778 | 18.33631s | [llms.txt](https://www.segmind.com/models/multi-image-kontext-max/llms.txt) |
| Multi Image Kontext Pro | multi-image-kontext-pro | Transform text into stunning, professional-grade images with precise editing capabilities. | Image-to-Image Transformation | $0.05 | 22.90793s | [llms.txt](https://www.segmind.com/models/multi-image-kontext-pro/llms.txt) |
| Nano Banana 2 | nano-banana-2 | Nano Banana 2 rapidly generates photorealistic images from text prompts, ideal for marketing and creative projects. | Image-to-Image Transformation | $0.09272344749344112 | 39.77981s | [llms.txt](https://www.segmind.com/models/nano-banana-2/llms.txt) |
| Nano Banana Pro | nano-banana-pro | Nano Banana Pro generates high-fidelity, context-aware images with accurate multilingual text and multi-image support. | Image-to-Image Transformation | $0.16494376439303163 | 60.52326s | [llms.txt](https://www.segmind.com/models/nano-banana-pro/llms.txt) |
| Nomos Image Upscaler 4k | nomos-upscaler | This upscaling model is ideal for enhancing amateur to professional photos, excelling with subjects like cats, hair, and | Image-to-Image Transformation | $0.011520166751361163 | 8.93997s | [llms.txt](https://www.segmind.com/models/nomos-upscaler/llms.txt) |
| Omini Control | ominicontrol | OminiControl is an innovative framework that optimizes Diffusion Transformer models for versatile image generation tasks | Image-to-Image Transformation | $0.004185658247640512 | 4.50113s | [llms.txt](https://www.segmind.com/models/ominicontrol/llms.txt) |
| Omni Zero | omni-zero | Omni-Zero: A diffusion pipeline for zero-shot stylized portrait creation. | Image-to-Image Transformation | $0.17303785389324666 | 149.10339s | [llms.txt](https://www.segmind.com/models/omni-zero/llms.txt) |
| Profile Photo Style Transfer | become-image | Turn any image of a face into artwork using Stable Diffusion Controlnet and IPAdapter | Image-to-Image Transformation | $0.09456781571354969 | 63.7612s | [llms.txt](https://www.segmind.com/models/become-image/llms.txt) |
| Pruna P Image Edit | p-image-edit | Pruna's p-image-edit enables sophisticated multi-image editing with AI-guided precision and style applications. | Image-to-Image Transformation | $0.01 | 7.53944s | [llms.txt](https://www.segmind.com/models/p-image-edit/llms.txt) |
| PuLID | pulid-base | Novel tuning-free ID customization method for text-to-image generation. | Image-to-Image Transformation | $0.20448619862174583 | 70.5734s | [llms.txt](https://www.segmind.com/models/pulid-base/llms.txt) |
| Qwen Image Edit | qwen-image-edit | Transform images effortlessly through semantic context and pixel-perfect appearance changes. | Image-to-Image Transformation | $0.1993516979032258 | 48.88752s | [llms.txt](https://www.segmind.com/models/qwen-image-edit/llms.txt) |
| Qwen Image Edit Fast | qwen-image-edit-fast | Qwen-Image-Edit enables precise bilingual image editing for seamless localization and professional content creation. | Image-to-Image Transformation | $0.03643549199338586 | 8.72839s | [llms.txt](https://www.segmind.com/models/qwen-image-edit-fast/llms.txt) |
| Qwen Image Edit Plus | qwen-image-edit-plus | Qwen Image Edit Plus revolutionizes multi-image editing with precise transformations and facial consistency. | Image-to-Image Transformation | $0.035252840054403226 | 13.64536s | [llms.txt](https://www.segmind.com/models/qwen-image-edit-plus/llms.txt) |
| Qwen Image Edit Plus Add People Lora | qwen-image-edit-plus-add-people | Effortlessly generates realistic multi-character scenes with natural interactions for diverse creative applications. | Image-to-Image Transformation | $0.09862558602150538 | 23.00519s | [llms.txt](https://www.segmind.com/models/qwen-image-edit-plus-add-people/llms.txt) |
| Qwen Image Edit Plus Blend It | qwen-image-edit-plus-blend-it | Seamlessly integrates products into backgrounds with precise lighting and perspective adjustments for realistic composit | Image-to-Image Transformation | $0.08271108963210703 | 18.22052s | [llms.txt](https://www.segmind.com/models/qwen-image-edit-plus-blend-it/llms.txt) |
| Qwen Image Edit Plus Eigen Banana | qwen-image-edit-plus-eigen-banana | Eigen-Banana-Qwen-Image-Edit enables precise, text-guided transformations of images for diverse applications. | Image-to-Image Transformation | $0.08898907734265733 | 21.66458s | [llms.txt](https://www.segmind.com/models/qwen-image-edit-plus-eigen-banana/llms.txt) |
| Qwen Image Edit Plus Eraser | qwen-image-edit-plus-eraser | Intelligently removes unwanted objects from images while preserving realistic backgrounds and scene integrity. | Image-to-Image Transformation | $0.0783987044520548 | 20.01055s | [llms.txt](https://www.segmind.com/models/qwen-image-edit-plus-eraser/llms.txt) |
| Qwen Image Edit Plus Face To Portrait | qwen-image-edit-plus-face-to-portrait | Transforms cropped facial images into stunning, identity-preserving portrait photographs. | Image-to-Image Transformation | $0.07414667805907173 | 17.79869s | [llms.txt](https://www.segmind.com/models/qwen-image-edit-plus-face-to-portrait/llms.txt) |
| Qwen Image Edit Plus Group Photo | qwen-image-edit-plus-group-photo | Generates realistic group photos by merging multiple individual portraits while ensuring facial consistency and nostalgi | Image-to-Image Transformation | $0.10533954509803922 | 23.72663s | [llms.txt](https://www.segmind.com/models/qwen-image-edit-plus-group-photo/llms.txt) |
| Qwen Image Edit Plus Multi Lora | qwen-image-edit-plus-multi-lora | Qwen Image Edit Plus Multi lora enables seamless multi-image editing with superior detail preservation for professional- | Image-to-Image Transformation | $0.08616205824665676 | 20.43325s | [llms.txt](https://www.segmind.com/models/qwen-image-edit-plus-multi-lora/llms.txt) |
| Qwen Image Edit Plus Multiple Angles | qwen-image-edit-plus-multiple-angle | Transform any image perspective dynamically using natural language for professional-grade results. | Image-to-Image Transformation | $0.09853380492587278 | 22.91636s | [llms.txt](https://www.segmind.com/models/qwen-image-edit-plus-multiple-angle/llms.txt) |
| Qwen Image Edit Plus Next Scene | qwen-image-edit-plus-next-scene | Creates cinematic sequences with seamless visual flow, enhancing storytelling in digital media. | Image-to-Image Transformation | $0.09455228241469817 | 22.30941s | [llms.txt](https://www.segmind.com/models/qwen-image-edit-plus-next-scene/llms.txt) |
| Qwen Image Edit Plus Product Photography | qwen-image-edit-plus-product-photography | Transforms white-background images into immersive, realistic scenes for professional-quality visual storytelling. | Image-to-Image Transformation | $0.08908184624145787 | 20.64443s | [llms.txt](https://www.segmind.com/models/qwen-image-edit-plus-product-photography/llms.txt) |
| Qwen Image Edit Plus Relight | qwen-image-edit-plus-relight | Transform any image with advanced lighting manipulation using natural language prompts, enhancing realism and atmosphere | Image-to-Image Transformation | $0.10313037307692308 | 21.31752s | [llms.txt](https://www.segmind.com/models/qwen-image-edit-plus-relight/llms.txt) |
| Qwen Image Edit Plus Remove Lighting | qwen-image-edit-plus-remove-lighting | Automatically restores natural lighting and removes artificial effects for stunning, professional-quality images. | Image-to-Image Transformation | $0.08661318 | 20.97713s | [llms.txt](https://www.segmind.com/models/qwen-image-edit-plus-remove-lighting/llms.txt) |
| Qwen Image Edit Plus Texture Apply | qwen-image-edit-plus-texture-apply | Seamlessly applies precise textures to images based on natural language prompts for enhanced visual quality. | Image-to-Image Transformation | $0.0997037090909091 | 23.41138s | [llms.txt](https://www.segmind.com/models/qwen-image-edit-plus-texture-apply/llms.txt) |
| Qwen Image Edit Plus Texture Extract | qwen-image-edit-plus-texture-extract | Effortlessly extracts and generates seamless, tileable textures from photographs for digital creatives. | Image-to-Image Transformation | $0.10225743999999999 | 23.38367s | [llms.txt](https://www.segmind.com/models/qwen-image-edit-plus-texture-extract/llms.txt) |
| Relighting | ic-light | Prompts to auto-magically relight your images. | Image-to-Image Transformation | $0.03634656989195906 | 30.6633s | [llms.txt](https://www.segmind.com/models/ic-light/llms.txt) |
| Runway Gen 4 Image | runway-gen4-image | Runway's Gen-4 Image API enables precise, multimodal image generation for innovative creative and technical applications | Image-to-Image Transformation | $0.1 | 30.6423s | [llms.txt](https://www.segmind.com/models/runway-gen4-image/llms.txt) |
| Sam V2 Image | sam-v2-image | SAM v2, the next-gen segmentation model from Meta AI, revolutionizes computer vision. Building on SAM's success, it exce | Image-to-Image Transformation | $0.0017380988645747315 | 1.69324s | [llms.txt](https://www.segmind.com/models/sam-v2-image/llms.txt) |
| Sam3 Image | sam3-image | SAM3 enables precise object segmentation and tracking in images and videos using natural language and visual prompts. | Image-to-Image Transformation | $0.007405221409305461 | 4.93429s | [llms.txt](https://www.segmind.com/models/sam3-image/llms.txt) |
| SD Outpainting | sd1.5-outpaint | Stable Diffusion Outpainting can extend any image in any direction | Image-to-Image Transformation | $0.010914182429067847 | 4.34742s | [llms.txt](https://www.segmind.com/models/sd1.5-outpaint/llms.txt) |
| SD3 Medium Canny Controlnet | sd3-med-canny | Stable Diffusion 3 (SD3) Medium Canny ControlNet uses Canny edge detection to provide fine-grained control over the gene | Image-to-Image Transformation | $0.006618628908091123 | 8.78311s | [llms.txt](https://www.segmind.com/models/sd3-med-canny/llms.txt) |
| SD3 Medium Pose Controlnet | sd3-med-pose | Stable Diffusion 3 (SD3) Pose ControlNet is a large generative image model tailored for generating images based on text  | Image-to-Image Transformation | $0.016609997988165683 | 18.18351s | [llms.txt](https://www.segmind.com/models/sd3-med-pose/llms.txt) |
| SD3 Medium Tile Controlnet | sd3-med-tile | SD3 Medium Tile ControlNet is a large generative image model designed for generating detailed images based on textual pr | Image-to-Image Transformation | $0.0076904571630204656 | 8.94072s | [llms.txt](https://www.segmind.com/models/sd3-med-tile/llms.txt) |
| SDXL Controlnet | sdxl-controlnet | SDXL ControlNet gives unprecedented control over text-to-image generation. SDXL ControlNet models introduce the concept | Image-to-Image Transformation | $0.012359035310441947 | 10.87821s | [llms.txt](https://www.segmind.com/models/sdxl-controlnet/llms.txt) |
| SDXL Img2Img | sdxl-img2img | SDXL Img2Img is used for text-guided image-to-image translation. This model uses the weights from Stable Diffusion to ge | Image-to-Image Transformation | $0.01946756761119836 | 8.38561s | [llms.txt](https://www.segmind.com/models/sdxl-img2img/llms.txt) |
| SDXL-Openpose | sdxl-openpose | This model leverages SDXL to generate the images with ControlNet conditioned on Human Pose Estimation. | Image-to-Image Transformation | $0.008039939620145518 | 8.33249s | [llms.txt](https://www.segmind.com/models/sdxl-openpose/llms.txt) |
| SeedEdit 3.0 i2i | seededit-v3 | SeedEdit 3.0 enables seamless, high-quality image edits through advanced AI-driven techniques. | Image-to-Image Transformation | $0.05 | 10.85232s | [llms.txt](https://www.segmind.com/models/seededit-v3/llms.txt) |
| Seedream 4.0 (4k) | seedream-4 | Seedream 4.0 generates high-resolution, professional-grade visuals with superior text rendering for impactful design. | Image-to-Image Transformation | $0.035060707265736776 | 20.65252s | [llms.txt](https://www.segmind.com/models/seedream-4/llms.txt) |
| Seedream 4.5 | seedream-4.5 | Seedream 4.5 delivers photorealistic image generation with unmatched accuracy and creative control for professional appl | Image-to-Image Transformation | $0.040011297732921544 | 32.12261s | [llms.txt](https://www.segmind.com/models/seedream-4.5/llms.txt) |
| Seedream 5.0 Lite: Image-to-Image | seedream-v5-lite-image-to-image | Transform images intelligently based on detailed prompts, enhancing creativity and precision in visual design. | Image-to-Image Transformation | $0.035 | 47.33647s | [llms.txt](https://www.segmind.com/models/seedream-v5-lite-image-to-image/llms.txt) |
| Segment Anything Model | sam-img2img | The Segment Anything Model (SAM) produces high quality object masks from input prompts such as points or boxes, and it c | Image-to-Image Transformation | $0.007594711676599742 | 3.80742s | [llms.txt](https://www.segmind.com/models/sam-img2img/llms.txt) |
| Segmind Beyond: Outpaint with Ease | seg-beyond | Effortlessly expand your visuals with AI Image Extend. Intelligently add pixels to any side of your image. | Image-to-Image Transformation | $0.03635655393883225 | 25.35328s | [llms.txt](https://www.segmind.com/models/seg-beyond/llms.txt) |
| Segmind FaceSwap Comic v1 | faceswap-comic | FaceSwap Comic v1 is an AI-powered face swapping model designed to blend real faces into illustrated or cartoon-style im | Image-to-Image Transformation | $0.0747023325521028 | 21.16491s | [llms.txt](https://www.segmind.com/models/faceswap-comic/llms.txt) |
| Segmind Faceswap v4 | faceswap-v4 | Segmind FaceSwap v4 enables fast and precise face or head swapping between images with customizable options for style, o | Image-to-Image Transformation | $0.12336912908154488 | 32.85455s | [llms.txt](https://www.segmind.com/models/faceswap-v4/llms.txt) |
| Segmind Faceswap v5 | faceswap-v5 | Segmind Faceswap v5: Ultra-Fast, Smart Face & Head Swap Model | Image-to-Image Transformation | $0.04997478018907897 | 9.74868s | [llms.txt](https://www.segmind.com/models/faceswap-v5/llms.txt) |
| Segmind Relighting | segmind-relighting | Prompts to auto-magically relight your images. | Image-to-Image Transformation | $0.059251471825063066 | 10.33768s | [llms.txt](https://www.segmind.com/models/segmind-relighting/llms.txt) |
| Segmind Relighting V2 | segmind-relighting-v2 | Transform images with customizable, photorealistic lighting for unparalleled visual creativity and authenticity. | Image-to-Image Transformation | $0.2586717520215633 | 70.75884s | [llms.txt](https://www.segmind.com/models/segmind-relighting-v2/llms.txt) |
| Segmind SceneCraft v0.1 | segmind-scenecraft-v01 | SceneCraft transforms plain or existing product images into visually rich, photorealistic scenes. Whether starting from  | Image-to-Image Transformation | $0.3392611252173913 | 33.64033s | [llms.txt](https://www.segmind.com/models/segmind-scenecraft-v01/llms.txt) |
| Segmind SegFit v1.1 | segfit-v1.1 | Segmind's Fashion and Immersive Try-on model. SegFIT offers effortless AI virtual try-on from just a product image. No m | Image-to-Image Transformation | $0.4522070048380595 | 68.97894s | [llms.txt](https://www.segmind.com/models/segfit-v1.1/llms.txt) |
| Segmind SegFit v1.2 | segfit-v1.2 | SegFit v1.2 creates hyper-realistic virtual try-on images, transforming fashion retail engagement and conversion rates. | Image-to-Image Transformation | $0.09200017946139626 | 51.79312s | [llms.txt](https://www.segmind.com/models/segfit-v1.2/llms.txt) |
| Segmind SegFit v1.3 | segfit-v1.3 | SegFit v1.3 enables hyper-realistic virtual try-ons, enhancing online fashion retail experiences without physical photos | Image-to-Image Transformation | $0.21980462682503907 | 37.13265s | [llms.txt](https://www.segmind.com/models/segfit-v1.3/llms.txt) |
| Segmind SegSwap v0.1 | seg-swap | Swap Objects Instantly. The Segmind SegSwap v0.1 model enables dynamic and precise image editing by allowing users to re | Image-to-Image Transformation | $0.28720555895974353 | 26.39516s | [llms.txt](https://www.segmind.com/models/seg-swap/llms.txt) |
| Skin Contrast Upscaler | skin-contrast-upscaler | Enhances skin detail in images while preserving background quality for professional photography and art. | Image-to-Image Transformation | $0.013147795271453586 | 3.67843s | [llms.txt](https://www.segmind.com/models/skin-contrast-upscaler/llms.txt) |
| SSD Img2Img | ssd-img2img | This model uses SSD-1B to generate images by passing a text prompt and an initial image to condition the generation  | Image-to-Image Transformation | $0.0033121782447356517 | 3.99695s | [llms.txt](https://www.segmind.com/models/ssd-img2img/llms.txt) |
| SSD-Canny | ssd-canny | This model leverages SSD-1B to generate the images with ControlNet conditioned on Canny Images  | Image-to-Image Transformation | $0.006194010348729143 | 5.95972s | [llms.txt](https://www.segmind.com/models/ssd-canny/llms.txt) |
| SSD-Depth | ssd-depth | This model leverages SSD-1B to generate the images with ControlNet conditioned on Depth Estimation | Image-to-Image Transformation | $0.009080595711003317 | 10.7305s | [llms.txt](https://www.segmind.com/models/ssd-depth/llms.txt) |
| Stable Diffusion img2img | sd1.5-img2img | This model uses the diffusion-denoising mechanism first proposed by SDEdit; Stable Diffusion is used for text-guided imag | Image-to-Image Transformation | $0.0037053834433711337 | 7.69591s | [llms.txt](https://www.segmind.com/models/sd1.5-img2img/llms.txt) |
| Story Diffusion | storydiffusion | Story Diffusion turns your written narratives into stunning image sequences. | Image-to-Image Transformation | $0.1656077458767558 | 118.89071s | [llms.txt](https://www.segmind.com/models/storydiffusion/llms.txt) |
| Supir Photo-Realistic Image Restoration | supir | SUPIR restores and enhances images to stunning, photo-realistic quality with advanced AI techniques. | Image-to-Image Transformation | $5 | - | [llms.txt](https://www.segmind.com/models/supir/llms.txt) |
| Text Overlay | text-overlay | Elevate your visuals with the Text Overlay Model. Easily add customized text to any image, perfect for social media, marketin | Image-to-Image Transformation | $0.0011387493007344802 | 2.1085s | [llms.txt](https://www.segmind.com/models/text-overlay/llms.txt) |
| Topaz Labs Image Upscale | topaz-image-upscale | Topaz Labs image upscale is an industry-leading AI photo upscaler designed to increase the resolution of photos while pr | Image-to-Image Transformation | $0.37459338096632505 | 23.29591s | [llms.txt](https://www.segmind.com/models/topaz-image-upscale/llms.txt) |
| Transparent Background Maker | transparent-background-maker | Transform your images with Transparent Background Maker. Quickly remove backgrounds using AI technology, supporting PNG  | Image-to-Image Transformation | $0.0028397416233371595 | 1.3546s | [llms.txt](https://www.segmind.com/models/transparent-background-maker/llms.txt) |
| Word2img | w2imgsd1.5-img2img | Create beautifully designed words using Segmind’s word to image for your marketing purposes | Image-to-Image Transformation | $0.004746179074561295 | 10.02128s | [llms.txt](https://www.segmind.com/models/w2imgsd1.5-img2img/llms.txt) |
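
Each slug in the table above plugs into the two URL patterns given at the top of this directory. The following is a minimal Python sketch of that substitution, not an official client: the helper names `endpoint_url` and `docs_url` are hypothetical, and the code only builds strings without sending any request. Request parameters differ per model, so consult the model's `llms.txt` before calling an endpoint.

```python
from urllib.parse import quote

# URL patterns taken from the header of this directory.
API_BASE = "https://api.segmind.com/v1"
DOCS_BASE = "https://www.segmind.com/models"


def endpoint_url(slug: str) -> str:
    """Return the inference endpoint for a model slug (hypothetical helper)."""
    # quote() leaves letters, digits, '.', '-' and '_' untouched,
    # so slugs such as "sd1.5-img2img" pass through unchanged.
    return f"{API_BASE}/{quote(slug, safe='')}"


def docs_url(slug: str) -> str:
    """Return the llms.txt documentation URL for a model slug (hypothetical helper)."""
    return f"{DOCS_BASE}/{quote(slug, safe='')}/llms.txt"


if __name__ == "__main__":
    # Slugs copied from the Slug column of the table above.
    for slug in ("flux-kontext-pro", "sd1.5-img2img"):
        print(endpoint_url(slug))
        print(docs_url(slug))
```

Use the value from the Slug column rather than the Model column, since the two often differ (for example, the model "Relighting" has the slug `ic-light`).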

## Text-to-Audio Generation

| Model | Slug | Description | Modality | Avg Cost | Avg Latency | API Docs |
| --- | --- | --- | --- | --- | --- | --- |
| 3B Orpheus TTS (0.1) | orpheus-3b-0.1 | Orpheus TTS is an open-source text-to-speech (TTS) system powered by the Llama 3B language model, designed for high-qual | Text-to-Audio Generation | $0.12419483106004445 | 117.62101s | [llms.txt](https://www.segmind.com/models/orpheus-3b-0.1/llms.txt) |
| Ace Step Music | ace-step-music | ACE-Step generates high-quality music rapidly, enhancing the creative process for developers and artists worldwide. | Text-to-Audio Generation | $0.035209308805031446 | 11.81605s | [llms.txt](https://www.segmind.com/models/ace-step-music/llms.txt) |
| Chatterbox TTS | chatterbox-tts | Chatterbox transforms text into rich, natural speech with adjustable emotional expressiveness for diverse applications. | Text-to-Audio Generation | $0.01984960639470783 | 18.0555s | [llms.txt](https://www.segmind.com/models/chatterbox-tts/llms.txt) |
| Chatterbox Turbo TTS | chatterbox-turbo-tts | Chatterbox-Turbo delivers ultra-fast, high-quality speech synthesis with human-like expressiveness for real-time applica | Text-to-Audio Generation | $0.020990142937853108 | 13.52225s | [llms.txt](https://www.segmind.com/models/chatterbox-turbo-tts/llms.txt) |
| Dia (Text to Speech) | dia | Dia by Nari Labs is an advanced open-weights TTS model that brings scripts to life with natural speech, emotions, and no | Text-to-Audio Generation | $0.06978429896877268 | 89.60217s | [llms.txt](https://www.segmind.com/models/dia/llms.txt) |
| Elevenlabs Dialogue | elevenlabs-dialogue | Transforms text into immersive, emotionally expressive multi-speaker audio dialogues for various media applications. | Text-to-Audio Generation | $0.01871181818181818 | 7.11957s | [llms.txt](https://www.segmind.com/models/elevenlabs-dialogue/llms.txt) |
| ElevenLabs Dubbing | dubbing | Instantly dubs audio and video into 29 languages while preserving each speaker's original voice. | Text-to-Audio Generation | $0.249679882 | 92.70439s | [llms.txt](https://www.segmind.com/models/dubbing/llms.txt) |
| Elevenlabs Sound Generation | sound-generation | Eleven Labs' Sound Generation API provides a robust development tool for programmatically generating audio content using | Text-to-Audio Generation | $0.02650180798180316 | 7.82464s | [llms.txt](https://www.segmind.com/models/sound-generation/llms.txt) |
| Elevenlabs Text To Speech | tts-eleven-labs | ElevenLabs TTS transforms text into captivating, human-like speech for diverse applications. | Text-to-Audio Generation | $0.09542532397382102 | 12.29174s | [llms.txt](https://www.segmind.com/models/tts-eleven-labs/llms.txt) |
| Gemini TTS 2.5 Flash | gemini-2.5-flash-tts | Gemini 2.5 TTS transforms text into lifelike speech with expressive tones and consistent character voices. | Text-to-Audio Generation | $0.005021123116003387 | 17.9339s | [llms.txt](https://www.segmind.com/models/gemini-2.5-flash-tts/llms.txt) |
| Gemini TTS 2.5 Pro | gemini-2.5-pro-tts | Gemini 2.5 TTS delivers human-like speech synthesis with expressive emotional delivery across multiple languages. | Text-to-Audio Generation | $0.019738471052631577 | 29.99577s | [llms.txt](https://www.segmind.com/models/gemini-2.5-pro-tts/llms.txt) |
| Lyria 2 | lyria-2 | Lyria 2 by Google DeepMind is an advanced model that generates high-fidelity 48kHz stereo instrumental music from text p | Text-to-Audio Generation | $0.09 | 27.23475s | [llms.txt](https://www.segmind.com/models/lyria-2/llms.txt) |
| Meta MusicGen Medium | meta-musicgen-medium | MusicGen: Transform text into music with AI. Create unique, high-quality audio from simple descriptions. Experience the  | Text-to-Audio Generation | $0.040399178065054206 | 22.29288s | [llms.txt](https://www.segmind.com/models/meta-musicgen-medium/llms.txt) |
| Minimax Music-01 | minimax-music-01 | Generate up to 60 seconds of music with both accompaniment and vocals in a single pass, with vocals from lyrics and a re | Text-to-Audio Generation | $0.07049529162790698 | 44.29378s | [llms.txt](https://www.segmind.com/models/minimax-music-01/llms.txt) |
| MyShell Text To Speech | myshell-tts | MyShell's Voice Cloning and Text to Speech - Transform your audio content with realistic, personalized voices. Experienc | Text-to-Audio Generation | $0.006335910745629599 | 7.0019s | [llms.txt](https://www.segmind.com/models/myshell-tts/llms.txt) |
| Openvoice | openvoice | OpenVoice is a versatile voice cloning model that supports multiple languages and offers precise tone replication, flexi | Text-to-Audio Generation | $0.008990546133333327 | 10.19799s | [llms.txt](https://www.segmind.com/models/openvoice/llms.txt) |
| Sam Audio Large | sam-audio-large | Isolates any described sound from mixed audio for enhanced editing and analysis. | Text-to-Audio Generation | $0.06613151052631579 | 13.0301s | [llms.txt](https://www.segmind.com/models/sam-audio-large/llms.txt) |
| Veena TTS | veena-tts | Veena transforms text into high-fidelity, expressive speech in Hindi and English for real-time applications. | Text-to-Audio Generation | $0.055781026515151516 | 45.2031s | [llms.txt](https://www.segmind.com/models/veena-tts/llms.txt) |
| VeenaMax TTS | veena-max-tts | VeenaMAX transforms text into expressive, real-time speech across multiple Indian languages for seamless communication. | Text-to-Audio Generation | $0.017146847682119208 | 12.95526s | [llms.txt](https://www.segmind.com/models/veena-max-tts/llms.txt) |

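Each slug in these tables plugs into the two URL patterns given at the top of this directory. The sketch below only builds those URLs; the helper names `endpoint_url` and `docs_url` are hypothetical, and request bodies and headers are left out because they vary per model and are described in each model's `llms.txt`.

```python
from urllib.parse import quote

# URL patterns as stated in this directory's header.
API_BASE = "https://api.segmind.com/v1"
DOCS_BASE = "https://www.segmind.com/models"


def endpoint_url(slug: str) -> str:
    """Return the inference endpoint for a model slug (hypothetical helper)."""
    return f"{API_BASE}/{quote(slug, safe='')}"


def docs_url(slug: str) -> str:
    """Return the per-model llms.txt URL for a model slug (hypothetical helper)."""
    return f"{DOCS_BASE}/{quote(slug, safe='')}/llms.txt"


if __name__ == "__main__":
    for slug in ("chatterbox-tts", "orpheus-3b-0.1"):
        print(endpoint_url(slug))
        print(docs_url(slug))
```

Fetching `docs_url(slug)` first is a reasonable way to confirm a model's parameters before calling `endpoint_url(slug)`.
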
## voice

| Model | Slug | Description | Modality | Avg Cost | Avg Latency | API Docs |
| --- | --- | --- | --- | --- | --- | --- |
| Kling Create Voice | kling-create-voice | Kling AI clones voices from a single audio sample for natural-sounding voice experiences. | voice | $0.007 | 26.02065s | [llms.txt](https://www.segmind.com/models/kling-create-voice/llms.txt) |

## imageTo3d

| Model | Slug | Description | Modality | Avg Cost | Avg Latency | API Docs |
| --- | --- | --- | --- | --- | --- | --- |
| Hunyuan-3d 2mv | hunyuan3d-2mv | Hunyuan3D-2mv is finetuned from Hunyuan3D-2 to support multiview controlled shape generation. | imageTo3d | $0.30884971006289297 | 100.38197s | [llms.txt](https://www.segmind.com/models/hunyuan3d-2mv/llms.txt) |
| Hunyuan3D-2 | hunyuan-3d-2 | Hunyuan3D 2.0 enables the creation of high-quality 3D models with intricate details. Produce assets that are visually ap | imageTo3d | $0.3450042239694657 | 36.90529s | [llms.txt](https://www.segmind.com/models/hunyuan-3d-2/llms.txt) |
| Hunyuan3d-2.1 | hunyuan3d-2.1 | Transform 2D images into photorealistic, high-fidelity 3D assets effortlessly. | imageTo3d | $0.15740070664062503 | 150.06842s | [llms.txt](https://www.segmind.com/models/hunyuan3d-2.1/llms.txt) |
| Sam 3D Body | sam-3d-body | SAM 3D Body reconstructs detailed 3D human meshes from a single photo, enabling realistic virtual interactions. | imageTo3d | $0.020982718518518517 | 10.33479s | [llms.txt](https://www.segmind.com/models/sam-3d-body/llms.txt) |
| Sam 3D Object | sam-3d-objects | Transforms a single 2D image into detailed 3D models with remarkable accuracy. | imageTo3d | $0.06385386096256684 | 33.22816s | [llms.txt](https://www.segmind.com/models/sam-3d-objects/llms.txt) |

## Audio-to-Text (Transcription)

| Model | Slug | Description | Modality | Avg Cost | Avg Latency | API Docs |
| --- | --- | --- | --- | --- | --- | --- |
| Elevenlabs Dialogue With Timing | elevenlabs-dialogue-with-timestamps | Transforms text into emotionally expressive multi-speaker dialogue for immersive audio experiences. | Audio-to-Text (Transcription) | $0.01445625 | 2.49052s | [llms.txt](https://www.segmind.com/models/elevenlabs-dialogue-with-timestamps/llms.txt) |
| Elevenlabs Forced Alignment | elevenlabs-forced-alignment | Achieves precise audio-text synchronization with word-level timestamps for enhanced media accessibility and production. | Audio-to-Text (Transcription) | $0.1 | 0.70002s | [llms.txt](https://www.segmind.com/models/elevenlabs-forced-alignment/llms.txt) |
| Elevenlabs Transcript | eleven-labs-transcript | Transcribe audio to accurate text in 99 languages with speaker diarization and word-level timestamps. | Audio-to-Text (Transcription) | $0.0034717734729493898 | 7.72543s | [llms.txt](https://www.segmind.com/models/eleven-labs-transcript/llms.txt) |
| Elevenlabs Voice Cloning | elevenlabs-voice-clone | ElevenLabs Voice Cloning creates hyper-realistic voice replicas that express emotion and personality. | Audio-to-Text (Transcription) | $0.01 | 4.48292s | [llms.txt](https://www.segmind.com/models/elevenlabs-voice-clone/llms.txt) |
| Elevenlabs Voice Design | elevenlabs-voice-design | Generate unique synthetic voices tailored to specific attributes without needing voice samples. | Audio-to-Text (Transcription) | $0.01 | 22.92416s | [llms.txt](https://www.segmind.com/models/elevenlabs-voice-design/llms.txt) |
| TTS Elevenlabs With Timing | tts-elevenlabs-with-timestamps | Transforms text into emotionally expressive audio with unparalleled realism and versatility across languages. | Audio-to-Text (Transcription) | $0.04955625 | 4.78282s | [llms.txt](https://www.segmind.com/models/tts-elevenlabs-with-timestamps/llms.txt) |

## audioToAudio

| Model | Slug | Description | Modality | Avg Cost | Avg Latency | API Docs |
| --- | --- | --- | --- | --- | --- | --- |
| Elevenlabs Audio Isolation | elevenlabs-audio-isolation | AI model expertly extracts clear speech from noisy audio and video, enhancing professional audio quality. | audioToAudio | $0.13456178571428573 | 5.28191s | [llms.txt](https://www.segmind.com/models/elevenlabs-audio-isolation/llms.txt) |
| Elevenlabs Speech To Speech | sts-eleven-labs | Eleven Labs Speech-to-Speech offers AI-powered voice conversion for content creators, media professionals, and anyone se | audioToAudio | $0.018750861111111114 | 6.45038s | [llms.txt](https://www.segmind.com/models/sts-eleven-labs/llms.txt) |

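Every table in this directory uses the same seven columns, so a row can be turned into a record with a few lines of code. This is a minimal sketch, assuming descriptions never contain a literal `|`; the function name `parse_row` and the dictionary keys are illustrative, not part of any Segmind tooling.

```python
import re
from decimal import Decimal

# Keys mirror the table header: Model, Slug, Description, Modality,
# Avg Cost, Avg Latency, API Docs.
COLUMNS = ("model", "slug", "description", "modality",
           "avg_cost", "avg_latency", "api_docs")


def parse_row(line: str) -> dict:
    """Parse one data row of a directory table into a dict (hypothetical helper)."""
    cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
    if len(cells) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} cells, got {len(cells)}")
    row = dict(zip(COLUMNS, cells))
    row["avg_cost"] = Decimal(row["avg_cost"].lstrip("$"))      # US dollars
    row["avg_latency"] = float(row["avg_latency"].rstrip("s"))  # seconds
    link = re.search(r"\((https?://[^)]+)\)", row["api_docs"])
    if link:
        row["api_docs"] = link.group(1)  # keep only the URL from the markdown link
    return row


sample = (
    "| Elevenlabs Audio Isolation | elevenlabs-audio-isolation | AI model expertly "
    "extracts clear speech from noisy audio and video, enhancing professional audio "
    "quality. | audioToAudio | $0.13456178571428573 | 5.28191s | "
    "[llms.txt](https://www.segmind.com/models/elevenlabs-audio-isolation/llms.txt) |"
)
record = parse_row(sample)
```

Header and separator rows are not data rows and will fail the cost conversion, so a caller would skip the first two lines of each table.
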
## Image-to-Text (Vision)

| Model | Slug | Description | Modality | Avg Cost | Avg Latency | API Docs |
| --- | --- | --- | --- | --- | --- | --- |
| Bria Fibo | bria-fibo-generate | Bria FIBO generates photorealistic images from structured prompts with exceptional accuracy and creative control. | Image-to-Text (Vision) | $0.04 | 21.60291s | [llms.txt](https://www.segmind.com/models/bria-fibo-generate/llms.txt) |
| Bria Fibo Structured Prompt | bria-fibo-generate-structured-prompt | Bria FIBO converts complex or nuanced inputs into structured JSON prompts with superior accuracy, returning only the JSO | Image-to-Text (Vision) | $0.01 | 12.50573s | [llms.txt](https://www.segmind.com/models/bria-fibo-generate-structured-prompt/llms.txt) |
| Bria Mask Generator | bria-mask-generator | Bria AI Get Masks automatically generates accurate object masks for advanced image editing and enhancement. | Image-to-Text (Vision) | $0.0012177570093457942 | 6.85549s | [llms.txt](https://www.segmind.com/models/bria-mask-generator/llms.txt) |
| Bria Prompt Enhancer | bria-prompt-enhancer | Bria AI generates high-quality, commercially safe images tailored to diverse creative needs. | Image-to-Text (Vision) | $0.01839506172839506 | 3.63199s | [llms.txt](https://www.segmind.com/models/bria-prompt-enhancer/llms.txt) |
| Google Translate | google-translate | Translate effortlessly with the powerful Google Translation AI model. | Image-to-Text (Vision) | $0.005857775377969762 | 0.65628s | [llms.txt](https://www.segmind.com/models/google-translate/llms.txt) |
| Ideogram Describe | ideogram-describe | Ideogram describe can effortlessly generate detailed prompts from images. Perfect for refining creations or replicating  | Image-to-Text (Vision) | $0.015 | 3.93242s | [llms.txt](https://www.segmind.com/models/ideogram-describe/llms.txt) |
| Image Converter | image-converter | Image Converter | Image-to-Text (Vision) | $0.068 | 5.33057s | [llms.txt](https://www.segmind.com/models/image-converter/llms.txt) |
| Image resizer | image-resizer | Image resizer | Image-to-Text (Vision) | $0.033793103448275866 | 3.92873s | [llms.txt](https://www.segmind.com/models/image-resizer/llms.txt) |
| LLAVA 1.6 7B | llava-v1.6 | LLaVA translates images into text descriptions and captions. | Image-to-Text (Vision) | $0.005315814882419757 | 3.58993s | [llms.txt](https://www.segmind.com/models/llava-v1.6/llms.txt) |
| Sam V2.1 Hiera Large | sam-v21-hiera-large | SAM v2, the next-gen segmentation model from Meta AI, revolutionizes computer vision. Building on SAM's success, it exce | Image-to-Text (Vision) | $0.03675347943262411 | 24.96618s | [llms.txt](https://www.segmind.com/models/sam-v21-hiera-large/llms.txt) |
| Video Speed Change | video-speed-change | Video Speed Change | Image-to-Text (Vision) | $0.0422078 | 30.29047s | [llms.txt](https://www.segmind.com/models/video-speed-change/llms.txt) |

## videoToImage

| Model | Slug | Description | Modality | Avg Cost | Avg Latency | API Docs |
| --- | --- | --- | --- | --- | --- | --- |
| Frame extractor | frame-extractor | Frame extractor | videoToImage | $0.0054555857142857146 | 26.50857s | [llms.txt](https://www.segmind.com/models/frame-extractor/llms.txt) |
| Start & End Frame Extractor | start-end-frame-extractor | Extract First & Last Frame from Video | videoToImage | $0.004770339784946237 | 4.73238s | [llms.txt](https://www.segmind.com/models/start-end-frame-extractor/llms.txt) |

## textToEmbed

| Model | Slug | Description | Modality | Avg Cost | Avg Latency | API Docs |
| --- | --- | --- | --- | --- | --- | --- |
| Text Embedding 3 Large | text-embedding-3-large | Text-embedding-3-large is a robust language model by OpenAI designed for generating high-dimensional text embeddings for | textToEmbed | $0.00002878991811668372 | 1.4802s | [llms.txt](https://www.segmind.com/models/text-embedding-3-large/llms.txt) |
| Text Embedding 3 Small | text-embedding-3-small | Text-embedding-3-small is a compact and efficient model developed for generating high-quality text embeddings. These emb | textToEmbed | $0.00003002396048757936 | 1.26013s | [llms.txt](https://www.segmind.com/models/text-embedding-3-small/llms.txt) |

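The Avg Cost and Avg Latency columns can be used for rough budgeting. The sketch below assumes that Avg Cost is a per-request average in US dollars, that Avg Latency is a per-request average in seconds, and that throughput scales linearly with concurrency; none of these assumptions is stated by the directory, and `estimate_batch` is a hypothetical helper. Actual cost depends on inputs, so treat the result as an order-of-magnitude estimate.

```python
from decimal import Decimal


def estimate_batch(avg_cost: str, avg_latency: str, calls: int,
                   concurrency: int = 1) -> tuple[Decimal, float]:
    """Estimate total cost (USD) and wall-clock time (s) for a batch of calls.

    avg_cost and avg_latency are the raw table cells, e.g. "$0.01" and "1.5s".
    """
    if calls < 0 or concurrency < 1:
        raise ValueError("calls must be >= 0 and concurrency >= 1")
    total_cost = Decimal(avg_cost.lstrip("$")) * calls
    # Idealized: assumes perfectly parallel requests with no queueing.
    total_seconds = float(avg_latency.rstrip("s")) * calls / concurrency
    return total_cost, total_seconds


# Values copied from the Text Embedding 3 Small row.
cost, seconds = estimate_batch("$0.00003002396048757936", "1.26013s",
                               calls=10_000, concurrency=8)
print(f"~${cost:.2f} and ~{seconds / 60:.0f} min for 10,000 calls")
```
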
## imageTOImage

| Model | Slug | Description | Modality | Avg Cost | Avg Latency | API Docs |
| --- | --- | --- | --- | --- | --- | --- |
| Stable Diffusion 3 Medium Image to Image | sd3-med-img2img | Stable Diffusion 3 Medium image-to-image is a cutting-edge AI tool that uses advanced image-to-image technology to trans | imageTOImage | $0.007582778266608011 | 7.43717s | [llms.txt](https://www.segmind.com/models/sd3-med-img2img/llms.txt) |

## Image Inpainting

| Model | Slug | Description | Modality | Avg Cost | Avg Latency | API Docs |
| --- | --- | --- | --- | --- | --- | --- |
| Fooocus Inpainting | focus-inpaint | Fooocus Inpainting is a powerful image generation model that allows you to selectively edit and enhance images. | Image Inpainting | $0.02487331967321622 | 17.92475s | [llms.txt](https://www.segmind.com/models/focus-inpaint/llms.txt) |
| SDXL Inpaint | sdxl-inpaint | This model is capable of generating photo-realistic images given any text input, with the extra capability of inpainting | Image Inpainting | $0.0068947473495280485 | 8.54611s | [llms.txt](https://www.segmind.com/models/sdxl-inpaint/llms.txt) |
| Stable Diffusion Inpainting | sd1.5-inpainting | Stable Diffusion Inpainting is a latent text-to-image diffusion model capable of generating photo-realistic images given | Image Inpainting | $0.0018182413723170508 | 2.8755s | [llms.txt](https://www.segmind.com/models/sd1.5-inpainting/llms.txt) |
| Try-On Diffusion | try-on-diffusion | Outfitting Fusion based Latent Diffusion for Controllable Virtual Try-on | Image Inpainting | $0.01108372703507833 | 7.61919s | [llms.txt](https://www.segmind.com/models/try-on-diffusion/llms.txt) |