POST

javascript

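const axios = require('axios');

const fs = require('fs');
const path = require('path');

// Return the image as a base64 string.
// Remote http(s) URLs are downloaded; any other value is read as a local file path.
async function toB64(imgPath) {
    if (/^https?:\/\//.test(imgPath)) {
        const res = await axios.get(imgPath, { responseType: 'arraybuffer' });
        return Buffer.from(res.data).toString('base64');
    }
    const data = fs.readFileSync(path.resolve(imgPath));
    return Buffer.from(data).toString('base64');
}

const api_key = "YOUR API-KEY";
const url = "https://api.segmind.com/v1/llava-v1.6";

(async function() {
    try {
        // toB64 is async, so the payload is built inside the async function and awaited.
        const data = {
          "images": await toB64('https://segmind-sd-models.s3.amazonaws.com/display_images/llava-input.jpg'),
          "prompt": "Describe meta data of the image in this format, keep them short and factually correct:\n1. Category,\n2. Primary Colors,\n3. Additional Colors,\n4. Primary Material,\n5. Secondary Materials,\n6. Style and a couple others if you can find any according to the product.\n give it in json"
        };
        const response = await axios.post(url, data, { headers: { 'x-api-key': api_key } });
        console.log(response.data);
    } catch (error) {
        // error.response is undefined when the request never reached the server.
        console.error('Error:', error.response ? error.response.data : error.message);
    }
})();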

RESPONSE

image/jpeg

HTTP Response Codes

200 - OK: Image Generated

401 - Unauthorized: User authentication failed

404 - Not Found: The requested URL does not exist

405 - Method Not Allowed: The requested HTTP method is not allowed

406 - Not Acceptable: Not enough credits

500 - Server Error: Server had some issue with processing
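
The codes listed above can be turned into readable messages inside a catch block. The following is a minimal sketch, assuming an axios-style error object that carries `error.response.status`; `STATUS_MESSAGES` and `explainError` are hypothetical names used for illustration and are not part of the Segmind API.

```javascript
// Messages taken from the HTTP response code list above.
const STATUS_MESSAGES = {
  200: 'OK: Image Generated',
  401: 'Unauthorized: User authentication failed',
  404: 'Not Found: The requested URL does not exist',
  405: 'Method Not Allowed: The requested HTTP method is not allowed',
  406: 'Not Acceptable: Not enough credits',
  500: 'Server Error: Server had some issue with processing',
};

// Hypothetical helper: turn an axios-style error into a one-line message.
function explainError(error) {
  // error.response is missing when the request never reached the server.
  if (!error || !error.response) {
    const reason = error && error.message ? error.message : 'unknown error';
    return 'No response received: ' + reason;
  }
  const status = error.response.status;
  return status + ' - ' + (STATUS_MESSAGES[status] || 'Unexpected status');
}
```

In the request example, this could replace the body of the catch block with `console.error(explainError(error))`.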

Attributes

images (image, required)

Input Image.

prompt (str, required)

Prompt to send to the model.
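
Since both attributes are required, it may help to validate them before sending a request. The sketch below assumes the image is passed as a base64 string, as in the request example; `buildPayload` is a hypothetical helper name, not something the API provides.

```javascript
// Hypothetical helper: assemble the request body from the two required attributes.
// Assumes the image has already been converted to a base64 string.
function buildPayload(imageBase64, prompt) {
  if (typeof imageBase64 !== 'string' || imageBase64.length === 0) {
    throw new TypeError('images is required and must be a non-empty base64 string');
  }
  if (typeof prompt !== 'string' || prompt.trim().length === 0) {
    throw new TypeError('prompt is required and must be a non-empty string');
  }
  // Keys match the attribute names documented above.
  return { images: imageBase64, prompt: prompt };
}
```

Failing early on a missing attribute avoids spending a request on a call that cannot succeed.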

To track your credit usage, inspect the response headers of each API call. The x-remaining-credits header reports the number of credits remaining in your account. Monitor this value to avoid disruptions in your API usage.
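
A minimal sketch of reading that value, assuming the headers arrive as an object with lowercase keys (as axios exposes them on `response.headers`) and that the value parses as a number. `remainingCredits`, `isLowOnCredits`, and the threshold of 10 are hypothetical and only illustrate the idea.

```javascript
// Hypothetical helper: read the remaining credit balance from response headers.
// Assumes lowercase header keys and a numeric header value.
function remainingCredits(headers) {
  const raw = headers ? headers['x-remaining-credits'] : undefined;
  const credits = Number.parseFloat(raw);
  // Return null when the header is absent or not a number.
  return Number.isFinite(credits) ? credits : null;
}

// Hypothetical threshold check; the default limit of 10 is an arbitrary example value.
function isLowOnCredits(headers, limit = 10) {
  const credits = remainingCredits(headers);
  return credits !== null && credits < limit;
}
```

With the request example, this would be called as `remainingCredits(response.headers)` once `axios.post` resolves.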

LLAVA 1.6 7B

The LLaVA 1.6 7B model is based on the Large Language and Vision Assistant (LLaVA), a multimodal transformer model designed for tasks that require both image and text understanding. Its core strength is processing visual data and translating it into textual descriptions or captions. This makes LLaVA useful for applications including:

Image Captioning: LLaVA generates natural language descriptions of images. By analyzing the visual elements in an image, it can produce concise, informative captions that capture the scene's content and context. This is particularly useful for automatic alt text generation, which improves image accessibility and searchability.
Visual Question Answering: Because LLaVA understands both image and text, it can answer questions about the visual content directly. This suits image retrieval systems and educational settings where users ask questions about an image to understand it better.
Text Prompt Generation: LLaVA can generate text prompts from image content. This is particularly useful for text-to-image generation, where a well-defined prompt is crucial for high-quality results: LLaVA analyzes the image and provides a detailed textual description that serves as a starting point for the text-to-image model.
