Llama 3.2 90B Vision Instruct

Experience the cutting edge of AI with Llama 3.2-90B Vision-Instruct. This 90B parameter multimodal LLM excels at image understanding, reasoning, captioning, and more.


API

The API can be called from the programming language of your choice. The example below uses Node.js with axios.

POST

const axios = require('axios');
const fs = require('fs');
const path = require('path');

// Helper that converts a local image file into a base64 string.
// It is not used by the text-only request below.
async function toB64(imgPath) {
    const data = await fs.promises.readFile(path.resolve(imgPath));
    return data.toString('base64');
}

const api_key = "YOUR API-KEY";
const url = "https://api.segmind.com/v1/llama-v3p2-90b-vision-instruct";

// Conversation history: earlier turns first, ending with the new user query.
const data = {
    "messages": [
        {
            "role": "user",
            "content": "tell me a joke on cats"
        },
        {
            "role": "assistant",
            "content": "here is a joke about cats..."
        },
        {
            "role": "user",
            "content": "now a joke on dogs"
        }
    ]
};

(async function() {
    try {
        const response = await axios.post(url, data, { headers: { 'x-api-key': api_key } });
        console.log(response.data);
    } catch (error) {
        // error.response is undefined when no HTTP response arrived (e.g. a network failure).
        console.error('Error:', error.response ? error.response.data : error.message);
    }
})();

RESPONSE

application/json

HTTP Response Codes

200 - OK: Response generated

401 - Unauthorized: User authentication failed

404 - Not Found: The requested URL does not exist

405 - Method Not Allowed: The requested HTTP method is not allowed

406 - Not Acceptable: Not enough credits

500 - Server Error: The server had an issue processing the request
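The status codes listed above can be turned into readable messages when a request fails. The following is a minimal sketch, not part of any official SDK: the names STATUS_MESSAGES, describeStatus, and explainAxiosError are hypothetical, and it assumes the error object has the shape axios produces (a `response.status` field when the server replied, and only a `message` when it did not).

```javascript
// Status codes and meanings copied from the list above.
const STATUS_MESSAGES = {
  200: 'OK: response generated',
  401: 'Unauthorized: user authentication failed',
  404: 'Not Found: the requested URL does not exist',
  405: 'Method Not Allowed: the requested HTTP method is not allowed',
  406: 'Not Acceptable: not enough credits',
  500: 'Server Error: the server had an issue processing the request',
};

// Hypothetical helper: map a numeric status code to its documented meaning.
function describeStatus(status) {
  return STATUS_MESSAGES[status] || `Unexpected status ${status}`;
}

// Hypothetical helper: summarize an axios-style error object.
// If the server replied, use its status code; otherwise report the
// underlying error message (e.g. a timeout or DNS failure).
function explainAxiosError(error) {
  if (error.response) {
    return describeStatus(error.response.status);
  }
  return `No response received: ${error.message}`;
}
```

In the catch block of a request, `console.error(explainAxiosError(error))` would then print the documented meaning of the failure rather than the raw response body.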

Attributes

messages (Array)

An array of objects, each containing a role and content.

role (str)

One of "user", "assistant", or "system".

content (str)

A string containing the user's query or the assistant's response.
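Because the request body is just an array of role/content objects, it can be checked locally before it is sent. The sketch below is illustrative only: validateMessages and ALLOWED_ROLES are hypothetical names, and the rules it enforces are limited to what the attribute list above states (the API itself may apply further checks).

```javascript
// Roles permitted by the attribute list above.
const ALLOWED_ROLES = ['user', 'assistant', 'system'];

// Hypothetical helper: returns a list of problems found in a messages
// array. An empty list means the array matches the documented shape.
function validateMessages(messages) {
  if (!Array.isArray(messages)) {
    return ['messages must be an array'];
  }
  const problems = [];
  messages.forEach((message, index) => {
    if (message === null || typeof message !== 'object') {
      problems.push(`messages[${index}] must be an object`);
      return;
    }
    if (!ALLOWED_ROLES.includes(message.role)) {
      problems.push(`messages[${index}].role must be one of ${ALLOWED_ROLES.join(', ')}`);
    }
    if (typeof message.content !== 'string') {
      problems.push(`messages[${index}].content must be a string`);
    }
  });
  return problems;
}
```

Calling `validateMessages(data.messages)` before the POST request and refusing to send when the result is non-empty avoids spending a request on a malformed payload.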

To keep track of your credit usage, inspect the response headers of each API call. The x-remaining-credits header reports the number of credits remaining in your account. Monitor this value to avoid disruptions in your API usage.
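Reading the credit balance from a response could look like the sketch below. It assumes the headers object exposes lowercase header names (as axios does for `response.headers` in Node.js) and that the header value is a plain number; the helper names remainingCredits and isRunningLow, and the default threshold of 10, are hypothetical and chosen only for illustration.

```javascript
// Hypothetical helper: parse the x-remaining-credits header.
// Returns a number, or null when the header is missing or not numeric.
function remainingCredits(headers) {
  const raw = headers ? headers['x-remaining-credits'] : undefined;
  if (raw === undefined || raw === null || raw === '') {
    return null;
  }
  const value = Number(raw);
  return Number.isNaN(value) ? null : value;
}

// Hypothetical helper: true when the balance is known and below the
// threshold. The default threshold is an arbitrary illustrative value.
function isRunningLow(headers, threshold = 10) {
  const credits = remainingCredits(headers);
  return credits !== null && credits < threshold;
}
```

After a successful call, `remainingCredits(response.headers)` gives the current balance, and `isRunningLow(response.headers)` can be used to log a warning before requests start failing with 406.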

Llama 3.2-90B Vision-Instruct

Llama 3.2-90B Vision-Instruct is a multimodal large language model (LLM) developed by Meta. It is engineered to process both textual and visual inputs, providing advanced capabilities in areas such as image understanding and reasoning.

Key Features of Llama 3.2-90B Vision-Instruct

Parameter Count: The model has roughly 90 billion parameters (88.8 billion).
Input Modalities: Supports text and image inputs, enabling versatile applications.
Output Modality: Generates text outputs, making it suitable for a wide range of tasks.
Architecture: Built upon the Llama 3.1 text-only model, enhanced with a vision adapter. The vision adapter employs cross-attention layers to integrate image encoder representations into the core LLM.
Context Length: Supports a context length of 128k tokens.

Technical Specifications

Training Data: Trained on a dataset of 6 billion image and text pairs.
Data Cutoff: The pretraining data has a cutoff of December 2023.
Instruction Tuning: Fine-tuned using publicly available vision instruction datasets and over 3 million synthetically generated examples, combining supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF).

Intended Use Cases

The model is optimized for visual recognition, image reasoning, captioning, and question answering about images.
