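const axios = require('axios');

// Replace the placeholder with your own API key.
const api_key = "YOUR API-KEY";
const url = "https://api.segmind.com/v1/llama-v3-8b-instruct";

// Conversation history, oldest message first.
const data = {
  "messages": [
    {
      "role": "user",
      "content": "tell me a joke on cats"
    },
    {
      "role": "assistant",
      "content": "here is a joke about cats..."
    },
    {
      "role": "user",
      "content": "now a joke on dogs"
    }
  ]
};

(async function() {
  try {
    // The API key is sent in the x-api-key request header.
    const response = await axios.post(url, data, { headers: { 'x-api-key': api_key } });
    console.log(response.data);
  } catch (error) {
    // error.response is undefined when no response arrived (for example, a network failure).
    console.error('Error:', error.response ? error.response.data : error.message);
  }
})();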
messages: An array of objects, each containing a role and content.
role: One of "user", "assistant" or "system".
content: A string containing the user's query or the assistant's response.
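As a minimal sketch of the shape described above, the following helper checks a messages array before it is sent. The function name validateMessages and the rule that the array must be non-empty are assumptions made for illustration; they are not part of the API.

```javascript
// Roles named in the parameter description.
const ALLOWED_ROLES = new Set(['user', 'assistant', 'system']);

// Hypothetical helper: returns a list of problems found in `messages`.
// An empty list means the array matches the documented shape.
function validateMessages(messages) {
  // Assumption: an empty array is treated as invalid.
  if (!Array.isArray(messages) || messages.length === 0) {
    return ['messages must be a non-empty array'];
  }
  const problems = [];
  messages.forEach((message, index) => {
    if (message === null || typeof message !== 'object') {
      problems.push(`messages[${index}] must be an object`);
      return;
    }
    if (!ALLOWED_ROLES.has(message.role)) {
      problems.push(`messages[${index}].role must be "user", "assistant" or "system"`);
    }
    if (typeof message.content !== 'string') {
      problems.push(`messages[${index}].content must be a string`);
    }
  });
  return problems;
}
```

Calling validateMessages(data.messages) before the request makes a malformed entry show up locally rather than as an error response from the API.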
To track your credit usage, inspect the response headers of each API call. The x-remaining-credits header indicates the number of credits remaining in your account. Monitor this value to avoid disruptions in your API usage.
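One way to read that header is sketched below. The helper name getRemainingCredits is hypothetical, and the sketch assumes the header value is a numeric string, which the text above does not state.

```javascript
// Hypothetical helper: extracts the remaining credit count from a headers object.
// Returns a number, or null when the header is missing or not numeric.
function getRemainingCredits(headers) {
  // Node.js HTTP clients such as axios expose header names in lowercase.
  const raw = headers ? headers['x-remaining-credits'] : undefined;
  if (raw === undefined || raw === null || raw === '') {
    return null;
  }
  // Assumption: the value is a plain number encoded as a string.
  const value = Number(raw);
  return Number.isFinite(value) ? value : null;
}
```

With the axios request from the sample, this would be called as getRemainingCredits(response.headers) inside the try block, for example to log a warning when the value drops below a threshold of your choosing.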
Meta Llama 3 8B is a large language model (LLM) developed by Meta AI. It is designed to be open and accessible, making it a valuable tool for developers, researchers, and businesses alike. Meta Llama 3 is a foundational system, meaning it serves as a base for building more advanced AI applications.
Focus on Accessibility
Open-source: Unlike many powerful LLMs, Meta Llama 3 is freely available to download, use, and modify under Meta's Llama 3 Community License. This fosters innovation and collaboration within the AI community.
Scalability: Llama 3 comes in two sizes: 8B and 70B parameters. This allows users to choose the version that best suits their needs and computational resources.
Enhanced Capabilities
Efficient Tokenizer: Meta Llama 3 uses a tokenizer with a vocabulary of roughly 128,000 (128K) tokens. The larger vocabulary encodes text more efficiently, which contributes to improved performance compared to previous Llama models.
Grouped Query Attention (GQA): This technique improves the efficiency of the model during the inference stage, making it faster to process information and generate responses.
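To illustrate the idea behind GQA: several query heads share a single key/value head, so fewer key/value tensors need to be computed and cached during inference. The sketch below only shows that head-to-group mapping, not an attention implementation. The function name is hypothetical, and the head counts in the usage note (32 query heads, 8 key/value heads) are illustrative values that do not appear in the text above.

```javascript
// Hypothetical helper: returns the index of the key/value head that a given
// query head reads from when query heads are split into equal-sized groups.
function kvHeadForQueryHead(queryHead, numQueryHeads, numKvHeads) {
  if (numQueryHeads % numKvHeads !== 0) {
    throw new Error('numQueryHeads must be a multiple of numKvHeads');
  }
  if (queryHead < 0 || queryHead >= numQueryHeads) {
    throw new Error('queryHead is out of range');
  }
  // Each group of consecutive query heads shares one key/value head.
  const groupSize = numQueryHeads / numKvHeads;
  return Math.floor(queryHead / groupSize);
}
```

With 32 query heads and 8 key/value heads, each key/value head serves a group of 4 query heads, so the key/value cache holds 8 heads per layer instead of 32.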