API
If you want to call the model programmatically, you can use the API from your preferred programming language; the example below uses Python.
import requests
import base64

# Use this function to convert an image file from the filesystem to base64
def image_file_to_base64(image_path):
    with open(image_path, 'rb') as f:
        image_data = f.read()
    return base64.b64encode(image_data).decode('utf-8')

# Use this function to fetch an image from a URL and convert it to base64
def image_url_to_base64(image_url):
    response = requests.get(image_url)
    image_data = response.content
    return base64.b64encode(image_data).decode('utf-8')

api_key = "YOUR_API_KEY"
url = "https://api.segmind.com/v1/hunyuan-3d-2"

# Request payload
data = {
    "image": image_url_to_base64("https://i.ibb.co/8nbymYTS/hunyuan-image.png"),  # Or use image_file_to_base64("IMAGE_PATH")
    "octree_resolution": 256,
    "num_inference_steps": 30,
    "guidance_scale": 5.5,
    "seed": 12467,
    "face_count": 40000,
    "texture": False
}

headers = {'x-api-key': api_key}

response = requests.post(url, json=data, headers=headers)
print(response.content)  # The response body is the generated 3D asset
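Because the successful response body is binary mesh data rather than text, it is usually more useful to write it to disk than to print it. A minimal sketch, assuming the endpoint returns the generated mesh as a GLB payload (check the Content-Type header to confirm the actual format):

# Sketch: persist the API response to disk. This assumes the endpoint returns the
# generated mesh as binary GLB data; inspect response.headers["content-type"]
# before relying on the file extension.
if response.status_code == 200:
    with open("hunyuan_output.glb", "wb") as f:
        f.write(response.content)
else:
    print(f"Request failed: {response.status_code} {response.text}")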
Attributes
image: Input image.
prompt: Prompt to render (optional when an image is used).
octree_resolution: Higher resolution gives better quality but slower processing. Allowed values:
num_inference_steps: Number of inference steps. min: 20, max: 50
guidance_scale: Scale for classifier-free guidance. min: 1, max: 15
seed: Seed for image generation. min: -1, max: 999999999999999
face_count: Maximum number of faces in the mesh; only used if texture=true. min: 5000, max: 100000
texture: Whether to apply texture to the generated mesh.
mesh: GLB file; only needed if modifying an existing mesh.
To keep track of your credit usage, you can inspect the response headers of each API call. The x-remaining-credits property will indicate the number of remaining credits in your account. Ensure you monitor this value to avoid any disruptions in your API usage.
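A minimal way to read that header from the response object used in the example above:

# Minimal sketch: reading the remaining-credit header from the API response.
# Assumes `response` is the requests.Response returned by the POST call above.
remaining = response.headers.get("x-remaining-credits")
if remaining is not None:
    print(f"Remaining credits: {remaining}")
else:
    print("Credit header not present in this response.")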
Hunyuan3D 2.0
Hunyuan3D 2.0 is an advanced 3D synthesis system designed for generating high-resolution, textured 3D assets. It consists of two main components: Hunyuan3D-DiT, a shape generation model, and Hunyuan3D-Paint, a texture synthesis model. The system uses a two-stage pipeline: first, a bare mesh is created, and then a texture map is synthesized for the mesh. This approach decouples the complexities of shape and texture generation and allows for texturing of both generated and handcrafted meshes.
Key Features of Hunyuan3D 2.0
Shape Generation
- It is a large-scale flow-based diffusion model.
- It uses a Hunyuan3D-ShapeVAE autoencoder to capture fine-grained details on meshes. The ShapeVAE employs vector sets and an importance-sampling method to extract representative features and capture details such as edges and corners. It takes the 3D coordinates and normal vectors of point clouds sampled from the surface of 3D shapes as encoder inputs, and instructs the decoder to predict the Signed Distance Function (SDF) of the 3D shape, which can then be decoded into a triangle mesh (see the sketch after this list).
- A dual-single stream transformer is built on the latent space of the VAE with a flow-matching objective.
- It is trained to predict object token sequences from a user-provided image, and the predicted tokens are then decoded into a polygon mesh with the VAE decoder.
- It uses a pre-trained image encoder (DINOv2 Giant) to extract conditional image tokens, processing images at 518 x 518 resolution. The background of the input image is removed, and the object is resized and repositioned to remove the negative impact of the background.
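The final SDF-to-mesh step can be pictured with a standard marching-cubes pass. The sketch below is a generic illustration using an analytic SDF, scikit-image, and trimesh; it is not the actual Hunyuan3D-ShapeVAE decoder.

import numpy as np
from skimage import measure
import trimesh

# Generic illustration of turning a predicted SDF grid into a triangle mesh.
# Here the "predicted" SDF is just an analytic sphere; in Hunyuan3D 2.0 this
# volume would come from the ShapeVAE decoder instead.
res = 64
grid = np.linspace(-1.0, 1.0, res)
x, y, z = np.meshgrid(grid, grid, grid, indexing="ij")
sdf = np.sqrt(x**2 + y**2 + z**2) - 0.5  # signed distance to a sphere of radius 0.5

# Extract the zero level set (the surface) as vertices and faces.
verts, faces, normals, _ = measure.marching_cubes(sdf, level=0.0)

# Wrap in a mesh object and export, e.g. to GLB.
mesh = trimesh.Trimesh(vertices=verts, faces=faces, vertex_normals=normals)
mesh.export("sphere_from_sdf.glb")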
Texture Synthesis
- It uses a mesh-conditioned multi-view generation pipeline.
- It uses a three-stage framework: pre-processing, multi-view image synthesis, and texture baking.
- An image delighting module converts the input image to an unlit state so that light-invariant texture maps can be produced. The multi-view generation model is trained on white-light illuminated images, enabling illumination-invariant texture synthesis.
- A geometry-aware viewpoint selection strategy is employed, selecting 8 to 12 viewpoints for texture synthesis (see the sketch after this list).
- A double-stream image-conditioning reference-net, a multi-task attention mechanism, and a strategy for geometry and view conditioning are also used in this pipeline.
- It uses a multi-task attention mechanism with reference and multi-view attention modules to ensure consistency across the generated views.
- It conditions the model on multi-view canonical normal maps and coordinate maps. A learnable camera embedding is used to strengthen the viewpoint cue for the multi-view diffusion model.
- It uses dense-view inference with a view dropout strategy to enhance 3D perception.
- A single-image super-resolution model is used to enhance texture quality, and an inpainting approach fills any uncovered patches on the texture map.
- It is compatible with text- and image-to-texture generation, utilizing T2I models and conditional generation modules.
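Geometry-aware viewpoint selection can be thought of as a coverage problem: pick a handful of cameras that together see as much of the mesh surface as possible. The sketch below is a simplified, hypothetical stand-in (greedy selection over candidate cameras, scored by how many not-yet-covered, front-facing faces each would see), not the exact strategy used in the paper.

import numpy as np

def select_viewpoints(face_normals, n_views=8, n_candidates=64):
    """Greedy, simplified stand-in for geometry-aware viewpoint selection.

    face_normals: (F, 3) array of unit face normals of the mesh.
    Returns a list of (azimuth, elevation) pairs in radians.
    """
    # Candidate cameras spread over a sphere (Fibonacci spiral for rough uniformity).
    idx = np.arange(n_candidates) + 0.5
    elevations = np.arcsin(1.0 - 2.0 * idx / n_candidates)
    azimuths = np.pi * (1.0 + 5.0 ** 0.5) * idx
    dirs = np.stack([np.cos(elevations) * np.cos(azimuths),
                     np.cos(elevations) * np.sin(azimuths),
                     np.sin(elevations)], axis=1)          # (C, 3) view directions

    covered = np.zeros(len(face_normals), dtype=bool)
    chosen = []
    for _ in range(n_views):
        # A face counts as visible from a camera if it roughly faces that camera.
        visible = (face_normals @ dirs.T) > 0.3             # (F, C)
        gains = (visible & ~covered[:, None]).sum(axis=0)   # newly covered faces per candidate
        best = int(np.argmax(gains))
        chosen.append((float(azimuths[best] % (2 * np.pi)), float(elevations[best])))
        covered |= visible[:, best]
    return chosen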
Performance and Evaluation
- Hunyuan3D 2.0 outperforms previous state-of-the-art models in geometry detail, condition alignment, and texture quality.
- Hunyuan3D-ShapeVAE surpasses other methods in shape reconstruction.
- Hunyuan3D-DiT produces more accurate condition-following results compared to other methods.
- The system produces high-quality textured 3D assets.
- User studies demonstrate the superiority of Hunyuan3D 2.0 in terms of visual quality and adherence to image conditions.
Other Popular Models
sdxl-img2img
SDXL Img2Img is used for text-guided image-to-image translation. This model uses the weights from Stable Diffusion to generate new images from an input image using StableDiffusionImg2ImgPipeline from diffusers.

fooocus
Fooocus enables high-quality image generation effortlessly, combining the best of Stable Diffusion and Midjourney.

faceswap-v2
Take a picture/gif and replace the face in it with a face of your choice. You only need one image of the desired face. No dataset, no training.

sdxl1.0-txt2img
The SDXL model is the official upgrade to the v1.5 model. The model is released as open-source software.
