Hunyuan3D-2

Hunyuan3D 2.0 enables the creation of high-quality 3D models with intricate details. Produce assets that are visually appealing and suitable for professional use.

Playground

Try the model in real time below.

FEATURES

PixelFlow gives you access to all of these features

Unlock the full potential of generative AI with Segmind. Create stunning visuals and innovative designs with total creative control. Take advantage of powerful development tools to automate processes and models, elevating your creative workflow.

Segmented Creation Workflow

Gain greater control by dividing the creative process into distinct steps, refining each phase.

Customized Output

Customize at various stages, from initial generation to final adjustments, ensuring tailored creative outputs.

Layering Different Models

Integrate and utilize multiple models simultaneously, producing complex and polished creative results.

Workflow APIs

Deploy Pixelflows as APIs quickly, without server setup, ensuring scalability and efficiency.

Hunyuan3D 2.0

Hunyuan3D 2.0 is an advanced 3D synthesis system for generating high-resolution, textured 3D assets. It consists of two main components: Hunyuan3D-DiT, a shape generation model, and Hunyuan3D-Paint, a texture synthesis model. The system uses a two-stage pipeline: first a bare mesh is generated, then a texture map is synthesized for it. This decouples the complexities of shape and texture generation, and it allows texturing of both generated and handcrafted meshes.
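The decoupled two-stage design can be sketched structurally. Note that every class and function name below is an illustrative placeholder, not the actual Hunyuan3D API; the point is only that the texture stage takes a mesh as input, so it works for generated and handcrafted geometry alike.

```python
# Structural sketch of the decoupled two-stage pipeline.
# All names here are illustrative placeholders, NOT the real Hunyuan3D API.
from dataclasses import dataclass, field

@dataclass
class Mesh:
    vertices: list = field(default_factory=list)
    faces: list = field(default_factory=list)
    texture: object = None  # filled in by the texture stage

def generate_shape(image) -> Mesh:
    """Stage 1 (Hunyuan3D-DiT stand-in): image -> bare, untextured mesh."""
    return Mesh(vertices=[(0, 0, 0), (1, 0, 0), (0, 1, 0)], faces=[(0, 1, 2)])

def synthesize_texture(mesh: Mesh, image) -> Mesh:
    """Stage 2 (Hunyuan3D-Paint stand-in): texture an existing mesh.

    Because this stage only depends on the mesh and the conditioning image,
    it applies equally to generated and handcrafted meshes."""
    mesh.texture = {"source": image, "resolution": (1024, 1024)}
    return mesh

# End-to-end on a generated mesh...
asset = synthesize_texture(generate_shape("input.png"), "input.png")
# ...or a handcrafted mesh fed straight into the texture stage.
handcrafted = Mesh(vertices=[(0, 0, 0), (2, 0, 0), (0, 2, 0)], faces=[(0, 1, 2)])
textured_handcrafted = synthesize_texture(handcrafted, "input.png")
```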

Key Features of Hunyuan3D 2.0

Shape Generation

  • Hunyuan3D-DiT is a large-scale flow-based diffusion model.

  • It uses the Hunyuan3D-ShapeVAE autoencoder to capture fine-grained detail on meshes. ShapeVAE employs vector sets with an importance sampling method to extract representative features and capture details such as edges and corners. Its encoder takes the 3D coordinates and normal vectors of point clouds sampled from the surface of 3D shapes, and its decoder predicts the Signed Distance Function (SDF) of the shape, which can then be decoded into a triangle mesh.

  • A transformer with dual- and single-stream blocks is built on the latent space of the VAE and trained with a flow-matching objective.

  • It is trained to predict object token sequences from a user-provided image and the predicted tokens are further decoded into a polygon mesh with the VAE decoder.

  • It uses a pre-trained image encoder (DINOv2 Giant) to extract conditional image tokens, processing images at 518 x 518 resolution. The background of the input image is removed, and the object is resized and repositioned to remove the negative impacts of the background.
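The flow-matching objective mentioned above can be illustrated with a small numpy sketch: sample a time t, linearly interpolate between a noise sample x0 and a data latent x1, and regress the model's predicted velocity toward the constant target x1 − x0. The tiny linear "model" here is a toy stand-in for the dual-/single-stream transformer, and the random latents stand in for real ShapeVAE latents.

```python
import numpy as np

rng = np.random.default_rng(0)

def flow_matching_loss(W, x1, rng):
    """One flow-matching training loss on a batch of latents x1.

    x_t = (1 - t) * x0 + t * x1 with x0 ~ N(0, I); the regression target
    is the constant velocity v = x1 - x0 along the straight path.
    """
    x0 = rng.standard_normal(x1.shape)       # noise endpoint of the path
    t = rng.uniform(size=(x1.shape[0], 1))   # per-sample time in [0, 1]
    x_t = (1.0 - t) * x0 + t * x1            # point on the linear path
    v_target = x1 - x0                       # target velocity field
    v_pred = x_t @ W                         # toy linear "network"
    return float(np.mean((v_pred - v_target) ** 2))

d = 8
x1 = rng.standard_normal((16, d))            # stand-in for VAE latents
W = np.zeros((d, d))                         # untrained toy model
loss = flow_matching_loss(W, x1, rng)
```

With W = 0 the prediction is zero everywhere, so the loss is simply the mean squared norm of the target velocities; training would drive W to reduce it.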

Texture Synthesis

  • Hunyuan3D-Paint uses a mesh-conditioned multi-view generation pipeline.

  • It uses a three-stage framework: pre-processing, multi-view image synthesis, and texture baking.

  • An image delighting module is used to convert the input image to an unlit state to produce light-invariant texture maps. The multi-view generation model is trained on white-light illuminated images, enabling illumination-invariant texture synthesis.

  • A geometry-aware viewpoint selection strategy is employed, selecting 8 to 12 viewpoints for texture synthesis.

  • A double-stream image conditioning reference-net, a multi-task attention mechanism, and a strategy for geometry and view conditioning are also used in this pipeline.

  • It uses a multi-task attention mechanism with reference and multi-view attention modules to ensure consistency across generated views.

  • It conditions the model with multi-view canonical normal maps and coordinate maps. A learnable camera embedding is used to boost the viewpoint clue for the multi-view diffusion model.

  • It uses dense-view inference with a view dropout strategy to enhance 3D perception.

  • A single-image super-resolution model is used to enhance texture quality, and an inpainting approach is applied to fill any uncovered patches on the texture map.

  • It is compatible with both text- and image-to-texture generation, utilizing T2I models and conditional generation modules.
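The final inpainting step — filling texels that no selected viewpoint covers — can be approximated by propagating the nearest covered texel's color outward. The multi-source BFS below is a deliberately simple toy stand-in for the system's actual inpainting approach, operating on a tiny grid "texture".

```python
from collections import deque

def fill_uncovered(texture, covered):
    """Fill uncovered texels with the color of the nearest covered texel.

    texture: 2D grid of colors; covered: same-shape grid of booleans.
    A multi-source BFS starting from all covered texels propagates colors
    outward -- a toy stand-in for a learned inpainting model.
    """
    h, w = len(texture), len(texture[0])
    out = [row[:] for row in texture]
    seen = [[covered[y][x] for x in range(w)] for y in range(h)]
    q = deque((y, x) for y in range(h) for x in range(w) if covered[y][x])
    while q:
        y, x = q.popleft()
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and not seen[ny][nx]:
                seen[ny][nx] = True
                out[ny][nx] = out[y][x]   # inherit the neighbor's color
                q.append((ny, nx))
    return out

# Toy 3x3 texture with one uncovered texel in the center.
tex = [[1, 1, 1], [1, 0, 1], [1, 1, 1]]
cov = [[True, True, True], [True, False, True], [True, True, True]]
filled = fill_uncovered(tex, cov)
```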

Performance and Evaluation

  • Hunyuan3D 2.0 outperforms previous state-of-the-art models in geometry detail, condition alignment, and texture quality.

  • Hunyuan3D-ShapeVAE surpasses other methods in shape reconstruction.

  • Hunyuan3D-DiT follows input conditions more accurately than other methods.

  • The system produces high-quality textured 3D assets.

  • User studies demonstrate the superiority of Hunyuan3D 2.0 in visual quality and adherence to image conditions.


Take creative control today and thrive.

Start building with a free account or consult an expert for your Pro or Enterprise needs. Segmind's tools empower you to transform your creative visions into reality.

