Advances in 3D generation have facilitated sequential 3D model generation (a.k.a. 4D generation), yet its application to animatable objects with large motion remains scarce. Our work proposes AnimatableDreamer, a text-to-4D generation framework capable of generating diverse categories of non-rigid objects on skeletons extracted from a monocular video. At its core, AnimatableDreamer is equipped with our novel optimization design dubbed Canonical Score Distillation (CSD), which lifts 2D diffusion for temporally consistent 4D generation. CSD, designed from a score gradient perspective, generates a canonical model that is robust to warping across different articulations. Notably, it also enhances the authenticity of bones and skinning by integrating inductive priors from a diffusion model. Furthermore, with multi-view distillation, CSD infers invisible regions, thereby improving the fidelity of monocular non-rigid reconstruction. Extensive experiments demonstrate the capability of our method in generating highly flexible text-guided 3D models from a monocular video, while also showing improved reconstruction performance over existing non-rigid reconstruction methods.
"A squirrel with red sweater." (Squirrel)
"Squirtle." (Squirrel)
"A fox." (Squirrel)
"A bear with red hat." (Cat Pikachu)
"Holstein." (Cat Pikachu)
"A toy dinosaur." (Penguin)
"A steampunk penguin." (Penguin)
"Pine tree in snow." (Manipulator)
"A cat with armour." (Cat Pikachu)
"Doraemon." (Penguin)
"Eagle with crown." (Finch)
"Penguin." (Bird)
"A cat with armour." (Cat Coco)
Ours (Squirrel)
BANMo (Squirrel)
Ours (Cat Coco)
BANMo (Cat Coco)
Ours (Penguin)
BANMo (Penguin)
Ours (Cat Pikachu)
Ours (Manipulator)
Ours (Hand)
Ours (Knight)
Ours (Bird)
Ours (Finch)
Squirrel
Cat Pikachu
Penguin
Cat Coco
Bird
Finch
Knight
Dog Shiba
Many excellent works are related to ours.
BANMo: Building Animatable 3D Neural Models from Many Casual Videos
A framework for 4D reconstruction from monocular videos
MVDream: Multi-view Diffusion for 3D Generation
ProlificDreamer: High-Fidelity and Diverse Text-to-3D Generation with Variational Score Distillation