✨ PUSA V1.0 ✨

🎬 Revolutionary Video Generation with Vectorized Timestep Adaptation

🔥 BREAKTHROUGH PERFORMANCE: Surpassing Wan-I2V on VBench-I2V with only $500 training cost! 🔥

🚀 4 Powerful Modes: I2V • Multi-Frame • V2V • T2V 🚀

💎 State-of-the-Art • ⚡ Lightning Fast • 🎯 Precision Control • 🌟 Professional Quality

Image-to-Video Generation (I2V)

Generate videos from a single starting image. Perfect for bringing static images to life with natural motion and animation.

📷 Input Image

⚙️ Generation Parameters

  • Noise Multiplier: 0 to 1
  • LoRA Alpha: 0.5 to 3
  • Inference Steps: 10 to 50

📝 Text Prompts

📹 Output

🎭 Demo Examples

Prompt: "A wide-angle shot shows a serene monk meditating with gentle swaying and peaceful movement..."

  • Noise Multiplier: 0.2
  • LoRA Alpha: 1.4

Prompt: "A female climber rock climbing on an asteroid in deep space with dynamic movement..."

  • Noise Multiplier: 0.3
  • LoRA Alpha: 1.2

🎬 Demo Gallery - See Pusa V1.0 in Action!

Explore real examples showcasing the power and versatility of Pusa V1.0 across different generation modes.

📂 Note: Place demo files in the ./demos/ and ./assets/ directories so that they display properly.

📷➡️🎬 Image-to-Video Generation Example

🖼️ Input Image

Settings Used:

  • Prompt: "A wide-angle shot shows a serene monk meditating perched a top of the letter E of a pile of weathered rocks that vertically spell out 'ZEN'. The rock formation is perched atop a misty mountain peak at sunrise..."
  • Conditioning Position: 0 (first frame)
  • Noise Multiplier: 0.2
  • LoRA Alpha: 1.4
  • Inference Steps: 30
  • File Path: ./demos/input_image.jpg
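
The settings above can be captured in a small, validated record so that a demo run is easy to reproduce. This is only a minimal sketch: the `I2VSettings` class and its field names are hypothetical and are not part of the Pusa codebase. The allowed ranges come from the tips on this page (noise multipliers between 0.0 and 1.0, latent frames indexed 0 to 20), and the prompt is left truncated exactly as shown above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class I2VSettings:
    """Hypothetical container for one image-to-video run (not a Pusa API)."""

    prompt: str
    image_path: str
    cond_position: int = 0          # latent frame to condition on (0 = first frame)
    noise_multiplier: float = 0.2   # lower = more faithful to the input image
    lora_alpha: float = 1.4
    num_inference_steps: int = 30

    def __post_init__(self):
        # The page describes a 21-frame latent space, indexed 0..20.
        if not 0 <= self.cond_position <= 20:
            raise ValueError("cond_position must be between 0 and 20")
        if not 0.0 <= self.noise_multiplier <= 1.0:
            raise ValueError("noise_multiplier must be between 0.0 and 1.0")
        if self.lora_alpha <= 0:
            raise ValueError("lora_alpha must be positive")
        if self.num_inference_steps < 1:
            raise ValueError("num_inference_steps must be at least 1")


# The monk demo from this page; the prompt is truncated in the source.
monk_demo = I2VSettings(
    prompt="A wide-angle shot shows a serene monk meditating...",
    image_path="./demos/input_image.jpg",
)
```

Validating at construction time means an out-of-range value, such as a conditioning position of 21, fails immediately rather than partway through generation.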

🎥 Generated Video

📖 About Pusa V1.0

Pusa V1.0 uses vectorized timestep adaptation (VTA) for fine-grained temporal control within a unified video diffusion framework. It is also inexpensive to train: the model surpasses Wan-I2V on VBench-I2V with only $500 in training cost and 4K training samples.

💡 Pro Tips for Best Results

🎚️ LoRA Alpha: Use values between 1 and 2 for the best balance between quality and consistency

🔊 Noise Multipliers: Use lower values (0.0-0.3) for faithful conditioning and higher values (0.4-1.0) for more variation

📍 Conditioning Positions: Frame 0 is the first frame and frame 20 is the last frame in the 21-frame latent space

✍️ Prompts: Be descriptive and specific for better results
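
To make the noise multiplier and conditioning position tips more concrete, here is one way a per-frame ("vectorized") timestep could be assembled. This is an illustrative sketch only, not Pusa's implementation: the function `build_frame_timesteps`, its arguments, and the rule that a conditioned frame receives the shared timestep scaled by its noise multiplier are all assumptions made for this example.

```python
def build_frame_timesteps(t, cond_positions, noise_multipliers, num_latent_frames=21):
    """Return one timestep per latent frame (hypothetical helper).

    Assumption: unconditioned frames use the shared timestep `t`, while each
    conditioned frame uses `t * multiplier`. A multiplier of 0.0 then keeps the
    frame clean (faithful conditioning), and 1.0 treats it like any other frame.
    """
    if len(cond_positions) != len(noise_multipliers):
        raise ValueError("need exactly one noise multiplier per conditioning position")

    timesteps = [float(t)] * num_latent_frames
    for position, multiplier in zip(cond_positions, noise_multipliers):
        if not 0 <= position < num_latent_frames:
            raise ValueError(f"position {position} is outside the latent space")
        if not 0.0 <= multiplier <= 1.0:
            raise ValueError(f"multiplier {multiplier} is outside [0.0, 1.0]")
        timesteps[position] = float(t) * multiplier
    return timesteps


# I2V: condition on the first latent frame with a noise multiplier of 0.2.
i2v = build_frame_timesteps(t=1000, cond_positions=[0], noise_multipliers=[0.2])

# Multi-frame: condition on the first and last latent frames (values are illustrative).
start_end = build_frame_timesteps(
    t=1000, cond_positions=[0, 20], noise_multipliers=[0.2, 0.3]
)
```

Under this assumption, a single scalar timestep is the special case where every frame shares the same value, and the different generation modes differ only in which positions are conditioned.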

🔗 Important Links

🌐 Project Page - Official project website

📄 Technical Report - Detailed research paper

🤗 Model on HuggingFace - Download models

📚 Training Dataset - Training data

✨ Made with ❤️ for the AI Community ✨

Experience the future of video generation with Pusa V1.0 🚀