✨ PUSA V1.0 ✨

🎬 Revolutionary Video Generation with Vectorized Timestep Adaptation

🔥 BREAKTHROUGH PERFORMANCE: Surpassing Wan-I2V on VBench-I2V with only $500 of training cost! 🔥

🚀 4 Powerful Modes: I2V • Multi-Frame • V2V • T2V 🚀

💎 State-of-the-Art • ⚡ Lightning Fast • 🎯 Precision Control • 🌟 Professional Quality

Image-to-Video Generation (I2V)

Generate videos from a single starting image. Perfect for bringing static images to life with natural motion and animation.

📷 Input Image

Upload a single image.

⚙️ Generation Parameters

Noise Multiplier (range 0 to 1)

Controls how faithful the generation is to the input image (0 = faithful, 1 = creative)

LoRA Alpha (range 0.5 to 3)

Controls temporal consistency (1 to 2 recommended)

Inference Steps (range 10 to 50)

Number of denoising steps; more steps take proportionally longer to run

📝 Text Prompts

Prompt

Negative Prompt
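The controls above can be collected into one settings object. The sketch below is hypothetical: `I2VSettings` is not part of Pusa's code, its bounds are copied from the slider ranges on this page (plus the 0 to 20 latent positions from the Pro Tips), and its defaults are borrowed from the demo example rather than from the app's real defaults.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class I2VSettings:
    """Hypothetical container for the I2V controls on this page."""

    prompt: str
    negative_prompt: str = ""
    noise_multiplier: float = 0.2  # 0 = faithful to the input image, 1 = creative
    lora_alpha: float = 1.4  # temporal consistency; 1 to 2 recommended
    inference_steps: int = 30  # number of denoising steps
    conditioning_position: int = 0  # latent frame index; 0 = first frame

    def __post_init__(self) -> None:
        # Bounds copied from the UI slider ranges and the 21-frame latent space.
        if not 0.0 <= self.noise_multiplier <= 1.0:
            raise ValueError("noise_multiplier must be in [0, 1]")
        if not 0.5 <= self.lora_alpha <= 3.0:
            raise ValueError("lora_alpha must be in [0.5, 3]")
        if not 10 <= self.inference_steps <= 50:
            raise ValueError("inference_steps must be in [10, 50]")
        if not 0 <= self.conditioning_position <= 20:
            raise ValueError("conditioning_position must be in [0, 20]")

    @property
    def lora_alpha_in_recommended_range(self) -> bool:
        # The page recommends LoRA Alpha between 1 and 2.
        return 1.0 <= self.lora_alpha <= 2.0
```

For example, `I2VSettings(prompt="A monk meditating on a mountain peak at sunrise")` carries the demo's numeric settings, while passing `inference_steps=60` raises a `ValueError` before any generation is attempted.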


🎭 Demo Examples

🎬 Demo Gallery - See Pusa V1.0 in Action!

Explore real examples showcasing the power and versatility of Pusa V1.0 across different generation modes.

📂 Note: Demo files must be placed in the ./demos/ and ./assets/ directories to display properly.

📷➡️🎬 Image-to-Video Generation Example

🖼️ Input Image

Monk Meditation Scene

Settings Used:

Prompt: "A wide-angle shot shows a serene monk meditating perched a top of the letter E of a pile of weathered rocks that vertically spell out 'ZEN'. The rock formation is perched atop a misty mountain peak at sunrise..."
Conditioning Position: 0 (first frame)
Noise Multiplier: 0.2
LoRA Alpha: 1.4
Inference Steps: 30
File Path: ./demos/input_image.jpg

🎥 Generated Video

I2V Result - Single Image Animation

📖 About Pusa V1.0

Pusa V1.0 leverages vectorized timestep adaptation (VTA) for fine-grained temporal control within a unified video diffusion framework. It is also highly efficient: it surpasses Wan-I2V on VBench-I2V with only $500 of training cost and roughly 4K training samples.
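As a rough illustration of the idea behind VTA, the sketch below gives every latent frame its own timestep instead of a single scalar for the whole clip. It assumes timesteps normalized to [0, 1] (1 = pure noise), a rectified-flow style interpolation x_t = (1 - t) * x_0 + t * eps, and that a conditioning frame's timestep is the global timestep scaled by its noise multiplier. `vectorized_timesteps` and `noise_latents` are hypothetical helpers, not Pusa's actual implementation.

```python
from collections.abc import Sequence

NUM_LATENT_FRAMES = 21  # latent frame count given in the Pro Tips


def vectorized_timesteps(
    t: float,
    cond_positions: Sequence[int],
    noise_multipliers: Sequence[float],
    num_frames: int = NUM_LATENT_FRAMES,
) -> list[float]:
    """Return one timestep per latent frame (hypothetical helper).

    Free frames keep the global timestep t. Each conditioning frame gets
    t * multiplier (assumption): 0 keeps it clean, 1 noises it like the rest.
    """
    if len(cond_positions) != len(noise_multipliers):
        raise ValueError("need one noise multiplier per conditioning position")
    timesteps = [t] * num_frames
    for pos, mult in zip(cond_positions, noise_multipliers):
        if not 0 <= pos < num_frames:
            raise ValueError(f"conditioning position {pos} is out of range")
        if not 0.0 <= mult <= 1.0:
            raise ValueError(f"noise multiplier {mult} must be in [0, 1]")
        timesteps[pos] = t * mult
    return timesteps


def noise_latents(
    x0: Sequence[Sequence[float]],
    noise: Sequence[Sequence[float]],
    timesteps: Sequence[float],
) -> list[list[float]]:
    """Per-frame rectified-flow interpolation: x_t = (1 - t) * x0 + t * eps."""
    if not len(x0) == len(noise) == len(timesteps):
        raise ValueError("x0, noise and timesteps need one entry per frame")
    return [
        [(1.0 - s) * a + s * e for a, e in zip(frame_x0, frame_eps)]
        for frame_x0, frame_eps, s in zip(x0, noise, timesteps)
    ]
```

With the demo settings (conditioning position 0, noise multiplier 0.2) at t = 0.8, frame 0 is noised at 0.8 × 0.2 = 0.16 while frames 1 to 20 stay at 0.8, so under these assumptions the input image stays close to clean while the remaining frames are generated around it.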

💡 Pro Tips for Best Results

🎚️ LoRA Alpha: Use values between 1 and 2 for the best balance between quality and temporal consistency

🔊 Noise Multipliers: Use lower values (0.0 to 0.3) to stay faithful to the conditioning frames and higher values (0.4 to 1.0) for more variation

📍 Conditioning Positions: Positions index the 21-frame latent space; frame 0 is the first frame and frame 20 is the last

✍️ Prompts: Be descriptive and specific for better results
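The 21-frame latent space in the tips above can be related to pixel frames as sketched below. This assumes a Wan2.1-style causal video VAE that encodes the first frame on its own and compresses each following group of 4 frames into one latent frame, which would map an 81-frame clip to 21 latent frames. The stride of 4 and both helper names are assumptions, not taken from Pusa's code.

```python
TEMPORAL_STRIDE = 4  # assumed temporal compression of the VAE


def num_latent_frames(num_pixel_frames: int, stride: int = TEMPORAL_STRIDE) -> int:
    """Latent length when frame 0 is encoded alone and the rest in groups of `stride`."""
    if num_pixel_frames < 1 or (num_pixel_frames - 1) % stride != 0:
        raise ValueError("pixel frame count must be 1 + a multiple of the stride")
    return (num_pixel_frames - 1) // stride + 1


def pixel_to_latent_index(pixel_idx: int, stride: int = TEMPORAL_STRIDE) -> int:
    """Latent frame that contains a given 0-based pixel frame."""
    if pixel_idx < 0:
        raise ValueError("pixel index must be non-negative")
    # Frame 0 has its own latent; frames 1..stride share latent 1, and so on.
    return 0 if pixel_idx == 0 else (pixel_idx - 1) // stride + 1


if __name__ == "__main__":
    print(num_latent_frames(81))  # 21
    print(pixel_to_latent_index(0), pixel_to_latent_index(80))  # 0 20
```

Under that assumption, conditioning on the first and last frames of an 81-frame clip corresponds to latent positions 0 and 20.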

🔗 Important Links

🌐 Project Page - Official project website

📄 Technical Report - Detailed research paper

🤗 Model on HuggingFace - Download models

📚 Training Dataset - Training data

✨ Made with ❤️ for the AI Community ✨

Experience the future of video generation with Pusa V1.0 🚀