
ViMax: HKU Just Made "One Idea → One Complete Movie" a Reality (and It's Fully Open Source)

You know that feeling when you have a killer idea for a video but zero skills to make it? No camera, no editing chops, no budget, no crew. Just you and a wild concept — like "a cat and dog are best friends, then a mysterious new cat moves in next door."

Well, a team at the University of Hong Kong just solved that problem. And they open-sourced it.

Meet ViMax — an AI framework that takes your raw idea and spits out a full, coherent, multi-scene video. Not a 3-second GIF. Not a disjointed clip. A complete video story with characters, cinematography, and narrative structure.

Think of it as having an AI director, screenwriter, producer, and video generator all sitting in a single git clone.


🎬 What ViMax Actually Does

ViMax is built by the HKU Data Intelligence Lab (the same lab behind nanobot, LightRAG, and CLI-Anything — they know their stuff). Prof. Chao Huang's team published the technical paper on arXiv (2606.07649) just weeks ago, and the repo has already racked up 10,658 stars on GitHub.

Here's the magic: ViMax accepts four types of input and handles them all differently:

  • Idea2Video (input: a sentence or paragraph): the AI writes the script, designs the characters, and shoots the video, all from scratch.
  • Script2Video (input: a full screenplay): the AI parses the scenes, plans the shots, and renders everything frame by frame.
  • Novel2Video (input: an entire novel): the AI compresses the narrative, tracks characters across chapters, and outputs episodic video.
  • AutoCameo (input: your photo plus an idea): the AI inserts you into the story as a character with a consistent appearance.

The killer feature? It handles the entire pipeline end-to-end. You don't need to write prompts for every shot. You don't need to fix character consistency. You don't even need to know what a "storyboard" is. The AI figures it out.


🏗️ The "Five-Agent Orchestra" Under the Hood

Figure: Five-agent pipeline diagram of the ViMax workflow.

This is where things get technically brilliant. ViMax isn't one giant model doing everything — it's a team of specialized AI agents working together, each owning a specific part of the filmmaking process:

  1. Screenwriter Agent — Takes your raw idea/novel/script and structures it into a proper screenplay with scenes, dialogue, and narrative rhythm.

  2. Shot Planning Agent — Applies actual cinematography theory. Decides camera positions, movement, lighting, shot duration. This isn't random — it simulates professional multi-camera filming.

  3. Producer Agent (Visual Asset Creation) — Uses an "image-first, video-second" strategy. Creates reference images for characters and environments, then generates video from those images. This is what keeps characters looking the same across scenes.

  4. Quality Control Agent — Generates multiple versions of each shot in parallel, then uses a Vision Language Model (VLM) to pick the best one. If none pass? Auto-retry with adjusted parameters. Like having a picky film editor who never sleeps.

  5. Director Agent — The conductor. Monitors the whole pipeline, maintains stylistic consistency, coordinates handoffs between agents.
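The Quality Control Agent's loop is essentially best-of-N selection with a retry. Below is a minimal sketch under stated assumptions, not ViMax's code: `render_take` and `vlm_score` are hypothetical stand-ins for the video model and the VLM judge, and the `guidance` knob, the 0.8 threshold, and the 1.2 adjustment factor are invented for illustration.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field


@dataclass
class Take:
    shot_id: str
    seed: int
    params: dict = field(default_factory=dict)


def render_take(shot_id: str, params: dict, seed: int) -> Take:
    """Hypothetical stand-in for one call to a video model."""
    return Take(shot_id, seed, dict(params))


def vlm_score(take: Take) -> float:
    """Hypothetical stand-in for a VLM judge; returns a score in [0, 1].

    This fake score is derived deterministically from the seed and 'guidance'.
    """
    raw = (take.seed % 100) / 100 * take.params.get("guidance", 1.0)
    return min(1.0, raw)


def quality_control(shot_id, params, n_takes=4, threshold=0.8, max_retries=3, seed=0):
    """Best-of-N selection with retry; returns (best_take, retries_used)."""
    rng = random.Random(seed)
    best = None
    for retry in range(max_retries + 1):
        seeds = [rng.randrange(1_000_000) for _ in range(n_takes)]
        # Render the N takes of this shot concurrently.
        with ThreadPoolExecutor(max_workers=n_takes) as pool:
            takes = list(pool.map(lambda s: render_take(shot_id, params, s), seeds))
        # Let the judge pick the strongest take.
        best = max(takes, key=vlm_score)
        if vlm_score(best) >= threshold:
            return best, retry
        # No take passed: adjust a (hypothetical) parameter and try again.
        params = {**params, "guidance": params.get("guidance", 1.0) * 1.2}
    return best, max_retries
```

Parameters only change when no take clears the bar, so a strong first batch costs a single round of rendering.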

The architecture diagram from their paper is genuinely impressive — it's not just stitching clips together. It's a full production workflow automated through multi-agent orchestration.
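To make the handoff idea concrete, here is a minimal sketch of agents passing one shared production state along while a director sequences them. Everything in it is an illustrative assumption: the `Production` fields, the agent functions, and the naive string logic that stands in for LLM and image/video model calls. The QC step is omitted for brevity. The part worth noticing is the producer's image-first step: each character gets exactly one reference image, and every clip featuring that character is conditioned on that same image.

```python
from dataclasses import dataclass, field


@dataclass
class Production:
    """Shared state handed from agent to agent (all field names are hypothetical)."""
    idea: str
    cast: list[str]
    style: str = "soft daylight, 35mm"
    scenes: list[str] = field(default_factory=list)
    shots: list[dict] = field(default_factory=list)
    refs: dict[str, str] = field(default_factory=dict)
    clips: list[str] = field(default_factory=list)
    log: list[str] = field(default_factory=list)


def screenwriter(p: Production) -> None:
    # Naive beat split; a real screenwriter agent would call an LLM here.
    p.scenes = [beat.strip() for beat in p.idea.split(", then ")]


def shot_planner(p: Production) -> None:
    # One shot per scene, tagging which cast members appear in it.
    p.shots = [
        {"scene": s, "camera": "medium", "seconds": 4,
         "cast": [c for c in p.cast if c in s]}
        for s in p.scenes
    ]


def producer(p: Production) -> None:
    # Image first: exactly one reference image per character, created once...
    for c in p.cast:
        p.refs.setdefault(c, f"ref_image({c}, {p.style})")
    # ...video second: every clip is conditioned on those same reference images.
    p.clips = [f"video({s['scene']!r}, refs={[p.refs[c] for c in s['cast']]})" for s in p.shots]


def director(p: Production) -> Production:
    """Runs the agents in order and records each handoff."""
    for agent in (screenwriter, shot_planner, producer):
        agent(p)
        p.log.append(agent.__name__)
    return p
```

Running `director(Production("a cat and dog are best friends, then a mysterious new cat moves in next door", cast=["cat", "dog"]))` yields two scenes, and both clips reference the same `cat` image.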


🧠 The Three Technical Innovations That Make It Work

Figure: Recursive decomposition, Events → Scenes → Shots.

After digging through the arXiv paper, three things stand out:

1. Hierarchical Recursive Narrative Decomposition

Long videos have a "planning complexity explosion" problem. ViMax solves this by recursively breaking stories into three layers: Events → Scenes → Shots. Each layer only deals with a manageable chunk, but dependencies cascade through all three levels so the big picture never gets lost.
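The recursion can be sketched as a tree builder in which each call splits only its own chunk and passes the ancestor chain down as context. This is a hypothetical illustration, not the paper's algorithm: `split` uses punctuation where ViMax would use an LLM, the `max_parts` cap stands in for keeping each step manageable, and the 40-character "summary" of each ancestor is a placeholder.

```python
from dataclasses import dataclass, field

LEVELS = ["story", "event", "scene", "shot"]


@dataclass
class Node:
    level: str
    text: str
    context: list[str]  # placeholder summaries inherited from every ancestor
    children: list["Node"] = field(default_factory=list)


def split(text: str, level: str, max_parts: int) -> list[str]:
    """Hypothetical stand-in for an LLM call that splits one unit into sub-units.

    Here: ';' separates events, '.' separates scenes, ',' separates shots.
    """
    sep = {"event": ";", "scene": ".", "shot": ","}[level]
    parts = [p.strip() for p in text.split(sep) if p.strip()]
    return parts[:max_parts]


def decompose(text: str, level: str = "story", context=None, max_parts: int = 5) -> Node:
    context = context or []
    node = Node(level, text, context)
    depth = LEVELS.index(level)
    if depth + 1 < len(LEVELS):
        child_level = LEVELS[depth + 1]
        for part in split(text, child_level, max_parts):
            # Each child plans only its own chunk, plus the ancestor chain as context.
            node.children.append(decompose(part, child_level, context + [text[:40]], max_parts))
    return node


def leaves(node: Node) -> list[Node]:
    """Collect the shot-level leaves in story order."""
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]
```

For the story "Cat meets dog. They play, they nap; A new cat arrives. Dog is wary, cat hisses", this produces two events, four scenes, and six shots, each shot carrying its story, event, and scene context.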

2. RAG-Enhanced Global Context

Each decomposition stage queries a global knowledge base containing character relationships, plot threads, and thematic elements from the full source material. This means a scene late in the video still remembers a character trait established in the first scene — no more "wait, why is the dog suddenly a villain?"
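A toy version of that lookup: facts are extracted once from the full source, and each stage retrieves only the facts relevant to the chunk it is planning. Plain keyword overlap stands in here for whatever retriever ViMax actually uses, which this sketch does not attempt to reproduce. The class name, the stopword list, and the sample facts are all hypothetical.

```python
import re

STOP = {"a", "an", "and", "from", "is", "of", "the", "with"}


def tokens(text: str) -> set[str]:
    """Lowercased word set, minus a few stopwords."""
    return set(re.findall(r"[a-z]+", text.lower())) - STOP


class StoryKB:
    """Toy global knowledge base built once from the full source material."""

    def __init__(self, facts: list[str]):
        self.facts = facts
        self.index = [tokens(f) for f in facts]

    def retrieve(self, query: str, k: int = 2) -> list[str]:
        q = tokens(query)
        # Score each fact by word overlap; break ties by original order.
        scored = sorted(
            ((len(q & toks), i) for i, toks in enumerate(self.index)),
            key=lambda pair: (-pair[0], pair[1]),
        )
        return [self.facts[i] for score, i in scored[:k] if score > 0]


def build_prompt(kb: StoryKB, chunk: str) -> str:
    """Prepend the retrieved facts to the planning request for one chunk."""
    context = "\n".join(f"- {fact}" for fact in kb.retrieve(chunk))
    return f"Known facts:\n{context}\n\nPlan this scene:\n{chunk}"


FACTS = [
    "Biscuit the dog is loyal and gentle",
    "Mochi the cat is best friends with Biscuit",
    "The new cat next door is named Shadow",
    "Mochi is afraid of thunderstorms",
]
kb = StoryKB(FACTS)
```

Planning a late scene such as "Biscuit defends Mochi from Shadow" pulls back the friendship fact and Biscuit's gentle nature, which is exactly the kind of early detail that keeps the dog from turning into a villain.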

3. Graph-Network Visual Consistency

This is the secret sauce. ViMax builds a dependency graph of all visual elements (characters, environments, props) across shots. Independent shots run in parallel for speed. Dependent shots use previous frames as conditional references — so when the camera cuts back to the same character, they look identical.
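The scheduling idea maps neatly onto a topological sort. This sketch rests on several assumptions: shots are listed in story order, each shot depends on the most recent earlier shot that showed any of its elements, and `render` is a hypothetical stand-in for a video-model call. Python's standard-library `graphlib.TopologicalSorter` (Python 3.9+) hands out batches of shots whose dependencies are finished. Each batch renders in parallel, and every shot is conditioned on its predecessors' last frames.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter


def build_dependencies(shots: dict[str, set[str]]) -> dict[str, set[str]]:
    """Map each shot to the earlier shots it must match visually.

    `shots` maps shot id (in story order) to the visual elements it shows.
    """
    last_seen: dict[str, str] = {}
    deps: dict[str, set[str]] = {}
    for shot_id, elements in shots.items():
        deps[shot_id] = {last_seen[e] for e in elements if e in last_seen}
        for e in elements:
            last_seen[e] = shot_id
    return deps


def render(shot_id: str, conditioning: list[str]) -> str:
    """Hypothetical video-model call; returns a fake 'last frame' handle."""
    return f"{shot_id}:last_frame(cond={sorted(conditioning)})"


def render_all(shots: dict[str, set[str]]):
    deps = build_dependencies(shots)
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    frames: dict[str, str] = {}
    batches: list[list[str]] = []
    with ThreadPoolExecutor() as pool:
        while sorter.is_active():
            ready = list(sorter.get_ready())
            batches.append(sorted(ready))
            # Shots in one batch share no dependency, so they render in parallel,
            # each conditioned on the frames of the shots it depends on.
            results = list(pool.map(lambda s: render(s, [frames[d] for d in deps[s]]), ready))
            for shot_id, frame in zip(ready, results):
                frames[shot_id] = frame
                sorter.done(shot_id)
    return frames, batches
```

With shots `s1` (cat, kitchen), `s2` (dog, garden), `s3` (cat and dog), and `s4` (new cat, fence), the first batch renders `s1`, `s2`, and `s4` together, and `s3` waits for both of its predecessors and is conditioned on their frames.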


📊 By the Numbers

Let's put this in perspective:

  • 10,658 GitHub stars and counting (1,555 forks)
  • 358 commits across the repo since launch
  • MIT license — truly open, no strings attached
  • 89 repositories from the HKUDS lab, with the three flagship projects alone totaling roughly 126k stars (nanobot: 44.8k, CLI-Anything: 43.9k, LightRAG: 37k)
  • June 8-9, 2026: Agents Loop + TUI, Novel2Video, and technical report all dropped within 48 hours

The roadmap also teases a web frontend plus support for Seedance 2.0 and GPT-Image 2, so this thing is actively evolving.


🔧 How to Try It (in 5 Minutes)

It's refreshingly simple for an AI project of this caliber:

git clone https://github.com/HKUDS/ViMax.git
cd ViMax
uv sync

Then configure your API keys in configs/idea2video.yaml (supports OpenAI-compatible LLMs, plus Google's Gemini and Veo for image/video), and run:

uv run python main_idea2video.py

There's also a TUI mode (vimax tui) that gives you an interactive agent loop where you can plan, revise, and control rendering in real time.


🎯 The Bigger Picture

ViMax represents something bigger than a cool video generator. It's a proof point for agentic AI — the idea that complex creative tasks aren't solved by bigger models, but by orchestrating specialized agents that each do one thing extremely well.

The same HKUDS lab that built LightRAG (retrieval-augmented generation) and nanobot (agent-native tools) has now applied the multi-agent philosophy to video creation. The pattern is unmistakable: the future of AI isn't one model to rule them all — it's a team of AI specialists working together.

Is ViMax going to replace Hollywood? Of course not. The videos still have that AI "uncanny valley" feel. But for indie creators, educators, content marketers, or anyone who's ever had a story they wanted to tell without the means to produce it? This is a genuine game-changer.

As someone who's spent way too many hours fighting with video editing software: watching an AI handle scriptwriting, storyboarding, character design, and final assembly in one shot feels like watching magic. Except the magic is MIT-licensed and sitting on GitHub.


Sources

  1. ViMax: Agentic Video Generation, arXiv:2606.07649. Official technical paper (20 pages, 13 figures), submitted June 2, 2026.
  2. GitHub: HKUDS/ViMax. Open-source repository, 10.6k+ stars, MIT license.
  3. HKU Data Intelligence Lab on GitHub. Lab profile with 89 repositories.
  4. "港大开源ViMax火了,实现AI自编自导自演" ("HKU's open-source ViMax takes off: AI that writes, directs, and acts all by itself"), 机器之心 (Synced) / 腾讯新闻 (Tencent News). Detailed architecture breakdown.
  5. ViMax, AI工具集 (AI Tools Collection). Feature overview and application scenarios.
  6. "ViMax:香港大学开源的视频生成框架" ("ViMax: An Open-Source Video Generation Framework from the University of Hong Kong"), 智潮派. Setup guide and use cases.
  7. Prof. Chao Huang's Lab, HKU. Academic homepage.