Model 2.0 enables long-form, lifelike video generation
SAN FRANCISCO, Nov. 19, 2025 /PRNewswire/ -- CraftStory, a pioneer in realistic AI-generated human video, today announced the release of Model 2.0, a next-generation model that creates expressive, human-centric videos up to five minutes long, setting a new benchmark for AI video realism and duration.
The Challenge of Scaling Human Video Production
Companies and studios are stuck between costly, slow live shoots and short AI clips that break down—making it hard to scale product education, training, and localized content. They need minutes-long, consistent human video they can update instantly without reshoots.
Model 2.0 fills that gap, turning scripts or reference footage into studio-quality performances at production speed—enabling companies to produce compelling storytelling, product demos, and training videos with unprecedented fidelity and continuity.
Introducing Studio-Quality AI Performances
CraftStory Model 2.0 is currently a video-to-video model: it takes an image and a driving video as input and generates an output video in which the person in the image performs the motion from the driving video. A user can upload their own driving video or use one of the preset videos supplied by CraftStory. The model was trained to preserve identity, emotion, and nuance, even across multi-minute sequences.
The breakthrough behind Model 2.0 lies in a new parallelized diffusion pipeline, developed by CraftStory’s research team. This innovation allows diffusion models to process different segments of a video simultaneously while maintaining visual coherence across frames — a key challenge in long-form video synthesis. Model 2.0 was further refined on high-frame-rate (HFR) footage of real people, including expressive hand and body movements, ensuring smooth, lifelike motion and natural facial dynamics. The output video is available in both landscape and portrait formats, at 480p and 720p resolutions — with 720p videos optionally upscalable to 1080p. The system can generate a low-resolution 30-second clip in about 15 minutes — a major step toward real-time, studio-quality video production.
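For readers who want a concrete picture of what processing segments of a video in parallel can involve, the following is a minimal illustrative sketch. It is not CraftStory's pipeline, whose details are not described in this release. It assumes one common approach: split the frame range into overlapping segments, process each segment concurrently, and average the overlapping frames so that neighboring segments agree at the seams. All function names, the segment length, and the overlap are hypothetical, and `fake_generate` is a stand-in for a real per-segment model.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_segments(num_frames, segment_len, overlap):
    """Return (start, end) frame ranges covering num_frames, with neighbours overlapping."""
    if not 0 <= overlap < segment_len:
        raise ValueError("overlap must be non-negative and smaller than segment_len")
    step = segment_len - overlap
    segments = []
    start = 0
    while True:
        end = min(start + segment_len, num_frames)
        segments.append((start, end))
        if end == num_frames:
            break
        start += step
    return segments


def fake_generate(segment):
    """Hypothetical stand-in for a per-segment generator: one float per frame."""
    start, end = segment
    return [float(i) for i in range(start, end)]


def blend(segments, outputs, num_frames):
    """Average frames that fall in more than one segment."""
    total = [0.0] * num_frames
    count = [0] * num_frames
    for (start, _end), frames in zip(segments, outputs):
        for offset, value in enumerate(frames):
            total[start + offset] += value
            count[start + offset] += 1
    return [t / c for t, c in zip(total, count)]


def generate(num_frames, segment_len=48, overlap=8):
    """Process all segments concurrently, then stitch them into one sequence."""
    segments = split_into_segments(num_frames, segment_len, overlap)
    with ThreadPoolExecutor() as pool:
        outputs = list(pool.map(fake_generate, segments))
    return segments, blend(segments, outputs, num_frames)


if __name__ == "__main__":
    segments, frames = generate(100)
    print(segments)
    print(len(frames))
```

In this toy version each "frame" is a single number, so the blending step is trivially consistent. In a real system, the overlap region is where coherence between independently generated segments has to be enforced.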
Model 2.0 also includes an advanced lip-sync system that turns any script or audio track into a realistic performance. A built-in gesture alignment algorithm ensures that body movements naturally match speech rhythm and emotion — bringing human expressiveness to AI-generated content.
Founded by the Creators of OpenCV
CraftStory was founded by a team of computer vision experts who helped develop and maintain the world-renowned Open Source Computer Vision Library (OpenCV). The same team also created Avatar SDK, a leading tool for generating 3D avatars from selfies, used across gaming, AR/VR, and digital entertainment industries.
“AI-generated video will soon become the primary way companies communicate their stories,” said Victor Erukhimov, Founder and CEO of CraftStory, who previously sold his computer vision startup to Intel. “With Model 2.0, we’re making it possible to create long-form, studio-quality videos that truly engage audiences. We believe our model provides an unprecedented level of control over content — including the movement and expressiveness of the person on screen.”
CraftStory launched after raising a round of $2M led by Andrew Filev, the founder of Zencoder, previously of Wrike (acquired by Citrix for $2.25B).
“Victor and I have known each other for a long time, and we share the passion for AI and are both very enthusiastic about AI video generation technology and business opportunities,” said Filev. “One huge gap in this market is the lack of models that can generate consistent videos over longer sequences, and that’s extremely important for real-world use. If you’re creating a commercial for your company, a 10-second video, no matter how good it looks, just isn’t enough. You need 30 seconds, you need 2 minutes, you need more.”
“We’re excited about the release of CraftStory Model 2.0 and the other tech Victor’s team is cooking up behind the scenes,” said Radu B. Rusu, Managing Partner at Cox Exponential (CX2), Cox Enterprises’ early-stage venture arm. “We’re eager to support CraftStory in its journey as it pushes the boundaries of what’s possible in generative media.”
The CraftStory team is now developing a text-to-video model that will enable users to generate long-form videos directly from a written script. They are also adding support for moving-camera scenarios, including the popular “walk-and-talk” format — bringing cinematic storytelling closer to full automation.
To try Model 2.0, please visit https://craftstory.com/.
About CraftStory
CraftStory is a pioneer in realistic AI-generated human video, founded by the creators of OpenCV. The company enables businesses to create studio-quality, long-form videos at scale using AI. For more information, please visit https://craftstory.com/.
Follow CraftStory on X, YouTube, LinkedIn, and Instagram
SOURCE CraftStory

