
China develops its own version of OpenAI's text-to-video model


A group of researchers and artificial intelligence (AI) experts is working together to develop China's answer to Sora, OpenAI's highly anticipated text-to-video model.

What is it: Professors at Peking University and Rabbitpre, a Shenzhen-based AI company, announced their collaboration on Friday in a GitHub post for a project they call Open-Sora. The project was made possible by the Rabbitpre AIGC Joint Lab, a partnership between the company and the university's graduate school.

According to the team, Open-Sora aims to “reproduce OpenAI's video generation model” with a “simple and scalable” repository. The group is seeking support from the open source community for development.

Progress so far: Using a three-part framework consisting of a Video VQ-VAE, a Denoising Diffusion Transformer, and a Condition Encoder, the group has successfully generated reconstructed video samples at different aspect ratios, resolutions, and durations, including 10- and 18-second clips.
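To make that pipeline concrete, here is a minimal sketch of how the three parts could fit together: a video VQ-VAE compresses frames into latent tokens, a condition encoder embeds the text prompt, and a denoising diffusion transformer learns to predict the noise added to those latents. This is an illustrative assumption about the general architecture, not the Open-Sora repository's actual code; every class name, dimension, and hyperparameter below is hypothetical.

```python
# Hypothetical sketch of the three-part framework described above.
# All names and sizes are illustrative, not the Open-Sora project's API.
import torch
import torch.nn as nn

class VideoVQVAE(nn.Module):
    """Compresses video frames to a smaller latent grid
    (encoder/decoder only; the vector-quantization codebook is omitted)."""
    def __init__(self, channels=3, latent_dim=64):
        super().__init__()
        self.encoder = nn.Conv3d(channels, latent_dim, kernel_size=4, stride=4)
        self.decoder = nn.ConvTranspose3d(latent_dim, channels, kernel_size=4, stride=4)

    def encode(self, video):            # video: (B, C, T, H, W)
        return self.encoder(video)      # -> (B, D, T/4, H/4, W/4)

    def decode(self, latents):
        return self.decoder(latents)

class ConditionEncoder(nn.Module):
    """Maps the tokenized text prompt to conditioning vectors."""
    def __init__(self, vocab_size=32000, dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)

    def forward(self, token_ids):       # (B, L) -> (B, L, D)
        return self.embed(token_ids)

class DenoisingDiffusionTransformer(nn.Module):
    """Predicts the noise added to latent tokens while attending to the condition."""
    def __init__(self, dim=64, heads=4, layers=2):
        super().__init__()
        block = nn.TransformerDecoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.transformer = nn.TransformerDecoder(block, num_layers=layers)

    def forward(self, noisy_latents, condition):
        # noisy_latents: (B, N, D) flattened latent tokens; condition: (B, L, D)
        return self.transformer(noisy_latents, condition)

# One denoising training step on random data, to show how the parts connect.
vae, cond_enc = VideoVQVAE(), ConditionEncoder()
dit = DenoisingDiffusionTransformer()

video = torch.randn(1, 3, 8, 32, 32)            # (B, C, T, H, W)
prompt_ids = torch.randint(0, 32000, (1, 16))   # tokenized text prompt

with torch.no_grad():
    latents = vae.encode(video)                 # (1, 64, 2, 8, 8)
tokens = latents.flatten(2).transpose(1, 2)     # -> (1, 128, 64) latent tokens

noise = torch.randn_like(tokens)
noisy = tokens + noise                          # simplified forward diffusion step
pred = dit(noisy, cond_enc(prompt_ids))
loss = nn.functional.mse_loss(pred, noise)      # standard noise-prediction loss
loss.backward()
```

At generation time, the trained transformer would iteratively denoise random latents conditioned on the prompt, and the VQ-VAE decoder would then turn the resulting latents back into video frames.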


About Sora: Unveiled on February 15, Sora is OpenAI's first text-to-video model, generating high-quality, realistic videos from text prompts. So far, the videos can be up to one minute long.

Although the technology has been announced, OpenAI said there are no plans to make Sora available for general use anytime soon. The company still needs to address several issues, such as reducing misinformation, hateful content and bias, and properly labeling the finished product.


What's next: Rabbitpre AIGC Joint Lab has outlined some of its future plans for Open-Sora, including establishing the codebase and training an unconditional model on landscape datasets. In the project's primary phases, the group then plans to train models that improve resolution and duration.

The team also plans to conduct experiments on a text-to-video landscape dataset, train their model on a video-to-text dataset at 1080p (1920 x 1080) resolution, and develop a control model with additional conditions.

