Video generation models as world simulators

This technical report focuses on (1) our method for turning visual data of all types into a unified representation that enables large-scale training of generative models, and (2) qualitative evaluation of Sora's capabilities and limitations. Model and implementation details are not included in this report.

Much prior work has studied generative modeling of video data using a variety of methods, including recurrent networks,[^1][^2] generative adversarial networks,[^4][^6] autoregressive transformers,[^8] and diffusion models.[^10][^12] These works often focus on a narrow category of visual data, on shorter videos, or on videos of a fixed size. Sora is a generalist model of visual data: it can generate videos and images spanning diverse durations, aspect ratios and resolutions, up to a full minute of high-definition video.