YouTube demands answers on Sora’s training data


YouTube’s CEO Neal Mohan recently fired a warning shot at OpenAI over its Sora AI tool. Sora, known for its AI trick of turning text into videos, has stirred up a storm over where it gets its learning material.

Mohan’s point is clear: YouTube doesn’t want AI systems like Sora swiping its content. It’s like saying, “Hands off our stuff!” But here’s the twist: Google, YouTube’s parent company, does exactly that with publisher data for its own AI projects.

Hey YouTube, even the OpenAI CTO does not know what data Sora was trained with

The situation unfolded in March, when Joanna Stern, a tech reporter at The Wall Street Journal, asked OpenAI Chief Technology Officer (CTO) Mira Murati in a video interview about the data used to train Sora, the company’s text-to-video generation tool. Murati hesitated and did not give a clear answer, which pointed to a lack of transparency about the source of Sora’s training data. The only source Murati acknowledged was data from Shutterstock.


Following this, YouTube CEO Neal Mohan further shed light on the issue during an interview with Bloomberg’s Emily Chang. Mohan emphasized YouTube’s terms of service, stating that creators who upload content to the platform expect their work to be treated according to these terms. Mohan specifically mentioned that YouTube’s terms prohibit actions like downloading transcripts or video segments without proper authorization. Therefore, he indicated that using YouTube’s content without permission, as in the case of OpenAI’s Sora, would violate these terms.


This exchange highlighted the clash between OpenAI and YouTube regarding the use of YouTube’s content for training AI models like Sora. Mohan’s comments reinforced YouTube’s stance on protecting content creators’ rights and maintaining adherence to its terms of service.

Overall, the incident brought attention to ethical considerations surrounding the use of data for AI training purposes and raised questions about transparency and accountability in the development and deployment of AI technologies.

Meanwhile in the OpenAI universe: Keep hustling

Although details about Sora’s training data remain mysterious, OpenAI has offered a glimpse of its capabilities through a captivating music video collaboration with indie artist August Kamp. It seems YouTube CEO Neal Mohan’s statement has not dampened OpenAI’s creative spirit.

Titled “Worldweight,” the music video showcases dreamlike visuals reminiscent of electronic music legends like Boards of Canada and Aphex Twin. Scenes of giant crystals, glowing plants, and underwater worlds blend seamlessly, creating a mesmerizing experience.


For August Kamp, using Sora to visualize her music video was a revelation, allowing her to share her song’s essence with the world in a new way.

Despite the excitement, concerns about Sora’s impact on creativity and legality persist. OpenAI’s CTO, Mira Murati, said that Sora was trained on a mix of public and licensed data, but whether specific sources such as YouTube and Instagram were included remains undisclosed.




As discussions about Sora continue, it’s evident that the technology represents a significant shift in creative expression. With its potential to democratize storytelling, Sora opens doors to a future where AI and human creativity collaborate seamlessly. While Sora’s mysterious nature may raise questions, its emergence marks an exciting chapter in the journey of AI-driven innovation.
