OpenAI recently announced a new product called Sora, which is a text-to-video tool. It is one of the most advanced AI tools available, and while I am excited about it, I am also a bit scared.
The demos on OpenAI’s Sora website make it seem like a tool straight out of a Black Mirror episode. Similar technology has been released before, but nothing quite like this.
Sora is a diffusion model: it generates a video by starting from one that looks like static noise and gradually transforming it, removing the noise over many steps.
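OpenAI has not published Sora's internals, so as a toy illustration of the iterative-denoising idea only (the real model uses a learned transformer to predict the noise, not the stand-in function below), the loop looks roughly like this:

```python
import numpy as np

rng = np.random.default_rng(0)

def denoise_step(noisy, predict_noise, step, total_steps):
    """One reverse-diffusion step: subtract a fraction of the
    model's noise estimate from the current sample."""
    return noisy - predict_noise(noisy) / (total_steps - step)

# Stand-in "model": pretend the target video is all zeros, so the
# noise estimate is simply the current sample itself.
predict_noise = lambda x: x

sample = rng.normal(size=(4, 8, 8))   # 4 tiny 8x8 "frames" of pure noise
for step in range(50):
    sample = denoise_step(sample, predict_noise, step, 50)

print(np.abs(sample).max())  # 0.0 — the noise has been fully removed
```

The point is only the shape of the process: every pass removes a bit more noise, and after many steps the static has been sculpted into the target.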
The model uses a transformer architecture, similar to GPT models, which improves its scaling performance. Sora breaks down videos and images into smaller units called patches, similar to GPT’s tokens. This approach enables Sora to train on a diverse range of visual data, accommodating various durations, resolutions, and aspect ratios.
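The exact patching scheme is not public; a minimal sketch of the general idea, assuming non-overlapping spacetime patches whose sizes evenly divide the clip dimensions (all names and sizes here are illustrative):

```python
import numpy as np

def extract_patches(video, patch_t=2, patch_h=16, patch_w=16):
    """Split a video tensor (frames, height, width, channels) into
    flattened spacetime patches — the visual analogue of text tokens."""
    T, H, W, C = video.shape
    patches = (
        video.reshape(T // patch_t, patch_t,
                      H // patch_h, patch_h,
                      W // patch_w, patch_w, C)
             .transpose(0, 2, 4, 1, 3, 5, 6)               # group the patch axes together
             .reshape(-1, patch_t * patch_h * patch_w * C)  # one row per patch
    )
    return patches

# A 4-frame, 32x32 RGB clip yields (4/2) * (32/16) * (32/16) = 8 patches.
clip = np.random.rand(4, 32, 32, 3)
print(extract_patches(clip).shape)  # (8, 1536)
```

Because every clip, whatever its duration, resolution, or aspect ratio, reduces to a sequence of such patches, the transformer can train on them all uniformly, the same way GPT trains on token sequences of varying length.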
Sora’s development incorporates learnings from DALL·E and GPT models. It uses DALL·E 3’s recaptioning technique, which generates detailed captions for visual data, helping the model to better follow text instructions in video generation.
Sora is currently unavailable to the public. OpenAI has stated that it is available to red teamers to assess critical areas for harm or risks. Additionally, they are granting access to several visual artists, designers, and filmmakers.
If you come across anyone online selling access to Sora, be aware that it is a scam. Do not buy it.
Sora makes it easy to create videos without any CGI, even with multiple characters and complex camera motion. While some examples may still be recognizable as AI-generated content, others are almost indistinguishable. Here are a few screenshots of the videos. For more, visit Sora’s website.
Prompt: Step-printing scene of a person running, a cinematic film shot in 35mm.
Prompt: Archeologists discover a generic plastic chair in the desert, excavating and dusting it with great care.
Prompt: A grandmother with neatly combed grey hair stands behind a colorful birthday cake with numerous candles at a wood dining room table, expression is one of pure joy and happiness, with a happy glow in her eye. She leans forward and blows out the candles with a gentle puff, the cake has pink frosting and sprinkles and the candles cease to flicker, the grandmother wears a light blue blouse adorned with floral patterns, several happy friends and family sitting at the table can be seen celebrating, out of focus. The scene is beautifully captured, cinematic, showing a 3/4 view of the grandmother and the dining room. Warm color tones and soft lighting enhance the mood.
The tool is very advanced, but there is a concern that people may use it to create hateful content. OpenAI says it is developing new safety techniques to prepare the model for deployment and is leveraging the safety methods already built for products that use DALL·E 3, which also apply to Sora.
For instance, their text classifier will check and reject text input prompts that violate usage policies, such as those requesting extreme violence, sexual content, hateful imagery, celebrity likeness, or the intellectual property of others.
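OpenAI has not described how its classifier works; as a toy sketch of the gatekeeping pattern only (the real system is a learned model, and the category list and keywords below are purely illustrative), the prompt check amounts to:

```python
# Illustrative only — a real moderation classifier is a trained model,
# not a keyword lookup.
BLOCKED_CATEGORIES = {
    "extreme violence": ["gore", "torture"],
    "hateful imagery": ["hate symbol"],
    "celebrity likeness": ["celebrity"],
}

def check_prompt(prompt):
    """Return (allowed, violated_category): reject the request before
    any video generation happens if a blocked category matches."""
    lowered = prompt.lower()
    for category, keywords in BLOCKED_CATEGORIES.items():
        if any(keyword in lowered for keyword in keywords):
            return False, category
    return True, None

print(check_prompt("A grandmother blows out birthday candles"))  # (True, None)
print(check_prompt("A scene full of gore"))                      # (False, 'extreme violence')
```

The design choice worth noting is that the gate sits in front of the expensive generation step, so a violating prompt is refused outright rather than filtered after the fact.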
It is currently unclear how well this will work in practice: extracting supposedly hidden information from custom GPTs, for example, is relatively easy with some manipulation of ChatGPT.
With the announcement of Sora, there are concerns about job losses, as the tool appears capable of replacing actors, camera operators, and animators.
Some argue that AI-generated content will not be as good as human creations, while others are excited about the prospect of creating videos without the need for a large budget or expensive equipment.
So here are a few of the most interesting takes I found online:
“Deepfakes will be harder to point out. This has political and revenge porn consequences for them. The era of disinformation is here and AI is helping, at a rapid pace, by becoming harder to detect.”
“I animate because I love to animate. Not for money, not for fame. I am as affected by a computer being better than me as I am affected by other humans being better than me. It doesn’t discourage me, in fact, it makes me want to be better.”
“At this point I’ve basically resigned myself to the ‘wait and see’ approach that the rest of humanity has toward AI.”
I agree with the last opinion. New tools are being announced and launched frequently, making it difficult to keep up. As the last comment stated, we will have to wait and see how Sora impacts our lives.
Unfortunately, there is no release date for Sora, so we can only hope that it will be released in 2024.
I thought 2023 was a big year for artificial intelligence. I couldn’t have been more wrong. I used to dismiss generative AI tools for video creation because many of them were subpar and not worth anyone’s attention.
My experience with Midjourney, for instance, showed that while it handles two or three people well, it struggles to create realistic images of larger groups of characters. OpenAI’s text-to-video model, Sora, on the other hand, is so advanced that it’s difficult to tell whether the output is AI-generated, even when it depicts a large crowd.
I am a bit concerned about people creating hateful or harmful content. We will have to see how OpenAI handles the important safety steps ahead; if it does not, chaos will follow.
Overall, Sora is one of the most impressive generative AI tools to date. I can’t wait to try it myself, and I am also very curious how it will be priced.