OpenAI’s New Text-to-Video Tool is a Game-Changer. Here’s How it Works
If OpenAI’s GPT-4 wasn’t impressive enough, then Sora, their new text-to-video tool, certainly is. This groundbreaking technology produces lifelike video that is hard to believe was generated from a text prompt of only a sentence or two.
Here’s how it works (without getting too technical) and what it could mean for the future of content creation.
Sora’s Demo Videos are Stunning
On OpenAI’s website, you can view the first videos generated by Sora that have been shown to the public. As you scroll down, OpenAI provides comments on each batch of videos, discussing the technology’s use cases, capabilities, and limitations.
They also provide a Safety section in an attempt to get out in front of any concerns about people abusing the technology, such as creating videos using someone’s likeness without their consent, or generating violent or inappropriate content.
How Can You Tell an AI-Generated Video from an Authentic One?
On their website, OpenAI describes how their videos incorporate C2PA metadata, which is embedded in videos and images to record their origin. Although this will tell you if a video was initially created by Sora, the metadata can be lost once the video is posted on social media or copied through a screenshot or screen recording.
This could create major problems moving forward, so OpenAI says they’re also developing software that will be able to identify AI-generated videos and images without the need for C2PA.
How Does Sora Actually Work?
Sora is a diffusion model (technically a diffusion transformer, which refers to the architecture it’s based on). To summarize, Sora starts with a video of pure static and then refines it, step by step, into something that resembles the prompt you’ve given it. It uses its understanding of language to determine what you mean, and then draws on what it learned from a dataset of video and image content to portray the scene you’ve described.
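To make the "start with static, then refine" idea concrete, here is a toy sketch in Python. It is not how Sora is implemented: the function name `toy_denoise` and its parameters are made up for illustration, and the code cheats by knowing the finished picture in advance. A real diffusion model never sees the target; a trained neural network predicts what noise to remove at each step.

```python
import random

def toy_denoise(target, steps=50, seed=0):
    """Start from random static and nudge it toward `target` each step.

    Hypothetical stand-in for illustration only: a real diffusion model
    has no access to the target and uses a trained network instead.
    """
    rng = random.Random(seed)
    frame = [rng.gauss(0.0, 1.0) for _ in target]  # pure static
    for step in range(steps):
        remaining = steps - step
        # Close a fraction of the remaining gap; the final step closes it fully.
        frame = [x + (t - x) / remaining for x, t in zip(frame, target)]
    return frame

# A three-"pixel" image stands in for a full video frame.
target = [0.2, 0.8, 0.5]
result = toy_denoise(target)
print([round(x, 3) for x in result])
```

The point of the sketch is the loop: each pass leaves the frame a little less noisy than the pass before, which is the general shape of the process described above.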
Sora’s dataset is based on video and images gathered from the internet. It essentially breaks all of this content into blocks that OpenAI calls “spacetime patches.” These are the building blocks it uses to assemble videos. After a great deal of computation to refine the images, the result can be strikingly lifelike.
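For a rough picture of what a "spacetime patch" is, the Python sketch below cuts a tiny video into small blocks that each cover a few frames (time) and a small square of pixels (space). The function name `spacetime_patches` and the patch sizes are assumptions chosen for illustration, and the sketch works on raw pixel values for simplicity. OpenAI has not published Sora's code, and its technical report describes taking patches from a compressed version of the video rather than from raw pixels.

```python
def spacetime_patches(video, pt, ph, pw):
    """Split a video (frames x rows x cols nested lists) into patches.

    Each patch spans `pt` frames, `ph` rows and `pw` columns, flattened
    into one list. Assumes the dimensions divide evenly by the patch size.
    """
    frames, rows, cols = len(video), len(video[0]), len(video[0][0])
    if frames % pt or rows % ph or cols % pw:
        raise ValueError("video dimensions must be divisible by patch size")
    patches = []
    for t0 in range(0, frames, pt):
        for r0 in range(0, rows, ph):
            for c0 in range(0, cols, pw):
                patch = [
                    video[t][r][c]
                    for t in range(t0, t0 + pt)
                    for r in range(r0, r0 + ph)
                    for c in range(c0, c0 + pw)
                ]
                patches.append(patch)
    return patches

# 4 frames of 4x4 "pixels", each pixel numbered so patches are easy to inspect.
video = [[[t * 16 + r * 4 + c for c in range(4)] for r in range(4)]
         for t in range(4)]
patches = spacetime_patches(video, pt=2, ph=2, pw=2)
print(len(patches), "patches of", len(patches[0]), "values each")
print(patches[0])
```

Because every patch has the same size no matter how long or large the original video is, a model can treat clips of different lengths and resolutions as sequences of the same kind of block.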
What Else Can Sora Do?
Apart from creating videos from text prompts, Sora can turn images into videos. When the user uploads a still image, Sora can animate it and bring it to life.
Sora can also extend the length of existing videos. So if you upload a video and want it to include footage that wasn’t in the original, Sora can generate it. However, examples of this are still very limited.
What are Sora’s Limitations?
One of Sora’s main issues is imitating the movement of people’s hands in a natural way. If you look closely at the video examples OpenAI has provided, people’s hand movements are at least a little off. In some examples the hands move very strangely.
Sora also sometimes creates motion that isn’t physically possible, as they’ve demonstrated on their website with a man facing the wrong way on a treadmill.
And at this point, videos are limited to a max of one minute in length, although considering how quickly this technology has developed, we can expect that to be surpassed very soon. After all, only one year ago, an AI-generated video of Will Smith eating pasta was about the best that text-to-video AI generators could manage.
Lastly, and this is more of a critique than it is a limitation, some of the videos are downright creepy. On the OpenAI website, one of the first videos that they chose to demonstrate the power of Sora is a short clip of a collection of old televisions piled on top of one another as an exhibit in a museum. Each of the screens is supposed to show clips from old sitcoms and classic-era horror movies. The result is a bunch of weird and rather disturbing images of people’s faces morphing into creatures, along with other unnatural mashups.
How Can Sora Be Utilized Once it’s Released to the Public?
The applications for this technology are broad-ranging. Stock video for things like advertisements and digital signage could be generated at the click of a button. Sora will be a marketer’s dream once it’s fully released.
Entertainment will be another major use for Sora. It’ll be the source of viral videos at the very least. But once the technology is really refined, you could possibly even create your own movies with it. Imagine giving a prompt that you want another sequel to your favorite film franchise, except with you as the main character.
At minimum, Sora should be really cool to toy with and create content with. If it reaches its full potential, though, Sora will be a total game-changer for content creators and could be the start of a new era.