Random Observation/Comment #832: Definitely living in a simulation.
Why this List?
How did we go from that wacky Will Smith eating spaghetti to this freaking gorgeous text-to-video technology from OpenAI Sora? If you missed it, this release is as significant as the first Dall-E release. Many production genres will be impacted by the speed and cost reduction. Here are some notable observations and predictions for when AI-generated video approaches the quality of human-filmed video.
90% Believable video generations – Based on the examples, I could be fooled by some of the drone shots, or by a short clip if I weren’t paying attention or actively looking for tells.
Believable physics – The model must have trained on some Unreal Engine 5 data or something because the water flows and reflections just look incredible.
Consistent characters – The full clips keep the same character face and expressions with consistent attributes throughout the generated video.
Multiple camera angle panning – The clips change camera angles or pan across multiple moving pieces very smoothly, as if each object were interacting with the scene.
60 second generations – If you can do 60 seconds, you can probably do 2 hours. It’s a far cry from the 4-16 second outputs of RunwayML Gen-2.
You can take a reference video and change attributes – There was a great demo of a car driving down a road where you can edit the environment or the type of car using the reference shot.
RIP stock footage – Sizzle reels have always used short 2-4 second stock clips of pans to introduce videos or locations. Now you can make pretty much any generic scene without doing multi-scene edit cuts.
Creator economy democratization – Technology lets you do more with less. We are really allowing everyone not just to be a creator, but to create based on their own preferences.
From searching knowledge to prompting your imagination – Instead of finding content someone else has made online and refining search terms to get different results, you’re using an AI agent to do the searching for you. Does this kill YouTube?
Content explosion – All the tools and skills are a few questions away from being fully generated.
Need for curators and brands – While content volume and velocity will increase, we’ll also find trusted sources that help search and surface the best content. We actually don’t want that much choice.
We are the Directors – We will likely be able to watch movies from any person’s perspective, or get live-generated scenes like in that Black Mirror episode.
Shifting/Streamlining production costs – Movies and professional cinematic material may shift toward the lower cost and wider reach of short-form content.
Personalized content – Blog post hero images very quickly shifted toward funny, creative AI-generated images chosen by the writer. I could probably ask to learn anything and have it generated live.
Concerns for misinformation and deep fakes – There’s even more scrutiny needed, as misinformation tends to emphasize our own biases. Perhaps I’d share a tweet/X post of events or news because it confirms a bias I already hold and want to be true. It can get quite dangerous quite quickly if both the left and the right pick up their own manipulated versions of the news.
Concerns for deflation – If work can be done faster and cheaper, then Operating Expenses (OpEx) shift from employees to infrastructure/software. If there are fewer job openings, or a need for fewer people doing a “good enough” job, then Venture Capital may decrease check sizes. If you decrease the check written to produce content, then a whole industry of actors and crew gets paid less money, or no money at all. It might, however, lead to a higher volume of bets on smaller companies forming overnight.
Renouncing responsibility and credibility – If my sex tape leaks then maybe I can just say it was AI generated.
Unimaginable creative expression – Those shots that were too costly to produce are now in reach. If you want the whole movie to be a one-shot then maybe you can just prompt it to be fully generated this way.
Impact to communication and storytelling – I think my love language is sending Instagram Reels to my wife that describe our love of cats, our daughter, or some home improvement plan. Perhaps now I’ll quickly generate a gif or video that represents my feelings.
Higher quality of marketing and advertising – I wonder what AI content will look like when it’s better than human-edited/directed content. I wonder if the next dimension of complexity is needed so we aren’t limited to just a screen.
Metaverse/VR/AR/XR enhancements – If I can convert and generate new worlds, then we may see content experienced more deeply and in more personal ways. My AI Tamagotchi in my Quest 3 will basically search and show me everything (like Janet in The Good Place).
Interactive content creation – Conversations with AI Non-Player Characters (NPCs) may now require some convincing before you get the answers to the next clue. We may see movies with a level of interaction as closely knitted as games. Every player would play a different game altogether, with very little deterministic outcome.
More watermarking/AI created designations – The opt-in approach of labeling is a bit crazy because the tools themselves will be using AI in different ways too. Perhaps it’d be easier to assume that all content will eventually become AI-digitally manipulated and we just label the content that’s been created by humans.
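If the default flips to assuming everything is AI-manipulated, a “made by humans” label becomes a verifiable claim rather than an opt-in checkbox. Here’s a minimal sketch in Python of that idea; the record schema and function names are hypothetical illustrations, not any real standard (provenance efforts like C2PA are far more involved):

```python
import hashlib

def label_content(data: bytes, origin: str) -> dict:
    """Create a hypothetical provenance record for a piece of content.

    origin: "human" or "ai" -- an illustrative field, not a standard.
    """
    return {
        "sha256": hashlib.sha256(data).hexdigest(),
        "origin": origin,
    }

def verify_label(data: bytes, record: dict) -> bool:
    # The label only holds if the content hash still matches, i.e. any
    # later pixel-level manipulation invalidates the record.
    return hashlib.sha256(data).hexdigest() == record["sha256"]

clip = b"fake video bytes"
record = label_content(clip, origin="human")
print(verify_label(clip, record))              # True
print(verify_label(clip + b"edited", record))  # False
```

The point of the sketch: labeling human-made content is only meaningful if tampering with the content breaks the label.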
Expensive boutique “organic” only non-AI firms making material – I wonder if I would pay more for the whole production or a single editor. If it’s a volume game then we’re likely going to want a fully established brand with an arc or template of shorts rather than a series.
Fashion and Design – Will our virtual avatars and models be accurate enough to buy these assets from the virtual world and receive them in the real world? Will we get fully virtual runways?
Legal and Forensic Reconstruction – How interesting would it be for a jury to watch a full video, rebuilt from multiple angles, of the evidence gathered from a crime scene?
News personalization – Instead of an RSS feed of articles, imagine a podcast or a live newscaster showing you clips and videos from that feed at the beginning or end of your day. It’s a visual summary from your apps delivering the latest information.
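A personalized visual newscast would still start from the same plumbing as today’s RSS readers. A minimal sketch using only Python’s standard library, filtering a feed down to headlines that could then become prompts for a (hypothetical) video generator — the inline feed and the `interest` filter are made-up examples:

```python
import xml.etree.ElementTree as ET

# A tiny inline RSS snippet standing in for a real fetched feed.
RSS = """<rss version="2.0"><channel>
  <title>Example Feed</title>
  <item><title>Sora announced</title><category>ai</category></item>
  <item><title>Local weather</title><category>weather</category></item>
  <item><title>New video model benchmarks</title><category>ai</category></item>
</channel></rss>"""

def headlines(feed_xml: str, interest: str) -> list:
    """Filter feed items down to a user's interest; the resulting titles
    could seed a generated end-of-day video summary."""
    root = ET.fromstring(feed_xml)
    return [
        item.findtext("title")
        for item in root.iter("item")
        if item.findtext("category") == interest
    ]

print(headlines(RSS, "ai"))
# ['Sora announced', 'New video model benchmarks']
```

The generation step is the new part; the selection and personalization step is decades-old feed machinery.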
Tourism and Exploration – I can already see a lot of the collected location data being useful here, but soon capturing travel photos and videos may not even be necessary. Maybe this will be a good thing, and we can learn to live in the present again.
Need for conversion from Video/Images to 3D polygons – One shortcoming of prompting is that the underlying AI generation process is not (yet) structured into layers. If you wanted to update a text field or do something simple like replacing a background, you’d still need to import the output into a file format with more metadata. The benefit of APIs and that level of granularity inside a tool like Canva or Adobe would be generating images with the capabilities of a designer (rather than a regular person just blindly using a tool). This is the “interoperability” buzzword: connecting pixels to manipulable models.
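To make the flat-pixels-versus-layers point concrete, here’s a toy sketch of what a layer-aware generation output might look like. The `Layer` and `Document` classes are entirely hypothetical; real interoperability would mean generators emitting something closer to a PSD- or SVG-style structure that design tools could ingest:

```python
from dataclasses import dataclass, field

@dataclass
class Layer:
    name: str
    kind: str      # e.g. "background", "text", "object"
    content: str   # stand-in for pixels, vectors, or a text string

@dataclass
class Document:
    layers: list = field(default_factory=list)

    def replace(self, name: str, content: str) -> None:
        # Because the output is structured, a simple edit targets one
        # layer instead of regenerating (or inpainting) the whole image.
        for layer in self.layers:
            if layer.name == name:
                layer.content = content

doc = Document([
    Layer("bg", "background", "beach at sunset"),
    Layer("headline", "text", "Summer Sale"),
])
doc.replace("headline", "Winter Sale")
print(doc.layers[1].content)  # Winter Sale
```

With a flat raster, that same headline swap would require masking and re-generating pixels; with structure, it’s a metadata edit.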
This is the worst the tech will be from here on out.
~See Lemons Scared of AI Video
Originally posted on seelemons.com