Confession time. I can easily spend between 1 and 2 weeks creating a 5 minute long video. 3 weeks if I’m being a perfectionist. Of course, those videos are awesome, and are highly tailored to a specific audience. They say what I want to say.
In this blog I’m always looking into efficiency and productivity gains, and wouldn’t it be great if that 2 or 3 weeks became 0 weeks?
Well, who hasn’t heard about developments in AI over the last year? Even if you can’t tell the difference between deep learning and transformers (hint), you’ve seen the whimsical images and maybe you’ve padded out some text on a webpage or two, or even say, learned some Latin.
Of course what we’re interested in for the purposes of this blog post is whether we can use generative AI to make a tutorial video. Not a virally funny video, or a dramatic movie, but an educational video that can teach you how to use a product, or explain a technical concept, or show the steps in a procedure.
But what exactly can current AI do in this case? And by that I mean, what’s available now? Can you just go to a webpage and say “Make me a tutorial video about IT monitoring”? Hold on to that thought while we look at what needs to be done to make a tutorial video.
Here’s our high-level list for AI:
Why do we need an avatar (or person) there? Well, look at your YouTube main page and check out how many thumbnails include a face. Videos with an actual person involved are perceived as more authoritative, relatable and immersive.
That said, as the AI here you can cut some corners. You can make your users write their own script, choose their own B-Roll, etc. It all depends on what data your AI model has been trained on, the resources it has to draw on (e.g., choosing music vs. composing it), etc.
Input: This ranges from a prompt of a few sentences all the way up to submitting an entire script you’ve written.
Script: If you just submit a short prompt, then the system will have to generate the script from that prompt. It might pass your prompt on to an external system like ChatGPT or Copilot, or have an in-house generator that’s either self-trained or built on an open model like Llama.
Audio content: Just about every AI system runs the text through TTS (text-to-speech) synthesis. You may even have the option of multilingual audio, but with the automatic multilingual captioning already in use on video platforms, that’s probably not necessary for low-view-count tutorial videos.
Visual content: The range of visual effects includes graphics, animated graphics, videos (full screen or not), and scene transitions. Not all AI systems can include the full range.
Editability: Once your video is ready, is it a finished product like an MP4, or is it a project you can import into your video editor? If it’s the former, the most you can do to modify it is to add transparent objects on top. I’d prefer a project with embedded resources so I can tweak things.
Expectations: There are a lot of companies and startups with AI solutions, but not even the big tech AI systems are going to produce a video at human-level quality right now. So don’t get your hopes up too high. Below I’m just going to try out some of the free demos found in a Google search.
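To make the stages above a bit more concrete, here is a minimal Python sketch of how such a system might decide what it has to do for a given request. Everything in it is hypothetical: the names `VideoRequest`, `VideoPlan` and `plan_video` are made up for illustration, and no real generator is being called. It only models the decisions described above (prompt vs. full script, MP4 vs. editable project).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class VideoRequest:
    """Hypothetical input: either a short prompt or a complete script."""
    text: str
    is_full_script: bool = False


@dataclass
class VideoPlan:
    """Hypothetical output: the script source plus the ordered stages to run."""
    script: str
    steps: List[str] = field(default_factory=list)


def plan_video(request: VideoRequest, exports_project: bool = False) -> VideoPlan:
    """List the stages a generator would need for this request.

    This is only a planning sketch; no script, audio or video is produced.
    """
    steps: List[str] = []
    if request.is_full_script:
        # The user already wrote the script, so it is used verbatim.
        script = request.text
        steps.append("use submitted script")
    else:
        # A short prompt means the system has to write the script itself.
        script = f"[script to be generated from prompt: {request.text}]"
        steps.append("generate script from prompt")
    steps.append("synthesize narration with TTS")
    steps.append("assemble visuals and transitions")
    # Editability: a finished MP4 versus a project you can open in an editor.
    steps.append("export editable project" if exports_project else "render MP4")
    return VideoPlan(script=script, steps=steps)


if __name__ == "__main__":
    plan = plan_video(VideoRequest("technology tutorial video about IT monitoring"))
    for number, step in enumerate(plan.steps, start=1):
        print(number, step)
```

The point of the sketch is that the only stages that change with your input are the first and the last. The middle of the pipeline is the same whether you type one sentence or paste in a whole script.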
For this blog post I really wanted to see what was publicly (and freely) available. Of course that meant giving out my email address. So I’m taking one for the team, here.
When an AI system wanted a prompt, I used “technology tutorial video talking about IT monitoring, with a male middle aged host”. Other systems wanted a full script, but the demo limited the number of characters. In those cases I just copied a section from one of my existing scripts.
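If you’d rather not pick the section by hand, a small helper can cut a script down to a demo’s character limit without chopping a sentence in half. This is just a sketch under my own assumptions: `trim_to_limit` is a made-up name, the limit is whatever the demo tells you, and sentence ends are detected naively by looking for ".", "!" or "?" followed by a space.

```python
def trim_to_limit(script: str, max_chars: int) -> str:
    """Return the longest prefix of `script` within `max_chars` characters,
    ending on a sentence boundary when possible (naive punctuation check)."""
    if len(script) <= max_chars:
        return script

    cut = script[:max_chars]

    # If the cut happens to land exactly at the end of a sentence, keep it all.
    if cut[-1] in ".!?" and script[max_chars].isspace():
        return cut

    # Otherwise back up to the last sentence-ending punctuation followed by a space.
    end = max(cut.rfind(". "), cut.rfind("! "), cut.rfind("? "))
    if end != -1:
        return cut[: end + 1]

    # No full sentence fits: fall back to the last whole word.
    space = cut.rfind(" ")
    return cut[:space] if space > 0 else cut


if __name__ == "__main__":
    sample = "Monitoring starts with agents. Agents collect data. Then you build alerts."
    print(trim_to_limit(sample, 60))
```

It won’t cope with abbreviations like "e.g." in the middle of a sentence, so give the result a quick read before pasting it into the demo.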
To pick some AI generators, I searched on Google for “AI tutorial video generator” and picked 4 of the results more or less at random. In all cases I tried the demo version only, which obviously doesn’t have the entire feature set of the full version.
So let’s take a look at a representative sample, then we’ll draw some quick conclusions.
Website URL: https://ai.invideo.io/
Videos: https://www.youtube.com/watch?v=yOu0PYVmYbw
Website URL: https://www.synthesia.io/
Videos: https://www.youtube.com/channel/UC0Rqs6pyPoGaMT5HFMFdslg
Website URL: https://www.canva.com/
Videos: https://www.youtube.com/watch?v=bETbdPU8BAE
Website URL: https://app.elia.io/
Video: https://www.youtube.com/watch?v=CNWmlXf-RBs
That’s what I’ve found. There sure do seem to be a lot of these websites out there, so you may find one even more interesting. So how can the current crop of AI movie generators help us?
Of course the ideal option would be to use a prompt-based system so that we don’t have to write the script ourselves. But unless you write a very long, detailed prompt, you’re only going to get a very generic script.
But then why don’t we just write the script instead of writing that long prompt? At least then it will say exactly what we want to say, how we want to say it.
More importantly, none of the AI systems above creates a project; they only make MP4 videos. That means if you want to make some changes, you’re going to have to rewrite your prompt again and again.
But we’re in the early stages of AI video production. In the next few years we’ll find out if these generators move from being toys to tools.