27. 08. 2024 Charles Callaway Documentation

Using AI to Create Tutorial Videos

Confession time. I can easily spend one to two weeks creating a five-minute video, or three weeks if I’m being a perfectionist. Of course, those videos are awesome and highly tailored to a specific audience. They say what I want to say.

In this blog I’m always looking for efficiency and productivity gains. Wouldn’t it be great if those two or three weeks became zero?

Well, who hasn’t heard about developments in AI over the last year? Even if you can’t tell the difference between deep learning and transformers (hint), you’ve seen the whimsical images, and maybe you’ve padded out some text on a webpage or two, or even, say, learned some Latin.

Of course what we’re interested in for the purposes of this blog post is whether we can use generative AI to make a tutorial video. Not a virally funny video, or a dramatic movie, but an educational video that can teach you how to use a product, or explain a technical concept, or show the steps in a procedure.

But what exactly can current AI do in this case? And by that I mean, what’s available now? Can you just go to a webpage and say “Make me a tutorial video about IT monitoring”? Hold on to that thought while we look at what needs to be done to make a tutorial video.

Tutorial Making Steps

Here’s our high-level list of what the AI needs to do:

  • Create a script
  • Convert that script to audio (a human creator like me would record video instead)
  • Create an avatar that can read the script
  • Create relevant graphics, animations and B-Roll video in appropriate places
  • Composite all the elements into a single video
  • Synchronize graphics and animations to the script
  • Optional: add sound effects and background music
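To make the synchronization step a bit more concrete, here is a minimal sketch of how a system might decide when each graphic should appear, assuming the narration is spoken at a constant rate. The names `Cue` and `cue_times` are hypothetical, and the default of 150 words per minute is an assumed speaking rate, not a figure from any of the tools discussed below. Some TTS services can report actual per-word timings, which would be far more accurate than this estimate.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    """Hypothetical cue: show `graphic` when the word at `word_index` is spoken.

    `word_index` is a zero-based index into the whitespace-split script.
    """
    word_index: int
    graphic: str


def cue_times(script: str, cues: list[Cue],
              words_per_minute: float = 150.0) -> list[tuple[float, str]]:
    """Estimate start times (in seconds) for each cue.

    Assumes a constant speaking rate (default 150 wpm, an assumption);
    real narration varies, so this is only a rough first pass.
    """
    words = script.split()
    seconds_per_word = 60.0 / words_per_minute
    timeline = []
    for cue in cues:
        if not 0 <= cue.word_index < len(words):
            raise ValueError(f"cue index {cue.word_index} is outside the script")
        start = round(cue.word_index * seconds_per_word, 2)
        timeline.append((start, cue.graphic))
    # Return cues in the order they should appear on screen.
    return sorted(timeline)


script = "Open the dashboard, then add a new host and check its CPU graph."
cues = [
    Cue(2, "dashboard.png"),   # "dashboard,"
    Cue(4, "add_host.gif"),    # "add"
    Cue(11, "cpu_graph.png"),  # "CPU"
]
for start, graphic in cue_times(script, cues):
    print(f"{start:5.2f}s  {graphic}")
```

A real system would replace the constant-rate estimate with timings from the synthesized audio, but the overall shape (script positions mapped to times, then graphics placed on a timeline) stays the same.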

Why do we need an avatar (or person) there? Well, look at your YouTube main page and check out how many thumbnails include a face. Videos with an actual person involved are perceived as more authoritative, relatable and immersive.

That said, an AI system can cut some corners. It can make users write their own script, choose their own B-Roll, etc. It all depends on what parameters the AI model has been trained on, the resources it has to draw on (e.g., choosing music vs. composing it), etc.

Think Through the Elements

Input: This ranges from a prompt of a few sentences all the way up to submitting an entire script you’ve written.

Script: If you just submit a short prompt, then the system will have to generate the script from that prompt. It might pass your prompt on to an external system like ChatGPT or Copilot, or use an in-house generator that’s either self-trained or open source, like Llama.

Audio content: Just about every AI system runs the text through TTS (text-to-speech) synthesis. You may even have the option of multilingual audio, but with the automatic multilingual captioning already in use on video platforms, that’s probably not necessary for low-view-count tutorial videos.

Visual content: The range of visual effects includes graphics, animated graphics, videos (full screen or not), and scene transitions. Not all AI systems can include the full range.

Editability: Once your video is ready, is it a finished product like an MP4, or is it a project you can import into your video editor? If it’s the former, the most you can do to modify it is add transparent objects on top. I’d prefer a project with embedded resources so I can tweak things.

Expectations: There are a lot of companies and startups with AI solutions, and not even the AI systems from big tech companies can produce a video at human-level quality right now. So don’t get your hopes up too high; below I’m just going to try out some of the free demos I found with a Google search.

Specific Examples

For this blog post I really wanted to see what was publicly (and freely) available. Of course, that meant giving out my email address, so I’m taking one for the team here.

When an AI system wanted a prompt, I used “technology tutorial video talking about IT monitoring, with a male middle aged host”. Other systems wanted a full script, but the demo limited the number of characters. In those cases I just copied a section from one of my existing scripts.

To pick some AI generators, I searched on Google for “AI tutorial video generator” and picked 4 of the results more or less at random. In all cases I tried only the demo versions, which obviously don’t have the entire feature set of the full versions.

So let’s take a look at a representative sample, then we’ll draw some quick conclusions.

Invideo

Website URL: https://ai.invideo.io/
Videos: https://www.youtube.com/watch?v=yOu0PYVmYbw

  • Inputs: Prompt, video size parameters, social media destination
  • Result: Created a downloadable 1-minute FullHD video in about 3 minutes
  • Video description: Using the prompt, it writes the script and selects stock videos from online sites, then synthesizes the script and syncs it to the videos, with cuts between them. Unlike other systems, it didn’t add an avatar; you only see stock B-Roll video with a voiceover.
  • Thoughts: Most of the AI is in the B-Roll selection, script writing and voice synthesis. In the demo version I didn’t see any intelligence in how the video segments were ordered. However, the videos were relevant. The most helpful part of the AI here for me would be quickly and easily finding useful videos on stock video sites, a task I could easily spend hours on, but which takes only minutes with Invideo.

Synthesia

Website URL: https://www.synthesia.io/
Videos: https://www.youtube.com/channel/UC0Rqs6pyPoGaMT5HFMFdslg

  • Input: Full pre-written script (200 character limit in demo mode)
  • Result: Created a 12-second 720×400 video in about 7 minutes; the link was sent by email
  • Video description: The video stars an avatar that’s based on videos of an actual person, in front of a background. The AI synthesizes audio from the script and lip syncs it to the avatar. The full version lets you choose from multiple avatars and use a longer script.
  • Thoughts: The demo version is very limited compared to the real version. If I wanted to do everything automatically, I would first need another website to generate the script.

Canva

Website URL: https://www.canva.com/
Videos: https://www.youtube.com/watch?v=bETbdPU8BAE

  • Input: A prompt
  • Result: Created a downloadable 4-second FullHD video in about 2 minutes
  • Video description: Canva created a photo-realistic AI video of a middle-aged male host at a computer desk (just like I asked), moving and speaking very slowly. The demo version didn’t include audio, though.
  • Thoughts: Judging just from the demo version, this is really oriented towards creating stock videos that you can then incorporate into other videos. The video Canva created really was photo-realistic, and if you’re good with AI prompts, you can probably easily make a B-Roll video for when you can’t find the exact one you need on a stock site.

Elia

Website URL: https://app.elia.io/
Video: https://www.youtube.com/watch?v=CNWmlXf-RBs

  • Input: A pre-written script
  • Result: Created a FullHD video of an avatar speaking the script using TTS
  • Video description: Elia’s video had an avatar synchronized with the synthesized audio of my script. The avatar looks more natural than others I’ve seen, with
  • Thoughts: It’s part of an online video editing suite, and you can drop the avatar video, with its transparent background, on top of your other footage.

Conclusions

That’s what I’ve found. There sure do seem to be a lot of these websites out there, so you may find one even more interesting. So how can the current crop of AI video generators help us?

Of course, the ideal option would be a system that generates the script from a prompt, so that we don’t have to write it ourselves. But unless you write a very long, detailed prompt, you’re only going to get a very generic script.

But then why don’t we just write the script instead of writing that long prompt? At least then it will say exactly what we want to say, how we want to say it.

More importantly, none of the AI systems above creates a project, they only make MP4 videos. That means if you want to make some changes, you’re going to have to rewrite your prompt again and again.

But we’re in the early stages of AI video production. In the next few years we’ll find out if they move from being toys to tools.
