Tech Corner: Roll Your Own Captions – Part One

by Michael Lockrey

This is the first part of a series of short articles that aims to show you how to create your own captions. The series is aimed squarely at the complete beginner. By the end of this step-by-step project, I guarantee that you will have the knowledge to “roll your own captions”.

As an advocate for captioning accessibility, I all too often come up against the following responses from governments, businesses and other organizations when seeking access through captioning:

  1. We’ll look into it for you or we’re currently looking into it/working on it, etc (real meaning: we’re not going to do anything and we hope you just go away! / leave us alone!)
  2. We can’t possibly afford to do this (real meaning: we have no idea what this would cost but it looks far too expensive!)
  3. We don’t have to do this under the law (real meaning: we haven’t even checked with our lawyers but it’s a good excuse isn’t it?)
  4. No response at all (real meaning: you can get #$%#$#^!)

(Note: Paraphrasing John Waldo above, with thanks.)

About three years ago, the biggest issue I faced as an advocate was that I didn’t have a basic understanding of the technical side of creating captions. I vowed to teach myself how to create them. I’ve since achieved this goal and have even recently launched my own captioning and web accessibility business, Melel Media.

With the introduction of Google’s voice recognition captioning tools on YouTube, there are now no excuses for not captioning Internet content. It is now extremely easy to create good-quality closed captions (i.e., pre-prepared, “block” captions that come up two or three lines at a time, in sync with the audio track).

The first step is to create a plain-text transcript of the media you want to caption. In our experience this represents about 70% to 80% of the time spent on creating pre-prepared, “block” closed captions. We outsource this step to a company called Casting Words, paying $1.50 to $2.50 per audio minute for their high-quality, professional text transcription service. Another good service is SpeakerText. If you can source good-quality text transcripts for around $1.00 per audio minute or less, then you have found very good value for money.
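To put those per-minute rates in perspective, here is a minimal Python sketch that estimates what a transcript might cost for a given length of audio. The function names are made up for this illustration, and the default rates are simply the $1.50 and $2.50 figures quoted above; actual pricing will depend on the service you choose.

```python
# Rough transcription cost estimate.
# The default rates are the per-audio-minute figures quoted in the article;
# the function names are hypothetical, chosen only for this example.

def transcription_cost(duration_minutes, rate_per_minute):
    """Return the cost in dollars of transcribing the given amount of audio."""
    return round(duration_minutes * rate_per_minute, 2)


def cost_range(duration_minutes, low_rate=1.50, high_rate=2.50):
    """Return a (low, high) tuple of estimated costs in dollars."""
    return (
        transcription_cost(duration_minutes, low_rate),
        transcription_cost(duration_minutes, high_rate),
    )


if __name__ == "__main__":
    low, high = cost_range(10)
    print(f"10 minutes of audio: ${low:.2f} to ${high:.2f}")
```

For a ten-minute video, this works out to somewhere between $15.00 and $25.00 at the rates above.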

Other cheaper options include using family and friends who can hear well. Melel Media has often used family and friends and we’ve found it’s a particularly useful opportunity to “grill them” on their spelling abilities (or lack thereof). I’m talking about you, young nephew!

You can see an example of a basic plain text transcript used in our “Gone with the Wind” captioning example below:

[Scarlett O'Hara:]
Rhett! Rhett!
If you go, where shall I go?
What shall I do?

[Rhett Butler:]
Frankly, my dear,
I don’t give a damn.

[stirring music]


Your assignment for Part One:

  1. Choose a short YouTube video which doesn’t have closed captions and which is under a minute in duration.
  2. Use any program you regularly use to download a copy of the video from the internet.
  3. Create a plain text transcript of the audio track for the video you’ve downloaded, either by using an outsourced service provider such as Casting Words, or by relying on your own residual hearing or on family and friends. For Mac OS X users, I recommend using the “TextEdit” application. For PC users, the “Notepad” application should be fine for this step.

For different speakers, place each speaker’s name, followed by a colon, between square brackets, such as:

[Michael Lockrey:] To be or not to be.
[Bozo the Clown:] That is the question.


For audio effects, use a description between brackets such as:

[uptempo instrumental music]
[buzzer on oven rings out]


  4. Save the text transcript and wait eagerly for Part 2 of this series when we will create our closed caption file.
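If you are comfortable with a little programming, the bracket conventions from step 3 can be checked mechanically before you save. The following is only a sketch in Python, not part of the captioning workflow itself: the function name `parse_transcript` and the two patterns are my own assumptions about how to read the format, namely that a speaker label looks like [Name:] and an audio effect looks like [description].

```python
import re

# Assumed patterns for the two bracket conventions described in the article.
SPEAKER = re.compile(r"^\[([^\[\]]+):\]\s*(.*)$")  # e.g. [Rhett Butler:] optional dialogue
SOUND = re.compile(r"^\[([^\[\]:]+)\]$")           # e.g. [stirring music]


def parse_transcript(text):
    """Split a plain-text transcript into (kind, speaker, content) tuples.

    kind is "speech" or "sound"; speaker is None for audio effects and for
    any dialogue that appears before the first speaker label.
    """
    entries = []
    speaker = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # blank lines only separate blocks
        match = SPEAKER.match(line)
        if match:
            speaker = match.group(1).strip()
            if match.group(2):  # dialogue on the same line as the label
                entries.append(("speech", speaker, match.group(2)))
            continue
        match = SOUND.match(line)
        if match:
            entries.append(("sound", None, match.group(1).strip()))
            continue
        entries.append(("speech", speaker, line))
    return entries


if __name__ == "__main__":
    sample = "[Michael Lockrey:] To be or not to be.\n[buzzer on oven rings out]\n"
    for entry in parse_transcript(sample):
        print(entry)
```

Running it over your own transcript and reading through the output is a quick way to spot a speaker label that is missing its colon or a sound effect that has lost a bracket.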

Note: If you have any questions on the assignment, send me an email and I would be more than happy to help you through the steps in readiness for Part 2 of the “Roll Your Own Captions” project.

