Some Captioning Best Practices


By Chelsea, last updated Monday, March 24, 2025.

Plain language summary

People who can't hear still want to watch videos. They rely on written captions to understand what is happening. People who make captions need to do lots of things besides just write out the text. By identifying speakers, writing out sounds, and formatting things clearly, captions can provide the same information to people who can't hear.

What are Captions?

Captions are a synchronized transcription of audio content. You might more commonly know them as 'subtitles.' They were designed as an accessibility tool for Deaf and Hard of Hearing folks.

Nowadays, large swaths of the population use captions. WestCentralOnline reports that 53% of Millennials use captions regularly, and 70% of Gen Z viewers use them frequently. Streaming services are reporting that more and more viewers turn on captions by default.

At some point in your life, you might be responsible for creating captions for a piece of media. This article will go over some not-so-obvious things that must be considered when you write captions for Deaf and Hard of Hearing audiences.

1. Speaker changes should be indicated with dashes

Say we have a recording from an improv show, and two improvisors are taking turns speaking four words at a time.

Here's what not to do: ❌


Welcome to our little

House on the prairie,

Where all magic happens

And super smiles abound!

Instead, use a dash to indicate that the speaker has changed, like so: ✅


- Welcome to our little

- House on the prairie,

- Where all magic happens

- And super smiles abound!

2. Speakers that can't be visually identified should be identified in text

Let's imagine a very similar scenario, except the video is from a different angle. The camera is filming the audience's reactions, and the improvisors are both offscreen.

We need to clarify who is speaking, and we do so by putting the speaker's name inside square brackets [ ].

Remember to still put the dashes for speaker changes! ✅


- [Sam] Welcome to our little

- [Ali] House on the prairie,

- [Sam] Where all magic happens

- [Ali] And super smiles abound!

Let's imagine another angle. This time, the video was taken by Ali's girlfriend, who has Ali up in full zoom. Ali's girlfriend has fully excluded Sam from her shot.

This means Ali is a visually identifiable speaker, while Sam cannot be visually identified. This should be reflected like this in our captions: ✅


- [Sam] Welcome to our little

- House on the prairie,

- [Sam] Where all magic happens

- And super smiles abound!
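If you ever generate captions from a script or a spreadsheet, the dash and bracket rules above are simple enough to sketch in code. This is only a rough Python illustration, not a standard tool: the Turn type, its on_screen flag, and the format_turns function are hypothetical names made up for this example, and it assumes someone has already noted who is visible in each shot.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    speaker: str     # who is talking
    text: str        # what they say in this caption
    on_screen: bool  # True if viewers can see the speaker (an assumed input)


def format_turns(turns):
    """Return one caption string per turn, applying the dash and bracket rules."""
    captions = []
    previous_speaker = None
    for turn in turns:
        prefix = ""
        if turn.speaker != previous_speaker:
            # Rule 1: a dash marks every change of speaker.
            prefix = "- "
            if not turn.on_screen:
                # Rule 2: name speakers the viewer cannot see.
                prefix += f"[{turn.speaker}] "
        captions.append(prefix + turn.text)
        previous_speaker = turn.speaker
    return captions


# The zoomed-in shot: Ali is visible, Sam is not.
turns = [
    Turn("Sam", "Welcome to our little", on_screen=False),
    Turn("Ali", "House on the prairie,", on_screen=True),
    Turn("Sam", "Where all magic happens", on_screen=False),
    Turn("Ali", "And super smiles abound!", on_screen=True),
]
print("\n\n".join(format_turns(turns)))
```

Run on the zoomed-in scenario, this prints the same four captions as the example above. A real workflow would also need to handle sounds and camera cuts, which this sketch ignores.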

3. If someone intentionally speaks a word, caption it

Filler words exist in English. Generally, you'll want to cut filler words like 'um' and 'huh' when there are so many of them that the subtitles become unreadable. Leaving a few in is fine. Sometimes people will lean on a filler word for emphasis, and you definitely do not want to cut that out.

Punctuate as best as you can to keep the text readable.

If they speak a contraction, caption the contraction. If they do not use a contraction, do not add one.

If a swear pops up and it is uncensored, type it out uncensored. If it is censored, replace it with an appropriate sound like (beep).

If you really cannot determine what the word was and you have no way of figuring it out, replace the word with the keyword (indistinct).

The following example demonstrates several of these principles: keeping a filler word, leaving a swear uncensored, punctuating for readability, and using (indistinct). ✅


- Ali! I have told you literally 
so many times not to knock

on the door of the Crone Hagmother.

- I'm sorry.

- Fucking (indistinct) force-of-nature!

I hope this is not going to-

- Um, Sam, did you hear that? 

(string winding)


4. Break up text into chunks that support readability

Let's talk about the anatomy of a caption.

A caption can have one line or two lines. In the previous example, the first caption had two lines, and the rest had one.

Each line should be no more than 60 characters long. It's not a hard cutoff (sometimes you can go one or two characters over), but stick to 60 if you can.

There are plenty of sentences that are more than 60 characters long. This means that we must break up the text into separate lines and captions.
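If you like checking your work automatically, the two limits above (at most two lines per caption, about 60 characters per line) can be tested with a few lines of Python. This is a minimal sketch under my own assumptions: the check_caption function and the constant names are hypothetical, a caption is assumed to be a single string with a newline between its lines, and len() counts characters rather than on-screen width.

```python
MAX_LINES = 2         # a caption has one or two lines
MAX_LINE_LENGTH = 60  # the soft limit used in this article


def check_caption(caption):
    """Return a list of human-readable problems found in one caption."""
    problems = []
    lines = caption.split("\n")
    if len(lines) > MAX_LINES:
        problems.append(f"{len(lines)} lines (limit is {MAX_LINES})")
    for number, line in enumerate(lines, start=1):
        # Ignore trailing spaces, which are invisible on screen.
        length = len(line.rstrip())
        if length > MAX_LINE_LENGTH:
            problems.append(
                f"line {number} has {length} characters (limit is {MAX_LINE_LENGTH})"
            )
    return problems


good = "- Ali! I have told you literally\nso many times not to knock"
print(check_caption(good))      # prints []
print(check_caption("x" * 61))  # prints ['line 1 has 61 characters (limit is 60)']
```

Since the 60-character limit is a soft one, treat the output as a list of captions worth a second look rather than a list of errors.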

In breaking up the text, we want to maintain a high level of readability. I've broken up Sam's monologue in three ways. Which one is most readable to you?

Here are the three options:

Option #1


- Everybody knows literally everybody 
in Winnipeg except for us. 

I'm sure that the Crone Hagmother 
wields people like us into her trap, 

forcing us to knock on her door 
and play servant to her whims. 

I think I know the answer. 

I think we have to have a baptism 
in the muddy waters of the Red River. 


Option #2


- Everybody knows literally everybody in 
Winnipeg except for us. I'm sure 

that the Crone Hagmother wields people 
like us into her trap, forcing us to 

knock on her door and play servant to 
her whims. I think I know the answer. 

I think we have to have a baptism in 
the muddy waters of the Red River. 


Option #3


- Everybody knows literally 
everybody in Winnipeg except for us. 

I'm sure that the Crone Hagmother wields 
people like us into her trap, 

forcing us to knock on her door and 
play servant to her whims. 

I think I know the answer. 

I think we have to have a baptism in 
the muddy waters of the Red River. 


Ready for the answer?

...

...

...

It's a close race between Option #1 and Option #3, but Option #1 is the winner here.

Option #2's fatal flaw is continuing a new sentence in the same line after the sentence has finished. I often see this in subtitles where a YouTuber has simply dumped their script into YouTube's subtitle editor and not bothered to chunk it out into logical blocks.

Between Option #1 and Option #3, I use a few techniques in #1 that I do not use in #3.

  1. I try to keep modifiers together with the words they modify ('literally everybody')
  2. I try to start new lines on a preposition ('in the muddy waters')
  3. I try to start new lines on a conjunction ('and play servant')

Users typically report that these techniques improve readability. Do you feel that they help you decipher what's going on?
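For the curious, the preposition and conjunction techniques can be roughly approximated in code. The Python sketch below is a toy heuristic of my own, not an established algorithm: the split_into_two_lines function, the BREAK_BEFORE word list, and the bonus of 20 are all assumptions chosen for illustration. It tries every possible break point, prefers two lines of similar length, and gives extra credit to breaks that land just before a listed preposition or conjunction. It does not try to keep modifiers with the words they modify.

```python
# A small, incomplete list of words that make good line starts (assumed).
BREAK_BEFORE = {"and", "but", "or", "in", "on", "of", "at", "for", "with", "into"}
BONUS = 20  # how strongly to prefer those words (an arbitrary value)


def split_into_two_lines(text, max_length=60):
    """Split text into two caption lines at the best-scoring word boundary."""
    words = text.split()
    best = None
    for i in range(1, len(words)):
        first = " ".join(words[:i])
        second = " ".join(words[i:])
        if len(first) > max_length or len(second) > max_length:
            continue  # this break would leave a line that is too long
        # Lower scores are better: start from how unbalanced the lines are.
        score = abs(len(first) - len(second))
        if words[i].lower() in BREAK_BEFORE:
            score -= BONUS
        if best is None or score < best[0]:
            best = (score, first, second)
    if best is None:
        raise ValueError("text does not fit in two lines")
    return best[1], best[2]


for line in split_into_two_lines(
    "I think we have to have a baptism in the muddy waters of the Red River."
):
    print(line)
```

On this sentence, the sketch breaks before 'in the muddy waters', the same place I chose in Option #1. A word list like this will get plenty of sentences wrong, so a human pass is still needed.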

5. Provide text equivalents for non-speech sounds

Let's say a fire alarm rang out mid-performance. This is pretty crucial information, and our Deaf and Hard of Hearing viewers are going to want to know about it.

Here's an example of how one might do it. ✅


(fire alarm blares)

- [Manager] Everyone please evacuate,
this is not a drill. 

Exits are at the back. 

(crowd chatters)

(chairs screech)

(clown horn squeaks)

- [Manager] Please don't touch the props. 

Stylistically, it is good to write out the cause of the sound using a quick 'noun verb' construct in the present tense. Do this instead of using onomatopoeia, which makes it much harder to understand what is going on.

Don't do onomatopoeia. ❌


(weeeeeewooo beeep beep beep beep)

- [Manager] Everyone please evacuate,
this is not a drill. 

Exits are at the back. 

(crowd chatters)

(screech)

(honk!)

- [Manager] Please don't touch the props. 
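A script can flag sound captions that look more like onomatopoeia than like a 'noun verb' description. The Python sketch below is a crude heuristic under my own assumptions, not a reliable detector: the looks_like_onomatopoeia function and the KEYWORDS set are hypothetical names, and the two tests it applies (fewer than two words, or a letter repeated three or more times) are simply patterns taken from the examples in this article.

```python
import re

# Single-word keywords this article uses on purpose, so they are not flagged.
KEYWORDS = {"beep", "indistinct", "crosstalk"}


def looks_like_onomatopoeia(sound):
    """Guess whether a parenthesized sound caption imitates a noise."""
    inner = sound.strip().strip("()").strip().lower()
    if inner in KEYWORDS:
        return False
    # A 'noun verb' description needs at least two words.
    if len(inner.split()) < 2:
        return True
    # Stretched spellings such as 'weeeeeewooo' repeat a letter 3+ times.
    if re.search(r"([a-z])\1{2,}", inner):
        return True
    return False


for sound in ["(fire alarm blares)", "(honk!)", "(weeeeeewooo beeep beep beep beep)"]:
    print(sound, looks_like_onomatopoeia(sound))
```

This flags (honk!) and the siren spelling but lets (fire alarm blares) through. It cannot tell whether a two-word caption actually describes the sound well, so it is only a prompt for a human review.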

6. Finally, use formatting that makes it clear when people are interrupting each other

If people speaking at the same time makes the text absolutely incomprehensible, you can note this with the keyword (indistinct) or (crosstalk).

But in most situations, you will be able to make out what is being said.

Indicate that an interruption has happened by ending the interrupted speech with a dash, and then starting the new speaker on the next line of the same caption block.

In this example, Sam first interrupts Ali, who still finishes their sentence. When Sam speaks again to deflect responsibility, Ali interrupts.


- Listen, I was not the person 
who said that this was a good idea.

The Red River baptism was your idea-
- I proposed it as a joke!

- ...and further, if I might add,
it was your idea to bring Nessa along. 

- I was memeing, okay? 

It was all a meme-
- No. You should have known better.

It's a serious situation,
so act seriously.

- Fuck you, Ali.

- No, fuck yourself. 
This is your responsibility.
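The interruption format is mechanical enough to write as a tiny helper. This Python sketch uses a hypothetical function name, format_interruption, and assumes you already know which words were spoken before the cut-off.

```python
def format_interruption(interrupted_text, interrupting_text):
    """Build one caption block where a second speaker cuts off the first."""
    # The cut-off speech ends with a dash and no space before it.
    cut_off = interrupted_text.rstrip() + "-"
    # The new speaker starts on the next line of the same caption,
    # with the usual speaker-change dash.
    return cut_off + "\n" + "- " + interrupting_text


print(format_interruption("It was all a meme", "No. You should have known better."))
```

This prints the same two-line caption used in the example above, where Ali cuts Sam off.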

Conclusion

A lot of hearing people use captions nowadays, so it can be easy to forget that the primary purpose of captions is facilitating accessibility.

Center accessibility in your caption-making process by following these tips:

  1. Speaker changes should be indicated with dashes.
  2. Speakers that can't be visually identified should be identified in text.
  3. If someone intentionally speaks a word, caption it.
  4. Break up text into chunks that support readability.
  5. Provide text equivalents for non-speech sounds.
  6. Finally, use formatting that makes it clear when people are interrupting each other.

Happy ca(p)tioning!

A grey cat watches cartoons on a mobile phone while lounging on the couch.