Federal Agencies
Digitization Guidelines Initiative

Term: Transcripts

Definition:
The main purpose of a transcript is to give people who cannot get information from the audio and/or video a text-based version of what they need to understand the content. Transcripts can also help deaf/blind users interact with content using refreshable Braille devices. According to W3C, there are two types of transcripts:
  • Basic transcripts are a text version of the speech and non-speech audio information needed to understand the content.
  • Descriptive transcripts also include a text description of the visual information needed to understand the content, such as speaker identification, scene description, language of content and text embedded in the video. Descriptive transcripts are preferred over basic transcripts.
Transcripts can be created from an existing closed caption or audio description sidecar file, by manually transcribing (typing out) the audio as text, or by machine generation (automated data extraction). If captions are used to generate a transcript, more detailed visual information will likely need to be added to make it a descriptive transcript.

Transcripts can be 'static' (stationary) elements on the display page, or they can be 'interactive' (via WebVTT, for example), which allows synchronized scrolling and lets the user navigate video playback by selecting a text item of interest.
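
As a rough illustration of how a caption sidecar file might be turned into a basic transcript, and how cue timings could support navigation by text, the Python sketch below parses a small WebVTT sample. The function names (parse_cues, basic_transcript, find_cue_start) and the sample text are hypothetical, and the parser is deliberately minimal: it ignores NOTE and STYLE blocks and cue settings, and it strips inline tags such as voice spans, so speaker names carried in those tags would need to be restored by hand for a descriptive transcript.

```python
import re

# Matches a WebVTT cue timing line such as "00:00:01.000 --> 00:00:04.000".
# The hours field is optional in WebVTT timestamps.
TIMING = re.compile(
    r"^((?:\d{2,}:)?\d{2}:\d{2}\.\d{3})\s+-->\s+((?:\d{2,}:)?\d{2}:\d{2}\.\d{3})"
)

# Hypothetical sample caption file, used only for illustration.
SAMPLE_VTT = """WEBVTT

1
00:00:01.000 --> 00:00:04.000
Welcome to the reading room.

2
00:00:04.500 --> 00:00:08.250
[door closes]
Today we look at map digitization.
"""


def to_seconds(timestamp):
    """Convert 'hh:mm:ss.ttt' or 'mm:ss.ttt' to seconds as a float."""
    parts = timestamp.split(":")
    hours = int(parts[0]) if len(parts) == 3 else 0
    return hours * 3600 + int(parts[-2]) * 60 + float(parts[-1])


def parse_cues(vtt_text):
    """Return a list of (start_seconds, text) pairs, one per cue."""
    cues = []
    blocks = re.split(r"\n\s*\n", vtt_text.replace("\r\n", "\n").strip())
    for block in blocks:
        lines = block.split("\n")
        for i, line in enumerate(lines):
            match = TIMING.match(line.strip())
            if match:
                # Everything after the timing line is the cue payload.
                text = " ".join(l.strip() for l in lines[i + 1:] if l.strip())
                text = re.sub(r"<[^>]+>", "", text)  # drop inline tags
                if text:
                    cues.append((to_seconds(match.group(1)), text))
                break
    return cues


def basic_transcript(cues):
    """Join cue text into a static, plain-text basic transcript."""
    return "\n".join(text for _, text in cues)


def find_cue_start(cues, phrase):
    """Return the start time of the first cue containing phrase, else None."""
    for start, text in cues:
        if phrase.lower() in text.lower():
            return start
    return None
```

Calling basic_transcript(parse_cues(SAMPLE_VTT)) yields the static text, while find_cue_start gives a time offset that an interactive player could seek to when the user selects a phrase.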

Transcripts can be in a number of text-based file formats, depending on the media player. HTML is the most common, but plain text (.txt), common word processing formats such as MS Word (.doc, .docx), PDF and even JSON are also used. Transcripts are often autogenerated from audio or video tracks, but this typically yields only a basic transcript, not a descriptive one. For example, the popular open source speech recognition application Kaldi creates a plain text file from WAVE file inputs, while Google Cloud's Speech-to-Text produces JSON output from a variety of audio inputs including FLAC, WAVE, Mu-Law and OGG Opus (see Optimize audio files for Speech-to-Text for full details). Amazon AWS Transcribe supports several input formats, with a preference for Linear PCM in WAVE or FLAC, and defaults to JSON output (see Data input and output - Amazon Transcribe for full details).
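
To show how JSON output from a speech recognition service might be flattened into a plain text basic transcript, here is a minimal Python sketch. It assumes two response layouts: a top-level "results" list whose entries hold ranked "alternatives" (the shape assumed here for Google Cloud Speech-to-Text), and a "results" object holding a "transcripts" list (the shape assumed here for Amazon Transcribe). Both layouts and the hand-written sample strings are assumptions for illustration and should be checked against the vendors' current documentation; the function name transcript_from_json is hypothetical.

```python
import json

# Hand-written samples in the two assumed layouts (not real service output).
GOOGLE_STYLE = json.dumps({
    "results": [
        {"alternatives": [{"transcript": "welcome to the reading room",
                           "confidence": 0.97}]},
        {"alternatives": [{"transcript": " today we look at maps",
                           "confidence": 0.94}]},
    ]
})

AMAZON_STYLE = json.dumps({
    "jobName": "example-job",
    "results": {
        "transcripts": [{"transcript": "Welcome to the reading room."}],
        "items": [],
    },
})


def transcript_from_json(raw):
    """Extract plain transcript text from either assumed JSON layout."""
    data = json.loads(raw)
    results = data.get("results")
    if isinstance(results, dict):
        # Assumed Amazon Transcribe layout: full text under "transcripts".
        parts = [t["transcript"] for t in results.get("transcripts", [])]
    elif isinstance(results, list):
        # Assumed Google layout: keep the first alternative of each result.
        parts = [r["alternatives"][0]["transcript"]
                 for r in results if r.get("alternatives")]
    else:
        parts = []
    return " ".join(p.strip() for p in parts if p.strip())
```

The result is still only a basic transcript; visual information such as scene description would have to be added separately to make it descriptive.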
Category:
Image and Audio