Federal Agencies
Digitization Guidelines Initiative

Term: File format


Definition:

Set of structural conventions that define a wrapper, formatted data, and embedded metadata, and that can be followed to represent images, audiovisual waveforms, texts, etc., in a digital object. The wrapper component on its own is often colloquially called a file format. The formatted data may consist of one or more encoded binary bitstreams for such entities as images or waveforms, and/or textually encoded data, often marked up with XML or HTML, for texts. The embedded metadata may be skeletal or extensive.
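As a rough illustration of the three components named above, the following Python sketch builds a tiny PNG file in memory and sorts its chunks into wrapper, formatted data, and embedded metadata. The mapping of PNG chunk types onto those three terms is an assumption made here for illustration and is not part of the FADGI definition; the function names (make_chunk, build_tiny_png, read_chunks, describe) are hypothetical helpers.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def make_chunk(chunk_type: bytes, payload: bytes) -> bytes:
    """Serialize one PNG chunk: length, type, payload, CRC-32 of type + payload."""
    crc = zlib.crc32(chunk_type + payload) & 0xFFFFFFFF
    return struct.pack(">I", len(payload)) + chunk_type + payload + struct.pack(">I", crc)


def build_tiny_png() -> bytes:
    """Build a 1x1 grayscale PNG that carries one textual metadata chunk."""
    # width, height, bit depth, color type (0 = grayscale),
    # compression method, filter method, interlace method
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)
    text = b"Title\x00Sample object"        # keyword, NUL separator, value
    pixels = zlib.compress(b"\x00\x80")     # filter byte + one gray sample
    return (
        PNG_SIGNATURE
        + make_chunk(b"IHDR", ihdr)
        + make_chunk(b"tEXt", text)
        + make_chunk(b"IDAT", pixels)
        + make_chunk(b"IEND", b"")
    )


def read_chunks(data: bytes) -> list[tuple[str, bytes]]:
    """Walk the chunk structure and return (type, payload) pairs."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    offset = len(PNG_SIGNATURE)
    chunks = []
    while offset < len(data):
        (length,) = struct.unpack_from(">I", data, offset)
        chunk_type = data[offset + 4:offset + 8].decode("ascii")
        payload = data[offset + 8:offset + 8 + length]
        chunks.append((chunk_type, payload))
        offset += 12 + length  # 4 length + 4 type + payload + 4 CRC
    return chunks


def describe(data: bytes) -> dict:
    """Sort chunks into the three components (illustrative mapping only)."""
    parts = {"wrapper": [], "formatted_data": [], "embedded_metadata": []}
    for chunk_type, payload in read_chunks(data):
        if chunk_type == "IDAT":
            # Encoded bitstream: decompress to recover the raw scanline bytes.
            parts["formatted_data"].append(zlib.decompress(payload))
        elif chunk_type == "tEXt":
            keyword, _, value = payload.partition(b"\x00")
            parts["embedded_metadata"].append(
                (keyword.decode("latin-1"), value.decode("latin-1"))
            )
        else:
            # Structural chunks are treated here as part of the wrapper.
            parts["wrapper"].append(chunk_type)
    return parts


if __name__ == "__main__":
    print(describe(build_tiny_png()))
```

Other readings are possible; for example, the IHDR chunk could reasonably be counted as technical metadata rather than as part of the wrapper.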

This definition has been tailored to fit the planning activities carried out by the FADGI File Format Subgroup. Meanwhile, in the digital library community, the broad concepts underlying the FADGI definition are often subsumed under the generic term format, although this usage does not generally require that all three elements (wrapper, bitstream, and metadata) be present at the same time. Here are two definitions for format from authoritative bodies in the field:

  • A set of syntactic and semantic rules for mapping between an information model and a serialized bit stream. Many formats can be grouped into loose categories, or families, sharing a general set of encoding rules that are further restricted or extended for the specific format or profile. A format version is considered a profile. (Combined definition from the Unified Digital Formats Registry (UDFR), slide 7 in the Unified Digital Formats Registry Stakeholder Meeting PowerPoint Slides (slides no longer available as of July 2020); and JHOVE2, JHOVE2 glossary.)
  • The internal structure and encoding of a digital object, which allows it to be processed, or to be rendered in human-accessible form. A digital object may be a file, or a bitstream embedded within a file. (From the U.K. National Archives Digital Preservation Technical Paper Automatic Format Identification Using PRONOM and DROID.)

Additional definitions of format have been offered by the InterPARES 2 Project and the Library of Congress Sustainability of Digital Formats Planning Web site.

Category:
General
Resource:
Unified Digital Formats Registry Stakeholder Meeting PowerPoint Slides (slides no longer available as of July 2020)
JHOVE2 glossary
https://bitbucket.org/jhove2/main/wiki/Glossary
Automatic Format Identification Using PRONOM and DROID
http://www.nationalarchives.gov.uk/aboutapps/fileformat/pdf/automatic_format_identification.pdf
InterPARES 2 Project glossary
http://www.interpares.org/ip2/display_file.cfm?doc=ip2_glossary.pdf
Sustainability of Digital Formats Planning Web site
http://www.digitalpreservation.gov/formats/intro/format_eval_rel.shtml
See also:
Wrapper; Bitstream; Encoding; Digital file; Object (PREMIS term)