XML (eXtensible Markup Language) is a generic markup language developed by the World Wide Web Consortium (W3C). It describes a class of data objects called XML documents and partially describes the behavior of the programs that process them. XML is an application profile, or restricted form, of SGML (Standard Generalized Markup Language).
At its most basic, XML is plain text, which makes it readable for people. As documents become more complex, readability decreases and interpretation depends on specialized applications. The prospects for preserving XML over a longer period of time therefore depend on the complexity of the documents on the one hand and the availability of those specialized applications on the other.
An XML document contains exactly one root element, which encloses all other elements. Each element consists of a start tag and an end tag, or is written as a single empty-element tag. Between the start and end tag is the content of the element, which can consist of text and/or child elements. An element can carry a set of attributes, written in its start tag, each consisting of a name and a (text) value.
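The element structure described above can be illustrated with a minimal sketch using Python's standard xml.etree.ElementTree module. The sample document and its element names (catalog, book, title, note) are hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document: one root element enclosing child elements.
SAMPLE = """<catalog>
  <book id="b1" lang="en">
    <title>Example Title</title>
    <note/>
  </book>
</catalog>"""

root = ET.fromstring(SAMPLE)    # root element: <catalog>
book = root[0]                  # first child element: <book>
title = book.find("title")      # child element with text content
note = book.find("note")        # empty element: no text, no children

print(root.tag)                 # catalog
print(book.attrib)              # {'id': 'b1', 'lang': 'en'}
print(title.text)               # Example Title
print(note.text, len(note))     # None 0
```

Attributes are exposed as a name-to-value mapping, and an empty element has neither text nor child elements.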
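An XML document can be checked on syntax: a document that follows the syntax rules of XML is called well-formed. A document is valid if it is well-formed and also conforms to a DTD or schema. An example of an online validator is the W3C Markup Validation Service: https://validator.w3.org/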
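A syntax check can also be run locally. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree parser; the function name is_well_formed is hypothetical. It tests syntax only and does not check a document against a DTD or schema.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if the text parses as XML, i.e. its syntax is correct."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

print(is_well_formed("<a><b>text</b></a>"))   # True
print(is_well_formed("<a><b>text</a></b>"))   # False: tags are not properly nested
```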
Encoding of XML documents
The encoding of an XML document is indicated in the XML declaration, which is part of the prolog and precedes the root element. If no encoding is declared and the document has no byte order mark, the encoding is assumed to be the default for XML documents: UTF-8.
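As an illustration of this rule, the sketch below reads the encoding named at the start of a document and falls back to UTF-8 otherwise. The function name declared_encoding is hypothetical, and the sketch is deliberately simplified: it handles ASCII-compatible encodings only and does not detect byte order marks or UTF-16 input.

```python
import re
import xml.etree.ElementTree as ET

# Matches the encoding given at the very start of the document,
# e.g. <?xml version="1.0" encoding="ISO-8859-1"?>
_DECL = re.compile(rb'^<\?xml[^>]*?\bencoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']')

def declared_encoding(data: bytes, default: str = "UTF-8") -> str:
    """Return the declared encoding of an XML document, or the default."""
    match = _DECL.match(data)
    return match.group(1).decode("ascii") if match else default

latin1 = '<?xml version="1.0" encoding="ISO-8859-1"?><r>é</r>'.encode("iso-8859-1")

print(declared_encoding(latin1))      # ISO-8859-1
print(declared_encoding(b"<r/>"))     # UTF-8
print(ET.fromstring(latin1).text)     # é
```

In practice a parser does this work itself: when given bytes, ElementTree honors the declared encoding, as the last line shows.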
The general media type, or MIME type, for XML documents is application/xml; text/xml is registered as an alias for it. The usual file extension is .xml.
XML has a wide variety of subtypes and specialized applications. Some of the most used:
– XHTML – Extensible HyperText Markup Language, a stricter version of HTML
– XSD – XML Schema Definition, a formal description of the elements and attributes allowed in an XML document
– XSLT – XSL (eXtensible Stylesheet Language) Transformations, for transforming XML documents into other formats (in particular XML and HTML).
Specialized applications of XML include:
– RDF XML – Resource Description Framework (RDF) described in XML; its media type, application/rdf+xml, is registered in RFC 3870
– RSS XML – RSS (Really Simple Syndication), a type of web feed
– WSDL XML – Web Services Description Language (WSDL), an XML-based language for describing web services
– GML – Geography Markup Language, a markup language for describing geographical features
– SVG – Scalable Vector Graphics, a markup language for images
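All of these formats are ordinary XML documents that differ in their vocabulary, which can usually be recognized from the root element and its namespace. The sketch below illustrates this under stated assumptions: the function name identify and the lookup table are hypothetical, and the table covers only a few of the formats listed above.

```python
import xml.etree.ElementTree as ET

# Root element names in ElementTree's "{namespace}localname" notation.
# Illustrative only; the table is far from complete.
KNOWN_ROOTS = {
    "{http://www.w3.org/2000/svg}svg": "SVG",
    "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}RDF": "RDF XML",
    "{http://www.w3.org/1999/xhtml}html": "XHTML",
    "rss": "RSS XML",  # the RSS 2.0 root element has no namespace
}

def identify(text: str) -> str:
    """Guess the XML vocabulary from the root element, else 'unknown'."""
    root = ET.fromstring(text)
    return KNOWN_ROOTS.get(root.tag, "unknown")

print(identify('<svg xmlns="http://www.w3.org/2000/svg" width="10" height="10"/>'))  # SVG
print(identify('<rss version="2.0"><channel/></rss>'))                               # RSS XML
print(identify('<catalog/>'))                                                        # unknown
```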
Planning for Library of Congress Collections: https://www.loc.gov/preservation/digital/formats/fdd/fdd000075.shtml
XML is a preferred format for the file type Markup Language.