MicroXML -- Editor's Draft

John Cowan

2011-06-30

This version: http://www.ccil.org/~cowan/MicroXML.html

Copyright © 2011 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark and document use rules apply.

Abstract

MicroXML is a subset of XML intended for use in contexts where full XML is, or is perceived to be, too large and complex. MicroXML provides a set of rules for defining markup languages intended for use in encoding data objects, and specifies behavior for certain software modules that access them.

Status of this Document

This document is a private skunkworks and has no official standing of any kind, not having been reviewed by any organization in any way.

This draft was edited by John Cowan from bits and pieces of W3C and non-W3C documents edited by himself, James Clark, Tim Bray, Dave Hollander, Andrew Layman, Eve Maler, Jonathan Marsh, Jean Paoli, C. Michael Sperberg-McQueen, and Richard Tobin. There should be no suggestion that anybody other than John Cowan approves of the content or even the existence of the present document.

The copyright statement above applies to much of the text assembled for this document, but should not be taken as an indication that the W3C approves of the contents or existence of this document.

1 Introduction

MicroXML describes a class of data objects called MicroXML documents, or just documents, provides a data model for them, and partially describes the behavior of computer programs which process them. By construction, MicroXML documents are well-formed XML 5th Edition documents. MicroXML documents are also XML namespace-well-formed provided that all prefixes in attribute names have been bound (see section 4.2).

The creation of an XML subset can be justified even though the costs of XML complexity have already been paid, for at least the following reasons:

MicroXML documents are made up of characters, some of which form character data, and some of which form markup. Markup primarily encodes a description of the document's logical structure.

A software module called a MicroXML processor is used to read MicroXML documents and provide access to their content and structure. It is assumed that a MicroXML processor is doing its work on behalf of another module called the application. This specification describes the behavior of a MicroXML processor in terms of how it MUST process MicroXML documents and what information it MUST, SHOULD, and MAY provide to the application.

This specification, together with the associated standards [RFC 2119] for requirement keywords, [Unicode] for characters, [BCP 47] for language tags, and [RFC 3986] for URI syntax, provides all the information necessary to understand MicroXML and construct computer programs to process it.

This version of the MicroXML specification can be distributed freely, as long as all text and legal notices remain intact.

The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.

The term "for compatibility" in this specification marks a sentence describing a feature of MicroXML that is included solely to ensure that MicroXML remains compatible with XML, HTML or SGML.

2 Syntax

A sequence of characters is a MicroXML document if, taken as a whole, it matches the production labeled document and meets the further constraints found in the text of this specification marked with the keywords MUST or REQUIRED.

2.1 Documents

[1] document ::= comments (doctype comments)? element comments
[2] comments ::= (comment | pi | s)*
[3] doctype ::= '<!DOCTYPE' s+ name s* ">"

Each document contains a single element called the root element, plus OPTIONAL comments and whitespace before and after it. For compatibility, the root element MAY be preceded by a DOCTYPE declaration, which contains a name that MUST match the element name of the root element.

Here is a simple example of a document whose root element is named greeting:

<!DOCTYPE greeting>
<greeting><w>Hello</w> <w>world</w>!</greeting>

2.2 Elements

[4] element ::= startTag content endTag
              | emptyElementTag
[5] content ::= (element | comment | pi | dataChar | charRef)*
[6] startTag ::= '<' name (s+ attribute)* s* '>'
[7] emptyElementTag ::= '<' name (s+ attribute)* s* '/>'
[8] endTag ::= '</' name s* '>'

Elements are the basic building blocks of documents. An element MAY contain a span of text called its content. The boundaries of the content are delimited by start-tags and end-tags. An empty element (one which contains no content) MAY instead be identified by an empty-element tag, which is equivalent to the corresponding start-tag immediately followed by the corresponding end-tag. Each element MUST have a name and MAY have one or more attributes. Each attribute has an attribute name and an attribute value.

Here are examples of start-tags:

<foo>
<bar baz="foo">

Here are examples of corresponding end-tags:

</foo>
</bar>

Here is an example of an empty-element tag:

<IMG align="left" src="http://www.example.org/Icons/madonna" />

A start-tag begins with <, followed by the name of the element, followed by OPTIONAL attributes, followed by >. An end-tag begins with </ followed by the name of the element, followed by >. An empty-element tag is like a start-tag, but ends with /> instead of >. Whitespace MUST be used before each attribute and MAY be used before > or />.

For compatibility, an empty element whose name is br SHOULD be expressed with an empty-element tag.

The end of every element that begins with a start-tag MUST be marked by an end-tag containing a name that matches the element name given in the start-tag.

Element names are drawn from a restricted character repertoire; see section 2.8.

This specification does not constrain the semantics or use of element names.

For all elements other than the root element, if the start-tag is in the content of another element, the end-tag MUST be in the content of the same element; elements MUST nest properly.

2.3 Attributes

[9] attribute ::= attributeName s* '=' s* attributeValue
[10] attributeValue ::= '"' ((attributeValueChar - '"') | charRef)* '"'
                      | "'" ((attributeValueChar - "'") | charRef)* "'"
[11] attributeValueChar ::= char - ('<'|'&')
[12] attributeName ::= (name ":")? name

An attribute consists of an attribute name, followed by =, followed by a quoted attribute value. Either single or double quotes can be used around the value. Attribute values MUST NOT contain the < or & characters except in the form of a character reference. Likewise, single-quoted attribute values MUST NOT contain single quotes except in the form of a character reference, and similarly for double-quoted attribute values.

Attribute names are drawn from a restricted character repertoire; see sections 2.8 and 4.2.

The order of attributes in a start-tag or empty-element tag is not significant.

An attribute name MUST NOT appear more than once in the same start-tag or empty-element tag.

This specification does not constrain the semantics or use of attribute names except for those beginning with xml.

2.4 Character data

[13] dataChar ::= char - ('<'|'&'|'>')

All text in a document that is not markup constitutes the character data of the document and of the most immediate element in which it exists. Any legal MicroXML character can be a data character except <, which signals the beginning of an element; &, which signals the beginning of a character reference; and >, which is forbidden for compatibility. If these characters are to appear in the data model of a document, they must appear as character references.

2.5 Character references

[14] charRef ::= decCharRef | hexCharRef | namedCharRef
[15] decCharRef ::= '&#' [0-9]+ ';'
[16] hexCharRef ::= '&#x' [0-9a-fA-F]+ ';'
[17] namedCharRef ::= '&' charName ';'
[18] charName ::= 'amp' | 'lt' | 'gt' | 'quot' | 'apos'

A character reference in character content or attribute values stands for a specific Unicode character. Characters referred to using character references MUST match the production for char (see Section 2.10).

If the character reference begins with &#x, the digits and letters up to the terminating semicolon provide a hexadecimal representation of the character's code point in Unicode. If it begins just with &#, the digits up to the terminating semicolon provide a decimal representation of the character's code point.

For readability, a set of predefined character references is also provided for the purpose of escaping MicroXML's special characters: &amp; for &, &lt; for <, &gt; for >, &apos; for ', and &quot; for ". This has exactly the same effect as using character references: &#60; for <, &#38; for &, and so on.

Examples of character references: &lt; for LESS-THAN SIGN, &#xA0; for NO-BREAK SPACE, &#916; for GREEK CAPITAL LETTER DELTA, &#66352; or &#x10330; for GOTHIC LETTER AHSA.
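The expansion rules above can be illustrated with a short, non-normative Python sketch. The function names expand_char_ref and is_legal_char are hypothetical and not part of this specification; the sketch assumes that a reference to a code point not matching production [28] is to be rejected.

```python
import re

# Predefined named references from production [18] (charName).
NAMED = {"amp": "&", "lt": "<", "gt": ">", "quot": '"', "apos": "'"}

# Productions [15]-[17]: hexadecimal, decimal, or named reference.
CHAR_REF = re.compile(r"&(?:#x([0-9a-fA-F]+)|#([0-9]+)|(amp|lt|gt|quot|apos));")


def is_legal_char(cp: int) -> bool:
    """Production [28]: whitespace, or #x21-#x10FFFF minus forbidden code points."""
    if cp in (0x9, 0xA, 0x20):
        return True
    if not 0x21 <= cp <= 0x10FFFF:
        return False
    return not (0xD800 <= cp <= 0xDFFF or cp in (0xFFFE, 0xFFFF))


def expand_char_ref(ref: str) -> str:
    """Return the character that a single character reference stands for."""
    m = CHAR_REF.fullmatch(ref)
    if m is None:
        raise ValueError(f"not a character reference: {ref!r}")
    hex_digits, dec_digits, name = m.groups()
    if name is not None:
        return NAMED[name]
    cp = int(hex_digits, 16) if hex_digits is not None else int(dec_digits)
    if not is_legal_char(cp):
        raise ValueError(f"reference to a forbidden code point: {ref!r}")
    return chr(cp)
```

Under these assumptions the decimal and hexadecimal forms of the same code point expand to the same character, and the predefined names behave exactly like their numeric equivalents.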

2.6 Comments

[19] comment ::= '<!--' (commentContentStart commentContentContinue*)? '-->'
[20] commentContentStart ::= (char - ('-'|'>')) | ('-' (char - ('-'|'>')))
[21] commentContentContinue ::= (char - '-') | ('-' (char - '-'))

Comments are provided in MicroXML for human consumption only, and are not part of the MicroXML data model. They MAY appear before or after the root, or anywhere else in a document except inside other markup.

A comment begins with <!-- and ends with -->. For compatibility, a comment MUST NOT begin with <!--> or <!---, end with --->, or contain -- anywhere except as part of the beginning or end.

An example of a comment (note that <head> and <body> are not start-tags):

<!-- declarations for <head> & <body> -->
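As a non-normative illustration, productions [19] through [21] can be transcribed into a regular expression. The sketch below is one possible Python transcription; the names COMMENT and is_comment are hypothetical, and the character classes are this sketch's reading of production [28] with the hyphen (and, for the first content character, the greater-than sign) removed.

```python
import re

# Legal MicroXML characters (production [28]) without '-' (#x2D).
_NO_HYPHEN = r"\t\n\x20-\x2C\x2E-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF"
# The same set without '-' (#x2D) and '>' (#x3E).
_NO_HYPHEN_GT = r"\t\n\x20-\x2C\x2E-\x3D\x3F-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF"

# Production [20]: commentContentStart.
_START = rf"(?:[{_NO_HYPHEN_GT}]|-[{_NO_HYPHEN_GT}])"
# Production [21]: commentContentContinue.
_CONTINUE = rf"(?:[{_NO_HYPHEN}]|-[{_NO_HYPHEN}])"
# Production [19]: comment.
COMMENT = re.compile(rf"<!--(?:{_START}{_CONTINUE}*)?-->")


def is_comment(text: str) -> bool:
    """Return True if the whole string matches the comment production."""
    return COMMENT.fullmatch(text) is not None
```

Because a hyphen in the content must always be followed by a non-hyphen character, the pattern rejects both an embedded -- and content that ends with a hyphen.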

2.7 Processing Instructions

[22] pi ::= '<?' target (s+ attribute)* s* '?>'
[23] target ::= name - (('X' | 'x') ('M' | 'm') ('L' | 'l'))

Processing instructions (PIs) allow documents to contain instructions for applications. PIs are not part of the MicroXML data model, but processors SHOULD make them available to the application. A PI begins with a target used to identify the application to which the instruction is directed, and contains attributes which give the application information on how to process the PI. For compatibility, the target name xml in any combination of upper and lower case characters MUST NOT be used.

2.8 Names

[24] name ::= nameStartChar nameChar*
[25] nameStartChar ::= [A-Z] | [a-z] | "_" | [#xC0-#xD6] | [#xD8-#xF6] | [#xF8-#x2FF] | [#x370-#x37D]
                     | [#x37F-#x1FFF] | [#x200C-#x200D] | [#x2070-#x218F] | [#x2C00-#x2FEF]
                     | [#x3001-#xD7FF] | [#xF900-#xFDCF] | [#xFDF0-#xFFFD] | [#x10000-#xEFFFF]
[26] nameChar ::= nameStartChar | [0-9] | "-" | "." | #xB7 | [#x0300-#x036F] | [#x203F-#x2040]

Element and attribute names use only a subset of the legal MicroXML characters. The first character of a name MUST be a nameStartChar, and any other characters MUST be nameChars; this mechanism is used to prevent names from beginning with European (ASCII) digits or with basic combining characters. Almost all characters are permitted in names, except those which either are or reasonably could be used as delimiters. The intention is to be inclusive rather than exclusive. See section 8 for suggestions on how to create names.

The ASCII symbols and punctuation marks, along with a fairly large group of Unicode symbol characters, are excluded from names because they are more useful as delimiters in contexts where MicroXML names are used outside MicroXML documents. Providing this group gives those contexts hard guarantees about what cannot be part of a MicroXML name. The character #x037E GREEK QUESTION MARK is excluded because when normalized it becomes a semicolon. Note that #x2D HYPHEN-MINUS, #x2E FULL STOP (period), #x5F LOW LINE (underscore), and #xB7 MIDDLE DOT are explicitly permitted.

Names beginning with a match to (('X'|'x')('M'|'m')('L'|'l')) are reserved for standardization by the W3C.

In the case of two-part attribute names, these rules apply to both parts individually.
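The name productions translate directly into character classes. The following non-normative Python sketch shows one such translation of productions [24] through [26] and of production [12]; the identifiers is_name and is_attribute_name are hypothetical, and the sketch does not check the reservation of names beginning with xml.

```python
import re

# Production [25]: nameStartChar, written as the body of a character class.
_NAME_START = (
    r"A-Za-z_\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-\u037D"
    r"\u037F-\u1FFF\u200C-\u200D\u2070-\u218F\u2C00-\u2FEF"
    r"\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD\U00010000-\U000EFFFF"
)
# Production [26]: the characters nameChar adds to nameStartChar.
_NAME_EXTRA = r"0-9.\u00B7\u0300-\u036F\u203F-\u2040\-"

# Production [24]: name.
_NAME = rf"[{_NAME_START}][{_NAME_START}{_NAME_EXTRA}]*"
NAME = re.compile(_NAME)
# Production [12]: attributeName, an optional prefix followed by a local part.
ATTRIBUTE_NAME = re.compile(rf"(?:{_NAME}:)?{_NAME}")


def is_name(text: str) -> bool:
    return NAME.fullmatch(text) is not None


def is_attribute_name(text: str) -> bool:
    return ATTRIBUTE_NAME.fullmatch(text) is not None
```

Note that the colon is not a nameChar, so a two-part attribute name is not itself a name; each of its parts is checked individually.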

2.9 Whitespace

[27] s ::= #x9 | #xA | #x20

Whitespace consists of tabs, newlines, and spaces, all of which are permitted in various places within markup to increase readability.

2.10 Characters

[28] char ::= s | ([#x21-#x10FFFF] - forbiddenChar)
[29] forbiddenChar ::= surrogateCodePoint | #xFFFE | #xFFFF
[30] surrogateCodePoint ::= [#xD800-#xDFFF]

Documents contain text, a sequence of characters, which represent markup or character data. A character is an atomic unit of text as specified by [Unicode]. The legal MicroXML characters exclude the Unicode non-characters #xFFFE and #xFFFF, as well as the Unicode surrogate code points (which are not actually Unicode characters). Unassigned Unicode code points are explicitly permitted. Do not confuse code points with UTF-8 or UTF-16 code units, or with octets.

To simplify the tasks of applications, MicroXML processors MUST behave as if they normalized all line breaks in documents before parsing them by translating both the two-character sequence #xD #xA, and any #xD that is not followed by #xA, to a single #xA character.
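The line-break rule can be sketched in a few lines of Python. This is a non-normative illustration; the function name normalize_line_breaks is hypothetical, and a processor is free to achieve the same effect in any other way.

```python
def normalize_line_breaks(text: str) -> str:
    """Translate #xD #xA pairs and lone #xD characters to a single #xA."""
    # Replace the two-character sequence first, so that the #xD in a
    # #xD #xA pair is not turned into an extra line break.
    return text.replace("\r\n", "\n").replace("\r", "\n")
```

The order of the two replacements matters: handling lone #xD characters first would turn each #xD #xA pair into two line breaks.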

Document authors are, however, encouraged to avoid "compatibility characters" as defined in section 2.3 of [Unicode]. The characters defined in the following ranges are also discouraged (they are either control characters or Unicode non-characters):

[#x7F-#x84], [#x86-#x9F], [#xFDD0-#xFDEF],
[#x1FFFE-#x1FFFF], [#x2FFFE-#x2FFFF], [#x3FFFE-#x3FFFF],
[#x4FFFE-#x4FFFF], [#x5FFFE-#x5FFFF], [#x6FFFE-#x6FFFF],
[#x7FFFE-#x7FFFF], [#x8FFFE-#x8FFFF], [#x9FFFE-#x9FFFF],
[#xAFFFE-#xAFFFF], [#xBFFFE-#xBFFFF], [#xCFFFE-#xCFFFF],
[#xDFFFE-#xDFFFF], [#xEFFFE-#xEFFFF], [#xFFFFE-#xFFFFF],
[#x10FFFE-#x10FFFF].

Documents MAY begin with the Byte Order Mark described by [Unicode], also known as #xFEFF ZERO WIDTH NO-BREAK SPACE. This is an encoding signature, not part of either the markup or the character data of the MicroXML document.

[Unicode] says that canonically equivalent sequences of characters ought to be treated as identical. However, documents that are canonically equivalent according to Unicode but which use distinct code point sequences are considered distinct by MicroXML processors. Therefore, all documents SHOULD be in Normalization Form C as described by [Unicode]. Otherwise the user might unknowingly create canonically equivalent but unequal sequences that appear identical to the user but which are treated as distinct by MicroXML processors.

MicroXML processors MAY verify that their input is normalized, and MAY report non-normalized character sequences.
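A normalization check of the kind permitted above could look like the following non-normative Python sketch, which relies on the standard unicodedata module. The function name is_nfc is hypothetical.

```python
import unicodedata


def is_nfc(text: str) -> bool:
    """Return True if the text is already in Normalization Form C."""
    return unicodedata.normalize("NFC", text) == text


# Two canonically equivalent spellings of the same word: the first uses
# U+00E9, the second uses "e" followed by U+0301 COMBINING ACUTE ACCENT.
composed = "caf\u00e9"
decomposed = "cafe\u0301"
```

The two strings compare unequal code point for code point, which is how a MicroXML processor sees them, yet they become identical once normalized.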

3 The Data Model

This section defines an abstract data set called the MicroXML data model. It exists to provide:

The contents of the data model for a document are designed to convey its structure and content as expressed by its markup and character data. However, there are some items of markup which have no effect on the contents of the data model: the DOCTYPE declaration, comments, and processing instructions. The use or non-use of character references for non-reserved characters also has no effect.

The MicroXML data model does not require or favor a specific interface or class of interfaces. This specification presents the data model as a tree for the sake of clarity and simplicity, but there is no requirement that the model be made available through a tree structure; other types of interfaces, including (but not limited to) event-based and query-based interfaces, are also capable of providing information conforming to the MicroXML data model.

The terms data model and element object are similar in meaning to the generic terms tree and node as they are used in computing. However, the former terms are used in this specification to reduce possible confusion with other specific data models. Element objects do not map one-to-one with the nodes of the DOM or the tree and nodes of the XPath data model.

3.1 Element Objects

A document's data model consists of at least one element object. An element object is an abstract description of a single element in a document. Each element object has three associated properties: the name, the attribute map, and the sequence of children. The name is a string, the attribute map maps name strings to value strings, and each child in the sequence is either a string representing character data or an element object.

There is one element object in the data model for each element appearing in the document being modeled. One element object corresponds to the root of the element tree, and all other element objects are accessible by recursively following the sequence of its children.

Namespace information does not affect the data model; attributes named xmlns or containing prefixes are treated as ordinary attributes.

3.2 Synthetic data models

This specification describes the data model resulting from parsing a MicroXML document. Data models MAY be constructed by other means, for example by use of an API or by transforming an existing data model.

3.3 JSON

This specification adopts the JsonML convention for translating MicroXML data models to and from JSON. In this convention, each element object is represented by a JSON array. The first element of the array is the element name, and the second element is a JSON object representing the attribute map. The attribute values in this JSON object are all strings. The remaining elements of the array are either JSON strings representing character content or JSON arrays representing child elements.

Note that this convention is suitable for round-tripping MicroXML documents through JSON. It is not suitable for round-tripping arbitrary JSON through MicroXML; for that purpose, the JSONX convention is RECOMMENDED.
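The mapping between element objects and JSON arrays can be sketched as follows. This non-normative Python example assumes a hypothetical Element class with the three properties described in section 3.1; the names Element, to_jsonml, and from_jsonml are illustrative only, and the attribute object is always emitted, as the description above requires.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class Element:
    """An element object: a name, an attribute map, and a sequence of children."""
    name: str
    attributes: Dict[str, str] = field(default_factory=dict)
    children: List[Union[str, "Element"]] = field(default_factory=list)


def to_jsonml(element: Element) -> list:
    """Represent an element object as [name, attribute object, child, ...]."""
    kids = [c if isinstance(c, str) else to_jsonml(c) for c in element.children]
    return [element.name, dict(element.attributes), *kids]


def from_jsonml(array: list) -> Element:
    """Rebuild an element object from its array representation."""
    name, attributes, *kids = array
    children = [k if isinstance(k, str) else from_jsonml(k) for k in kids]
    return Element(name, dict(attributes), children)


# The data model of the greeting document from section 2.1.
greeting = Element("greeting", {}, [
    Element("w", {}, ["Hello"]),
    " ",
    Element("w", {}, ["world"]),
    "!",
])
```

Serializing to_jsonml(greeting) with json.dumps and feeding the parsed result back to from_jsonml yields an element object equal to the original, which is the round trip the note above refers to.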

4 Namespaces

MicroXML namespaces are conceptually identical to XML namespaces, but are explained in this specification in order to keep it as self-contained as possible.

The names that appear as element and attribute names serve as labels for the logical components of a document. Software modules are often designed to process a particular set of elements and attributes and their content, identifying them using these labels. Let us refer to such a set, understood by some software module, as a namespace.

A single document MAY contain elements and attributes from more than one namespace. One motivation for this is modularity; if a markup vocabulary exists that is well understood and for which useful software is available, it is better to reuse its names than to reinvent them.

Such documents pose problems of recognition and collision. Software modules need to be able to recognize the elements and attributes which they are designed to process, even in the face of collisions occurring when markup intended for some other software package uses the same element or attribute name.

These considerations require that there exist a way to determine which namespace an element or attribute belongs to. The rest of this section describes mechanisms that accomplish this.

A namespace is identified by a URI called its namespace name. Namespace names are considered identical when they are exactly the same character-for-character. Note that URIs which are not identical in this sense MAY otherwise be functionally equivalent.

To serve their intended purpose, namespace names SHOULD be unique and persistent. It is not necessary that they be directly usable for retrieving a resource of any kind.

4.1 Elements in Namespaces

By default, the elements of a document are not in any namespace. An element can be placed in a namespace by attaching the reserved attribute xmlns to the element's start-tag or empty-element tag. The value of an xmlns attribute is the namespace name. An xmlns attribute whose value is the empty string declares that its element is not in any namespace.

An xmlns attribute affects not only the element to which it is attached, but is also inherited by all elements inside it that do not have or inherit their own xmlns attributes. This allows, for example, all elements to be placed in a single namespace by attaching a single xmlns attribute to the root element.

Here is an example of a document, with some elements in the HTML namespace and some not in any namespace:

<Beers>
  <!-- the default namespace is now that of HTML -->
  <table xmlns='http://www.w3.org/1999/xhtml'>
   <th><td>Name</td><td>Origin</td><td>Description</td></th>
   <tr> 
     <!-- no default namespace inside table cells -->
     <td><brandName xmlns="">Huntsman</brandName></td>
     <td><origin xmlns="">Bath, UK</origin></td>
     <td>
       <details xmlns=""><class>Bitter</class><hop>Fuggles</hop>
         <pro>Wonderful hop, light alcohol, good summer beer</pro>
         <con>Fragile; excessive variance pub to pub</con>
         </details>
        </td>
      </tr>
    </table>
  </Beers>
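The inheritance of xmlns attributes can be expressed as a lookup along the chain of ancestors. In the non-normative Python sketch below, the function name element_namespace is hypothetical, and each element is assumed to be represented only by its attribute map, listed from the root down to the element of interest.

```python
from typing import Dict, List, Optional

XHTML = "http://www.w3.org/1999/xhtml"


def element_namespace(ancestry: List[Dict[str, str]]) -> Optional[str]:
    """Return the namespace name of the last element in the chain, or None.

    `ancestry` holds the attribute maps of the root element, its descendant
    on the path, and so on, ending with the element being examined.
    """
    for attributes in reversed(ancestry):
        if "xmlns" in attributes:
            # An empty value declares that the element is in no namespace.
            return attributes["xmlns"] or None
    return None


# Paths taken from the example document above.
beers = [{}]
table = beers + [{"xmlns": XHTML}]
td = table + [{}, {}]                      # table > tr > td
brand_name = td + [{"xmlns": ""}]          # td > brandName
hop = td + [{"xmlns": ""}, {}]             # td > details > hop
```

The nearest xmlns attribute wins, so the td elements inherit the HTML namespace from table, while brandName and everything inside details are in no namespace.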

4.2 Attributes in Namespaces

By default, attributes are also not in any namespace. Their meaning is determined by the elements to which they are attached. However, some attributes may have the same meaning no matter what element they are attached to, and so belong in a namespace. Attributes in namespaces are distinguishable from ordinary attributes because they have two-part names: a first name called the prefix, followed by :, followed by a second name called the local part.

In order to specify the namespace name of an attribute, a reserved attribute named xmlns:prefix whose value is the namespace name is used to bind the prefix to the namespace name. An attribute of this form is called a namespace declaration.

Namespace declarations affect not only the element to which they are attached, but also all elements inside it that do not have their own xmlns:prefix attributes for the same prefix.

All prefixes used in a document other than xml SHOULD be bound in an element containing the element in which they are used. The prefix xml MAY be bound, and if it is bound, it MUST be bound to the namespace name http://www.w3.org/XML/1998/namespace. This namespace name MUST NOT be bound to any other prefix, and MUST NOT be used as the value of an xmlns attribute either. The prefix xmlns is used only for namespace bindings and MUST NOT be bound to any namespace name.

Note that the prefix functions only as a placeholder for a namespace name. Applications SHOULD use the namespace name, not the prefix, in constructing and interpreting names.

Here is an example of a document using an attribute from the XLink namespace to specify links to other documents:

<doc xmlns:xlink="http://www.w3.org/1999/xlink">
  <para xlink:href="http://example.com/doc1.xml">Document #1</para>
  <para xlink:href="http://example.com/doc2.xml">Document #2</para>
</doc>

More than one prefix can be declared as attributes of a single element, as shown in this example:

<!-- both namespace prefixes are available -->
<book xmlns:bk="http://www.example.com/books/"
      xmlns:isbn='http://www.example.com/isbn/'
      bk:title="Cheaper by the Dozen"
      isbn:number="1568491379"/>
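Prefix resolution for attributes follows the same nearest-declaration rule as namespace declarations in general. The non-normative Python sketch below uses a hypothetical function attribute_namespace; it assumes the same ancestor-chain representation of attribute maps (root first) and deliberately leaves the reserved prefixes xml and xmlns out of scope.

```python
from typing import Dict, List, Optional


def attribute_namespace(name: str, ancestry: List[Dict[str, str]]) -> Optional[str]:
    """Return the namespace name of an attribute, or None if it has none.

    `ancestry` holds attribute maps from the root element down to the
    element carrying the attribute. Unprefixed attributes are never in a
    namespace; an unbound prefix also yields None in this sketch.
    """
    prefix, colon, _local = name.partition(":")
    if not colon:
        return None
    declaration = "xmlns:" + prefix
    for attributes in reversed(ancestry):
        if declaration in attributes:
            return attributes[declaration]
    return None


# Attribute maps taken from the two examples above.
doc = {"xmlns:xlink": "http://www.w3.org/1999/xlink"}
para = {"xlink:href": "http://example.com/doc1.xml"}
book = {
    "xmlns:bk": "http://www.example.com/books/",
    "xmlns:isbn": "http://www.example.com/isbn/",
    "bk:title": "Cheaper by the Dozen",
    "isbn:number": "1568491379",
}
```

An application would then compare the returned namespace name together with the local part, rather than the prefix, when deciding whether it recognizes an attribute.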

5 Special-Purpose Attributes

This section describes several attributes with the prefix xml, associating them with predefined semantics useful in a wide variety of applications. These attributes have no special effects on the MicroXML data model; they simply appear as ordinary attributes. These attributes are conceptually identical to their XML lookalikes, but are explained in this specification in order to keep it as self-contained as possible.

5.1 Element Identification

A mechanism allowing unique element identifiers to be recognized by all conformant MicroXML processors is desirable in making the identification of individual elements or other parts of a MicroXML document robust. The xml:id attribute allows authors to identify elements with IDs that can be recognized by any MicroXML processor. Document authors are encouraged to name their ID attributes xml:id to increase the interoperability of these identifiers on the Web.

The value of an xml:id attribute SHOULD be a name as specified in section 2.8.

Application-level processing of IDs, including which elements can actually be addressed by which ID values, is beyond the scope of this specification.

5.2 Language Identification

In document processing, it is often useful to identify the natural language in which the content is written. The xml:lang attribute MAY be attached to an element to specify the language used in the content and attribute values of the element. The value of an xml:lang attribute SHOULD be either a language tag as defined by [BCP 47] or else the empty string.

An xml:lang attribute affects all elements and attribute values, not only of the element to which it is attached, but also of all elements (and their attribute values) inside it that do not have or inherit their own xml:lang attributes. However, applications determine which of an element's attribute values and which parts of its character content, if any, are treated as language-dependent values described by xml:lang.

An empty value of xml:lang is used to specify that there is no language information available, just as if xml:lang had not been specified.

Language information can also be provided by external transport protocols (e.g. HTTP or MIME). When available, this information MAY be used by MicroXML applications, but the more local information provided by xml:lang SHOULD be considered to override it.

Here are some examples:

<p xml:lang="en">The quick brown fox jumps over the lazy dog.</p>

<p xml:lang="en-GB">What colour is it?</p>

<p xml:lang="en-US">What color is it?</p>

<sp who="Faust" desc='leise' xml:lang="de">
  <l>Habe nun, ach! Philosophie,</l>
  <l>Juristerei, und Medizin</l>
  <l>und leider auch Theologie</l>
  <l>durchaus studiert mit heißem Bemüh'n.</l>
</sp>

5.3 Base URI Specification

This section describes a reserved attribute named xml:base with semantics similar to that of the HTML base element, for defining base URIs for parts of documents.

The terms base URI and relative URI are used in this section as defined in [RFC 3986].

The attribute xml:base can be inserted into an element to specify a base URI other than the base URI of the document or external entity. The value of this attribute is interpreted as a URI reference as defined in [RFC 3986], after processing according to Section 3.1 of that RFC. An xml:base attribute affects not only the element to which it is attached, but also all elements inside it that do not have or inherit their own xml:base attributes.

Applications determine which of an element's attribute values and which parts of its character content, if any, are treated as URI references affected by xml:base.

Here is an example of xml:base in a simple document containing XLinks.

<!DOCTYPE doc>
<doc xml:base="http://example.org/today/"
     xmlns:xlink="http://www.w3.org/1999/xlink">
  <head>
    <title>Virtual Library</title>
  </head>
  <body>
    <paragraph>See <link xlink:href="new.xml">what's
      new</link>!</paragraph>
    <paragraph>Check out the hot picks of the day!</paragraph>
    <olist xml:base="/hotpicks/">
      <item>
        <link xlink:href="pick1.xml">Hot Pick #1</link>
      </item>
      <item>
        <link xlink:href="pick2.xml">Hot Pick #2</link>
      </item>
      <item>
        <link xlink:href="pick3.xml">Hot Pick #3</link>
      </item>
    </olist>
  </body>
</doc>

The URIs in this example resolve to full URIs as follows:

"new.xml" resolves to the URI "http://example.org/today/new.xml"
"pick1.xml" resolves to the URI "http://example.org/hotpicks/pick1.xml"
"pick2.xml" resolves to the URI "http://example.org/hotpicks/pick2.xml"
"pick3.xml" resolves to the URI "http://example.org/hotpicks/pick3.xml"
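Resolution against nested xml:base attributes can be sketched with the reference resolution provided by Python's standard urllib.parse.urljoin. This is a non-normative illustration; the function name resolve and the list-of-bases representation are assumptions of the sketch, and the document itself is assumed to have no base URI of its own.

```python
from typing import List
from urllib.parse import urljoin


def resolve(bases: List[str], reference: str) -> str:
    """Resolve a URI reference against a chain of xml:base values.

    `bases` lists the xml:base attribute values in effect, outermost first.
    Each value is itself resolved against the base established before it.
    """
    base = ""
    for value in bases:
        base = urljoin(base, value)
    return urljoin(base, reference)


# xml:base values in effect for the links in the example document.
doc_base = ["http://example.org/today/"]
olist_base = doc_base + ["/hotpicks/"]
```

Here resolve(doc_base, "new.xml") covers the link in the first paragraph, and resolve(olist_base, "pick1.xml") covers a link inside the olist element, whose own xml:base value is an absolute path that replaces the path of the outer base.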

The set of characters allowed in xml:base attributes is the same as for any other attribute. However, some Unicode characters are not allowed in URI references, as specified in [RFC 3986], and thus processors MUST encode and escape these characters to obtain a valid URI reference from the attribute value. Disallowed characters MUST be escaped as follows:

  1. Each disallowed character is converted to UTF-8 [Unicode] as one or more bytes.
  2. Any bytes corresponding to a disallowed character are escaped with the URI escaping mechanism (that is, converted to %HH, where HH is the hexadecimal notation of the byte value).
  3. The original character is replaced by the resulting character sequence.
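The three steps above can be sketched in Python as follows. This is a non-normative illustration: the function name escape_uri_reference is hypothetical, and the set of characters left unescaped (the unreserved and reserved characters of RFC 3986, plus the percent sign so that existing escapes survive) is an assumption of this sketch rather than a list given by this specification.

```python
import string

# Assumption: characters that may appear literally in a URI reference.
_ALLOWED = set(
    string.ascii_letters + string.digits
    + "-._~"            # unreserved marks
    + ":/?#[]@"         # gen-delims
    + "!$&'()*+,;="     # sub-delims
    + "%"               # introduces an existing escape
)


def escape_uri_reference(value: str) -> str:
    """Percent-escape every disallowed character of an xml:base value."""
    out = []
    for ch in value:
        if ch in _ALLOWED:
            out.append(ch)
        else:
            # Step 1: convert the character to UTF-8 bytes.
            # Step 2: escape each byte as %HH.
            # Step 3: the escapes replace the original character.
            out.extend(f"%{byte:02X}" for byte in ch.encode("utf-8"))
    return "".join(out)
```

A character outside ASCII therefore produces one %HH escape per UTF-8 byte, so a two-byte character such as U+00E9 becomes two escapes.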

5.4 Application Whitespace Handling

The xml:space attribute MAY be attached to an element with the value preserve to indicate that applications SHOULD preserve all whitespace in the character data of that element. Alternatively, the attribute can be attached to an element with the value default to indicate that applications SHOULD apply their default whitespace-handling policies. An xml:space attribute affects not only the element to which it is attached, but also all elements inside it that do not have their own xml:space attributes.

6 Conformance

6.1 UTF-8 Encoding

MicroXML documents MUST be plain text encoded in UTF-8 [Unicode].
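A processor's first step, turning octets into characters, might look like the following non-normative Python sketch. The function names decode_document and is_utf8 are hypothetical; the sketch assumes strict decoding and removes a leading Byte Order Mark, which section 2.10 treats as an encoding signature rather than document content.

```python
def decode_document(data: bytes) -> str:
    """Decode UTF-8 octets into text, dropping a leading Byte Order Mark."""
    # Strict decoding raises UnicodeDecodeError on malformed input.
    text = data.decode("utf-8")
    if text.startswith("\ufeff"):
        text = text[1:]
    return text


def is_utf8(data: bytes) -> bool:
    """Return True if the octets form well-formed UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

Only a Byte Order Mark at the very start of the input is removed; the same code point appearing later in the text is left for the parser to handle.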

6.2 Syntax Checking

Conforming MicroXML processors MUST detect and report violations of this specification's grammar and other constraints in documents they process. If such violations exist, the documents are by definition not MicroXML documents.

When any such violation is encountered, the MicroXML processor MAY continue processing the document, or MAY abandon processing and report a non-continuable error to the application. This is different from the corresponding rule for XML.

6.3 MicroXML Processors and the MicroXML Data Model

Conforming MicroXML processors MUST provide a mechanism to make the complete data model available to applications and SHOULD provide a mechanism to make processing instructions available as well. Processors SHOULD NOT make comments available to the application, to prevent them from being used in place of elements, attributes, or processing instructions.

7 Notation

The formal grammar of MicroXML is given in this specification using a simple Extended Backus-Naur Form (EBNF) notation. Each rule in the grammar defines one symbol, in the form symbol ::= expression.

Within the expression on the right-hand side of a rule, the following expressions are used to match strings of one or more characters:

#xN
where N is a hexadecimal integer, the expression matches the character in Unicode whose code point has the value indicated.
[a-zA-Z], [#xN-#xN]
matches any character with a value in the range(s) indicated (inclusive).
[abc], [#xN#xN#xN]
matches any character with a value among the characters enumerated. Enumerations and ranges can be mixed in one set of brackets.
"string"
matches a literal string matching the one given inside the double quotes.
'string'
matches a literal string matching the one given inside the single quotes.

These symbols can be combined to match more complex patterns as follows, where A and B represent expressions:

(A)
expression is treated as a unit and can be combined as described in this list.
A?
matches A or nothing; OPTIONAL A.
A B
matches A followed by B. This operator has higher precedence than alternation; thus A B | C D is identical to (A B) | (C D).
A | B
matches A or B.
A - B
matches any string that matches A but does not match B.
A+
matches one or more occurrences of A. This operation has higher precedence than alternation; thus A+ | B+ is identical to (A+) | (B+).
A*
matches zero or more occurrences of A. This operation has higher precedence than alternation; thus A* | B* is identical to (A*) | (B*).

8 Suggestions for MicroXML Names (Non-Normative)

The following suggestions define what is believed to be best practice in the construction of MicroXML names. All references to Unicode are understood with respect to a particular version of the Unicode Standard greater than or equal to 5.0; which version is used is left to the discretion of the document author or schema designer.

The first two suggestions exclude all control characters, enclosing nonspacing marks, non-decimal numbers, private-use characters, punctuation characters (with the noted exceptions), symbol characters, unassigned code points, and whitespace characters.

9 References

While these references cite a particular edition of a specification, conforming implementations of MicroXML MAY support later editions either in addition or as replacements, thus allowing MicroXML users to benefit from corrections and extensions to the other specifications on which it depends.

FIXME