Copyright © 2011 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark and document use rules apply.
MicroXML is a subset of XML intended for use in contexts where full XML is, or is perceived to be, too large and complex. MicroXML provides a set of rules for defining markup languages intended for use in encoding data objects, and specifies behavior for certain software modules that access them.
This document is a private skunkworks and has no official standing of any kind, not having been reviewed by any organization in any way.
This draft was edited by John Cowan from bits and pieces of W3C and non-W3C documents edited by himself, James Clark, Tim Bray, Dave Hollander, Andrew Layman, Eve Maler, Jonathan Marsh, Jean Paoli, C. Michael Sperberg-McQueen, and Richard Tobin. There should be no suggestion that anybody other than John Cowan approves of the content or even the existence of the present document.
The copyright statement above applies to much of the text assembled for this document, but should not be taken as an indication that the W3C approves of the contents or existence of this document.
MicroXML describes a class of data objects called MicroXML documents, or just documents, provides a data model for them, and partially describes the behavior of computer programs which process them. By construction, MicroXML documents are well-formed XML 5th Edition documents. MicroXML documents are also XML namespace-well-formed provided that all prefixes in attribute names have been bound (see section 4.3).
The creation of an XML subset can be justified even though the costs of XML complexity have already been paid, for at least the following reasons:
MicroXML documents are made up of characters, some of which form character data, and some of which form markup. Markup primarily encodes a description of the document's logical structure.
A software module called a MicroXML processor is used to read MicroXML documents and provide access to their content and structure. It is assumed that a MicroXML processor is doing its work on behalf of another module called the application. This specification describes the behavior of a MicroXML processor in terms of how it MUST process MicroXML documents and what information it MUST, SHOULD, and MAY provide to the application.
This specification, together with the associated standards [RFC 2119] for requirement keywords, [Unicode] for characters, [BCP 47] for language tags, and [RFC 3986] for URI syntax, provides all the information necessary to understand MicroXML and construct computer programs to process it.
This version of the MicroXML specification can be distributed freely, as long as all text and legal notices remain intact.
The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.
The term "for compatibility" in this specification marks a sentence describing a feature of MicroXML that is included solely to ensure that MicroXML remains compatible with XML, HTML or SGML.
A sequence of characters is a MicroXML document if, taken as a whole, it matches the production labeled document and meets the further constraints found in the text of this specification marked with the keywords MUST or REQUIRED.
[1] document ::= comments (doctype comments)? element comments
[2] comments ::= (comment | pi | s)*
[3] doctype ::= '<!DOCTYPE' s+ name s* ">"
Each document contains a single element called the root element, plus OPTIONAL
comments and whitespace before and after it. For compatibility, the root element
MAY be preceded by a DOCTYPE declaration, which contains a name
that MUST match the element name of the root element.
Here is a simple example of a document whose root element is named greeting:
<!DOCTYPE greeting>
<greeting><w>Hello</w> <w>world</w>!</greeting>
[4] element ::= startTag content endTag
| emptyElementTag
[5] content ::= (element | comment | pi | dataChar | charRef)*
[6] startTag ::= '<' name (s+ attribute)* s* '>'
[7] emptyElementTag ::= '<' name (s+ attribute)* s* '/>'
[8] endTag ::= '</' name s* '>'
Elements are the basic building blocks of documents. An element MAY contain a span of text called its content. The boundaries of the content are delimited by start-tags and end-tags. An empty element (one which contains no content) MAY instead be identified by an empty-element tag, which is equivalent to the corresponding start-tag immediately followed by the corresponding end-tag. Each element MUST have a name and MAY have one or more attributes. Each attribute has an attribute name and an attribute value.
Here are examples of start-tags:
<foo> <bar baz="foo">
Here are examples of corresponding end-tags:
</foo> </bar>
Here is an example of an empty-element tag:
<IMG align="left" src="http://www.example.org/Icons/madonna" />
A start-tag begins with <, followed by the name of the element, followed by
OPTIONAL attributes, followed by >. An end-tag begins with
</ followed by the name of the element, followed by >. An
empty-element tag is like a start-tag, but ends with /> instead of >.
Whitespace MUST be used before each attribute and MAY be used
before > or />.
For compatibility, an empty element whose name is br
SHOULD be expressed with an empty-element tag.
The end of every element that begins with a start-tag MUST be marked by an end-tag containing a name that matches the element name given in the start-tag.
Element names are drawn from a restricted character repertoire; see section 2.8.
This specification does not constrain the semantics or use of element names.
For all elements other than the root element, if the start-tag is in the content of another element, the end-tag MUST be in the content of the same element; elements MUST nest properly.
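The end-tag matching and nesting rules above can be checked with a stack. The following is a minimal sketch in Python, assuming the tags have already been tokenized into (kind, name) pairs; that event shape and the function name check_nesting are illustrative and not part of this specification. An empty-element tag can be fed in as a start event immediately followed by an end event.

```python
from typing import Iterable, List, Tuple


def check_nesting(events: Iterable[Tuple[str, str]]) -> bool:
    """Return True if every start-tag is closed by a matching end-tag
    and elements nest properly.

    `events` is a hypothetical pre-tokenized stream of ("start", name)
    and ("end", name) pairs in document order.
    """
    open_elements: List[str] = []
    for kind, name in events:
        if kind == "start":
            open_elements.append(name)
        elif kind == "end":
            # The end-tag must name the most recently opened element.
            if not open_elements or open_elements.pop() != name:
                return False
        else:
            raise ValueError(f"unknown event kind: {kind!r}")
    # Anything still open was never closed.
    return not open_elements
```

This sketch checks nesting only; it does not verify that there is exactly one root element.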
[9] attribute ::= attributeName s* '=' s* attributeValue
[10] attributeValue ::= '"' ((attributeValueChar - '"') | charRef)* '"'
| "'" ((attributeValueChar - "'") | charRef)* "'"
[11] attributeValueChar ::= char - ('<'|'&')
[12] attributeName ::= (name ":")? name
An attribute consists of an attribute name, followed by =, followed by a quoted attribute
value. Either single or double quotes can be used around the value. Attribute values
MUST NOT contain the < or & characters
except in the form of a character reference. Likewise, single-quoted attribute values
MUST NOT contain single quotes except in the form of a character reference,
and similarly for double-quoted attribute values.
Attribute names are drawn from a restricted character repertoire; see sections 2.8 and 4.2.
The order of attributes in a start-tag or empty-element tag is not significant.
An attribute name MUST NOT appear more than once in the same start-tag or empty-element tag.
This specification does not constrain the semantics or use of attribute names except for
those beginning with xml.
[13] dataChar ::= char - ('<'|'&'|'>')
All text in a document that is not markup constitutes the character data of the
document and of the most immediate element in which it exists. Any legal MicroXML character
can be a data character except <, which signals the beginning of an element;
&, which signals the beginning of a character reference; and
>, which is forbidden for compatibility. If these characters are to appear
in the data model of a document, they must appear as character references.
[14] charRef ::= decCharRef | hexCharRef | namedCharRef
[15] decCharRef ::= '&#' [0-9]+ ';'
[16] hexCharRef ::= '&#x' [0-9a-fA-F]+ ';'
[17] namedCharRef ::= '&' charName ';'
[18] charName ::= 'amp' | 'lt' | 'gt' | 'quot' | 'apos'
A character reference in character content or attribute values stands for a specific
Unicode character. Characters referred to using character references MUST match
the production for char (see Section 2.10).
If the character reference begins with &#x, the digits and letters up to the
terminating semicolon provide a hexadecimal representation of the character's code point in
Unicode. If it begins just with &#, the digits up to the terminating
semicolon provide a decimal representation of the character's code point.
For readability, a set of predefined character references is also provided for the purpose of
escaping MicroXML's special characters: &amp; for &,
&lt; for <, &gt; for >,
&apos; for ', and &quot; for ".
This has exactly the same effect as using numeric character references: &#60; for
<, &#38; for &, and so on.
Examples of character references: &lt; for LESS THAN,
&#xA0; for NON-BREAKING SPACE, &#x394; for
GREEK CAPITAL LETTER DELTA, &#x10330; or
&#66352; for GOTHIC LETTER AHSA.
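The expansion of a single character reference, following productions [14] to [18] and the char production, might be sketched in Python as follows. The function names decode_char_ref and is_legal_char are illustrative, and the mapping from the five predefined names to characters is the conventional XML one.

```python
import re

# The five names of production [18] and the characters they stand for.
NAMED = {"amp": "&", "lt": "<", "gt": ">", "quot": '"', "apos": "'"}

# Productions [14]-[17]: a decimal, hexadecimal, or named reference.
CHAR_REF = re.compile(r"&(?:#([0-9]+)|#x([0-9a-fA-F]+)|(amp|lt|gt|quot|apos));")


def is_legal_char(cp: int) -> bool:
    """Production [28]: whitespace, or #x21-#x10FFFF minus forbidden code points."""
    if cp in (0x9, 0xA, 0x20):
        return True
    if not 0x21 <= cp <= 0x10FFFF:
        return False
    return not (0xD800 <= cp <= 0xDFFF or cp in (0xFFFE, 0xFFFF))


def decode_char_ref(ref: str) -> str:
    """Return the character that one complete character reference stands for."""
    match = CHAR_REF.fullmatch(ref)
    if match is None:
        raise ValueError(f"not a character reference: {ref!r}")
    decimal, hexadecimal, name = match.groups()
    if name is not None:
        return NAMED[name]
    cp = int(decimal) if decimal is not None else int(hexadecimal, 16)
    if not is_legal_char(cp):
        raise ValueError(f"reference to a forbidden code point: {ref!r}")
    return chr(cp)
```

For instance, the hexadecimal and decimal references to GOTHIC LETTER AHSA decode to the same character, while a reference to a surrogate code point is rejected.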
[19] comment ::= '<!--' (commentContentStart commentContentContinue*)? '-->'
[20] commentContentStart ::= (char - ('-'|'>')) | ('-' (char - ('-'|'>')))
[21] commentContentContinue ::= (char - '-') | ('-' (char - '-'))
Comments are provided in MicroXML for human consumption only, and are not part of the MicroXML data model. They MAY appear before or after the root, or anywhere else in a document except inside other markup.
A comment begins with <!-- and ends with -->. For
compatibility, a comment MUST NOT begin with <!--> or
<!---, end with --->, or contain -- anywhere
except as part of the beginning or end.
An example of a comment (note that <head> and <body>
are not start-tags):
<!-- declarations for <head> & <body> -->
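Productions [19] to [21] translate fairly directly into a regular expression. The Python sketch below is one possible transcription; the name is_comment is illustrative, and for brevity the pattern does not also check that every character matches the char production.

```python
import re

# Transcription of productions [19]-[21]:
#   commentContentStart    -> [^\->] | -[^\->]
#   commentContentContinue -> [^-]   | -[^-]
COMMENT = re.compile(r"<!--(?:(?:[^\->]|-[^\->])(?:[^-]|-[^-])*)?-->")


def is_comment(text: str) -> bool:
    """Return True if `text` as a whole has the shape of a comment."""
    return COMMENT.fullmatch(text) is not None
```

With this pattern the example comment above is accepted, while a comment containing -- in its body or ending in ---> is rejected.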
[22] pi ::= '<?' target (s+ attribute)* s* '?>'
[23] target ::= name - (('X' | 'x') ('M' | 'm') ('L' | 'l'))
Processing instructions (PIs) allow documents to contain instructions for
applications. PIs are not part of the MicroXML data model, but processors
SHOULD make them available to the application. A PI begins with a target used
to identify the application to which the instruction is directed, and contains attributes
which give the application information on how to process the PI. For compatibility, the target
name xml in any combination of upper and lower case characters MUST
NOT be used.
[24] name ::= nameStartChar nameChar*
[25] nameStartChar ::= [A-Z] | [a-z] | "_" | [#xC0-#xD6] | [#xD8-#xF6] | [#xF8-#x2FF] | [#x370-#x37D]
| [#x37F-#x1FFF] | [#x200C-#x200D] | [#x2070-#x218F] | [#x2C00-#x2FEF]
| [#x3001-#xD7FF] | [#xF900-#xFDCF] | [#xFDF0-#xFFFD] | [#x10000-#xEFFFF]
[26] nameChar ::= nameStartChar | [0-9] | "-" | "." | #xB7 | [#x0300-#x036F] | [#x203F-#x2040]
Element and attribute names use only a subset of the legal MicroXML characters. The first
character of a name MUST be a nameStartChar, and any other
characters MUST be nameChars; this mechanism is used to prevent
names from beginning with European (ASCII) digits or with basic combining characters. Almost
all characters are permitted in names, except those which either are or reasonably could be
used as delimiters. The intention is to be inclusive rather than exclusive. See section 8 for
suggestions on how to create names.
The ASCII symbols and punctuation marks, along with a fairly large group of Unicode symbol
characters, are excluded from names because they are more useful as delimiters in contexts
where MicroXML names are used outside MicroXML documents. Providing this group gives those
contexts hard guarantees about what cannot be part of a MicroXML name. The character
#x037E GREEK QUESTION MARK is excluded because when normalized it becomes a
semicolon. Note that #x2D HYPHEN-MINUS, #x2E FULL STOP (period),
#x5F LOW LINE (underscore),
and #xB7 MIDDLE DOT are explicitly permitted.
Names beginning with a match to (('X'|'x')('M'|'m')('L'|'l')) are reserved for
standardization by the W3C.
In the case of two-part attribute names, these rules apply to both parts individually.
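As an illustration, productions [24] to [26] and the two-part attribute name rule can be transcribed into Python regular expressions. This is a sketch under the assumption that names are handled as Python strings of code points; the helper names is_name, is_attribute_name, and is_reserved are not defined by this specification.

```python
import re

# Character classes transcribed from productions [25] and [26].
NAME_START = (
    "A-Za-z_"
    "\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-\u037D"
    "\u037F-\u1FFF\u200C-\u200D\u2070-\u218F\u2C00-\u2FEF"
    "\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD\U00010000-\U000EFFFF"
)
NAME_CHAR = NAME_START + "0-9.\u00B7\u0300-\u036F\u203F-\u2040\\-"

NAME = re.compile(f"[{NAME_START}][{NAME_CHAR}]*")
RESERVED = re.compile("[Xx][Mm][Ll]")


def is_name(text: str) -> bool:
    """Production [24]: one nameStartChar followed by any number of nameChars."""
    return NAME.fullmatch(text) is not None


def is_attribute_name(text: str) -> bool:
    """Production [12]: a name, or two names joined by a single colon."""
    parts = text.split(":")
    return 1 <= len(parts) <= 2 and all(is_name(part) for part in parts)


def is_reserved(name: str) -> bool:
    """True if the name begins with 'xml' in any combination of case."""
    return RESERVED.match(name) is not None
```

Note that the colon is not a nameChar, so is_name rejects two-part names; they are only accepted by is_attribute_name.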
[27] s ::= #x9 | #xA | #x20
Whitespace consists of tabs, newlines, and spaces, all of which are permitted in various places within markup to increase readability.
[28] char ::= s | ([#x21-#x10FFFF] - forbiddenChar)
[29] forbiddenChar ::= surrogateCodePoint | #xFFFE | #xFFFF
[30] surrogateCodePoint ::= [#xD800-#xDFFF]
Documents contain text, a sequence of characters, which represent markup or character data. A
character is an atomic unit of text as specified by [Unicode]. The legal MicroXML characters
exclude the Unicode non-characters #xFFFE and #xFFFF, as well as the
Unicode surrogate code points (which are not actually Unicode characters). Unassigned Unicode
code points are explicitly permitted. Do not confuse code points with UTF-8 or UTF-16 code
units, or with octets.
To simplify the tasks of applications, MicroXML processors MUST behave as if
they normalized all line breaks in documents before parsing them by translating both the
two-character sequence #xD #xA, and any #xD that is not followed by
#xA, to a single #xA character.
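A processor that works on decoded text could implement this normalization with two replacements, as in the following Python sketch (the function name is illustrative). The order matters: the two-character sequence is handled first so that it yields one line feed rather than two.

```python
def normalize_line_breaks(text: str) -> str:
    """Translate #xD #xA pairs and lone #xD characters to a single #xA."""
    return text.replace("\r\n", "\n").replace("\r", "\n")
```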
Document authors are, however, encouraged to avoid "compatibility characters" as defined in section 2.3 of [Unicode]. The characters defined in the following ranges are also discouraged (they are either control characters or Unicode non-characters):
[#x7F-#x84], [#x86-#x9F], [#xFDD0-#xFDEF], [#x1FFFE-#x1FFFF], [#x2FFFE-#x2FFFF], [#x3FFFE-#x3FFFF], [#x4FFFE-#x4FFFF], [#x5FFFE-#x5FFFF], [#x6FFFE-#x6FFFF], [#x7FFFE-#x7FFFF], [#x8FFFE-#x8FFFF], [#x9FFFE-#x9FFFF], [#xAFFFE-#xAFFFF], [#xBFFFE-#xBFFFF], [#xCFFFE-#xCFFFF], [#xDFFFE-#xDFFFF], [#xEFFFE-#xEFFFF], [#xFFFFE-#xFFFFF], [#x10FFFE-#x10FFFF].
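A tool that wants to warn about the discouraged ranges listed above could test code points as in this Python sketch. The function name is_discouraged is illustrative; the check relies on the observation that the listed supplementary ranges are exactly the last two code points of each of planes 1 through 16.

```python
def is_discouraged(cp: int) -> bool:
    """Return True if the code point falls in one of the discouraged ranges."""
    # Control characters (#x85 is deliberately not in the list) and the
    # non-character block in the Basic Multilingual Plane.
    if 0x7F <= cp <= 0x84 or 0x86 <= cp <= 0x9F or 0xFDD0 <= cp <= 0xFDEF:
        return True
    # #xnFFFE and #xnFFFF for planes 1 to 16. The plane 0 pair is not
    # merely discouraged but forbidden, so it is not reported here.
    return 0x1FFFE <= cp <= 0x10FFFF and (cp & 0xFFFF) >= 0xFFFE
```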
Documents MAY begin with the Byte Order Mark described by [Unicode], also
known as #xFEFF ZERO WIDTH NO-BREAK SPACE. This is an encoding signature, not
part of either the markup or the character data of the MicroXML document.
[Unicode] says that canonically equivalent sequences of characters ought to be treated as identical. However, documents that are canonically equivalent according to Unicode but which use distinct code point sequences are considered distinct by MicroXML processors. Therefore, all documents SHOULD be in Normalization Form C as described by [Unicode]. Otherwise the user might unknowingly create canonically equivalent but unequal sequences that appear identical to the user but which are treated as distinct by MicroXML processors.
MicroXML processors MAY verify that their input is normalized, and MAY report non-normalized character sequences.
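One way a processor or authoring tool might perform this optional check is to compare the text with its own NFC form, as in the Python sketch below, which uses the standard unicodedata module. The function name is_nfc is illustrative.

```python
import unicodedata


def is_nfc(text: str) -> bool:
    """Return True if the text is already in Normalization Form C."""
    return unicodedata.normalize("NFC", text) == text
```

For example, the precomposed character #xE9 passes the check, whereas the canonically equivalent sequence #x65 #x301 does not, even though the two usually render identically.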
This section defines an abstract data set called the MicroXML data model. It exists to provide:
The contents of the data model for a document are designed to convey its structure and content as expressed by its markup and character data. However, there are some items of markup which have no effect on the contents of the data model: the DOCTYPE declaration, comments, and processing instructions. The use or non-use of character references for non-reserved characters also has no effect.
The MicroXML data model does not require or favor a specific interface or class of interfaces. This specification presents the data model as a tree for the sake of clarity and simplicity, but there is no requirement that the model be made available through a tree structure; other types of interfaces, including (but not limited to) event-based and query-based interfaces, are also capable of providing information conforming to the MicroXML data model.
The terms data model and element object are similar in meaning to the generic terms tree and node as they are used in computing. However, the former terms are used in this specification to reduce possible confusion with other specific data models. Element objects do not map one-to-one with the nodes of the DOM or the tree and nodes of the XPath data model.
A document's data model consists of at least one element object. An element object is an abstract description of a single element in a document. Each element object has three associated properties: the name, the attribute map, and the sequence of children. The name is a string, the attribute map maps name strings to value strings, and each child in the sequence is either a string representing character data or an element object.
There is one element object in the data model for each element appearing in the document being modeled. One element object corresponds to the root of the element tree, and all other element objects are accessible by recursively following the sequence of its children.
Namespace information does not affect the data model; attributes named xmlns or
containing prefixes are treated as ordinary attributes.
This specification describes the data model resulting from parsing a MicroXML document. Data models MAY be constructed by other means, for example by use of an API or by transforming an existing data model.
This specification adopts the JsonML convention for translating MicroXML data models to and from JSON. In this convention, each element object is represented by a JSON array. The first element of the array is the element name, and the second element is a JSON object representing the attribute map. The attribute values in this JSON object are all strings. The remaining elements of the array are either JSON strings representing character content or JSON arrays representing child elements.
Note that this convention is suitable for round-tripping MicroXML documents through JSON. It is not suitable for round-tripping arbitrary JSON through MicroXML; for that purpose, the JSONX convention is RECOMMENDED.
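The element object and its JSON rendering can be sketched in Python as follows. The class name Element and the functions to_jsonml and from_jsonml are illustrative only; the specification does not prescribe any particular interface.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class Element:
    """An element object: a name, an attribute map, and a sequence of children."""
    name: str
    attributes: Dict[str, str] = field(default_factory=dict)
    children: List[Union[str, "Element"]] = field(default_factory=list)


def to_jsonml(element: Element) -> list:
    """Render an element object as [name, attribute map, child, child, ...]."""
    rendered = [c if isinstance(c, str) else to_jsonml(c) for c in element.children]
    return [element.name, dict(element.attributes)] + rendered


def from_jsonml(array: list) -> Element:
    """Rebuild an element object from its array form."""
    name, attributes, *rest = array
    children = [c if isinstance(c, str) else from_jsonml(c) for c in rest]
    return Element(name, dict(attributes), children)


# Data model of <greeting><w>Hello</w> <w>world</w>!</greeting>
greeting = Element("greeting", {}, [
    Element("w", {}, ["Hello"]), " ", Element("w", {}, ["world"]), "!",
])
print(json.dumps(to_jsonml(greeting)))
```

In this sketch the attribute map is always emitted, even when empty, so that the second array position never has to be disambiguated from a child.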
MicroXML namespaces are conceptually identical to XML namespaces, but are explained in this specification in order to keep it as self-contained as possible.
The names that appear as element and attribute names serve as labels for the logical components of a document. Software modules are often designed to process a particular set of elements and attributes and their content, identifying them using these labels. Let us refer to such a set, understood by some software module, as a namespace.
A single document MAY contain elements and attributes from more than one namespace. One motivation for this is modularity; if a markup vocabulary exists which is well-understood and for which there is useful software available, it is better to re-use names rather than re-invent them.
Such documents pose problems of recognition and collision. Software modules need to be able to recognize the elements and attributes which they are designed to process, even in the face of collisions occurring when markup intended for some other software package uses the same element or attribute name.
These considerations require that there exist a way to determine which namespace an element or attribute belongs to. The rest of this section describes mechanisms that accomplish this.
A namespace is identified by a URI called its namespace name. Namespace names are considered identical when they are exactly the same character-for-character. Note that URIs which are not identical in this sense MAY otherwise be functionally equivalent.
To serve their intended purpose, namespace names SHOULD be unique and persistent. It is not necessary that they be directly usable for retrieving a resource of any kind.
By default, the elements of a document are not in any namespace. An element can be placed in
a namespace by attaching the reserved attribute xmlns to the element's start-tag
or empty-element tag. The value of an xmlns attribute is the namespace name. An
xmlns attribute whose value is the empty string declares that its element is
not in any namespace.
An xmlns attribute affects not only the element to which it is attached, but is
also inherited by all elements inside it that do not have or inherit their own
xmlns attributes. This allows, for example, all elements to be placed in a
single namespace by attaching a single xmlns attribute to the root element.
Here is an example of a document, with some elements in the HTML namespace and some not in any namespace:
<Beers>
<!-- the default namespace is now that of HTML -->
<table xmlns='http://www.w3.org/1999/xhtml'>
<th><td>Name</td><td>Origin</td><td>Description</td></th>
<tr>
<!-- no default namespace inside table cells -->
<td><brandName xmlns="">Huntsman</brandName></td>
<td><origin xmlns="">Bath, UK</origin></td>
<td>
<details xmlns=""><class>Bitter</class><hop>Fuggles</hop>
<pro>Wonderful hop, light alcohol, good summer beer</pro>
<con>Fragile; excessive variance pub to pub</con>
</details>
</td>
</tr>
</table>
</Beers>
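The inheritance of xmlns can be made concrete with a short Python sketch. It assumes the document has already been parsed into JsonML-style arrays of the form [name, attribute map, children...]; the function name element_namespaces is illustrative, and None stands for "not in any namespace".

```python
from typing import Iterator, Optional, Tuple


def element_namespaces(node: list, inherited: Optional[str] = None
                       ) -> Iterator[Tuple[str, Optional[str]]]:
    """Yield (element name, namespace name or None) in document order."""
    name, attributes, *children = node
    if "xmlns" in attributes:
        # An empty value places the element in no namespace.
        namespace = attributes["xmlns"] or None
    else:
        namespace = inherited
    yield name, namespace
    for child in children:
        if not isinstance(child, str):  # strings are character data
            yield from element_namespaces(child, namespace)


XHTML = "http://www.w3.org/1999/xhtml"
beers = ["Beers", {},
         ["table", {"xmlns": XHTML},
          ["tr", {},
           ["td", {}, ["brandName", {"xmlns": ""}, "Huntsman"]]]]]
for pair in element_namespaces(beers):
    print(pair)
```

Applied to this cut-down version of the example above, Beers and brandName are reported as being in no namespace, while table, tr, and td are in the HTML namespace.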
By default, attributes are also not in any namespace. Their meaning is determined by the
elements to which they are attached. However, some attributes may have the same meaning no
matter what element they are attached to, and so belong in a namespace. Attributes in
namespaces are distinguishable from ordinary attributes because they have two-part
names: a first name called the prefix, followed by :, followed
by a second name called the local part.
In order to specify the namespace name of an attribute, a reserved attribute named
xmlns:prefix whose value is the namespace name is used to bind the
prefix to the namespace name. An attribute of this form is called a namespace
declaration.
Namespace declarations affect not only the element to which they are attached, but also all
elements inside it that do not have their own xmlns:prefix attributes for
the same prefix.
All prefixes used in a document other than xml
SHOULD be bound in an element containing the element in which they are used.
The prefix xml
MAY be bound, and if it is bound, it MUST be bound to the
namespace name http://www.w3.org/XML/1998/namespace. This namespace name
MUST NOT be bound to any other prefix, and MUST NOT be used as
the value of an xmlns attribute either. The prefix xmlns is used
only for namespace bindings and MUST NOT be bound to any namespace name.
Note that the prefix functions only as a placeholder for a namespace name. Applications SHOULD use the namespace name, not the prefix, in constructing and interpreting names.
Here is an example of a document using an attribute from the XLink namespace to specify links to other documents:
<doc xmlns:xlink="http://www.w3.org/1999/xlink">
<para xlink:href="http://example.com/doc1.xml">Document #1</para>
<para xlink:href="http://example.com/doc2.xml">Document #2</para>
</doc>
More than one prefix can be declared as attributes of a single element, as shown in this example:
<!-- both namespace prefixes are available -->
<book xmlns:bk="http://www.example.com/books/"
xmlns:isbn="http://www.example.com/isbn/"
bk:title="Cheaper by the Dozen"
isbn:number="1568491379"/>
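Because namespace information does not affect the data model, resolving prefixes is left to the application. A minimal Python sketch of that resolution is given below, again assuming JsonML-style arrays as input; the function name expand_attribute_names is illustrative, and attributes with the reserved prefixes xml and xmlns are simply skipped.

```python
from typing import Dict, Iterator, Optional, Tuple


def expand_attribute_names(node: list, bindings: Optional[Dict[str, str]] = None
                           ) -> Iterator[Tuple[str, Optional[str], str, str]]:
    """Yield (element name, namespace name, local part, value) for each
    prefixed attribute; the namespace name is None for an unbound prefix."""
    bindings = dict(bindings or {})  # copy, so siblings are unaffected
    name, attributes, *children = node
    # Declarations on an element apply to that element's own attributes too.
    for attr_name, value in attributes.items():
        if attr_name.startswith("xmlns:"):
            bindings[attr_name[len("xmlns:"):]] = value
    for attr_name, value in attributes.items():
        prefix, colon, local = attr_name.partition(":")
        if not colon or prefix in ("xmlns", "xml"):
            continue  # unprefixed, a declaration, or a reserved attribute
        yield name, bindings.get(prefix), local, value
    for child in children:
        if not isinstance(child, str):
            yield from expand_attribute_names(child, bindings)


book = ["book", {"xmlns:bk": "http://www.example.com/books/",
                 "xmlns:isbn": "http://www.example.com/isbn/",
                 "bk:title": "Cheaper by the Dozen",
                 "isbn:number": "1568491379"}]
for item in expand_attribute_names(book):
    print(item)
```

An application would then compare the namespace name and local part, not the prefix, when deciding how to interpret an attribute.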
This section describes several attributes with the prefix xml, associating them
with predefined semantics useful in a wide variety of applications. These attributes have no
special effects on the MicroXML data model; they simply appear as ordinary attributes. These
attributes are conceptually identical to their XML lookalikes, but are explained in this
specification in order to keep it as self-contained as possible.
A mechanism allowing unique element identifiers to be recognized by all conformant MicroXML
processors is desirable in making the identification of individual elements or
other parts of a MicroXML document robust. The xml:id attribute allows authors to
identify elements with IDs that can be recognized by any MicroXML processor. Document authors
are encouraged to name their ID attributes xml:id to increase the
interoperability of these identifiers on the Web.
The value of an xml:id attribute SHOULD be a name as specified in
section 2.8.
Application-level processing of IDs, including which elements can actually be addressed by which ID values, is beyond the scope of this specification.
In document processing, it is often useful to identify the natural language in which the
content is written. The xml:lang attribute MAY be attached to an
element to specify the language used in the content and attribute values of the element. The
value of an xml:lang attribute SHOULD be either a language tag as
defined by [BCP 47] or else the empty string.
An xml:lang attribute applies to the content and attribute values not only of the
element to which it is attached, but also of all elements (and their attribute values) inside
it that do not have or inherit their own xml:lang attributes. However,
applications determine which of an element's attribute values and which parts of its character
content, if any, are treated as language-dependent values described by
xml:lang.
An empty value of xml:lang is used to specify that there is no language
information available, just as if xml:lang had not been specified.
Language information can also be provided by external transport protocols (e.g. HTTP or
MIME). When available, this information MAY be used by MicroXML applications,
but the more local information provided by xml:lang
SHOULD be considered to override it.
Here are some examples:
<p xml:lang="en">The quick brown fox jumps over the lazy dog.</p>
<p xml:lang="en-GB">What colour is it?</p>
<p xml:lang="en-US">What color is it?</p>
<sp who="Faust" desc='leise' xml:lang="de">
<l>Habe nun, ach! Philosophie,</l>
<l>Juristerei, und Medizin</l>
<l>und leider auch Theologie</l>
<l>durchaus studiert mit heißem Bemüh'n.</l>
</sp>
This section describes a reserved attribute named xml:base with semantics
similar to that of the HTML base element, for defining base URIs for parts of documents.
The terms base URI and relative URI are used in this section as defined in [RFC 3986].
The attribute xml:base can be inserted into an element to specify a base URI
other than the base URI of the document or external entity. The value of this attribute is
interpreted as a URI reference as defined in [RFC 3986], after processing according to Section
3.1 of that RFC. An xml:base attribute affects not only the element to which it
is attached, but also all elements inside it that do not have or inherit their own
xml:base attributes.
Applications determine which of an element's attribute values and which parts of its
character content, if any, are treated as URI references affected by
xml:base.
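How nested xml:base attributes combine can be sketched with the standard urllib.parse.urljoin function, which implements relative reference resolution. The sketch assumes JsonML-style arrays as input and assumes the application has decided that xlink:href values are the URI references of interest; the function name resolve_links and the link_attr parameter are illustrative.

```python
from typing import Iterator, Tuple
from urllib.parse import urljoin


def resolve_links(node: list, base: str = "", link_attr: str = "xlink:href"
                  ) -> Iterator[Tuple[str, str]]:
    """Yield (reference as written, resolved URI) for each link attribute."""
    name, attributes, *children = node
    if "xml:base" in attributes:
        # An xml:base value may itself be relative to the inherited base.
        base = urljoin(base, attributes["xml:base"])
    if link_attr in attributes:
        yield attributes[link_attr], urljoin(base, attributes[link_attr])
    for child in children:
        if not isinstance(child, str):
            yield from resolve_links(child, base, link_attr)


doc = ["doc", {"xml:base": "http://example.org/today/"},
       ["link", {"xlink:href": "new.xml"}, "what's new"],
       ["olist", {"xml:base": "/hotpicks/"},
        ["link", {"xlink:href": "pick1.xml"}, "Hot Pick #1"]]]
for written, resolved in resolve_links(doc):
    print(written, "->", resolved)
```

Here the inner xml:base value /hotpicks/ is first resolved against the outer base, and the links inside that element are then resolved against the result.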
Here is an example of xml:base in a simple document containing XLinks.
<!DOCTYPE doc>
<doc xml:base="http://example.org/today/"
xmlns:xlink="http://www.w3.org/1999/xlink">
<head>
<title>Virtual Library</title>
</head>
<body>
<paragraph>See <link xlink:href="new.xml">what's
new</link>!</paragraph>
<paragraph>Check out the hot picks of the day!</paragraph>
<olist xml:base="/hotpicks/">
<item>
<link xlink:href="pick1.xml">Hot Pick #1</link>
</item>
<item>
<link xlink:href="pick2.xml">Hot Pick #2</link>
</item>
<item>
<link xlink:href="pick3.xml">Hot Pick #3</link>
</item>
</olist>
</body>
</doc>
The URIs in this example resolve to full URIs as follows:
"new.xml" resolves to the URI "http://example.org/today/new.xml"
"pick1.xml" resolves to the URI "http://example.org/hotpicks/pick1.xml"
"pick2.xml" resolves to the URI "http://example.org/hotpicks/pick2.xml"
"pick3.xml" resolves to the URI "http://example.org/hotpicks/pick3.xml"
The set of characters allowed in xml:base attributes is the same as for any
other attribute. However, some Unicode characters are not allowed in URI references, as
specified in [RFC 3986], and thus processors MUST encode and escape these
characters to obtain a valid URI reference from the attribute value. Disallowed characters
MUST be escaped as follows: each disallowed character is converted to UTF-8 as one
or more bytes, and each of those bytes is escaped with the URI escaping mechanism (that is,
converted to %HH, where HH is the hexadecimal notation of the byte value).
The xml:space attribute MAY be attached to an element with the
value preserve to indicate that applications SHOULD preserve all
whitespace in the character data of that element. Alternatively, the attribute can be attached
to an element with the value default to indicate that applications
SHOULD apply their default whitespace-handling policies. An
xml:space attribute affects not only the element to which it is attached, but
also all elements inside it that do not have their own xml:space
attributes.
MicroXML documents MUST be plain text encoded in UTF-8 [Unicode].
Conforming MicroXML processors MUST detect and report violations of this specification's grammar and other constraints in documents they process. If such violations exist, the documents are by definition not MicroXML documents.
When any such violation is encountered, the MicroXML processor MAY continue processing the document, or MAY abandon processing and report a non-continuable error to the application. This is different from the corresponding rule for XML.
Conforming MicroXML processors MUST provide a mechanism to make the complete data model available to applications and SHOULD provide a mechanism to make processing instructions available as well. Processors SHOULD NOT make comments available to the application, to prevent them from being used in place of elements, attributes, or processing instructions.
The formal grammar of MicroXML is given in this specification using a simple Extended
Backus-Naur Form (EBNF) notation. Each rule in the grammar defines one symbol, in the form
symbol ::= expression.
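Within the expression on the right-hand side of a rule, the following expressions are used to match strings of one or more characters:
#xN: where N is a hexadecimal integer, the expression matches the character in Unicode whose code point has the value indicated.
[a-zA-Z], [#xN-#xN]: matches any character with a value in the range(s) indicated (inclusive).
[abc], [#xN#xN#xN]: matches any character with a value among the characters enumerated.
"string": matches the literal string given inside the double quotes.
'string': matches the literal string given inside the single quotes.
These symbols can be combined to match more complex patterns as follows, where A
and B represent expressions:
(A): A is treated as a unit and can be combined as described in this list.
A?: matches A or nothing; OPTIONAL A.
A B: matches A followed by B. This operator has higher precedence
than alternation; thus A B | C D is identical to (A B) | (C D).
A | B: matches A or B.
A - B: matches any string that matches A but does not match B.
A+: matches one or more occurrences of A. This operation has higher precedence
than alternation; thus A+ | B+ is identical to (A+) | (B+).
A*: matches zero or more occurrences of A. This operation has higher precedence
than alternation; thus A* | B* is identical to (A*) | (B*).
The following suggestions define what is believed to be best practice in the construction of MicroXML names. All references to Unicode are understood with respect to a particular version of the Unicode Standard greater than or equal to 5.0; which version is used is left to the discretion of the document author or schema designer.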
The first two suggestions exclude all control characters, enclosing nonspacing marks, non-decimal numbers, private-use characters, punctuation characters (with the noted exceptions), symbol characters, unassigned code points, and whitespace characters.
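The first character of a name SHOULD have the Unicode property ID_Start, or else be #x5F LOW LINE (underscore).
Characters other than the first SHOULD have the Unicode property ID_Continue, or be one of the characters listed in the table entitled
"Characters for Natural Language Identifiers" in UAX #31, with the exception of #x27
APOSTROPHE and #x2019 RIGHT SINGLE QUOTATION MARK.
Ideographic characters that have a canonical decomposition (including those in the ranges [#xF900-#xFAFF] and [#x2F800-#x2FFFD], with 12 exceptions)
SHOULD NOT be used in names.
Characters that have a compatibility decomposition SHOULD NOT be used in names. (This suggestion does not apply to #x0E33 THAI CHARACTER SARA AM or
#x0EB3 LAO CHARACTER AM, which despite their compatibility decompositions are
in regular use in those scripts.)
Combining characters meant for use with symbols only (including those in the ranges [#x20D0-#x20EF] and [#x1D165-#x1D1AD]) SHOULD NOT
be used in names.
The interlinear annotation characters ([#xFFF9-#xFFFB]) SHOULD
NOT be used in names.
While these references cite a particular edition of a specification, conforming implementations of MicroXML MAY support later editions either in addition or as replacements, thus allowing MicroXML users to benefit from corrections and extensions to the other specifications on which it depends.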
FIXME