This is a preliminary description of a work in progress called "Architectural Forms: A New Generation", or AF:NG for short. Comments to cowan@ccil.org. This document is highly subject to change without notice.
Copyright 2002 John Cowan.
AF:NG provides the facilities, but does not employ the syntax, of SGML Architectural Forms. AF:NG is intended to be used in conjunction with the schema language RELAX NG, but is not dependent on it in any way.
The purpose of AF:NG is to provide for tightly specified transformations of XML documents, consisting of renaming or omitting elements, attributes, and character data. AF:NG is not intended as a general-purpose transformation language like XSLT or Omnimark. Using AF:NG, a recipient may, instead of specifying a schema to which documents must conform exactly, specify a schema to be applied to the output of an AF:NG transformation. In that way, the actual element and attribute names, and to some degree the document structure, may vary from the schema without rendering the document unacceptable. In particular, it is easy to use AF:NG to reduce a complex document to a much simpler one, when only a subset of the document is of interest to the recipient.
The information provided to AF:NG consists of a short XML document called an architectural map, or archmap, plus the occurrences within the source document of a special attribute called the form attribute. The archmap gives the name of the form attribute, and that name is the only required part of the archmap.
Note: This draft of AF:NG does not have the ability to map a source attribute into architectural character data.
The following RELAX NG schema (in non-XML format) specifies the syntax of an archmap:
namespace default = "x-whatever:somethingorother"
datatypes xsd = "http://www.w3.org/2001/XMLSchema-datatypes"

inheritable =
  attribute arch-ns {text}?,
  attribute source-ns {text}?

tokenmap =
  element tokenmap {
    attribute to {xsd:token},
    attribute from {xsd:token}
  }

start = element archmap {
  inheritable,
  attribute form-att {xsd:Name},
  attribute doc-elem {xsd:Name}?,
  attribute output {"transform" | "decorate"}?,
  element form {
    inheritable,
    attribute data {"preserve" | "ignore"}?,
    attribute children {"process" | "skip" | "literal"}?,
    attribute name {xsd:Name},
    attribute arch-elem {"#NONE" | xsd:Name}?,
    attribute source-elem {xsd:Name}?,
    element attmap {
      inheritable,
      attribute arch-att {xsd:Name},
      (attribute value {text} |
       (attribute source {"#CONTENT" | xsd:Name}, tokenmap*))?
    }*
  }*
}
If the inheritable attributes are not present on an element, they are given the value specified on the nearest ancestor element, analogously to the treatment of xml:lang and xml:space. The arch-ns attribute specifies the namespace name of the architecture, and is used to namespace qualify names appearing in form-att, doc-elem, arch-elem, and arch-att attributes. The source-ns attribute specifies the namespace name of references to the source document, and is used to namespace qualify names appearing in source-elem and source attributes.
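The inheritance rule for arch-ns and source-ns can be illustrated with a short sketch. It is not part of AF:NG itself: the function name resolve_inherited, the urn:example namespace names, and the "note" form are all hypothetical, and the archmap is written without a namespace, as in the trivial map later in this document. The sketch walks an archmap with Python's standard xml.etree.ElementTree module and records, for each element, the value found on the element itself or else on its nearest ancestor.

```python
import xml.etree.ElementTree as ET

INHERITABLE = ("arch-ns", "source-ns")

def resolve_inherited(elem, inherited=None, out=None):
    """Map every element under elem to its effective inheritable attributes.

    A value given on the element itself wins; otherwise the value from the
    nearest ancestor is used (None if no ancestor specifies one).
    """
    if inherited is None:
        inherited = {name: None for name in INHERITABLE}
    if out is None:
        out = {}
    effective = {name: elem.get(name, inherited[name]) for name in INHERITABLE}
    out[elem] = effective
    for child in elem:
        resolve_inherited(child, effective, out)
    return out

# Hypothetical archmap: the second form overrides source-ns only.
ARCHMAP = """<archmap form-att="class" arch-ns="urn:example:arch" source-ns="urn:example:src">
  <form name="para" source-elem="p"/>
  <form name="note" source-elem="aside" source-ns="urn:example:other">
    <attmap arch-att="kind" value="side"/>
  </form>
</archmap>"""

root = ET.fromstring(ARCHMAP)
effective = resolve_inherited(root)
forms = root.findall("form")
attmap = forms[1].find("attmap")

print(effective[forms[0]])  # both values inherited from archmap
print(effective[attmap])    # source-ns inherited from the enclosing form
```

In this sketch the attmap element picks up source-ns from its parent form rather than from archmap, because the form is the nearer ancestor, while arch-ns still comes from archmap.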
An AF:NG processor descends the tree of elements in document order, processing each element in accordance with which form it matches in the architectural map, and in accordance with its current modes. The initial modes are: data mode is preserve, children mode is process.
Each element in the source document is tested by the method given in Section 4 to see if it matches any of the form elements in the archmap.
In any case, any form attribute present in the element is removed. If the form element has any attmap or content elements as children, attribute mapping is done according to the method given in Section 5. The children of the element are then processed according to the current modes, possibly as modified by the form:
The form-att attribute, as namespace qualified by the arch-ns attribute, specifies the name of the form attribute for the source document. The processor matches elements in the source document with the form elements in the architectural map according to the following rules:
If a source element has matched some form, and attmap child elements exist in the form, then attribute mapping must be done. This provides additional attributes in the output whose values are either fixed, or are derived from attributes or character data in the source.
Each attmap element specifies an architectural attribute to appear in the output. The name of the attribute is specified by the arch-att attribute, as qualified by the arch-ns attribute. Any pre-existing attribute of that name in the source element is removed.
The value of each architectural attribute is determined as follows:
If an attmap element has tokenmap children, then the value of the architectural attribute is treated as a whitespace-separated list of tokens, and appropriate normalization is done. Any token that matches the "from" attribute of a tokenmap is replaced with the token in the "to" attribute of the same tokenmap, all replacements being done in parallel.
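As an illustration of the parallel replacement described above, here is a minimal sketch in Python. The function name apply_tokenmaps and the sample tokens are hypothetical. The sketch also assumes that "appropriate normalization" means collapsing runs of whitespace and trimming the ends, and that if two tokenmap children share the same "from" token the later one wins; this draft does not settle either point.

```python
def apply_tokenmaps(value, tokenmaps):
    """Replace tokens in a whitespace-separated list.

    tokenmaps is a list of (from_token, to_token) pairs. Each token of the
    input is looked up once against the original "from" tokens, so the
    replacements are parallel: the output of one replacement is never fed
    into another.
    """
    mapping = dict(tokenmaps)  # assumption: a later duplicate "from" wins
    # str.split() with no argument also normalizes whitespace.
    return " ".join(mapping.get(token, token) for token in value.split())

# Swapping two tokens shows why the replacements must be parallel.
print(apply_tokenmaps("  left   right center ",
                      [("left", "right"), ("right", "left")]))
# prints "right left center"
```

A sequential implementation would first turn "left" into "right" and then turn both occurrences back into "left", which is why the lookup is done against the original tokens only.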
<html><head><title>Reuters Health Information (2002-02-01): Whooping cough increasing among US infants</title></head>
<body bgcolor="white"><p class="headline"><strong> Whooping cough increasing among US infants</strong></p>
<p class="lead">ATLANTA, Feb 01 (Reuters Health) - In the last 20 years, the number of cases of whooping cough increased overall in the US, especially among infants too young to receive three pertussis vaccine doses, according to the US Centers for Disease Control and Prevention (CDC).</p>
<p>Whooping cough, or pertussis, is caused by infection with the Bordetella pertussis bacterium. Symptoms of whooping cough include having a cough lasting 14 or more days accompanied by a gasping sound or "whoop" while coughing. Children may also vomit or have difficulty breathing during a coughing spell.</p>
<p>Until the advent of the pertussis vaccine in the late 1940s, the respiratory illness was a major cause of illness and death, especially among infants and small children. Since the introduction of the vaccine (usually administered as part of the diphtheria-tetanus-pertussis combo vaccine), rates for whooping cough have dropped dramatically in the developed world.</p>
</body></html>
Here is a trivial map:
<archmap form-att="class" doc-elem="story">
  <form name="para" source-elem="p"/>
  <form name="title" source-elem="title" arch-elem="#NONE" data="ignore"/>
</archmap>
The effect of this map is:
The output document is:
<story>
<headline> Whooping cough increasing among US infants</headline>
<lead>ATLANTA, Feb 01 (Reuters Health) - In the last 20 years, the number of cases of whooping cough increased overall in the US, especially among infants too young to receive three pertussis vaccine doses, according to the US Centers for Disease Control and Prevention (CDC).</lead>
<para>Whooping cough, or pertussis, is caused by infection with the Bordetella pertussis bacterium. Symptoms of whooping cough include having a cough lasting 14 or more days accompanied by a gasping sound or "whoop" while coughing. Children may also vomit or have difficulty breathing during a coughing spell.</para>
<para>Until the advent of the pertussis vaccine in the late 1940s, the respiratory illness was a major cause of illness and death, especially among infants and small children. Since the introduction of the vaccine (usually administered as part of the diphtheria-tetanus-pertussis combo vaccine), rates for whooping cough have dropped dramatically in the developed world.</para>
</story>