C.2. Syntax

The goals of the Typechain syntax are to be:

The Typechain string represents a series of transformations to be applied to some associated data resource. (The association should be performed by mechanisms defined by the application.) The syntax is a series of identifiers, optionally with parameters, separated by delimiters. By applying the indicated transformations in order, from left to right, the associated data resource may be transformed into a final media type, which is indicated by the last (rightmost) identifier in the Typechain string.
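
As an informal illustration of this left-to-right processing, the following Python sketch applies a chain that has already been split into identifiers. The names DECODERS and apply_chain are hypothetical and were invented for this example. The sketch also assumes that each encoding identifier names a step to be undone (here base64 and gzip), which this section does not state normatively.

```python
import base64
import gzip

# Hypothetical registry (not defined by this specification): maps an
# encoding identifier to a function that undoes that encoding.
DECODERS = {
    "base64": base64.b64decode,
    "gzip": gzip.decompress,
}

def apply_chain(identifiers, data):
    """Apply each transformation left to right.

    The last (rightmost) identifier is the final media type; every
    identifier before it is treated as an encoding to undo.
    """
    *encodings, media_type = identifiers
    for name in encodings:
        # Identifiers are case-insensitive, so look them up in lower case.
        data = DECODERS[name.lower()](data)
    return media_type.lower(), data

# Build a resource that matches the chain "base64|gzip|text/plain".
payload = base64.b64encode(gzip.compress(b"hello"))
media_type, body = apply_chain(["base64", "gzip", "text/plain"], payload)
```

After the call, body holds the original bytes and media_type names the type that those bytes should be interpreted as.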

C.2.1. ABNF Syntax

The syntax definition follows, written in Augmented Backus-Naur Form (ABNF) as defined in RFC 2234 [RFC 2234].


typechain = [URN-prefix] *(encoding-identifier [parameters] delim)
            media-type-identifier *WS [parameters]

encoding-identifier = *(!delim) !URN-prefix      XXX TODO: fix this syntax

media-type-identifier = *(!delim) !URN-prefix                XXX TODO: fix this syntax

URN-prefix = "type:"

unknown-type = "?"

delim = ( ";" / ":" / "|" )

parameters = "(" *WS *(param *WS "," *WS) param *WS ")"

param = key [ "=" value ]

WS = ( " " / TAB )

empty-parens = "(" *WS ")"

repeated-delim = 2*delim

The format of a conforming type string is defined by the rule named typechain. encoding-identifier and media-type-identifier are identifiers, which are described in section 3 below. Both kinds of identifier are case-insensitive. The exact numerical representation of all characters depends on the character set used to encode the Typechain string. This character set will be defined by the application. TAB represents the character whose meaning is "horizontal tabulation." In US-ASCII [ASCII] a horizontal tabulation is represented by decimal number 9.

Three delimiters are defined so that a type string can be embedded in a larger context that already uses one of them for its own purposes; all three are characters commonly used as delimiters. The delimiters are preferred in this order: "|", ";", ":". Mixing delimiters in the same type string is not recommended, but is not prohibited either.
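
A reader that accepts all three delimiters, including mixed use, might split a type string as in the sketch below. The function name split_chain is hypothetical. Treating delimiter characters inside parentheses as part of the parameters is an assumption made for this example, since the grammar above does not define key or value. The sketch does not give the URN prefix any special treatment.

```python
DELIMS = "|;:"  # the three delim characters from the grammar

def split_chain(chain):
    """Split a type string on any delim that occurs outside parentheses."""
    parts, current, depth = [], [], 0
    for ch in chain:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch in DELIMS and depth == 0:
            # End of one element; surrounding spaces and tabs are dropped.
            parts.append("".join(current).strip(" \t"))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip(" \t"))
    return parts

elements = split_chain("base64|gzip;text/plain(charset=us-ascii)")
```

Here elements contains three entries even though the string mixes "|" and ";", which the text above discourages but permits.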

Note that whitespace MUST NOT occur within a single param. Whitespace is allowed between all other punctuation and identifiers.

Text matching empty-parens MUST NOT appear as a substring in any conforming type string, but MAY be ignored by a process reading and interpreting type strings. Likewise, text matching repeated-delim MUST NOT appear as a substring in any conforming type string, but MAY be replaced with a single delim by a process reading and interpreting type strings.
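
A lenient reader could apply both allowances before parsing, roughly as sketched here. The function name normalize is hypothetical. Keeping the first delimiter of a repeated run is an assumption, because the text does not say which delim should survive. The regular expressions do not distinguish delimiters that occur inside parameter values, so this is only suitable where parameters contain none.

```python
import re

# empty-parens = "(" *WS ")"
EMPTY_PARENS = re.compile(r"\([ \t]*\)")
# repeated-delim = 2*delim; the first delimiter of the run is captured.
REPEATED_DELIM = re.compile(r"([;:|])[;:|]+")

def normalize(chain):
    """Drop empty parentheses, then collapse runs of delimiters to one."""
    chain = EMPTY_PARENS.sub("", chain)
    return REPEATED_DELIM.sub(r"\1", chain)

cleaned = normalize("base64( )||gzip;;text/plain")
```

Empty parentheses are removed first so that a run of delimiters exposed by their removal is collapsed in the same pass.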

When a typechain is to be used in a URN [RFC 2396] context, that is, when it must take the form of a URN, the URN-prefix "type:" may be used. The token "type" is therefore reserved and MUST NOT be used as an encoding identifier. The URN prefix "type:" does not indicate any data encoding, and may be ignored when processing (or considered equivalent to "identity").
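
A processor that chooses to ignore the prefix might remove it before any further parsing, as in this sketch. The function names strip_urn_prefix and check_encoding_identifier are hypothetical. Matching the prefix without regard to case is an assumption based on the case-insensitivity of quoted strings in ABNF.

```python
URN_PREFIX = "type:"

def strip_urn_prefix(chain):
    """Return the type string without a leading "type:" prefix, if present."""
    if chain[:len(URN_PREFIX)].lower() == URN_PREFIX:
        return chain[len(URN_PREFIX):]
    return chain

def check_encoding_identifier(name):
    """Reject the reserved token "type" as an encoding identifier."""
    if name.lower() == "type":
        raise ValueError('"type" is reserved and is not an encoding identifier')
    return name

bare = strip_urn_prefix("type:base64|text/plain")
```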

A value of ? (question mark) means "unknown type". See section 3.2.

C.2.2. Examples

Each of these type strings is equivalent:

(foo, bar, baz, qux, corge, fred and plug are not valid identifiers; they have been invented for demonstration purposes only. See also RFC 3092.)

An example of a realistic type string that illustrates all possible components of a type string is:

URL;base64;gzip;text/plain(charset=us-ascii,language=eng)
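
To show how this string breaks down into its components, the following sketch parses it into identifiers and parameters. The function name parse_element is hypothetical. The plain split on ";" is sufficient only because this particular string uses a single delimiter and has no delimiter characters inside its parameters. Identifiers are folded to lower case because they are case-insensitive.

```python
def parse_element(element):
    """Split one element into (identifier, parameters).

    Parameters follow the grammar: "(" param *("," param) ")", where
    each param is key [ "=" value ]. A key without a value maps to None.
    """
    name, _, rest = element.partition("(")
    params = {}
    if rest:
        body = rest.rstrip(" \t")
        if not body.endswith(")"):
            raise ValueError("unterminated parameter list: " + element)
        for item in body[:-1].split(","):
            key, sep, value = item.strip(" \t").partition("=")
            params[key] = value if sep else None
    return name.strip(" \t").lower(), params

example = "URL;base64;gzip;text/plain(charset=us-ascii,language=eng)"
parsed = [parse_element(part) for part in example.split(";")]
```

The last entry of parsed carries the final media type, text/plain, together with its charset and language parameters; the three entries before it are the encoding identifiers in the order they are to be applied.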