C.3. Identifiers

Identifiers are essentially application-defined. No application is required to support any given encoding or media-type identifier, and some identifiers will not make sense for some applications. However, it is strongly recommended that applications support all standard Internet media types and encodings as defined, registered or endorsed by the IETF and IANA, as well as the special identifiers defined in this document.

C.3.1. Final Media-Type Identifiers

The final media-type identifier is the last token in the type string. It indicates the final format of the associated data once all the previous decodings have been performed. The application may then use the data as necessary or intended (for example, displaying text or an image bitmap on-screen, or executing code).

The final media-type identifier is a MIME type [RFC 2046]. MIME types may have parameters, which in the Typechain are specified in a parameters field following the media-type identifier.

For example, the multipart/mixed MIME type has a boundary parameter. In a MIME email message transmission, the content type is indicated by a Content-Type header field:

Content-Type: multipart/mixed; boundary=--xxx--

The equivalent Typechain media-type identifier would be:

multipart/mixed(boundary=--xxx--)
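
The mapping between the two forms above can be sketched in Python. This is a minimal illustration, not a normative algorithm: the function name content_type_to_typechain is hypothetical, and because the example above shows only a single parameter, joining several parameters with ";" inside the parentheses is an assumption.

```python
def content_type_to_typechain(header: str) -> str:
    """Rewrite a MIME Content-Type header (or its value) in Typechain form."""
    value = header.strip()
    # Accept either the full header line or just the header value.
    if value.lower().startswith("content-type:"):
        value = value.split(":", 1)[1]
    # Naive split: a quoted parameter value containing ";" is not handled.
    media_type, *params = [part.strip() for part in value.split(";")]
    params = [p for p in params if p]
    if not params:
        return media_type
    # ASSUMPTION: multiple parameters are joined with ";". The text only
    # shows the single-parameter case.
    return f"{media_type}({';'.join(params)})"


print(content_type_to_typechain("Content-Type: multipart/mixed; boundary=--xxx--"))
```

For the header shown above, this produces the identifier multipart/mixed(boundary=--xxx--).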

Basic IANA standard Internet media types are listed in [IANA MEDIA]. Many current operating environments also provide an extensive list of known MIME types (for example, /etc/mime.types on many UNIX-like systems) and associated handler applications (for example, the File Types configuration options of Microsoft Windows, KDE [KDE] and GNOME [GNOME]).

C.3.2. Encoding Identifiers

Encoding identifiers indicate a decoding transformation to be applied to the data. If the indicated decodings are performed in order, from left to right, on the source data, the final result should be in the format described by the final media-type identifier. Common encodings that applications are recommended to support include: identity, deflate, compress, gzip, bzip2, base64, uuencoded, binhex and URL.

Like final media types, encoding identifiers may also have parameters, specified in the same way.
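
The left-to-right decoding described above might be sketched as follows. This is an illustration under stated assumptions: the names DECODERS and decode_chain are hypothetical, the encoding identifiers are assumed to have already been split out of the type string into a list, parameters are ignored, and only encodings with a Python standard-library decoder are registered.

```python
import base64
import bz2
import gzip

# Hypothetical registry mapping encoding identifiers to decoder functions.
# Each decoder takes bytes and returns bytes.
DECODERS = {
    "identity": lambda data: data,
    "gzip": gzip.decompress,
    "bzip2": bz2.decompress,
    "base64": base64.b64decode,
}


def decode_chain(data: bytes, encodings: list[str]) -> bytes:
    """Apply each decoding in order, from left to right."""
    for name in encodings:
        try:
            decoder = DECODERS[name]
        except KeyError:
            raise ValueError(f"unsupported encoding identifier: {name!r}") from None
        data = decoder(data)
    return data


# Data that was gzip-compressed and then base64-encoded is decoded in the
# reverse order of its construction: base64 first, then gzip.
payload = base64.b64encode(gzip.compress(b"hello"))
assert decode_chain(payload, ["base64", "gzip"]) == b"hello"
```

An application that does not recognize an identifier has no obligation to support it; raising an error, as done here, is only one possible behaviour.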

identity

A decoder for the identity type should do nothing: its output is simply its input.

deflate

The deflate encoding is described in [RFC 1951]. It is a lossless algorithm combining Lempel-Ziv (77) and Huffman compression. (The free "zlib" library implements deflate well: see http://www.gzip.org/zlib/).

This type identifier MUST contain a parameter called "datalen" giving the size of the uncompressed data. It MAY contain a parameter called "level" giving the compression level, for informational purposes.
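
One way a decoder could use the "datalen" parameter is sketched below with Python's zlib module. The function names are hypothetical, the unit of "datalen" is assumed to be bytes, and rejecting a length mismatch is a design choice of this sketch rather than a stated requirement. A negative window-bits value asks zlib for a raw deflate stream, as in [RFC 1951], with no zlib or gzip wrapper.

```python
import zlib

RAW_DEFLATE = -15  # negative wbits: raw deflate stream, no header or trailer


def encode_deflate(raw: bytes, level: int = 6) -> tuple[bytes, dict]:
    """Compress raw data and return it with the parameters for the identifier."""
    compressor = zlib.compressobj(level, zlib.DEFLATED, RAW_DEFLATE)
    payload = compressor.compress(raw) + compressor.flush()
    # "datalen" is required; "level" is informational only.
    return payload, {"datalen": len(raw), "level": level}


def decode_deflate(payload: bytes, datalen: int) -> bytes:
    """Decompress a raw deflate stream, checking it against datalen."""
    decompressor = zlib.decompressobj(RAW_DEFLATE)
    # Ask for at most one byte more than expected, so an oversized stream
    # is detected without decompressing all of it.
    out = decompressor.decompress(payload, datalen + 1)
    if len(out) != datalen or not decompressor.eof:
        raise ValueError("decompressed size does not match datalen")
    return out


raw = b"typechain " * 50
payload, params = encode_deflate(raw)
assert decode_deflate(payload, params["datalen"]) == raw
```

Knowing the uncompressed size in advance also lets a decoder allocate its output buffer once and refuse streams that expand far beyond the declared size.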

compress

The compress encoding is described in section 3.5 of the HTTP/1.1 specification [RFC 2616] as: "The encoding format produced by the common UNIX file compression program 'compress'. This format is an adaptive Lempel-Ziv-Welch coding (LZW)."

gzip

The gzip encoding is described in section 3.5 of the HTTP/1.1 specification [RFC 2616] as: "An encoding format produced by the file compression program 'gzip' (GNU zip) as described in RFC 1952 [RFC 1952]. This format is a Lempel-Ziv coding (LZ77) with a 32 bit CRC."

bzip2

This is the format produced by the "bzip2" file compression program. TODO reference a standard or other published info on bzip2.

base64

Base 64 encoding is a standard method of encoding binary data as text, in which each character carries 6 bits of data. TODO reference a standard.

uuencoded

"UU Encoding" is a popular method of encoding binary data for transmission as 7-bit text. TODO reference a standard.

binhex

BinHex (aka HQX) is a standard encoding for Macintosh files, capable of being transferred as lines of 7-bit text. TODO reference a standard.

URL

The input for this decoder is a URL (or a URI pointing to a fetchable resource) as defined in RFC 2396 [RFC 2396]. The output should be the data fetched from this URL. For example, if the input is an HTTP URL to a text resource, the decoder would use the GET method of HTTP to fetch the text. It is recommended that the http://, https://, ftp:// and file:// schemes be supported.
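
A URL decoder along these lines could be sketched with Python's urllib. The function name decode_url, the timeout default and the decision to reject other schemes are assumptions of this sketch. An application handling untrusted input would probably also want to restrict file:// access.

```python
from urllib.parse import urlsplit
from urllib.request import urlopen

# The four schemes the text recommends supporting.
SUPPORTED_SCHEMES = {"http", "https", "ftp", "file"}


def decode_url(data: bytes, timeout: float = 10.0) -> bytes:
    """Treat the input as a URL and return the data fetched from it."""
    url = data.decode("ascii").strip()
    scheme = urlsplit(url).scheme.lower()
    if scheme not in SUPPORTED_SCHEMES:
        raise ValueError(f"unsupported URL scheme: {scheme!r}")
    # For http and https URLs, urlopen issues a GET request by default.
    with urlopen(url, timeout=timeout) as response:
        return response.read()
```

The output of decode_url is itself just data, so it can be passed on to the next decoder named in the type string.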

type

The token "type" is reserved for use as a URN [RFC 2396] prefix, and has no meaning as a data type or encoding.

?

The token "?" (question mark) means "unknown type". A decoder may treat this as a signal that the type should be guessed from the data itself, or it may enter an error condition. When the type is unknown, the number of layers of encoding may also be unknown, and the data may need to be decoded repeatedly until it reaches the next encoding expressed in the type string or, if there is none, a probable final type or an unknown type.
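
Guessing the type from the data itself could, for instance, rely on well-known leading byte signatures. The sketch below is a heuristic illustration only: the names SIGNATURES and peel_unknown_layers and the max_layers limit are hypothetical, and only gzip (bytes 1F 8B) and bzip2 ("BZh") are recognized. It repeatedly strips recognizable layers until none is detected.

```python
import bz2
import gzip
import zlib

# (leading bytes, encoding identifier, decoder) for the formats guessed here.
SIGNATURES = [
    (b"\x1f\x8b", "gzip", gzip.decompress),
    (b"BZh", "bzip2", bz2.decompress),
]


def peel_unknown_layers(data: bytes, max_layers: int = 8):
    """Strip recognizable encoding layers; return (data, guessed identifiers)."""
    guessed = []
    for _ in range(max_layers):
        for magic, name, decoder in SIGNATURES:
            if data.startswith(magic):
                try:
                    data = decoder(data)
                except (OSError, EOFError, ValueError, zlib.error):
                    # The signature matched by coincidence: stop guessing.
                    return data, guessed
                guessed.append(name)
                break
        else:
            # No known signature: treat the data as a probable final type.
            return data, guessed
    # Layer limit reached: return what has been decoded so far.
    return data, guessed


wrapped = gzip.compress(bz2.compress(b"hello"))
assert peel_unknown_layers(wrapped) == (b"hello", ["gzip", "bzip2"])
```

The layer limit guards against maliciously nested data. A decoder that prefers the error condition mentioned above could raise an exception instead of returning when no signature matches.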