MIME Overview

by Mark Grand mgrand@mindspring.com

Revised October 26, 1993

Internet e-mail allows mail messages to be exchanged between users of computers around the world and occasionally beyond: to space shuttles. One of the main reasons that Internet e-mail has achieved such wide use is because it provides a standard mechanism for messages to be exchanged between over 10,000,000 computers connected to the Internet.

The standards that are the basis for Internet e-mail were established in 1982. Though they were state of the art in 1982, in the intervening years they have begun to show their age. The 1982 standards allow for mail messages that contain a single human readable message with the restrictions that:

the message contains only ASCII characters.
the message contains no lines longer than 1000 characters.
the message does not exceed a certain length

The 1982 standards do not allow EDI to be reliably transmitted through Internet e-mail, since EDI messages can violate all of these restrictions. There are a number of other types of messages and services that have are supported by other more recently designed e-mail standards. A new Internet mail standard was approved in June of 1992. The new standard is called MIME.

MIME is an acronym for Multipurpose Internet Mail Extensions. It builds on the older standard by standardizing additional fields for mail message headers that describe new types of content and organization for messages.

MIME allows mail messages to contain:

Multiple objects in a single message.
Text having unlimited line length or overall length.
Character sets other than ASCII, allowing non-English language messages.
Multi-font messages.
Binary or application specific files.
Images, Audio, Video and multi-media messages.

MIME defines the following new header fields:

The MIME-Version header field, which uses a version number to declare that a message conforms to the MIME standard.
The Content-Type header field, which can be used to specify the type and subtype of data in the body of a message and to fully specify the encoding of such data.

The Content-Type value Text, which can be used to represent textual information in a number of character sets and formatted text description languages in a standardized manner.
The Content-Type value Multipart, which can be used to combine several body parts, possibly of differing types of data, into a single message.
The Content-Type value Application, which can be used to transmit application data or binary data.
The Content-Type value Message, for encapsulating a mail message.
The Content-Type value Image, for transmitting still image (picture) data.
The Content-Type value Audio, for transmitting audio or voice data.
The Content-Type value Video, for transmitting video or moving image data, possibly with audio as part of the composite video data format.

The Content-Transfer-Encoding header field, that specifies how the data is encoded to allow it to pass through mail transports having data or character set limitations.
Two header fields that can be used to further identify and describe the data in a message body: the Content-ID and Content-Description header fields.

MIME is an extensible mechanism. It is expected that the set of content-type/subtype pairs and their associated parameters will grow with time. Several other MIME fields, such as character set names, are likely to have new values defined over time. To ensure that the set of such values develops in an orderly, and public manner, MIME defines a registration process that uses the Internet Assigned Numbers Authority (IANA) as a central registry for such values.

To promote interoperability between implementations, the MIME standard document specifies a minimal subset of the above mechanisms that are required for an implementation to claim to conform to the MIME standard.

MIME Technical Summary

MIME is defined by an Internet standard document called RFC 1521. This document summarizes the contents of RFC1521. Sufficient detail is presented here to understand the capabilities of MIME. For sufficient detail to implement MIME please read RFC1521.

MIME allows messages to contain multiple objects. When multiple objects are in a MIME message, they are represented in a form called a body part. A body part has a header and a body, so it makes sense to speak about the body of a body part. Also, body parts can be nested in bodies that contain one or multiple body parts.

The Content-Type values, subtypes, and parameter names defined in the MIME standard are case-insensitive. However, many parameter values are case sensitive

The MIME standard is written to allow MIME to be extended in certain ways, without having to revise the standard. MIME specifies sets of values that are allowed for various fields and parameters. The provides a procedure for extending these sets of values by registering them with an entity called the Internet Assigned Numbers Authority (IANA).

The MIME-Version Header Field

MIME is designed to be compatible with older Internet mail standards. In particular, it is compatible with RFC822. If a mail reading program receives a message that is a MIME message then it will likely perform additional processing for the MIME message that it would not perform for non-MIME messages. To allow mail reading programs to recognize MIME messages, MIME messages are required to contain a MIME-Version header field. The MIME-Version header field specifies the version of the MIME standard that the message conforms to.

As of this writing, there is only version (1.0) of the MIME standard. Messages that comply with the standard must include a header field, with the following verbatim text:

MIME-Version: 1.0

The MIME-Version header field is required at the top level of a message. It is not required for each body part of a multipart entity. It is required for the embedded headers of a body of type Message if and only if the embedded message is claimed to be MIME-compliant.

The Content-Type Header Field

The Content-Type field describes the data contained in a body fully enough that the mail reader can pick an appropriate mechanism to present the data to the user, or otherwise deal with the data appropriately.

The Content-Type header field is used to specify the nature of data in the body or body part, by giving type and subtype identifiers, and by providing parameters that may be needed for certain types. After the type and subtype names, the remainder of the header field is a set of parameters, specified in an attribute=value notation. The set of meaningful parameters differs for different types. The order of parameters is not significant. Comments are allowed (in accordance with RFC822 rules) in structured header fields by placing them in parentheses.

The top-level Content-Type is used to declare the general type of data, while the subtype specifies a specific format for that type of data. Thus, a Content-Type of Image/xyz is enough to tell a mail reader that the data is an image, even if the mail reader has no knowledge of the specific image format xyz. Such information can be used, to decide whether or not to show a user the raw data from an unrecognized subtype: such an action might be reasonable for unrecognized subtypes of Text, but not for unrecognized subtypes of Image or Audio. For this reason, registered subtypes of Audio, Image, Text, and Video, should not contain embedded information that is really of a different type. Such compound types are usually represented using the Multipart or Application types.

Parameters are modifiers of the content-subtype. Although most parameters make sense only with certain content-types, others are "global" in the sense that they might apply to any subtype. For example, the Boundary parameter, which is used to indicate how body parts are separated from each other, makes sense only for the Multipart content-type. The Charset parameter might make sense with several content-types.

The MIME standard defines seven content-types. The authors of the MIME standard state that the set of seven types is "substantially complete". They expect additional supported types to be accommodated by creating new subtypes of the seven initial top-level types. The MIME standard, functioning as a constitution for the MIME community, states that new standard content types can be defined only by revising the standard (as opposed to the registration procedure for other types of extensions). However, MIME does provide for the use of non-standard content types. Non-standard content-types can be used, but must be given names starting with X-. Future standard content type names will not begin with X-.

The syntax for the content type header field is

     Content-Type := type "/" subtype [";" parameter]_

The defined content types are:

Application: indicates data that does not fit into any of the other categories, such as uninterpreted binary data or information to be processed by a mail-based application. In addition to the following subtypes, it is likely that additional subtypes will be defined for applications such as mail-based scheduling systems, spreadsheets and EDI.
Audio: Indicates audio data. Audio requires an audio output device (such as a speaker or a telephone) to "display" the contents.
Image: Image data. Image requires a display device (such as a graphical display, a printer, or a FAX machine) to view the information.
Message: indicates an encapsulated message.
Multipart: indicates data consisting of multiple body parts, each having its own data type. It is possible to tell where each body part begins and ends because each body part is preceded by a special string called an encapsulation boundary; the last body part is followed by a closing boundary.
Text: The text Content-Type is for sending material that is principally textual in form. It is the default Content-Type. The character set of the text can be indicated by using the Charset parameter. The default Content-Type for Internet mail is
Video: indicates that the body contains a time-varying-picture image, possibly with color and coordinated sound. The term Video is used very generically and does not refer to any particular technology or format. It is not meant to preclude subtypes such as animated drawings encoded compactly.
X-TypeName: This is any type name that begins with X-. A Content-Type value beginning with X- is a private value, to be used by consenting mail systems by mutual agreement. The standard specifies no subtypes.

No type may be specified without a subtype.

The standard allows the use of additional sub-types without having to change the standard. However, it is important to insure that sub-types used by different user communities of MIME do not conflict. It would be confusing if ContentType: application/foobar meant two different things. The standard specifies two mechanisms for defining new Content-Type subtypes:

1. Private values (starting with X-) may be defined between cooperating mail composing and reading programs without outside registration. Use of this mechanism requires knowing that the reader of the message will not mistake the content type for something other than originally intended.

2. New standard values must be registered with IANA. Where intended for public use, the formats they refer to must also be defined by a published specification.

Messages that do not have a Content-Type field in their header are displayed by user agents as if
Content-Type: Text/plain; Charset=US-ASCII
had been specified.

When a mail reader finds mail with an unknown Content-Type value, it will generally treat it as equivalent to application/octet-stream.

The Content-Transfer-Encoding Header Field

Many Content-Types that could usefully be transported by e-mail are represented, in their "natural" format, as 8-bit character or binary data. Such data cannot be transmitted over some transport protocols. For example, SMTP (Simple Mail Transfer Protocol is an Internet standard for transporting e-mail defined by a document called RFC821) restricts mail messages to 7-bit ASCII data with lines no longer than 1000 characters.

MIME provides two mechanisms for re-encoding such data into a 7-bit short-line format. The Content-Transfer-Encoding header field indicates the mechanism used to perform such an encoding.

The possible values for the Content-Transfer-Encoding field are:

BASE64
QUOTED-PRINTABLE
8BIT
7BIT
BINARY
x-EncodingName

These values are not case sensitive: Base64, BASE64 and bAsE64 are all equivalent. An encoding type of 7BIT requires that the body is already in a 7-bit mail-ready representation. That is the default value:
Content-Transfer-Encoding: 7BIT
is assumed if the Content-Transfer-Encoding header field is not present.

Both BASE64 and the QUOTED-PRINTABLE imply an encoding that consists of lines no longer than 76 ASCII characters. In other respects the two encoding schemes are very different.

The encoding scheme implied by QUOTED-PRINTABLE is most appropriate for data that consists primarily of printable ASCII characters. Using this encoding method, printable ASCII characters are represented as themselves. The equals sign (=) serves as an escape character. Any character that is not a printable or white space ASCII character is represented as an equals sign followed by two hexadecimal digits. An equals sign in the message is also represented in this way. Lines that are longer than 76 characters are cut off after the 75th character and the line ends with an equals sign.

The advantages of using the QUOTED-PRINTABLE encoding for message that are mostly printable ASCII characters are:

Few additional characters are required.
The message can be read by human beings who to not have a MIME aware mail reading program.

As an example, here is an EDI interchange in QUOTED-PRINTABLE encoding:

ISA*00*          *00*          *01*987654321      *12*8005551234    *910=
607*0111*U*00200*110000777*0*T*>
GS*PO*987654321*8005551234*920501*2032*7721*X*002003
ST*850*000000001
BEG*00*NE*MS1112**920501**CONTRACT#
REF*IT*8128827763
N1*ST*MAVERICK SYSTEMS
N3*3312 NEW HAMPSHIRE STREET
N4*SAN JOSE*CA*94811
PO1*1*25*EA***VC*TP8MM*CB*TAPE8MM
PO1*2*30*EA***VC*TP1/4*CB*TAPE1/4INCH
PO1*3*125*EA***VC*DSK31/2*CB*DISK35
CTT*3
SE*11*000000001
GE*1    *7721
IEA*1*110000777

Except for the ISA segment having been wrapped onto two lines, the QUOTED-PRINTABLE encoding of the interchange is identical to its 7BIT representation.

The BASE64 encoding mechanism is well suited for representing binary files. It represents any sequence of three bytes as four printable ASCII characters. The same interchange as shown above but using the BASE64 encoding would look like:

SVNBKjAwKiAgICAgICAgICAqMDAqICAgICAgICAgICowMSo5ODc2NTQzMjEgICAgICAqMTIq
ODAwNTU1MTIzNCAgICAgKjkxMDYwNyowMTExKlUqMDAyMDAqMTEwMDAwNzc3KjAqVCo+CkdT
KlBPKjk4NzY1NDMyMSo4MDA1NTUxMjM0KjkyMDUwMSoyMDMyKjc3MjEqWCowMDIwMDMKU1Qq
ODUwKjAwMDAwMDAwMQpCRUcqMDAqTkUqTVMxMTEyKio5MjA1MDEqKkNPTlRSQUNUIwpSRUYq
SVQqODEyODgyNzc2MwpOMSpTVCpNQVZFUklDSyBTWVNURU1TCk4zKjMzMTIgTkVXIEhBTVBT
SElSRSBTVFJFRVQKTjQqU0FOIEpPU0UqQ0EqOTQ4MTEKUE8xKjEqMjUqRUEqKipWQypUUDhN
TSpDQipUQVBFOE1NClBPMSoyKjMwKkVBKioqVkMqVFAxLzQqQ0IqVEFQRTEvNElOQ0gKUE8x
KjMqMTI1KkVBKioqVkMqRFNLMzEvMipDQipESVNLMzUKQ1RUKjMKU0UqMTEqMDAwMDAwMDAx
CkdFKjEqNzcyMQpJRUEqMSoxMTAwMDA3NzcK

BASE64 bears some resemblance to uuencode in both appearance and function. However, uuencode uses characters that may not be processed properly by an EBCDIC gateway.

The values 8bit, 7bit and binary all imply that no encoding has been performed. They are useful to indicate of the kind of data contained in the object, and therefore of the kind of encoding that might need to be performed for transmission in a given transport system. 7bit means that the data is all represented as short lines of ASCII data. 8bit means that the lines are short, but there may be non-ASCII characters. Binary means that not only may non-ASCII characters be present, but also that the lines are not necessarily short enough for SMTP transport.

The difference between 8bit and binary is that binary does not require adherence to any limits on line length. 8bit and binary are intended for compatibility with future Internet e-mail transport standards and with gateways to non-Internet environments. As of this writing, there are no standardized Internet e-mail transports for which it is legitimate to include unencoded 8-bit or binary data in mail bodies.

Note that the five values defined for the Content-Transfer-Encoding field imply nothing about the Content-Type other than the algorithm by which it was encoded or the transport system requirements if unencoded.

Some implementations may support additional Content-Transfer-Encoding values (it is permitted but strongly discouraged by the standard). Any such additional values must have names that begin with X- to indicate its non-standard status For example:

Content-Transfer-Encoding: x-my-new-encoding.

If a Content-Transfer-Encoding header field appears as part of a message header, it applies to the entire body of that message. If a Content-Transfer-Encoding header field appears as part of a body part's headers, it applies only to the body of that body part. If a message or body part is of type Multipart or Message, the Content-Transfer-Encoding must be 7bit, 8bit or Binary.

The encoding mechanisms defined here explicitly encode all data in ASCII. Thus, for example, suppose a message or body part has header fields such as:

Content-Type: text/plain; charset=ISO-8859-1
Content-transfer-encoding: base64

This should be interpreted to mean that the body is a Base64 ASCII encoding of data that was originally in ISO-8859-1, and will be in that character set again after decoding.

Optional Content-ID Header Field

It may be desirable to allow one body to reference another. Accordingly, bodies may be labeled using the Content-ID header field, which is syntactically identical to the RFC822 Message-ID header field. Content-ID values should be as unique as possible.

Although the Content-ID header is generally optional, its use is mandatory in headers embedded in a body of type Message/External-Body.

Optional Content-Description Header Field

The ability to associate descriptive information with a body is often desirable. For example, it may be useful to mark an Image body as "a picture of the Space Shuttle Endeavor." Such text may be placed in the Content-Description header field.

Sumary

Using MIME-Version, Content-Type, and Content-Transfer-Encoding header fields, it is possible to include arbitrary types of data objects in RFC822 conformant mail messages. No restrictions imposed by RFC822 or RFC822 are violated. MIME has been designed to avoid problems caused by additional restrictions imposed by some Internet mail transport mechanisms. The Multipart and Message content types allow mixing and hierarchical structuring of objects of different types in a single message. Further content types provide a mechanism for tagging messages or body parts as audio, image, or other kinds of data. A parameter syntax allows further specification of data format details, particularly the specification of alternate character sets. Additional optional header fields provide mechanisms for certain extensions deemed desirable by many implementors. Finally, a number of useful content types are defined for general use by consenting user agents, notably Text/Richtext, Message/Partial, and Message/External-Body.

To promote interoperability between user agents, the MIME standard specifies a minimal subset of MIME features a user agent must support to be considered MIME conformant.

A Complex Multipart Example

The outline of a complex multipart message follows. This message has five parts to be displayed serially: two introductory plain text parts, an embedded multipart message, a richtext part, and a closing encapsulated text message in a non-ASCII character set. The embedded multipart message has two parts to be displayed in parallel, a picture and an audio fragment.

MIME-Version: 1.0
From: Nathaniel Borenstein 
Subject: A multipart example
Content-Type: multipart/mixed;
                 boundary=unique-boundary-1

This is the preamble area of a multipart message.  Mail readers
that understand multipart format should ignore this preamble.

If you are reading this text, you might want to consider
changing to a mail reader that understands how to properly
display multipart messages.
--unique-boundary-1

Some text appears here...
[Note that the preceding blank line means
no header fields were given and this is text,
with charset US ASCII.  It could have been
done with explicit typing as in the next part.]
--unique-boundary-1
Content-type: text/plain; charset=US-ASCII

This could have been part of the previous part, but illustrates
explicit versus implicit typing of body parts.

--unique-boundary-1
Content-Type: multipart/parallel; boundary=unique-boundary-2

--unique-boundary-2
Content-Type: audio/basic
Content-Transfer-Encoding: base64

... base64-encoded 8000 Hz single-channel u-law-format audio data
goes here ...

--unique-boundary-2
Content-Type: image/gif
Content-Transfer-Encoding: Base64

... base64-encoded image data goes here ...

--unique-boundary-2--

--unique-boundary-1
Content-type: text/richtext

This is richtext.Isn't it
cool?

--unique-boundary-1
Content-Type: message/rfc822

From: (name in US-ASCII)
Subject: (subject in US-ASCII)
Content-Type: Text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: Quoted-printable

... Additional text in ISO-8859-1 goes here ...

--unique-boundary-1--

Return to the Java Consultant's home page