Bugzilla – Bug 532
Reword how we require that XML documents that use <meta charset> must use UTF-8. Also require it in the first 512 bytes.
Last modified: 2009-11-23 17:17:26 CET
Index: source
===================================================================
--- source (revision 2860)
+++ source (revision 2861)
@@ -9488,15 +9488,18 @@
  also be specified. Otherwise, it must be omitted.</p>

  <p>The <dfn title="attr-meta-charset"><code>charset</code></dfn>
- attribute specifies the character encoding used by the document. In
- <span title="HTML5">HTML documents</span> this is a <span>character
- encoding declaration</span>. If the attribute is present in an <span
- title="XHTML">XML document</span>, its value must be an <span>ASCII
+ attribute specifies the character encoding used by the
+ document. This is a <span>character encoding declaration</span>. If
+ the attribute is present in an <span title="XHTML">XML
+ document</span>, its value must be an <span>ASCII
  case-insensitive</span> match for the string "<code
- title="">UTF-8</code>", and the resource must be encoded using the
- UTF-8 character encoding. (The element has no effect in XML
- documents, and is only allowed to facilitate migration to and from
- XHTML.)</p>
+ title="">UTF-8</code>" (and the document is therefore required to
+ use UTF-8 as its encoding).</p>
+
+ <p class="note">The <code title="attr-meta-charset">charset</code>
+ attribute on the <code>meta</code> element has no effect in XML
+ documents, and is only allowed in order to facilitate migration to
+ and from XHTML.</p>

  <p>There must not be more than one <code>meta</code> element with
  a <code title="attr-meta-charset">charset</code> attribute per
@@ -10081,7 +10084,9 @@

  <!-- XXX maybe the rest should move to "writing html" section, though
  if we do then we have to duplicate the requirements in the
- parsing section for conformance checkers -->
+ parsing section for conformance checkers, and we have to make sure
+ that the requirements for charset="" apply even in XML, for the
+ <meta charset=""> polyglot hack -->

  <p>A <dfn>character encoding declaration</dfn> is a mechanism by
  which the character encoding used to store or transmit a document is
@@ -10110,18 +10115,20 @@

  </ul>

- <p>If the document does not start with a BOM, and if its encoding is
- not explicitly given by <span title="Content-Type">Content-Type
- metadata</span>, then the character encoding used must be an
- <span>ASCII-compatible character encoding</span>, and, in addition,
- if that encoding isn't US-ASCII itself, then the encoding must be
- specified using a <code>meta</code> element with a <code
+ <p>If an <span title="HTML documents">HTML document</span> does not
+ start with a BOM, and if its encoding is not explicitly given by
+ <span title="Content-Type">Content-Type metadata</span>, then the
+ character encoding used must be an <span>ASCII-compatible character
+ encoding</span>, and, in addition, if that encoding isn't US-ASCII
+ itself, then the encoding must be specified using a
+ <code>meta</code> element with a <code
  title="attr-meta-charset">charset</code> attribute or a
  <code>meta</code> element in the <span
  title="attr-meta-http-equiv-content-type">Encoding declaration
  state</span>.</p>

- <p>If the document contains a <code>meta</code> element with a <code
+ <p>If an <span title="HTML documents">HTML document</span> contains
+ a <code>meta</code> element with a <code
  title="attr-meta-charset">charset</code> attribute or a
  <code>meta</code> element in the <span
  title="attr-meta-http-equiv-content-type">Encoding declaration
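
For illustration only, the reworded XML requirement (the charset value must be an ASCII case-insensitive match for "UTF-8") could be checked roughly as follows. This is a minimal sketch, not part of the patch: the function names are hypothetical, and it assumes the attribute value is compared as-is, without trimming whitespace.

```python
def ascii_lower(s: str) -> str:
    """Lowercase only the ASCII letters A-Z; leave all other characters untouched."""
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in s)


def is_valid_xml_meta_charset(value: str) -> bool:
    """Hypothetical check: is value an ASCII case-insensitive match for "UTF-8"?"""
    # Assumption: no whitespace trimming and no alias handling (e.g. "utf8" is rejected).
    return ascii_lower(value) == "utf-8"
```

Using an ASCII-only fold rather than str.lower() keeps the comparison from being affected by Unicode case mappings of non-ASCII characters.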