NOTE: The current preferred location for bug reports is the GitHub issue tracker.
Bug 101 - 'character encoding declaration' is now a cross-reference term; made the content='' attribute of <meta> case-insensitive for charset declarations; swapped the order of the UTF-8 and windows-1252 default recommendations; other minor editorial changes.
Status: RESOLVED FIXED
Product: Validator.nu
Classification: Unclassified
Component: HTML parser
Version: HEAD
Hardware: All All
Importance: P2 normal
Assigned To: Henri Sivonen
http://svn.whatwg.org/webapps/source?...
Reported: 2008-03-03 13:11 CET by Nobody
Modified: 2008-03-20 16:12 CET (History)
Description Nobody 2008-03-03 13:11:15 CET
Index: source
===================================================================
--- source	(revision 1274)
+++ source	(revision 1275)
@@ -5862,9 +5862,9 @@
   metadata with the <code title="attr-meta-name">name</code>
   attribute, pragma directives with the <code
   title="attr-meta-http-equiv">http-equiv</code> attribute, and the
-  file's character encoding declaration when an HTML document is
-  serialised to string form (e.g. for transmission over the network or
-  for disk storage) with the <code
+  file's <span>character encoding declaration</span> when an HTML
+  document is serialised to string form (e.g. for transmission over
+  the network or for disk storage) with the <code
   title="attr-meta-charset">charset</code> attribute.</p>
 
   <p>Exactly one of the <code title="attr-meta-name">name</code>,
@@ -5877,6 +5877,11 @@
   the <code title="attr-meta-content">content</code> attribute must
   also be specified. Otherwise, it must be omitted.</p>
 
+  <p>The <dfn title="attr-meta-charset"><code>charset</code></dfn>
+  attribute specifies the character encoding used by the
+  document. This is called a <span>character encoding
+  declaration</span>.</p>
+
   <p>The <code title="attr-meta-charset">charset</code> attribute may
   be specified in <span title="HTML5">HTML documents</span> only, it
   must not be used in <span title="XHTML">XML documents</span>. If the
@@ -6158,18 +6163,19 @@
     declaration state's</span> user agent requirements are all handled
     by the parsing section of the specification. The state is just an
     alternative form of setting the <code
-    title="meta-charset">charset</code> attribute: it is <a
-    href="#charset">a character encoding declaration</a>.</p>
+    title="meta-charset">charset</code> attribute: it is a
+    <span>character encoding declaration</span>.</p>
 
     <p>For <code>meta</code> elements in the <span
     title="attr-meta-http-equiv-content-type">Encoding declaraton
     state</span>, the <code title="attr-meta-content">content</code>
-    attribute must have a value consisting of the literal string
-    "<code title="">text/html;</code>", optionally followed by a
-    single U+0020 SPACE character, followed by the literal string
-    "<code title="">charset=</code>", followed by the character
-    encoding name of <a href="#charset">the character encoding
-    declaration</a>.</p>
+    attribute must have a value that is a case-insensitive<!-- ASCII
+    XXX--> match of a string that consists of the literal string
+    "<code title="">text/html;</code>", optionally followed by any
+    number of <span title="space character">space characters</span>,
+    followed by the literal string "<code title="">charset=</code>",
+    followed by the character encoding name of <a href="#charset">the
+    character encoding declaration</a>.</p>
 
     <p>If the document contains a <code>meta</code> element in the
     <span title="attr-meta-http-equiv-content-type">Encoding
@@ -6357,17 +6363,14 @@
 
   <h5 id="charset">Specifying the document's character encoding</h5>
 
-  <p>The <code>meta</code> element may also be used to provide UAs
-  with character encoding information for <span
-  title="HTML5">HTML</span> files, by setting the <dfn
-  title="attr-meta-charset"><code>charset</code></dfn> attribute to
-  the name of a character encoding. This is called a character
-  encoding declaration.</p>
-
   <!-- XXX maybe the rest should move to "writing html" section,
   though if we do then we have to duplicate the requirements in the
   parsing section for conformance checkers -->
 
+  <p>A <dfn>character encoding declaration</dfn> is a mechanism by
+  which the character encoding used to store or transmit a document is
+  specified.</p>
+
   <p>The following restrictions apply to character encoding
   declarations:</p>
 
@@ -6381,8 +6384,8 @@
    href="#refsIANACHARSET">[IANACHARSET]</a> <!-- XXX
    http://www.iana.org/assignments/character-sets --></li>
 
-   <li>The attribute value must be serialised without the use of
-   character entity references of any kind.</li>
+   <li>The encoding name must be serialised without the use of
+   character entity references or character escapes of any kind.</li>
 
   </ul>
 
@@ -34680,9 +34683,9 @@
   <p>The various types of content mentioned above are described in the
   next few sections.</p>
 
-  <p>In addition, there are some restrictions on how <a
-  href="#charset">character encoding declarations</a> are to be
-  serialised, as discussed in the section on that topic.</p>
+  <p>In addition, there are some restrictions on how <span>character
+  encoding declarations</span> are to be serialised, as discussed in
+  the section on that topic.</p>
 
   <p>The U+0000 NULL character, control characters other than the
   <span title="space character">space characters</span>, and
@@ -35925,13 +35928,13 @@
    <li><p>Otherwise, return an implementation-defined or
    user-specified default character encoding, with the <span
    title="concept-encoding-confidence">confidence</span>
-   <i>tentative</i>. Due to its use in legacy content, <code
+   <i>tentative</i>. In non-legacy environments, the more
+   comprehensive <code title="">UTF-8</code> encoding is
+   recommended. Due to its use in legacy content, <code
    title="">windows-1252</code> is recommended as a default in
-   predominantly Western demographics. In non-legacy environments, the
-   more comprehensive <code title="">UTF-8</code> encoding is
-   recommended instead. Since these encodings can in many cases be
-   distinguished by inspection, a user agent may heuristically decide
-   which to use as a default.</p></li>
+   predominantly Western demographics instead. Since these encodings
+   can in many cases be distinguished by inspection, a user agent may
+   heuristically decide which to use as a default.</p></li>
 
   </ol>
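
The revised wording for the content attribute in the Encoding declaration state (the third hunk above) can be illustrated with a small check. This is only a sketch, not the validator's actual implementation. The function name `encoding_from_content`, the use of ASCII-only case folding (the patch itself leaves that open with an XXX comment), the exact set of space characters, and the treatment of the encoding name as any run of non-whitespace characters are all assumptions made for illustration.

```python
import re

# Assumed set of "space characters": space, tab, LF, FF, CR.
# The patch only refers to the spec's definition; this list is an assumption.
_SPACES = r"[ \t\n\f\r]*"

# "text/html;" + any number of space characters + "charset=" + encoding name.
# re.ASCII restricts case-insensitive matching to ASCII letters (assumption:
# the patch marks the ASCII question with an XXX comment).
_CONTENT_RE = re.compile(
    r"text/html;" + _SPACES + r"charset=(?P<name>\S+)",
    re.IGNORECASE | re.ASCII,
)


def encoding_from_content(value):
    """Return the encoding name if value has the required shape, else None.

    The name is returned as written; it is not checked against any registry.
    """
    match = _CONTENT_RE.fullmatch(value)
    if match is None:
        return None
    return match.group("name")
```

Under the old text only a single optional U+0020 SPACE was allowed and the literal strings were matched exactly; with this sketch, values such as `TEXT/HTML;CHARSET=UTF-8` or `text/html;   charset=windows-1252` are accepted as well.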