Bug 531 - change schema to allow <meta charset='utf-8'> in XML documents
Status: RESOLVED FIXED
Product: Validator.nu
Classification: Unclassified
Component: HTML5 schema
Version: HEAD
Hardware: All All
Importance: P2 normal
Assigned To: Michael[tm] Smith
http://svn.whatwg.org/webapps/source?...
Reported: 2009-05-27 14:23 CEST by Henri Sivonen
Modified: 2009-07-05 09:50 CEST (History)
Attachments
patch with proposed change (622 bytes, patch)
2009-05-30 16:27 CEST, Michael[tm] Smith

Description Henri Sivonen 2009-05-27 14:23:03 CEST
Index: source
===================================================================
--- source	(revision 2858)
+++ source	(revision 2859)
@@ -9452,7 +9452,7 @@
    <dd><code title="attr-meta-name">name</code></dd>
    <dd><code title="attr-meta-http-equiv">http-equiv</code></dd>
    <dd><code title="attr-meta-content">content</code></dd>
-   <dd><code title="attr-meta-charset">charset</code> (<span title="HTML documents">HTML</span> only)</dd>
+   <dd><code title="attr-meta-charset">charset</code></dd>
    <dt>DOM interface:</dt>
    <dd>
 <pre class="idl">interface <dfn>HTMLMetaElement</dfn> : <span>HTMLElement</span> {
@@ -9490,13 +9490,13 @@
   <p>The <dfn title="attr-meta-charset"><code>charset</code></dfn>
   attribute specifies the character encoding used by the
   document. This is called a <span>character encoding
-  declaration</span>.</p>
-
-  <p>The <code title="attr-meta-charset">charset</code> attribute may
-  be specified in <span title="HTML5">HTML documents</span> only, it
-  must not be used in <span title="XHTML">XML documents</span>. There
-  must not be more than one element with a <code
-  title="attr-meta-charset">charset</code> attribute per document.</p>
+  declaration</span>. There must not be more than one element with a
+  <code title="attr-meta-charset">charset</code> attribute per
+  document. If the attribute is present in an <span title="XHTML">XML
+  document</span>, its value must be an <span>ASCII
+  case-insensitive</span> match for the string "<code
+  title="">UTF-8</code>", and the resource must be encoded using the
+  UTF-8 character encoding.</p>
 
   <p>The <dfn title="attr-meta-content"><code>content</code></dfn>
   attribute gives the value of the document metadata or pragma
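
The two document-level rules in the revised wording above (at most one element with a charset attribute, and in XML documents a value that is an ASCII case-insensitive match for "UTF-8") can be sketched as a small standalone check. This is only an illustration in Python using the standard library's xml.etree.ElementTree; the function names check_meta_charset and ascii_lower are hypothetical, and this is not how the Validator.nu schema expresses the constraint.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def ascii_lower(text):
    """Lowercase only A-Z, leaving every other character untouched."""
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in text)


def check_meta_charset(xml_source):
    """Return a list of problems with meta charset in an XHTML (XML) document.

    Hypothetical helper for illustration; it covers only the attribute-level
    rules, not whether the resource is really encoded as UTF-8.
    """
    root = ET.fromstring(xml_source)
    values = [
        el.get("charset")
        for el in root.iter("{%s}meta" % XHTML_NS)
        if el.get("charset") is not None
    ]
    problems = []
    if len(values) > 1:
        problems.append("more than one meta element with a charset attribute")
    for value in values:
        if ascii_lower(value) != "utf-8":
            problems.append("charset value %r is not a match for UTF-8" % value)
    return problems
```

The comparison lowercases only A-Z rather than calling str.lower(), so that it stays an ASCII case-insensitive match instead of a Unicode one.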
Comment 1 Michael[tm] Smith 2009-05-30 16:27:01 CEST
Created attachment 104
patch with proposed change

I'm a bit surprised that this patch works, but it does seem to behave as expected.
Comment 2 Michael[tm] Smith 2009-06-25 11:19:59 CEST
syntax r440
http://svn8.cvsdude.com/vvc/whattf/syntax?view=revision&revision=440
Comment 3 Michael[tm] Smith 2009-06-25 11:31:55 CEST
I meant to note that this isn't the ideal way to handle this, because it won't produce a very useful error message. At some point we'll want to add a better way of reporting it.
Comment 4 Simon Pieters 2009-06-26 23:52:33 CEST
The patch probably also doesn't check that the document's encoding actually is utf-8.
Comment 5 Michael[tm] Smith 2009-06-27 06:47:24 CEST
(In reply to comment #4)
> The patch probably also doesn't check that the document's encoding actually is
> utf-8.

True, it doesn't. It seems like a constraint that needs to be checked either in the htmlparser code or in the syntax/non-schema code.

Anyway, I'm reopening this, since my patch is only a partial fix. 
Comment 6 Michael[tm] Smith 2009-07-05 09:50:50 CEST
(In reply to comment #4)
> The patch probably also doesn't check that the document's encoding actually is
> utf-8.

I have opened bug 591 for that.
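
For illustration only, the check discussed in comments 4 and 5 (whether the resource's bytes really are UTF-8, independent of what the charset attribute claims) could look roughly like the sketch below. It is written in Python with a hypothetical function name, is_utf8_encoded, and makes no claim about how bug 591 was or will be implemented in the htmlparser or non-schema code. A real checker would presumably also take the BOM and any XML declaration into account.

```python
def is_utf8_encoded(raw_bytes):
    """Return True if the byte string decodes as strict UTF-8.

    Hypothetical helper: it only tests byte-level validity and does not
    look at a BOM, an XML declaration, or HTTP headers.
    """
    try:
        raw_bytes.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True
```

A schema cannot perform this test because it only sees the parsed attribute value, not the original byte stream, which is why the comments point to the parser or non-schema layer.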