Bugzilla – Bug 531
change schema to allow <meta charset='utf-8'> in XML documents
Last modified: 2009-07-05 09:50:50 CEST
Index: source
===================================================================
--- source (revision 2858)
+++ source (revision 2859)
@@ -9452,7 +9452,7 @@
   <dd><code title="attr-meta-name">name</code></dd>
   <dd><code title="attr-meta-http-equiv">http-equiv</code></dd>
   <dd><code title="attr-meta-content">content</code></dd>
-  <dd><code title="attr-meta-charset">charset</code> (<span title="HTML documents">HTML</span> only)</dd>
+  <dd><code title="attr-meta-charset">charset</code></dd>
   <dt>DOM interface:</dt>
   <dd>
 <pre class="idl">interface <dfn>HTMLMetaElement</dfn> : <span>HTMLElement</span> {
@@ -9490,13 +9490,13 @@
   <p>The <dfn title="attr-meta-charset"><code>charset</code></dfn>
   attribute specifies the character encoding used by the
   document. This is called a <span>character encoding
-  declaration</span>.</p>
-
-  <p>The <code title="attr-meta-charset">charset</code> attribute may
-  be specified in <span title="HTML5">HTML documents</span> only, it
-  must not be used in <span title="XHTML">XML documents</span>. There
-  must not be more than one element with a <code
-  title="attr-meta-charset">charset</code> attribute per document.</p>
+  declaration</span>. There must not be more than one element with a
+  <code title="attr-meta-charset">charset</code> attribute per
+  document. If the attribute is present in an <span title="XHTML">XML
+  document</span>, its value must be an <span>ASCII
+  case-insensitive</span> match for the string "<code
+  title="">UTF-8</code>", and the resource must be encoded using the
+  UTF-8 character encoding.</p>
 
   <p>The <dfn title="attr-meta-content"><code>content</code></dfn>
   attribute gives the value of the document metadata or pragma
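For illustration, the amended rule in the patch (at most one element with a charset attribute per document, and in an XML document its value must be an ASCII case-insensitive match for "UTF-8") could be checked on a parsed XHTML document roughly as follows. This is a minimal Python sketch, assuming the document is namespace-well-formed XHTML held in a string; `check_meta_charset_xml` and `ascii_lower` are hypothetical names, not part of the validator or the syntax schema, and the sketch does not check the resource's actual encoding.

```python
import xml.etree.ElementTree as ET
from typing import List

XHTML_NS = "http://www.w3.org/1999/xhtml"


def ascii_lower(s: str) -> str:
    # Lowercase only A-Z, as "ASCII case-insensitive" requires;
    # str.lower() would also fold non-ASCII characters.
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in s)


def check_meta_charset_xml(doc: str) -> List[str]:
    """Hypothetical checker: return error messages for <meta charset> in an XML document."""
    root = ET.fromstring(doc)
    # Collect every XHTML <meta> element that carries a charset attribute.
    metas = [el for el in root.iter(f"{{{XHTML_NS}}}meta") if "charset" in el.attrib]
    errors = []
    if len(metas) > 1:
        errors.append(
            f"found {len(metas)} elements with a charset attribute; at most one is allowed"
        )
    for el in metas:
        value = el.attrib["charset"]
        if ascii_lower(value) != "utf-8":
            errors.append(f'charset="{value}" in an XML document; the value must be "UTF-8"')
    return errors
```

Reporting each violation as its own message also hints at the kind of more specific error text that a schema-only rule tends not to give.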
Created attachment 104
patch with proposed change

I'm a bit surprised that this patch works, but it does seem to behave as expected.
syntax r440 http://svn8.cvsdude.com/vvc/whattf/syntax?view=revision&revision=440
I meant to note that this isn't the ideal way to handle this: it won't produce a very useful error message, so at some point we'll want to add a better way of reporting this error.
The patch probably also doesn't check that the document's encoding actually is utf-8.
(In reply to comment #4)
> The patch probably also doesn't check that the document's encoding actually is
> utf-8.

True, it doesn't. It seems like a constraint that needs to be checked either in the htmlparser code or in the syntax/non-schema code. Anyway, I'm reopening this, since my patch is only a partial fix.
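The missing constraint, that an XML document carrying a charset attribute must really be encoded as UTF-8, is the kind of check that would live outside the schema. A rough Python sketch of such a non-schema check follows, assuming the checker has the raw bytes and, optionally, the charset parameter from the HTTP Content-Type header. `check_utf8_resource` and its heuristics (BOM sniffing, a regex over the XML declaration) are hypothetical and are not how htmlparser or the syntax code actually implement it; a real checker would more likely rely on the encoding its XML parser reports.

```python
import re
from typing import List, Optional

# Hypothetical heuristic: the encoding pseudo-attribute of an XML declaration
# at the very start of the document.
XML_DECL_ENCODING = re.compile(
    rb"""<\?xml\s[^>]*?encoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)


def ascii_lower(s: str) -> str:
    # Lowercase only A-Z (ASCII case-insensitive comparison).
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in s)


def check_utf8_resource(data: bytes, transport_charset: Optional[str] = None) -> List[str]:
    """Hypothetical check that an XML resource is really encoded as UTF-8."""
    errors = []
    # A UTF-16 byte order mark means the resource is not UTF-8.
    if data.startswith((b"\xfe\xff", b"\xff\xfe")):
        errors.append("resource starts with a UTF-16 byte order mark")
    # Transport-level information (e.g. Content-Type charset) must agree.
    if transport_charset is not None and ascii_lower(transport_charset) != "utf-8":
        errors.append(f"transport-level charset is {transport_charset!r}, not UTF-8")
    # Skip a UTF-8 BOM before looking for the XML declaration.
    body = data[3:] if data.startswith(b"\xef\xbb\xbf") else data
    m = XML_DECL_ENCODING.match(body)
    if m:
        label = m.group(1).decode("ascii")
        if ascii_lower(label) != "utf-8":
            errors.append(f"XML declaration says encoding={label!r}, not UTF-8")
    # Finally, the bytes themselves must be well-formed UTF-8.
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        errors.append(f"bytes are not valid UTF-8 (first error at offset {exc.start})")
    return errors
```

A UTF-16 document typically trips both the BOM check and the decoding check, so one resource can yield several messages. A real implementation might collapse these into a single, clearer report.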
(In reply to comment #4)
> The patch probably also doesn't check that the document's encoding actually is
> utf-8.

I have opened bug 591 for that.