Bugzilla – Bug 139
Revamp how end-of-file tokens work and how </body> is handled in conformance checkers.
Last modified: 2008-03-10 16:19:59 CET
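The patch that follows replaces per-mode end-of-file entries with "Anything else" entries that switch the insertion mode and then reprocess the current token, so an end-of-file token falls through the early modes until a mode such as "in body" stops parsing. The following is a minimal sketch of that control flow only, not the specification's algorithm: the function `run`, the error strings, and the collapsing of "before head" straight into "in body" are assumptions made for illustration, and every non-EOF token is treated as a start tag name.

```python
# Hypothetical, heavily simplified model of "switch mode, then reprocess".
EOF = "EOF"

# Elements that may still be open at end-of-file without a parse error,
# taken from the list in the patch's "in body" end-of-file entry.
ALLOWED_OPEN_AT_EOF = {"dd", "dt", "li", "p", "tbody", "td", "tfoot",
                       "th", "thead", "tr", "body", "html"}


def run(tokens):
    """Feed tokens (start tag names or EOF) through a toy tree builder.

    Returns (modes visited while handling the last token, parse errors).
    """
    mode = "initial"
    stack = []    # stand-in for the stack of open elements
    errors = []
    trace = []
    for token in tokens:
        trace = []
        while True:  # each extra iteration is "reprocess the current token"
            trace.append(mode)
            if mode == "initial":
                # "Anything else": parse error, then reprocess in next mode.
                errors.append("missing doctype")
                mode = "before html"
            elif mode == "before html":
                stack.append("html")
                mode = "before head"
            elif mode == "before head":
                # Assumption: head handling is skipped in this toy model.
                stack.append("body")
                mode = "in body"
            elif mode == "in body":
                if token == EOF:
                    if any(e not in ALLOWED_OPEN_AT_EOF for e in stack):
                        errors.append("unclosed element at EOF")
                    return trace, errors  # "stop parsing"
                stack.append(token)  # treat the token as a start tag
                break
    return trace, errors


print(run([EOF]))
print(run(["div", EOF]))
```

In this sketch no mode before "in body" needs its own end-of-file entry, which mirrors the entries the patch removes from the early insertion modes.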
Index: source
===================================================================
--- source (revision 1347)
+++ source (revision 1348)
@@ -38469,20 +38469,16 @@
  </dd>
- <dt>A start tag token</dt>
- <dt>An end tag token</dt>
- <dt>A character token that is not one of one of U+0009 CHARACTER
- TABULATION, U+000A LINE FEED (LF), U+000B LINE TABULATION, U+000C
- FORM FEED (FF), <!--U+000D CARRIAGE RETURN (CR),--> or U+0020 SPACE</dt>
- <dt>An end-of-file token</dt>
+ <dt>Anything else</dt>
  <dd>
  <p><span>Parse error</span>.</p>
  <p>Set the document to <span>quirks mode</span>.</p>
- <p>Then, switch the <span>insertion mode</span> to "<span
- title="insertion mode: before html">before html</span>".</p>
+ <p>Switch the <span>insertion mode</span> to "<span
+ title="insertion mode: before html">before html</span>", then
+ reprocess the current token.</p>
  </dd>
@@ -38536,12 +38532,7 @@
  </dd>
- <dt>A character token that is <em>not</em> one of U+0009 CHARACTER
- TABULATION, U+000A LINE FEED (LF), U+000B LINE TABULATION, U+000C
- FORM FEED (FF), <!--U+000D CARRIAGE RETURN (CR),--> or U+0020 SPACE</dt>
- <dt>A start tag token</dt>
- <dt>An end tag token</dt>
- <dt>An end-of-file token</dt>
+ <dt>Anything else</dt>
  <dd>
  <p>Create an <code>HTMLElement</code> node with the tag name
@@ -38554,7 +38545,8 @@
  selection algorithm</span> with no manifest.</p>
  <p>Switch the <span>insertion mode</span> to "<span
- title="insertion mode: before head">before head</span>".</p>
+ title="insertion mode: before head">before head</span>", then
+ reprocess the current token.</p>
  <p class="big-issue">Should probably make end tags be ignored, so
  that "</head><!-- --><html>" puts the comment before the
@@ -38577,11 +38569,6 @@
  <dl class="switch">
- <dt>An end-of-file token</dt>
- <dd>
- <p><span>Stop parsing, with prejudice.</span></p>
- </dd>
-
  <dt>A character token that is one of one of U+0009 CHARACTER
  TABULATION, U+000A LINE FEED (LF), U+000B LINE TABULATION, U+000C
  FORM FEED (FF), <!--U+000D CARRIAGE RETURN (CR),-->
@@ -38643,10 +38630,7 @@
  </dd>
- <dt>A character token that is <em>not</em> one of U+0009 CHARACTER
- TABULATION, U+000A LINE FEED (LF), U+000B LINE TABULATION, U+000C
- FORM FEED (FF), <!--U+000D CARRIAGE RETURN (CR),--> or U+0020 SPACE</dt>
- <dt>Any other start tag token</dt>
+ <dt>Anything else</dt>
  <dd>
  <p>Act as if a start tag token with the tag name "head" and no
@@ -38670,11 +38654,6 @@
  <dl class="switch">
- <dt>An end-of-file token</dt>
- <dd>
- <p><span>Stop parsing, with prejudice.</span></p>
- </dd>
-
  <dt>A character token that is one of one of U+0009 CHARACTER
  TABULATION, U+000A LINE FEED (LF), U+000B LINE TABULATION, U+000C
  FORM FEED (FF), <!--U+000D CARRIAGE RETURN (CR),-->
@@ -38930,11 +38909,6 @@
  <dl class="switch">
- <dt>An end-of-file token</dt>
- <dd>
- <p><span>Stop parsing, with prejudice.</span></p>
- </dd>
-
  <dt>A DOCTYPE token</dt>
  <dd>
  <p><span>Parse error</span>. Ignore the token.</p>
@@ -39000,11 +38974,6 @@
  <dl class="switch">
- <dt>An end-of-file token</dt>
- <dd>
- <p><span>Stop parsing, with prejudice.</span></p>
- </dd>
-
  <dt>A character token that is one of one of U+0009 CHARACTER
  TABULATION, U+000A LINE FEED (LF), U+000B LINE TABULATION, U+000C
  FORM FEED (FF), <!--U+000D CARRIAGE RETURN (CR),-->
@@ -39091,11 +39060,6 @@
  <dl class="switch">
- <dt>An end-of-file token</dt>
- <dd>
- <p><span>Stop parsing, with prejudice.</span></p>
- </dd>
-
  <dt>A character token</dt>
  <dd>
@@ -39153,26 +39117,46 @@
  </dd>
+ <dt>An end-of-file token</dt>
+ <dd>
+
+ <p>If there is a node in the <span>stack of open elements</span>
+ that is not either a <code>dd</code> element, a <code>dt</code>
+ element, an <code>li</code> element, a <code>p</code> element, a
+ <code>tbody</code> element, a <code>td</code> element, a
+ <code>tfoot</code> element, a <code>th</code> element, a
+ <code>thead</code> element, a <code>tr</code> element, the
+ <code>body</code> element, or the <code>html</code> element, then
+ this is a <span>parse error</span>.</p> <!-- (some of those are
+ fragment cases) -->
+
+ <p><span>Stop parsing</span>.</p>
+
+ </dd>
+
  <dt>An end tag whose tag name is "body"</dt>
  <dd>
- <p>If the second element in the <span>stack of open
- elements</span> is not a <code>body</code> element, this is a
- <span>parse error</span>. Ignore the token. (<span>fragment
- case</span>)</p>
+ <p>If the <span>stack of open elements</span> does not <span
+ title="has an element in scope">have a <code>body</code> element
+ in scope</span>, this is a <span>parse error</span>; ignore the
+ token.</p>
  <p>Otherwise, if there is a node in the <span>stack of open
- elements</span> that is not either a <code>dd</code> element,
- a <code>dt</code> element, an <code>li</code> element, a
+ elements</span> that is not either a <code>dd</code> element, a
+ <code>dt</code> element, an <code>li</code> element, a
  <code>p</code> element, a <code>tbody</code> element, a
  <code>td</code> element, a <code>tfoot</code> element, a
  <code>th</code> element, a <code>thead</code> element, a
  <code>tr</code> element, the <code>body</code> element, or the
  <code>html</code> element, then this is a <span>parse
- error</span>.</p>
+ error</span>.</p> <!-- (some of those are fragment cases) -->
+
+ <!-- the insertion mode here is forcibly "in body". -->
  <p>Switch the <span>insertion mode</span> to "<span
- title="insertion mode: after body">after body</span>".</p>
+ title="insertion mode: after body">after body</span>". Otherwise,
+ ignore the token.</p>
  </dd>
@@ -40163,11 +40147,6 @@
  <dl class="switch">
- <dt>An end-of-file token</dt>
- <dd>
- <p><span>Stop parsing, with prejudice.</span></p>
- </dd>
-
  <dt>A character token that is one of one of U+0009 CHARACTER
  TABULATION, U+000A LINE FEED (LF), U+000B LINE TABULATION, U+000C
  FORM FEED (FF), <!--U+000D CARRIAGE RETURN (CR),-->
@@ -40322,6 +40301,11 @@
  </dd>
+ <dt>An end-of-file token</dt>
+ <dd>
+ <p><span>Parse error</span>. <span>Stop parsing</span>.</p>
+ </dd>
+
  <dt>Anything else</dt>
  <dd>
@@ -40452,11 +40436,6 @@
  <dl class="switch">
- <dt>An end-of-file token</dt>
- <dd>
- <p><span>Stop parsing, with prejudice.</span></p>
- </dd>
-
  <dt>A character token that is one of one of U+0009 CHARACTER
  TABULATION, U+000A LINE FEED (LF), U+000B LINE TABULATION, U+000C
  FORM FEED (FF), <!--U+000D CARRIAGE RETURN (CR),-->
@@ -40836,11 +40815,6 @@
  <dl class="switch">
- <dt>An end-of-file token</dt>
- <dd>
- <p><span>Stop parsing, with prejudice.</span></p>
- </dd>
-
  <dt>A character token</dt>
  <dd>
  <p><span title="insert a character">Insert the token's
@@ -40951,6 +40925,11 @@
  name "select" had been seen, and reprocess the token.</p>
  </dd>
+ <dt>An end-of-file token</dt>
+ <dd>
+ <p><span>Parse error</span>. <span>Stop parsing</span>.</p>
+ </dd>
+
  <dt>Anything else</dt>
  <dd>
  <p><span>Parse error</span>. Ignore the token.</p>
@@ -41004,11 +40983,6 @@
  <dl class="switch">
- <dt>An end-of-file token</dt>
- <dd>
- <p><span>Stop parsing, with prejudice.</span></p>
- </dd>
-
  <dt>A character token that is one of one of U+0009 CHARACTER
  TABULATION, U+000A LINE FEED (LF), U+000B LINE TABULATION, U+000C
  FORM FEED (FF), <!--U+000D CARRIAGE RETURN (CR),-->
@@ -41053,6 +41027,11 @@
  </dd>
+ <dt>An end-of-file token</dt>
+ <dd>
+ <p><span>Stop parsing.</span></p>
+ </dd>
+
  <dt>Anything else</dt>
  <dd>
@@ -41072,11 +41051,6 @@
  <dl class="switch">
- <dt>An end-of-file token</dt>
- <dd>
- <p><span>Stop parsing, with prejudice.</span></p>
- </dd>
-
  <dt>A character token that is one of one of U+0009 CHARACTER
  TABULATION, U+000A LINE FEED (LF), U+000B LINE TABULATION, U+000C
  FORM FEED (FF), <!--U+000D CARRIAGE RETURN (CR),-->
@@ -41145,6 +41119,11 @@
  mode</span>.</p>
  </dd>
+ <dt>An end-of-file token</dt>
+ <dd>
+ <p><span>Parse error</span>. <span>Stop parsing.</span></p>
+ </dd>
+
  <dt>Anything else</dt>
  <dd>
  <p><span>Parse error</span>. Ignore the token.</p>
@@ -41161,11 +41140,6 @@
  <!-- due to rules in the "in frameset" mode, this can't be entered in the fragment case -->
  <dl class="switch">
- <dt>An end-of-file token</dt>
- <dd>
- <p><span>Stop parsing, with prejudice.</span></p>
- </dd>
-
  <dt>A character token that is one of one of U+0009 CHARACTER
  TABULATION, U+000A LINE FEED (LF), U+000B LINE TABULATION, U+000C
  FORM FEED (FF), <!--U+000D CARRIAGE RETURN (CR),-->
@@ -41202,6 +41176,11 @@
  frameset</span>".</p>
  </dd>
+ <dt>An end-of-file token</dt>
+ <dd>
+ <p><span>Stop parsing.</span></p>
+ </dd>
+
  <dt>Anything else</dt>
  <dd>
  <p><span>Parse error</span>. Ignore the token.</p>
@@ -41222,11 +41201,6 @@
  <dl class="switch">
- <dt>An end-of-file token</dt>
- <dd>
- <p><span>Stop parsing</span>.</p>
- </dd>
-
  <dt>A comment token</dt>
  <dd>
  <p>Append a <code>Comment</code> node to the <code>Document</code>
@@ -41246,6 +41220,11 @@
  mode</span>.</p>
  </dd>
+ <dt>An end-of-file token</dt>
+ <dd>
+ <p><span>Stop parsing</span>.</p>
+ </dd>
+
  <dt>Anything else</dt>
  <dd>
  <p><span>Parse error</span>. Switch the <span>insertion mode</span>
@@ -41262,11 +41241,6 @@
  mode: after after frameset">after after frameset</span>", tokens must
  be handled as follows:</p>
  <dl class="switch">
-
- <dt>An end-of-file token</dt>
- <dd>
- <p><span>Stop parsing</span>.</p>
- </dd>
  <dt>A comment token</dt>
  <dd>
@@ -41287,6 +41261,11 @@
  mode</span>.</p>
  </dd>
+ <dt>An end-of-file token</dt>
+ <dd>
+ <p><span>Stop parsing</span>.</p>
+ </dd>
+
  <dt>Anything else</dt>
  <dd>
  <p><span>Parse error</span>. Switch the <span>insertion mode</span>
@@ -41297,39 +41276,6 @@
  </dl>
- <h4>The unexpected end</h4>
-
- <p>When the user agent is to <dfn title="stop parsing, with
- prejudice">stops parsing, with prejudice</dfn>, the user agent must
- follow the steps in this section.</p>
-
- <ol>
-
- <li>
- <p><span>Generate implied end tags.</span></p>
- </li>
-
- <li>
-
- <p>If there are more than two nodes on the <span>stack of open
- elements</span>, or if there are two nodes but the second node is
- not a <code>body</code> node, this is a <span>parse
- error</span>.</p>
-
- <p>Otherwise, if the parser was originally created as part of the
- <span>HTML fragment parsing algorithm</span>, and there's more
- than one element in the <span>stack of open elements</span>, and
- the second node on the <span>stack of open elements</span> is not
- a <code>body</code> node, then this is a <span>parse
- error</span>. (<span>fragment case</span>)</p>
-
- </li>
-
- <li><p><span>Stop parsing</span> (see the next section).</p></li>
-
- </ol>
-
-
  <h4>The end</h4>
  <p>Once the user agent <dfn title="stop parsing">stops parsing</dfn>