NOTE: The current preferred location for bug reports is the GitHub issue tracker.
Bug 139 - Revamp how end-of-file tokens work and how </body> is handled in conformance checkers.
Status: RESOLVED FIXED
Product: Validator.nu
Classification: Unclassified
Component: HTML parser
Version: HEAD
Hardware: All All
Importance: P2 normal
Assigned To: Henri Sivonen
http://svn.whatwg.org/webapps/source?...
Reported: 2008-03-06 13:01 CET by Nobody
Modified: 2008-03-10 16:19 CET (History)
Description Nobody 2008-03-06 13:01:07 CET
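Reading aid for the patch that follows (not part of the spec change itself): a minimal Python sketch of two behaviours the diff introduces. First, the "Anything else" entries in the "initial" and "before html" insertion modes now switch mode and then reprocess the current token, so an end-of-file token falls through mode by mode instead of each mode carrying its own end-of-file entry. Second, the end-of-file entry added to "in body" reports a parse error when the stack of open elements contains anything outside a fixed list. The names `TreeBuilderSketch`, `eof_in_body_is_parse_error` and `ALLOWED_AT_EOF` are hypothetical, tokens are plain strings, and every token is treated as "Anything else" (DOCTYPE, comment and whitespace handling is omitted), so this is an assumption-laden illustration rather than the parser's implementation.

```python
# Element names that may still be open at end-of-file in "in body"
# without a parse error, copied from the list in the patch.
ALLOWED_AT_EOF = frozenset({
    "dd", "dt", "li", "p", "tbody", "td", "tfoot",
    "th", "thead", "tr", "body", "html",
})


def eof_in_body_is_parse_error(open_elements):
    """Return True if any open element is outside the allowed list."""
    return any(name not in ALLOWED_AT_EOF for name in open_elements)


class TreeBuilderSketch:
    """Hypothetical, heavily simplified tree builder (illustration only)."""

    def __init__(self):
        self.mode = "initial"
        self.quirks = False
        self.errors = 0
        self.open_elements = []
        self.trace = []  # (mode, token) pairs in processing order

    def process(self, token):
        # Loop so a handler can request that the same token be
        # reprocessed under the insertion mode it just switched to.
        reprocess = True
        while reprocess:
            self.trace.append((self.mode, token))
            reprocess = self._dispatch(token)

    def _dispatch(self, token):
        if self.mode == "initial":
            # "Anything else": parse error, quirks mode, switch, reprocess.
            self.errors += 1
            self.quirks = True
            self.mode = "before html"
            return True
        if self.mode == "before html":
            # "Anything else": create the html element, switch, reprocess.
            self.open_elements.append("html")
            self.mode = "before head"
            return True
        # Later modes are out of scope here; the token is simply consumed.
        return False


if __name__ == "__main__":
    tb = TreeBuilderSketch()
    tb.process("EOF")
    for mode, token in tb.trace:
        print(mode, token)
    print(eof_in_body_is_parse_error(["html", "body", "div"]))
```

Feeding a single end-of-file token to a fresh `TreeBuilderSketch` shows the token visiting "initial", "before html" and "before head" in turn, which is the fall-through that lets the patch delete the per-mode end-of-file entries.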
Index: source
===================================================================
--- source	(revision 1347)
+++ source	(revision 1348)
@@ -38469,20 +38469,16 @@
 
    </dd>
 
-   <dt>A start tag token</dt>
-   <dt>An end tag token</dt>
-   <dt>A character token that is not one of one of U+0009 CHARACTER
-   TABULATION, U+000A LINE FEED (LF), U+000B LINE TABULATION, U+000C
-   FORM FEED (FF), <!--U+000D CARRIAGE RETURN (CR),--> or U+0020 SPACE</dt>
-   <dt>An end-of-file token</dt>
+   <dt>Anything else</dt>
    <dd>
 
     <p><span>Parse error</span>.</p>
 
     <p>Set the document to <span>quirks mode</span>.</p>
 
-    <p>Then, switch the <span>insertion mode</span> to "<span
-    title="insertion mode: before html">before html</span>".</p>
+    <p>Switch the <span>insertion mode</span> to "<span
+    title="insertion mode: before html">before html</span>", then
+    reprocess the current token.</p>
 
    </dd>
 
@@ -38536,12 +38532,7 @@
 
    </dd>
 
-   <dt>A character token that is <em>not</em> one of U+0009 CHARACTER
-   TABULATION, U+000A LINE FEED (LF), U+000B LINE TABULATION, U+000C
-   FORM FEED (FF), <!--U+000D CARRIAGE RETURN (CR),--> or U+0020 SPACE</dt>
-   <dt>A start tag token</dt>
-   <dt>An end tag token</dt>
-   <dt>An end-of-file token</dt>
+   <dt>Anything else</dt>
    <dd>
 
     <p>Create an <code>HTMLElement</code> node with the tag name
@@ -38554,7 +38545,8 @@
     selection algorithm</span> with no manifest.</p>
 
     <p>Switch the <span>insertion mode</span> to "<span
-    title="insertion mode: before head">before head</span>".</p>
+    title="insertion mode: before head">before head</span>", then
+    reprocess the current token.</p>
 
     <p class="big-issue">Should probably make end tags be ignored, so
     that "&lt;/head>&lt;!-- -->&lt;html>" puts the comment before the
@@ -38577,11 +38569,6 @@
 
   <dl class="switch">
 
-   <dt>An end-of-file token</dt>
-   <dd>
-    <p><span>Stop parsing, with prejudice.</span></p>
-   </dd>
-
    <dt>A character token that is one of one of U+0009
    CHARACTER TABULATION, U+000A LINE FEED (LF), U+000B LINE
    TABULATION, U+000C FORM FEED (FF), <!--U+000D CARRIAGE RETURN (CR),-->
@@ -38643,10 +38630,7 @@
 
    </dd>
 
-   <dt>A character token that is <em>not</em> one of U+0009 CHARACTER
-   TABULATION, U+000A LINE FEED (LF), U+000B LINE TABULATION, U+000C
-   FORM FEED (FF), <!--U+000D CARRIAGE RETURN (CR),--> or U+0020 SPACE</dt>
-   <dt>Any other start tag token</dt>
+   <dt>Anything else</dt>
    <dd>
 
     <p>Act as if a start tag token with the tag name "head" and no
@@ -38670,11 +38654,6 @@
 
   <dl class="switch">
 
-   <dt>An end-of-file token</dt>
-   <dd>
-    <p><span>Stop parsing, with prejudice.</span></p>
-   </dd>
-
    <dt>A character token that is one of one of U+0009
    CHARACTER TABULATION, U+000A LINE FEED (LF), U+000B LINE
    TABULATION, U+000C FORM FEED (FF), <!--U+000D CARRIAGE RETURN (CR),-->
@@ -38930,11 +38909,6 @@
 
   <dl class="switch">
 
-   <dt>An end-of-file token</dt>
-   <dd>
-    <p><span>Stop parsing, with prejudice.</span></p>
-   </dd>
-
    <dt>A DOCTYPE token</dt>
    <dd>
     <p><span>Parse error</span>. Ignore the token.</p>
@@ -39000,11 +38974,6 @@
 
   <dl class="switch">
 
-   <dt>An end-of-file token</dt>
-   <dd>
-    <p><span>Stop parsing, with prejudice.</span></p>
-   </dd>
-
    <dt>A character token that is one of one of U+0009
    CHARACTER TABULATION, U+000A LINE FEED (LF), U+000B LINE
    TABULATION, U+000C FORM FEED (FF), <!--U+000D CARRIAGE RETURN (CR),-->
@@ -39091,11 +39060,6 @@
 
   <dl class="switch">
 
-   <dt>An end-of-file token</dt>
-   <dd>
-    <p><span>Stop parsing, with prejudice.</span></p>
-   </dd>
-
    <dt>A character token</dt>
    <dd>
 
@@ -39153,26 +39117,46 @@
 
    </dd>
 
+   <dt>An end-of-file token</dt>
+   <dd>
+
+    <p>If there is a node in the <span>stack of open elements</span>
+    that is not either a <code>dd</code> element, a <code>dt</code>
+    element, an <code>li</code> element, a <code>p</code> element, a
+    <code>tbody</code> element, a <code>td</code> element, a
+    <code>tfoot</code> element, a <code>th</code> element, a
+    <code>thead</code> element, a <code>tr</code> element, the
+    <code>body</code> element, or the <code>html</code> element, then
+    this is a <span>parse error</span>.</p> <!-- (some of those are
+    fragment cases) -->
+
+    <p><span>Stop parsing</span>.</p>
+
+   </dd>
+
    <dt>An end tag whose tag name is "body"</dt>
    <dd>
 
-    <p>If the second element in the <span>stack of open
-    elements</span> is not a <code>body</code> element, this is a
-    <span>parse error</span>. Ignore the token.  (<span>fragment
-    case</span>)</p>
+    <p>If the <span>stack of open elements</span> does not <span
+    title="has an element in scope">have a <code>body</code> element
+    in scope</span>, this is a <span>parse error</span>; ignore the
+    token.</p>
 
     <p>Otherwise, if there is a node in the <span>stack of open
-    elements</span> that is not either a <code>dd</code> element,
-    a <code>dt</code> element, an <code>li</code> element, a
+    elements</span> that is not either a <code>dd</code> element, a
+    <code>dt</code> element, an <code>li</code> element, a
     <code>p</code> element, a <code>tbody</code> element, a
     <code>td</code> element, a <code>tfoot</code> element, a
     <code>th</code> element, a <code>thead</code> element, a
     <code>tr</code> element, the <code>body</code> element, or the
     <code>html</code> element, then this is a <span>parse
-    error</span>.</p>
+    error</span>.</p> <!-- (some of those are fragment cases) -->
+
+    <!-- the insertion mode here is forcibly "in body". -->
 
     <p>Switch the <span>insertion mode</span> to "<span
-    title="insertion mode: after body">after body</span>".</p>
+    title="insertion mode: after body">after body</span>". Otherwise,
+    ignore the token.</p>
 
    </dd>
 
@@ -40163,11 +40147,6 @@
 
   <dl class="switch">
 
-   <dt>An end-of-file token</dt>
-   <dd>
-    <p><span>Stop parsing, with prejudice.</span></p>
-   </dd>
-
    <dt>A character token that is one of one of U+0009
    CHARACTER TABULATION, U+000A LINE FEED (LF), U+000B LINE
    TABULATION, U+000C FORM FEED (FF), <!--U+000D CARRIAGE RETURN (CR),-->
@@ -40322,6 +40301,11 @@
 
    </dd>
 
+   <dt>An end-of-file token</dt>
+   <dd>
+    <p><span>Parse error</span>. <span>Stop parsing</span>.</p>
+   </dd>
+
    <dt>Anything else</dt>
    <dd>
 
@@ -40452,11 +40436,6 @@
 
   <dl class="switch">
 
-   <dt>An end-of-file token</dt>
-   <dd>
-    <p><span>Stop parsing, with prejudice.</span></p>
-   </dd>
-
    <dt>A character token that is one of one of U+0009
    CHARACTER TABULATION, U+000A LINE FEED (LF), U+000B LINE
    TABULATION, U+000C FORM FEED (FF), <!--U+000D CARRIAGE RETURN (CR),-->
@@ -40836,11 +40815,6 @@
 
   <dl class="switch">
 
-   <dt>An end-of-file token</dt>
-   <dd>
-    <p><span>Stop parsing, with prejudice.</span></p>
-   </dd>
-
    <dt>A character token</dt>
    <dd>
     <p><span title="insert a character">Insert the token's
@@ -40951,6 +40925,11 @@
     name "select" had been seen, and reprocess the token.</p>
    </dd>
 
+   <dt>An end-of-file token</dt>
+   <dd>
+    <p><span>Parse error</span>. <span>Stop parsing</span>.</p>
+   </dd>
+
    <dt>Anything else</dt>
    <dd>
     <p><span>Parse error</span>. Ignore the token.</p>
@@ -41004,11 +40983,6 @@
 
   <dl class="switch">
 
-   <dt>An end-of-file token</dt>
-   <dd>
-    <p><span>Stop parsing, with prejudice.</span></p>
-   </dd>
-
    <dt>A character token that is one of one of U+0009
    CHARACTER TABULATION, U+000A LINE FEED (LF), U+000B LINE
    TABULATION, U+000C FORM FEED (FF), <!--U+000D CARRIAGE RETURN (CR),-->
@@ -41053,6 +41027,11 @@
 
    </dd>
 
+   <dt>An end-of-file token</dt>
+   <dd>
+    <p><span>Stop parsing.</span></p>
+   </dd>
+
    <dt>Anything else</dt>
    <dd>
 
@@ -41072,11 +41051,6 @@
 
   <dl class="switch">
 
-   <dt>An end-of-file token</dt>
-   <dd>
-    <p><span>Stop parsing, with prejudice.</span></p>
-   </dd>
-
    <dt>A character token that is one of one of U+0009
    CHARACTER TABULATION, U+000A LINE FEED (LF), U+000B LINE
    TABULATION, U+000C FORM FEED (FF), <!--U+000D CARRIAGE RETURN (CR),-->
@@ -41145,6 +41119,11 @@
     mode</span>.</p>
    </dd>
 
+   <dt>An end-of-file token</dt>
+   <dd>
+    <p><span>Parse error</span>. <span>Stop parsing.</span></p>
+   </dd>
+
    <dt>Anything else</dt>
    <dd>
     <p><span>Parse error</span>. Ignore the token.</p>
@@ -41161,11 +41140,6 @@
   <!-- due to rules in the "in frameset" mode, this can't be entered in the fragment case -->
   <dl class="switch">
 
-   <dt>An end-of-file token</dt>
-   <dd>
-    <p><span>Stop parsing, with prejudice.</span></p>
-   </dd>
-
    <dt>A character token that is one of one of U+0009
    CHARACTER TABULATION, U+000A LINE FEED (LF), U+000B LINE
    TABULATION, U+000C FORM FEED (FF), <!--U+000D CARRIAGE RETURN (CR),-->
@@ -41202,6 +41176,11 @@
     frameset</span>".</p>
    </dd>
 
+   <dt>An end-of-file token</dt>
+   <dd>
+    <p><span>Stop parsing.</span></p>
+   </dd>
+
    <dt>Anything else</dt>
    <dd>
     <p><span>Parse error</span>. Ignore the token.</p>
@@ -41222,11 +41201,6 @@
 
   <dl class="switch">
 
-   <dt>An end-of-file token</dt>
-   <dd>
-    <p><span>Stop parsing</span>.</p>
-   </dd>
-
    <dt>A comment token</dt>
    <dd>
     <p>Append a <code>Comment</code> node to the <code>Document</code>
@@ -41246,6 +41220,11 @@
     mode</span>.</p>
    </dd>
 
+   <dt>An end-of-file token</dt>
+   <dd>
+    <p><span>Stop parsing</span>.</p>
+   </dd>
+
    <dt>Anything else</dt>
    <dd>
     <p><span>Parse error</span>. Switch the <span>insertion mode</span>
@@ -41262,11 +41241,6 @@
   mode: after after frameset">after after frameset</span>", tokens must be handled as follows:</p>
 
   <dl class="switch">
-
-   <dt>An end-of-file token</dt>
-   <dd>
-    <p><span>Stop parsing</span>.</p>
-   </dd>
  
    <dt>A comment token</dt>
    <dd>
@@ -41287,6 +41261,11 @@
     mode</span>.</p>
    </dd>
 
+   <dt>An end-of-file token</dt>
+   <dd>
+    <p><span>Stop parsing</span>.</p>
+   </dd>
+
    <dt>Anything else</dt>
    <dd>
     <p><span>Parse error</span>. Switch the <span>insertion mode</span>
@@ -41297,39 +41276,6 @@
   </dl>
 
 
-  <h4>The unexpected end</h4>
-
-  <p>When the user agent is to <dfn title="stop parsing, with
-  prejudice">stops parsing, with prejudice</dfn>, the user agent must
-  follow the steps in this section.</p>
-
-  <ol>
-
-   <li>
-    <p><span>Generate implied end tags.</span></p>
-   </li>
-
-   <li>
-
-    <p>If there are more than two nodes on the <span>stack of open
-    elements</span>, or if there are two nodes but the second node is
-    not a <code>body</code> node, this is a <span>parse
-    error</span>.</p>
-
-    <p>Otherwise, if the parser was originally created as part of the
-    <span>HTML fragment parsing algorithm</span>, and there's more
-    than one element in the <span>stack of open elements</span>, and
-    the second node on the <span>stack of open elements</span> is not
-    a <code>body</code> node, then this is a <span>parse
-    error</span>. (<span>fragment case</span>)</p>
-
-   </li>
-
-   <li><p><span>Stop parsing</span> (see the next section).</p></li>
-
-  </ol>
-
-
   <h4>The end</h4>
 
   <p>Once the user agent <dfn title="stop parsing">stops parsing</dfn>