Bug 112 – Merged phases and insertion modes. Theoretically, this should make absolutely no difference. Please let me know the many ways in which I screwed up.

NOTE: The current preferred location for bug reports is the GitHub issue tracker.

Summary:	Merged phases and insertion modes. Theoretically, this should make absolutely...

Status:	RESOLVED FIXED

Product:	Validator.nu
Classification:	Unclassified
Component:	HTML parser
Version:	HEAD
Hardware:	All All

Importance:	P2 normal
Assigned To:	Henri Sivonen

URL:	http://svn.whatwg.org/webapps/source?...

Depends on:
Blocks:

Reported:	2008-03-03 13:14 CET by Nobody
Modified:	2008-03-07 13:16 CET (History)
CC List:	0 users

See Also:


Description Nobody 2008-03-03 13:14:06 CET

Index: source
===================================================================
--- source	(revision 1311)
+++ source	(revision 1312)
@@ -37722,13 +37722,9 @@
   parser is created. The "output" of this stage consists of
   dynamically modifying or extending that document's DOM tree.</p>
 
-  <p>Tree construction passes through several phases. Initially, UAs
-  must act according to the steps described as being those of
-  <span>the initial phase</span>.</p>
-
   <p>This specification does not define when an interactive user agent
-  has to render the <code>Document</code> available to the user, or
-  when it has to begin accepting user input.</p>
+  has to render the <code>Document</code> so that it is available to
+  the user, or when it has to begin accepting user input.</p>
 
   <p>When the steps below require the UA to <dfn>append a
   character</dfn> to a node, the UA must collect it and all subsequent
@@ -37758,35 +37754,35 @@
   concerns</a> will likely force user agents to impose nesting
   depths.</p>
 
-
-  <h5><dfn>The main phase</dfn></h5>
-
-  <p>After <span>the root element phase</span>, each token emitted
-  from the <span>tokenisation</span> stage must be processed as
-  described in <em>this</em> section. This is by far the most involved
-  part of parsing an HTML document.</p>
-
-  <p>The tree construction stage in this phase has several pieces of
-  state: a <span>stack of open elements</span>, a <span>list of active
+  <p>The tree construction stage has several pieces of state: a
+  <span>stack of open elements</span>, a <span>list of active
   formatting elements</span>, a <span><code title="">head</code>
   element pointer</span>, a <span><code title="">form</code> element
   pointer</span>, and an <span>insertion mode</span>.</p>
 
-  <p class="big-issue">We could just fold insertion modes and phases
-  into one concept (and duplicate the two rules common to all
-  insertion modes into all of them).</p>
+  <p>As each token is emitted from the tokeniser, the user agent must
+  process the token according to the rules given in the section
+  corresponding to the current <span>insertion mode</span>.</p>
+
 
+  <h5>The stack of open elements</h5>
 
-  <h6>The stack of open elements</h6>
-
-  <p>Initially the <dfn>stack of open elements</dfn> contains just the
-  <code>html</code> root element node created in the <span title="the
-  root element phase">last phase</span> before switching to
-  <em>this</em> phase (or, in the <span>fragment case</span>, the
-  <code>html</code> element created as part of <span title="html
-  fragment parsing algorithm">that algorithm</span>). That's the
-  topmost node of the stack. It never gets popped off the stack. (This
-  stack grows downwards.)</p>
+  <p>Initially the <dfn>stack of open elements</dfn> is empty.</p>
+
+  <p>The <span title="insertion mode: root element">root element
+  insertion mode</span> creates the <code>html</code> root element
+  node, which is then added to the stack.</p>
+
+  <p>In the <span>fragment case</span>, the <span>stack of open
+  elements</span> is initialised to contain an <code>html</code>
+  element that is created as part of <span title="html fragment
+  parsing algorithm">that algorithm</span>. (The <span>fragment
+  case</span> skips the <span title="insertion mode: root
+  element">root element insertion mode</span>.)</p>
+
+  <p>The <code>html</code> node, however it is created, is the topmost
+  node of the stack. It never gets popped off the stack. (This stack
+  grows downwards.)</p>
 
   <p>The <dfn>current node</dfn> is the bottommost node in this
   stack.</p>
@@ -37903,7 +37899,7 @@
   the stack is manipulated in a random-access fashion.</p>
 
 
-  <h6>The list of active formatting elements</h6>
+  <h5>The list of active formatting elements</h5>
 
   <p>Initially the <dfn>list of active formatting elements</dfn> is
   empty. It is used to handle mis-nested <span
@@ -38001,7 +37997,7 @@
   </ol>
 
 
-  <h6>Creating and inserting HTML elements</h6>
+  <h5>Creating and inserting HTML elements</h5>
 
   <p>When the steps below require the UA to <dfn title="create an
   element for the token">create an element for a token</dfn>, the UA
@@ -38070,7 +38066,7 @@
 
 
 
-  <h6>Closing elements that have implied end tags</h6>
+  <h5>Closing elements that have implied end tags</h5>
 
   <p>When the steps below require the UA to <dfn>generate implied end
   tags</dfn>, then, if the <span>current node</span> is a
@@ -38088,7 +38084,7 @@
   list.</p>
 
 
-  <h6>The element pointers</h6>
+  <h5>The element pointers</h5>
 
   <p>Initially the <dfn><code title="">head</code> element
   pointer</dfn> and the <dfn><code title="">form</code> element
@@ -38105,31 +38101,30 @@
   markup, for historical reasons.</p>
 
 
-  <h6>The insertion mode</h6>
+  <h5>The insertion mode</h5>
 
   <p>Initially the <dfn>insertion mode</dfn> is "<span
-  title="insertion mode: before head">before head</span>". It can
-  change to "<span title="insertion mode: in head">in head</span>",
-  "<span title="insertion mode: in head noscript">in head
-  noscript</span>", "<span title="insertion mode: after head">after
-  head</span>", "<span title="insertion mode: in body">in
-  body</span>", "<span title="insertion mode: in table">in
-  table</span>", "<span title="insertion mode: in caption">in
-  caption</span>", "<span title="insertion mode: in column group">in
-  column group</span>", "<span title="insertion mode: in table
-  body">in table body</span>", "<span title="insertion mode: in
-  row">in row</span>", "<span title="insertion mode: in cell">in
-  cell</span>", "<span title="insertion mode: in select">in
-  select</span>", "<span title="insertion mode: after body">after
-  body</span>", "<span title="insertion mode: in frameset">in
-  frameset</span>", and "<span title="insertion mode: after
-  frameset">after frameset</span>" during the course of the parsing,
-  as described below. It affects how certain tokens are processed.</p>
-
-  <p>If the tree construction stage is switched from <span>the main
-  phase</span> to <span>the trailing end phase</span> and back again,
-  the various pieces of state are not reset; the UA must act as if the
-  state was maintained.</p>
+  title="insertion mode: initial">initial</span>". It can change to
+  "<span title="insertion mode: root element">root element</span>",
+  "<span title="insertion mode: in head">in head</span>", "<span
+  title="insertion mode: in head noscript">in head noscript</span>",
+  "<span title="insertion mode: after head">after head</span>", "<span
+  title="insertion mode: in body">in body</span>", "<span
+  title="insertion mode: in table">in table</span>", "<span
+  title="insertion mode: in caption">in caption</span>", "<span
+  title="insertion mode: in column group">in column group</span>",
+  "<span title="insertion mode: in table body">in table body</span>",
+  "<span title="insertion mode: in row">in row</span>", "<span
+  title="insertion mode: in cell">in cell</span>", "<span
+  title="insertion mode: in select">in select</span>", "<span
+  title="insertion mode: after body">after body</span>", "<span
+  title="insertion mode: in frameset">in frameset</span>", "<span
+  title="insertion mode: after frameset">after frameset</span>",
+  "<span title="insertion mode: after after body">after after
+  body</span>", and "<span title="insertion mode: after after
+  frameset">after after frameset</span>" during the course of the
+  parsing, as described below. It affects how certain tokens are
+  processed.</p>
 
   <p>When the steps below require the UA to <dfn>reset the insertion
   mode appropriately</dfn>, it means the UA must follow these
@@ -38277,12 +38272,11 @@
 
   </ol>
 -->
-`
 
-  <h5><dfn>The initial phase</dfn></h5>
 
-  <p>Initially, the tree construction stage must handle each token
-  emitted from the <span>tokenisation</span> stage as follows:</p>
+  <h5>The <dfn title="insertion mode: initial">initial</dfn> insertion mode</h5>
+
+  <p>Handle the token as follows:</p>
 
   <dl class="switch">
 
@@ -38428,8 +38422,8 @@
     be compared to the values given in the lists above in a
     case-insensitive<!-- ASCII --> manner.</p>
 
-    <p>Then, switch to <span>the root element phase</span> of the tree
-    construction stage.</p>
+    <p>Then, change the <span>insertion mode</span> to "<span
+    title="insertion mode: root element">root element</span>".</p>
 
    </dd>
 
@@ -38445,19 +38439,17 @@
 
     <p>Set the document to <span>quirks mode</span>.</p>
 
-    <p>Then, switch to <span>the root element phase</span> of the tree
-    construction stage and reprocess the current token.</p>
+    <p>Then, change the <span>insertion mode</span> to "<span
+    title="insertion mode: root element">root element</span>".</p>
 
    </dd>
 
   </dl>
 
 
-  <h5><dfn>The root element phase</dfn></h5>
+  <h5>The <dfn title="insertion mode: root element">root element</dfn> insertion mode</h5>
 
-  <p>After <span>the initial phase</span>, as each token is emitted
-  from the <span>tokenisation</span> stage, it must be processed as
-  described in this section.</p>
+  <p>Handle the token as follows:</p>
 
   <dl class="switch">
 
@@ -38499,8 +38491,10 @@
 
     <p>Create an <code>HTMLElement</code> node with the tag name
     <code>html</code>, in the <span>HTML namespace</span>. Append it
-    to the <code>Document</code> object. Switch to <span>the main
-    phase</span> and reprocess the current token.</p>
+    to the <code>Document</code> object.</p>
+
+    <p>Change the <span>insertion mode</span> to "<span
+    title="insertion mode: before head">before head</span>".</p>
 
     <p class="big-issue">Should probably make end tags be ignored, so
     that "&lt;/head>&lt;!-- -->&lt;html>" puts the comment before the
@@ -40911,8 +40905,9 @@
     be an <code>html</code> element in this case.)
     (<span>fragment case</span>)</p>
 
-    <p>Otherwise, switch to <span>the trailing end
-    phase</span>.</p>
+    <p>Then, change the <span>insertion mode</span> to "<span
+    title="insertion mode: after after body">after after
+    body</span>".</p>
 
    </dd>
 
@@ -41056,7 +41051,9 @@
 
    <dt>An end tag whose tag name is "html"</dt>
    <dd>
-    <p>Switch to <span>the trailing end phase</span>.</p>
+    <p>Change the <span>insertion mode</span> to "<span
+    title="insertion mode: after after frameset">after after
+    frameset</span>".</p>
    </dd>
 
    <dt>A start tag whose tag name is "noframes"</dt>
@@ -41078,17 +41075,15 @@
   harder.</p>
 
 
-  <h5><dfn>The trailing end phase</dfn></h5>
+  <h5>The <dfn title="insertion mode: after after body">after after body</dfn> insertion mode</h5>
 
-  <p>After <span>the main phase</span>, as each token is emitted from
-  the <span>tokenisation</span> stage, it must be processed as
-  described in this section.</p>
+  <p>Handle the token as follows:</p>
 
   <dl class="switch">
 
-   <dt>A DOCTYPE token</dt>
+   <dt>An end-of-file token</dt>
    <dd>
-    <p><span>Parse error</span>. Ignore the token.</p>
+    <p><span>Stop parsing</span>.</p>
    </dd>
 
    <dt>A comment token</dt>
@@ -41098,32 +41093,62 @@
     data given in the comment token.</p>
    </dd>
 
+   <dt>A DOCTYPE token</dt>
    <dt>A character token that is one of one of U+0009 CHARACTER
    TABULATION, U+000A LINE FEED (LF), U+000B LINE TABULATION, U+000C
    FORM FEED (FF), <!--U+000D CARRIAGE RETURN (CR),--> or U+0020
    SPACE</dt>
+   <dt>A start tag whose tag name is "html"</dt>
    <dd>
-    <p>Process the token as it would be processed in <span>the main
-    phase</span>.</p> <!-- if there was a <body>, the space will go
-    into it, otherwise (e.g. if there was a <frameset>) it'll go into
-    the <html> node (this is important in case we have "foo</html>
-    bar", as we don't want that to become one word) -->
+    <p>Process the token as if the <span>insertion mode</span> had
+    been "<span title="insertion mode: in body">in body</span>".</p>
    </dd>
 
-   <dt>A character token that is <em>not</em> one of U+0009 CHARACTER
-   TABULATION, U+000A LINE FEED (LF), U+000B LINE TABULATION, U+000C
-   FORM FEED (FF), <!--U+000D CARRIAGE RETURN (CR),--> or U+0020 SPACE</dt>
-   <dt>A start tag token</dt>
-   <dt>An end tag token</dt>
+   <dt>Anything else</dt>
    <dd>
-    <p><span>Parse error</span>. Switch back to <span>the main
-    phase</span> and reprocess the token.</p>
+    <p><span>Parse error</span>. Set the <span>insertion mode</span>
+    to "<span title="insertion mode: in body">in body</span>" and
+    reprocess the token.</p>
    </dd>
 
+  </dl>
+
+
+  <h5>The <dfn title="insertion mode: after after frameset">after after frameset</dfn> insertion mode</h5>
+
+  <p>Handle the token as follows:</p>
+
+  <dl class="switch">
+
    <dt>An end-of-file token</dt>
    <dd>
     <p><span>Stop parsing</span>.</p>
    </dd>
+ 
+   <dt>A comment token</dt>
+   <dd>
+    <p>Append a <code>Comment</code> node to the <code>Document</code>
+    object with the <code title="">data</code> attribute set to the
+    data given in the comment token.</p>
+   </dd>
+
+   <dt>A DOCTYPE token</dt>
+   <dt>A character token that is one of one of U+0009 CHARACTER
+   TABULATION, U+000A LINE FEED (LF), U+000B LINE TABULATION, U+000C
+   FORM FEED (FF), <!--U+000D CARRIAGE RETURN (CR),--> or U+0020
+   SPACE</dt>
+   <dt>A start tag whose tag name is "html"</dt>
+   <dd>
+    <p>Process the token as if the <span>insertion mode</span> had
+    been "<span title="insertion mode: in body">in body</span>".</p>
+   </dd>
+
+   <dt>Anything else</dt>
+   <dd>
+    <p><span>Parse error</span>. Set the <span>insertion mode</span>
+    to "<span title="insertion mode: in frameset">in frameset</span>" and
+    reprocess the token.</p>
+   </dd>
 
   </dl>
 
@@ -41590,13 +41615,6 @@
 
    <li>
 
-    <p>Switch the <span>HTML parser</span>'s <span>tree
-    construction</span> stage to <span>the main phase</span>.
-
-   </li>
-
-   <li>
-
     <p>Let <var title="">root</var> be a new <code>html</code> element
     with no attributes.</p>