Bugzilla – Bug 112
Merged phases and insertion modes. Theoretically, this should make absolutely no difference. Please let me know the many ways in which I screwed up.
Last modified: 2008-03-07 13:16:24 CET
Index: source
===================================================================
--- source (revision 1311)
+++ source (revision 1312)
@@ -37722,13 +37722,9 @@
   parser is created. The "output" of this stage consists of
   dynamically modifying or extending that document's DOM tree.</p>
 
-  <p>Tree construction passes through several phases. Initially, UAs
-  must act according to the steps described as being those of
-  <span>the initial phase</span>.</p>
-
   <p>This specification does not define when an interactive user agent
-  has to render the <code>Document</code> available to the user, or
-  when it has to begin accepting user input.</p>
+  has to render the <code>Document</code> so that it is available to
+  the user, or when it has to begin accepting user input.</p>
 
   <p>When the steps below require the UA to <dfn>append a
   character</dfn> to a node, the UA must collect it and all subsequent
@@ -37758,35 +37754,35 @@
   concerns</a> will likely force user agents to impose nesting
   depths.</p>
 
-
-  <h5><dfn>The main phase</dfn></h5>
-
-  <p>After <span>the root element phase</span>, each token emitted
-  from the <span>tokenisation</span> stage must be processed as
-  described in <em>this</em> section. This is by far the most involved
-  part of parsing an HTML document.</p>
-
-  <p>The tree construction stage in this phase has several pieces of
-  state: a <span>stack of open elements</span>, a <span>list of active
+  <p>The tree construction stage has several pieces of state: a
+  <span>stack of open elements</span>, a <span>list of active
   formatting elements</span>, a <span><code title="">head</code>
   element pointer</span>, a <span><code title="">form</code> element
   pointer</span>, and an <span>insertion mode</span>.</p>
 
-  <p class="big-issue">We could just fold insertion modes and phases
-  into one concept (and duplicate the two rules common to all
-  insertion modes into all of them).</p>
+  <p>As each token is emitted from the tokeniser, the user agent must
+  process the token according to the rules given in the section
+  corresponding to the current <span>insertion mode</span>.</p>
+
+  <h5>The stack of open elements</h5>
 
 
-  <h6>The stack of open elements</h6>
-
-  <p>Initially the <dfn>stack of open elements</dfn> contains just the
-  <code>html</code> root element node created in the <span title="the
-  root element phase">last phase</span> before switching to
-  <em>this</em> phase (or, in the <span>fragment case</span>, the
-  <code>html</code> element created as part of <span title="html
-  fragment parsing algorithm">that algorithm</span>). That's the
-  topmost node of the stack. It never gets popped off the stack. (This
-  stack grows downwards.)</p>
+  <p>Initially the <dfn>stack of open elements</dfn> is empty.</p>
+
+  <p>The <span title="insertion mode: root element">root element
+  insertion mode</span> creates the <code>html</code> root element
+  node, which is then added to the stack.</p>
+
+  <p>In the <span>fragment case</span>, the <span>stack of open
+  elements</span> is initialised to contain an <code>html</code>
+  element that is created as part of <span title="html fragment
+  parsing algorithm">that algorithm</span>. (The <span>fragment
+  case</span> skips the <span title="insertion mode: root
+  element">root element insertion mode</span>.)</p>
+
+  <p>The <code>html</code> node, however it is created, is the topmost
+  node of the stack. It never gets popped off the stack. (This stack
+  grows downwards.)</p>
 
   <p>The <dfn>current node</dfn> is the bottommost node in this
   stack.</p>
@@ -37903,7 +37899,7 @@
   the stack is manipulated in a random-access fashion.</p>
 
 
-  <h6>The list of active formatting elements</h6>
+  <h5>The list of active formatting elements</h5>
 
   <p>Initially the <dfn>list of active formatting elements</dfn> is
   empty. It is used to handle mis-nested <span
@@ -38001,7 +37997,7 @@
   </ol>
 
 
-  <h6>Creating and inserting HTML elements</h6>
+  <h5>Creating and inserting HTML elements</h5>
 
   <p>When the steps below require the UA to <dfn title="create an
   element for the token">create an element for a token</dfn>, the UA
@@ -38070,7 +38066,7 @@
 
 
 
-  <h6>Closing elements that have implied end tags</h6>
+  <h5>Closing elements that have implied end tags</h5>
 
   <p>When the steps below require the UA to <dfn>generate implied end
   tags</dfn>, then, if the <span>current node</span> is a
@@ -38088,7 +38084,7 @@
   list.</p>
 
 
-  <h6>The element pointers</h6>
+  <h5>The element pointers</h5>
 
   <p>Initially the <dfn><code title="">head</code> element
   pointer</dfn> and the <dfn><code title="">form</code> element
@@ -38105,31 +38101,30 @@
   markup, for historical reasons.</p>
 
 
-  <h6>The insertion mode</h6>
+  <h5>The insertion mode</h5>
 
   <p>Initially the <dfn>insertion mode</dfn> is "<span
-  title="insertion mode: before head">before head</span>". It can
-  change to "<span title="insertion mode: in head">in head</span>",
-  "<span title="insertion mode: in head noscript">in head
-  noscript</span>", "<span title="insertion mode: after head">after
-  head</span>", "<span title="insertion mode: in body">in
-  body</span>", "<span title="insertion mode: in table">in
-  table</span>", "<span title="insertion mode: in caption">in
-  caption</span>", "<span title="insertion mode: in column group">in
-  column group</span>", "<span title="insertion mode: in table
-  body">in table body</span>", "<span title="insertion mode: in
-  row">in row</span>", "<span title="insertion mode: in cell">in
-  cell</span>", "<span title="insertion mode: in select">in
-  select</span>", "<span title="insertion mode: after body">after
-  body</span>", "<span title="insertion mode: in frameset">in
-  frameset</span>", and "<span title="insertion mode: after
-  frameset">after frameset</span>" during the course of the parsing,
-  as described below. It affects how certain tokens are processed.</p>
-
-  <p>If the tree construction stage is switched from <span>the main
-  phase</span> to <span>the trailing end phase</span> and back again,
-  the various pieces of state are not reset; the UA must act as if the
-  state was maintained.</p>
+  title="insertion mode: initial">initial</span>". It can change to
+  "<span title="insertion mode: root element">root element</span>",
+  "<span title="insertion mode: in head">in head</span>", "<span
+  title="insertion mode: in head noscript">in head noscript</span>",
+  "<span title="insertion mode: after head">after head</span>", "<span
+  title="insertion mode: in body">in body</span>", "<span
+  title="insertion mode: in table">in table</span>", "<span
+  title="insertion mode: in caption">in caption</span>", "<span
+  title="insertion mode: in column group">in column group</span>",
+  "<span title="insertion mode: in table body">in table body</span>",
+  "<span title="insertion mode: in row">in row</span>", "<span
+  title="insertion mode: in cell">in cell</span>", "<span
+  title="insertion mode: in select">in select</span>", "<span
+  title="insertion mode: after body">after body</span>", "<span
+  title="insertion mode: in frameset">in frameset</span>", "<span
+  title="insertion mode: after frameset">after frameset</span>",
+  "<span title="insertion mode: after after body">after after
+  body</span>", and "<span title="insertion mode: after after
+  frameset">after after frameset</span>" during the course of the
+  parsing, as described below. It affects how certain tokens are
+  processed.</p>
 
   <p>When the steps below require the UA to <dfn>reset the insertion
   mode appropriately</dfn>, it means the UA must follow these
@@ -38277,12 +38272,11 @@
 
   </ol>
 -->
-`
 
-  <h5><dfn>The initial phase</dfn></h5>
 
-  <p>Initially, the tree construction stage must handle each token
-  emitted from the <span>tokenisation</span> stage as follows:</p>
+  <h5>The <dfn title="insertion mode: initial">initial</dfn> insertion mode</h5>
+
+  <p>Handle the token as follows:</p>
 
   <dl class="switch">
 
@@ -38428,8 +38422,8 @@
     be compared to the values given in the lists above in a
     case-insensitive<!-- ASCII --> manner.</p>
 
-    <p>Then, switch to <span>the root element phase</span> of the tree
-    construction stage.</p>
+    <p>Then, change the <span>insertion mode</span> to "<span
+    title="insertion mode: root element">root element</span>".</p>
 
    </dd>
 
@@ -38445,19 +38439,17 @@
 
    <p>Set the document to <span>quirks mode</span>.</p>
 
-    <p>Then, switch to <span>the root element phase</span> of the tree
-    construction stage and reprocess the current token.</p>
+    <p>Then, change the <span>insertion mode</span> to "<span
+    title="insertion mode: root element">root element</span>".</p>
 
    </dd>
 
   </dl>
 
 
-  <h5><dfn>The root element phase</dfn></h5>
+  <h5>The <dfn title="insertion mode: root element">root element</dfn> insertion mode</h5>
 
-  <p>After <span>the initial phase</span>, as each token is emitted
-  from the <span>tokenisation</span> stage, it must be processed as
-  described in this section.</p>
+  <p>Handle the token as follows:</p>
 
   <dl class="switch">
 
@@ -38499,8 +38491,10 @@
 
     <p>Create an <code>HTMLElement</code> node with the tag name
     <code>html</code>, in the <span>HTML namespace</span>. Append it
-    to the <code>Document</code> object. Switch to <span>the main
-    phase</span> and reprocess the current token.</p>
+    to the <code>Document</code> object.</p>
+
+    <p>Change the <span>insertion mode</span> to "<span
+    title="insertion mode: before head">before head</span>".</p>
 
     <p class="big-issue">Should probably make end tags be ignored, so
     that "</head><!-- --><html>" puts the comment before the
@@ -40911,8 +40905,9 @@
     be an <code>html</code> element in this case.) (<span>fragment
     case</span>)</p>
 
-    <p>Otherwise, switch to <span>the trailing end
-    phase</span>.</p>
+    <p>Then, change the <span>insertion mode</span> to "<span
+    title="insertion mode: after after body">after after
+    body</span>".</p>
 
    </dd>
 
@@ -41056,7 +41051,9 @@
 
    <dt>An end tag whose tag name is "html"</dt>
    <dd>
-    <p>Switch to <span>the trailing end phase</span>.</p>
+    <p>Change the <span>insertion mode</span> to "<span
+    title="insertion mode: after after frameset">after after
+    frameset</span>".</p>
    </dd>
 
    <dt>A start tag whose tag name is "noframes"</dt>
@@ -41078,17 +41075,15 @@
   harder.</p>
 
 
-  <h5><dfn>The trailing end phase</dfn></h5>
+  <h5>The <dfn title="insertion mode: after after body">after after body</dfn> insertion mode</h5>
 
-  <p>After <span>the main phase</span>, as each token is emitted from
-  the <span>tokenisation</span> stage, it must be processed as
-  described in this section.</p>
+  <p>Handle the token as follows:</p>
 
   <dl class="switch">
 
-   <dt>A DOCTYPE token</dt>
+   <dt>An end-of-file token</dt>
    <dd>
-    <p><span>Parse error</span>. Ignore the token.</p>
+    <p><span>Stop parsing</span>.</p>
    </dd>
 
    <dt>A comment token</dt>
@@ -41098,32 +41093,62 @@
     data given in the comment token.</p>
    </dd>
 
+   <dt>A DOCTYPE token</dt>
    <dt>A character token that is one of one of U+0009 CHARACTER
    TABULATION, U+000A LINE FEED (LF), U+000B LINE TABULATION, U+000C
    FORM FEED (FF), <!--U+000D CARRIAGE RETURN (CR),--> or U+0020
    SPACE</dt>
+   <dt>A start tag whose tag name is "html"</dt>
    <dd>
-    <p>Process the token as it would be processed in <span>the main
-    phase</span>.</p> <!-- if there was a <body>, the space will go
-    into it, otherwise (e.g. if there was a <frameset>) it'll go into
-    the <html> node (this is important in case we have "foo</html>
-    bar", as we don't want that to become one word) -->
+    <p>Process the token as if the <span>insertion mode</span> had
+    been "<span title="insertion mode: in body">in body</span>".</p>
    </dd>
 
-   <dt>A character token that is <em>not</em> one of U+0009 CHARACTER
-   TABULATION, U+000A LINE FEED (LF), U+000B LINE TABULATION, U+000C
-   FORM FEED (FF), <!--U+000D CARRIAGE RETURN (CR),--> or U+0020 SPACE</dt>
-   <dt>A start tag token</dt>
-   <dt>An end tag token</dt>
+   <dt>Anything else</dt>
    <dd>
-    <p><span>Parse error</span>. Switch back to <span>the main
-    phase</span> and reprocess the token.</p>
+    <p><span>Parse error</span>. Set the <span>insertion mode</span>
+    to "<span title="insertion mode: in body">in body</span>" and
+    reprocess the token.</p>
    </dd>
 
+  </dl>
+
+
+  <h5>The <dfn title="insertion mode: after after frameset">after after frameset</dfn> insertion mode</h5>
+
+  <p>Handle the token as follows:</p>
+
+  <dl class="switch">
+
    <dt>An end-of-file token</dt>
    <dd>
     <p><span>Stop parsing</span>.</p>
    </dd>
+
+   <dt>A comment token</dt>
+   <dd>
+    <p>Append a <code>Comment</code> node to the <code>Document</code>
+    object with the <code title="">data</code> attribute set to the
+    data given in the comment token.</p>
+   </dd>
+
+   <dt>A DOCTYPE token</dt>
+   <dt>A character token that is one of one of U+0009 CHARACTER
+   TABULATION, U+000A LINE FEED (LF), U+000B LINE TABULATION, U+000C
+   FORM FEED (FF), <!--U+000D CARRIAGE RETURN (CR),--> or U+0020
+   SPACE</dt>
+   <dt>A start tag whose tag name is "html"</dt>
+   <dd>
+    <p>Process the token as if the <span>insertion mode</span> had
+    been "<span title="insertion mode: in body">in body</span>".</p>
+   </dd>
+
+   <dt>Anything else</dt>
+   <dd>
+    <p><span>Parse error</span>. Set the <span>insertion mode</span>
+    to "<span title="insertion mode: in frameset">in frameset</span>" and
+    reprocess the token.</p>
+   </dd>
 
   </dl>
 
@@ -41590,13 +41615,6 @@
 
    <li>
 
-    <p>Switch the <span>HTML parser</span>'s <span>tree
-    construction</span> stage to <span>the main phase</span>.
-
-   </li>
-
-   <li>
-
     <p>Let <var title="">root</var> be a new <code>html</code>
     element with no attributes.</p>
 
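
For review purposes, the merged model in the patch above can be sketched as a single dispatch on the insertion mode. This is only an illustration in Python, not spec text: the names Mode, Token, and TreeBuilder are hypothetical, only four of the modes are modelled, and the "in body" and "in frameset" handlers are stubs that record what they were given. The shared handler follows the two new "after after" sections as written in the patch.

```python
from dataclasses import dataclass, field
from enum import Enum


class Mode(Enum):
    # Only the modes needed for this sketch; the patch lists many more.
    IN_BODY = "in body"
    IN_FRAMESET = "in frameset"
    AFTER_AFTER_BODY = "after after body"
    AFTER_AFTER_FRAMESET = "after after frameset"


# U+0009, U+000A, U+000B, U+000C and U+0020, as listed in the patch.
WHITESPACE = {"\t", "\n", "\x0b", "\x0c", " "}


@dataclass
class Token:
    kind: str       # "doctype", "comment", "character", "start", "end", "eof"
    data: str = ""  # comment text, the character, or the tag name


@dataclass
class TreeBuilder:
    mode: Mode
    log: list = field(default_factory=list)  # hypothetical trace, for inspection
    stopped: bool = False

    def process(self, token: Token) -> None:
        """Handle the token according to the current insertion mode."""
        if self.mode is Mode.IN_BODY:
            self._in_body(token)
        elif self.mode is Mode.IN_FRAMESET:
            self._in_frameset(token)
        elif self.mode is Mode.AFTER_AFTER_BODY:
            self._after_after(token, fallback=Mode.IN_BODY)
        else:
            self._after_after(token, fallback=Mode.IN_FRAMESET)

    def _in_body(self, token: Token) -> None:
        # Stub: the real "in body" rules are far larger.
        self.log.append(("in body", token.kind, token.data))

    def _in_frameset(self, token: Token) -> None:
        # Stub: stands in for the real "in frameset" rules.
        self.log.append(("in frameset", token.kind, token.data))

    def _after_after(self, token: Token, fallback: Mode) -> None:
        if token.kind == "eof":
            self.stopped = True  # "Stop parsing"
        elif token.kind == "comment":
            # Stands in for appending a Comment node to the Document.
            self.log.append(("document comment", token.kind, token.data))
        elif (token.kind == "doctype"
              or (token.kind == "character" and token.data in WHITESPACE)
              or (token.kind == "start" and token.data == "html")):
            # Processed as if the mode were "in body"; the mode is unchanged.
            self._in_body(token)
        else:
            # Parse error: switch to the fallback mode and reprocess.
            self.log.append(("parse error", token.kind, token.data))
            self.mode = fallback
            self.process(token)
```

Feeding a whitespace character to a builder in "after after body" leaves the mode alone, while a stray start tag such as "p" records a parse error, switches the mode to "in body", and is then handled there. No state is reset on the switch, which is the property the old phase wording had to spell out separately.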