Bugzilla – Bug 160
MathML and SVG support in text/html: the parsing infrastructure
Last modified: 2008-06-25 13:46:10 CEST
Index: source =================================================================== --- source (revision 1403) +++ source (revision 1404) @@ -37035,9 +37035,10 @@ title="insertion mode: in cell">in cell</span>", "<span title="insertion mode: in select">in select</span>", "<span title="insertion mode: in select in table">in select in - table</span>", "<span title="insertion mode: after body">after - body</span>", "<span title="insertion mode: in frameset">in - frameset</span>", "<span title="insertion mode: after + table</span>", "<span title="insertion mode: in foreign content">in + foreign content</span>", "<span title="insertion mode: after + body">after body</span>", "<span title="insertion mode: in + frameset">in frameset</span>", "<span title="insertion mode: after frameset">after frameset</span>", "<span title="insertion mode: after after body">after after body</span>", and "<span title="insertion mode: after after frameset">after after @@ -37061,6 +37062,13 @@ mode</span> unchanged (unless the rules in that section themselves switch the <span>insertion mode</span>).</p> + <p>When the insertion mode is switched to "<span title="insertion + mode: in foreign content">in foreign content</span>", the + <dfn>secondary insertion mode</dfn> is also set. This secondary mode + is used within the rules for the "<span title="insertion mode: in + foreign content">in foreign content</span>" mode to handle HTML + (i.e. not foreign) content.</p> + <p>When the steps below require the UA to <dfn>reset the insertion mode appropriately</dfn>, it means the UA must follow these @@ -37079,7 +37087,12 @@ fragment parsing algorithm</span> is neither a <code>td</code> element nor a <code>th</code> element, then set <var title="">node</var> to the <var title="">context</var> - element. (<span>fragment case</span>)</li> + element. 
(<span>fragment case</span>)</li> <!-- XXX this fails to + actually set "node" to something useful in the <td> case, which we + want (either body (or nothing, to hit the last clause in this list) + or td/th; in the former case, remove the redundant 'fragment case' + bits in the 'in cell' section, in the latter case, check that + td.innerHTML = "<caption>" works as expected by browsers) --> <li>If <var title="">node</var> is a <code>select</code> element, then switch the <span>insertion mode</span> to "<span @@ -37115,6 +37128,12 @@ title="insertion mode: in table">in table</span>" and abort these steps.</li> + <li>If <var title="">node</var> is an element from the <span>MathML + namespace</span> or the <span>SVG namespace</span>, then switch the + <span>insertion mode</span> to "<span title="insertion mode: in + foreign content">in foreign content</span>" and abort these + steps.</li> + <li>If <var title="">node</var> is a <code>head</code> element, then switch the <span>insertion mode</span> to "<span title="insertion mode: in body">in body</span>" ("<span @@ -37450,7 +37469,8 @@ state. In the RCDATA and CDATA states, a further <dfn>escape flag</dfn> is used to control the behaviour of the tokeniser. It is either true or false, and initially must be set to the false - state.</p> + state. The <span>insertion mode</span> and the <span>stack of open + elements</span> also affect tokenisation.</p> <p>The output of the tokenisation step is a series of zero or more of the following tokens: DOCTYPE, start tag, end tag, comment, @@ -37460,8 +37480,11 @@ identifier, and system identifier must be marked as missing (which is a distinct state from the empty string), and the <i>force-quirks flag</i> must be set to <i>off</i> (its other state is - <i>on</i>). Start and end tag tokens have a tag name and a list of - attributes, each of which has a name and a value. Comment and + <i>on</i>). 
Start and end tag tokens have a tag name, a + <i>self-closing flag</i>, and a list of attributes, each of which + has a name and a value. When a start or end tag token is created, its + <i>self-closing flag</i> must be unset (its other state is that it + be set), and its attributes list must be empty. Comment and + character tokens have data.</p> <p>When a token is emitted, it must immediately be handled by the @@ -37472,17 +37495,19 @@ using the <span>dynamic markup insertion</span> APIs to insert characters into the stream being tokenised.)</p> + <p>When a start tag token is emitted with its <i>self-closing + flag</i> set, if the flag is not <dfn title="acknowledge + self-closing flag">acknowledged</dfn> when it is processed by the + tree construction stage, that is a <span>parse error</span>.</p> + <p>When an end tag token is emitted, the <span>content model flag</span> must be switched to the PCDATA state.</p> <p>When an end tag token is emitted with attributes, that is a <span>parse error</span>.</p> - <p>A <dfn>permitted slash</dfn> is a U+002F SOLIDUS character that - is immediately followed by a U+003E GREATER-THAN SIGN, if, and only - if, the current token being processed is a start tag token whose tag - name is the same as the tag name of one of <span>void - elements</span>.</p> + <p>When an end tag token is emitted with its <i>self-closing + flag</i> set, that is a <span>parse error</span>.</p> <p>Before each step of the tokeniser, the user agent may check to see if either one of the scripts in the <span>list of scripts that @@ -37756,9 +37781,7 @@ state</span>.</dd> <dt>U+002F SOLIDUS (/)</dt> - <dd><span>Parse error</span> unless this is a <span>permitted - slash</span>. 
Switch to the <span>before attribute name - state</span>.</dd> + <dd>Switch to the <span>self-closing start tag state</span>.</dd> <dt>Anything else</dt> <dd>Append the current input character to the current tag token's @@ -37796,9 +37819,7 @@ state</span>.</dd> <dt>U+002F SOLIDUS (/)</dt> - <dd><span>Parse error</span> unless this is a <span>permitted - slash</span>. Stay in the <span>before attribute name - state</span>.</dd> + <dd>Switch to the <span>self-closing start tag state</span>.</dd> <dt>U+0022 QUOTATION MARK (")</dt> <dt>U+0027 APOSTROPHE (')</dt> @@ -37851,9 +37872,7 @@ state</span>.</dd> <dt>U+002F SOLIDUS (/)</dt> - <dd><span>Parse error</span> unless this is a <span>permitted - slash</span>. Switch to the <span>before attribute name - state</span>.</dd> + <dd>Switch to the <span>self-closing start tag state</span>.</dd> <dt>U+0022 QUOTATION MARK (")</dt> <dt>U+0027 APOSTROPHE (')</dt> @@ -37912,9 +37931,7 @@ state</span>.</dd> <dt>U+002F SOLIDUS (/)</dt> - <dd><span>Parse error</span> unless this is a <span>permitted - slash</span>. Switch to the <span>before attribute name - state</span>.</dd> + <dd>Switch to the <span>self-closing start tag state</span>.</dd> <dt>EOF</dt> <dd><span>Parse error</span>. Emit the current tag @@ -38123,10 +38140,29 @@ state</span>.</dd> <dt>U+002F SOLIDUS (/)</dt> - <dd><span>Parse error</span> unless this is a <span>permitted - slash</span>. Switch to the <span>before attribute name - state</span>.</dd> + <dd>Switch to the <span>self-closing start tag state</span>.</dd> + + <dt>Anything else</dt> + <dd><span>Parse error</span>. Reconsume the character in + the <span>before attribute name state</span>.</dd> + + </dl> + + </dd> + + <dt><dfn>Self-closing start tag state</dfn></dt> + <dd> + + <p>Consume the <span>next input character</span>:</p> + + <dl class="switch"> + + <dt>U+003E GREATER-THAN SIGN (>)</dt> + <dd>Set the <i>self-closing flag</i> of the current tag + token. Emit the current tag token. 
Switch to the <span>data + state</span>.</dd> + <dt>Anything else</dt> <dd><span>Parse error</span>. Reconsume the character in the <span>before attribute name state</span>.</dd> @@ -38171,11 +38207,35 @@ whose data is the empty string, and switch to the <span>comment start state</span>.</p> - <p>Otherwise if the next seven characters are a + <p>Otherwise, if the next seven characters are a <span>case-insensitive</span><!-- XXX xref, ascii only --> match for the word "DOCTYPE", then consume those characters and switch to the <span>DOCTYPE state</span>.</p> + <p>Otherwise, if the <span>insertion mode</span> is "<span + title="insertion mode: in foreign content">in foreign + content</span>" and the <span>current node</span> is not one of + the following:</p> + + <ul> + <li>An <code>mi</code> element in the <span>MathML namespace</span>.</li> + <li>An <code>mo</code> element in the <span>MathML namespace</span>.</li> + <li>An <code>mn</code> element in the <span>MathML namespace</span>.</li> + <li>An <code>ms</code> element in the <span>MathML namespace</span>.</li> + <li>An <code>mtext</code> element in the <span>MathML namespace</span>.</li> + <li>A <code>foreignObject</code> element in the <span>SVG namespace</span>.</li> + <li>A <code>desc</code> element in the <span>SVG namespace</span>.</li> + <li>A <code>title</code> element in the <span>SVG namespace</span>.</li> + </ul> + + <p>...and the next seven characters are a + <span>case-sensitive</span><!-- XXX xref, ascii only --> match for + the string "[CDATA[" (the five uppercase letters "CDATA" with a + U+005B LEFT SQUARE BRACKET character before and after), then + consume those characters and switch to the <span>CDATA block + state</span> (which is unrelated to the <span>content model + flag</span>'s CDATA state).</p> + <p>Otherwise, this is a <span>parse error</span>. Switch to the <span>bogus comment state</span>. 
The next character that is consumed, if any, is the first character that will be in the @@ -38778,6 +38838,29 @@ </dd> + <dt><dfn>CDATA block state</dfn></dt> + + <dd> + + <p><em>(This can only happen if the <span>content model + flag</span> is set to the PCDATA state, and is unrelated to the + <span>content model flag</span>'s CDATA state.)</em></p> + + <p>Consume every character up to the next occurrence of the three + character sequence U+005D RIGHT SQUARE BRACKET U+005D RIGHT SQUARE + BRACKET U+003E GREATER-THAN SIGN (<code title="">]]></code>), or + the end of the file (EOF), whichever comes first. Emit a series of + character tokens consisting of all the characters consumed except the + matching three character sequence at the end (if one was found + before the end of the file).</p> + + <p>Switch to the <span>data state</span>.</p> + + <p>If the end of the file was reached, reconsume the EOF + character.</p> + + </dd> + </dl> @@ -39020,33 +39103,46 @@ depths.</p> - <h5>Creating and inserting HTML elements</h5> + <h5>Creating and inserting elements</h5> <p>When the steps below require the UA to <dfn title="create an - element for the token">create an element for a token</dfn>, the UA - must create a node implementing the interface appropriate for the - element type corresponding to the tag name of the token (as given in - the section of this specification that defines that element, - e.g. for an <code>a</code> element it would be the - <code>HTMLAnchorElement</code> interface), with the tag name being - the name of that element, with the node being in the <span>HTML - namespace</span>, and with the attributes on the node being those + element for the token">create an element for a token</dfn> in a + particular namespace, the UA must create a node implementing the + interface appropriate for the element type corresponding to the tag + name of the token in the given namespace (as given in the + specification that defines that element, e.g. 
for an <code>a</code> + element in the <span>HTML namespace</span>, this specification + defines it to be the <code>HTMLAnchorElement</code> interface), with + the tag name being the name of that element, with the node being in + the given namespace, and with the attributes on the node being those given in the given token.</p> + <p>The interface appropriate for an element in the <span>HTML + namespace</span> that is not defined in this specification is + <code>HTMLElement</code>. The interface appropriate for an element + in another namespace that is not defined by that namespace's + specification is <code>Element</code>.</p> + <p>When the steps below require the UA to <dfn>insert an HTML element</dfn> for a token, the UA must first <span>create an element - for the token</span>, and then append this node to the <span>current - node</span>, and push it onto the <span>stack of open - elements</span> so that it is the new <span>current node</span>.</p> + for the token</span> in the <span>HTML namespace</span>, and then + append this node to the <span>current node</span>, and push it onto + the <span>stack of open elements</span> so that it is the new + <span>current node</span>.</p> <p>The steps below may also require that the UA insert an HTML - element in a particular place, in which case the UA must - <span>create an element for the token</span> and then insert or - append the new node in the location specified. (This happens in - particular during the parsing of tables with invalid content.)</p> + element in a particular place, in which case the UA must follow the + same steps except that it must insert or append the new node in the + location specified instead of appending it to the <span>current + node</span>. 
(This happens in particular during the parsing of + tables with invalid content.)</p> - <p>The interface appropriate for an element that is not defined in - this specification is <code>HTMLElement</code>.</p> + <p>When the steps below require the UA to <dfn>insert a foreign + element</dfn> for a token, the UA must first <span>create an element + for the token</span> in the given namespace, and then append this + node to the <span>current node</span>, and push it onto the + <span>stack of open elements</span> so that it is the new + <span>current node</span>.</p> <p>The <dfn>generic CDATA parsing algorithm</dfn> and the <dfn>generic RCDATA parsing algorithm</dfn> consist of the following @@ -39301,7 +39397,8 @@ <dt>A start tag whose tag name is "html"</dt> <dd> - <p><span>Create an element for the token</span>. Append it to the + <p><span>Create an element for the token</span>, using the + <span>HTML namespace</span>. Append it to the <code>Document</code> object. Put this element in the <span>stack of open elements</span>.</p> @@ -39385,13 +39482,10 @@ <dt>A start tag whose tag name is "head"</dt> <dd> - <p><span>Create an element for the token</span>.</p> - - <p>Set the <span><code title="">head</code> element - pointer</span> to this new element node.</p> + <p><span>Insert an HTML element</span> for the token.</p> - <p>Append the new element to the <span>current node</span> and - push it onto the <span>stack of open elements</span>.</p> + <p>Set the <span><code title="">head</code> element pointer</span> + to the newly created <code>head</code> element.</p> <p>Switch the <span>insertion mode</span> to "<span title="insertion mode: in head">in head</span>".</p> @@ -39470,17 +39564,26 @@ <dt>A start tag whose tag name is one of: "base", "link"</dt> <dd> + <p><span>Insert an HTML element</span> for the token. 
Immediately pop the <span>current node</span> off the <span>stack of open elements</span>.</p> + + <p><span title="acknowledge self-closing flag">Acknowledge the + token's <i>self-closing flag</i></span>, if it is set.</p> + </dd> <dt>A start tag whose tag name is "meta"</dt> <dd> + <p><span>Insert an HTML element</span> for the token. Immediately pop the <span>current node</span> off the <span>stack of open elements</span>.</p> + <p><span title="acknowledge self-closing flag">Acknowledge the + token's <i>self-closing flag</i></span>, if it is set.</p> + <p id="meta-charset-during-parse">If the element has a <code title="attr-meta-charset">charset</code> attribute, and its value is a supported encoding, and the <span @@ -39526,7 +39629,8 @@ <dt id="scriptTag">A start tag whose tag name is "script"</dt> <dd> - <p><span>Create an element for the token</span>.</p> + <p><span>Create an element for the token</span> in the <span>HTML + namespace</span>.</p> <p>Mark the element as being <span>"parser-inserted"</span>. This ensures that, if the @@ -40638,6 +40742,9 @@ pop the <span>current node</span> off the <span>stack of open elements</span>.</p> + <p><span title="acknowledge self-closing flag">Acknowledge the + token's <i>self-closing flag</i></span>, if it is set.</p> + </dd> <dt>A start tag whose tag name is "hr"</dt> @@ -40652,6 +40759,9 @@ pop the <span>current node</span> off the <span>stack of open elements</span>.</p> + <p><span title="acknowledge self-closing flag">Acknowledge the + token's <i>self-closing flag</i></span>, if it is set.</p> + </dd> <dt>A start tag whose tag name is "image"</dt> @@ -40668,17 +40778,19 @@ <p><span>Reconstruct the active formatting elements</span>, if any.</p> - <p><span>Insert an HTML element</span> for the token.</p> + <p><span>Insert an HTML element</span> for the token. 
Immediately + pop the <span>current node</span> off the <span>stack of open + elements</span>.</p> - <p>If the <span><code title="">form</code> element - pointer</span> is not null, then <span>associate</span><!--XXX - xref! --> the <code>input</code> element with the + <p><span title="acknowledge self-closing flag">Acknowledge the + token's <i>self-closing flag</i></span>, if it is set.</p> + + <p>If the <span><code title="">form</code> element pointer</span> + is not null, then <span>associate</span><!--XXX xref! --> the + newly created <code>input</code> element with the <code>form</code> element pointed to by the <span><code title="">form</code> element pointer</span>.</p> - <p>Pop that <code>input</code> element off the <span>stack of - open elements</span>.</p> - </dd> <dt id="isindex">A start tag whose tag name is "isindex"</dt> @@ -40759,17 +40871,16 @@ <dt>A start tag whose tag name is "textarea"</dt> <dd> - <p><span>Create an element for the token</span>.</p> + <p><span>Create an element for the token</span>, in the + <span>HTML namespace</span>. Append the new element to the + <span>current node</span>.</p> - <p>If the <span><code title="">form</code> element - pointer</span> is not null, then <span>associate</span><!--XXX - xref! --> the <code>textarea</code> element with the + <p>If the <span><code title="">form</code> element pointer</span> + is not null, then <span>associate</span><!--XXX xref! --> the + newly created <code>textarea</code> element with the <code>form</code> element pointed to by the <span><code title="">form</code> element pointer</span>.</p> - <p>Append the new element to the <span>current - node</span>.</p> - <p>Switch the tokeniser's <span>content model flag</span> to the RCDATA state.</p> @@ -40838,6 +40949,48 @@ the tag name "br" had been seen. 
Ignore the end tag token.</p> </dd> + <dt>A start tag whose tag name is "math"</dt> + <dd> + + <p><span>Reconstruct the active formatting elements</span>, if + any.</p> + + <p><span>Insert a foreign element</span> for the token, in the + <span>MathML namespace</span>.</p> + + <p>If the token has its <i>self-closing flag</i> set, pop the + <span>current node</span> off the <span>stack of open + elements</span> and <span title="acknowledge self-closing + flag">acknowledge the token's <i>self-closing flag</i></span>.</p> + + <p>Otherwise, let the <span>secondary insertion mode</span> be the + current <span>insertion mode</span>, and then switch the + <span>insertion mode</span> to "<span title="insertion mode: in + foreign content">in foreign content</span>".</p> + + </dd> + + <dt>A start tag whose tag name is "svg"</dt> + <dd> + + <p><span>Reconstruct the active formatting elements</span>, if + any.</p> + + <p><span>Insert a foreign element</span> for the token, in the + <span>SVG namespace</span>.</p> + + <p>If the token has its <i>self-closing flag</i> set, pop the + <span>current node</span> off the <span>stack of open + elements</span> and <span title="acknowledge self-closing + flag">acknowledge the token's <i>self-closing flag</i></span>.</p> + + <p>Otherwise, let the <span>secondary insertion mode</span> be the + current <span>insertion mode</span>, and then switch the + <span>insertion mode</span> to "<span title="insertion mode: in + foreign content">in foreign content</span>".</p> + + </dd> + <dt>A start or end tag whose tag name is one of: "caption", "col", "colgroup", "frame", "frameset", "head", "option", "optgroup", "tbody", "td", "tfoot", "th", "thead", "tr"</dt> @@ -41267,9 +41420,14 @@ <dt>A start tag whose tag name is "col"</dt> <dd> + <p><span>Insert an HTML element</span> for the token. 
Immediately pop the <span>current node</span> off the <span>stack of open elements</span>.</p> + + <p><span title="acknowledge self-closing flag">Acknowledge the + token's <i>self-closing flag</i></span>, if it is set.</p> + </dd> <dt>An end tag whose tag name is "colgroup"</dt> @@ -41760,7 +41918,7 @@ </dl> - <h5 id="parsing-main-inselect">The "<dfn title="insertion mode: in select in table">in select in table</dfn>" insertion mode</h5> + <h5 id="parsing-main-inselectintable">The "<dfn title="insertion mode: in select in table">in select in table</dfn>" insertion mode</h5> <p>When the <span>insertion mode</span> is "<span title="insertion mode: in select in table">in select in table</span>", tokens must be handled as follows:</p> @@ -41798,6 +41956,89 @@ </dl> + <h5 id="parsing-main-inforeign">The "<dfn title="insertion mode: in foreign content">in foreign content</dfn>" insertion mode</h5> + + <p>When the <span>insertion mode</span> is "<span title="insertion + mode: in foreign content">in foreign content</span>", tokens must be + handled as follows:</p> + + <dl class="switch"> + + <dt>A character token</dt> + <dd> + <p><span title="insert a character">Insert the token's + character</span> into the <span>current node</span>.</p> + </dd> + + <dt>A comment token</dt> + <dd> + <p>Append a <code>Comment</code> node to the <span>current + node</span> with the <code title="">data</code> attribute set to + the data given in the comment token.</p> + </dd> + + <dt>A DOCTYPE token</dt> + <dd> + <p><span>Parse error</span>. 
Ignore the token.</p> + </dd> + + <dt>A start tag, if the <span>current node</span> is an <code>mi</code> element in the <span>MathML namespace</span>.</dt> + <dt>A start tag, if the <span>current node</span> is an <code>mo</code> element in the <span>MathML namespace</span>.</dt> + <dt>A start tag, if the <span>current node</span> is an <code>mn</code> element in the <span>MathML namespace</span>.</dt> + <dt>A start tag, if the <span>current node</span> is an <code>ms</code> element in the <span>MathML namespace</span>.</dt> + <dt>A start tag, if the <span>current node</span> is an <code>mtext</code> element in the <span>MathML namespace</span>.</dt> + <dt>A start tag, if the <span>current node</span> is a <code>foreignObject</code> element in the <span>SVG namespace</span>.</dt> + <dt>A start tag, if the <span>current node</span> is a <code>desc</code> element in the <span>SVG namespace</span>.</dt> + <dt>A start tag, if the <span>current node</span> is a <code>title</code> element in the <span>SVG namespace</span>.</dt> + <dt>A start tag, if the <span>current node</span> is an element in the <span>HTML namespace</span>.</dt> + <dt>A start tag whose tag name is "svg", if the <span>current node</span> is an <code>annotation-xml</code> element in the <span>MathML namespace</span>.</dt> + <dt>An end tag</dt> + <dd> + + <p>Process the token <span>using the rules for</span> the + <span>secondary insertion mode</span>.</p> + + <p>If, after doing so, the <span>insertion mode</span> is still + "<span title="insertion mode: in foreign content">in foreign + content</span>", but there is no element in scope that has a + namespace other than the <span>HTML namespace</span>, switch the + <span>insertion mode</span> to the <span>secondary insertion + mode</span>.</p> + + </dd> + + <dt>A start tag whose tag name is one of: <span title="big-issue">the HTML element tag names</span></dt> + <dd> + + <p><span>Parse error</span>.</p> + + <p>Pop elements from the <span>stack of open 
elements</span> until + the <span>current node</span> is in the <span>HTML + namespace</span>.</p> + + <p>Switch the <span>insertion mode</span> to the <span>secondary + insertion mode</span>, and reprocess the token.</p> + + </dd> + + <dt>Any other start tag</dt> + <dd> + + <p class="big-issue">Apply case fixups, attribute namespace fixups.</p> + + <p><span>Insert a foreign element</span> for the token, in the + same namespace as the <span>current node</span>.</p> + + <p>If the token has its <i>self-closing flag</i> set, pop the + <span>current node</span> off the <span>stack of open + elements</span> and <span title="acknowledge self-closing + flag">acknowledge the token's <i>self-closing flag</i></span>.</p> + + </dd> + + </dl> + + <h5 id="parsing-main-afterbody">The "<dfn title="insertion mode: after body">after body</dfn>" insertion mode</h5> <p>When the <span>insertion mode</span> is "<span title="insertion @@ -41929,9 +42170,14 @@ <dt>A start tag whose tag name is "frame"</dt> <dd> + <p><span>Insert an HTML element</span> for the token. Immediately pop the <span>current node</span> off the <span>stack of open elements</span>.</p> + + <p><span title="acknowledge self-closing flag">Acknowledge the + token's <i>self-closing flag</i></span>, if it is set.</p> + </dd> <dt>A start tag whose tag name is "noframes"</dt>
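The patch adds a new self-closing start tag state. Here it is sketched in Python as a single step function.

- The `StartTag` class, the string state names, the error text, and the use of `None` for EOF are assumptions of this sketch. They are not names from the spec.
- EOF is folded into "Anything else", because the switch in the patch lists only those two entries.

```python
from dataclasses import dataclass, field


@dataclass
class StartTag:
    name: str
    attributes: list = field(default_factory=list)
    self_closing: bool = False


def self_closing_start_tag_state(token, text, pos, errors):
    """Run one step of the self-closing start tag state.

    `pos` indexes the character just after the U+002F SOLIDUS. Returns a tuple
    (emitted_token_or_None, next_state, next_pos).
    """
    char = text[pos] if pos < len(text) else None  # None models EOF here
    if char == ">":
        # U+003E: set the flag, emit the tag, switch to the data state.
        token.self_closing = True
        return token, "data", pos + 1
    # Anything else: parse error; reconsume the character (pos unchanged)
    # in the before attribute name state.
    errors.append("unexpected character after '/' in tag")
    return None, "before attribute name", pos
```

The tokeniser no longer decides whether a slash is "permitted". It only records the flag, and tree construction decides whether a trailing slash was acceptable by acknowledging the flag or not.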
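The acknowledgement rule replaces the old "permitted slash" check. Any start tag whose self-closing flag is still unacknowledged after tree construction has handled it is a parse error.

This is a minimal sketch, assuming a simplified tree builder.

- `ACKNOWLEDGING_TAGS` is a hypothetical subset: only tag names the patch visibly amends. One amended hunk does not show its tag names in this excerpt.
- The error text is made up.

```python
from dataclasses import dataclass


@dataclass
class StartTag:
    name: str
    self_closing: bool = False
    acknowledged: bool = False


# Hypothetical subset: start tags whose rules this patch visibly amends with
# "acknowledge the token's self-closing flag, if it is set".
ACKNOWLEDGING_TAGS = {"base", "link", "meta", "hr", "input", "col", "frame"}


def process_start_tag(token, errors):
    """Stand-in for the tree construction stage handling one start tag."""
    if token.name in ACKNOWLEDGING_TAGS and token.self_closing:
        token.acknowledged = True  # the element is inserted and popped at once
    # Once processing is done, an unacknowledged flag is a parse error.
    if token.self_closing and not token.acknowledged:
        errors.append(f"unacknowledged self-closing flag on <{token.name}>")
```

With this split, `<meta/>` is silent, while `<div/>` reports an error. The `div` is still opened as if no slash had been written.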
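The new branch of the markup declaration open state recognises "[CDATA[" only in the "in foreign content" insertion mode. It can be expressed as a predicate that is checked after the "--" and "DOCTYPE" branches.

- The function name and the `(namespace, local_name)` tuple used for the current node are assumptions of this sketch.
- The namespace URIs are the standard MathML and SVG ones.

```python
MATHML = "http://www.w3.org/1998/Math/MathML"
SVG = "http://www.w3.org/2000/svg"

# The eight elements listed in the patch: when one of them is the current
# node, "<![CDATA[" is not treated specially.
CDATA_EXCLUDED = {
    (MATHML, "mi"), (MATHML, "mo"), (MATHML, "mn"), (MATHML, "ms"),
    (MATHML, "mtext"), (SVG, "foreignObject"), (SVG, "desc"), (SVG, "title"),
}


def starts_cdata_block(insertion_mode, current_node, text, pos):
    """Return True if the markup declaration open state should switch to the
    CDATA block state. `pos` indexes the character just after "<!", and
    `current_node` is a (namespace, local_name) tuple."""
    return (
        insertion_mode == "in foreign content"
        and current_node not in CDATA_EXCLUDED
        and text.startswith("[CDATA[", pos)  # exact, case-sensitive match
    )
```

Because the match is case-sensitive, `<![cdata[` still falls through to the bogus comment state. So does `<![CDATA[` when the current node is one of the eight listed elements.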
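Once entered, the CDATA block state is a search for the first "]]>". The sketch below returns the consumed characters instead of emitting character tokens one by one. The function name and return shape are assumptions.

```python
def cdata_block_state(text, pos):
    """Consume characters from `pos` up to "]]>" or the end of the input.

    Returns (characters, next_pos, reached_eof). Each character would be
    emitted as a character token. The tokeniser then switches to the data
    state and, if EOF was reached, reconsumes the EOF there.
    """
    end = text.find("]]>", pos)
    if end == -1:
        # No terminator: everything up to EOF is character data.
        return list(text[pos:]), len(text), True
    # Skip over the three-character terminator, which is not emitted.
    return list(text[pos:end]), end + 3, False
```

Characters such as `<` inside the block come out as plain text. This is the main reason to allow CDATA blocks inside SVG and MathML.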
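"Create an element for a token" now takes a namespace, and the patch adds a fallback rule for the element's interface. Both can be sketched as a registry lookup.

- The two-entry `KNOWN_INTERFACES` table, the `Element` record, and the function names are illustrative assumptions. A real user agent maps every element that each specification defines.
- A single `insert_element` helper covers both "insert an HTML element" and "insert a foreign element", since the two differ only in namespace.

```python
from dataclasses import dataclass, field

HTML = "http://www.w3.org/1999/xhtml"
MATHML = "http://www.w3.org/1998/Math/MathML"
SVG = "http://www.w3.org/2000/svg"

# Tiny illustrative registry of (namespace, tag name) -> interface name.
KNOWN_INTERFACES = {
    (HTML, "a"): "HTMLAnchorElement",
    (SVG, "svg"): "SVGSVGElement",
}


@dataclass
class Element:
    namespace: str
    local_name: str
    interface: str
    attributes: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def create_element_for_token(name, attributes, namespace):
    interface = KNOWN_INTERFACES.get((namespace, name))
    if interface is None:
        # Fallbacks from the patch: HTMLElement for undefined HTML elements,
        # Element for undefined elements in any other namespace.
        interface = "HTMLElement" if namespace == HTML else "Element"
    return Element(namespace, name, interface, dict(attributes))


def insert_element(open_elements, name, attributes, namespace):
    # Append the new node to the current node, then push it so that it
    # becomes the new current node.
    node = create_element_for_token(name, attributes, namespace)
    open_elements[-1].children.append(node)
    open_elements.append(node)
    return node
```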
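The new "in body" entries for "math" and "svg" decide between two paths:

- If the token is self-closing, the element is popped again at once.
- Otherwise, the current mode is saved as the secondary insertion mode and the parser enters "in foreign content".

This is a minimal sketch. The `TreeBuilder` and `Node` classes are hypothetical, and reconstructing the active formatting elements is left as a stub.

```python
from dataclasses import dataclass, field
from typing import Optional

HTML = "http://www.w3.org/1999/xhtml"
MATHML = "http://www.w3.org/1998/Math/MathML"
SVG = "http://www.w3.org/2000/svg"


@dataclass
class Node:
    namespace: str
    name: str
    children: list = field(default_factory=list)


@dataclass
class StartTag:
    name: str
    self_closing: bool = False
    acknowledged: bool = False


@dataclass
class TreeBuilder:
    open_elements: list
    insertion_mode: str = "in body"
    secondary_insertion_mode: Optional[str] = None

    def reconstruct_active_formatting_elements(self):
        pass  # not modelled in this sketch

    def insert_foreign_element(self, token, namespace):
        # Create the element in the given namespace, append it to the current
        # node, and push it so that it becomes the new current node.
        node = Node(namespace, token.name)
        self.open_elements[-1].children.append(node)
        self.open_elements.append(node)
        return node

    def in_body_math_or_svg(self, token):
        """'in body' handling of a start tag named "math" or "svg"."""
        self.reconstruct_active_formatting_elements()
        namespace = MATHML if token.name == "math" else SVG
        self.insert_foreign_element(token, namespace)
        if token.self_closing:
            # <svg/> or <math/>: the element is complete; stay in this mode.
            self.open_elements.pop()
            token.acknowledged = True
        else:
            self.secondary_insertion_mode = self.insertion_mode
            self.insertion_mode = "in foreign content"
```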
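The new "in foreign content" mode is mostly a dispatch on the token and the current node. The sketch below names the rule that applies and shows the "pop until HTML" step used when breaking out of foreign content.

- `SAMPLE_HTML_TAG_NAMES` is a hypothetical placeholder for the patch's still-unfinished "the HTML element tag names" list.
- The returned rule labels and the tuple representation of nodes are assumptions.
- The follow-up check after delegating to the secondary insertion mode (switching back once no non-HTML element is in scope) is not modelled.

```python
HTML = "http://www.w3.org/1999/xhtml"
MATHML = "http://www.w3.org/1998/Math/MathML"
SVG = "http://www.w3.org/2000/svg"

# Current nodes under which any start tag goes to the secondary insertion mode.
SECONDARY_MODE_PARENTS = {
    (MATHML, "mi"), (MATHML, "mo"), (MATHML, "mn"), (MATHML, "ms"),
    (MATHML, "mtext"), (SVG, "foreignObject"), (SVG, "desc"), (SVG, "title"),
}

# Hypothetical stand-in for the patch's placeholder "the HTML element tag names".
SAMPLE_HTML_TAG_NAMES = {"b", "body", "div", "p", "table", "ul"}


def foreign_content_rule(kind, tag_name, current_node):
    """Name the "in foreign content" rule that handles a token.

    kind is "character", "comment", "doctype", "start" or "end";
    current_node is a (namespace, local_name) tuple.
    """
    if kind == "character":
        return "insert character"
    if kind == "comment":
        return "append comment"
    if kind == "doctype":
        return "parse error, ignore"
    if kind == "end":
        return "use secondary mode"
    # Start tags, checked in the order the patch lists them.
    if (
        current_node in SECONDARY_MODE_PARENTS
        or current_node[0] == HTML
        or (tag_name == "svg" and current_node == (MATHML, "annotation-xml"))
    ):
        return "use secondary mode"
    if tag_name in SAMPLE_HTML_TAG_NAMES:
        return "parse error, pop to HTML, reprocess in secondary mode"
    return "insert foreign element in current node's namespace"


def pop_until_html(open_elements):
    """Pop until the current node is in the HTML namespace (assumes an HTML
    element such as html remains at the bottom of the stack)."""
    while open_elements[-1][0] != HTML:
        open_elements.pop()
```

Ordering matters here. A `<p>` directly under `<svg>` breaks out of foreign content, but the same tag under `<foreignObject>` is handed to the secondary insertion mode without a parse error.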