Bug 246 - Add support for the new HTML elements. Fix the handling of optional tags we added recently. Also: clarify some notes, remove redundant requirements, clean up some punctuation.
Status: RESOLVED FIXED
Product: Validator.nu
Classification: Unclassified
Component: HTML parser
Version: HEAD
Hardware: All All
Importance: P2 normal
Assigned To: Henri Sivonen
URL: http://svn.whatwg.org/webapps/source?...
Depends on:
Blocks:
Reported: 2008-06-10 13:10 CEST by Henri Sivonen
Modified: 2008-09-15 19:47 CEST
Description Henri Sivonen 2008-06-10 13:10:39 CEST
Index: source
===================================================================
--- source	(revision 1730)
+++ source	(revision 1731)
@@ -3553,7 +3553,7 @@
  Test: http://www.hixie.ch/tests/adhoc/html/flow/image-maps/004-demo.html
  IE6 on Wine treats the following characters like this also: U+1-U+1f,
  U+21-U+2b, U+2d-U+2f, U+3a, U+3c-U+40, U+5b-U+60, U+7b-U+82,
- U+84-U+89, U+8b, U+8d, U+8f-U`+99, U+9b, U+9d, U+a0-U+bf, U+d7, U+f7,
+ U+84-U+89, U+8b, U+8d, U+8f-U+99, U+9b, U+9d, U+a0-U+bf, U+d7, U+f7,
  U+1f6-U+1f9, U+218-U+24f, U+2a9-U+385, U+387, U+38b, U+38d, U+3a2,
  U+3cf, U+3d7-U+3d9, U+3db, U+3dd, U+3df, U+3e1, U+3f4-U+400, U+40d,
  U+450, U+45d, U+482-U+48f, U+4c5-U+4c6, U+4c9-U+4ca, U+4cd-U+4cf,
@@ -7204,17 +7204,6 @@
   (if any), and that element's child nodes. Otherwise, the specified
   styles must, if applied, be applied to the entire document.</p>
 
-  <p>If the <code title="attr-style-scoped">scoped</code> attribute is
-  not specified, the <code>style</code> element must be the child of a
-  <code>head</code> element or of a <code>noscript</code> element that
-  is a child of a <code>head</code> element.</p>
-
-  <p>If the <code title="attr-style-scoped">scoped</code> attribute
-  <em>is</em> specified, then the <code>style</code> element must be
-  the child of a <span>flow content</span> element, before any text
-  nodes other than <span>inter-element whitespace</span>, and before
-  any elements other than other <code>style</code> elements.</p>
-
   <p id="title-on-style">The <dfn
   title="attr-style-title"><code>title</code></dfn> attribute on
   <code>style</code> elements defines <span>alternative style sheet
@@ -39414,12 +39403,10 @@
   <span>space character</span> or a <span
   title="syntax-comments">comment</span>, except if the first thing
   inside the <code>body</code> element is a <code>script</code> or
-  <code>style</code> element<!-- and the node immediately preceding
-  the <code>body</code> element is a <code>head</code> element whose
-  end tag has been omitted (XXX this last bit is commented out for now
-  because we have the dubious rule in the parser that makes <style>
-  and <script> elements between </head> and <body> end up in the
-  <head> instead of the <body>)-->.</p>
+  <code>style</code> element. <!-- Note that even if the </head> end
+  tag is present, the parser makes <style> and <script> elements
+  between </head> and <body> end up in the <head> instead of implying
+  the <body> --></p>
 
   <!-- </body> -->
   <p>A <code>body</code> element's <span title="syntax-end-tag">end
@@ -40723,19 +40710,19 @@
    <code>fieldset</code>, <code>form</code>, <code>frame</code>,
    <code>frameset</code>, <code>h1</code>, <code>h2</code>,
    <code>h3</code>, <code>h4</code>, <code>h5</code>, <code>h6</code>,
-   <code>head</code>, <code>hr</code>, <code>iframe</code>,
-   <code>image</code><!-- XXX ? this isn't an element that can end up
-   on the stack-->, <code>img</code>, <code>input</code>,
-   <code>isindex</code>, <code>li</code>, <code>link</code>,
-   <code>listing</code>, <code>menu</code>, <code>meta</code>,
-   <code>noembed</code>, <code>noframes</code>, <code>noscript</code>,
-   <code>ol</code>, <code>optgroup</code>, <code>option</code>,
-   <code>p</code>, <code>param</code>, <code>plaintext</code>,
-   <code>pre</code>, <code>script</code>, <code>select</code>,
-   <code>spacer</code>, <code>style</code>, <code>tbody</code>,
-   <code>textarea</code>, <code>tfoot</code>, <code>thead</code>,
-   <code>title</code>, <code>tr</code>, <code>ul</code>, and
-   <code>wbr</code>.</p></dd>
+   <code>head</code>, <code>hr</code>, <code>iframe</code>, <!--
+   <code>image</code>, (commented out because this isn't an element
+   that can end up on the stack, so it doesn't matter) -->
+   <code>img</code>, <code>input</code>, <code>isindex</code>,
+   <code>li</code>, <code>link</code>, <code>listing</code>,
+   <code>menu</code>, <code>meta</code>, <code>noembed</code>,
+   <code>noframes</code>, <code>noscript</code>, <code>ol</code>,
+   <code>optgroup</code>, <code>option</code>, <code>p</code>,
+   <code>param</code>, <code>plaintext</code>, <code>pre</code>,
+   <code>script</code>, <code>select</code>, <code>spacer</code>,
+   <code>style</code>, <code>tbody</code>, <code>textarea</code>,
+   <code>tfoot</code>, <code>thead</code>, <code>title</code>,
+   <code>tr</code>, <code>ul</code>, and <code>wbr</code>.</p></dd>
 
    <dt><dfn>Scoping</dfn></dt>
    <dd><p>The following HTML elements introduce new <span title="has
@@ -42807,7 +42794,8 @@
   <p>When the steps below require the UA to <dfn>generate implied end
   tags</dfn>, then, while the <span>current node</span> is a
   <code>dd</code> element, a <code>dt</code> element, an
-  <code>li</code> element, a <code>p</code> element, an
+  <code>li</code> element, an <code>option</code> element, an
+  <code>optgroup</code> element, a <code>p</code> element, an
   <code>rp</code> element, or an <code>rt</code> element, the UA must
   pop the <span>current node</span> off the <span>stack of open
   elements</span>.</p>
@@ -43225,7 +43213,8 @@
     mode</span>.</p>
    </dd>
 
-   <dt>A start tag whose tag name is one of: "base", "link"</dt>
+   <dt>A start tag whose tag name is one of: "base", "command",
+   "event-source", "link"</dt>
    <dd>
 
     <p><span>Insert an HTML element</span> for the token. Immediately
@@ -43272,13 +43261,13 @@
     <p>Follow the <span>generic RCDATA parsing algorithm</span>.</p>
    </dd>
 
-   <dt>A start tag whose tag name is "noscript", if the <span>scripting flag</span> is enabled:</dt>
+   <dt>A start tag whose tag name is "noscript", if the <span>scripting flag</span> is enabled</dt>
    <dt>A start tag whose tag name is one of: "noframes", "style"</dt>
    <dd>
     <p>Follow the <span>generic CDATA parsing algorithm</span>.</p>
    </dd>
 
-   <dt>A start tag whose tag name is "noscript", if the <span>scripting flag</span> is disabled:</dt>
+   <dt>A start tag whose tag name is "noscript", if the <span>scripting flag</span> is disabled</dt>
    <dd>
 
     <p><span>Insert an HTML element</span> for the token.</p>
@@ -43656,8 +43645,9 @@
     add the attribute and its corresponding value to that element.</p>
    </dd>
 
-   <dt>A start tag token whose tag name is one of: "base", "link",
-   "meta", "noframes", "script", "style", "title"</dt>
+   <dt>A start tag token whose tag name is one of: "base", "command",
+   "event-source", "link", "meta", "noframes", "script", "style",
+   "title"</dt>
    <dd>
     <p>Process the token <span>using the rules for</span> the "<span
     title="insertion mode: in head">in head</span>" <span>insertion
@@ -43737,9 +43727,13 @@
 
    </dd>
 
-   <dt>A start tag whose tag name is one of: "address", "blockquote",
-   "center", "dir", "div", "dl", "fieldset", "h1", "h2", "h3", "h4",
-   "h5", "h6", "menu", "ol", "p", "ul"</dt>
+   <!-- start tags for non-phrasing flow content elements -->
+
+   <!-- the normal ones -->
+   <dt>A start tag whose tag name is one of: "address", "article",
+   "aside", "blockquote", "center", "datagrid", "dialog", "dir",
+   "div", "dl", "fieldset", "footer", "h1", "h2", "h3", "h4", "h5",
+   "h6", "header", "menu", "nav", "ol", "p", "section", "ul"</dt>
    <dd>
 
     <!-- As of May 2008 this doesn't match any browser exactly, but is
@@ -43769,6 +43763,7 @@
 
    </dd>
 
+   <!-- as normal, but drops leading newline -->
    <dt>A start tag whose tag name is one of: "pre", "listing"</dt>
    <dd>
 
@@ -43786,6 +43781,7 @@
 
    </dd>
 
+   <!-- as normal, but interacts with the form element pointer -->
    <dt>A start tag whose tag name is "form"</dt>
    <dd>
 
@@ -43806,14 +43802,10 @@
 
    </dd>
 
+   <!-- as normal, but imply </li> when there's another <li> open in weird cases -->
    <dt>A start tag whose tag name is "li"</dt>
    <dd>
 
-    <p>If the <span>stack of open elements</span> <span title="has
-    an element in scope">has a <code>p</code> element in
-    scope</span>, then act as if an end tag with the tag name
-    <code>p</code> had been seen.</p>
-
     <p>Run the following algorithm:</p>
 
     <ol>
@@ -43821,44 +43813,47 @@
      <li><p>Initialise <var title="">node</var> to be the <span>current
      node</span> (the bottommost node of the stack).</p></li>
 
-     <li><p>If <var title="">node</var> is an <code>li</code>
-     element, then pop all the nodes from the <span>current
-     node</span> up to <var title="">node</var>, including <var
-     title="">node</var>, then stop this algorithm. If more than
-     one node is popped, then this is a <span>parse
-     error</span>.</p></li>
+     <li><p>If <var title="">node</var> is an <code>li</code> element,
+     then act as if an end tag with the tag name <code>li</code> had
+     been seen, then jump to the last step.</p></li>
 
      <li><p>If <var title="">node</var> is not in the
      <span>formatting</span> category, and is not in the
      <span>phrasing</span> category, and is not an
-     <code>address</code> or <code>div</code> element, then stop
-     this algorithm.</p></li> <!-- an element <foo> is in this
-     list if the following markup:
+     <code>address</code> or <code>div</code> element, then jump to
+     the last step.</p></li> <!-- an element <foo> is in this list if
+     the following markup:
 
          <!DOCTYPE html><body><ol><li><foo><li>
 
-     ...results in the second <li> not being (in any way) a
-     descendant of the first <li>, or if <foo> is a formatting
-     element that gets reopened later. -->
+     ...results in the second <li> not being (in any way) a descendant
+     of the first <li>, or if <foo> is a formatting element that gets
+     reopened later. -->
+
+     <li><p>Otherwise, set <var title="">node</var> to the previous
+     entry in the <span>stack of open elements</span> and return to
+     step 2.</p></li>
 
-     <li><p>Otherwise, set <var title="">node</var> to the previous entry
-     in the <span>stack of open elements</span> and return to step
-     2.</p></li>
+     <li>
 
-    </ol>
+      <p>If the <span>stack of open elements</span> <span title="has
+      an element in scope">has a <code>p</code> element in
+      scope</span>, then act as if an end tag with the tag name
+      <code>p</code> had been seen.</p>
+
+      <p>Finally, <span>insert an HTML element</span> for the
+      token.</p>
+
+     </li>
 
-    <p>Finally, <span>insert an HTML element</span> for the token.</p>
+    </ol>
 
    </dd>
 
+   <!-- as normal, but imply </dt> or </dd> when there's another <dt> or <dd> open in weird cases  -->
    <dt>A start tag whose tag name is one of: "dd", "dt"</dt>
    <dd>
 
-    <p>If the <span>stack of open elements</span> <span title="has
-    an element in scope">has a <code>p</code> element in
-    scope</span>, then act as if an end tag with the tag name
-    <code>p</code> had been seen.</p>
-
     <p>Run the following algorithm:</p>
 
     <ol>
@@ -43867,35 +43862,44 @@
      node</span> (the bottommost node of the stack).</p></li>
 
      <li><p>If <var title="">node</var> is a <code>dd</code> or
-     <code>dt</code> element, then pop all the nodes from the
-     <span>current node</span> up to <var title="">node</var>,
-     including <var title="">node</var>, then stop this algorithm.
-     If more than one node is popped, then this is a <span>parse
-     error</span>.</p></li>
+     <code>dt</code> element, then act as if an end tag with the same
+     tag name as <var title="">node</var> had been seen, then jump to
+     the last step.</p></li>
 
      <li><p>If <var title="">node</var> is not in the
      <span>formatting</span> category, and is not in the
      <span>phrasing</span> category, and is not an
-     <code>address</code> or <code>div</code> element, then stop
-     this algorithm.</p></li> <!-- an element <foo> is in this
-     list if the following markup:
-
-         <!DOCTYPE html><body><ol><dt><foo><dt>
-
-     ...results in the second <li> not being (in any way) a
-     descendant of the first <li>, or if <foo> is a formatting
-     element that gets reopened later. -->
+     <code>address</code> or <code>div</code> element, then jump to
+     the last step.</p></li> <!-- an element <foo> is in this list if
+     the following markup:
+
+         <!DOCTYPE html><body><dl><dt><foo><dt>
+
+     ...results in the second <dt> not being (in any way) a descendant
+     of the first <dt>, or if <foo> is a formatting element that gets
+     reopened later. -->
+
+     <li><p>Otherwise, set <var title="">node</var> to the previous
+     entry in the <span>stack of open elements</span> and return to
+     step 2.</p></li>
 
-     <li><p>Otherwise, set <var title="">node</var> to the previous entry
-     in the <span>stack of open elements</span> and return to step
-     2.</p></li>
+     <li>
 
-    </ol>
+      <p>If the <span>stack of open elements</span> <span title="has
+      an element in scope">has a <code>p</code> element in
+      scope</span>, then act as if an end tag with the tag name
+      <code>p</code> had been seen.</p>
 
-    <p>Finally, <span>insert an HTML element</span> for the token.</p>
+      <p>Finally, <span>insert an HTML element</span> for the
+      token.</p>
+
+     </li>
+
+    </ol>
 
    </dd>
 
+   <!-- same as normal, but effectively ends parsing -->
    <dt>A start tag whose tag name is "plaintext"</dt>
    <dd>
 
@@ -43917,9 +43921,13 @@
 
    </dd>
 
-   <dt>An end tag whose tag name is one of: "address",
-   "blockquote", "center", "dir", "div", "dl", "fieldset",
-   "listing", "menu", "ol", "pre", "ul"</dt>
+   <!-- end tags for non-phrasing flow content elements -->
+
+   <!-- the normal ones -->
+   <dt>An end tag whose tag name is one of: "address", "article",
+   "aside", "blockquote", "center", "datagrid", "dialog", "dir",
+   "div", "dl", "fieldset", "footer", "header", "listing", "menu",
+   "nav", "ol", "pre", "section", "ul"</dt>
    <dd>
 
     <p>If the <span>stack of open elements</span> does not <span
@@ -43945,6 +43953,7 @@
 
    </dd>
 
+   <!-- as normal, but interacts with the form element pointer -->
    <dt>An end tag whose tag name is "form"</dt>
    <dd>
 
@@ -43974,24 +43983,37 @@
 
    </dd>
 
+   <!-- as normal, except </p> implies <p> if there's no <p> in scope, and needs care as the elements have optional tags -->
    <dt>An end tag whose tag name is "p"</dt>
    <dd>
 
-    <p>If the <span>current node</span> is not a <code>p</code>
-    element, then this is a <span>parse error</span>.</p>
-
-    <p>If the <span>stack of open elements</span> <span title="has
-    an element in scope">has a <code>p</code> element in
-    scope</span>, then pop elements from this stack until the
-    stack no longer <span title="has an element in scope">has a
-    <code>p</code> element in scope</span>.</p>
-
-    <p>Otherwise, act as if a start tag with the tag name
+    <p>If the <span>stack of open elements</span> does not <span
+    title="has an element in scope">have an element in scope</span>
+    with the same tag name as that of the token, then this is a
+    <span>parse error</span>; act as if a start tag with the tag name
     <code>p</code> had been seen, then reprocess the current
     token.</p>
 
+    <p>Otherwise, run these steps:</p>
+
+    <ol>
+
+     <li><p><span>Generate implied end tags</span>, except
+     for elements with the same tag name as the token.</p></li>
+
+     <li><p>If the <span>current node</span> is not an element with
+     the same tag name as that of the token, then this is a
+     <span>parse error</span>.</p></li>
+
+     <li><p>Pop elements from the <span>stack of open elements</span>
+     until an element with the same tag name as the token has been
+     popped from the stack.</p></li>
+
+    </ol>
+
    </dd>
 
+   <!-- as normal, but needs care as the elements have optional tags -->
    <dt>An end tag whose tag name is one of: "dd", "dt", "li"</dt>
    <dd>
 
@@ -44019,8 +44041,8 @@
 
    </dd>
 
-   <dt>An end tag whose tag name is one of: "h1", "h2", "h3",
-   "h4", "h5", "h6"</dt>
+   <!-- as normal, except acts as a closer for any of the h1-h6 elements -->
+   <dt>An end tag whose tag name is one of: "h1", "h2", "h3", "h4", "h5", "h6"</dt>
    <dd>
 
     <p>If the <span>stack of open elements</span> does not <span
@@ -44048,6 +44070,12 @@
 
    </dd>
 
+   <dt>An end tag whose tag name is "sarcasm"</dt>
+   <dd>
+    <p>Take a deep breath, then act as described in the "any other end
+    tag" entry below.</p>
+   </dd>
+
    <!-- ADOPTION AGENCY ELEMENTS
         Mozilla-only: bdo blink del ins sub sup q
         Safari-only: code dfn kbd nobr samp var wbr
@@ -44394,7 +44422,7 @@
    </dd>
 
    <dt>A start tag whose tag name is one of: "area", "basefont",
-   "bgsound", "br", "embed", "img", "param", "spacer", "wbr"</dt>
+   "bgsound", "br", "embed", "img", "spacer", "wbr"</dt>
    <dd>
 
     <p><span>Reconstruct the active formatting elements</span>, if
@@ -44409,6 +44437,18 @@
 
    </dd>
 
+   <dt>A start tag whose tag name is one of: "param", "source"</dt>
+   <dd>
+
+    <p><span>Insert an HTML element</span> for the token. Immediately
+    pop the <span>current node</span> off the <span>stack of open
+    elements</span>.</p>
+
+    <p><span title="acknowledge self-closing flag">Acknowledge the
+    token's <i>self-closing flag</i></span>, if it is set.</p>
+
+   </dd>
+
    <dt>A start tag whose tag name is "hr"</dt>
    <dd>
 
@@ -44576,7 +44616,7 @@
    </dd>
 
    <dt>A start tag whose tag name is one of: "iframe", "noembed"</dt>
-   <dt>A start tag whose tag name is "noscript", if the <span>scripting flag</span> is enabled:</dt>
+   <dt>A start tag whose tag name is "noscript", if the <span>scripting flag</span> is enabled</dt>
    <dd>
     <p>Follow the <span>generic CDATA parsing algorithm</span>.</p>
    </dd>
@@ -44701,32 +44741,28 @@
 
    </dd>
 -->
-   <dt>A start or end tag whose tag name is one of: "caption", "col",
-   "colgroup", "frame", "frameset", "head", "option", "optgroup",
-   "tbody", "td", "tfoot", "th", "thead", "tr"</dt>
-   <dt>An end tag whose tag name is one of: "area", "basefont",
-   "bgsound", "embed", "hr", "iframe", "image", "img", "input",
-   "isindex", "noembed", "noframes", "param", "select", "spacer",
-   "table", "textarea", "wbr"</dt> <!-- add keygen if we add the start
-   tag -->
-   <dt>An end tag whose tag name is "noscript", if the <span>scripting flag</span> is enabled:</dt>
-   <dd>
-    <p><span>Parse error</span>. Ignore the token.</p>
-   </dd>
-
-   <dt>A start or end tag whose tag name is one of:
-   "event-source", "section", "nav", "article", "aside", "header",
-   "footer", "datagrid", "command"</dt>
 
+   <dt>A start <!--or end--> tag whose tag name is one of: "caption",
+   "col", "colgroup", "frame", "frameset", "head", "tbody", "td",
+   "tfoot", "th", "thead", "tr"</dt>
+   <!--<dt>An end tag whose tag name is one of: "area", "base",
+   "basefont", "bgsound", "command", "embed", "event-source", "hr",
+   "iframe", "image", "img", "input", "isindex", "link", "meta",
+   "noembed", "noframes", "param", "script", "select", "source",
+   "spacer", "style", "table", "textarea", "title", "wbr"</dt>--> <!--
+   add keygen if we add the start tag -->
+   <!--<dt>An end tag whose tag name is "noscript", if the
+   <span>scripting flag</span> is enabled</dt>-->
    <dd>
-
-    <!-- XXXX -->
-
-    <p class="big-issue">Work in progress!</p>
-
+    <p><span>Parse error</span>. Ignore the token.</p>
+    <!-- end tags are commented out because since they can never end
+    up on the stack anyway, the default end tag clause will
+    automatically handle them. we don't want to have text in the spec
+    that is just an optimisation, as that detracts from the spec
+    itself -->
    </dd>
 
-   <dt>A start tag token not covered by the previous entries</dt>
+   <dt>Any other start tag</dt>
    <dd>
 
     <p><span>Reconstruct the active formatting elements</span>, if
@@ -44739,7 +44775,7 @@
 
    </dd>
 
-   <dt>An end tag token not covered by the previous entries</dt>
+   <dt>Any other end tag</dt>
    <dd>
 
     <p>Run the following algorithm:</p>