Bug 575 - How to extract an Atom feed from an HTML5 document
Status: RESOLVED INTENTIONAL
Product: Validator.nu
Classification: Unclassified
Component: General
Version: HEAD
Hardware: All All
Importance: P2 normal
Assigned To: Henri Sivonen
URL: http://svn.whatwg.org/webapps/source?...
Reported: 2009-05-27 14:38 CEST by Henri Sivonen
Modified: 2009-05-27 14:53 CEST (History)
Description Henri Sivonen 2009-05-27 14:38:27 CEST
Index: source
===================================================================
--- source	(revision 3115)
+++ source	(revision 3116)
@@ -13352,11 +13352,12 @@
    <dt>Content attributes:</dt>
    <dd><span>Global attributes</span></dd>
    <dd><code title="attr-article-cite">cite</code></dd>
-   <!-- v2 attributes to give the date authored, date published, name of author, etc -->
+   <dd><code title="attr-article-pubdate">pubdate</code></dd>
    <dt>DOM interface:</dt>
    <dd>
 <pre class="idl">interface <dfn>HTMLArticleElement</dfn> : <span>HTMLElement</span> {
            attribute DOMString <span title="dom-article-cite">cite</span>;
+           attribute DOMString <span title="dom-article-pubDate">pubDate</span>;
 };</pre>
    </dd>
   </dl>
@@ -13397,11 +13398,20 @@
   element. User agents should allow users to follow such citation
   links.</span></p>
 
+  <p>The <dfn title="attr-article-pubdate"><code>pubdate</code></dfn>
+  attribute may be used to specify the time and date that the article
+  was first published. If present, the <code
+  title="attr-article-pubdate">pubdate</code> attribute must be a
+  <span>valid global date and time string</span> value.</p>
+
   <div class="impl">
 
   <p>The <dfn title="dom-article-cite"><code>cite</code></dfn> DOM
   attribute must <span>reflect</span> the element's <code
-  title="attr-article-cite">cite</code> content attribute.</p>
+  title="attr-article-cite">cite</code> content attribute. The <dfn
+  title="dom-article-pubDate"><code>pubDate</code></dfn> DOM attribute
+  must <span>reflect</span> the element's <code
+  title="attr-article-pubdate">pubdate</code> content attribute.</p>
 
   </div>
 
@@ -28445,12 +28455,12 @@
   rules given for URL decomposition attributes, with the <span
   title="concept-uda-input">input</span> being the result of <span
   title="resolve a url">resolving</span> the element's <code
-  title="attr-area-href">href</code> attribute relative to the
+  title="attr-hyperlink-href">href</code> attribute relative to the
   element, if there is such an attribute and resolving it is
   successful, or the empty string otherwise; and the <span
   title="concept-uda-setter">common setter action</span> being the
   same as setting the element's <code
-  title="attr-area-href">href</code> attribute to the new output
+  title="attr-hyperlink-href">href</code> attribute to the new output
   value.</p>
 
   </div>
@@ -28480,9 +28490,9 @@
   specifying a <dfn
   title="attr-hyperlink-usemap"><code>usemap</code></dfn> attribute on
   the <code>img</code> or <code>object</code> element. The <code
-  title="attr-area-usemap">usemap</code> attribute, if specified, must
-  be a <span>valid hash-name reference</span> to a <code>map</code>
-  element.</p>
+  title="attr-hyperlink-usemap">usemap</code> attribute, if specified,
+  must be a <span>valid hash-name reference</span> to a
+  <code>map</code> element.</p>
 
   <div class="example">
 
@@ -28515,8 +28525,8 @@
 
   <p>If an <code>img</code> element or an <code>object</code> element
   representing an image has a <code
-  title="attr-area-usemap">usemap</code> attribute specified, user
-  agents must process it as follows:</p>
+  title="attr-hyperlink-usemap">usemap</code> attribute specified,
+  user agents must process it as follows:</p>
 
   <ol>
 
@@ -49967,7 +49977,7 @@
       <p>If the element is, or is a descendant of, an
       <code>address</code> element that <a
       href="#applyToSection">applies</a> to <span>the
-      <code>body</code> element</span>, an the <span
+      <code>body</code> element</span>, and the <span
       title="concept-item">item</span> has the type <code
       title="md-vcard">vcard</code>, generate the following
       triple:</p>
@@ -51225,6 +51235,560 @@
   </ol>
 
 
+  <h4>Atom</h4>
+
+  <p>Given a <code>Document</code> <var title="">source</var>, a user
+  agent must run the following algorithm to <dfn title="extracting
+  Atom">extract an Atom feed</dfn>:</p>
+
+  <ol>
+
+   <li><p>If the <code>Document</code> <var title="">source</var> does
+   not contain any <code>article</code> elements, then return nothing
+   and abort these steps. This algorithm can only be used with
+   documents that contain distinct articles.</p></li>
+
+   <li><p>Let <var title="">R</var> be an empty <span title="XML
+   documents">XML</span> <code>Document</code> object whose <span
+   title="the document's address">address</span> is user-agent
+   defined.</p></li>
+
+   <li><p>Append a <code title="">feed</code> element in the
+   <span>Atom namespace</span> to <var title="">R</var>.</p></li>
+
+   <li>
+
+    <p>For each element <var title="">candidate</var> that is, or is a
+    descendant of, an <code>address</code> element that <a
+    href="#applyToSection">applies</a> to <span>the <code>body</code>
+    element</span>, and that is an <span
+    title="concept-item">item</span> that has the type <code
+    title="md-vcard">vcard</code>, if there is a property <var
+    title="">property</var> named <code title="md-vcard-fn">fn</code>
+    whose <span title="concept-item-corresponding">corresponding
+    item</span> is <var title="">candidate</var>, and the <span
+    title="concept-property-value">value</span> of <var
+    title="">property</var> is not an <span
+    title="concept-item">item</span>, then append an <code
+    title="">author</code> element in the <span>Atom namespace</span>
+    to the root element of <var title="">R</var> whose contents is a
+    text node with its data set to the <span
+    title="concept-property-value">value</span> of <var
+    title="">property</var>.</p>
+
+   </li>
+
+   <li>
+
+    <p>If there is a <code>link</code> element whose <code
+    title="attr-link-rel">rel</code> attribute's value includes the
+    keyword <code title="rel-icon">icon</code>, and that element also
+    has an <code title="attr-link-href">href</code> attribute, then
+    append an <code title="">icon</code> element in the <span>Atom
+    namespace</span> to the root element of <var title="">R</var>
+    whose contents is a text node with its data set to the
+    <span>absolute URL</span> resulting from <span title="resolve a
+    url">resolving</span> the value of the <code
+    title="attr-link-href">href</code> attribute relative to the
+    <code>link</code> element.</p>
+
+    <!-- could check ratio, could check type... -->
+
+   </li>
+
+   <li>
+
+    <p>Append an <code title="">id</code> element in the <span>Atom
+    namespace</span> to the root element of <var title="">R</var>
+    whose contents is a text node with its data set to <span>the
+    document's current address</span>.</p>
+
+   </li>
+
+   <li>
+
+    <p>Optionally: Let <var title="">x</var> be a <code
+    title="">link</code> element in the <span>Atom
+    namespace</span>. Add a <code title="">rel</code> attribute whose
+    value is the string "<code title="">self</code>" to <var
+    title="">x</var>. Append a text node with its data set to the
+    (user-agent defined) <span title="the document's
+    address">address</span> of <var title="">R</var> to <var
+    title="">x</var>. Append <var title="">x</var> to the root element
+    of <var title="">R</var>.</p>
+
+    <p class="note">This step would be skipped when the document <var
+    title="">R</var> has no convenient <span title="the document's
+    address">address</span>. The presence of the <code
+    title="">rel="self"</code> link is a "should"-level requirement in
+    the Atom specification.</p>
+
+   </li>
+
+   <li>
+
+    <p>Let <var title="">x</var> be a <code title="">link</code>
+    element in the <span>Atom namespace</span>. Add a <code
+    title="">rel</code> attribute whose value is the string "<code
+    title="">alternate</code>" to <var title="">x</var>. Add a <code
+    title="">type</code> attribute whose value is the string "<code
+    title="">text/html</code>" to <var title="">x</var>. Append a text
+    node with its data set to <span>the document's current
+    address</span> to <var title="">x</var>. Append <var
+    title="">x</var> to the root element of <var title="">R</var>.</p>
+
+   </li>
+
+   <li>
+
+    <p>Let <var title="">x</var> be a <code title="">link</code>
+    element in the <span>Atom namespace</span>. Add a <code
+    title="">rel</code> attribute whose value is the string "<code
+    title="">alternate</code>" to <var title="">x</var>. If the
+    document being converted is an <span title="HTML documents">HTML
+    document</span>, add a <code title="">type</code> attribute whose
+    value is the string "<code title="">text/html</code>" to <var
+    title="">x</var>. Otherwise, the document being converted is an
+    <span title="XML documents">XML document</span>; add a <code
+    title="">type</code> attribute whose value is the string "<code
+    title="">application/xhtml+xml</code>" to <var
+    title="">x</var>. Append a text node with its data set to
+    <span>the document's current address</span> to <var
+    title="">x</var>. Append <var title="">x</var> to the root element
+    of <var title="">R</var>.</p>
+
+   </li>
+
+   <li><p>Let <var title="">subheading text</var> be the empty
+   string.</p></li>
+
+   <li><p>Let <var title="">heading</var> be the first element of
+   <span>heading content</span> whose nearest ancestor of
+   <span>sectioning content</span> is <span>the body
+   element</span>, if any, or null if there is none.</p></li>
+
+   <li>
+
+    <p>Take the appropriate action from the following list, as
+    determined by the type of the <var title="">heading</var>
+    element:</p>
+
+    <dl>
+
+     <dt>If <var title="">heading</var> is null</dt>
+
+     <dd>
+
+      <p>Let <var title="">heading text</var> be the
+      <code>textContent</code> of <span>the <code>title</code>
+      element</span>, if there is one, or the empty string
+      otherwise.</p>
+
+     </dd>
+
+     <dt>If <var title="">heading</var> is a <code>hgroup</code> element</dt>
+
+     <dd>
+
+      <p>If <var title="">heading</var> contains no child
+      <code>h1</code>&ndash;<code>h6</code> elements, let <var
+      title="">heading text</var> be the empty string.</p>
+
+      <p>Otherwise, let <var title="">headings list</var> be a list of
+      all the <code>h1</code>&ndash;<code>h6</code> element children
+      of <var title="">heading</var>, sorted first by descending
+      <span>rank</span> and then in <span>tree order</span> (so
+      <code>h1</code>s first, then <code>h2</code>s, etc, with each
+      group in the order they appear in the document). Then, let <var
+      title="">heading text</var> be the <code>textContent</code> of
+      the first entry in <var title="">headings list</var>, and if
+      there are multiple entries, let <var title="">subheading
+      text</var> be the <code>textContent</code> of the second entry
+      in <var title="">headings list</var>.</p>
+
+     </dd>
+
+     <dt>If <var title="">heading</var> is an <code>h1</code>&ndash;<code>h6</code> element</dt>
+
+     <dd>
+
+      <p>Let <var title="">heading text</var> be the
+      <code>textContent</code> of <var title="">heading</var>.</p>
+
+     </dd>
+
+    </dl>
+
+   </li>
+
+   <li>
+
+    <p>Append a <code title="">title</code> element in the <span>Atom
+    namespace</span> to the root element of <var title="">R</var>
+    whose contents is a text node with its data set to <var
+    title="">heading text</var>.</p>
+
+   </li>
+
+   <li>
+
+    <p>If <var title="">subheading text</var> is not the empty string,
+    append a <code title="">subtitle</code> element in the <span>Atom
+    namespace</span> to the root element of <var title="">R</var>
+    whose contents is a text node with its data set to <var
+    title="">subheading text</var>.</p>
+
+   </li>
+
+   <li><p>Let <var title="">global update date</var> have no
+   value.</p></li>
+
+   <li>
+
+    <p>For each <code>article</code> element <var
+    title="">article</var> that does not have an ancestor
+    <code>article</code> element, run the following steps:</p>
+
+    <ol>
+
+     <li><p>Let <var title="">E</var> be an <code
+     title="">entry</code> element in the <span>Atom namespace</span>,
+     and append <var title="">E</var> to the root element of <var
+     title="">R</var>.</p></li>
+
+     <li><p>Let <var title="">heading</var> be the first element of
+     <span>heading content</span> whose nearest ancestor of
+     <span>sectioning content</span> is <var title="">article</var>,
+     if any, or null if there is none.</p></li>
+
+     <li>
+
+      <p>Take the appropriate action from the following list, as
+      determined by the type of the <var title="">heading</var>
+      element:</p>
+
+      <dl>
+
+       <dt>If <var title="">heading</var> is null</dt>
+
+       <dd>
+
+        <p>Let <var title="">heading text</var> be the empty
+        string.</p>
+
+       </dd>
+
+       <dt>If <var title="">heading</var> is a <code>hgroup</code> element</dt>
+
+       <dd>
+
+        <p>If <var title="">heading</var> contains no child
+        <code>h1</code>&ndash;<code>h6</code> elements, let <var
+        title="">heading text</var> be the empty string.</p>
+
+        <p>Otherwise, let <var title="">headings list</var> be a list
+        of all the <code>h1</code>&ndash;<code>h6</code> element
+        children of <var title="">heading</var>, sorted first by
+        descending <span>rank</span> and then in <span>tree
+        order</span> (so <code>h1</code>s first, then
+        <code>h2</code>s, etc, with each group in the order they
+        appear in the document). Then, let <var title="">heading
+        text</var> be the <code>textContent</code> of the first entry
+        in <var title="">headings list</var>.</p>
+
+       </dd>
+
+       <dt>If <var title="">heading</var> is an <code>h1</code>&ndash;<code>h6</code> element</dt>
+
+       <dd>
+
+        <p>Let <var title="">heading text</var> be the
+        <code>textContent</code> of <var title="">heading</var>.</p>
+
+       </dd>
+
+      </dl>
+
+     </li>
+
+     <li>
+
+      <p>Append a <code title="">title</code> element in the
+      <span>Atom namespace</span> to <var title="">E</var> whose
+      contents is a text node with its data set to <var
+      title="">heading text</var>.</p>
+
+     </li>
+
+     <li>
+
+      <p>For each element <var title="">candidate</var> that is, or is
+      a descendant of, an <code>address</code> element that <a
+      href="#applyToSection">applies</a> to <var
+      title="">article</var>, and that is an <span
+      title="concept-item">item</span> that has the type <code
+      title="md-vcard">vcard</code>, if there is a property <var
+      title="">property</var> named <code
+      title="md-vcard-fn">fn</code> whose <span
+      title="concept-item-corresponding">corresponding item</span> is
+      <var title="">candidate</var>, and the <span
+      title="concept-property-value">value</span> of <var
+      title="">property</var> is not an <span
+      title="concept-item">item</span>, then append an <code
+      title="">author</code> element in the <span>Atom
+      namespace</span> to <var title="">E</var> whose contents is a
+      text node with its data set to the <span
+      title="concept-property-value">value</span> of <var
+      title="">property</var>.</p>
+
+     </li>
+
+     <li>
+
+      <p>Clone <var title="">article</var> and its descendants into an
+      environment that has <span title="concept-bc-noscript">scripting
+      disabled</span>, has no <span title="plugin">plugins</span>, and
+      fails any attempt to <span title="fetch">fetch</span> any
+      resources. Let <var title="">cloned article</var> be the
+      resulting clone <code>article</code> element.</p>
+
+     </li>
+
+     <li>
+
+      <p>Remove from the subtree rooted at <var title="">cloned
+      article</var> any <code>article</code> elements other than the
+      <var title="">cloned article</var> itself, any
+      <code>header</code>, <code>footer</code>, or <code>nav</code>
+      elements whose nearest ancestor of <span>sectioning
+      content</span> is the <var title="">cloned article</var>, and
+      the first element of <span>heading content</span> whose nearest
+      ancestor of <span>sectioning content</span> is the <var
+      title="">cloned article</var>, if any.</p>
+
+     </li>
+
+     <li>
+
+      <p>If <var title="">cloned article</var> contains any
+      <code>ins</code> or <code>del</code> elements with <code
+      title="attr-mod-datetime">datetime</code> attributes whose
+      values <span title="parse a global date and time string">parse
+      as global date and time strings</span> without errors, then let
+      <var title="">update date</var> be the value of the <code
+      title="attr-mod-datetime">datetime</code> attribute that parses
+      to the newest <span title="concept-datetime">global date and
+      time</span>.</p>
+
+      <p>Otherwise, let <var title="">update date</var> have no
+      value.</p>
+
+      <p class="note">This value is used below; it is calculated here
+      because in certain cases the next step mutates the <var
+      title="">cloned article</var>.</p>
+
+     </li>
+
+     <li>
+
+      <p>If the document being converted is an <span title="HTML
+      documents">HTML document</span>, then: Let <var title="">x</var>
+      be a <code title="">content</code> element in the <span>Atom
+      namespace</span>. Add a <code title="">type</code> attribute
+      whose value is the string "<code title="">html</code>" to <var
+      title="">x</var>. Append a text node with its data set to the
+      result of running the <span>HTML fragment serialization
+      algorithm</span> on <var title="">cloned article</var> to <var
+      title="">x</var>. Append <var title="">x</var> to <var
+      title="">E</var>.</p>
+
+      <p>Otherwise, the document being converted is an <span
+      title="XML documents">XML document</span>: Let <var
+      title="">x</var> be a <code title="">content</code> element in
+      the <span>Atom namespace</span>. Add a <code
+      title="">type</code> attribute whose value is the string "<code
+      title="">xml</code>" to <var title="">x</var>. Append a
+      <code>div</code> element to <var title="">x</var>. Move all the
+      child nodes of the <var title="">cloned article</var> node to
+      that <code>div</code> element, preserving their relative
+      order. Append <var title="">x</var> to <var
+      title="">E</var>.</p>
+
+     </li>
+
+     <li>
+
+      <p>Establish the value of <var title="">id</var> and <var
+      title="">has-alternate</var> from the first of the following to
+      apply:</p>
+
+      <dl>
+
+       <dt>If the <var title="">article</var> node has a <code
+       title="attr-article-cite">cite</code> attribute</dt>
+
+       <dd>Let <var title="">id</var> be the <span>absolute URL</span>
+       resulting from <span title="resolve a url">resolving</span> the
+       value of the <code title="attr-article-cite">cite</code>
+       relative to the <var title="">article</var> element. Let <var
+       title="">has-alternate</var> be true.</dd>
+
+       <dt>If the <var title="">article</var> node has a descendant
+       <code>a</code> or <code>area</code> element with an <code
+       title="attr-hyperlink-href">href</code> attribute and a <code
+       title="attr-a-rel">rel</code> attribute whose value includes
+       the <code title="rel-bookmark">bookmark</code> keyword</dt>
+
+       <dd>Let <var title="">id</var> be the <span>absolute URL</span>
+       resulting from <span title="resolve a url">resolving</span> the
+       value of the <code title="attr-hyperlink-href">href</code>
+       attribute of the first such <code>a</code> or <code>area</code>
+       element, relative to the element. Let <var
+       title="">has-alternate</var> be true.</dd>
+
+       <dt>If the <var title="">article</var> node has an <code
+       title="attr-id">id</code> attribute</dt>
+
+       <dd>Let <var title="">id</var> be <span>the document's current
+       address</span>, with the fragment identifier (if any) removed,
+       and with a new fragment identifier specified, consisting of the
+       value of the <var title="">article</var> element's <code
+       title="attr-id">id</code> attribute. Let <var
+       title="">has-alternate</var> be false.</dd>
+
+       <dt>Otherwise</dt>
+
+       <dd>Let <var title="">id</var> be a user-agent defined
+       undereferenceable yet globally unique <span>absolute
+       URL</span>. Let <var title="">has-alternate</var> be
+       false.</dd>
+
+      </dl>
+
+     </li>
+
+     <li>
+
+      <p>Append an <code title="">id</code> element in the <span>Atom
+      namespace</span> to <var title="">E</var> whose contents is a
+      text node with its data set to <var title="">id</var>.</p>
+
+     </li>
+
+     <li>
+
+      <p>If <var title="">has-alternate</var> is true: Let <var
+      title="">x</var> be a <code title="">link</code> element in the
+      <span>Atom namespace</span>. Add a <code title="">rel</code>
+      attribute whose value is the string "<code
+      title="">alternate</code>" to <var title="">x</var>. Append a
+      text node with its data set to <var title="">id</var> to <var
+      title="">x</var>. Append <var title="">x</var> to <var
+      title="">E</var>.</p>
+
+     </li>
+
+     <li>
+
+      <p>If <var title="">article</var> has a <code
+      title="attr-article-pubdate">pubdate</code> attribute, and <span
+      title="parse a global date and time string">parsing that
+      attribute's value as a global date and time string</span> does
+      not result in an error, then let <var title="">publication
+      date</var> be the value of that attribute.</p>
+
+      <p>Otherwise, let <var title="">publication date</var> have no
+      value.</p>
+
+     </li>
+
+     <li>
+
+      <p>If <var title="">update date</var> has no value but <var
+      title="">publication date</var> does, then let <var
+      title="">update date</var> have the value of <var
+      title="">publication date</var>.</p>
+
+      <p>Otherwise, if <var title="">publication date</var> has no
+      value but <var title="">update date</var> does, then let <var
+      title="">publication date</var> have the value of <var
+      title="">update date</var>.</p>
+
+     </li>
+
+     <li>
+
+      <p>If <var title="">update date</var> has a value, and <var
+      title="">global update date</var> has no value or is less recent
+      than <var title="">update date</var>, then let <var
+      title="">global update date</var> have the value of <var
+      title="">update date</var>.</p>
+
+     </li>
+
+     <li>
+
+      <p>If <var title="">publication date</var> and <var
+      title="">update date</var> both still have no value, then let
+      them both have a value that is a <span>valid global date and
+      time string</span> representing the <span
+      title="concept-datetime">global date and time</span> of the
+      moment that this algorithm was invoked.</p>
+
+     </li>
+
+     <li>
+
+      <p>Append a <code title="">published</code> element in the
+      <span>Atom namespace</span> to <var title="">E</var> whose
+      contents is a text node with its data set to <var
+      title="">publication date</var>.</p>
+
+     </li>
+
+     <li>
+
+      <p>Append an <code title="">updated</code> element in the
+      <span>Atom namespace</span> to <var title="">E</var> whose
+      contents is a text node with its data set to <var
+      title="">update date</var>.</p>
+
+     </li>
+
+    </ol>
+
+   </li>
+
+   <li>
+
+    <p>If <var title="">global update date</var> has no value, then
+    let it have a value that is a <span>valid global date and time
+    string</span> representing the <span
+    title="concept-datetime">global date and time</span> of the date
+    and time of the <code>Document</code>'s source file's last
+    modification, if it is known, or else of the moment that this
+    algorithm was invoked.</p>
+
+   </li>
+
+   <li>
+
+    <p>Insert an <code title="">updated</code> element in the
+    <span>Atom namespace</span> into the root element of <var
+    title="">R</var> before the first <code title="">entry</code> in
+    the <span>Atom namespace</span> whose contents is a text node with
+    its data set to <var title="">global update date</var>.</p>
+
+   </li>
+
+   <li><p>Return the Atom document <var title="">R</var>.</p></li>
+
+  </ol>
+
+  <p>The <dfn>Atom namespace</dfn> is: <code>http://www.w3.org/2005/Atom</code></p>
+
+
+
 
 
   <h2 id="browsers">Web browsers</h2>
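For anyone checking an implementation against the algorithm in the diff above, here is a rough Python sketch of two of its steps: choosing the heading and subheading text from an hgroup, and reconciling the per-entry and feed-level dates. It is only an illustration under stated assumptions, not Validator.nu code. The function names `pick_hgroup_text` and `feed_dates` are hypothetical, headings are modelled as `(level, text)` pairs rather than DOM nodes, and dates are assumed to be already parsed into timezone-aware `datetime` objects instead of global date and time strings.

```python
from datetime import datetime, timezone
from typing import Optional


def pick_hgroup_text(headings: list[tuple[int, str]]) -> tuple[str, str]:
    """Return (heading text, subheading text) for an hgroup.

    `headings` holds (level, text) for the h1-h6 children in tree order,
    where level 1 stands for h1 (the highest rank). Hypothetical model.
    """
    if not headings:
        return "", ""
    # Descending rank means ascending level number; sorted() is stable,
    # so headings of equal rank keep their tree order.
    ordered = sorted(headings, key=lambda h: h[0])
    subheading = ordered[1][1] if len(ordered) > 1 else ""
    return ordered[0][1], subheading


def feed_dates(
    articles: list[tuple[Optional[datetime], Optional[datetime]]],
    now: datetime,
    last_modified: Optional[datetime] = None,
) -> tuple[list[tuple[datetime, datetime]], datetime]:
    """Reconcile dates for each entry and for the feed as a whole.

    `articles` holds (update date, publication date) per top-level article;
    None means "no value". Returns ([(published, updated), ...], feed updated).
    """
    global_update: Optional[datetime] = None
    entries = []
    for update, publication in articles:
        # Fill whichever of the two dates is missing from the other one.
        if update is None and publication is not None:
            update = publication
        elif publication is None and update is not None:
            publication = update
        # Track the newest update date seen so far. This happens before the
        # "now" fallback, so fallback dates never feed the global value.
        if update is not None and (global_update is None or global_update < update):
            global_update = update
        # If neither date is known, both fall back to the invocation time.
        if update is None and publication is None:
            update = publication = now
        entries.append((publication, update))
    # Feed-level fallback: last modification time if known, else "now".
    if global_update is None:
        global_update = last_modified if last_modified is not None else now
    return entries, global_update
```

One detail the sketch makes visible: the global update date is compared before the invocation-time fallback is applied to an entry, so a feed whose articles carry no dates at all takes its `updated` value from the source file's last modification time when that is known.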