Bugzilla – Bug 575
How to extract an Atom feed from an HTML5 document
Last modified: 2009-05-27 14:53:58 CEST
Index: source =================================================================== --- source (revision 3115) +++ source (revision 3116) @@ -13352,11 +13352,12 @@ <dt>Content attributes:</dt> <dd><span>Global attributes</span></dd> <dd><code title="attr-article-cite">cite</code></dd> - <!-- v2 attributes to give the date authored, date published, name of author, etc --> + <dd><code title="attr-article-pubdate">pubdate</code></dd> <dt>DOM interface:</dt> <dd> <pre class="idl">interface <dfn>HTMLArticleElement</dfn> : <span>HTMLElement</span> { attribute DOMString <span title="dom-article-cite">cite</span>; + attribute DOMString <span title="dom-article-pubDate">pubDate</span>; };</pre> </dd> </dl> @@ -13397,11 +13398,20 @@ element. User agents should allow users to follow such citation links.</span></p> + <p>The <dfn title="attr-article-pubdate"><code>pubdate</code></dfn> + attribute may be used to specify the time and date that the article + was first published. If present, the <code + title="attr-article-pubdate">pubdate</code> attribute must be a + <span>valid global date and time string</span> value.</p> + <div class="impl"> <p>The <dfn title="dom-article-cite"><code>cite</code></dfn> DOM attribute must <span>reflect</span> the element's <code - title="attr-article-cite">cite</code> content attribute.</p> + title="attr-article-cite">cite</code> content attribute. 
The <dfn + title="dom-article-pubDate"><code>pubDate</code></dfn> DOM attribute + must <span>reflect</span> the element's <code + title="attr-article-pubdate">pubdate</code> content attribute.</p> </div> @@ -28445,12 +28455,12 @@ rules given for URL decomposition attributes, with the <span title="concept-uda-input">input</span> being the result of <span title="resolve a url">resolving</span> the element's <code - title="attr-area-href">href</code> attribute relative to the + title="attr-hyperlink-href">href</code> attribute relative to the element, if there is such an attribute and resolving it is successful, or the empty string otherwise; and the <span title="concept-uda-setter">common setter action</span> being the same as setting the element's <code - title="attr-area-href">href</code> attribute to the new output + title="attr-hyperlink-href">href</code> attribute to the new output value.</p> </div> @@ -28480,9 +28490,9 @@ specifying a <dfn title="attr-hyperlink-usemap"><code>usemap</code></dfn> attribute on the <code>img</code> or <code>object</code> element. 
The <code - title="attr-area-usemap">usemap</code> attribute, if specified, must - be a <span>valid hash-name reference</span> to a <code>map</code> - element.</p> + title="attr-hyperlink-usemap">usemap</code> attribute, if specified, + must be a <span>valid hash-name reference</span> to a + <code>map</code> element.</p> <div class="example"> @@ -28515,8 +28525,8 @@ <p>If an <code>img</code> element or an <code>object</code> element representing an image has a <code - title="attr-area-usemap">usemap</code> attribute specified, user - agents must process it as follows:</p> + title="attr-hyperlink-usemap">usemap</code> attribute specified, + user agents must process it as follows:</p> <ol> @@ -49967,7 +49977,7 @@ <p>If the element is, or is a descendant of, an <code>address</code> element that <a href="#applyToSection">applies</a> to <span>the - <code>body</code> element</span>, an the <span + <code>body</code> element</span>, and the <span title="concept-item">item</span> has the type <code title="md-vcard">vcard</code>, generate the following triple:</p> @@ -51225,6 +51235,560 @@ </ol> + <h4>Atom</h4> + + <p>Given a <code>Document</code> <var title="">source</var>, a user + agent must run the following algorithm to <dfn title="extracting + Atom">extract an Atom feed</dfn>:</p> + + <ol> + + <li><p>If the <code>Document</code> <var title="">source</var> does + not contain any <code>article</code> elements, then return nothing + and abort these steps. 
This algorithm can only be used with + documents that contain distinct articles.</p> + + <li><p>Let <var title="">R</var> be an empty <span title="XML + documents">XML</span> <code>Document</code> object whose <span + title="the document's address">address</span> is user-agent + defined.</p></li> + + <li><p>Append a <code title="">feed</code> element in the + <span>Atom namespace</span> to <var title="">R</var>.</p></li> + + <li> + + <p>For each element <var title="">candidate</var> that is, or is a + descendant of, an <code>address</code> element that <a + href="#applyToSection">applies</a> to <span>the <code>body</code> + element</span>, and that is an <span + title="concept-item">item</span> that has the type <code + title="md-vcard">vcard</code>, if there is a property <var + title="">property</var> named <code title="md-vcard-fn">fn</code> + whose <span title="concept-item-corresponding">corresponding + item</span> is <var title="">candidate</var>, and the <span + title="concept-property-value">value</span> of <var + title="">property</var> is not an <span + title="concept-item">item</span>, then append an <code + title="">author</code> element in the <span>Atom namespace</span> + to the root element of <var title="">R</var> whose contents is a + text node with its data set to the <span + title="concept-property-value">value</span> of <var + title="">property</var>.</p> + + </li> + + <li> + + <p>If there is a <code>link</code> element whose <code + title="attr-link-rel">rel</code> attribute's value includes the + keyword <code title="rel-icon">icon</code>, and that element also + has an <code title="attr-link-href">href</code> attribute, then + append an <code title="">icon</code> element in the <span>Atom + namespace</span> to the root element of <var title="">R</var> + whose contents is a text node with its data set to the + <span>absolute URL</span> resulting from <span title="resolve a + url">resolving</span> the value of the <code + 
title="attr-link-href">href</code> attribute relative to the + <code>link</code> element.</p> + + <!-- could check ratio, could check type... --> + + </li> + + <li> + + <p>Append an <code title="">id</code> element in the <span>Atom + namespace</span> to the root element of <var title="">R</var> + whose contents is a text node with its data set to <span>the + document's current address</span>.</p> + + </li> + + <li> + + <p>Optionally: Let <var title="">x</var> be a <code + title="">link</code> element in the <span>Atom + namespace</span>. Add a <code title="">rel</code> attribute whose + value is the string "<code title="">self</code>" to <var + title="">x</var>. Append a text node with its data set to the + (user-agent defined) <span title="the document's + address">address</span> of <var title="">R</var> to <var + title="">x</var>. Append <var title="">x</var> to the root element + of <var title="">R</var>.</p> + + <p class="note">This step would be skipped when the document <var + title="">R</var> has no convenient <span title="the document's + address">address</span>. The presence of the <code + title="">rel="self"</code> link is a "should"-level requirement in + the Atom specification.</p> + + </li> + + <li> + + <p>Let <var title="">x</var> be a <code title="">link</code> + element in the <span>Atom namespace</span>. Add a <code + title="">rel</code> attribute whose value is the string "<code + title="">alternate</code>" to <var title="">x</var>. Add a <code + title="">type</code> attribute whose value is the string "<code + title="">text/html</code>" to <var title="">x</var>. Append a text + node with its data set to <span>the document's current + address</span> to <var title="">x</var>. Append <var + title="">x</var> to the root element of <var title="">R</var>.</p> + + </li> + + <li> + + <p>Let <var title="">x</var> be a <code title="">link</code> + element in the <span>Atom namespace</span>. 
Add a <code + title="">rel</code> attribute whose value is the string "<code + title="">alternate</code>" to <var title="">x</var>. If the + document being converted is an <span title="HTML documents">HTML + document</span>, add a <code title="">type</code> attribute whose + value is the string "<code title="">text/html</code>" to <var + title="">x</var>. Otherwise, the document being converted is an + <span title="XML documents">XML document</span>; add a <code + title="">type</code> attribute whose value is the string "<code + title="">application/xhtml+xml</code>" to <var + title="">x</var>. Append a text node with its data set to + <span>the document's current address</span> to <var + title="">x</var>. Append <var title="">x</var> to the root element + of <var title="">R</var>.</p> + + </li> + + <li><p>Let <var title="">subheading text</var> be the empty + string.</p></li> + + <li><p>Let <var title="">heading</var> be the first element of + <span>heading content</span> whose nearest ancestor of + <span>sectioning content</span> is <span>the body + element</span>, if any, or null if there is none.</p></li> + + <li> + + <p>Take the appropriate action from the following list, as + determined by the type of the <var title="">heading</var> + element:</p> + + <dl> + + <dt>If <var title="">heading</var> is null</dt> + + <dd> + + <p>Let <var title="">heading text</var> be the + <code>textContent</code> of <span>the <code>title</code> + element</span>, if there is one, or the empty string + otherwise.</p> + + </dd> + + <dt>If <var title="">heading</var> is a <code>hgroup</code> element</dt> + + <dd> + + <p>If <var title="">heading</var> contains no child + <code>h1</code>–<code>h6</code> elements, let <var + title="">heading text</var> be the empty string.</p> + + <p>Otherwise, let <var title="">headings list</var> be a list of + all the <code>h1</code>–<code>h6</code> element children + of <var title="">heading</var>, sorted first by descending + <span>rank</span> 
and then in <span>tree order</span> (so + <code>h1</code>s first, then <code>h2</code>s, etc, with each + group in the order they appear in the document). Then, let <var + title="">heading text</var> be the <code>textContent</code> of + the first entry in <var title="">headings list</var>, and if + there are multiple entries, let <var title="">subheading + text</var> be the <code>textContent</code> of the second entry + in <var title="">headings list</var>.</p> + + </dd> + + <dt>If <var title="">heading</var> is an <code>h1</code>–<code>h6</code> element</dt> + + <dd> + + <p>Let <var title="">heading text</var> be the + <code>textContent</code> of <var title="">heading</var>.</p> + + </dd> + + </dl> + + </li> + + <li> + + <p>Append a <code title="">title</code> element in the <span>Atom + namespace</span> to the root element of <var title="">R</var> + whose contents is a text node with its data set to <var + title="">heading text</var>.</p> + + </li> + + <li> + + <p>If <var title="">subheading text</var> is not the empty string, + append a <code title="">subtitle</code> element in the <span>Atom + namespace</span> to the root element of <var title="">R</var> + whose contents is a text node with its data set to <var + title="">subheading text</var>.</p> + + </li> + + <li><p>Let <var title="">global update date</var> have no + value.</p></li> + + <li> + + <p>For each <code>article</code> element <var + title="">article</var> that does not have an ancestor + <code>article</code> element, run the following steps:</p> + + <ol> + + <li><p>Let <var title="">E</var> be an <code + title="">entry</code> element in the <span>Atom namespace</span>, + and append <var title="">E</var> to the root element of <var + title="">R</var>.</p></li> + + <li><p>Let <var title="">heading</var> be the first element of + <span>heading content</span> whose nearest ancestor of + <span>sectioning content</span> is <var title="">article</var>, + if any, or null if there is none.</p></li> + + 
<li> + + <p>Take the appropriate action from the following list, as + determined by the type of the <var title="">heading</var> + element:</p> + + <dl> + + <dt>If <var title="">heading</var> is null</dt> + + <dd> + + <p>Let <var title="">heading text</var> be the empty + string.</p> + + </dd> + + <dt>If <var title="">heading</var> is a <code>hgroup</code> element</dt> + + <dd> + + <p>If <var title="">heading</var> contains no child + <code>h1</code>–<code>h6</code> elements, let <var + title="">heading text</var> be the empty string.</p> + + <p>Otherwise, let <var title="">headings list</var> be a list + of all the <code>h1</code>–<code>h6</code> element + children of <var title="">heading</var>, sorted first by + descending <span>rank</span> and then in <span>tree + order</span> (so <code>h1</code>s first, then + <code>h2</code>s, etc, with each group in the order they + appear in the document). Then, let <var title="">heading + text</var> be the <code>textContent</code> of the first entry + in <var title="">headings list</var>.</p> + + </dd> + + <dt>If <var title="">heading</var> is an <code>h1</code>–<code>h6</code> element</dt> + + <dd> + + <p>Let <var title="">heading text</var> be the + <code>textContent</code> of <var title="">heading</var>.</p> + + </dd> + + </dl> + + </li> + + <li> + + <p>Append a <code title="">title</code> element in the + <span>Atom namespace</span> to <var title="">E</var> whose + contents is a text node with its data set to <var + title="">heading text</var>.</p> + + </li> + + <li> + + <p>For each element <var title="">candidate</var> that is, or is + a descendant of, an <code>address</code> element that <a + href="#applyToSection">applies</a> to <var + title="">article</var>, and that is an <span + title="concept-item">item</span> that has the type <code + title="md-vcard">vcard</code>, if there is a property <var + title="">property</var> named <code + title="md-vcard-fn">fn</code> whose <span + 
title="concept-item-corresponding">corresponding item</span> is + <var title="">candidate</var>, and the <span + title="concept-property-value">value</span> of <var + title="">property</var> is not an <span + title="concept-item">item</span>, then append an <code + title="">author</code> element in the <span>Atom + namespace</span> to <var title="">E</var> whose contents is a + text node with its data set to the <span + title="concept-property-value">value</span> of <var + title="">property</var>.</p> + + </li> + + <li> + + <p>Clone <var title="">article</var> and its descendants into an + environment that has <span title="concept-bc-noscript">scripting + disabled</span>, has no <span title="plugin">plugins</span>, and + fails any attempt to <span title="fetch">fetch</span> any + resources. Let <var title="">cloned article</var> be the + resulting clone <code>article</code> element.</p> + + </li> + + <li> + + <p>Remove from the subtree rooted at <var title="">cloned + article</var> any <code>article</code> elements other than the + <var title="">cloned article</var> itself, any + <code>header</code>, <code>footer</code>, or <code>nav</code> + elements whose nearest ancestor of <span>sectioning + content</span> is the <var title="">cloned article</var>, and + the first element of <span>heading content</span> whose nearest + ancestor of <span>sectioning content</span> is the <var + title="">cloned article</var>, if any.</p> + + </li> + + <li> + + <p>If <var title="">cloned article</var> contains any + <code>ins</code> or <code>del</code> elements with <code + title="attr-mod-datetime">datetime</code> attributes whose + values <span title="parse a global date and time string">parse + as global date and time strings</span> without errors, then let + <var title="">update date</var> be the value of the <code + title="attr-mod-datetime">datetime</code> attribute that parses + to the newest <span title="concept-datetime">global date and + time</span>.</p> + + <p>Otherwise, 
let <var title="">update date</var> have no + value.</p> + + <p class="note">This value is used below; it is calculated here + because in certain cases the next step mutates the <var + title="">cloned article</var>.</p> + + </li> + + <li> + + <p>If the document being converted is an <span title="HTML + documents">HTML document</span>, then: Let <var title="">x</var> + be a <code title="">content</code> element in the <span>Atom + namespace</span>. Add a <code title="">type</code> attribute + whose value is the string "<code title="">html</code>" to <var + title="">x</var>. Append a text node with its data set to the + result of running the <span>HTML fragment serialization + algorithm</span> on <var title="">cloned article</var> to <var + title="">x</var>. Append <var title="">x</var> to <var + title="">E</var>.</p> + + <p>Otherwise, the document being converted is an <span + title="XML documents">XML document</span>: Let <var + title="">x</var> be a <code title="">content</code> element in + the <span>Atom namespace</span>. Add a <code + title="">type</code> attribute whose value is the string "<code + title="">xml</code>" to <var title="">x</var>. Append a + <code>div</code> element to <var title="">x</var>. Move all the + child nodes of the <var title="">cloned article</var> node to + that <code>div</code> element, preserving their relative + order. Append <var title="">x</var> to <var + title="">E</var>.</p> + + </li> + + <li> + + <p>Establish the value of <var title="">id</var> and <var + title="">has-alternate</var> from the first of the following to + apply:</p> + + <dl> + + <dt>If the <var title="">article</var> node has a <code + title="attr-article-cite">cite</code> attribute</dt> + + <dd>Let <var title="">id</var> be the <span>absolute URL</span> + resulting from <span title="resolve a url">resolving</span> the + value of the <code title="attr-article-cite">cite</code> + relative to the <var title="">article</var> element. 
Let <var + title="">has-alternate</var> be true.</dd> + + <dt>If the <var title="">article</var> node has a descendant + <code>a</code> or <code>area</code> element with an <code + title="attr-hyperlink-href">href</code> attribute and a <code + title="attr-a-rel">rel</code> attribute whose value includes + the <code title="rel-bookmark">bookmark</code> keyword</dt> + + <dd>Let <var title="">id</var> be the <span>absolute URL</span> + resulting from <span title="resolve a url">resolving</span> the + value of the <code title="attr-hyperlink-href">href</code> + attribute of the first such <code>a</code> or <code>area</code> + element, relative to the element. Let <var + title="">has-alternate</var> be true.</dd> + + <dt>If the <var title="">article</var> node has an <code + title="attr-id">id</code> attribute</dt> + + <dd>Let <var title="">id</var> be <span>the document's current + address</span>, with the fragment identifier (if any) removed, + and with a new fragment identifier specified, consisting of the + value of the <var title="">article</var> element's <code + title="attr-id">id</code> attribute. Let <var + title="">has-alternate</var> be false.</dd> + + <dt>Otherwise</dt> + + <dd>Let <var title="">id</var> be a user-agent defined + undereferenceable yet globally unique <span>absolute + URL</span>. Let <var title="">has-alternate</var> be + false.</dd> + + </dl> + + </li> + + <li> + + <p>Append an <code title="">id</code> element in the <span>Atom + namespace</span> to <var title="">E</var> whose contents is a + text node with its data set to <var title="">id</var>.</p> + + </li> + + <li> + + <p>If <var title="">has-alternate</var> is true: Let <var + title="">x</var> be a <code title="">link</code> element in the + <span>Atom namespace</span>. Add a <code title="">rel</code> + attribute whose value is the string "<code + title="">alternate</code>" to <var title="">x</var>. 
Append a + text node with its data set to <var title="">id</var> to <var + title="">x</var>. Append <var title="">x</var> to <var + title="">E</var>.</p> + + </li> + + <li> + + <p>If <var title="">article</var> has a <code + title="attr-article-pubdate">pubdate</code> attribute, and <span + title="parse a global date and time string">parsing that + attribute's value as a global date and time string</span> does + not result in an error, then let <var title="">publication + date</var> be the value of that attribute.</p> + + <p>Otherwise, let <var title="">publication date</var> have no + value.</p> + + </li> + + <li> + + <p>If <var title="">update date</var> has no value but <var + title="">publication date</var> does, then let <var + title="">update date</var> have the value of <var + title="">publication date</var>.</p> + + <p>Otherwise, if <var title="">publication date</var> has no + value but <var title="">update date</var> does, then let <var + title="">publication date</var> have the value of <var + title="">update date</var>.</p> + + </li> + + <li> + + <p>If <var title="">update date</var> has a value, and <var + title="">global update date</var> has no value or is less recent + than <var title="">update date</var>, then let <var + title="">global update date</var> have the value of <var + title="">update date</var>.</p> + + </li> + + <li> + + <p>If <var title="">publication date</var> and <var + title="">update date</var> both still have no value, then let + them both have a value that is a <span>valid global date and + time string</span> representing the <span + title="concept-datetime">global date and time</span> of the + moment that this algorithm was invoked.</p> + + </li> + + <li> + + <p>Append a <code title="">published</code> element in the + <span>Atom namespace</span> to <var title="">E</var> whose + contents is a text node with its data set to <var + title="">publication date</var>.</p> + + </li> + + <li> + + <p>Append an <code 
title="">updated</code> element in the + <span>Atom namespace</span> to <var title="">E</var> whose + contents is a text node with its data set to <var + title="">update date</var>.</p> + + </li> + + </ol> + + </li> + + <li> + + <p>If <var title="">global update date</var> has no value, then + let it have a value that is a <span>valid global date and time + string</span> representing the <span + title="concept-datetime">global date and time</span> of the date + and time of the <code>Document</code>'s source file's last + modification, if it is known, or else of the moment that this + algorithm was invoked.</p> + + </li> + + <li> + + <p>Insert an <code title="">updated</code> element in the + <span>Atom namespace</span> into the root element of <var + title="">R</var> before the first <code title="">entry</code> in + the <span>Atom namespace</span> whose contents is a text node with + its data set to <var title="">global update date</var>.</p> + + </li> + + <li><p>Return the Atom document <var title="">R</var>.</p></li> + + </ol> + + <p>The <dfn>Atom namespace</dfn> is: <code>http://www.w3.org/2005/Atom</code></p> + + + <h2 id="browsers">Web browsers</h2>
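The date handling in the Atom extraction steps above is easy to misread, so here is a rough Python sketch of just that part: how each entry's published and updated values fall back on each other, and how the feed-level updated value is chosen. It is not part of the patch. The function names (parse_global, entry_dates, feed_updated) are made up for illustration, datetime.fromisoformat only approximates the spec's "parse a global date and time string" rules, and the sketch works with parsed datetime objects where the spec text keeps the original attribute strings.

```python
from datetime import datetime
from typing import Iterable, Optional, Tuple


def parse_global(value: str) -> Optional[datetime]:
    """Approximation (assumption): ISO 8601 parsing stands in for the spec's parser."""
    text = value[:-1] + "+00:00" if value.endswith("Z") else value
    try:
        parsed = datetime.fromisoformat(text)
    except ValueError:
        return None
    # A global date and time needs a time zone; reject naive values.
    return parsed if parsed.tzinfo is not None else None


def entry_dates(
    pubdate: Optional[str],
    mod_datetimes: Iterable[str],
    now: datetime,
) -> Tuple[datetime, datetime, Optional[datetime]]:
    """Return (published, updated, candidate) for one article.

    mod_datetimes holds the datetime attribute values of the ins/del
    elements in the cloned article. candidate is what the entry offers
    to the feed-level date; it is None when the entry has no dates of
    its own.
    """
    parsed = [d for d in map(parse_global, mod_datetimes) if d is not None]
    updated = max(parsed) if parsed else None
    published = parse_global(pubdate) if pubdate is not None else None

    # Each date fills in for the other when only one of them is known.
    if updated is None and published is not None:
        updated = published
    elif published is None and updated is not None:
        published = updated

    # The feed-level comparison happens before the "now" fallback, so an
    # entry without any dates never influences the feed's updated value.
    candidate = updated

    if published is None and updated is None:
        published = updated = now
    return published, updated, candidate


def feed_updated(
    candidates: Iterable[Optional[datetime]],
    last_modified: Optional[datetime],
    now: datetime,
) -> datetime:
    """Newest entry date, else the file's last modification, else now."""
    known = [c for c in candidates if c is not None]
    if known:
        return max(known)
    return last_modified if last_modified is not None else now
```

Passing now in explicitly keeps the "moment that this algorithm was invoked" the same for every entry in a run, which seems to be what the steps intend.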