Bugzilla – Bug 293
Turns out that Zs isn't what we want; we want White_Space
Last modified: 2009-11-23 17:16:58 CET
Modified: index
===================================================================
--- index 2008-08-21 09:34:06 UTC (rev 2093)
+++ index 2008-08-21 09:46:57 UTC (rev 2094)
@@ -3092,6 +3092,10 @@
  TABULATION (tab), U+000A LINE FEED (LF), U+000C FORM FEED (FF), and U+000D CARRIAGE RETURN (CR).
+ <p>The <dfn id=whitespace title=White_Space>White_Space characters</dfn>
+ are those that have the Unicode property "White_Space". <a
+ href="#refsUNICODE">[UNICODE]</a>
+
  <p>Some of the micro-parsers described below follow the pattern of having an <var title="">input</var> variable that holds the string being parsed, and having a <var title="">position</var> variable pointing at the next
@@ -3126,10 +3130,11 @@
  <p>The step <dfn id=skip-whitespace>skip whitespace</dfn> means that the user agent must <a href="#collect">collect a sequence of characters</a> that are <a href="#space" title="space character">space characters</a>.
- The step <dfn id=skip->skip Zs characters</dfn> means that the user agent
- must <a href="#collect">collect a sequence of characters</a> that are in
- the Unicode character class Zs. In both cases, the collected characters
- are not used. <a href="#refsUNICODE">[UNICODE]</a>
+ The step <dfn id=skip->skip White_Space characters</dfn> means that the
+ user agent must <a href="#collect">collect a sequence of characters</a>
+ that are <a href="#whitespace">White_Space</a> characters. In both cases,
+ the collected characters are not used. <a
+ href="#refsUNICODE">[UNICODE]</a>
  <h4 id=boolean><span class=secno>2.4.2 </span>Boolean attributes</h4>
@@ -3485,9 +3490,9 @@
  sub-algorithm in step 2. <li>Starting with the character immediately after the last one examined by
- the sub-algorithm in step 2, skip any characters in the string that are
- in the Unicode character class Zs (this might match zero characters). <a
- href="#refsUNICODE">[UNICODE]</a>
+ the sub-algorithm in step 2, skip all <a
+ href="#whitespace">White_Space</a> characters in the string (this might
+ match zero characters).
  <li>If there are still further characters in the string, and the next character in the string is a <a href="#valid2">valid denominator
@@ -3513,9 +3518,9 @@
  sub-algorithm in step 9. <li>Starting with the character immediately after the last one examined by
- the sub-algorithm in step 9, skip any characters in the string that are
- in the Unicode character class Zs (this might match zero characters). <a
- href="#refsUNICODE">[UNICODE]</a>
+ the sub-algorithm in step 9, skip all <a
+ href="#whitespace">White_Space</a> characters in the string (this might
+ match zero characters).
  <li>If there are still further characters in the string, and the next character in the string is a <a href="#valid2">valid denominator
@@ -4214,9 +4219,9 @@
  <!-- LEADING WHITESPACE --> <li>
- <p>For the "in content" variant: <a href="#skip-">skip Zs characters</a>;
- for the "in attributes" variant: <a href="#skip-whitespace">skip
- whitespace</a>.
+ <p>For the "in content" variant: <a href="#skip-">skip White_Space
+ characters</a>; for the "in attributes" variant: <a
+ href="#skip-whitespace">skip whitespace</a>.
  </li> <!-- XXX skip whitespace in attribute? really? -->
@@ -4321,7 +4326,7 @@
  <!-- WHITESPACE --> <li>
- <p>For the "in content" variant: <a href="#skip-">skip Zs
+ <p>For the "in content" variant: <a href="#skip-">skip White_Space
  characters</a>; for the "in attributes" variant: <a href="#skip-whitespace">skip whitespace</a>.
@@ -4331,7 +4336,7 @@
  character. <li>
- <p>For the "in content" variant: <a href="#skip-">skip Zs
+ <p>For the "in content" variant: <a href="#skip-">skip White_Space
  characters</a>; for the "in attributes" variant: <a href="#skip-whitespace">skip whitespace</a>. </li>
@@ -4440,7 +4445,7 @@
  <ol> <li>
- <p>For the "in content" variant: <a href="#skip-">skip Zs
+ <p>For the "in content" variant: <a href="#skip-">skip White_Space
  characters</a>; for the "in attributes" variant: <a href="#skip-whitespace">skip whitespace</a>.
@@ -4546,9 +4551,9 @@
  </ol> <li>
- <p>For the "in content" variant: <a href="#skip-">skip Zs characters</a>;
- for the "in attributes" variant: <a href="#skip-whitespace">skip
- whitespace</a>.
+ <p>For the "in content" variant: <a href="#skip-">skip White_Space
+ characters</a>; for the "in attributes" variant: <a
+ href="#skip-whitespace">skip whitespace</a>.
  <li> <p>If <var title="">position</var> is <em>not</em> past the end of <var
@@ -26219,9 +26224,8 @@
  href="#empty0" title="empty data cell">empty data cells</a>. <p>A data cell is said to be an <dfn id=empty0>empty data cell</dfn> if it
- contains no elements and its text content, if any, consists only of
- characters in the Unicode character class Zs. <a
- href="#refsUNICODE">[UNICODE]</a>
+ contains no elements and its text content, if any, consists only of <a
+ href="#whitespace">White_Space</a> characters.
  <p>User agents may remove <a href="#empty0" title="empty data cell">empty data cells</a> when analyzing data in a <a href="#table1"

Modified: source
===================================================================
--- source 2008-08-21 09:34:06 UTC (rev 2093)
+++ source 2008-08-21 09:46:57 UTC (rev 2094)
@@ -1044,6 +1044,10 @@
  TABULATION (tab), U+000A LINE FEED (LF), U+000C FORM FEED (FF), and U+000D CARRIAGE RETURN (CR).</p>
+ <p>The <dfn title="White_Space">White_Space characters</dfn> are
+ those that have the Unicode property "White_Space". <a
+ href="#refsUNICODE">[UNICODE]</a></p>
+
  <p>Some of the micro-parsers described below follow the pattern of having an <var title="">input</var> variable that holds the string being parsed, and having a <var title="">position</var> variable
@@ -1077,10 +1081,10 @@
  <p>The step <dfn>skip whitespace</dfn> means that the user agent must <span>collect a sequence of characters</span> that are <span title="space character">space characters</span>. The step <dfn>skip
- Zs characters</dfn> means that the user agent must <span>collect a
- sequence of characters</span> that are in the Unicode character
- class Zs. In both cases, the collected characters are not used. <a
- href="#refsUNICODE">[UNICODE]</a></p>
+ White_Space characters</dfn> means that the user agent must
+ <span>collect a sequence of characters</span> that are
+ <span>White_Space</span> characters. In both cases, the collected
+ characters are not used. <a href="#refsUNICODE">[UNICODE]</a></p>
  <h4>Boolean attributes</h4>
@@ -1464,9 +1468,9 @@
  sub-algorithm in step 2.</li> <li>Starting with the character immediately after the last one
- examined by the sub-algorithm in step 2, skip any characters in the
- string that are in the Unicode character class Zs (this might match
- zero characters). <a href="#refsUNICODE">[UNICODE]</a></li>
+ examined by the sub-algorithm in step 2, skip all
+ <span>White_Space</span> characters in the string (this might match
+ zero characters).</li>
  <li>If there are still further characters in the string, and the next character in the string is a <span>valid denominator
@@ -1493,9 +1497,9 @@
  sub-algorithm in step 9.</li> <li>Starting with the character immediately after the last one
- examined by the sub-algorithm in step 9, skip any characters in the
- string that are in the Unicode character class Zs (this might match
- zero characters). <a href="#refsUNICODE">[UNICODE]</a></li>
+ examined by the sub-algorithm in step 9, skip all
+ <span>White_Space</span> characters in the string (this might match
+ zero characters).</li>
  <li>If there are still further characters in the string, and the next character in the string is a <span>valid denominator
@@ -2237,7 +2241,7 @@
  returned as the result of the algorithm.</p></li> <!-- LEADING WHITESPACE -->
- <li><p>For the "in content" variant: <span>skip Zs
+ <li><p>For the "in content" variant: <span>skip White_Space
  characters</span>; for the "in attributes" variant: <span>skip whitespace</span>.</p></li><!-- XXX skip whitespace in attribute? really? -->
@@ -2331,7 +2335,7 @@
  error, with just a date. --> <!-- WHITESPACE -->
- <li><p>For the "in content" variant: <span>skip Zs
+ <li><p>For the "in content" variant: <span>skip White_Space
  characters</span>; for the "in attributes" variant: <span>skip whitespace</span>.</p></li>
@@ -2339,7 +2343,7 @@
  LATIN CAPITAL LETTER T, then move <var title="">position</var> forwards one character.</p></li>
- <li><p>For the "in content" variant: <span>skip Zs
+ <li><p>For the "in content" variant: <span>skip White_Space
  characters</span>; for the "in attributes" variant: <span>skip whitespace</span>.</p></li>
@@ -2440,7 +2444,7 @@
  <ol>
- <li><p>For the "in content" variant: <span>skip Zs
+ <li><p>For the "in content" variant: <span>skip White_Space
  characters</span>; for the "in attributes" variant: <span>skip whitespace</span>.</p></li>
@@ -2541,7 +2545,7 @@
  </li>
- <li><p>For the "in content" variant: <span>skip Zs
+ <li><p>For the "in content" variant: <span>skip White_Space
  characters</span>; for the "in attributes" variant: <span>skip whitespace</span>.</p></li>
@@ -23618,8 +23622,7 @@
  <p>A data cell is said to be an <dfn>empty data cell</dfn> if it contains no elements and its text content, if any, consists only of
- characters in the Unicode character class Zs. <a
- href="#refsUNICODE">[UNICODE]</a></p>
+ <span>White_Space</span> characters.</p>
  <p>User agents may remove <span title="empty data cell">empty data cells</span> when analyzing data in a <span
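
For readers who want to see why the switch from Zs to White_Space matters, here is a minimal Python sketch of the "skip White_Space characters" step and the "empty data cell" text check. It is an illustration, not the spec's or any browser's implementation. The function names and the `WHITE_SPACE` table are assumptions introduced here; the table lists the code points that recent Unicode versions give the White_Space property (older versions also included U+180E), so it should be regenerated from the Unicode data actually targeted. `text` and `position` stand in for the spec's input and position variables.

```python
import unicodedata

# Assumption: code points carrying the White_Space property in recent
# Unicode versions (PropList.txt). Regenerate for the version you target.
WHITE_SPACE = frozenset(
    [*range(0x0009, 0x000E), 0x0020, 0x0085, 0x00A0, 0x1680,
     *range(0x2000, 0x200B), 0x2028, 0x2029, 0x202F, 0x205F, 0x3000]
)


def is_zs(ch: str) -> bool:
    """True if ch is in the general category Zs (space separators only)."""
    return unicodedata.category(ch) == "Zs"


def is_white_space(ch: str) -> bool:
    """True if ch has the White_Space property, per the table above."""
    return ord(ch) in WHITE_SPACE


def skip_white_space(text: str, position: int) -> int:
    """Advance position past any run of White_Space characters.

    The collected characters are discarded; only the new position is
    returned. A run of zero characters leaves position unchanged.
    """
    while position < len(text) and is_white_space(text[position]):
        position += 1
    return position


def is_empty_cell_text(text: str) -> bool:
    """True if text consists only of White_Space characters (or is empty)."""
    return all(is_white_space(ch) for ch in text)


if __name__ == "__main__":
    # Tab and line feed are White_Space but not Zs, which is the gap
    # the change closes.
    for ch in ["\t", "\n", " ", "\u00a0", "\u2028"]:
        print(f"U+{ord(ch):04X} Zs={is_zs(ch)} White_Space={is_white_space(ch)}")
```

Under a Zs-only rule, a tab or newline before a date component would stop the skip step, and a table cell holding only a line break would not count as empty; with White_Space both cases are covered.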