Bug 293 – Turns out that Zs isn't what we want; we want White_Space

Summary:	Turns out that Zs isn't what we want; we want White_Space

Status:	NEW

Product:	Validator.nu
Classification:	Unclassified
Component:	Non-schema checkers
Version:	HEAD
Hardware:	All All

Importance:	P2 normal
Assigned To:	Nobody

URL:

Depends on:
Blocks:

Reported:	2008-08-21 15:49 CEST by Henri Sivonen
Modified:	2009-11-23 17:16 CET (History)
CC List:	0 users

See Also:

Description Henri Sivonen 2008-08-21 15:49:46 CEST

Modified: index
===================================================================
--- index	2008-08-21 09:34:06 UTC (rev 2093)
+++ index	2008-08-21 09:46:57 UTC (rev 2094)
@@ -3092,6 +3092,10 @@
   TABULATION (tab), U+000A LINE FEED (LF), U+000C FORM FEED (FF), and U+000D
   CARRIAGE RETURN (CR).

+  <p>The <dfn id=whitespace title=White_Space>White_Space characters</dfn>
+   are those that have the Unicode property "White_Space". <a
+   href="#refsUNICODE">[UNICODE]</a>
+
  <p>Some of the micro-parsers described below follow the pattern of having
   an <var title="">input</var> variable that holds the string being parsed,
   and having a <var title="">position</var> variable pointing at the next
@@ -3126,10 +3130,11 @@
  <p>The step <dfn id=skip-whitespace>skip whitespace</dfn> means that the
   user agent must <a href="#collect">collect a sequence of characters</a>
   that are <a href="#space" title="space character">space characters</a>.
-   The step <dfn id=skip->skip Zs characters</dfn> means that the user agent
-   must <a href="#collect">collect a sequence of characters</a> that are in
-   the Unicode character class Zs. In both cases, the collected characters
-   are not used. <a href="#refsUNICODE">[UNICODE]</a>
+   The step <dfn id=skip->skip White_Space characters</dfn> means that the
+   user agent must <a href="#collect">collect a sequence of characters</a>
+   that are <a href="#whitespace">White_Space</a> characters. In both cases,
+   the collected characters are not used. <a
+   href="#refsUNICODE">[UNICODE]</a>

  <h4 id=boolean><span class=secno>2.4.2 </span>Boolean attributes</h4>

@@ -3485,9 +3490,9 @@
    sub-algorithm in step 2.

   <li>Starting with the character immediately after the last one examined by
-    the sub-algorithm in step 2, skip any characters in the string that are
-    in the Unicode character class Zs (this might match zero characters). <a
-    href="#refsUNICODE">[UNICODE]</a>
+    the sub-algorithm in step 2, skip all <a
+    href="#whitespace">White_Space</a> characters in the string (this might
+    match zero characters).

   <li>If there are still further characters in the string, and the next
    character in the string is a <a href="#valid2">valid denominator
@@ -3513,9 +3518,9 @@
    sub-algorithm in step 9.

   <li>Starting with the character immediately after the last one examined by
-    the sub-algorithm in step 9, skip any characters in the string that are
-    in the Unicode character class Zs (this might match zero characters). <a
-    href="#refsUNICODE">[UNICODE]</a>
+    the sub-algorithm in step 9, skip all <a
+    href="#whitespace">White_Space</a> characters in the string (this might
+    match zero characters).

   <li>If there are still further characters in the string, and the next
    character in the string is a <a href="#valid2">valid denominator
@@ -4214,9 +4219,9 @@
   <!-- LEADING WHITESPACE -->

   <li>
-    <p>For the "in content" variant: <a href="#skip-">skip Zs characters</a>;
-     for the "in attributes" variant: <a href="#skip-whitespace">skip
-     whitespace</a>.
+    <p>For the "in content" variant: <a href="#skip-">skip White_Space
+     characters</a>; for the "in attributes" variant: <a
+     href="#skip-whitespace">skip whitespace</a>.
   </li>
   <!-- XXX skip whitespace in attribute?
   really? -->
@@ -4321,7 +4326,7 @@
     <!-- WHITESPACE -->

     <li>
-      <p>For the "in content" variant: <a href="#skip-">skip Zs
+      <p>For the "in content" variant: <a href="#skip-">skip White_Space
       characters</a>; for the "in attributes" variant: <a
       href="#skip-whitespace">skip whitespace</a>.

@@ -4331,7 +4336,7 @@
       character.

     <li>
-      <p>For the "in content" variant: <a href="#skip-">skip Zs
+      <p>For the "in content" variant: <a href="#skip-">skip White_Space
       characters</a>; for the "in attributes" variant: <a
       href="#skip-whitespace">skip whitespace</a>.
     </li>
@@ -4440,7 +4445,7 @@

    <ol>
     <li>
-      <p>For the "in content" variant: <a href="#skip-">skip Zs
+      <p>For the "in content" variant: <a href="#skip-">skip White_Space
       characters</a>; for the "in attributes" variant: <a
       href="#skip-whitespace">skip whitespace</a>.

@@ -4546,9 +4551,9 @@
    </ol>

   <li>
-    <p>For the "in content" variant: <a href="#skip-">skip Zs characters</a>;
-     for the "in attributes" variant: <a href="#skip-whitespace">skip
-     whitespace</a>.
+    <p>For the "in content" variant: <a href="#skip-">skip White_Space
+     characters</a>; for the "in attributes" variant: <a
+     href="#skip-whitespace">skip whitespace</a>.

   <li>
    <p>If <var title="">position</var> is <em>not</em> past the end of <var
@@ -26219,9 +26224,8 @@
   href="#empty0" title="empty data cell">empty data cells</a>.

  <p>A data cell is said to be an <dfn id=empty0>empty data cell</dfn> if it
-   contains no elements and its text content, if any, consists only of
-   characters in the Unicode character class Zs. <a
-   href="#refsUNICODE">[UNICODE]</a>
+   contains no elements and its text content, if any, consists only of <a
+   href="#whitespace">White_Space</a> characters.

  <p>User agents may remove <a href="#empty0" title="empty data cell">empty
   data cells</a> when analyzing data in a <a href="#table1"

Modified: source
===================================================================
--- source	2008-08-21 09:34:06 UTC (rev 2093)
+++ source	2008-08-21 09:46:57 UTC (rev 2094)
@@ -1044,6 +1044,10 @@
  TABULATION (tab), U+000A LINE FEED (LF), U+000C FORM FEED (FF), and
  U+000D CARRIAGE RETURN (CR).</p>

+  <p>The <dfn title="White_Space">White_Space characters</dfn> are
+  those that have the Unicode property "White_Space". <a
+  href="#refsUNICODE">[UNICODE]</a></p>
+
  <p>Some of the micro-parsers described below follow the pattern of
  having an <var title="">input</var> variable that holds the string
  being parsed, and having a <var title="">position</var> variable
@@ -1077,10 +1081,10 @@
  <p>The step <dfn>skip whitespace</dfn> means that the user agent
  must <span>collect a sequence of characters</span> that are <span
  title="space character">space characters</span>. The step <dfn>skip
-  Zs characters</dfn> means that the user agent must <span>collect a
-  sequence of characters</span> that are in the Unicode character
-  class Zs. In both cases, the collected characters are not used. <a
-  href="#refsUNICODE">[UNICODE]</a></p>
+  White_Space characters</dfn> means that the user agent must
+  <span>collect a sequence of characters</span> that are
+  <span>White_Space</span> characters. In both cases, the collected
+  characters are not used. <a href="#refsUNICODE">[UNICODE]</a></p>


  <h4>Boolean attributes</h4>
@@ -1464,9 +1468,9 @@
   sub-algorithm in step 2.</li>

   <li>Starting with the character immediately after the last one
-   examined by the sub-algorithm in step 2, skip any characters in the
-   string that are in the Unicode character class Zs (this might match
-   zero characters). <a href="#refsUNICODE">[UNICODE]</a></li>
+   examined by the sub-algorithm in step 2, skip all
+   <span>White_Space</span> characters in the string (this might match
+   zero characters).</li>

   <li>If there are still further characters in the string, and the
   next character in the string is a <span>valid denominator
@@ -1493,9 +1497,9 @@
   sub-algorithm in step 9.</li>

   <li>Starting with the character immediately after the last one
-   examined by the sub-algorithm in step 9, skip any characters in the
-   string that are in the Unicode character class Zs (this might match
-   zero characters). <a href="#refsUNICODE">[UNICODE]</a></li>
+   examined by the sub-algorithm in step 9, skip all
+   <span>White_Space</span> characters in the string (this might match
+   zero characters).</li>

   <li>If there are still further characters in the string, and the
   next character in the string is a <span>valid denominator
@@ -2237,7 +2241,7 @@
   returned as the result of the algorithm.</p></li>

   <!-- LEADING WHITESPACE -->
-   <li><p>For the "in content" variant: <span>skip Zs
+   <li><p>For the "in content" variant: <span>skip White_Space
   characters</span>; for the "in attributes" variant: <span>skip
   whitespace</span>.</p></li><!-- XXX skip whitespace in attribute?
   really? -->
@@ -2331,7 +2335,7 @@
     error, with just a date. -->

     <!-- WHITESPACE -->
-     <li><p>For the "in content" variant: <span>skip Zs
+     <li><p>For the "in content" variant: <span>skip White_Space
     characters</span>; for the "in attributes" variant: <span>skip
     whitespace</span>.</p></li>

@@ -2339,7 +2343,7 @@
     LATIN CAPITAL LETTER T, then move <var title="">position</var>
     forwards one character.</p></li>

-     <li><p>For the "in content" variant: <span>skip Zs
+     <li><p>For the "in content" variant: <span>skip White_Space
     characters</span>; for the "in attributes" variant: <span>skip
     whitespace</span>.</p></li>

@@ -2440,7 +2444,7 @@

    <ol>

-     <li><p>For the "in content" variant: <span>skip Zs
+     <li><p>For the "in content" variant: <span>skip White_Space
     characters</span>; for the "in attributes" variant: <span>skip
     whitespace</span>.</p></li>

@@ -2541,7 +2545,7 @@

   </li>

-   <li><p>For the "in content" variant: <span>skip Zs
+   <li><p>For the "in content" variant: <span>skip White_Space
   characters</span>; for the "in attributes" variant: <span>skip
   whitespace</span>.</p></li>

@@ -23618,8 +23622,7 @@

  <p>A data cell is said to be an <dfn>empty data cell</dfn> if it
  contains no elements and its text content, if any, consists only of
-  characters in the Unicode character class Zs. <a
-  href="#refsUNICODE">[UNICODE]</a></p>
+  <span>White_Space</span> characters.</p>

  <p>User agents may remove <span title="empty data cell">empty data
  cells</span> when analyzing data in a <span
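
For checker work, the practical difference between the old and new wording in the diff above can be sketched in Python. This is only an illustration under stated assumptions, not Validator.nu code: the names `WHITE_SPACE`, `is_zs`, `is_white_space` and `skip_chars` are hypothetical, and the White_Space set is hardcoded from the Unicode `PropList.txt` data as found in recent Unicode versions (older versions also listed U+180E). The general category Zs does not cover tab, LF, FF, CR, U+0085, U+2028 or U+2029, while the White_Space property does.

```python
import unicodedata

# Code points with the Unicode property White_Space, hardcoded here as an
# assumption taken from PropList.txt in recent Unicode versions.
WHITE_SPACE = frozenset([
    *range(0x0009, 0x000E),   # TAB, LF, VT, FF, CR
    0x0020, 0x0085, 0x00A0, 0x1680,
    *range(0x2000, 0x200B),   # EN QUAD .. HAIR SPACE
    0x2028, 0x2029, 0x202F, 0x205F, 0x3000,
])


def is_zs(ch: str) -> bool:
    """Old wording: character is in the general category Zs."""
    return unicodedata.category(ch) == "Zs"


def is_white_space(ch: str) -> bool:
    """New wording: character has the White_Space property."""
    return ord(ch) in WHITE_SPACE


def skip_chars(input_: str, position: int, predicate) -> int:
    """Collect a sequence of matching characters and discard them,
    returning the new position (which may equal the old one)."""
    while position < len(input_) and predicate(input_[position]):
        position += 1
    return position


sample = "\t\n 2008-08-21"
print(skip_chars(sample, 0, is_zs))           # 0: the leading tab is not Zs
print(skip_chars(sample, 0, is_white_space))  # 3: tab, LF and space skipped
```

With the Zs predicate, leading tabs and line breaks in "in content" values are never skipped, which is the behaviour the spec change removes.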