Bugzilla – Bug 890
Should be a conformance error if a UTF-16LE/BE file begins with the BOM
Last modified: 2011-12-29 07:21:56 CET
Created attachment 215
File with HTTP level charset=utf-16be plus the "BOM".

PROPOSAL: Including the BOM in a file sent (at HTTP level) with either the UTF-16LE label or the UTF-16BE label should be seen as an error. The validator should report the error and recommend using 'utf-16' instead. (Theoretically, one could recommend removing the BOM too, but I am reluctant to do that.)

BACKGROUND: The UTF-16 spec states: <http://tools.ietf.org/html/rfc2781#section-3.3>

'Systems labelling UTF-16LE text MUST NOT prepend a BOM to the text.'
'Systems labelling UTF-16BE text MUST NOT prepend a BOM to the text.'

(Note that the above does not mean that the *text content* cannot begin with the BOM - it only means that one must add a BOM *before* the content.)

As a matter of fact, however, Web browsers do accept files which contain the BOM before the DOCTYPE even if the file, at HTTP level, is labelled UTF-16LE/BE. Some browsers (IE/Webkit) do *read* the BOM of such files before removing it from the output (to avoid quirks-mode), while other browsers (Opera/Firefox) simply remove it from the output, without reading it. Both methods indicate that the character is treated as a BOM since, per the UTF-16, the [...]

Effectively, this means that browsers treat UTF-16LE/BE as mislabeled UTF-16, since otherwise it would not be justified to remove the BOM. The disadvantage of not doing it this way is that the page would have to be placed in quirks-mode due to the illegal BOM character before the DOCTYPE.
(In reply to comment #0)
> - it only means that one must add a BOM *before* the content.)

Sorry. Typo. Meant: "one must NOT add a BOM *before* the content."
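
For illustration only, the check proposed in comment #0 could be sketched roughly as follows. This is not the validator's actual code: the function name `check_utf16_bom` and the wording of the message are hypothetical, and the sketch assumes the HTTP-level charset label and the raw response bytes are already available. It only flags U+FEFF encoded in the labelled byte order.

```python
import codecs

# Byte sequences for U+FEFF in each explicitly labelled byte order.
_BOMS = {
    "utf-16le": codecs.BOM_UTF16_LE,  # b"\xff\xfe"
    "utf-16be": codecs.BOM_UTF16_BE,  # b"\xfe\xff"
}


def check_utf16_bom(http_charset, body):
    """Return an error message if a UTF-16LE/BE labelled body starts with a BOM.

    Hypothetical helper: returns None when there is nothing to report,
    including for the plain 'utf-16' label, where a BOM is permitted.
    """
    label = http_charset.strip().lower()
    bom = _BOMS.get(label)
    if bom is not None and body.startswith(bom):
        return (
            "Error: document labelled %s begins with a BOM "
            "(RFC 2781, section 3.3); consider labelling it 'utf-16' instead."
            % label
        )
    return None


if __name__ == "__main__":
    # A file like attachment 215: charset=utf-16be plus the BOM.
    print(check_utf16_bom("utf-16be", b"\xfe\xff\x00<\x00!"))
```

The message recommends relabelling rather than removing the BOM, in line with the reluctance expressed in the proposal.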