NOTE: The current preferred location for bug reports is the GitHub issue tracker.
Bug 1003 - Why is Unit Separator x1F FORBIDDEN?
Status: RESOLVED INTENTIONAL
Product: Validator.nu
Classification: Unclassified
Component: General
Version: HEAD
Hardware: All All
Importance: P2 normal
Assigned To: Nobody
Reported: 2014-09-17 05:32 CEST by jc
Modified: 2014-09-22 11:25 CEST
Description jc 2014-09-17 05:32:27 CEST
This may be related to bug reports 311 and 520, which are too cryptic for me to understand. Please understand that this is not an accusation of any error on your part. I just want to know whether we are doing something terrible that harms someone else.

When I validate the page http://ahangama.com/liyamu/ (and many others)
I get this error:  Forbidden code point U+001f.

We have a very successful project in which a complex script is romanized and displayed using an orthographic smartfont. The font has many ligatures whose formation has to be prevented in certain contexts. For a few years we used ZWNJ for this purpose. Now, as the project expands, quite a few people, such as Buddhist monks in remote places without modern means of communication, do digitizing work on very old texts. They find it confusing when the computer saves some Notepad files just fine but, for others, issues an error saying that the file contains Unicode characters and that they will lose information if they save it the usual way. What is the correct file type? Will the files not match each other? Am I ruining this project? These are the thoughts that go through these people's minds, and they stop working. The problem is the double-byte ZWNJ.
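
The Notepad behaviour described above can be illustrated with a small sketch. It assumes that the "usual way" of saving is Notepad's ANSI option and that ANSI means Windows-1252 on the machines in question; neither is stated in the report. The helper name `encoded_size` is made up for this illustration.

```python
# Sketch: compare how ZWNJ and Unit Separator encode in a few encodings.
# Assumption: Notepad's "ANSI" option corresponds to Windows-1252 (cp1252).

ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER
US = "\x1f"      # UNIT SEPARATOR (INFORMATION SEPARATOR ONE)


def encoded_size(ch, encoding):
    """Return the encoded byte length of ch, or None if the encoding
    has no representation for it."""
    try:
        return len(ch.encode(encoding))
    except UnicodeEncodeError:
        return None


if __name__ == "__main__":
    for name, ch in (("ZWNJ", ZWNJ), ("US", US)):
        for enc in ("cp1252", "utf-8", "utf-16-le"):
            print(name, enc, encoded_size(ch, enc))
```

Under these assumptions ZWNJ has no Windows-1252 representation at all, which would explain the data-loss warning, while it takes two bytes in UTF-16 and three in UTF-8. US fits in a single byte in Windows-1252 and UTF-8, so a file containing it saves without any prompt.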

So, we changed the keyboard to replace ZWNJ with Unit Separator, US (x1F). Everybody is pleased with it. Now I find this ominous warning, FORBIDDEN, as if a terrible sin had been committed. When I searched, I found this page:
http://www.fileformat.info/info/unicode/char/1f/index.htm

Their report gives US a clear certificate. Probably, they are not as legitimate as you.

May I know why this character is forbidden and what damage it is causing? A 'validated' certification from you would be very reassuring to people, if this is possible.

Thank you for the great public service.

JC
Comment 1 Michael[tm] Smith 2014-09-22 11:25:49 CEST
U+001f is a control character. It's not intended to be used in Web documents. The HTML spec requires conformance checkers to report an error for it. So if you want it to be allowed, you need to report a bug against the HTML spec itself:

  https://www.w3.org/Bugs/Public/enter_bug.cgi?product=WHATWG&component=HTML
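
For anyone who wants to check pages before validating them, here is a minimal sketch of a scan for control characters of this kind. The set it uses is an approximation written for this example (C0 controls other than tab, line feed, form feed and carriage return, plus U+007F to U+009F), not a copy of the validator's own logic; U+0000 and noncharacters are left out. The function names are hypothetical. The last helper shows one possible workaround, which nobody in this thread proposed: writing ZWNJ as the named character reference `&zwnj;` so that the source file stays pure ASCII.

```python
# Sketch: locate control characters that an HTML checker is likely to flag.
# Assumption: the flagged set is approximated as described in the lead-in.

ALLOWED_WHITESPACE = "\t\n\x0c\r"  # tab, line feed, form feed, carriage return


def is_flagged_control(ch):
    """True for C0/C1 control characters other than the allowed whitespace."""
    if ch in ALLOWED_WHITESPACE:
        return False
    cp = ord(ch)
    return 0x01 <= cp <= 0x1F or 0x7F <= cp <= 0x9F


def find_flagged(text):
    """Return (index, 'U+XXXX') pairs for every flagged character in text."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if is_flagged_control(ch)
    ]


def us_to_zwnj_reference(text):
    """Replace each Unit Separator with the ASCII-only reference &zwnj;."""
    return text.replace("\x1f", "&zwnj;")


if __name__ == "__main__":
    sample = "k\x1fs"
    print(find_flagged(sample))
    print(us_to_zwnj_reference(sample))
```

Character references are only expanded in ordinary text and attribute values, not inside script or style elements, so the replacement is meant for page content only. Whether the smartfont treats the resulting ZWNJ the same way as before would need to be checked against the project's own pages.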