Bug 622 - Reduce amount of NamedCharacters generated code
Status: RESOLVED FIXED
Product: Validator.nu
Classification: Unclassified
Component: HTML parser
Version: HEAD
Platform: All All
Importance: P2 enhancement
Assigned To: Ben Newman
Depends on: 623
Blocks:
Reported: 2009-07-21 23:27 CEST by Ben Newman
Modified: 2009-08-07 17:10 CEST (History)

See Also:


Attachments
Generating and #including nsHtml5NamedCharactersInclude.h four times (9.23 KB, patch)
2009-07-21 23:33 CEST, Ben Newman
Updated to reflect changes from bug 623 (11.03 KB, patch)
2009-07-22 02:05 CEST, Ben Newman

Description Ben Newman 2009-07-21 23:27:09 CEST
The C++ code generated by GenerateNamedCharactersCpp.java contains four similar segments of 2100+ lines (initializing NAME_*, VALUE_*, NAMES, and VALUES).  We can reduce the repetition between these segments by consolidating all the named character reference data into a single file that can be #included four times.  This would make the structure of nsHtml5NamedCharacters.cpp easier to understand, and reduce C++ code size by many thousands of lines.
Comment 1 Ben Newman 2009-07-21 23:33:13 CEST
Created attachment 110 [details]
Generating and #including nsHtml5NamedCharactersInclude.h four times

Each line of nsHtml5NamedCharactersInclude.h is a macro call that expands according to how the #including file has defined NAMED_CHARACTER_REFERENCE.  This file can be #included four different times with four different interpretations of NAMED_CHARACTER_REFERENCE.
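For readers unfamiliar with the technique, here is a minimal single-file sketch of the idea, not the code from the patch. The entry layout (index, name, value), the three sample entries, the NAMED_CHARACTER_LIST macro, and the LookupNamedCharacter helper are all assumptions made for illustration; in the patch the entries live in nsHtml5NamedCharactersInclude.h and are pulled in with #include rather than through a list macro.

```cpp
#include <cstddef>
#include <cstring>

// Stand-in for the generated include file (assumption: the real file is
// #included instead; a list macro keeps this sketch in one file).
// Hypothetical entry layout: (index, name, value).
#define NAMED_CHARACTER_LIST                      \
  NAMED_CHARACTER_REFERENCE(0, "amp;", 0x0026)    \
  NAMED_CHARACTER_REFERENCE(1, "gt;", 0x003E)     \
  NAMED_CHARACTER_REFERENCE(2, "lt;", 0x003C)

// Interpretation 1: one NAME_<n> constant per entry.
#define NAMED_CHARACTER_REFERENCE(N, NAME, VALUE) \
  static const char NAME_##N[] = NAME;
NAMED_CHARACTER_LIST
#undef NAMED_CHARACTER_REFERENCE

// Interpretation 2: one VALUE_<n> constant per entry.
#define NAMED_CHARACTER_REFERENCE(N, NAME, VALUE) \
  static const unsigned int VALUE_##N = VALUE;
NAMED_CHARACTER_LIST
#undef NAMED_CHARACTER_REFERENCE

// Interpretation 3: the NAMES table, referring to the NAME_<n> constants.
#define NAMED_CHARACTER_REFERENCE(N, NAME, VALUE) NAME_##N,
static const char* const NAMES[] = { NAMED_CHARACTER_LIST };
#undef NAMED_CHARACTER_REFERENCE

// Interpretation 4: the VALUES table, referring to the VALUE_<n> constants.
#define NAMED_CHARACTER_REFERENCE(N, NAME, VALUE) VALUE_##N,
static const unsigned int VALUES[] = { NAMED_CHARACTER_LIST };
#undef NAMED_CHARACTER_REFERENCE

static const std::size_t NAMED_CHARACTER_COUNT =
    sizeof(NAMES) / sizeof(NAMES[0]);

// Hypothetical helper: linear search over the parallel tables.
// Returns 0 when the name is not listed.
static unsigned int LookupNamedCharacter(const char* name) {
  for (std::size_t i = 0; i < NAMED_CHARACTER_COUNT; ++i) {
    if (std::strcmp(NAMES[i], name) == 0) {
      return VALUES[i];
    }
  }
  return 0;
}
```

Each entry is written once, and the four definitions of NAMED_CHARACTER_REFERENCE decide what that entry turns into, which is where the reduction in generated lines comes from.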

According to my measurements, the generated-code savings are huge:
~/dev/debug/parser/html % hg diff . | diffstat
 nsHtml5NamedCharacters.cpp      |17121 ----------------------------------------
 nsHtml5NamedCharactersInclude.h | 2162 +++++
 2 files changed, 2184 insertions(+), 17099 deletions(-)

The LINE_PATTERN-related changes were necessary because the HTML content of

  http://www.w3.org/TR/html5/named-character-references.html

seems to have changed a bit (no more newlines between <tr>s, slightly different spacing).
Comment 2 Ben Newman 2009-07-22 02:05:01 CEST
Created attachment 112 [details]
Updated to reflect changes from bug 623

This patch needs to be applied in translator-src with patch level -p1.

It should apply cleanly on top of http://bugzilla.validator.nu/attachment.cgi?id=111&action=edit
Comment 3 Henri Sivonen 2009-08-07 17:09:40 CEST
Checked in. Thanks!