Closed Bug 501082 Opened 15 years ago Closed 15 years ago

[HTML5] MSVC takes forever to optimize nsHtml5NamedCharacters::initializeStatics

Categories

(Core :: DOM: HTML Parser, defect, P4)

All
Windows XP
defect

Tracking

()

RESOLVED FIXED

People

(Reporter: Gavin, Assigned: dbaron)

References

()

Details

Attachments

(3 files, 3 obsolete files)

I landed a workaround: https://hg.mozilla.org/mozilla-central/rev/f6333b700d60

We could experiment with disabling specific optimizations (per http://msdn.microsoft.com/en-us/library/chh3fb0k.aspx ) and seeing which one is to blame.
Summary: [HTML 5] MSVC takes forever to optimize nsHtml5NamedCharacters::initializeStatics → [HTML5] MSVC takes forever to optimize nsHtml5NamedCharacters::initializeStatics
this is owning me in scratchbox as well.  This file is pretty gross. :(
URL points to callgrind profile of gcc 4.3 compiling this file.

Maybe a gcc expert can infer what bad behavior this file triggers?
This makes compilation of this file much much faster with gcc, and I presume it does the same with MSVC.  It also makes the resulting code significantly smaller.  (Storing data in the form of code is generally a bad idea.)

The approach taken here, though, has some costs in terms of library loading, since the big array is full of pointers that need to be relocated.  One could write a slightly more complicated (and also more space-efficient and time-efficient) patch that put all the strings in one big buffer and stored indices into that buffer.

I might do that, but before I set out to do that, I'd want to know whether these classes required their strings to be null-terminated.  (It looks like they might not, which would mean we wouldn't need to waste all that space on nulls.)


I just now found the code that generates this code.
parser/html/java/htmlparser/translator-src/nu/validator/htmlparser/generator/GenerateNamedCharactersCpp.java
But, boy, it should really print a message at the top of the generated file saying what generated it.

If you're ok with these code changes, I can attack the generator next, although maybe I should just go straight to the fancier patch if I'm going to have a fight with the generator.
Assignee: nobody → dbaron
Status: NEW → ASSIGNED
Attachment #429642 - Flags: review?
Attachment #429642 - Flags: review? → review?(hsivonen)
(In reply to comment #3)
> I just now found the code that generates this code.
> parser/html/java/htmlparser/translator-src/nu/validator/htmlparser/generator/GenerateNamedCharactersCpp.java
> But, boy, it should really print a message at the top of the generated file
> saying what generated it.
> 
> If you're ok with these code changes, I can attack the generator next, although
> maybe I should just go straight to the fancier patch if I'm going to have a
> fight with the generator.

If you decide to make changes to the generator, be aware that the now-canonical version of GenerateNamedCharactersCpp.java is

http://hg.mozilla.org/projects/htmlparser/file/546412142175/translator-src/nu/validator/htmlparser/generator/GenerateNamedCharactersCpp.java

You can re-run the generator by cd'ing to parser/html/java in a mozilla-central tree and typing

  make sync
  make named_characters
(In reply to comment #4)
> If you decide to make changes to the generator, be aware that the now-canonical
> version of GenerateNamedCharactersCpp.java is
> 
> http://hg.mozilla.org/projects/htmlparser/file/546412142175/translator-src/nu/validator/htmlparser/generator/GenerateNamedCharactersCpp.java

Please ignore that part of my comment.  We're talking about the same thing.  I misread your comment.
Blocking 506090 since dbaron's compilation/initialization optimizations might interfere with hsivonen's runtime optimizations.
Blocks: 506090
This should be considerably more efficient than the previous patch, both at startup time (no relocations within const arrays) and because we're storing less data (no useless null terminators, and all information about an entity fitting within 64 bits since we can just store 4 16-bit integers).

One disadvantage of this approach is that it doesn't do any correctness checking on the LEN and SIZE arguments of the macro, but presumably test failures would catch that.


I still haven't attempted to attack the generator (and I'd welcome somebody else doing it, too).
Attachment #429642 - Attachment is obsolete: true
Attachment #429650 - Flags: review?(hsivonen)
Attachment #429642 - Flags: review?(hsivonen)
Oh, and I realize the patch I attached does all sorts of wacky things regarding style of variable names.  I was having trouble deciding what the surrounding style was.  Advice welcome.
(In reply to comment #6)
> Blocking 506090 since dbaron's compilation/initialization optimizations might
> interfere with hsivonen's runtime optimizations.

They interfere with the generator changes made for the runtime optimizations.

(In reply to comment #8)
> Oh, and I realize the patch I attached does all sorts of wacky things regarding
> style of variable names.  I was having trouble deciding what the surrounding
> style was.  Advice welcome.

The surrounding style is:
If a variable name was originally written as Java code, it follows the Java conventions (i.e. static final is uppercase with underscores), because making the translation change naming style was considered unnecessary. And if the code was originally written as C++, it follows C++ conventions (i.e. macros are uppercase with underscores).

(In reply to comment #9)
> And I missed two occurrences of |static| for ALL_NAMES and ALL_VALUES; fixed in
> http://hg.mozilla.org/users/dbaron_mozilla.com/patches/raw-file/f2e7a20db535/html5-named-characters

The approach taken for the names looks good to me. Thank you.

I'm not marking this as r+, though, because I think the value approach in bug 506090 is simpler and any approach needs to go into the generator (without conflicting with bug 506090) instead of going into the generated file.

(In reply to comment #7)
> I still haven't attempted to attack the generator (and I'd welcome somebody
> else doing it, too).

I'll do it.
Merged to tip of mozilla-central.
Attachment #429650 - Attachment is obsolete: true
Attachment #429650 - Flags: review?(hsivonen)
Comment on attachment 434608 [details] [diff] [review]
store data as data, don't store data as code (approach without relocations)

r=hsivonen in the sense that this is desirable as the output of the generator.

I started changing the generator to incorporate these changes, but in the process I noticed that I can no longer compile NamedCharacters.java using javac, so I need to sort that out first. (The named character data is so fun with compilers. I changed the code in Eclipse and ecj optimized the static initializer to under 2^16 bytes but javac doesn't, so the output isn't a legal .class file in the javac case.)
Attachment #434608 - Flags: review?(hsivonen) → review+
Comment on attachment 434608 [details] [diff] [review]
store data as data, don't store data as code (approach without relocations)

I made the generator produce output like attachment 434608 [details] [diff] [review]. Then I ran Minefield and noticed that the parser no longer expanded named character references like & and ä although > and < still worked.

Then I unapplied the generator result and verified that without the changes, the parser worked. Then I applied attachment 434608 [details] [diff] [review] and verified that it broke the parser, too, so I didn't break things when editing the generator.
Attachment #434608 - Flags: review+ → review-
Is the enum mechanism unable to handle zero-length names? As of bug 506090 landing, names like "gt" and "lt" (the variants without semicolon) have zero-length names, because all names were shortened by two because the first two letters are now baked into the HILO_ACCEL table.
I see the problem now. The file as of bug 506090 wouldn't compile without zero-length names having something in their array part, so zero-length names have CHARS set to one item and LEN set to 0. The enum building macro assumes that LEN is actually the length of CHARS.
(In reply to comment #13)
> (From update of attachment 434608 [details] [diff] [review])
> I made the generator produce output like attachment 434608 [details] [diff] [review]. Then I ran
> Minefield and noticed that the parser no longer expanded named character
> references like & and ä although > and < still worked.

Er, yikes, I forgot to flip the html5.enable pref when I was testing the revised version of the patch.  I assume you have it fixed now, though.
Or if you want me to fix it, let me know.
This wastes as many bytes as there are two-letter names (i.e. &gt without the semicolon, etc.) in order to avoid having to do uglier macro tricks. I added a 0 or 1 flag for the names that had become zero-length. LEN + FLAG is therefore always the number of actual array elements even if the number of array elements doesn't equal LEN.
Attachment #434608 - Attachment is obsolete: true
Attachment #435873 - Flags: review?(bnewman)
Filed bug 555940 about a potential follow-up optimization.
Comment on attachment 435873 [details] [diff] [review]
The mozilla-central diff

Looks good.
Attachment #435873 - Flags: review?(bnewman) → review+
Comment on attachment 435874 [details] [diff] [review]
The generator diff

Generates what it should.
Attachment #435874 - Flags: review?(bnewman) → review+
This unit needs about 2G free RAM for compilation on AIX which made my build host swap to death. 
With this subtle patch you can build on a 1G RAM machine/partition without loosing compiler optimization.
Attachment #501614 - Flags: review?(mozilla+ben)
Blocks: 618660
(In reply to comment #25)
> Created attachment 501614 [details] [diff] [review]
> limit optimizer resource usage on AIX

Let's get a new bug number for the AIX fix. I filed bug 623478. Let's continue there.

The fix should go into the generator--just patching generated code is not good. I'll write the generator patch (hopefully even today).

It's unclear if Ben is still actively paying attention to his review queue.
Attachment #501614 - Flags: review?(mozilla+ben)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: