Closed Bug 1351857 Opened 7 years ago Closed 7 years ago

[meta] nsHtml5ElementName::elementNameByBuffer and nsHTML5AttributeName::nameByBuffer are too slow

Categories

(Core :: DOM: HTML Parser, enhancement, P3)

50 Branch
enhancement

Tracking

()

RESOLVED FIXED
Performance Impact ?

People

(Reporter: smaug, Unassigned, NeedInfo)

References

(Blocks 1 open bug)

Details

(Keywords: meta)

In innerHTML case, nsHtml5ElementName::elementNameByBuffer shows up in profiles.
I have a profile where it takes 7% or innerHTML parsing time
(of stuff under FragmentOrElement::SetInnerHTMLInternal).
Profiler hints that binarySearch is taking some of that time (in local opt build)
Could we possibly use compile time generated (perfect) hashtable there.

When using Nightly, profile is a bit different, string finalization and localEqualsBuffer under elementNameByBuffer showing up.


Could we do some local caching or something here?
(similar-ish to bug 1351303)
Did your profiling indicate whether the problem was the hash function computation or the subsequent binary search?

In the hash function computation, it seems that the number of branches could be reduced by unrolling the loop manually, and data parallelism on CPUs with multiple ALUs could be improved by computing the components of the hash independently first and then ORing them together instead of shifting the immediate result left at each step.

The binary search part could be made faster by removing the elements that don't need to be in the array from the array, which would result in a shorter array to search.
The profile I'm now looking at just shows 7% of innerHTML time in elementNameByBuffer
(and 5.3% in nsHtml5AttributeName::nameByBuffer).
Since the code seems to be rather inlined, I don't really trust where the time is spent there, but
profiler hints in both cases that it is BinarySearchDefaultComparator.

Could we possibly have some better datastructure for nsHtml5ElementName::ELEMENT_HASHES?
Perfect static hashtable? Or cache like in bug 1351303?
Same with nsHtml5AttributeName::ATTRIBUTE_HASHES
Different profiles hint about different things.
Profiling Nightly (and not local opt build) hints that time is spent in
nsDependentSubstring ctor and in nsStringRepr::Equals.
In this profile 9.6% of the time under FragmentOrElement::SetInnerHTMLInternal is spent inside
elementNameByBuffer. nameByBuffer is 3.8%.
Bug 1352734 will fix most of elementNameByBuffer, but nsHtml5AttributeName::nameByBuffer uses time in getting nsIAtom.
Bug 1351303 will help with atoms in main thread when using a main thread only method, but here we're dealing with parser atoms. Not sure how to fix that.
Depends on: 1352734
Summary: nsHtml5ElementName::elementNameByBuffer is too slow → nsHtml5ElementName::elementNameByBuffer and nsHTML5AttributeName::nameByBuffer are too slow
Depends on: 1352874
nsHTML5AttributeName::nameByBuffer is now mostly malloc.
I guess nsHtml5ReleasableAttributeName should be handled similarly to NodeInfos, sharing the same instance.
Depends on: 1352978
Whiteboard: [qf]
Flags: needinfo?(wchen)
Priority: -- → P3
I guess this is now a meta bug.
Fixing bug 1352978 would be nice.
Will nsHtml5ElementName::elementNameByBuffer become particularly slow if/when people start to use custom element names more? The binary search will fail always and we create a new nsHtml5ReleasableElementName all the time. (or are custom element names handled specially elsewhere?)

A cache on top of the current static stuff might be quite useful. Something which in normal case
optimizes out the whole binary cache and just return some recently used nsHtml5(Releasable)ElementName instance.
Thinking something similar to 
http://searchfox.org/mozilla-central/rev/78cefe75fb43195e7f5aee1d8042b8d8fc79fc70/xpcom/ds/nsAtomTable.cpp#739-751
Flags: needinfo?(hsivonen)
(In reply to Olli Pettay [:smaug] (review queue closed. Please ask reviews from other DOM peers) from comment #7)
> Will nsHtml5ElementName::elementNameByBuffer become particularly slow
> if/when people start to use custom element names more? The binary search
> will fail always and we create a new nsHtml5ReleasableElementName all the
> time.

Correct.

> (or are custom element names handled specially elsewhere?)

They aren't.

> A cache on top of the current static stuff might be quite useful. Something
> which in normal case
> optimizes out the whole binary cache and just return some recently used
> nsHtml5(Releasable)ElementName instance.
> Thinking something similar to 
> http://searchfox.org/mozilla-central/rev/
> 78cefe75fb43195e7f5aee1d8042b8d8fc79fc70/xpcom/ds/nsAtomTable.cpp#739-751

This makes sense. The main complication is handling the freeing of nsHtml5ReleasableElementName correctly. (Currently, they aren't refcounted, so we'd need to add a way to account for the cache holding them still.)
Flags: needinfo?(hsivonen)
(In reply to Henri Sivonen (:hsivonen) from comment #8)
> The main complication is handling the freeing of
> nsHtml5ReleasableElementName correctly. (Currently, they aren't refcounted,
> so we'd need to add a way to account for the cache holding them still.)

There can currently be at most one live nsHtml5ReleasableElementName at a time. Instead of allocating and deleting them for each unknown tag, I suggest we
 1) Remove the "Releasable" distinction.
 2) Keep around one nsHtml5ElementName per nsHtml5Tokenizer instance for unknown tags and change the fields of the single nsHtml5ElementName as needed.

This would remove the malloc/free for nsHtml5ElementName (except, for simplicity, when allocating nsHtml5Tokenizer itself), but would put more strain on the atom lookup in the case of unknown tags. smaug, does that tradeoff seem OK?

(The same doesn't work for nsHtml5AttributeName, though.)

Additionally, I'm planning to make the tokenizer keep track of whether it has seen a hyphen in the tag name and skip the lookup for known element names that don't have a hyphen if it has seen a hyphen.
Flags: needinfo?(bugs)
mRecentlyUsedParserAtoms hopefully caches atoms quite well, so sounds reasonable.


Unrelated to that, need to still figure out how to make nsHtml5ElementName::ELEMENT_HASHES usage less slow. (I'm surprised it shows up in the profiles, but that is what profiles seem to hint)
Flags: needinfo?(bugs)
(It is after all still more important to optimize the common cases first)
(In reply to Henri Sivonen (:hsivonen) from comment #9)
> (The same doesn't work for nsHtml5AttributeName, though.)

The same approach can be used for attributes if we no longer hold an nsHtml5AttributeName but instead put its fields (and the attribute value) in a new kind of struct held directly (not behind a pointer) in an AutoTArray in nsHtml5HtmlAttributes.
(In reply to Olli Pettay [:smaug] from comment #11)
> (It is after all still more important to optimize the common cases first)

How about we stop treating MathML as a "common" case even though it is, in fact, a set of known-ahead-of-time names? That should cut down the search space by quite a bit.
Blocks: 1355472
Depends on: 1355493
Whiteboard: [qf] → [qf:meta]
Depends on: 1355769
Depends on: 1355779
Depends on: 1358095
The dependencies getting fixed should have improved things. If these are still too slow, the next logical step would be using a perfect has function (one for attributes and another for elements).

So far I haven't found a suitable function generator by searching the Web. (Ideally, the generator would emit a Java method that takes an array of char, returns and int and performs common ALU operations on data from the array without applying GPL to the code and without relying an particular Java-isms like a lot of objects and methods in place of inline ALU ops.)

smaug, are these still too slow?
Flags: needinfo?(bugs)
BinarySearchDefaultComparator or some such did show up a bit in some profile, but given that I marked bug 1347525 fixed, I think we're good with this too. Thanks for all the work done here.

If bug 1364861 reveals some new information, we can open new bugs.
Status: NEW → RESOLVED
Closed: 7 years ago
Flags: needinfo?(bugs)
Resolution: --- → FIXED
elementNameByBuffer takes 4.9% of innerHTML now. That is a lot. 
I wonder if we should consider reopening this after all.
As a comparison NS_AtomizeMainThread takes 1.1%, yet both elementNameByBuffer and NS_AtomizeMainThread do somewhat similar work and are both called about as frequently, I think.
If nothing else, elementNameByBuffer could perhaps have a local cache for recently used values.
(In reply to Olli Pettay [:smaug] from comment #18)
> If nothing else, elementNameByBuffer could perhaps have a local cache for
> recently used values.

Have you checked that the benchmarks that we want to optimize for use the same names repeatedly during a single parser invocation?
I haven't yet, but given that adding NS_AtomizeMainThread was rather effective, I'd assume so.
A testcase is in bug 1347525.
The testcase is silly, and seems to trigger
li
div
input
label
label
button
button
div
li
calls to the method.
Let me reopen this after all.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Depends on: 1366241
Marking fixed. If there is still something to do, we can always reopen.
Status: REOPENED → RESOLVED
Closed: 7 years ago7 years ago
Resolution: --- → FIXED
Performance Impact: --- → ?
Keywords: meta
Whiteboard: [qf:meta]
Summary: nsHtml5ElementName::elementNameByBuffer and nsHTML5AttributeName::nameByBuffer are too slow → [meta] nsHtml5ElementName::elementNameByBuffer and nsHTML5AttributeName::nameByBuffer are too slow
You need to log in before you can comment on or make changes to this bug.