Bug 1843951 (Open). Opened 1 year ago; updated 7 months ago.

The indirect call in PLDHashTable::ComputeKeyHash to mOps->hashKey shows up on speedometer3

Categories: Core :: XPCOM (enhancement)

People: Reporter: mstange; Assignee: Unassigned

References: Blocks 2 open bugs

Whiteboard: [sp3]

Attachments: 1 file

Auditing indirect calls on Speedometer 3 (by looking at the callers of LdrpDispatchUserCallTarget on Windows) turned up the call to mOps->hashKey as one of the most common indirect calls.

I think it's worth doing the hashing inside nsTHashtable, in templated code where the hashing function is known at compile time.

nsTHashtable is templated on the EntryType, so it can direct-call (or even inline)
EntryType::HashKey. PLDHashTable, by contrast, has to go through mOps->hashKey,
i.e. call through a function pointer.
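To illustrate the difference, here is a minimal standalone sketch (not the actual Mozilla code; the ops struct, entry type, and hash function below are invented for illustration):

```cpp
#include <cassert>
#include <cstdint>

// PLDHashTable-style dispatch: the hash function lives behind a function
// pointer in an ops table, so every ComputeKeyHash is an indirect call the
// compiler cannot inline (and which goes through CFG dispatch on Windows).
using HashKeyFn = uint32_t (*)(const void* aKey);

struct HashTableOps {
  HashKeyFn hashKey;  // indirect call target
};

static uint32_t HashString(const void* aKey) {
  uint32_t h = 0;
  for (const char* p = static_cast<const char*>(aKey); *p; ++p) {
    h = h * 31 + static_cast<uint8_t>(*p);
  }
  return h;
}

static uint32_t ComputeKeyHashIndirect(const HashTableOps* aOps,
                                       const void* aKey) {
  return aOps->hashKey(aKey);  // indirect call through mOps-style table
}

// nsTHashtable-style dispatch: the entry type is a template parameter, so
// the call to EntryType::HashKey is a direct call the compiler can inline.
struct StringEntry {
  using KeyType = const char*;
  static uint32_t HashKey(KeyType aKey) { return HashString(aKey); }
};

template <typename EntryType>
uint32_t ComputeKeyHashDirect(typename EntryType::KeyType aKey) {
  return EntryType::HashKey(aKey);  // direct, inlinable call
}
```

Both paths compute the same hash; the difference is purely in how the call is dispatched.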

Thanks for working on this.

What's the impact on the binary size from this change? IIRC the reason nsTHashtable is implemented this way is to avoid binary bloat.

Are there measurable performance improvements from this change?

If the binary bloat is really bad, only using inlining for specific hash tables (e.g. RestyleManager::mSnapshots, gfxDWriteFont::mGlyphWidths) might get us the performance benefit without as much bloat. If I'm reading the profile correctly, inlining just the top 5 most common hash tables would eliminate about 38% of the indirect calls (we'd probably want to do more, but I was feeling lazy, so I only added up the top 5). I'm not sure off the top of my head how to implement that easily; maybe an optional boolean template parameter to specify whether to inline.

Flags: needinfo?(mstange.moz)

(In reply to Andrew McCreight [:mccr8] from comment #2)

What's the impact on the binary size from this change?

It's not clear to me:

https://treeherder.mozilla.org/perfherder/compare?originalProject=try&newProject=try&newRevision=8ebd1045baf3447513696d233dda1e489060ed78&framework=2&originalRevision=c7b6669f2fdde9337f235e262d3f35eb1bfef865&page=1&filter=XUL+section+nightly

This shows a 300KB increase on Windows, and a 100KB decrease on Linux. I'm not sure how high the PGO-induced variability in these metrics is.

Are there measurable performance improvements from this change?

I'm going to collect a profile later today with the PGO builds from try, and count the number of samples in hash-table related functions. The impact on speedometer3 score overall appears to be too low to be detectable; based on the numbers in the profile I would expect a 0.1% improvement. (LdrpDispatchUserCallTarget is 2% of overall speedometer3, and this particular call is 5% of LdrpDispatchUserCallTarget, and 2% * 5% = 0.1%.)

In the perfherder speedometer3 subtest comparison, none of the subtests have a change that's detected with "high" confidence. (Except for one on Linux but it's garbage - this patch is definitely not causing a 6% improvement.)

(Per-platform subtest comparisons: Windows, Linux, macOS.)

Overall I'd say the case for this patch is not very strong. A 0.1% improvement is something we'd want to take if it's cheap to make and doesn't cause adverse effects in other ways; I'm not sure if this one qualifies.

Flags: needinfo?(mstange.moz)

So the profiles confirm that the indirect call is gone, at least.
Before: https://share.firefox.dev/3NXPkQD
After: https://share.firefox.dev/3Q45PgI

Also, the number of samples containing "nsTHashtable" functions is reduced:
Before: https://share.firefox.dev/44OuxFN (3398 of 199950 = 1.69%)
After: https://share.firefox.dev/3NWrFjw (3046 of 193884 = 1.57%)
This suggests that the patch makes hash tables 7.7% faster.
(compute_part_relative_fraction(3398 / 199950, 3046 / 193884) == 0.923 == 1 - 0.077, see this document for justification)
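The arithmetic behind that number can be sketched as follows (this is my reading of compute_part_relative_fraction, not its actual implementation: assuming the non-hashtable work is unchanged between the two profiles, each hashtable fraction is normalized against the remaining work, and the ratio of the two gives the relative speed of the hashtable part):

```cpp
#include <cassert>
#include <cmath>

// Assumed semantics of compute_part_relative_fraction: treat the
// non-hashtable samples as a constant baseline, express hashtable time
// relative to that baseline in each profile, and return the after/before
// ratio. A result of 0.923 means the hashtable part got ~7.7% faster.
double PartRelativeFraction(double beforeFrac, double afterFrac) {
  double beforeRelative = beforeFrac / (1.0 - beforeFrac);
  double afterRelative = afterFrac / (1.0 - afterFrac);
  return afterRelative / beforeRelative;
}
```

Plugging in the sample counts above (3398/199950 before, 3046/193884 after) reproduces the 0.923 figure.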

I don't know much about CPU branch prediction, but I wonder whether indirect calls made this frequently are predicted well enough that there's not much overhead.

(In reply to Andrew McCreight [:mccr8] from comment #2)

If the binary bloat is really bad, it looks like only using inlining for specific hash tables (eg RestyleManager::mSnapshots, gfxDWriteFont::mGlyphWidths) might get us the performance benefit without as much bloat.

For the specific case of gfxDWriteFont::mGlyphWidths, bug 1844464 should improve things (by putting a fast MruCache in front, so we hit the hashtable much less frequently).

Good point. Any place we're hitting the hashtable often enough to show up in the profiles here, maybe we want to add an MruCache in front of it anyways.
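As a rough sketch of the idea (a hand-rolled direct-mapped cache for illustration, not mozilla::MruCache's actual API; the class and names below are invented):

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>

// A small direct-mapped cache sits in front of the hash table, so hot keys
// are answered with one array index and one key comparison, skipping the
// full hash-table probe (and its indirect calls) on repeated lookups.
template <typename Key, typename Value, size_t Size = 31>
class SmallMruCache {
 public:
  // Returns the cached value for aKey, falling back to the backing table
  // on a miss and caching the result.
  Value GetOrLookup(const Key& aKey,
                    const std::unordered_map<Key, Value>& aTable) {
    Entry& slot = mEntries[std::hash<Key>{}(aKey) % Size];
    if (slot.valid && slot.key == aKey) {
      ++mHits;
      return slot.value;
    }
    ++mMisses;
    Value v = aTable.at(aKey);  // full hash-table lookup on a cache miss
    slot.key = aKey;
    slot.value = v;
    slot.valid = true;
    return v;
  }

  size_t Hits() const { return mHits; }
  size_t Misses() const { return mMisses; }

 private:
  struct Entry {
    Key key{};
    Value value{};
    bool valid = false;
  };
  Entry mEntries[Size];
  size_t mHits = 0;
  size_t mMisses = 0;
};
```

The win depends entirely on the key distribution: a skewed, repetitive access pattern (like glyph-width lookups) hits the cache almost every time, while a uniform one just pays for an extra comparison.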

The severity field is not set for this bug.
:nika, could you have a look please?


Flags: needinfo?(nika)
Type: defect → enhancement
Flags: needinfo?(nika)
Blocks: 1870279
No longer blocks: 1870279

:mstange, any update on this patch? Asking based on https://bugzilla.mozilla.org/show_bug.cgi?id=1870279#c15

Flags: needinfo?(mstange.moz)

No update and I haven't been working on it recently.

The next step here would be to fix the gtest failures that you see when you run the job test-linux1804-64-shippable-qr/opt-gtest-1proc. Unfortunately, the log has expired, so I can no longer say which test was failing.

I've proposed a different approach in bug 1870279 to fix that particular regression - we can try mozilla::HashSet instead of nsTHashtable there.

Assignee: mstange.moz → nobody
Status: ASSIGNED → NEW
Flags: needinfo?(mstange.moz)
