Atomizing strings needs to be faster

RESOLVED FIXED in Firefox 55

Status

()

Core
JavaScript Engine
RESOLVED FIXED
5 months ago
5 months ago

People

(Reporter: Ehsan, Assigned: jandem)

Tracking

unspecified
mozilla55
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox55 fixed)

Details

Attachments

(6 attachments)

I've seen this come up in various profiles for different use cases, and one recent example in session restore with a lot of tabs.  See the profile in bug 1312373.  Also, https://perfht.ml/2nqpMAu.

Can we somehow avoid locking here, for example by using some kind of lock free hash table to store the atoms?
Blocks: 875125
(Assignee)

Comment 1

5 months ago
If I'm reading that profile correctly, there are many atomize calls under XPC_WN_GetterSetter -> ... -> nsXPCWrappedJSClass::CallMethod -> JS_GetProperty.

It's unfortunate nsXPCWrappedJSClass::CallMethod has a |const char*| and doesn't receive/store a jsid or JSAtom*. Also, is this class something we could convert to new DOM bindings?
(Assignee)

Comment 2

5 months ago
I'm also seeing this in profiles and have a stack of patches to optimize various things here.
Assignee: nobody → jdemooij
Status: NEW → ASSIGNED
(Assignee)

Updated

5 months ago
Summary: js::Atomize() needs to be faster → Atomizing strings needs to be faster
(Assignee)

Comment 3

5 months ago
Created attachment 8852457 [details] [diff] [review]
Part 1 - Optimize AtomMarkingRuntime::markAtom

The atom marking code takes a TenuredCell* argument and has to check whether it's an atom/string, is non-null, etc.

It's faster (and cleaner IMO) to templatize this so we can pass JSAtom* or JS::Symbol*. Then we can assert the thing is in the atoms zone instead of checking it at runtime, we no longer have to check the trace kind, etc.

I also pushed the null checks into the callers because all hot callers pass a non-null pointer.
Attachment #8852457 - Flags: review?(jcoppeard)
(Assignee)

Comment 4

5 months ago
Created attachment 8852461 [details] [diff] [review]
Part 2 - Inline SparseBitmap::setBit

A profile of my atomization micro-benchmark shows we spend at least 5% under SparseBitmap::setBit and SparseBitmap::getOrCreateBlock.

This patch inlines setBit and the fast path of getOrCreateBlock.
Attachment #8852461 - Flags: review?(sphink)

Comment 5

5 months ago
Comment on attachment 8852457 [details] [diff] [review]
Part 1 - Optimize AtomMarkingRuntime::markAtom

Review of attachment 8852457 [details] [diff] [review]:
-----------------------------------------------------------------

Nice!

::: js/src/gc/AtomMarking.cpp
@@ +270,4 @@
>          return true;
>  
> +    if (mozilla::IsSame<T, JSAtom>::value) {
> +        JSAtom* atom = reinterpret_cast<JSAtom*>(thing);

I guess we don't need this cast.
Attachment #8852457 - Flags: review?(jcoppeard) → review+
(Assignee)

Comment 6

5 months ago
Created attachment 8852490 [details] [diff] [review]
Part 3 - Add AtomMarkingRuntime::inlinedMarkAtom

I don't think we should inline AtomMarkingRuntime::markAtom everywhere, but IMO the atomization code is hot enough that it's worth it there. This patch adds gc/AtomMarkingRuntime-inl.h with AtomMarkingRuntime::inlinedMarkAtom, and calls inlinedMarkAtom in jsatom.cpp.

It improves my atomization micro-benchmark (see below) from ~1250 ms to ~1220 ms (> 1300 ms without parts 1 and 2) so it seems worth it to eliminate most of the remaining markAtom overhead.

---

function f() {
    var o = Object.create(null);
    var res = 0;
    var t = new Date;
    for (var i=0; i<10000000; i++) {
        res = o["foo" + (i % 2048)];
    }
    print(new Date - t);
    return res;
}
f();
Attachment #8852490 - Flags: review?(jcoppeard)
(Assignee)

Comment 7

5 months ago
Created attachment 8852518 [details] [diff] [review]
Part 4 - Move JSString::validateLength call

The first thing we do in Atomize and AtomizeChars is call JSString::validateLength, but we only have to do this when we allocate a new atom. The common case is looking up an existing atom and that will only succeed if the length is sane.
Attachment #8852518 - Flags: review?(luke)
(Assignee)

Comment 8

5 months ago
Created attachment 8852527 [details] [diff] [review]
Part 5 - Inline hash table lookups

Clang was not inlining some (HashSet) methods. MOZ_ALWAYS_INLINE fixes this and improves the time from ~1220 ms to ~1186 ms.

These patches combined make atomization >10% faster on OS X. We now inline everything under AtomizeString and I don't think there's much low-hanging fruit left.

It might make sense to add a per-Zone cache (purged on GC) so we can bypass markAtom, the multiple hash table lookups, and the atoms lock. This would need some careful measurements because it could easily backfire by adding more overhead when we're atomizing many different strings.
Attachment #8852527 - Flags: review?(luke)

Updated

5 months ago
Attachment #8852518 - Flags: review?(luke) → review+
Attachment #8852461 - Flags: review?(sphink) → review+

Updated

5 months ago
Attachment #8852490 - Flags: review?(jcoppeard) → review+
(Assignee)

Comment 9

5 months ago
(In reply to Jan de Mooij [:jandem] from comment #8)
> It might make sense to add a per-Zone cache (purged on GC) so we can bypass
> markAtom, the multiple hash table lookups, and the atoms lock. This would
> need some careful measurements because it could easily backfire by adding
> more overhead when we're atomizing many different strings.

I prototyped this per-Zone cache (purged on GC) and it's actually a much bigger win than I expected - it improves my atomization micro-benchmark (comment 6) from 1190 ms to 860 ms, the Kraken JSON-parsing test becomes a few ms faster and Octane-codeload seems to improve with a few hundred points (although it's noisy). We can likely improve more by inlining this part of the atomization code.

On Octane the per-zone cache has a hit rate of 96.4%. Tomorrow I'll test this on some popular websites to see what we get there. Also, if we do this we can probably drop part 3.
(Assignee)

Updated

5 months ago
Blocks: 1334672
(Assignee)

Comment 10

5 months ago
Created attachment 8852833 [details] [diff] [review]
Part 6 - Add a Zone cache

This adds the cache to Zone and purges it on GC. I tested it on some websites (Gmail, Google Docs/Sheets, Wikipedia, Twitter) and the hit rate is at least 80%. It's more effective on certain benchmarks/frameworks and I think Ehsan's profile (see comment 0 and comment 1) would also have benefited a lot from this because XPConnect keeps atomizing the same strings.

I also added a clearAndShrink() method to js::HashMap and js::HashSet. Often clear() is not great because it doesn't realloc, so you risk wasting memory, and finish() is also not great because you have to do a fallible initialization again and I'd like to avoid adding a branch to the atomization code to check initialized().

Looks green on Try so far.
Attachment #8852833 - Flags: review?(jcoppeard)

Updated

5 months ago
Attachment #8852833 - Flags: review?(jcoppeard) → review+

Updated

5 months ago
Attachment #8852527 - Flags: review?(luke) → review+

Comment 11

5 months ago
Pushed by jandemooij@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/5c7b6e016f85
part 1 - Templatize and optimize AtomMarkingRuntime::markAtom. r=jonco
https://hg.mozilla.org/integration/mozilla-inbound/rev/4d0df04fefbe
part 2 - Ensure SparseBitmap::setBit gets inlined. r=sfink
https://hg.mozilla.org/integration/mozilla-inbound/rev/1c682e6c1eb0
part 3 - Add AtomMarkingRuntime::inlinedMarkAtom to eliminate markAtom call overhead when atomizing. r=jonco
https://hg.mozilla.org/integration/mozilla-inbound/rev/f3a6587bc94a
part 4 - Call JSString::validateLength only when we have to allocate a new atom. r=luke
https://hg.mozilla.org/integration/mozilla-inbound/rev/9ff8c5acf4ea
part 5 - Make sure various hashtable lookups get inlined when atomizing strings. r=luke
https://hg.mozilla.org/integration/mozilla-inbound/rev/17c436c27035
part 6 - Add a Zone cache for recently atomized strings. r=jonco
(Assignee)

Comment 12

5 months ago
(In reply to Jon Coppeard (:jonco) from comment #5)
> I guess we don't need this cast.

Unfortunately we do because T is a template parameter that can be JSAtom or JS::Symbol. Even though we check mozilla::IsSame<T, JSAtom> the compiler still wants the cast.
(Assignee)

Comment 13

5 months ago
(In reply to Jan de Mooij [:jandem] from comment #9)
> the Kraken JSON-parsing test becomes a few ms faster

AWFY is saying 15-17% and we're now faster than V8/JSC, so it's pretty sweet.

Comment 14

5 months ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/5c7b6e016f85
https://hg.mozilla.org/mozilla-central/rev/4d0df04fefbe
https://hg.mozilla.org/mozilla-central/rev/1c682e6c1eb0
https://hg.mozilla.org/mozilla-central/rev/f3a6587bc94a
https://hg.mozilla.org/mozilla-central/rev/9ff8c5acf4ea
https://hg.mozilla.org/mozilla-central/rev/17c436c27035
Status: ASSIGNED → RESOLVED
Last Resolved: 5 months ago
status-firefox55: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
You need to log in before you can comment on or make changes to this bug.