Atomizing strings needs to be faster

RESOLVED FIXED in Firefox 55

Status

()

Core
JavaScript Engine
RESOLVED FIXED
a month ago
27 days ago

People

(Reporter: Ehsan, Assigned: jandem)

Tracking

(Blocks: 1 bug)

unspecified
mozilla55
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox55 fixed)

Details

Attachments

(6 attachments)

I've seen this come up in various profiles for different use cases, and one recent example in session restore with a lot of tabs.  See the profile in bug 1312373.  Also, https://perfht.ml/2nqpMAu.

Can we somehow avoid locking here, for example by using some kind of lock free hash table to store the atoms?
Blocks: 875125
(Assignee)

Comment 1

a month ago
If I'm reading that profile correctly, there are many atomize calls under XPC_WN_GetterSetter -> ... -> nsXPCWrappedJSClass::CallMethod -> JS_GetProperty.

It's unfortunate nsXPCWrappedJSClass::CallMethod has a |const char*| and doesn't receive/store a jsid or JSAtom*. Also, is this class something we could convert to new DOM bindings?
(Assignee)

Comment 2

29 days ago
I'm also seeing this in profiles and have a stack of patches to optimize various things here.
Assignee: nobody → jdemooij
Status: NEW → ASSIGNED
(Assignee)

Updated

29 days ago
Summary: js::Atomize() needs to be faster → Atomizing strings needs to be faster
(Assignee)

Comment 3

29 days ago
Created attachment 8852457 [details] [diff] [review]
Part 1 - Optimize AtomMarkingRuntime::markAtom

The atom marking code takes a TenuredCell* argument and has to check whether it's an atom/string, is non-null, etc.

It's faster (and cleaner IMO) to templatize this so we can pass JSAtom* or JS::Symbol*. Then we can assert the thing is in the atoms zone instead of checking it at runtime, we no longer have to check the trace kind, etc.

I also pushed the null checks into the callers because all hot callers pass a non-null pointer.
Attachment #8852457 - Flags: review?(jcoppeard)
(Assignee)

Comment 4

29 days ago
Created attachment 8852461 [details] [diff] [review]
Part 2 - Inline SparseBitmap::setBit

A profile of my atomization micro-benchmark shows we spend at least 5% under SparseBitmap::setBit and SparseBitmap::getOrCreateBlock.

This patch inlines setBit and the fast path of getOrCreateBlock.
Attachment #8852461 - Flags: review?(sphink)
Comment on attachment 8852457 [details] [diff] [review]
Part 1 - Optimize AtomMarkingRuntime::markAtom

Review of attachment 8852457 [details] [diff] [review]:
-----------------------------------------------------------------

Nice!

::: js/src/gc/AtomMarking.cpp
@@ +270,4 @@
>          return true;
>  
> +    if (mozilla::IsSame<T, JSAtom>::value) {
> +        JSAtom* atom = reinterpret_cast<JSAtom*>(thing);

I guess we don't need this cast.
Attachment #8852457 - Flags: review?(jcoppeard) → review+
(Assignee)

Comment 6

29 days ago
Created attachment 8852490 [details] [diff] [review]
Part 3 - Add AtomMarkingRuntime::inlinedMarkAtom

I don't think we should inline AtomMarkingRuntime::markAtom everywhere, but IMO the atomization code is hot enough that it's worth it there. This patch adds gc/AtomMarkingRuntime-inl.h with AtomMarkingRuntime::inlinedMarkAtom, and calls inlinedMarkAtom in jsatom.cpp.

It improves my atomization micro-benchmark (see below) from ~1250 ms to ~1220 ms (> 1300 ms without parts 1 and 2) so it seems worth it to eliminate most of the remaining markAtom overhead.

---

function f() {
    var o = Object.create(null);
    var res = 0;
    var t = new Date;
    for (var i=0; i<10000000; i++) {
        res = o["foo" + (i % 2048)];
    }
    print(new Date - t);
    return res;
}
f();
Attachment #8852490 - Flags: review?(jcoppeard)
(Assignee)

Comment 7

29 days ago
Created attachment 8852518 [details] [diff] [review]
Part 4 - Move JSString::validateLength call

The first thing we do in Atomize and AtomizeChars is call JSString::validateLength, but we only have to do this when we allocate a new atom. The common case is looking up an existing atom and that will only succeed if the length is sane.
Attachment #8852518 - Flags: review?(luke)
(Assignee)

Comment 8

29 days ago
Created attachment 8852527 [details] [diff] [review]
Part 5 - Inline hash table lookups

Clang was not inlining some (HashSet) methods. MOZ_ALWAYS_INLINE fixes this and improves the time from ~1220 ms to ~1186 ms.

These patches combined make atomization >10% faster on OS X. We now inline everything under AtomizeString and I don't think there's much low-hanging fruit left.

It might make sense to add a per-Zone cache (purged on GC) so we can bypass markAtom, the multiple hash table lookups, and the atoms lock. This would need some careful measurements because it could easily backfire by adding more overhead when we're atomizing many different strings.
Attachment #8852527 - Flags: review?(luke)

Updated

29 days ago
Attachment #8852518 - Flags: review?(luke) → review+
Attachment #8852461 - Flags: review?(sphink) → review+

Updated

29 days ago
Attachment #8852490 - Flags: review?(jcoppeard) → review+
(Assignee)

Comment 9

29 days ago
(In reply to Jan de Mooij [:jandem] from comment #8)
> It might make sense to add a per-Zone cache (purged on GC) so we can bypass
> markAtom, the multiple hash table lookups, and the atoms lock. This would
> need some careful measurements because it could easily backfire by adding
> more overhead when we're atomizing many different strings.

I prototyped this per-Zone cache (purged on GC) and it's actually a much bigger win than I expected - it improves my atomization micro-benchmark (comment 6) from 1190 ms to 860 ms, the Kraken JSON-parsing test becomes a few ms faster and Octane-codeload seems to improve with a few hundred points (although it's noisy). We can likely improve more by inlining this part of the atomization code.

On Octane the per-zone cache has a hit rate of 96.4%. Tomorrow I'll test this on some popular websites to see what we get there. Also, if we do this we can probably drop part 3.
(Assignee)

Updated

29 days ago
Blocks: 1334672
(Assignee)

Comment 10

29 days ago
Created attachment 8852833 [details] [diff] [review]
Part 6 - Add a Zone cache

This adds the cache to Zone and purges it on GC. I tested it on some websites (Gmail, Google Docs/Sheets, Wikipedia, Twitter) and the hit rate is at least 80%. It's more effective on certain benchmarks/frameworks and I think Ehsan's profile (see comment 0 and comment 1) would also have benefited a lot from this because XPConnect keeps atomizing the same strings.

I also added a clearAndShrink() method to js::HashMap and js::HashSet. Often clear() is not great because it doesn't realloc, so you risk wasting memory, and finish() is also not great because you have to do a fallible initialization again and I'd like to avoid adding a branch to the atomization code to check initialized().

Looks green on Try so far.
Attachment #8852833 - Flags: review?(jcoppeard)

Updated

29 days ago
Attachment #8852833 - Flags: review?(jcoppeard) → review+

Updated

28 days ago
Attachment #8852527 - Flags: review?(luke) → review+

Comment 11

28 days ago
Pushed by jandemooij@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/5c7b6e016f85
part 1 - Templatize and optimize AtomMarkingRuntime::markAtom. r=jonco
https://hg.mozilla.org/integration/mozilla-inbound/rev/4d0df04fefbe
part 2 - Ensure SparseBitmap::setBit gets inlined. r=sfink
https://hg.mozilla.org/integration/mozilla-inbound/rev/1c682e6c1eb0
part 3 - Add AtomMarkingRuntime::inlinedMarkAtom to eliminate markAtom call overhead when atomizing. r=jonco
https://hg.mozilla.org/integration/mozilla-inbound/rev/f3a6587bc94a
part 4 - Call JSString::validateLength only when we have to allocate a new atom. r=luke
https://hg.mozilla.org/integration/mozilla-inbound/rev/9ff8c5acf4ea
part 5 - Make sure various hashtable lookups get inlined when atomizing strings. r=luke
https://hg.mozilla.org/integration/mozilla-inbound/rev/17c436c27035
part 6 - Add a Zone cache for recently atomized strings. r=jonco
(Assignee)

Comment 12

28 days ago
(In reply to Jon Coppeard (:jonco) from comment #5)
> I guess we don't need this cast.

Unfortunately we do because T is a template parameter that can be JSAtom or JS::Symbol. Even though we check mozilla::IsSame<T, JSAtom> the compiler still wants the cast.
(Assignee)

Comment 13

28 days ago
(In reply to Jan de Mooij [:jandem] from comment #9)
> the Kraken JSON-parsing test becomes a few ms faster

AWFY is saying 15-17% and we're now faster than V8/JSC, so it's pretty sweet.

Comment 14

27 days ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/5c7b6e016f85
https://hg.mozilla.org/mozilla-central/rev/4d0df04fefbe
https://hg.mozilla.org/mozilla-central/rev/1c682e6c1eb0
https://hg.mozilla.org/mozilla-central/rev/f3a6587bc94a
https://hg.mozilla.org/mozilla-central/rev/9ff8c5acf4ea
https://hg.mozilla.org/mozilla-central/rev/17c436c27035
Status: ASSIGNED → RESOLVED
Last Resolved: 27 days ago
status-firefox55: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
You need to log in before you can comment on or make changes to this bug.