Closed Bug 1350760 Opened 3 years ago Closed 3 years ago

Atomizing strings needs to be faster

Categories: Core :: JavaScript Engine (enhancement)
Status: RESOLVED FIXED
Target Milestone: mozilla55
Tracking Status: firefox55 --- fixed
People: (Reporter: ehsan, Assigned: jandem)
Attachments: (6 files)

I've seen this come up in various profiles for different use cases; one recent example is session restore with a lot of tabs. See the profile in bug 1312373, and also https://perfht.ml/2nqpMAu.

Can we somehow avoid locking here, for example by using some kind of lock free hash table to store the atoms?
Blocks: 875125
If I'm reading that profile correctly, there are many atomize calls under XPC_WN_GetterSetter -> ... -> nsXPCWrappedJSClass::CallMethod -> JS_GetProperty.

It's unfortunate nsXPCWrappedJSClass::CallMethod has a |const char*| and doesn't receive/store a jsid or JSAtom*. Also, is this class something we could convert to new DOM bindings?
I'm also seeing this in profiles and have a stack of patches to optimize various things here.
Assignee: nobody → jdemooij
Status: NEW → ASSIGNED
Summary: js::Atomize() needs to be faster → Atomizing strings needs to be faster
The atom marking code takes a TenuredCell* argument and has to check whether it's an atom/string, whether it's non-null, etc.

It's faster (and cleaner IMO) to templatize this so we can pass JSAtom* or JS::Symbol*. Then we can assert the thing is in the atoms zone instead of checking it at runtime, we no longer have to check the trace kind, etc.

I also pushed the null checks into the callers because all hot callers pass a non-null pointer.
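A minimal sketch of the idea (all names here are invented stand-ins, not SpiderMonkey's actual types): once the entry point is templatized, the atom/symbol distinction and the null check resolve at compile time or in the caller, so the marking function itself does no runtime type dispatch.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>
#include <unordered_set>

// Hypothetical stand-ins for JSAtom and JS::Symbol.
struct FakeAtom { uint32_t id; };
struct FakeSymbol { uint32_t id; };

struct FakeAtomMarker {
    std::unordered_set<uint32_t> markedAtoms;
    std::unordered_set<uint32_t> markedSymbols;

    // The type is known at compile time, so there is no runtime
    // trace-kind check; callers are expected to pass a non-null pointer
    // (the null checks were pushed into the callers).
    template <typename T>
    void markAtom(T* thing) {
        static_assert(std::is_same<T, FakeAtom>::value ||
                      std::is_same<T, FakeSymbol>::value,
                      "only atoms and symbols live in the atoms zone");
        assert(thing);
        if (std::is_same<T, FakeAtom>::value)
            markedAtoms.insert(thing->id);
        else
            markedSymbols.insert(thing->id);
    }
};
```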
Attachment #8852457 - Flags: review?(jcoppeard)
A profile of my atomization micro-benchmark shows we spend at least 5% under SparseBitmap::setBit and SparseBitmap::getOrCreateBlock.

This patch inlines setBit and the fast path of getOrCreateBlock.
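As a toy illustration of the data structure involved (the real SparseBitmap lives in js/src; the class and constants below are made up): bits are grouped into fixed-size blocks allocated on demand, and the hot setBit path is small enough that inlining it avoids a call per marked atom.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

class ToySparseBitmap {
    static constexpr size_t WordBits = 64;
    static constexpr size_t WordsPerBlock = 8;
    static constexpr size_t BitsPerBlock = WordBits * WordsPerBlock;

    using Block = std::vector<uint64_t>;
    std::unordered_map<size_t, Block> blocks_;

    // Slow path: allocate a zeroed block on first touch.
    Block& createBlock(size_t blockIndex) {
        return blocks_.emplace(blockIndex, Block(WordsPerBlock, 0)).first->second;
    }

  public:
    // Fast path on a hit: one hash lookup plus a bit-or, cheap to inline.
    inline void setBit(size_t bit) {
        size_t blockIndex = bit / BitsPerBlock;
        auto it = blocks_.find(blockIndex);
        Block& block = (it != blocks_.end()) ? it->second : createBlock(blockIndex);
        size_t rem = bit % BitsPerBlock;
        block[rem / WordBits] |= uint64_t(1) << (rem % WordBits);
    }

    inline bool getBit(size_t bit) const {
        size_t blockIndex = bit / BitsPerBlock;
        auto it = blocks_.find(blockIndex);
        if (it == blocks_.end())
            return false;
        size_t rem = bit % BitsPerBlock;
        return it->second[rem / WordBits] & (uint64_t(1) << (rem % WordBits));
    }
};
```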
Attachment #8852461 - Flags: review?(sphink)
Comment on attachment 8852457 [details] [diff] [review]
Part 1 - Optimize AtomMarkingRuntime::markAtom

Review of attachment 8852457 [details] [diff] [review]:
-----------------------------------------------------------------

Nice!

::: js/src/gc/AtomMarking.cpp
@@ +270,4 @@
>          return true;
>  
> +    if (mozilla::IsSame<T, JSAtom>::value) {
> +        JSAtom* atom = reinterpret_cast<JSAtom*>(thing);

I guess we don't need this cast.
Attachment #8852457 - Flags: review?(jcoppeard) → review+
I don't think we should inline AtomMarkingRuntime::markAtom everywhere, but IMO the atomization code is hot enough that it's worth it there. This patch adds gc/AtomMarkingRuntime-inl.h with AtomMarkingRuntime::inlinedMarkAtom, and calls inlinedMarkAtom in jsatom.cpp.

It improves my atomization micro-benchmark (see below) from ~1250 ms to ~1220 ms (> 1300 ms without parts 1 and 2) so it seems worth it to eliminate most of the remaining markAtom overhead.

---

function f() {
    var o = Object.create(null);
    var res = 0;
    var t = new Date;
    for (var i=0; i<10000000; i++) {
        res = o["foo" + (i % 2048)];
    }
    print(new Date - t);
    return res;
}
f();
Attachment #8852490 - Flags: review?(jcoppeard)
The first thing we do in Atomize and AtomizeChars is call JSString::validateLength, but we only have to do this when we allocate a new atom. The common case is looking up an existing atom and that will only succeed if the length is sane.
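The reordering can be sketched like this (a toy model, not the real Atomize; MaxLength is a tiny stand-in for JSString's actual limit): a string already in the table necessarily has a valid length, so the check only needs to run on the allocation path.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>

static const size_t MaxLength = 64;  // tiny stand-in for the real length limit

std::unordered_set<std::string> atomTable;

const std::string* atomize(const std::string& chars) {
    // Common case: the atom already exists; its length is known-valid,
    // so no length check is needed here.
    auto it = atomTable.find(chars);
    if (it != atomTable.end())
        return &*it;

    // Only on the allocation path do we validate the length.
    if (chars.size() > MaxLength)
        return nullptr;  // the real code reports an error

    return &*atomTable.insert(chars).first;
}
```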
Attachment #8852518 - Flags: review?(luke)
Clang was not inlining some (HashSet) methods. MOZ_ALWAYS_INLINE fixes this and improves the time from ~1220 ms to ~1186 ms.

These patches combined make atomization >10% faster on OS X. We now inline everything under AtomizeString and I don't think there's much low-hanging fruit left.
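For illustration, MOZ_ALWAYS_INLINE boils down to the compiler's forced-inlining attribute; this toy macro and hash function are invented, but show the mechanism used to keep small hot helpers out of the call graph.

```cpp
#include <cstddef>
#include <cstdint>

// Minimal stand-in for MOZ_ALWAYS_INLINE on Clang/GCC.
#if defined(__GNUC__) || defined(__clang__)
#  define ALWAYS_INLINE inline __attribute__((always_inline))
#else
#  define ALWAYS_INLINE inline
#endif

// A tiny per-character hash step, forced inline into the loop below.
ALWAYS_INLINE uint32_t hashStep(uint32_t h, char c) {
    return (h << 5) - h + static_cast<unsigned char>(c);
}

uint32_t hashChars(const char* s, size_t n) {
    uint32_t h = 0;
    for (size_t i = 0; i < n; i++)
        h = hashStep(h, s[i]);
    return h;
}
```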

It might make sense to add a per-Zone cache (purged on GC) so we can bypass markAtom, the multiple hash table lookups, and the atoms lock. This would need some careful measurements because it could easily backfire by adding more overhead when we're atomizing many different strings.
Attachment #8852527 - Flags: review?(luke)
Attachment #8852518 - Flags: review?(luke) → review+
Attachment #8852461 - Flags: review?(sphink) → review+
Attachment #8852490 - Flags: review?(jcoppeard) → review+
(In reply to Jan de Mooij [:jandem] from comment #8)
> It might make sense to add a per-Zone cache (purged on GC) so we can bypass
> markAtom, the multiple hash table lookups, and the atoms lock. This would
> need some careful measurements because it could easily backfire by adding
> more overhead when we're atomizing many different strings.

I prototyped this per-Zone cache (purged on GC) and it's a much bigger win than I expected: it improves my atomization micro-benchmark (comment 6) from 1190 ms to 860 ms, the Kraken JSON-parsing test becomes a few ms faster, and Octane-codeload seems to improve by a few hundred points (although it's noisy). We can likely improve more by inlining this part of the atomization code.

On Octane the per-zone cache has a hit rate of 96.4%. Tomorrow I'll test this on some popular websites to see what we get there. Also, if we do this we can probably drop part 3.
Blocks: 1334672
This adds the cache to Zone and purges it on GC. I tested it on some websites (Gmail, Google Docs/Sheets, Wikipedia, Twitter) and the hit rate is at least 80%. It's more effective on certain benchmarks/frameworks and I think Ehsan's profile (see comment 0 and comment 1) would also have benefited a lot from this because XPConnect keeps atomizing the same strings.
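The shape of the optimization, in toy form (all names invented; the real cache lives on JS::Zone): a zone-local table is consulted before taking the lock on the shared atoms table, so a cache hit skips the lock, the atom marking, and the second hash lookup, and the cache is purged on GC because it holds bare pointers.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <unordered_map>
#include <unordered_set>

struct ToyRuntime {
    std::mutex atomsLock;
    std::unordered_set<std::string> atomsTable;  // shared across zones
};

struct ToyZone {
    ToyRuntime* rt;
    std::unordered_map<std::string, const std::string*> atomCache;
    size_t cacheHits = 0;

    const std::string* atomize(const std::string& chars) {
        // Fast path: no lock, no marking, a single zone-local hash lookup.
        auto it = atomCache.find(chars);
        if (it != atomCache.end()) {
            cacheHits++;
            return it->second;
        }
        // Slow path: lock the shared table, then populate the cache.
        std::lock_guard<std::mutex> guard(rt->atomsLock);
        const std::string* atom = &*rt->atomsTable.insert(chars).first;
        atomCache.emplace(chars, atom);
        return atom;
    }

    // The cached pointers may be invalidated by GC, so purge on GC.
    void purgeOnGC() { atomCache.clear(); }
};
```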

I also added a clearAndShrink() method to js::HashMap and js::HashSet. Often clear() is not great because it doesn't release the allocated storage, so you risk wasting memory, and finish() is also not great because you then have to do a fallible initialization again, and I'd like to avoid adding a branch to the atomization code to check initialized().
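The distinction can be sketched on top of a standard container (a toy model of the js::HashMap semantics, not the real API): clear() keeps the bucket storage, while clearAndShrink() frees it yet leaves the table initialized and immediately usable, unlike a finish()/init() pair.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

template <typename K, typename V>
struct ToyHashMap {
    std::unordered_map<K, V> impl;

    void put(const K& k, const V& v) { impl[k] = v; }

    // clear(): entries gone, allocated capacity retained.
    void clear() { impl.clear(); }

    // clearAndShrink(): entries gone AND storage released, but the map
    // stays initialized, so no fallible re-init is needed before reuse.
    void clearAndShrink() {
        std::unordered_map<K, V> empty;
        impl.swap(empty);  // swapping with an empty map frees the buckets
    }
};
```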

Looks green on Try so far.
Attachment #8852833 - Flags: review?(jcoppeard)
Attachment #8852833 - Flags: review?(jcoppeard) → review+
Attachment #8852527 - Flags: review?(luke) → review+
Pushed by jandemooij@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/5c7b6e016f85
part 1 - Templatize and optimize AtomMarkingRuntime::markAtom. r=jonco
https://hg.mozilla.org/integration/mozilla-inbound/rev/4d0df04fefbe
part 2 - Ensure SparseBitmap::setBit gets inlined. r=sfink
https://hg.mozilla.org/integration/mozilla-inbound/rev/1c682e6c1eb0
part 3 - Add AtomMarkingRuntime::inlinedMarkAtom to eliminate markAtom call overhead when atomizing. r=jonco
https://hg.mozilla.org/integration/mozilla-inbound/rev/f3a6587bc94a
part 4 - Call JSString::validateLength only when we have to allocate a new atom. r=luke
https://hg.mozilla.org/integration/mozilla-inbound/rev/9ff8c5acf4ea
part 5 - Make sure various hashtable lookups get inlined when atomizing strings. r=luke
https://hg.mozilla.org/integration/mozilla-inbound/rev/17c436c27035
part 6 - Add a Zone cache for recently atomized strings. r=jonco
(In reply to Jon Coppeard (:jonco) from comment #5)
> I guess we don't need this cast.

Unfortunately we do because T is a template parameter that can be JSAtom or JS::Symbol. Even though we check mozilla::IsSame<T, JSAtom> the compiler still wants the cast.
(In reply to Jan de Mooij [:jandem] from comment #9)
> the Kraken JSON-parsing test becomes a few ms faster

AWFY is saying 15-17% and we're now faster than V8/JSC, so it's pretty sweet.