Closed Bug 966040 Opened 6 years ago Closed 6 years ago

GGC could be saving us 649800 mallocs on regexp

Categories

(Core :: JavaScript Engine, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
mozilla29

People

(Reporter: terrence, Assigned: terrence)

References

(Blocks 2 open bugs)

Details

Attachments

(2 files)

Since Array slots are always external, we have to malloc them. Because of this, Regexp ends up making an exorbitant number of malloc calls. I'm not sure whether or how many slots MatchObject actually needs, but with GGC we should definitely be creating these in the nursery, not with malloc.

This will require a bit of annoying code reorganization, but should be worth it.
Something that's been bugging me for awhile now is how complicated NewGCThing has become: it has different static code paths for zeal, GGC, debug and dynamic paths for PJS contexts, the initial heap, and GGC. If you move up two levels it's even more complex with infallible slot allocation and initialization, sometimes in jitcode, sometimes not.

This is part 1: simplify everything until we at least have a recognizable allocation core again. This splits out the various pre and post checking, like we do for MarkInternal. It also splits up object and non-object allocation. Since non-object allocation does not incur any of the complexities of nursery or initial-heap, this is a nice simplification. Also, there is a direct mapping between C++ type and finalize kind here, so we can dump /all/ of the different args and just have the template parameter.

With this split, we should be able to push infallible extra slot allocation directly into the nursery. This is more complex and why we may want to do this may not be obvious at first. The reason is that we always allocate slots /before/ allocating the GC thing. This means that any slots/object pair in the nursery is currently laid out backwards, handily defeating most modern processor's assumptions about data access order. We're probably mostly saved here since the nursery is pretty much always going to be in L3 anyway, but it would be nice to at least not be actively hostile towards processor optimizations here.
Attachment #8368362 - Flags: review?(jcoppeard)
Seems very similar to bug 957533
Blocks: 806646
I want to raise bug 957542. In that bug I made it possible to allocate inline slots in an ArrayObject. For the result of octane-regexp that means the 2 properties are now in the fixed slots instead of dynamic slots. This also removes the mallocs and gives me a 8% increase. There is one issue blocking the landing. I'm waiting for input from Brian if we can read shapes in Off-thread MIR Generation. Else the bug should be finished.

Note: I think it would be awesome to also be able to allocate dynamic slots in the nursery!
Comment on attachment 8368362 [details] [diff] [review]
clean_up_allocator_core-v0.diff

Review of attachment 8368362 [details] [diff] [review]:
-----------------------------------------------------------------

I like the fact that we can deduce kind and size from the the type for non-objects.  This is a nice tidyup.
Attachment #8368362 - Flags: review?(jcoppeard) → review+
(In reply to Hannes Verschore [:h4writer] from comment #3)
> I want to raise bug 957542. In that bug I made it possible to allocate
> inline slots in an ArrayObject. For the result of octane-regexp that means
> the 2 properties are now in the fixed slots instead of dynamic slots. This
> also removes the mallocs and gives me a 8% increase. There is one issue
> blocking the landing. I'm waiting for input from Brian if we can read shapes
> in Off-thread MIR Generation. Else the bug should be finished.

Currently the dynamic allocations are for 8 properties, which I thought was probably excessive. If you can reduce that to 2, independent of this work, it should roughly half the number of times we fill the nursery in ggc builds, giving us a tremendous speedup there too.
Duplicate of this bug: 957533
Do the thing.
Attachment #8368679 - Flags: review?(jcoppeard)
Comment on attachment 8368679 [details] [diff] [review]
allocate_dynamic_slots_in_nursery-v0.diff

Review of attachment 8368679 [details] [diff] [review]:
-----------------------------------------------------------------

Neat, and removes some duplicated logic from JSObject::create/createArray() too.

::: js/src/gc/Nursery.cpp
@@ +132,5 @@
> +JSObject *
> +js::Nursery::allocateObject(JSContext *cx, size_t size, size_t numDynamic)
> +{
> +    /* Attempt to allocate slots contiguously after object, if possible. */
> +    if (numDynamic && numDynamic <= MaxNurserySlots) { 

Nit: space at end of line

@@ +150,5 @@
> +            return nullptr;
> +    }
> +
> +    JSObject *obj = static_cast<JSObject *>(allocate(size));
> +    

Nit: space at end of line
Attachment #8368679 - Flags: review?(jcoppeard) → review+
Whoa, thanks for the heads up! I haven't imported my vim config onto this computer yet, so trailing whitespace isn't getting highlighted in my editor like I'm used to.

remote:   https://hg.mozilla.org/integration/mozilla-inbound/rev/56d9e75b36b4
remote:   https://hg.mozilla.org/integration/mozilla-inbound/rev/2815473b057b
This appears to have regressed octane-Splay and octane-SplayLatency test on AWFY.
https://hg.mozilla.org/mozilla-central/rev/56d9e75b36b4
https://hg.mozilla.org/mozilla-central/rev/2815473b057b
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla29
(In reply to mayankleoboy1 from comment #11)
> This appears to have regressed octane-Splay and octane-SplayLatency test on
> AWFY.

I'm not seeing anything like that on machine 12. What platform are you looking at?
Flags: needinfo?(mayankleoboy1)
machine11
Flags: needinfo?(mayankleoboy1)
Thanks! There's definitely something strange going on on 32bit.

I still don't see any movement on SplayLatency, but there is certainly a major regression on Splay. It's weird though: there's a ton of stuff in the first, small regression (including this) and not anything in the range of the larger regression. I'll bisect to see what's going on as soon as I get my primary dev machine back online.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Splay is in a whole different ballback performance-wise now and splay latency numbers appear to have recovered and then followed m-i numbers.
Status: REOPENED → RESOLVED
Closed: 6 years ago6 years ago
Resolution: --- → FIXED
Blocks: 1507445
You need to log in before you can comment on or make changes to this bug.