Closed Bug 1268501 Opened 8 years ago Closed 8 years ago

Taking the GC lock can cause severe slowdowns

Category: Core :: JavaScript: GC, defect
Severity: normal
Priority: Not set
Status: RESOLVED FIXED
Target Milestone: mozilla49
Tracking Status: firefox49 --- fixed
Reporter: h4writer
Assignee: Unassigned
Attachments: 1 file

When landing a patch for the recent regexp regression, we noticed the following behaviour:

Machine 1: 3% improvement
https://arewefastyet.com/#machine=31&view=single&suite=octane&subtest=RegExp&start=1461091501&end=1461798084

Machine 2: 31% regression
https://arewefastyet.com/#machine=28&view=single&suite=octane&subtest=RegExp&start=1461091501&end=1461798084

After adding enough TraceLogger log points I saw the following:
http://imgur.com/gBV0rKF
> - Internal: the time TraceLogger itself needs to dump its log. (tl;dr: ignore)
> - Tenure: creation of the tenured regexp object: https://dxr.mozilla.org/mozilla-central/rev/fc15477ce628599519cb0055f52cc195d640dc94/js/src/vm/RegExpObject.cpp#1026
>           Can be ignored; most of this time is spent while Ion is still warming up.
> - lock 2: https://dxr.mozilla.org/mozilla-central/rev/fc15477ce628599519cb0055f52cc195d640dc94/js/src/gc/Allocator.cpp#336

If you look at the graph, you'll see that as soon as we reach full speed, lock2 takes large chunks out of the total time (4%). It takes longer than MinorGC (0.6%) and GC (1%).

Zooming in on only the last few iterations shows this even more clearly:
http://imgur.com/Cs6cCtf
(Green is IonMonkey, grey is GC, and light green is the lock.)


For the last few iterations I also included a view with all threads:
http://imgur.com/xKsjJaa

Every line is a thread. The first one is the main thread; every other line is a background thread, with grey again being GC. You'll see that every lock2 corresponds to a big GC chunk happening off-thread, possibly contending for the lock.

This is currently a bad case for regexps in some special situations (not blocking: other options to improve the situation are viable, though not easy either). But I don't think this behaviour is desirable for the GC, so I'm opening a bug to further investigate what could be done to mitigate it.
Just to be very specific, "lock 2" corresponds to exactly:

    // Parallel threads have their own ArenaLists, but chunks are shared;
    // if we haven't already, take the GC lock now to avoid racing.
    if (maybeLock.isNothing())
        maybeLock.emplace(rt);

Nothing else out of that function!
From the tracelogging it looks like a background thread is holding this lock for a long time and delaying the main thread.

It's possible that this is caused by background sweeping which actually holds the lock for an unbounded length of time while releasing empty arenas.

Here's a patch to periodically drop and reacquire the lock while doing this, which should let the main thread get a chance to take it.
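In outline, the idea is something like the following (a minimal sketch, not the attached patch itself; Arena, releaseArena and gcLock are illustrative stand-ins for the real GC types and lock):

    #include <cstddef>
    #include <mutex>
    #include <vector>

    struct Arena;                         // opaque for this sketch
    extern std::mutex gcLock;             // stands in for the real GC lock
    void releaseArena(Arena* arena);      // assumed to require gcLock to be held

    void releaseEmptyArenas(std::vector<Arena*>& empty)
    {
        const size_t batchSize = 32;      // small batch; see the note about 32 below
        std::unique_lock<std::mutex> lock(gcLock);
        size_t released = 0;
        for (Arena* arena : empty) {
            releaseArena(arena);
            if (++released % batchSize == 0) {
                lock.unlock();            // give the main thread a chance to take it
                lock.lock();
            }
        }
    }

Dropping the lock every few arenas bounds how long the main thread can be blocked, at the cost of a few extra lock operations on the background thread.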
(In reply to Jon Coppeard (:jonco) from comment #2)
> Created attachment 8746555 [details] [diff] [review]
> bug1268501-gc-lock
> 
> From the tracelogging it looks like a background thread is holding this lock
> for a long time and delaying the main thread.
> 
> It's possible that this is caused by background sweeping which actually
> holds the lock for an unbounded length of time while releasing empty arenas.
> 
> Here's a patch to periodically drop and reacquire the lock while doing this,
> which should let the main thread get a chance to take it.

That is definitely already an improvement (again the last few iterations):
http://imgur.com/06m9cmk

The remaining issue is that we take that lock so many times. I assume that's because we haven't accumulated enough free allocation space to avoid requesting more memory?
Comment on attachment 8746555 [details] [diff] [review]
bug1268501-gc-lock

Review of attachment 8746555 [details] [diff] [review]:
-----------------------------------------------------------------

Requesting review as it seems this helps.

I made up the number 32, and possibly it is too small. However, it will only slow down the background thread a little, so maybe we don't care anyway.
Attachment #8746555 - Flags: review?(terrence)
Comment on attachment 8746555 [details] [diff] [review]
bug1268501-gc-lock

Review of attachment 8746555 [details] [diff] [review]:
-----------------------------------------------------------------

Makes sense to me. Great work tracking this down, Hannes!
Attachment #8746555 - Flags: review?(terrence) → review+
Thanks for the quick responses! The graphs already look a little bit better!

Not sure if I need to open a new bug, but I'm still seeing similar/related issues:
http://imgur.com/sB0lCAA

(1) It seems like there are still some long-held locks not solved by this?
(2) Not sure how this works, but why do we get into "refillFreeListFromMainThread" all the time? Would it be possible to allocate a little more space, making sure that we can go on for some time before needing to enter refillFreeListFromMainThread again?
(In reply to Hannes Verschore [:h4writer] from comment #6)
> (2) Not sure how this works, but why do we get into
> "refillFreeListFromMainThread" all the time? Would it be possible to
> allocate a little more space, making sure that we can go on for some time
> before needing to enter refillFreeListFromMainThread again?

Typed arenas are 4KiB in size, and we use a different list of arenas for each thing kind. That means we can allocate roughly 4KiB / sizeof(thing) things in each arena, and after that we have to 'refill' the 'free list'. So long as we aren't in the middle of background finalization and there are non-full arenas left, we can just grab the next non-full arena and start allocating out of it (though a non-full arena might only have space left for a single object).
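As a rough worked example of that arithmetic (a hypothetical standalone calculation, assuming the 4KiB arena size above and ignoring the per-arena header):

    #include <cstddef>
    #include <cstdio>

    int main()
    {
        const size_t arenaSize = 4 * 1024;               // 4 KiB per arena
        const size_t thingSizes[] = { 16, 32, 64, 128 }; // example GC thing sizes
        for (size_t thingSize : thingSizes) {
            // Upper bound only: part of each arena holds its header, so the
            // real count is slightly lower.
            std::printf("thing size %3zu -> ~%zu allocations per refill\n",
                        thingSize, arenaSize / thingSize);
        }
        return 0;
    }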

Dropping out of the JITs for this doesn't seem to hurt performance generally speaking - we just refill the free list, and then the JITs can use it until it's empty again. Things only get slow if we have to take a contested lock (either while background finalization may be modifying the arena lists, or if we need to allocate a new arena from our chunk pool).

I guess there are two ways to make taking the lock less frequent:
1) Increase the size of arenas. This would increase the average free space per arena, but it wouldn't eliminate near-full arenas. I'm not sure what the trade-offs of doing this would be, but it's probably not something to do lightly.
2) Allocate multiple new arenas from our chunk pool at a time. This would mean that the next time we have to refill our free list, there's a good chance that we can just use an empty arena we already allocated previously, and we won't have to take the lock. Of course, it does add a bit of extra memory usage (but the GC could just free any empty arenas left over when it runs), and we might exhaust our chunk pool more quickly this way.
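To illustrate option 2, a hedged sketch of what batching arena allocation could look like (ArenaCache, allocateArenaFromChunkPool and gcLock are made-up names, not the real allocator API):

    #include <cstddef>
    #include <mutex>
    #include <vector>

    struct Arena;
    extern std::mutex gcLock;                 // stands in for the real GC lock
    Arena* allocateArenaFromChunkPool();      // assumed to require gcLock to be held

    class ArenaCache
    {
        std::vector<Arena*> spare_;           // empty arenas reserved for this thread

      public:
        Arena* getArena()
        {
            if (spare_.empty()) {
                // Take the lock once and grab a small batch of arenas.
                const size_t batch = 4;       // arbitrary batch size
                std::lock_guard<std::mutex> lock(gcLock);
                for (size_t i = 0; i < batch; i++) {
                    if (Arena* a = allocateArenaFromChunkPool())
                        spare_.push_back(a);
                }
            }
            if (spare_.empty())
                return nullptr;               // chunk pool exhausted
            Arena* a = spare_.back();
            spare_.pop_back();
            return a;
        }
    };

Most refills would then be served from the thread-local spare list without touching the GC lock; the trade-off is the extra memory held in the cache, as noted above.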
https://hg.mozilla.org/mozilla-central/rev/a16e3700894b
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla49