Bug 711623: rt->gcNumFreeArenas is not updated properly
Opened 13 years ago · Closed 13 years ago
Categories: Core :: JavaScript Engine (defect)
Status: RESOLVED FIXED
Target Milestone: mozilla11
People: (Reporter: igor, Assigned: igor)
Attachments (1 file, 2 obsolete files)
patch (11.52 KB), terrence: review+
Bug 700357 added the rt->gcNumFreeArenas counter, which is used for GC scheduling. However, modifications to the counter can currently race. In particular, the counter is modified in Chunk::init.
That function is called outside the GC lock by the background thread, so the increment rt->gcNumFreeArenas += ArenasPerChunk in the init method can race with a counter decrement on the allocation thread. That could make the counter wrap around 0 and become a huge positive number near UINT_MAX, which in turn would permanently trigger the periodic GC every 20 seconds.
That could be the reason for bug 707162.
Comment 1•13 years ago (Assignee)
AFAICS there is another bug with rt->gcNumFreeArenas that is more likely to lead to bug 707162. When a chunk becomes empty, is moved to the GC free list, and is later destroyed, rt->gcNumFreeArenas is not decreased by ArenasPerChunk. So as soon as we release the first empty chunk, rt->gcNumFreeArenas becomes permanently positive.
Summary: modifications of rt->gcNumFreeArenas can race → rt->gcNumFreeArenas is not updated properly
Comment 2•13 years ago
Good catch. I think it is fine to have the race as long as we reliably reset the value to something valid occasionally. I am not sure why I did not implement it like this in the first place: as is, it is clearly buggy.
Attachment #582415 - Flags: review?(igor)
Comment 3•13 years ago
Do we have telemetry set up so that, when this change is checked in, we'll be able to see whether it solves bug 707162? If not, we should set up that telemetry ASAP.
Comment 4•13 years ago (Assignee)
For bug 702251 and related bugs it is essential to avoid the race, since with those bugs the reset point may never be reached.
The fix here solves that, plus another (mostly theoretical) problem: counter desynchronization when the counter is not reset after PickChunk fails to add the chunk to the set and calls Chunk::release(rt, chunk).
With the fix the counter reflects the number of free committed arenas in non-empty chunks. I also renamed it to match ChunkInfo::numArenasFreeCommitted.
Attachment #582415 - Attachment is obsolete: true
Attachment #582415 - Flags: review?(igor)
Attachment #582428 - Flags: review?(terrence)
Comment 5•13 years ago (Assignee)
(In reply to Justin Lebar [:jlebar] from comment #3)
> Do we have telemetry set up so that, when this change is checked in, we'll
> be able to see whether it solves bug 707162? If not, we should set up that
> telemetry ASAP.
I suppose GC_REASON histogram should show a drop in frequency for MAYBEGC (1).
Comment 6•13 years ago
(In reply to Igor Bukanov from comment #5)
> (In reply to Justin Lebar [:jlebar] from comment #3)
> > Do we have telemetry set up so that, when this change is checked in, we'll
> > be able to see whether it solves bug 707162? If not, we should set up that
> > telemetry ASAP.
>
> I suppose GC_REASON histogram should show a drop in frequency for MAYBEGC
> (1).
Where can I see this histogram?
Comment 7•13 years ago
https://metrics.mozilla.com/pentaho/content/pentaho-cdf-dd/Render?solution=metrics2&path=telemetry&file=telemetryHistogram.wcdf
Comment 8•13 years ago (Assignee)
(In reply to Justin Lebar [:jlebar] from comment #7)
> https://metrics.mozilla.com/pentaho/content/pentaho-cdf-dd/
> Render?solution=metrics2&path=telemetry&file=telemetryHistogram.wcdf
That does not show GC_REASON for some reasons...
Comment 9•13 years ago
(In reply to Igor Bukanov from comment #8)
> That does not show GC_REASON for some reasons...
Ehsan recently pointed out that telemetry histograms are not allowed to be defined with a range that starts at 0. (Why this is I have no idea; it seems insane.) Because of this, the GC_REASON histogram was getting ignored.
Comment 10•13 years ago (Assignee)
(In reply to Bill McCloskey (:billm) from comment #9)
> Ehsan recently pointed out that telemetry histograms are not allowed to be
> defined with a range that starts at 0. (Why this is I have no idea; it seems
> insane.) Because of this, the GC_REASON histogram was getting ignored.
Is it ignored during submission so we have absolutely no data?
Comment 11•13 years ago
The bug is when the histogram data structure is created on the client side:
http://hg.mozilla.org/mozilla-central/file/cbb0233c7ba8/toolkit/components/telemetry/Telemetry.cpp#l135
Unfortunately, Telemetry::Accumulate drops errors on the floor.
Ehsan fixed it so that the range starts at 1. However, we're still adding 0 values to the histogram, so I'm not actually sure what will happen.
Comment 12•13 years ago (Assignee)
(In reply to Bill McCloskey (:billm) from comment #11)
> The bug is when the histogram data structure is created on the client side.
>
> http://hg.mozilla.org/mozilla-central/file/cbb0233c7ba8/toolkit/components/
> telemetry/Telemetry.cpp#l135
> Unfortunately, Telemetry::Accumulate drops errors on the floor.
>
> Ehsan fixed it so that the range starts at 1. However, we're stilling adding
> 0 values to the histogram, so I'm not actually sure what will happen.
Do we have a bug to either add 1 to histogram values or remap 0 into some other unused value to preserve submitted data if any?
Comment 13•13 years ago
Comment on attachment 582428 [details] [diff] [review]
full fix
Review of attachment 582428 [details] [diff] [review]:
-----------------------------------------------------------------
Thanks! That is a huge improvement.
Attachment #582428 - Flags: review?(terrence) → review+
Comment 14•13 years ago (Assignee)
Comment on attachment 582428 [details] [diff] [review]
full fix
My patch has a bad bug: when a chunk is returned to the empty pool, the global counter should decrease not by ArenasPerChunk, but by the number of committed arenas in the chunk being returned to the pool.
Attachment #582428 - Attachment is obsolete: true
Comment 15•13 years ago (Assignee)
The new version hopefully fixes all the issues with the counter update. As before the patch, the runtime counter accounts for decommitted arenas both in non-empty chunks and in empty chunks. If we have a lot of empty chunks but only a few arenas in them are committed, that should not represent memory pressure.
Attachment #582667 - Flags: review?(terrence)
Comment 16•13 years ago
Comment on attachment 582667 [details] [diff] [review]
v2
Review of attachment 582667 [details] [diff] [review]:
-----------------------------------------------------------------
I like the name changes and I especially like that this removes the rt from Chunk::init. However, this does introduce one potential problem: we include FreeCommitted from the ChunkPool in the count, but don't decommit them in DecommitFreePages, leaving us with a persistent positive number. With the current constants it is impossible to get into a loop where we GC_SHRINK at every MaybeGC, but this is a subtle interaction for anyone tweaking those constants to keep in mind. I'll give this an r+ given the addition of sufficient static asserts to ensure that the maximum ChunkPool size and the FreeCommittedArenaThreshhold have a sane relation.
Attachment #582667 - Flags: review?(terrence) → review+
Comment 17•13 years ago (Assignee)
(In reply to Terrence Cole [:terrence] from comment #16)
> However, this does introduce one potential problem: we include
> FreeCommitted from the ChunkPool in the count, but don't decommit them in
> DecommitFreePages, leaving us with a persistent positive number.
I am not sure what do you mean here. Surely we skip empty chunks when doing the decommit, but when we do decommit, we remove later on the background thread all the empty chunks. As the new GC always waits until the backgrounw sweep f
With the
> current constants it is impossible to get into a loop where we GC_SHRINK at
> every MaybeGC, but this is a subtle interaction for anyone tweaking those
> constants to keep in mind. I'll give this an r+ given the addition of
> sufficient static asserts to ensure that the maximum ChunkPool size and the
> FreeCommittedArenaThreshhold have a sane relation.
Comment 18•13 years ago (Assignee)
Sorry for incomplete comment, I meant:
(In reply to Terrence Cole [:terrence] from comment #16)
> However, this does introduce one potential problem: we include
> FreeCommitted from the ChunkPool in the count, but don't decommit them in
> DecommitFreePages, leaving us with a persistent positive number.
I am not sure what you mean here. Surely we skip empty chunks when doing the decommit, but when we do decommit, we later release all the empty chunks on the background thread. As the new GC always waits until the background sweep finishes, we cannot have several GCs with decommits without releasing all empty chunks.
Also note that in bug 702251 I will implement decommit together with the chunk release on the background thread, so there would be no arbitrary pause between decommit and release.
Comment 19•13 years ago
(In reply to Igor Bukanov from comment #18)
> Sorry for incomplete comment, I meant:
>
> (In reply to Terrence Cole [:terrence] from comment #16)
> > However, this does introduce one potential problem: we include
> > FreeCommitted from the ChunkPool in the count, but don't decommit them in
> > DecommitFreePages, leaving us with a persistent positive number.
>
> I am not sure what do you mean here. Surely we skip empty chunks when doing
> the decommit, but when we do decommit, we remove later on the background
> thread all the empty chunks. As the new GC always waits until the background
> sweep finishes, we cannot have several GC with decommits without releasing
> all empty chunks.
Ah, okay, I did not know that we freed the ChunkPool on GC_SHRINK, but it makes perfect sense. In that case, this is fine. It would be nice if we could also add an assert at the end of GC_SHRINK that the counter is always 0, but that would require us to decide that the Decommit is infallible, which I'm not sure we want to guarantee yet.
> Also note that in the bug 702251 I wull implement decommit together with
> the chunk release on the background thread, so there would be no arbitrary
> pause between decoomit and release,
I'm looking forward to seeing that -- particularly how you will avoid taking a lock at arena allocation if we may be decommitting at the same time.
Comment 20•13 years ago (Assignee)
(In reply to Terrence Cole [:terrence] from comment #19)
> I'm looking forward to seeing that -- particularly how you will avoid taking
> a lock at arena allocation if we may be decommitting at the same time.
That was not the most difficult part. The bigger complexity comes from avoiding races with the allocation thread and from decommitting from the tail of the available chunk lists, not from the start. I need to fix a few issues with the patch before asking for a review...
Comment 21•13 years ago (Assignee)
Comment 22•13 years ago
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla11