Closed Bug 1371728 Opened 2 years ago Closed 2 years ago

Intermittent leakcheck | default process: 557416 bytes leaked (CancelableRunnable, CondVar, IdlePeriod, IdleRunnable, Mutex, ...)

Categories

(Core :: XPCOM, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
mozilla56
Tracking Status
firefox-esr52 --- unaffected
firefox54 --- unaffected
firefox55 + fixed
firefox56 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: ehsan)

References

(Blocks 1 open bug)

Details

(Keywords: intermittent-failure, memory-leak, regressionwindow-wanted, Whiteboard: [MemShrink:P2][stockwell fixed:product])

Attachments

(1 file)

We're leaking 2017 timers and 2017 IdleRunnables.
Component: DOM: Content Processes → XPCOM
Possibly a regression from bug 1361461, which I expect affected how a lot of idle runnables were handled.
Blocks: 1361461
[Tracking Requested - why for this release]: It would be really bad if we've started leaking thousands of runnables sometimes. We should either make sure this goes away, or bisect down to figure out what has caused it. Or confirm that it isn't a new issue.
Whiteboard: [MemShrink]
Depends on: 933741
Blocks: 933741
No longer depends on: 933741
It would be nice to confirm if this is a real leak or a leak reported by the reftest harness.  IIRC I was seeing intermittent leaks like this before I added the 1 second timeout, so it could be that there are a number of these runnables that aren't processed at the time we shut down the reftest harness.
2000 is a lot! Could it that the idle queue is starved out by the absurd number of pages opened by reftests? Could we flush it on shutdown (at least in debug builds) before shutdown GCs happen?
(In reply to Andrew McCreight [:mccr8] from comment #5)
> 2000 is a lot! Could it that the idle queue is starved out by the absurd
> number of pages opened by reftests?

Yeah, that could certainly happen.

(Not sure where the 2000 number comes from, FWIW.  I'm assuming you meant 1000ms!)

> Could we flush it on shutdown (at least
> in debug builds) before shutdown GCs happen?

Can you point me to some place in the code where this needs to happen?  I don't know exactly where the right place before which all of these need to run is...
Flags: needinfo?(continuation)
(In reply to :Ehsan Akhgari (needinfo please, extremely long backlog) from comment #7)
> (Not sure where the 2000 number comes from, FWIW.  I'm assuming you meant
> 1000ms!)

I don't mean 1000ms. I mean 2000 different runnables are leaking.

If you look through the log, you'll see a little ASCII art chart of what is leaking:

      |<----------------Class--------------->|<-----Bytes------>|<----Objects---->|
      |                                      | Per-Inst   Leaked|   Total      Rem|
...
  349 |IdleRunnable                          |       56   101360|   12743     1810|

This means that we are leaking 1810 different runnables.

> Can you point me to some place in the code where this needs to happen?  I
> don't know exactly where the right place before which all of these need to
> run is...

You could probably register an observer for "xpcom-shutdown". I'm not sure how shutting down of the idle queue is handled right now.
Flags: needinfo?(continuation)
Though really, if we're really letting the idle queue get that backed up in reftests, maybe something should be added to the reftest harness to let the idle queue run a bit in between tests? We do something like that for GC/CC already.
(In reply to Andrew McCreight [:mccr8] from comment #8)
> (In reply to :Ehsan Akhgari (needinfo please, extremely long backlog) from
> comment #7)
> > (Not sure where the 2000 number comes from, FWIW.  I'm assuming you meant
> > 1000ms!)
> 
> I don't mean 1000ms. I mean 2000 different runnables are leaking.

Oh, I see, thanks for the explanation!

(In reply to Andrew McCreight [:mccr8] from comment #9)
> Though really, if we're really letting the idle queue get that backed up in
> reftests, maybe something should be added to the reftest harness to let the
> idle queue run a bit in between tests? We do something like that for GC/CC
> already.

I was asking about the place in the reftest harness where this is already done for GC/CC...

We could just choose to process all of the idle queue before dispatching xpcom-shutdown in debug builds, for example.  Or we can add a hack specific to these WindowDestroyEvent runnables.  I'm not really sure which one would be preferrable...

Olli, do you have any preferences which way to go?
Flags: needinfo?(bugs)
Why aren't we processing the idle runnables the same way as other runnables during shutdown?
That is how they should work, I think.
Flags: needinfo?(bugs)
(In reply to Olli Pettay [:smaug] from comment #11)
> Why aren't we processing the idle runnables the same way as other runnables
> during shutdown?
> That is how they should work, I think.

As far as I can tell, this is the last place where we process any runnables at shutdown time: <https://searchfox.org/mozilla-central/rev/61054508641ee76f9c49bcf7303ef3cfb6b410d2/xpcom/build/XPCOMInit.cpp#913>

And a bit later we call NS_LogTerm() here <https://searchfox.org/mozilla-central/rev/61054508641ee76f9c49bcf7303ef3cfb6b410d2/xpcom/build/XPCOMInit.cpp#1077> which is the code that triggers leak checking.

NS_ProcessPendingEvents() just calls ProcessNextEvent() in a loop, so if we end up being in an idle period within this time, eventually GetEvent() may return null while there are runnables in our idle queue.  Then NS_ProcessPendingEvents() would return, and we finish without having exhausted everything in our idle queue (and of course we leak them all.)

Is there a specific mechanism that should be processing these runnables at shutdown besides the above?
Flags: needinfo?(bugs)
So sounds like we should just skip idle detection if we're shutting down a thread and process all idle events.
nsThread::GetIdleEvent could check ShuttingDown() and if that returns null, bypass idleness checks.
Flags: needinfo?(bugs)
(In reply to Olli Pettay [:smaug] from comment #13)
> So sounds like we should just skip idle detection if we're shutting down a
> thread and process all idle events.
> nsThread::GetIdleEvent could check ShuttingDown() and if that returns null,
> bypass idleness checks.

That sounds like a good idea.  It should be possible to implement that by making GetIdleDeadline() return TimeStamp::Now() when we're shutting down, I think.  I'll give that a shot on the try server to see what happens.
(In reply to :Ehsan Akhgari (needinfo please, extremely long backlog) from comment #15)
> (In reply to Olli Pettay [:smaug] from comment #13)
> > So sounds like we should just skip idle detection if we're shutting down a
> > thread and process all idle events.
> > nsThread::GetIdleEvent could check ShuttingDown() and if that returns null,
> > bypass idleness checks.
> 
> That sounds like a good idea.  It should be possible to implement that by
> making GetIdleDeadline() return TimeStamp::Now() when we're shutting down, I
> think.  I'll give that a shot on the try server to see what happens.

That approach seems promising! https://treeherder.mozilla.org/#/jobs?repo=try&revision=de54d4b9e4d60d50d55241f3914930f3f06598a1
Assignee: nobody → ehsan
Attachment #8877385 - Flags: review?(bugs)
this seems worth tracking and making the leakcheck complaints go away for 55
Attachment #8877385 - Flags: review?(bugs) → review+
(In reply to Julien Cristau [:jcristau] from comment #19)
> this seems worth tracking and making the leakcheck complaints go away for 55

Yes, we should also backport this to ensure that we do in fact run all of these idle runnables when shutting down even more importantly.
Pushed by eakhgari@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/f65461f10e0a
Don't honor the idle period during shutdown; r=smaug
Comment on attachment 8877385 [details] [diff] [review]
Don't honor the idle period during shutdown

Approval Request Comment
[Feature/Bug causing the regression]: The idle queue splitting work, spread across many bugs.
[User impact if declined]: We may not run some idle runnables at shutdown time.
[Is this code covered by automated tests?]: Yes, we see intermittent leaks when this happens.
[Has the fix been verified in Nightly?]: It has been fixed on the try server.
[Needs manual test from QE? If yes, steps to reproduce]: No.
[List of other uplifts needed for the feature/fix]: None.
[Is the change risky?]: Not very risky.
[Why is the change risky/not risky?]: It's well understood.
[String changes made/needed]: None.
Attachment #8877385 - Flags: approval-mozilla-beta?
Bug 1358898 is related to this, though I think the patch here won't help that, because the issue there is we're spending too much time on idle runnables already.
Whiteboard: [MemShrink] → [MemShrink:P2]
https://hg.mozilla.org/mozilla-central/rev/f65461f10e0a
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla56
Comment on attachment 8877385 [details] [diff] [review]
Don't honor the idle period during shutdown

process idle queue on shutdown, beta55+
Attachment #8877385 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Flags: qe-verify-
Whiteboard: [MemShrink:P2] → [MemShrink:P2][stockwell fixed:product]
You need to log in before you can comment on or make changes to this bug.