Closed Bug 1380619 Opened 3 years ago Closed 3 years ago

Intermittent LeakSanitizer | leak at mozilla::SchedulerGroup::LabeledDispatch, Dispatch, Dispatch, mozilla::BackgroundHangThread::ReportHang

Categories

(Core :: XPCOM, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking

()

RESOLVED FIXED
mozilla57
Tracking Status
firefox-esr52 --- unaffected
firefox55 --- wontfix
firefox56 --- fixed
firefox57 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: froydnj)

Details

(Keywords: intermittent-failure, memory-leak, Whiteboard: [stockwell fixed:product])

Attachments

(1 file)

Keywords: mlk
This started on July 13th and has had 38 failures in the 7 days since then:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1380619&startday=2017-07-10&endday=2017-07-20&tree=all

All of them are on linux64-asan in e10s mode, after running the dom/base/test directory of tests.

here is a related log file:
https://treeherder.mozilla.org/logviewer.html#?repo=autoland&job_id=115803366

and the related leak from that log:
[task 2017-07-13T11:07:24.995622Z] 11:07:24     INFO - GECKO(2074) | =================================================================
[task 2017-07-13T11:07:24.997767Z] 11:07:24     INFO - GECKO(2074) | ==2152==ERROR: LeakSanitizer: detected memory leaks
[task 2017-07-13T11:07:25.000382Z] 11:07:24     INFO - GECKO(2074) | Direct leak of 56 byte(s) in 1 object(s) allocated from:
[task 2017-07-13T11:07:25.005355Z] 11:07:25     INFO - GECKO(2074) |     #0 0x4bb9ec in malloc /builds/slave/moz-toolchain/src/llvm/projects/compiler-rt/lib/asan/asan_malloc_linux.cc:64:3
[task 2017-07-13T11:07:25.008706Z] 11:07:25     INFO - GECKO(2074) |     #1 0x4ecf0d in moz_xmalloc /home/worker/workspace/build/src/memory/mozalloc/mozalloc.cpp:83:17
[task 2017-07-13T11:07:25.015830Z] 11:07:25     INFO - GECKO(2074) |     #2 0x7f688c6efbdc in operator new /home/worker/workspace/build/src/obj-firefox/dist/include/mozilla/mozalloc.h:194:12
[task 2017-07-13T11:07:25.035596Z] 11:07:25     INFO - GECKO(2074) |     #3 0x7f688c6efbdc in mozilla::SchedulerGroup::LabeledDispatch(char const*, mozilla::TaskCategory, already_AddRefed<nsIRunnable>&&) /home/worker/workspace/build/src/xpcom/threads/SchedulerGroup.cpp:316
[task 2017-07-13T11:07:25.056256Z] 11:07:25     INFO - GECKO(2074) |     #4 0x7f688c6e44a0 in Dispatch /home/worker/workspace/build/src/xpcom/threads/SchedulerGroup.cpp:228:10
[task 2017-07-13T11:07:25.060463Z] 11:07:25     INFO - GECKO(2074) |     #5 0x7f688c6e44a0 in Dispatch /home/worker/workspace/build/src/xpcom/threads/SystemGroup.cpp:92
[task 2017-07-13T11:07:25.064424Z] 11:07:25     INFO - GECKO(2074) |     #6 0x7f688c6e44a0 in mozilla::BackgroundHangThread::ReportHang(unsigned int) /home/worker/workspace/build/src/xpcom/threads/BackgroundHangMonitor.cpp:637
[task 2017-07-13T11:07:25.068050Z] 11:07:25     INFO - GECKO(2074) |     #7 0x7f688c6e304a in ReportPermaHang /home/worker/workspace/build/src/xpcom/threads/BackgroundHangMonitor.cpp:675:3
[task 2017-07-13T11:07:25.070271Z] 11:07:25     INFO - GECKO(2074) |     #8 0x7f688c6e304a in mozilla::BackgroundHangManager::RunMonitorThread() /home/worker/workspace/build/src/xpcom/threads/BackgroundHangMonitor.cpp:368
[task 2017-07-13T11:07:25.072305Z] 11:07:25     INFO - GECKO(2074) | -----------------------------------------------------



I did some retriggers to see if there is a pattern when this started:
https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&filter-searchStr=asan%20browser-chrome-e10s-3&tochange=31973778f0ed30ddde8b8aafe15ebf3c1dbe65d2&fromchange=3fe4adc63baf237235f439667af42cc5f9d460f9&selectedJob=113901509

ideally we will see results when those test jobs finish up.
Whiteboard: [stockwell needswork]
I am assuming this leak is related to BackgroundHangMonitor.cpp. :mystor, I see you have edited this file recently and often in the past; could you help determine if this leak we are seeing is related to BackgroundHangMonitor.cpp?
Flags: needinfo?(michael)
(In reply to Joel Maher ( :jmaher) (UTC-8) from comment #4)
> I am assuming this leak is related to BackgroundHangMonitor.cpp, :mystor, I
> see you have edited this file recently and often in the past, could you help
> determine if this leak we are seeing is related to BackgroundHangMonitor.cpp?

Yes, I imagine that it is as well. I have some idea of how this would have happened, but I don't have a good way to deal with it right now. Basically, SchedulerGroup allocates a wrapping runnable which we don't know about, and then intentionally leaks it if we are trying to dispatch during shutdown. We don't really have a good way to deal with that, unfortunately.

Right now the code has a nasty hack to try to get around the leaking of the internal runnable, but I have no way to get my hands on the wrapping runnable.

This might be made unnecessary by bug 1380081 which removes the codepath which is leaking.
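
To illustrate the wrapper leak described above, here is a minimal standalone C++ sketch. It is not Gecko code: `Task`, `EventTarget`, and this `LabeledDispatch` are hypothetical stand-ins, and the sketch assumes that a failed dispatch has already taken ownership of the object and never frees it.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical stand-in for a runnable. Counts live instances so that a
// leak is observable without a sanitizer.
struct Task {
  static int live;
  std::function<void()> body;
  explicit Task(std::function<void()> b) : body(std::move(b)) { ++live; }
  ~Task() { --live; }
};
int Task::live = 0;

// Hypothetical event target that refuses work once shutdown has begun.
struct EventTarget {
  bool shuttingDown = false;

  // Takes ownership of `task`. On failure the task is deliberately not
  // freed (assumption: this models the "intentionally leaks" behaviour).
  bool Dispatch(Task* task) {
    if (shuttingDown) {
      return false;  // `task` is leaked on purpose
    }
    task->body();
    delete task;
    return true;
  }
};

// Hypothetical labeled dispatch: wraps the caller's task in a second
// allocation that the caller never gets a pointer to.
bool LabeledDispatch(EventTarget& target, Task* inner) {
  Task* wrapper = new Task([inner] {
    inner->body();
    delete inner;
  });
  return target.Dispatch(wrapper);
}

// Returns how many Task objects are still alive after a failed dispatch,
// even though the caller cleans up the one task it knows about.
int LeakedAfterFailedDispatch() {
  const int before = Task::live;
  EventTarget target;
  target.shuttingDown = true;

  Task* inner = new Task([] {});
  const bool dispatched = LabeledDispatch(target, inner);
  assert(!dispatched);
  if (!dispatched) {
    delete inner;  // caller-side workaround for the task it can see
  }
  return Task::live - before;  // the hidden wrapper is still counted
}
```

Calling `LeakedAfterFailedDispatch()` returns 1: the caller reclaimed its own task, but the wrapper allocated inside `LabeledDispatch` is unreachable and still live, which is the kind of allocation LeakSanitizer reports.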
Flags: needinfo?(michael)
(In reply to Michael Layzell [:mystor] from comment #6)
> This might be made unnecessary by bug 1380081 which removes the codepath
> which is leaking.

It looks like that bug is still progressing, but there is a lot going on there...it might take a while.
SchedulerGroup dispatch needs to replicate all the quirks of dispatching
directly to threads, which means we need to handle cases where dispatch
might have failed and we have resources that we don't want to leak.
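
A minimal standalone sketch of that idea follows. It is not the attached patch: `Task`, `EventTarget`, and `LabeledDispatchNoLeak` are hypothetical names, and the sketch assumes it is safe to free the wrapper on the thread that attempted the dispatch.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <utility>

// Hypothetical stand-in for a runnable; counts live instances.
struct Task {
  static int live;
  std::function<void()> body;
  explicit Task(std::function<void()> b) : body(std::move(b)) { ++live; }
  ~Task() { --live; }
  Task(const Task&) = delete;
  Task& operator=(const Task&) = delete;
};
int Task::live = 0;

// Hypothetical event target: takes ownership of the task, and on failure
// leaves it allocated instead of freeing it.
struct EventTarget {
  bool shuttingDown = false;
  bool Dispatch(Task* task) {
    if (shuttingDown) {
      return false;  // not freed here
    }
    task->body();
    delete task;
    return true;
  }
};

// Wraps `inner`, dispatches the wrapper, and reclaims the wrapper if the
// dispatch failed. The wrapper shares ownership of `inner`, so freeing the
// wrapper also drops its reference to the inner task.
bool LabeledDispatchNoLeak(EventTarget& target, std::shared_ptr<Task> inner) {
  Task* wrapper = new Task([inner] { inner->body(); });
  if (target.Dispatch(wrapper)) {
    return true;  // the target ran and freed the wrapper
  }
  // Dispatch failed and did not free the wrapper. We still hold the raw
  // pointer, so release it here rather than leaking it.
  delete wrapper;
  return false;
}

// Returns the number of Task objects still alive after a failed dispatch.
int LiveTasksAfterFailedDispatch() {
  const int before = Task::live;
  EventTarget target;
  target.shuttingDown = true;
  const bool ok =
      LabeledDispatchNoLeak(target, std::make_shared<Task>([] {}));
  assert(!ok);
  (void)ok;  // silence unused-variable warnings when asserts are disabled
  return Task::live - before;
}
```

With the failure path handled, `LiveTasksAfterFailedDispatch()` returns 0, and a dispatch to a target that is not shutting down still runs the inner task exactly once.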

Not 100% sure this solves the leaks, but I have dozens of asan browser-chrome
test retriggers running at:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=98b5a247fab54156c7e604226eaf9a7b597c605f

and I haven't seen this failure come up yet, which I think is a good sign.
Attachment #8895448 - Flags: review?(michael)
Attachment #8895448 - Flags: review?(michael) → review+
Assignee: nobody → nfroyd
Pushed by nfroyd@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/187e6f6cfba7
avoid unnecessary content process leaks in SchedulerGroup dispatch during shutdown; r=mystor
https://hg.mozilla.org/mozilla-central/rev/187e6f6cfba7
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla57
Please nominate this for Beta approval when you get a chance.
Flags: needinfo?(nfroyd)
(In reply to Ryan VanderMeulen [:RyanVM] from comment #22)
> Please nominate this for Beta approval when you get a chance.

Will do.  I'm going to wait until Monday; brasstacks shows no intermittents yesterday, but I want to give it today and the weekend to make sure that wasn't a fluke.
Whiteboard: [stockwell needswork] → [stockwell fixed:product]
Comment on attachment 8895448 [details] [diff] [review]
avoid unnecessary content process leaks in SchedulerGroup dispatch during shutdown

Approval Request Comment
[Feature/Bug causing the regression]: The scheduler/BHR.
[User impact if declined]: None
[Is this code covered by automated tests?]: Yes.
[Has the fix been verified in Nightly?]: Insofar as the intermittent oranges have stopped, yes.
[Needs manual test from QE? If yes, steps to reproduce]: No.
[List of other uplifts needed for the feature/fix]: None.
[Is the change risky?]: No.
[Why is the change risky/not risky?]: This code is just preventing shutdown memory leaks, and the cases that it handles are well-understood cases that occur in other shutdown leaks that we have fixed.
[String changes made/needed]: None.
Flags: needinfo?(nfroyd)
Attachment #8895448 - Flags: approval-mozilla-beta?
Comment on attachment 8895448 [details] [diff] [review]
avoid unnecessary content process leaks in SchedulerGroup dispatch during shutdown

Fixes leaks, and an intermittent orange - let's uplift for beta 3.
Attachment #8895448 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
(In reply to Nathan Froyd [:froydnj] from comment #25)
> [Is this code covered by automated tests?]: Yes.
> [Has the fix been verified in Nightly?]: Insofar as the intermittent oranges
> have stopped, yes.
> [Needs manual test from QE? If yes, steps to reproduce]: No.

Setting qe-verify- based on Nathan Froyd's assessment on manual testing needs and the fact that this fix has automated coverage.
Flags: qe-verify-