Intermittent LeakSanitizer | leak at mozilla::SchedulerGroup::LabeledDispatch, Dispatch, Dispatch, mozilla::BackgroundHangThread::ReportHang

RESOLVED FIXED in Firefox 56

Status

defect
RESOLVED FIXED
2 years ago
2 years ago

People

(Reporter: intermittent-bug-filer, Assigned: froydnj)

Tracking

({intermittent-failure, memory-leak})

unspecified
mozilla57
Points:
---
Bug Flags:
qe-verify -

Firefox Tracking Flags

(firefox-esr52 unaffected, firefox55 wontfix, firefox56 fixed, firefox57 fixed)

Details

(Whiteboard: [stockwell fixed:product])

Attachments

(1 attachment)

Keywords: mlk
Comment hidden (Intermittent Failures Robot)
This started on July 13th and has had 38 failures in the 7 days since then:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1380619&startday=2017-07-10&endday=2017-07-20&tree=all

All of them are on linux64-asan in e10s mode, after running the tests in the dom/base/test directory.

here is a related log file:
https://treeherder.mozilla.org/logviewer.html#?repo=autoland&job_id=115803366

and the related leak from that log:
[task 2017-07-13T11:07:24.995622Z] 11:07:24     INFO - GECKO(2074) | =================================================================
[task 2017-07-13T11:07:24.997767Z] 11:07:24     INFO - GECKO(2074) | ==2152==ERROR: LeakSanitizer: detected memory leaks
[task 2017-07-13T11:07:25.000382Z] 11:07:24     INFO - GECKO(2074) | Direct leak of 56 byte(s) in 1 object(s) allocated from:
[task 2017-07-13T11:07:25.005355Z] 11:07:25     INFO - GECKO(2074) |     #0 0x4bb9ec in malloc /builds/slave/moz-toolchain/src/llvm/projects/compiler-rt/lib/asan/asan_malloc_linux.cc:64:3
[task 2017-07-13T11:07:25.008706Z] 11:07:25     INFO - GECKO(2074) |     #1 0x4ecf0d in moz_xmalloc /home/worker/workspace/build/src/memory/mozalloc/mozalloc.cpp:83:17
[task 2017-07-13T11:07:25.015830Z] 11:07:25     INFO - GECKO(2074) |     #2 0x7f688c6efbdc in operator new /home/worker/workspace/build/src/obj-firefox/dist/include/mozilla/mozalloc.h:194:12
[task 2017-07-13T11:07:25.035596Z] 11:07:25     INFO - GECKO(2074) |     #3 0x7f688c6efbdc in mozilla::SchedulerGroup::LabeledDispatch(char const*, mozilla::TaskCategory, already_AddRefed<nsIRunnable>&&) /home/worker/workspace/build/src/xpcom/threads/SchedulerGroup.cpp:316
[task 2017-07-13T11:07:25.056256Z] 11:07:25     INFO - GECKO(2074) |     #4 0x7f688c6e44a0 in Dispatch /home/worker/workspace/build/src/xpcom/threads/SchedulerGroup.cpp:228:10
[task 2017-07-13T11:07:25.060463Z] 11:07:25     INFO - GECKO(2074) |     #5 0x7f688c6e44a0 in Dispatch /home/worker/workspace/build/src/xpcom/threads/SystemGroup.cpp:92
[task 2017-07-13T11:07:25.064424Z] 11:07:25     INFO - GECKO(2074) |     #6 0x7f688c6e44a0 in mozilla::BackgroundHangThread::ReportHang(unsigned int) /home/worker/workspace/build/src/xpcom/threads/BackgroundHangMonitor.cpp:637
[task 2017-07-13T11:07:25.068050Z] 11:07:25     INFO - GECKO(2074) |     #7 0x7f688c6e304a in ReportPermaHang /home/worker/workspace/build/src/xpcom/threads/BackgroundHangMonitor.cpp:675:3
[task 2017-07-13T11:07:25.070271Z] 11:07:25     INFO - GECKO(2074) |     #8 0x7f688c6e304a in mozilla::BackgroundHangManager::RunMonitorThread() /home/worker/workspace/build/src/xpcom/threads/BackgroundHangMonitor.cpp:368
[task 2017-07-13T11:07:25.072305Z] 11:07:25     INFO - GECKO(2074) | -----------------------------------------------------



I did some retriggers to see if there is a pattern when this started:
https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&filter-searchStr=asan%20browser-chrome-e10s-3&tochange=31973778f0ed30ddde8b8aafe15ebf3c1dbe65d2&fromchange=3fe4adc63baf237235f439667af42cc5f9d460f9&selectedJob=113901509

ideally we will see results when those test jobs finish up.
Whiteboard: [stockwell needswork]
I am assuming this leak is related to BackgroundHangMonitor.cpp. :mystor, I see you have edited this file recently and often in the past. Could you help determine whether the leak we are seeing is related to BackgroundHangMonitor.cpp?
Flags: needinfo?(michael)
(In reply to Joel Maher ( :jmaher) (UTC-8) from comment #4)
> I am assuming this leak is related to BackgroundHangMonitor.cpp, :mystor, I
> see you have edited this file recently and often in the past, could you help
> determine if this leak we are seeing is related to BackgroundHangMonitor.cpp?

Yes, I imagine that it is as well. I have some idea of how this would have happened, but I don't have a good way to deal with it right now. Basically, SchedulerGroup allocates a wrapping runnable which we don't know about, and then intentionally leaks it if we are trying to dispatch during shutdown.

Right now the code has a nasty hack to try to get around the leaking of the internal runnable, but I have no way to get my hands on the wrapping runnable.

This might be made unnecessary by bug 1380081 which removes the codepath which is leaking.
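
To make the mechanism above concrete, here is a minimal sketch of how a wrapper that the caller never sees can end up abandoned when a dispatch fails. It is not the Gecko code: `Task`, `CallerTask`, `Wrapper`, `DispatchToThread`, `LabeledDispatchSketch` and the two bookkeeping globals are hypothetical names invented for illustration, and the "thread alive" flag is an assumed stand-in for dispatching during shutdown.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical stand-ins for illustration; none of these are Gecko classes.
struct Task {
  virtual ~Task() = default;
  virtual void Run() = 0;
};

struct CallerTask : Task {
  void Run() override {}
};

// Wrapper allocated inside the dispatcher; the caller never gets a pointer
// to it, so the caller cannot free it later.
struct Wrapper : Task {
  explicit Wrapper(std::unique_ptr<Task> aInner) : mInner(std::move(aInner)) {}
  void Run() override { mInner->Run(); }
  std::unique_ptr<Task> mInner;
};

// Bookkeeping for this sketch only: counts events that a failed dispatch
// abandoned and remembers the most recent one.
static int gAbandonedCount = 0;
static Task* gLastAbandoned = nullptr;

// Models a thread dispatch that takes ownership of aEvent. On failure the
// event is deliberately left unfreed (assumed behavior for this sketch).
bool DispatchToThread(Task* aEvent, bool aThreadAlive) {
  assert(aEvent);
  if (!aThreadAlive) {
    ++gAbandonedCount;
    gLastAbandoned = aEvent;
    return false;
  }
  aEvent->Run();
  delete aEvent;
  return true;
}

// Models a labeled dispatch: wraps the caller's task and hands the wrapper
// to the thread. If the dispatch fails, nobody owns the wrapper any more.
bool LabeledDispatchSketch(std::unique_ptr<Task> aTask, bool aThreadAlive) {
  Task* wrapper = new Wrapper(std::move(aTask));
  return DispatchToThread(wrapper, aThreadAlive);
}
```

In this sketch, calling `LabeledDispatchSketch(std::make_unique<CallerTask>(), false)` returns `false` and leaves one abandoned `Wrapper`, which is the kind of allocation a leak checker would report.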
Flags: needinfo?(michael)
(In reply to Michael Layzell [:mystor] from comment #6)
> This might be made unnecessary by bug 1380081 which removes the codepath
> which is leaking.

It looks like that bug is still progressing, but there is a lot going on there...it might take a while.
Assignee

Comment 19

2 years ago
SchedulerGroup dispatch needs to replicate all the quirks of dispatching
directly to threads, which means we need to handle cases where dispatch
might have failed and we have resources that we don't want to leak.

Not 100% sure this solves the leaks, but I have dozens of asan browser-chrome
test retriggers running at:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=98b5a247fab54156c7e604226eaf9a7b597c605f

and I haven't seen this failure come up yet, which I think is a good sign.
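
One general way to handle a dispatch that may fail without leaking is for the caller to keep its own reference across the call and drop the abandoned reference when failure is reported. The sketch below shows that pattern under stated assumptions; it is not the attached patch. `Event`, `DispatchTakingRef`, `DispatchWithoutLeak` and `gLiveEvents` are hypothetical names, and the hand-rolled reference count only imitates the general idea of a ref-counted runnable.

```cpp
#include <cassert>

// Hypothetical stand-ins for illustration; these are not Gecko classes.
static int gLiveEvents = 0;  // number of Event objects currently allocated

class Event {
 public:
  Event() { ++gLiveEvents; }
  void AddRef() { ++mRefs; }
  void Release() {
    assert(mRefs > 0);
    if (--mRefs == 0) {
      delete this;
    }
  }

 private:
  ~Event() { --gLiveEvents; }
  int mRefs = 0;
};

// Models a dispatch that takes over one reference. On failure it never
// releases that reference, so the event would outlive the call.
bool DispatchTakingRef(Event* aEvent, bool aTargetAlive) {
  if (!aTargetAlive) {
    return false;  // reference left behind on purpose
  }
  aEvent->Release();  // stands in for "ran the event, then dropped it"
  return true;
}

// Caller-side pattern: hold our own reference across the call so the
// abandoned one can be dropped if the dispatch reports failure.
bool DispatchWithoutLeak(bool aTargetAlive) {
  Event* event = new Event();
  event->AddRef();  // reference owned by this function
  event->AddRef();  // reference handed to DispatchTakingRef
  bool ok = DispatchTakingRef(event, aTargetAlive);
  if (!ok) {
    event->Release();  // reclaim the reference the dispatch left behind
  }
  event->Release();  // drop our own reference, which frees the event
  return ok;
}
```

With this shape, `gLiveEvents` is back at zero after `DispatchWithoutLeak(false)` as well as after `DispatchWithoutLeak(true)`. Whether it is safe to release the event on the calling thread depends on the real code's threading rules, which this sketch does not model.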
Attachment #8895448 - Flags: review?(michael)
Attachment #8895448 - Flags: review?(michael) → review+
Assignee: nobody → nfroyd

Comment 20

2 years ago
Pushed by nfroyd@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/187e6f6cfba7
avoid unnecessary content process leaks in SchedulerGroup dispatch during shutdown; r=mystor

Comment 21

2 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/187e6f6cfba7
Status: NEW → RESOLVED
Last Resolved: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla57
Please nominate this for Beta approval when you get a chance.
Flags: needinfo?(nfroyd)
Assignee

Comment 23

2 years ago
(In reply to Ryan VanderMeulen [:RyanVM] from comment #22)
> Please nominate this for Beta approval when you get a chance.

Will do.  I'm going to wait until Monday; brasstacks shows no intermittents yesterday, but I want to give it today and the weekend to make sure that wasn't a fluke.
Whiteboard: [stockwell needswork] → [stockwell fixed:product]
Assignee

Comment 25

2 years ago
Comment on attachment 8895448 [details] [diff] [review]
avoid unnecessary content process leaks in SchedulerGroup dispatch during shutdown

Approval Request Comment
[Feature/Bug causing the regression]: The scheduler/BHR.
[User impact if declined]: None
[Is this code covered by automated tests?]: Yes.
[Has the fix been verified in Nightly?]: Insofar as the intermittent oranges have stopped, yes.
[Needs manual test from QE? If yes, steps to reproduce]: No.
[List of other uplifts needed for the feature/fix]: None.
[Is the change risky?]: No.
[Why is the change risky/not risky?]: This code is just preventing shutdown memory leaks, and the cases that it handles are well-understood cases that occur in other shutdown leaks that we have fixed.
[String changes made/needed]: None.
Flags: needinfo?(nfroyd)
Attachment #8895448 - Flags: approval-mozilla-beta?
Comment on attachment 8895448 [details] [diff] [review]
avoid unnecessary content process leaks in SchedulerGroup dispatch during shutdown

Fixes leaks, and an intermittent orange - let's uplift for beta 3.
Attachment #8895448 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
(In reply to Nathan Froyd [:froydnj] from comment #25)
> [Is this code covered by automated tests?]: Yes.
> [Has the fix been verified in Nightly?]: Insofar as the intermittent oranges
> have stopped, yes.
> [Needs manual test from QE? If yes, steps to reproduce]: No.

Setting qe-verify- based on Nathan Froyd's assessment on manual testing needs and the fact that this fix has automated coverage.
Flags: qe-verify-