Closed
Bug 1089816
Opened 10 years ago
Closed 9 years ago
Get LeakSanitizer working on e10s
Categories
(Testing :: General, defect)
Testing
General
Tracking
(e10s+, firefox42 fixed)
RESOLVED
FIXED
mozilla42
People
(Reporter: mccr8, Assigned: mccr8)
References
(Blocks 1 open bug)
Details
(Whiteboard: [MemShrink])
Attachments
(3 files, 3 obsolete files)
35.10 KB,
text/plain
|
Details | |
5.91 KB,
patch
|
billm
:
review+
|
Details | Diff | Splinter Review |
1.12 KB,
patch
|
Details | Diff | Splinter Review |
Our ASan builds, which run LSan, are opt builds, so they may call exit(0) in the child process, and not go through the regular shutdown. This probably means we're not running LSan in the child process.
LSan mostly detects leaks of buffers of stuff, which can be particularly important for media code. We probably don't want to bother with this until the regular leak checking is working, or we'll get a lot of failures.
Assignee | ||
Updated•10 years ago
|
tracking-e10s:
--- → ?
Summary: Figure out if LSan is working on e10s → Figure out if LeakSanitizer is working on e10s
Updated•10 years ago
|
Assignee | ||
Comment 1•10 years ago
|
||
I did a hacky try run that removes the QuickExit() from ContentChild::ActorDestroy entirely, and it looks like we are in fact not running LSan on the child process right now. That will be easy enough to fix, by changing the #ifndef DEBUG to be something that doesn't apply to ASAN builds either. The problem will just be fixing up stuff so that it is green.
https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=7965a2cdeb12
Ideally, we'd also have some kind of check that all processes produce LSan reports, like we do for regular leak logs, but that's probably something for a separate bug. I filed bug 1094988 for that, but I think it isn't too critical.
Summary: Figure out if LeakSanitizer is working on e10s → Get LeakSanitizer working on e10s
Assignee | ||
Comment 2•10 years ago
|
||
This is the leak log I get when I run this:
./mach mochitest-plain --e10s --this-chunk=20 --total-chunks=100
A lot of those look like the same thing as bug 1116261. I suspect that we could clean a lot of this up by fixing bug 1090570.
Assignee | ||
Comment 4•10 years ago
|
||
(In reply to Bill McCloskey (:billm) from comment #3)
> A lot of those look like the same thing as bug 1116261.
Yeah, I figured that looking at LSan output would be easier than trying to instrument all of chromium IPC with annotations.
> I suspect that we could clean a lot of this up by fixing bug 1090570.
Ok, interesting.
Assignee | ||
Updated•10 years ago
|
Assignee: nobody → continuation
Assignee | ||
Comment 5•10 years ago
|
||
This is the minimum patch needed, and it works locally for short Mochitest runs. Unfortunately, it does not produce a LeakSanitizer log on try. Presumably we're silently exiting somewhere else. I guess I'll have to add a bunch of printfs and try to figure out where that is failing.
Assignee | ||
Comment 6•10 years ago
|
||
I haven't figured out anything here, but I should try disabling sandboxing and see if that helps.
Assignee | ||
Comment 7•9 years ago
|
||
Ok, so there's actually no sandboxing. The issue seems to be how ContentParent kills off the child process when it takes too long. It turns out this was an issue back in 2010 for XPCOM shutdown leak checking (bug 539295), so now that I know what is happening it is easy to fix.
Assignee | ||
Comment 8•9 years ago
|
||
We don't really need to fix the existing Chromium IPC leaks before this lands. I can just suppression list my way to glory.
Assignee | ||
Comment 9•9 years ago
|
||
Attachment #8550291 -
Attachment is obsolete: true
Assignee | ||
Comment 10•9 years ago
|
||
This is closer to being ready. I filed a number of bugs for the leaks this testing uncovers (bug 1187424, bug 1187518, bug 1187421 and bug 1187410, in addition to the previously filed bug 1090570 and bug 1122045). Mostly I am adding suppressions for these leaks.
Beyond making sure that my tweaked version of the suppression file still works, I need to fix the permaorange in browser/base/content/test/social/browser_social_multiworker.js that my patch causes, which is some kind of timeout. This is already failing intermittently, as bug 964669. The test runs some tests in a child process, then uses a wonky looking timeout mechanism to wait for it to finish, and hits this timeout if it doesn't finish fast enough. Presumably I'm making shutdown take a lot longer. My plan is to just bump up the timeout...
Attachment #8638239 -
Attachment is obsolete: true
Assignee | ||
Comment 11•9 years ago
|
||
non-ASan try run: https://treeherder.mozilla.org/#/jobs?repo=try&revision=a9cbdc870e3a
Assignee | ||
Comment 12•9 years ago
|
||
There are a variety of ways that the parent and child process ensure that
the child process exits quickly in opt builds, but for AddressSanitizer
builds we want to let the child process to run to completion, so that we
can get a LeakSanitizer report.
This requires adding some addition LSan suppressions, because running
LSan in child processes detects some new leaks.
asan try run: https://treeherder.mozilla.org/#/jobs?repo=try&revision=124f60a5b19b
Assignee | ||
Updated•9 years ago
|
Attachment #8638831 -
Attachment is obsolete: true
Assignee | ||
Updated•9 years ago
|
Attachment #8639109 -
Flags: review?(wmccloskey)
Assignee | ||
Comment 13•9 years ago
|
||
In addition to the green try run, I also did a run with an intentional child process only leak to see if we actually are producing results a reasonable amount of the time (there are additional leaks because this did not contain all of the suppressions in the attached patch).
https://treeherder.mozilla.org/#/jobs?repo=try&revision=ae1d658e6cb3
Attachment #8639109 -
Flags: review?(wmccloskey) → review+
Comment 14•9 years ago
|
||
Comment 15•9 years ago
|
||
Assignee | ||
Comment 16•9 years ago
|
||
There was Windows bustage. I accidentally used "and" instead of "&&" in ContentChild. Apparently they mean the same thing in C++, so my ASan try runs should be valid, but Visual Studio doesn't support "and" so it produced a warning because there was some extra junk it didn't understand at the end of the line.
Comment 17•9 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/260370115e37
https://hg.mozilla.org/mozilla-central/rev/f504cbcaefa9
Status: NEW → RESOLVED
Closed: 9 years ago
status-firefox42:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla42
Assignee | ||
Updated•9 years ago
|
Whiteboard: [MemShrink]
Assignee | ||
Comment 18•9 years ago
|
||
This patch adds an intentional leak to ContentChild, which is very useful when investigating whether e10s LSan is working or not.
You need to log in
before you can comment on or make changes to this bug.
Description
•