Closed Bug 1089816 Opened 10 years ago Closed 9 years ago

Get LeakSanitizer working on e10s

Tracking

(e10s+, firefox42 fixed)

Status:

RESOLVED FIXED

Milestone:

mozilla42

Tracking Flags:

Tracking

Status

e10s

---

firefox42

---

fixed

People

(Reporter: mccr8, Assigned: mccr8)

References

(Blocks 1 open bug)

Details

(Whiteboard: [MemShrink])

Attachments

(3 files, 3 obsolete files)

LSan output 10 years ago Andrew McCreight [:mccr8] 35.10 KB, text/plain		Details
Don't exit early during shutdown in ASan content processes. 10 years ago Andrew McCreight [:mccr8] 2.04 KB, patch		Details \| Diff \| Splinter Review
Let ASan content processes run to completion. WIP. 9 years ago Andrew McCreight [:mccr8] 4.96 KB, patch		Details \| Diff \| Splinter Review
Let ASan content processes run to completion. 9 years ago Andrew McCreight [:mccr8] 5.79 KB, patch		Details \| Diff \| Splinter Review
Let ASan content processes run to completion. 9 years ago Andrew McCreight [:mccr8] 5.91 KB, patch	billm : review+	Details \| Diff \| Splinter Review
Add intentional contentchild leak. NOT FOR LANDING. 9 years ago Andrew McCreight [:mccr8] 1.12 KB, patch		Details \| Diff \| Splinter Review

Andrew McCreight [:mccr8]

Assignee

Description

•

10 years ago

Our ASan builds, which run LSan, are opt builds, so they may call exit(0) in the child process, and not go through the regular shutdown. This probably means we're not running LSan in the child process. LSan mostly detects leaks of buffers of stuff, which can be particularly important for media code. We probably don't want to bother with this until the regular leak checking is working, or we'll get a lot of failures.

Andrew McCreight [:mccr8]

Assignee

Updated

•

10 years ago

tracking-e10s: --- → ?

Summary: Figure out if LSan is working on e10s → Figure out if LeakSanitizer is working on e10s

Brad Lassey [:blassey] (use needinfo?)

Updated

•

10 years ago

tracking-e10s: ? → +

Andrew McCreight [:mccr8]

Assignee

Comment 1

•

10 years ago

I did a hacky try run that removes the QuickExit() from ContentChild::ActorDestroy entirely, and it looks like we are in fact not running LSan on the child process right now. That will be easy enough to fix, by changing the #ifndef DEBUG to be something that doesn't apply to ASAN builds either. The problem will just be fixing up stuff so that it is green. https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=7965a2cdeb12 Ideally, we'd also have some kind of check that all processes produce LSan reports, like we do for regular leak logs, but that's probably something for a separate bug. I filed bug 1094988 for that, but I think it isn't too critical.

Summary: Figure out if LeakSanitizer is working on e10s → Get LeakSanitizer working on e10s

Andrew McCreight [:mccr8]

Assignee

Comment 2

•

10 years ago

Attached file LSan output — Details

This is the leak log I get when I run this: ./mach mochitest-plain --e10s --this-chunk=20 --total-chunks=100

Bill McCloskey [inactive unless it's an emergency] (:billm)

Comment 3

•

10 years ago

A lot of those look like the same thing as bug 1116261. I suspect that we could clean a lot of this up by fixing bug 1090570.

Andrew McCreight [:mccr8]

Assignee

Comment 4

•

10 years ago

(In reply to Bill McCloskey (:billm) from comment #3) > A lot of those look like the same thing as bug 1116261. Yeah, I figured that looking at LSan output would be easier than trying to instrument all of chromium IPC with annotations. > I suspect that we could clean a lot of this up by fixing bug 1090570. Ok, interesting.

Andrew McCreight [:mccr8]

Assignee

Updated

•

10 years ago

Depends on: 1090570

Andrew McCreight [:mccr8]

Assignee

Updated

•

10 years ago

Depends on: 1122045

Andrew McCreight [:mccr8]

Assignee

Updated

•

10 years ago

No longer depends on: 1051230

Andrew McCreight [:mccr8]

Assignee

Updated

•

10 years ago

Assignee: nobody → continuation

Andrew McCreight [:mccr8]

Assignee

Comment 5

•

10 years ago

Attached patch Don't exit early during shutdown in ASan content processes. (obsolete) — Details — Splinter Review

This is the minimum patch needed, and it works locally for short Mochitest runs. Unfortunately, it does not produce a LeakSanitizer log on try. Presumably we're silently exiting somewhere else. I guess I'll have to add a bunch of printfs and try to figure out where that is failing.

Andrew McCreight [:mccr8]

Assignee

Comment 6

•

10 years ago

I haven't figured out anything here, but I should try disabling sandboxing and see if that helps.

Andrew McCreight [:mccr8]

Assignee

Comment 7

•

9 years ago

Ok, so there's actually no sandboxing. The issue seems to be how ContentParent kills off the child process when it takes too long. It turns out this was an issue back in 2010 for XPCOM shutdown leak checking (bug 539295), so now that I know what is happening it is easy to fix.

Andrew McCreight [:mccr8]

Assignee

Comment 8

•

9 years ago

We don't really need to fix the existing Chromium IPC leaks before this lands. I can just suppression list my way to glory.

No longer depends on: 1090570, 1122045

Andrew McCreight [:mccr8]

Assignee

Comment 9

•

9 years ago

Attached patch Let ASan content processes run to completion. WIP. (obsolete) — Details — Splinter Review

Attachment #8550291 - Attachment is obsolete: true

Andrew McCreight [:mccr8]

Assignee

Updated

•

9 years ago

Depends on: 1187410

Andrew McCreight [:mccr8]

Assignee

Comment 10

•

9 years ago

Attached patch Let ASan content processes run to completion. (obsolete) — Details — Splinter Review

This is closer to being ready. I filed a number of bugs for the leaks this testing uncovers (bug 1187424, bug 1187518, bug 1187421 and bug 1187410, in addition to the previously filed bug 1090570 and bug 1122045). Mostly I am adding suppressions for these leaks. Beyond making sure that my tweaked version of the suppression file still works, I need to fix the permaorange in browser/base/content/test/social/browser_social_multiworker.js that my patch causes, which is some kind of timeout. This is already failing intermittently, as bug 964669. The test runs some tests in a child process, then uses a wonky looking timeout mechanism to wait for it to finish, and hits this timeout if it doesn't finish fast enough. Presumably I'm making shutdown take a lot longer. My plan is to just bump up the timeout...

Attachment #8638239 - Attachment is obsolete: true

Andrew McCreight [:mccr8]

Assignee

Updated

•

9 years ago

Depends on: 964669

Andrew McCreight [:mccr8]

Assignee

Comment 11

•

9 years ago

non-ASan try run: https://treeherder.mozilla.org/#/jobs?repo=try&revision=a9cbdc870e3a

Andrew McCreight [:mccr8]

Assignee

Comment 12

•

9 years ago

Attached patch Let ASan content processes run to completion. — Details — Splinter Review

There are a variety of ways that the parent and child process ensure that the child process exits quickly in opt builds, but for AddressSanitizer builds we want to let the child process to run to completion, so that we can get a LeakSanitizer report. This requires adding some addition LSan suppressions, because running LSan in child processes detects some new leaks. asan try run: https://treeherder.mozilla.org/#/jobs?repo=try&revision=124f60a5b19b

Andrew McCreight [:mccr8]

Assignee

Updated

•

9 years ago

Attachment #8638831 - Attachment is obsolete: true

Andrew McCreight [:mccr8]

Assignee

Updated

•

9 years ago

Attachment #8639109 - Flags: review?(wmccloskey)

Andrew McCreight [:mccr8]

Assignee

Comment 13

•

9 years ago

In addition to the green try run, I also did a run with an intentional child process only leak to see if we actually are producing results a reasonable amount of the time (there are additional leaks because this did not contain all of the suppressions in the attached patch). https://treeherder.mozilla.org/#/jobs?repo=try&revision=ae1d658e6cb3

Bill McCloskey [inactive unless it's an emergency] (:billm)

Updated

•

9 years ago

Attachment #8639109 - Flags: review?(wmccloskey) → review+

Pulsebot

Comment 14

•

9 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/260370115e37

Pulsebot

Comment 15

•

9 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/f504cbcaefa9

Andrew McCreight [:mccr8]

Assignee

Comment 16

•

9 years ago

There was Windows bustage. I accidentally used "and" instead of "&&" in ContentChild. Apparently they mean the same thing in C++, so my ASan try runs should be valid, but Visual Studio doesn't support "and" so it produced a warning because there was some extra junk it didn't understand at the end of the line.

Carsten Book [:Tomcat]

Comment 17

•

9 years ago

https://hg.mozilla.org/mozilla-central/rev/260370115e37 https://hg.mozilla.org/mozilla-central/rev/f504cbcaefa9

Status: NEW → RESOLVED

Closed: 9 years ago

status-firefox42: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla42

Andrew McCreight [:mccr8]

Assignee

Updated

•

9 years ago

Whiteboard: [MemShrink]

Andrew McCreight [:mccr8]

Assignee

Comment 18

•

9 years ago

Attached patch Add intentional contentchild leak. NOT FOR LANDING. — Details — Splinter Review

This patch adds an intentional leak to ContentChild, which is very useful when investigating whether e10s LSan is working or not.

Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout)

Updated

•

9 years ago

Depends on: 1215914

You need to log in before you can comment on or make changes to this bug.