Closed Bug 1067958 Opened 7 years ago Closed 7 years ago

Intermittent leakcheck | 16244 bytes leaked (AsyncPanZoomController, Composer2D, CompositorParent, CondVar, CrossProcessMutex, ...)

Categories

(Core :: General, defect)

x86
Linux
defect
Not set
normal

RESOLVED WORKSFORME

People

(Reporter: cbook, Assigned: mccr8)

Details

(Keywords: intermittent-failure, memory-leak)

Attachments

(1 file)

b2g_emulator_vm mozilla-inbound debug test mochitest-debug-9 on 2014-09-16 01:35:07 PDT for push 8b60403ad250

slave: tst-linux64-spot-847

https://tbpl.mozilla.org/php/getParsedLog.php?id=48175564&tree=Mozilla-Inbound



TEST-UNEXPECTED-FAIL | leakcheck | 16244 bytes leaked (AsyncPanZoomController, Composer2D, CompositorParent, CondVar, CrossProcessMutex, ...)
Seems quite frequent but not permanent somehow. Andrew, do you know what pushes us over the cliff here on b2g ice debug?
Flags: needinfo?(continuation)
(In reply to Carsten Book [:Tomcat] from comment #2)
> Seems quite frequent but not permanent somehow. Andrew, do you know what
> pushes us over the cliff here on b2g ice debug?

We have a set of known leaks on B2G, with a threshold of 5116 bytes.  The leaks here are 16244 bytes.  I looked at a number of the leaking classes involved, and none of them have changed in the last 5 days, so I'm not sure what is going on.  Probably the next step is to compare what is leaking in a failing run vs. a non-failing run.  I can do that later today.

Kyle may have some ideas, too.
Flags: needinfo?(continuation) → needinfo?(khuey)
We're leaking a bunch of APZC stuff.  This started recently and seems to be high frequency.  Can we identify some candidates for backout?
Flags: needinfo?(khuey) → needinfo?(ryanvm)
This is mochitest-9 only if that's any help.
Component: General → Panning and Zooming
Component: Panning and Zooming → General
I compared a run that exceeds the leak threshold with one that does not.  They both have the exact same APZC leaks, so that's not the issue.  In fact, the "non-leaking run" leaks _strictly more_ objects.  Specifically:

09:49:47     INFO -  TEST-INFO | leakcheck | leaked 1 Mutex (20 bytes)
09:49:47     INFO -  TEST-INFO | leakcheck | leaked 1 ReentrantMonitor (24 bytes)
09:49:47     INFO -  TEST-INFO | leakcheck | leaked 1 XPTInterfaceInfoManager (132 bytes)
09:49:47     INFO -  TEST-INFO | leakcheck | leaked 1 nsStringBuffer (8 bytes)
09:49:47     INFO -  TEST-INFO | leakcheck | leaked 7 nsTArray_base (28 bytes)
09:49:47     INFO -  TEST-INFO | leakcheck | leaked 1 xptiWorkingSet (96 bytes)

So, maybe some of those are being computed to have negative size or something?

When we do leak those objects, the total size is 308 bytes.  When we don't leak those objects, the total size is 16244 bytes.
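As a quick sanity check of the numbers above, here is a minimal Python sketch that sums the per-class sizes from the log excerpt and compares the totals against the 5116-byte B2G threshold. The names (`SECOND_VIEW_LEAKS`, `total_bytes`, `exceeds_threshold`) are made up for illustration and are not the harness's own code; it also assumes the byte count on each log line is the total for that class, which is what makes the sum come out to 308.

```python
# Per-class leaks copied from the log excerpt: (count, class name, total bytes).
SECOND_VIEW_LEAKS = [
    (1, "Mutex", 20),
    (1, "ReentrantMonitor", 24),
    (1, "XPTInterfaceInfoManager", 132),
    (1, "nsStringBuffer", 8),
    (7, "nsTArray_base", 28),
    (1, "xptiWorkingSet", 96),
]

B2G_LEAK_THRESHOLD = 5116  # bytes, the known-leak threshold mentioned earlier
REPORTED_LEAK = 16244      # bytes, the total from the failing run


def total_bytes(leaks):
    """Sum the byte totals, treating each entry's size as the per-class total."""
    return sum(size for _count, _name, size in leaks)


def exceeds_threshold(leaked, threshold=B2G_LEAK_THRESHOLD):
    """Return True when the leaked byte count is over the allowed threshold."""
    return leaked > threshold


if __name__ == "__main__":
    small = total_bytes(SECOND_VIEW_LEAKS)
    print(small, exceeds_threshold(small))
    print(REPORTED_LEAK, exceeds_threshold(REPORTED_LEAK))
```

So the 308-byte total passes the check and the 16244-byte total does not, which matches which runs go orange.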
Huh, that set of objects I just posted is from a second BloatView for another default process.  So that's odd...
My hunch about what is going on here is that in our "non-leaking" runs we have two default processes, and the code that computes the size of the leak is somehow only looking at the size of things from the second bloat view.  In the "leaking runs", we don't have that second smaller leak, so we see the real leak.
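To make the hunch concrete, here is a small Python sketch of how "last view wins" bookkeeping would hide the larger leak. This is only a model of the hypothesis, not the actual leak-checking code: the function names are hypothetical, each bloat view is reduced to a `(process_type, leaked_bytes)` pair, and the first view in the "non-leaking" run is assumed to leak the same 16244 bytes seen in the failing run.

```python
B2G_LEAK_THRESHOLD = 5116  # bytes


def shadowed_total(views):
    """Model of the suspected bug: each view overwrites the previous total,
    so only the last view in the log is ever compared to the threshold."""
    total = None
    for _process, leaked in views:
        total = leaked
    return total


def per_view_failures(views, threshold=B2G_LEAK_THRESHOLD):
    """One possible alternative: check every view on its own and report
    each one that is over the threshold."""
    return [(proc, leaked) for proc, leaked in views if leaked > threshold]


# "Non-leaking" run: two default-process views; the small one comes second.
non_leaking_run = [("default", 16244), ("default", 308)]
# "Leaking" run: only the one default-process view.
leaking_run = [("default", 16244)]

if __name__ == "__main__":
    print(shadowed_total(non_leaking_run))     # only the second view is seen
    print(shadowed_total(leaking_run))
    print(per_view_failures(non_leaking_run))  # the large leak is still caught
```

Under this model both runs contain the same large leak, and only the presence of the second, smaller view decides whether it is reported.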
With some further digging, it appears that leak checking was just broken on B2G M9 (and only M9; I looked at every other one of M1 to M15), and that push probably just fixed it.  M9 previously had:
  WARNING | leakcheck | missing output line for total leaks!
Flags: needinfo?(ryanvm)
The middle push ( https://tbpl.mozilla.org/?tree=Mozilla-Inbound&jobname=Emulator&rev=c0fcbcc8b789 ) has the weird behavior: it just happened to get the double default process, so the leak didn't show up.

JW Wang's try run in bug 962871 exhibits the odd double default behavior, so I think that is what is causing this.

(For the rest: Milan's patch has a B2G try push, and it has the "missing output" line, so that's not the problem.  Jason's patch does not have a try run.  Michael's patch has a try run that does not exhibit the behavior.)

Our choices are to:
1. Back out bug 962871, but then we'll probably continue to have the larger leak in M9, we just won't see it.
2. Increase the leak threshold to 16 KB.  This is crummy because we only need it for M9.

Issues in this bug:
1. The tree does not go orange when a test fails to produce a log (bug 831223).  I think I should land a version of this patch that does that check for the default (and maybe plugin process) so we at least won't break this again.
2. M9 is leaking APZC stuff.  A bug needs to be filed on that, blocking bug 1038943.
3. When we have multiple default processes, one leak log ends up "shadowing" the other.  I'm not sure what is going on there, but I think they must be dumping into the same leak log somehow.  We need to fix whatever is causing this, and add some checks so the tree goes orange if it happens again.
4. It would be good if we could set per-suite leak thresholds, though that would cause mysterious leaks if whatever test is causing the leak shifts from one suite to another, so maybe that's not a good idea.
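Issues 1 and 4 can be sketched together in a few lines of Python. This is a rough illustration under stated assumptions, not the harness's implementation: `check_leaks`, the result strings, and the `SUITE_THRESHOLDS` table are all hypothetical, and the 16384-byte entry for M9 is only an example value. A missing total (the case that previously produced only a warning) is treated as a failure, and a per-suite override falls back to the global threshold.

```python
from typing import Optional

DEFAULT_THRESHOLD = 5116  # bytes, the current B2G-wide threshold

# Hypothetical per-suite overrides; the M9 value is illustrative only.
SUITE_THRESHOLDS = {"mochitest-9": 16384}


def check_leaks(suite: str, total_leaked: Optional[int]) -> str:
    """Classify a leak-check result for one suite.

    total_leaked is None when the log had no total-leaks line at all.
    """
    if total_leaked is None:
        # No leak total was found: fail instead of only warning.
        return "fail-missing-log"
    threshold = SUITE_THRESHOLDS.get(suite, DEFAULT_THRESHOLD)
    if total_leaked > threshold:
        return "fail-leak"
    return "pass"


if __name__ == "__main__":
    print(check_leaks("mochitest-9", None))
    print(check_leaks("mochitest-9", 16244))
    print(check_leaks("mochitest-1", 16244))
```

The last call shows the downside noted in issue 4: if the leaking test moves from M9 to another suite, the same leak suddenly fails there.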
Assignee: nobody → continuation
Comment on attachment 8490305 [details] [diff] [review]
Increase B2G leak threshold to paper over newly visible M9 leaks.

Review of attachment 8490305 [details] [diff] [review]:
-----------------------------------------------------------------

This lets us leak another 11K everywhere else.
Attachment #8490305 - Flags: review?(khuey) → review-
Blocks: 962871
I filed:
  - bug 1068285 for investigating why we have multiple default processes
  - bug 1068280 for dealing with shadowing in the leak reporting code
  - bug 1068276 for failing when a default or plugin process fails to report a leak
  - bug 1068268 for the APZC leak
  - bug 1068289 for allowing per-test-chunk leak thresholds
Fixed by backout of bug 962871.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → WORKSFORME
Filed bug 1068869 for not dumping the leak logs for multiple default processes into the same log.