Closed Bug 1398563 Opened 4 years ago Closed 2 years ago
Intermittent leakcheck | tab process: 306380 bytes leaked (APZEventState, ActiveElementManager, AsyncLatencyLogger, BackstagePass, CSPService, ...)
Filed by: archaeopteryx [at] coole-files.de https://treeherder.mozilla.org/logviewer.html#?job_id=129811318&repo=autoland https://mozilla-releng-blobs.s3.amazonaws.com/blobs/autoland/sha512/1b45816ae54e5e6ecfecd07e7fe30f8a6518f49ac23c8e5addae0d2fe555776c641c4e94a9a9687b70f3cc92bd500b02079da155b23cd41672e0f9e1fc5584ae
There have been 34 failures in the last 7 days, and the failure has been occurring more frequently since October 24. Most of the failures are on Windows 7 and Windows 10 x64, but there have also been some occurrences on Linux. All of them occurred on debug builds. The failures appear in the following test suites: mochitest-clipboard-e10s and mochitest-webgl-e10s.

Here's an example of a recent log:
https://treeherder.mozilla.org/logviewer.html#?repo=autoland&job_id=139828088&lineNumber=9906

And a snippet of the test error:

11:45:19 INFO - TEST-INFO | leakcheck | tab process: leaked 13 xpc::CompartmentPrivate
11:45:19 INFO - TEST-INFO | leakcheck | tab process: leaked 2 xpcJSWeakReference
11:45:19 INFO - TEST-INFO | leakcheck | tab process: leaked 70 xptiInterfaceInfo
11:45:19 ERROR - 791 ERROR TEST-UNEXPECTED-FAIL | leakcheck | tab process: 930064 bytes leaked (APZEventState, ActiveElementManager, AsyncLatencyLogger, BackstagePass, CSPService, ...)
11:45:19 INFO - runtests.py | Running tests: end.
11:45:20 INFO - Buffered messages finished
11:45:20 INFO - 0 INFO TEST-START | Shutdown
11:45:20 INFO - 1 INFO Passed: 86258
11:45:20 INFO - 2 INFO Failed: 0
11:45:20 INFO - 3 INFO Todo: 0
11:45:20 INFO - 4 INFO Mode: e10s
11:45:20 INFO - 5 INFO SimpleTest FINISHED
11:45:20 INFO - Buffered messages finished
11:45:20 INFO - SUITE-END | took 1225s
11:45:20 INFO - Return code: 0
11:45:20 INFO - TinderboxPrint: mochitest-mochitest-gl<br/>86258/0/0
11:45:20 ERROR - # TBPL FAILURE #
11:45:20 WARNING - setting return code to 2
11:45:20 ERROR - The mochitest suite: mochitest-gl ran with return status: FAILURE
11:45:20 INFO - Running post-action listener: _package_coverage_data
11:45:20 INFO - Running post-action listener: _resource_record_post_action

:selena, could you please take a look?
Whiteboard: [stockwell needswork]
Milan -- I'm seeing "(APZEventState, ActiveElementManager, AsyncLatencyLogger, BackstagePass, CSPService, ...)" in most of these leak reports. Can you investigate?
Flags: needinfo?(sdeckelmann) → needinfo?(milan)
Flags: needinfo?(milan) → needinfo?(bugmail)
(In reply to Selena Deckelmann [:selenamarie] from comment #8)
> Milan -- I'm seeing "(APZEventState, ActiveElementManager, AsyncLatencyLogger, BackstagePass, CSPService, ...)" in most of these leak reports. Can you investigate?

This is because the leaked things are listed in alphabetical order. I can fix this by renaming the classes to start with "Z" :)

I looked at a couple of the reports, and it does look like a legitimate leak where an entire TabChild and everything hanging off it gets leaked.

:mccr8, do we have a mechanism to debug these on try without local reproduction?
Flags: needinfo?(bugmail) → needinfo?(continuation)
Yeah, I need to come up with some mechanism to report a better error when we leak an nsGlobalWindow and everything that hangs off of it. The leakcheck output seen in Treeherder is useless in that case.

As noted in comment 2, these failures are happening in two different test suites, mochitest-webgl-e10s and mochitest-clipboard-e10s. I think there are two different leaks involved.

I have a very simple script (https://github.com/amccreight/mochitest-logs/blob/master/plusplus.py) that analyzes the ++DOMWINDOW and --DOMWINDOW lines in the log to figure out which windows aren't cleaned up. The output of this script is a number of lines that look like this:
  [pid = 5984] [serial = 1]
You can then search the log for the ++DOMWINDOW lines matching those pid/serial pairs that have no --DOMWINDOW after them, and see where in the log they appear to figure out which test they happen during.

I looked at two logs from each of the two failing test suites. For the WebGL test suite, the leaked windows are being created during this test:
  dom/canvas/test/webgl-conf/generated/test_2_conformance2__misc__expando-loss-2.html
Expandos are a place where we have to deal with tricky memory management, so perhaps that is related to why it is leaking. This test has not changed recently, but some other tests in that directory were disabled recently. Comment 2 said that this started happening more frequently on October 24, which is when bug 1410306 landed. If we were timing out before then, maybe we wouldn't have gotten leak reports, or something.

For the clipboard test suite, the leaked windows are being created during this test:
  dom/events/test/test_bug1327798.html
This test was changed in bug 1199729, but that was in early September, so maybe it isn't the cause of the increase on October 24.
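The matching logic described above can be sketched in a few lines of Python. This is a hedged sketch rather than the actual plusplus.py code: the function name `leaked_windows` and the sample line shapes are illustrative, and the regexes assume the usual debug-log format, e.g. `++DOMWINDOW == 2 (0x...) [pid = 5984] [serial = 1] [outer = 0x0]`.

```python
import re
from collections import OrderedDict

# A window is created by a ++DOMWINDOW line and destroyed by a --DOMWINDOW
# line; the (pid, serial) pair uniquely identifies it within a log.
PLUS = re.compile(r"\+\+DOMWINDOW .*\[pid = (\d+)\].*\[serial = (\d+)\]")
MINUS = re.compile(r"--DOMWINDOW .*\[pid = (\d+)\].*\[serial = (\d+)\]")

def leaked_windows(log_lines):
    """Return '[pid = P] [serial = S]' strings for windows that were
    created (++DOMWINDOW) but never destroyed (--DOMWINDOW)."""
    live = OrderedDict()  # preserves creation order for readable output
    for line in log_lines:
        m = PLUS.search(line)
        if m:
            live[m.groups()] = True
            continue
        m = MINUS.search(line)
        if m:
            live.pop(m.groups(), None)
    return ["[pid = %s] [serial = %s]" % key for key in live]
```

Running this over a full mochitest log yields the `[pid = ...] [serial = ...]` lines mentioned above, which you can then grep for to locate the test during which each leaked window was created.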
The WebGL failures seem to have stopped on Oct 26th, which is why the overall rate has dropped so much. I have no idea why that would be. Bug 1371474 shows a similar pattern.
While trying to land bug 1415692, my patch there caused this leak to become much more prevalent. The problem there seems to be related to Places, as the only code change was calling a Places method during browser startup. Maybe that will help someone narrow down the exact cause.
See Also: → 1415692
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → WORKSFORME
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
Status: REOPENED → RESOLVED
Closed: 4 years ago → 2 years ago
Resolution: --- → WORKSFORME