Windows gamepad mochitests perma-fail with leaks, but do not turn the job orange

RESOLVED FIXED in Firefox 52

Status

()

defect
RESOLVED FIXED
3 years ago
3 years ago

People

(Reporter: gbrown, Assigned: cleu)

Tracking

52 Branch
mozilla53
Points:
---
Dependency tree / graph
Bug Flags:
in-testsuite +

Firefox Tracking Flags

(firefox51 unaffected, firefox52+ fixed, firefox53+ fixed)

Details

Attachments

(2 attachments)

https://archive.mozilla.org/pub/firefox/tinderbox-builds/mozilla-central-win32-debug/1482159878/mozilla-central_win7_vm-debug_test-mochitest-3-bm129-tests1-windows-build81.txt.gz

09:27:00     INFO - TEST-OK | dom/tests/mochitest/gamepad/test_navigator_gamepads.html | took 192ms
09:27:00     INFO - ++DOMWINDOW == 50 (12002800) [pid = 3512] [serial = 50] [outer = 1B3D2C00]
09:27:00     INFO - [3512] WARNING: An event was posted to a thread that will never run it (rejected): file c:/builds/moz2_slave/m-cen-w32-d-000000000000000000/build/src/xpcom/threads/nsThread.cpp, line 695
09:27:00     INFO - TEST-START | Shutdown
09:27:00     INFO - Passed:  46
09:27:00     INFO - Failed:  0
09:27:00     INFO - Todo:    0
09:27:00     INFO - Mode:    non-e10s
09:27:00     INFO - Slowest: 1628ms - /tests/dom/tests/mochitest/gamepad/test_check_timestamp.html
09:27:00     INFO - SimpleTest FINISHED
09:27:00     INFO - TEST-INFO | Ran 1 Loops
09:27:00     INFO - SimpleTest FINISHED
...
09:27:06     INFO - == BloatView: ALL (cumulative) LEAK AND BLOAT STATISTICS, default process 3512
09:27:06     INFO - 
09:27:06     INFO -      |<----------------Class--------------->|<-----Bytes------>|<----Objects---->|
09:27:06     INFO -      |                                      | Per-Inst   Leaked|   Total      Rem|
09:27:06     INFO -    0 |TOTAL                                 |       20     3572| 1782633       86|
09:27:06     INFO -   84 |CancelableRunnable                    |       24      144|    3362        6|
09:27:06     INFO -  122 |CondVar                               |       24      312|     398       13|
09:27:06     INFO -  195 |DelayedRunnable                       |       72      432|      33        6|
09:27:06     INFO -  318 |IdlePeriod                            |       12       72|      66        6|
09:27:06     INFO -  409 |Mutex                                 |       20      140|    1614        7|
09:27:06     INFO -  552 |Runnable                              |       20      360|   11984       18|
09:27:06     INFO - 1281 |nsTArray_base                         |        4       48|  419425       12|
09:27:06     INFO - 1289 |nsThread                              |      200     1200|      65        6|
09:27:06     INFO - 1293 |nsTimer                               |       16       96|     989        6|
09:27:06     INFO - 1294 |nsTimerImpl                           |      128      768|     989        6|
09:27:06     INFO - 
09:27:06     INFO - nsTraceRefcnt::DumpStatistics: 1396 entries
09:27:06     INFO - TEST-INFO | leakcheck | default process: leaked 6 CancelableRunnable
09:27:06     INFO - TEST-INFO | leakcheck | default process: leaked 13 CondVar
09:27:06     INFO - TEST-INFO | leakcheck | default process: leaked 6 DelayedRunnable
09:27:06     INFO - TEST-INFO | leakcheck | default process: leaked 6 IdlePeriod
09:27:06     INFO - TEST-INFO | leakcheck | default process: leaked 7 Mutex
09:27:06     INFO - TEST-INFO | leakcheck | default process: leaked 18 Runnable
09:27:06     INFO - TEST-INFO | leakcheck | default process: leaked 12 nsTArray_base
09:27:06     INFO - TEST-INFO | leakcheck | default process: leaked 6 nsThread
09:27:06     INFO - TEST-INFO | leakcheck | default process: leaked 6 nsTimer
09:27:06     INFO - TEST-INFO | leakcheck | default process: leaked 6 nsTimerImpl
09:27:06  WARNING - TEST-UNEXPECTED-FAIL | leakcheck | default process: 3572 bytes leaked (CancelableRunnable, CondVar, DelayedRunnable, IdlePeriod, Mutex, ...)

...
09:33:41    ERROR - Return code: 1
09:33:41     INFO - TinderboxPrint: mochitest-plain-chunked<br/>1964/0/0
09:33:41  WARNING - # TBPL SUCCESS #
09:33:41  WARNING - The mochitest suite: plain-chunked ran with return status: SUCCESS
odd, we get a return code=1 and still get tbpl success- this might be in mozharness or some regex in treeherder?
See https://bugzilla.mozilla.org/show_bug.cgi?id=1323812#c8.

:jgraham -- You seem to be able to sort out these structured logging issues better than most. Can you take a look?
Flags: needinfo?(james)
We had a similar problem (bug 1321127). Is there anything we can do to prevent problems like this from happen in the future? Otherwise we should backout bug 1261199. Unreliable test results will bring a serious quality concern into our products.
The gamepad leak started with https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=313bd77f09009341a35b45bc75866a57f02907b9, making it a regression from bug 1318839 (or, less likely, bug 1312236).

:lenzak800 -- It looks like bug 1318839 caused a leak in gamepad tests, which has been overlooked for all this time because of a problem in our test infrastructure. We'll need those leaks fixed or your patch backed out before we fix the infrastructure problem.
Flags: needinfo?(cleu)
(You can see the leaks in logs for Windows Debug mochitest-3, following the gamepad tests, but the jobs are unexpectedly green.)
OK, backout them if needed, I will try to figure out how to fix it.
Flags: needinfo?(cleu)
So the problem here is that mochitest is still logging leaks using an warning message with the string TEST-UNEXPECTED-FAIL, but mozharness no longer parses that out of the log. So we need the same fix as in the reftest assert bug where we add this to the structured logs somehow. 

I think we should add a "leak" action to the structured logs with fields like type/count/url/docshellIds (with the latter being a (pid, id) tuple). Laternatively if we don't want all the information we could just add a generic "message" field. There is also some logging that should be upgraded from "warning" to "error" about failing to parse the line.

Testing this stuff is a little complex. We could perhaps write some tests that are known to leak/assert/etc. and then ensure that mozharness signals treeherder in the correct way. But we don't have any harness for writing that kind of test so it's not a trivial project. Might be a good outreachy idea, or similar.
Flags: needinfo?(james)
Then we should audit the string TEST-UNEXPECTED-FAIL in the tree?
https://dxr.mozilla.org/mozilla-central/search?q=TEST-UNEXPECTED-FAIL&redirect=false
Yes, but it's not as simple as "that string should no longer appear in the source tree (except in mozlog formatters)" since aiui not every test harness has moved over to structured logging. But it shouldn't appear in reftests or mochitests anymore.
Oh and anything logged as error or above should turn the job orange regardless of the string, so TEST-UNEXPECTED-FAIL is OK in those cases too.
(In reply to Masatoshi Kimura [:emk] from comment #4)
> We had a similar problem (bug 1321127). Is there anything we can do to
> prevent problems like this from happen in the future? Otherwise we should
> backout bug 1261199. Unreliable test results will bring a serious quality
> concern into our products.

This is a mochitest issue, bug 1261199 is for reftest so please don't back that out :). Also, you are assuming that there was not a "serious quality concern" prior to structured logging. When we switched to using the StructuredOutputParser, we probably discovered and fixed around 5-6 issues that were failing but not turning jobs orange. While there have been some unfortunate growing pains, structured logging has actually been a large improvement over its unstructured counterpart in terms of properly flagging errors.

That being said, yes, we should have better test suites for the test harnesses themselves to catch stuff like this. As jgraham mentioned, testing this kind of stuff can be tricky.
(In reply to Michael Leu[:lenzak800](UTC+8) from comment #7)
> OK, backout them if needed, I will try to figure out how to fix it.

You have a bit of time, no need to backout until there is a patch here ready to land which likely won't happen until sometime after the holidays.
(In reply to James Graham [:jgraham] from comment #8)
> I think we should add a "leak" action to the structured logs with fields
> like type/count/url/docshellIds (with the latter being a (pid, id) tuple).
> Laternatively if we don't want all the information we could just add a
> generic "message" field. There is also some logging that should be upgraded
> from "warning" to "error" about failing to parse the line.

I agree that this probably warrants a new action. But due to the holidays and the fact that getting said action right might take a bit of time, I propose we file a follow-up and close out this bug by turning that WARNING into an ERROR. This should at least make sure we don't get any new leaks while we figure out a more proper solution.
Hi Geoff, I have submitted a new patch for bug1318839 and pushed it to try server, I didn't see the warning anymore, is that mean it mitigates the leak?

https://treeherder.mozilla.org/#/jobs?repo=try&revision=5e9e42191c9ac26f3fd8b7379c303932a9924d75&selectedJob=33304458
Flags: needinfo?(gbrown)
Yes, that fixed the leak - thanks!
Flags: needinfo?(gbrown)
[Tracking Requested - why for this release]: Leaks are bad

Are we sure this is not a problem on beta?  Bug 1318839 claims it's fixed on 51.
Flags: needinfo?(cleu)
This leak exists on aurora:
https://archive.mozilla.org/pub/firefox/tinderbox-builds/mozilla-aurora-win32-debug/1483024040/mozilla-aurora_win7_vm-debug_test-mochitest-3-bm129-tests1-windows-build1.txt.gz

But I don't see it on beta. The leak detection in mochitest regressed Oct. 26th, which is when 52 was central. So there shouldn't be any mochitest leaks on 51.
Attachment #8822534 - Flags: review+
Here is try
https://treeherder.mozilla.org/#/jobs?repo=try&revision=81bdf744578c382f62b91da5795ffa672a6ae7e8

Want to ensure nothing is wrong, so run it again with the rebased patch.
Keywords: checkin-needed
Component: General → DOM: Device Interfaces
Product: Testing → Core
Version: Version 3 → 52 Branch
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/ee66c0cbd058
Use nsITimer to trigger DirectInput polling instead of DelayedDispatch. r=qdot
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/ee66c0cbd058
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla53
Does this patch need to be nominated for 52?
Yes, this needs to land on 52 as well.
Duplicate of this bug: 1323812
Michael, please fill in the template for uplift to 52.  Thanks!
Flags: needinfo?(cleu)
This also still needs the whitelisting removed, as comment 22 said.
Comment on attachment 8822534 [details] [diff] [review]
Bug1324592 - Using nsITimer to trigger DirectInput polling instead of DelayedDispatch r=qdot

Review of attachment 8822534 [details] [diff] [review]:
-----------------------------------------------------------------

[Feature/regressing bug #]: Gamepad mochitest has memory leak upon gecko shutdown under Windows backend
[User impact if declined]: Leaks are bad
[Describe test coverage new/current, TBPL]: Hand test with physical gamepad and tryserver for checking memory leak
[Risks and why]: The risk is low, it uses nsITimer to substitude nsIThread->DelayedDispatch to do same behavior
[String/UUID change made/needed]: none
Attachment #8822534 - Flags: approval-mozilla-aurora?
Flags: needinfo?(cleu)
Comment on attachment 8824392 [details] [diff] [review]
gamepad_leak.patch

Review of attachment 8824392 [details] [diff] [review]:
-----------------------------------------------------------------

Lgtm, this should also land on 52 after the fix does.
Attachment #8824392 - Flags: review?(ahalberstadt) → review+
Pushed by jmaher@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/1e28310fb3e5
enable leak checking for gamepad mochitests. r=ahal
Comment on attachment 8822534 [details] [diff] [review]
Bug1324592 - Using nsITimer to trigger DirectInput polling instead of DelayedDispatch r=qdot

gamepad memory leak fix, aurora52+
Attachment #8822534 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
This needed a lint fix because of some rebasing between m-c and aurora: https://hg.mozilla.org/releases/mozilla-aurora/rev/9a50a9d02d35491c28b60cca9ede974e7e4da200
(In reply to Phil Ringnalda (:philor) from comment #37)
> Backed out in
> https://hg.mozilla.org/releases/mozilla-aurora/rev/
> 27fec05e14c908974d37eb9529e2abb6057c3057 for Windows leaking in the gamepad
> directory,
> https://treeherder.mozilla.org/logviewer.html#?job_id=67475301&repo=mozilla-
> aurora
Flags: needinfo?(cleu)
That's quite strange, why same patch leak in only 52 instead of both 52 and 53....

I'll try to figure it out.
Flags: needinfo?(cleu)
(In reply to Michael Leu[:lenzak800](UTC+8) from comment #39)
> That's quite strange, why same patch leak in only 52 instead of both 52 and 53....

The problem is that the leak fix did not land on Aurora, only the un-whitelisting:
  https://hg.mozilla.org/releases/mozilla-aurora/rev/d1dd912cac5b

Both patches need to land there.
Keywords: checkin-needed
Whiteboard: [checkin-needed Aurora][land both patches]
Duplicate of this bug: 1336902
You need to log in before you can comment on or make changes to this bug.