Closed Bug 1324592 Opened 8 years ago Closed 8 years ago

Windows gamepad mochitests perma-fail with leaks, but do not turn the job orange

Categories

(Core :: DOM: Device Interfaces, defect)

52 Branch
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla53
Tracking Status
firefox51 --- unaffected
firefox52 + fixed
firefox53 + fixed

People

(Reporter: gbrown, Assigned: cleu)

References

Details

Attachments

(2 files)

https://archive.mozilla.org/pub/firefox/tinderbox-builds/mozilla-central-win32-debug/1482159878/mozilla-central_win7_vm-debug_test-mochitest-3-bm129-tests1-windows-build81.txt.gz 09:27:00 INFO - TEST-OK | dom/tests/mochitest/gamepad/test_navigator_gamepads.html | took 192ms 09:27:00 INFO - ++DOMWINDOW == 50 (12002800) [pid = 3512] [serial = 50] [outer = 1B3D2C00] 09:27:00 INFO - [3512] WARNING: An event was posted to a thread that will never run it (rejected): file c:/builds/moz2_slave/m-cen-w32-d-000000000000000000/build/src/xpcom/threads/nsThread.cpp, line 695 09:27:00 INFO - TEST-START | Shutdown 09:27:00 INFO - Passed: 46 09:27:00 INFO - Failed: 0 09:27:00 INFO - Todo: 0 09:27:00 INFO - Mode: non-e10s 09:27:00 INFO - Slowest: 1628ms - /tests/dom/tests/mochitest/gamepad/test_check_timestamp.html 09:27:00 INFO - SimpleTest FINISHED 09:27:00 INFO - TEST-INFO | Ran 1 Loops 09:27:00 INFO - SimpleTest FINISHED ... 09:27:06 INFO - == BloatView: ALL (cumulative) LEAK AND BLOAT STATISTICS, default process 3512 09:27:06 INFO - 09:27:06 INFO - |<----------------Class--------------->|<-----Bytes------>|<----Objects---->| 09:27:06 INFO - | | Per-Inst Leaked| Total Rem| 09:27:06 INFO - 0 |TOTAL | 20 3572| 1782633 86| 09:27:06 INFO - 84 |CancelableRunnable | 24 144| 3362 6| 09:27:06 INFO - 122 |CondVar | 24 312| 398 13| 09:27:06 INFO - 195 |DelayedRunnable | 72 432| 33 6| 09:27:06 INFO - 318 |IdlePeriod | 12 72| 66 6| 09:27:06 INFO - 409 |Mutex | 20 140| 1614 7| 09:27:06 INFO - 552 |Runnable | 20 360| 11984 18| 09:27:06 INFO - 1281 |nsTArray_base | 4 48| 419425 12| 09:27:06 INFO - 1289 |nsThread | 200 1200| 65 6| 09:27:06 INFO - 1293 |nsTimer | 16 96| 989 6| 09:27:06 INFO - 1294 |nsTimerImpl | 128 768| 989 6| 09:27:06 INFO - 09:27:06 INFO - nsTraceRefcnt::DumpStatistics: 1396 entries 09:27:06 INFO - TEST-INFO | leakcheck | default process: leaked 6 CancelableRunnable 09:27:06 INFO - TEST-INFO | leakcheck | default process: leaked 13 CondVar 09:27:06 INFO - TEST-INFO | leakcheck | default process: leaked 6 DelayedRunnable 09:27:06 INFO - TEST-INFO | leakcheck | default process: leaked 6 IdlePeriod 09:27:06 INFO - TEST-INFO | leakcheck | default process: leaked 7 Mutex 09:27:06 INFO - TEST-INFO | leakcheck | default process: leaked 18 Runnable 09:27:06 INFO - TEST-INFO | leakcheck | default process: leaked 12 nsTArray_base 09:27:06 INFO - TEST-INFO | leakcheck | default process: leaked 6 nsThread 09:27:06 INFO - TEST-INFO | leakcheck | default process: leaked 6 nsTimer 09:27:06 INFO - TEST-INFO | leakcheck | default process: leaked 6 nsTimerImpl 09:27:06 WARNING - TEST-UNEXPECTED-FAIL | leakcheck | default process: 3572 bytes leaked (CancelableRunnable, CondVar, DelayedRunnable, IdlePeriod, Mutex, ...) ... 09:33:41 ERROR - Return code: 1 09:33:41 INFO - TinderboxPrint: mochitest-plain-chunked<br/>1964/0/0 09:33:41 WARNING - # TBPL SUCCESS # 09:33:41 WARNING - The mochitest suite: plain-chunked ran with return status: SUCCESS
odd, we get a return code=1 and still get tbpl success- this might be in mozharness or some regex in treeherder?
See https://bugzilla.mozilla.org/show_bug.cgi?id=1323812#c8. :jgraham -- You seem to be able to sort out these structured logging issues better than most. Can you take a look?
Flags: needinfo?(james)
We had a similar problem (bug 1321127). Is there anything we can do to prevent problems like this from happen in the future? Otherwise we should backout bug 1261199. Unreliable test results will bring a serious quality concern into our products.
The gamepad leak started with https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=313bd77f09009341a35b45bc75866a57f02907b9, making it a regression from bug 1318839 (or, less likely, bug 1312236). :lenzak800 -- It looks like bug 1318839 caused a leak in gamepad tests, which has been overlooked for all this time because of a problem in our test infrastructure. We'll need those leaks fixed or your patch backed out before we fix the infrastructure problem.
Flags: needinfo?(cleu)
(You can see the leaks in logs for Windows Debug mochitest-3, following the gamepad tests, but the jobs are unexpectedly green.)
OK, backout them if needed, I will try to figure out how to fix it.
Flags: needinfo?(cleu)
So the problem here is that mochitest is still logging leaks using an warning message with the string TEST-UNEXPECTED-FAIL, but mozharness no longer parses that out of the log. So we need the same fix as in the reftest assert bug where we add this to the structured logs somehow. I think we should add a "leak" action to the structured logs with fields like type/count/url/docshellIds (with the latter being a (pid, id) tuple). Laternatively if we don't want all the information we could just add a generic "message" field. There is also some logging that should be upgraded from "warning" to "error" about failing to parse the line. Testing this stuff is a little complex. We could perhaps write some tests that are known to leak/assert/etc. and then ensure that mozharness signals treeherder in the correct way. But we don't have any harness for writing that kind of test so it's not a trivial project. Might be a good outreachy idea, or similar.
Flags: needinfo?(james)
Then we should audit the string TEST-UNEXPECTED-FAIL in the tree? https://dxr.mozilla.org/mozilla-central/search?q=TEST-UNEXPECTED-FAIL&redirect=false
Yes, but it's not as simple as "that string should no longer appear in the source tree (except in mozlog formatters)" since aiui not every test harness has moved over to structured logging. But it shouldn't appear in reftests or mochitests anymore.
Oh and anything logged as error or above should turn the job orange regardless of the string, so TEST-UNEXPECTED-FAIL is OK in those cases too.
(In reply to Masatoshi Kimura [:emk] from comment #4) > We had a similar problem (bug 1321127). Is there anything we can do to > prevent problems like this from happen in the future? Otherwise we should > backout bug 1261199. Unreliable test results will bring a serious quality > concern into our products. This is a mochitest issue, bug 1261199 is for reftest so please don't back that out :). Also, you are assuming that there was not a "serious quality concern" prior to structured logging. When we switched to using the StructuredOutputParser, we probably discovered and fixed around 5-6 issues that were failing but not turning jobs orange. While there have been some unfortunate growing pains, structured logging has actually been a large improvement over its unstructured counterpart in terms of properly flagging errors. That being said, yes, we should have better test suites for the test harnesses themselves to catch stuff like this. As jgraham mentioned, testing this kind of stuff can be tricky.
(In reply to Michael Leu[:lenzak800](UTC+8) from comment #7) > OK, backout them if needed, I will try to figure out how to fix it. You have a bit of time, no need to backout until there is a patch here ready to land which likely won't happen until sometime after the holidays.
(In reply to James Graham [:jgraham] from comment #8) > I think we should add a "leak" action to the structured logs with fields > like type/count/url/docshellIds (with the latter being a (pid, id) tuple). > Laternatively if we don't want all the information we could just add a > generic "message" field. There is also some logging that should be upgraded > from "warning" to "error" about failing to parse the line. I agree that this probably warrants a new action. But due to the holidays and the fact that getting said action right might take a bit of time, I propose we file a follow-up and close out this bug by turning that WARNING into an ERROR. This should at least make sure we don't get any new leaks while we figure out a more proper solution.
Hi Geoff, I have submitted a new patch for bug1318839 and pushed it to try server, I didn't see the warning anymore, is that mean it mitigates the leak? https://treeherder.mozilla.org/#/jobs?repo=try&revision=5e9e42191c9ac26f3fd8b7379c303932a9924d75&selectedJob=33304458
Flags: needinfo?(gbrown)
Yes, that fixed the leak - thanks!
Flags: needinfo?(gbrown)
[Tracking Requested - why for this release]: Leaks are bad Are we sure this is not a problem on beta? Bug 1318839 claims it's fixed on 51.
Flags: needinfo?(cleu)
This leak exists on aurora: https://archive.mozilla.org/pub/firefox/tinderbox-builds/mozilla-aurora-win32-debug/1483024040/mozilla-aurora_win7_vm-debug_test-mochitest-3-bm129-tests1-windows-build1.txt.gz But I don't see it on beta. The leak detection in mochitest regressed Oct. 26th, which is when 52 was central. So there shouldn't be any mochitest leaks on 51.
Attachment #8822534 - Flags: review+
Here is try https://treeherder.mozilla.org/#/jobs?repo=try&revision=81bdf744578c382f62b91da5795ffa672a6ae7e8 Want to ensure nothing is wrong, so run it again with the rebased patch.
Keywords: checkin-needed
Component: General → DOM: Device Interfaces
Product: Testing → Core
Version: Version 3 → 52 Branch
Pushed by ryanvm@gmail.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/ee66c0cbd058 Use nsITimer to trigger DirectInput polling instead of DelayedDispatch. r=qdot
Keywords: checkin-needed
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla53
Does this patch need to be nominated for 52?
Yes, this needs to land on 52 as well.
Michael, please fill in the template for uplift to 52. Thanks!
Flags: needinfo?(cleu)
This also still needs the whitelisting removed, as comment 22 said.
Comment on attachment 8822534 [details] [diff] [review] Bug1324592 - Using nsITimer to trigger DirectInput polling instead of DelayedDispatch r=qdot Review of attachment 8822534 [details] [diff] [review]: ----------------------------------------------------------------- [Feature/regressing bug #]: Gamepad mochitest has memory leak upon gecko shutdown under Windows backend [User impact if declined]: Leaks are bad [Describe test coverage new/current, TBPL]: Hand test with physical gamepad and tryserver for checking memory leak [Risks and why]: The risk is low, it uses nsITimer to substitude nsIThread->DelayedDispatch to do same behavior [String/UUID change made/needed]: none
Attachment #8822534 - Flags: approval-mozilla-aurora?
Flags: needinfo?(cleu)
Comment on attachment 8824392 [details] [diff] [review] gamepad_leak.patch Review of attachment 8824392 [details] [diff] [review]: ----------------------------------------------------------------- Lgtm, this should also land on 52 after the fix does.
Attachment #8824392 - Flags: review?(ahalberstadt) → review+
Pushed by jmaher@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/1e28310fb3e5 enable leak checking for gamepad mochitests. r=ahal
Comment on attachment 8822534 [details] [diff] [review] Bug1324592 - Using nsITimer to trigger DirectInput polling instead of DelayedDispatch r=qdot gamepad memory leak fix, aurora52+
Attachment #8822534 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
(In reply to Phil Ringnalda (:philor) from comment #37) > Backed out in > https://hg.mozilla.org/releases/mozilla-aurora/rev/ > 27fec05e14c908974d37eb9529e2abb6057c3057 for Windows leaking in the gamepad > directory, > https://treeherder.mozilla.org/logviewer.html#?job_id=67475301&repo=mozilla- > aurora
Flags: needinfo?(cleu)
That's quite strange, why same patch leak in only 52 instead of both 52 and 53.... I'll try to figure it out.
Flags: needinfo?(cleu)
(In reply to Michael Leu[:lenzak800](UTC+8) from comment #39) > That's quite strange, why same patch leak in only 52 instead of both 52 and 53.... The problem is that the leak fix did not land on Aurora, only the un-whitelisting: https://hg.mozilla.org/releases/mozilla-aurora/rev/d1dd912cac5b Both patches need to land there.
Keywords: checkin-needed
Whiteboard: [checkin-needed Aurora][land both patches]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: