Closed Bug 1458940 Opened 8 years ago Closed 7 years ago

Intermittent dom/serviceworkers/test/browser_storage_permission.js | leaked 2 window(s) until shutdown [url = http://mochi.test:8888/browser/dom/serviceworkers/test/empty.html?storage_permission]

Categories

(Core :: DOM: Service Workers, defect, P2)

Tracking

()

RESOLVED INCOMPLETE

People

(Reporter: intermittent-bug-filer, Unassigned)

References

(Depends on 1 open bug)

Details

(Keywords: intermittent-failure, Whiteboard: [retriggered][stockwell unknown]DWS_NEXT)

Hi! I did some retriggers to find where this failure originated. I worked on this range (http://tinyurl.com/yb3gb22m) and, after seeing the results, I think this push (http://tinyurl.com/yddm2k2r) might be the culprit, in particular this changeset (https://hg.mozilla.org/integration/mozilla-inbound/rev/c86679a9bf8360dd95790c63768a70913467c46a). So far this fails only on windows10-64 debug, with a failure rate of 3-4 failures out of 40 jobs. Ben, could you please take a look at this? Thank you.
Flags: needinfo?(bkelly)
Whiteboard: [retriggered]
From the log:

01:44:52 INFO - GECKO(2972) | Completed ShutdownLeaks collections in process 2092
01:44:52 INFO - GECKO(2972) | --DOMWINDOW == 1 (000002338ACC4400) [pid = 2092] [serial = 40] [outer = 0000000000000000] [url = http://mochi.test:8888/browser/dom/serviceworkers/test/empty.html?storage_permission]
01:44:55 INFO - GECKO(2972) | --DOMWINDOW == 0 (000002338ACBB000) [pid = 2092] [serial = 42] [outer = 0000000000000000] [url = http://mochi.test:8888/browser/dom/serviceworkers/test/empty.html?storage_permission]
01:48:01 ERROR - 699 ERROR TEST-UNEXPECTED-FAIL | dom/serviceworkers/test/browser_storage_permission.js | leaked 2 window(s) until shutdown [url = http://mochi.test:8888/browser/dom/serviceworkers/test/empty.html?storage_permission]
01:48:01 INFO - TEST-INFO | dom/serviceworkers/test/browser_storage_permission.js | windows(s) leaked: [pid = 2092] [serial = 42], [pid = 2092] [serial = 40]

This shows that the windows are in fact deallocated, but just after the leak checking runs. So this seems like a timing issue to me. I'll try adding a GC to the cleanup of the test on Monday.
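The reading of the log above (windows destroyed only after the "Completed ShutdownLeaks collections" line for their process) can be checked mechanically. The following is a minimal sketch, not the harness's own leak checker: the function name `late_destroyed_windows` and both regular expressions are hypothetical and are modelled only on the lines quoted in this comment.

```python
import re

# Patterns modelled on the log excerpt quoted in this comment; the exact
# line format is an assumption taken from that excerpt only.
COLLECTIONS_DONE = re.compile(
    r"Completed ShutdownLeaks collections in process (\d+)"
)
WINDOW_DESTROYED = re.compile(
    r"--DOMWINDOW == \d+ \([0-9A-Fa-f]+\) \[pid = (\d+)\] \[serial = (\d+)\]"
)


def late_destroyed_windows(log_lines):
    """Return (pid, serial) pairs for windows whose destruction is logged
    after their process reported that ShutdownLeaks collections completed.

    Ordering in the log is used instead of timestamps, because the
    timestamps carry no date and only have one-second resolution.
    """
    finished_pids = set()
    late = []
    for line in log_lines:
        done = COLLECTIONS_DONE.search(line)
        if done:
            finished_pids.add(int(done.group(1)))
            continue
        destroyed = WINDOW_DESTROYED.search(line)
        if destroyed:
            pid = int(destroyed.group(1))
            serial = int(destroyed.group(2))
            if pid in finished_pids:
                late.append((pid, serial))
    return late


# The first three lines of the excerpt quoted in this comment.
SAMPLE_LOG = [
    "01:44:52 INFO - GECKO(2972) | Completed ShutdownLeaks collections in process 2092",
    "01:44:52 INFO - GECKO(2972) | --DOMWINDOW == 1 (000002338ACC4400) [pid = 2092] [serial = 40] [outer = 0000000000000000] [url = http://mochi.test:8888/browser/dom/serviceworkers/test/empty.html?storage_permission]",
    "01:44:55 INFO - GECKO(2972) | --DOMWINDOW == 0 (000002338ACBB000) [pid = 2092] [serial = 42] [outer = 0000000000000000] [url = http://mochi.test:8888/browser/dom/serviceworkers/test/empty.html?storage_permission]",
]

print(late_destroyed_windows(SAMPLE_LOG))  # [(2092, 40), (2092, 42)]
```

The two serials it reports match the ones listed in the "windows(s) leaked" line. Note that this only flags ordering in the log; on its own it cannot tell late cleanup apart from a real leak that is released at process teardown.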
There is a real leak here. I can reproduce locally with --verify. I wonder if this would get any better with bug 1451381 fixed.
Depends on: 1451381
Flags: needinfo?(bkelly)
Priority: -- → P2
Over the last 7 days there are 31 failures on this bug. These happen on windows7-32 and windows10-64.

Here is the most recent log example: https://treeherder.mozilla.org/logviewer.html#?job_id=181255393&repo=autoland&lineNumber=17682

01:00:33 INFO - WARNING | IO Completion Port failed to signal process shutdown
01:00:33 INFO - Parent process 4648 exited with children alive:
01:00:33 INFO - PIDS: 5180
01:00:33 INFO - Attempting to kill them, but no guarantee of success
01:00:33 INFO - TEST-INFO | Main app process: exit 0
01:00:33 ERROR - 910 ERROR TEST-UNEXPECTED-FAIL | dom/serviceworkers/test/browser_storage_permission.js | leaked 2 window(s) until shutdown [url = http://mochi.test:8888/browser/dom/serviceworkers/test/empty.html?storage_permission]
Flags: needinfo?(mdaly)
I had hoped that the dependent worker leak bug would have made some progress by now given bug 1451381 comment 9.
(In reply to Stefan Hindli [:stefan_hindli] from comment #9)
> Over the last 7 days there are 31 failures on this bug. These happen on windows7-32 and windows10-64
>
> Here is the most recent log example:
> https://treeherder.mozilla.org/logviewer.html#?job_id=181255393&repo=autoland&lineNumber=17682
>
> 01:00:33 INFO - WARNING | IO Completion Port failed to signal process shutdown
> 01:00:33 INFO - Parent process 4648 exited with children alive:
> 01:00:33 INFO - PIDS: 5180
> 01:00:33 INFO - Attempting to kill them, but no guarantee of success
> 01:00:33 INFO - TEST-INFO | Main app process: exit 0
> 01:00:33 ERROR - 910 ERROR TEST-UNEXPECTED-FAIL | dom/serviceworkers/test/browser_storage_permission.js | leaked 2 window(s) until shutdown [url = http://mochi.test:8888/browser/dom/serviceworkers/test/empty.html?storage_permission]

Noted. Thank you.
Flags: needinfo?(mdaly)
Hi mdaly: Have you had a chance to take a look at this bug?
Flags: needinfo?(mdaly)
(In reply to Dorel Luca [:dluca] from comment #13)
> Hi mdaly: Have you had a chance to take a look at this bug?

We haven't yet; it's in our backlog though.
Flags: needinfo?(mdaly)
Update: In the last 7 days, there have been 32 failures. All of them occur on the debug build type and on the following platforms:
- windows10-64 - 24
- windows7-32 - 8

Here is a recent relevant log file and a snippet with the failure: https://treeherder.mozilla.org/logviewer.html#?job_id=185207748&repo=mozilla-inbound&lineNumber=15839

17:29:57 INFO - GECKO(7784) | => mAllocCount: 118761
17:29:57 INFO - GECKO(7784) | => mReallocCount: 10329
17:29:57 INFO - GECKO(7784) | => mFreeCount: 118761
17:29:57 INFO - GECKO(7784) | => mShareCount: 133013
17:29:57 INFO - GECKO(7784) | => mAdoptCount: 3989
17:29:57 INFO - GECKO(7784) | => mAdoptFreeCount: 4230
17:29:57 INFO - GECKO(7784) | => Process ID: 7784, Thread ID: 2880
17:33:02 INFO - WARNING | IO Completion Port failed to signal process shutdown
17:33:02 INFO - Parent process 7784 exited with children alive:
17:33:02 INFO - PIDS: 4484
17:33:02 INFO - Attempting to kill them, but no guarantee of success
17:33:02 INFO - TEST-INFO | Main app process: exit 0
17:33:02 ERROR - 776 ERROR TEST-UNEXPECTED-FAIL | dom/serviceworkers/test/browser_storage_permission.js | leaked 2 window(s) until shutdown [url = http://mochi.test:8888/browser/dom/serviceworkers/test/empty.html?storage_permission]
17:33:02 INFO - TEST-INFO | dom/serviceworkers/test/browser_storage_permission.js | windows(s) leaked: [pid = 2204] [serial = 40], [pid = 2204] [serial = 42]
17:33:02 INFO - runtests.py | Application ran for: 0:03:26.580000
17:33:02 INFO - zombiecheck | Reading PID log: c:\users\task_1530119374\appdata\local\temp\tmppwr6cwpidlog
Update: this has a total of 31 failures in the last 7 days, 11 on windows10-64 debug and 9 on windows7-32 debug. :bkelly, :mdaly is there any progress on this? Thanks.
Flags: needinfo?(mdaly)
Flags: needinfo?(ben)
The dependent bug isn't fixed yet. I thought Andrea was working on that. In any case, I have no plans to work on this. Sorry.
Flags: needinfo?(ben)
(In reply to Andreea Pavel [:apavel] from comment #21)
> Update: this has a total of 31 failures in the last 7 days, 11 on windows10-64 debug and 9 on windows7-32 debug.
> :bkelly, :mdaly is there any progress on this?
>
> Thanks.

No progress at this moment, this is a medium-high priority bug for us. Putting on Blake's radar.
Flags: needinfo?(mdaly) → needinfo?(mrbkap)
(In reply to Marion Daly [:mdaly] from comment #24)
> (In reply to Andreea Pavel [:apavel] from comment #21)
> > Update: this has a total of 31 failures in the last 7 days, 11 on windows10-64 debug and 9 on windows7-32 debug.
> > :bkelly, :mdaly is there any progress on this?
> >
> > Thanks.
>
> No progress at this moment, this is a medium-high priority bug for us.
> Putting on Blake's radar.

Great to hear that, thanks for answering.
There have been 36 failures in the last 7 days. :bkelly, :mdaly are there any updates?
Flags: needinfo?(mdaly)
Flags: needinfo?(ben)
(In reply to Cristina Coroiu [:ccoroiu] from comment #27)
> There have been 36 failures in the last 7 days.
> :bkelly, :mdaly are there any updates?

We haven't had a chance to address this yet.
Flags: needinfo?(mdaly)
Flags: needinfo?(ben)
:mdaly purely informative, most likely you will be asked again about the progress here, as if this bug continues to appear and has a failure rate above 30 failures in 7 days, we need to ask for an update.
(In reply to Andreea Pavel [:apavel] from comment #29)
> :mdaly purely informative, most likely you will be asked again about the progress here, as if this bug continues to appear and has a failure rate above 30 failures in 7 days, we need to ask for an update.

I understand entirely; I marked it as a high P2 accordingly. If the failures spike further we'll make it a P1 and address it right away. Sorry that we cannot address it right away; that is the nature of our resources at the moment.
I pinged baku about bug 1451381, and that patch might fix this bug.
We are trying to build a tool to automatically classify intermittent failures, which would provide a starting point for fixing the bug, reducing the manual work for the developers. We are collecting some feedback on the results, to see if they’re good enough and where we need to improve.

For this bug, the tool says that the intermittent failure is most likely a:

Resource Leak: This includes tests which have memory leaks, garbage collection issues, other memory allocation issues, pointer (de-) referencing issues or any kind of resource management issue which was not covered by another test case. E.g., a test fails because it holds a pointer to a resource which has been garbage collected or the system crashes because it ran out of memory.

Once you’re done investigating and/or fixing the bug, could you tell me:
- Did the tool correctly recognize the type of intermittent failure?
- Did the information from the tool help your analysis, the bug fixing process, or anything in the process? (please also let us know how the tool was useful and/or what would improve the tool's usefulness for you)
removing old NI, adding to DWS_NEXT
Flags: needinfo?(mrbkap)
Whiteboard: [retriggered][stockwell unknown] → [retriggered][stockwell unknown]DWS_NEXT
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → INCOMPLETE