Intermittent Windows TC TEST-UNEXPECTED-FAIL | leakcheck | tab process: missing output line for total leaks!
Categories
(Core :: IPC, defect, P3)
Tracking
()
People
(Reporter: intermittent-bug-filer, Unassigned)
References
Details
(Keywords: intermittent-failure)
Filed by: rvandermeulen [at] mozilla.com https://treeherder.mozilla.org/logviewer.html#?job_id=102148508&repo=mozilla-inbound https://queue.taskcluster.net/v1/task/Ad5MSyVwTk6nl6ZqOe1OVQ/runs/0/artifacts/public/logs/live_backing.log
Comment 1•7 years ago
|
||
01:19:34 INFO - Traceback (most recent call last): 01:19:34 INFO - File "Z:\task_1496625151\build\tests\mochitest\runtests.py", line 2521, in doTests 01:19:34 INFO - marionette_args=marionette_args, 01:19:34 INFO - File "Z:\task_1496625151\build\tests\mochitest\runtests.py", line 2118, in runApp 01:19:34 INFO - self.marionette.start_session(timeout=port_timeout) 01:19:34 INFO - File "Z:\task_1496625151\build\venv\lib\site-packages\marionette_driver\decorators.py", line 28, in _ 01:19:34 INFO - m._handle_socket_failure() 01:19:34 INFO - File "Z:\task_1496625151\build\venv\lib\site-packages\marionette_driver\decorators.py", line 23, in _ 01:19:34 INFO - return func(*args, **kwargs) 01:19:34 INFO - File "Z:\task_1496625151\build\venv\lib\site-packages\marionette_driver\marionette.py", line 1320, in start_session 01:19:34 INFO - self.protocol, _ = self.client.connect() 01:19:34 INFO - File "Z:\task_1496625151\build\venv\lib\site-packages\marionette_driver\transport.py", line 223, in connect 01:19:34 INFO - self.sock.connect((self.addr, self.port)) 01:19:34 INFO - File "c:\mozilla-build\python\Lib\socket.py", line 228, in meth 01:19:34 INFO - return getattr(self._sock,name)(*args) 01:19:34 INFO - error: [Errno 10061] No connection could be made because the target machine actively refused it
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 6•7 years ago
|
||
this failure started up on July 11th, it is occurring often enough, but not a high priority right now.
Comment 7•7 years ago
|
||
This is the log spew from the process that doesn't make a log: 08:31:22 INFO - GECKO(3852) | [Child 3348] WARNING: pipe error: 109: file c:/builds/moz2_slave/m-beta-w32-d-00000000000000000/build/src/ipc/chromium/src/chrome/common/ipc_channel_win.cc, line 346 08:31:22 INFO - GECKO(3852) | ###!!! [Child][RunMessage] Error: Channel error: cannot send/recv 08:31:22 INFO - GECKO(3852) | [Child 3348] WARNING: pipe error: 109: file c:/builds/moz2_slave/m-beta-w32-d-00000000000000000/build/src/ipc/chromium/src/chrome/common/ipc_channel_win.cc, line 346 08:31:22 INFO - GECKO(3852) | [Child 3348] WARNING: MsgDropped in ContentChild: file c:/builds/moz2_slave/m-beta-w32-d-00000000000000000/build/src/dom/ipc/ContentChild.cpp, line 2184 08:31:22 INFO - GECKO(3852) | [Child 3348] WARNING: shutting down early because of crash!: file c:/builds/moz2_slave/m-beta-w32-d-00000000000000000/build/src/dom/ipc/ContentChild.cpp, line 2145 We might be able to change shutdown in leak checking builds so it doesn't ever do that quick exit, but maybe we'd get so many leaks it wouldn't matter. But the basic problem seems to be the pipe error. I'm not sure what that is from.
Comment 8•7 years ago
|
||
I also see this a few times near there, which may or may not be related: [Parent 3852] WARNING: Created child without a matching parent?: file c:/builds/moz2_slave/m-beta-w32-d-00000000000000000/build/src/gfx/layers/ipc/CrossProcessCompositorBridgeParent.cpp, line 92
Comment 9•7 years ago
|
||
If this started on the 11th, maybe this is related to bug 1373660? Comment 7 is about a beta crash, but I looked at one on trunk, and the process kind of started up then didn't do anything, which sounds kind of like the preallocated process.
Comment 10•7 years ago
|
||
(In reply to Andrew McCreight [:mccr8] from comment #9) > If this started on the 11th, maybe this is related to bug 1373660? Comment 7 > is about a beta crash, but I looked at one on trunk, and the process kind of > started up then didn't do anything, which sounds kind of like the > preallocated process. Looks suspicious. That patch was backed out first, and I think that date checks out as well... Not sure about the failure on beta though, I don't think this patch landed there, and I don't think the preallocated process manager is active on try on the beta channel either (the default process count is still 1 there and that disables the ppm). (In reply to Andrew McCreight [:mccr8] from comment #8) > I also see this a few times near there, which may or may not be related: > [Parent 3852] WARNING: Created child without a matching parent?: file > c:/builds/moz2_slave/m-beta-w32-d-00000000000000000/build/src/gfx/layers/ipc/ > CrossProcessCompositorBridgeParent.cpp, line 92 I don't know what this warning stands for, so I'm not sure. David, do you think an empty content process (one without any tab) can trigger something like this? Or could you explain to me what this warning means and when can it happen?
Comment 11•7 years ago
|
||
I saw the compositor message on beta, but not inbound. Maybe there are multiple issues.
That warning can happen ifthe UI process opens a remote tab, shows it, and then immediately closes it. By the time the content process gets the "open tab" message, it tries to create a layer tree, but the compositor can't find any window to attach it to because the tab has been closed.
Note that it should be a harmless message, it's only bad if we expect a tab to be visible and it's not.
Comment hidden (Intermittent Failures Robot) |
Comment 15•7 years ago
|
||
So the preallocated process manager was turned off on nightly in bug 1376895 on 18th of July. So my patch was probably a no-op, which makes it less likely that this problem was caused by it... Unless if that one tests which turns it on causes the leak.
Comment 16•7 years ago
|
||
I looked at a couple of logs, and the content process was just started by random tests that don't seem to do anything specifically related to the preallocated process manager.
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Updated•7 years ago
|
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 28•7 years ago
|
||
we have 10+ failures in this bug and 30+ failures in bug 1281306, we should take a look at this. :billm, I see you are the triage owner here- is this something you could help find the right person to start looking at these leaks?
I looked at two of these failures. They're Windows-only and they seem to be caused by a content process printing "shutting down early because of crash!" and then dying. As a result, the content process doesn't dump out leak information. We also get a MsgDropped error in some of the failures. And there's always a slew of "pipe error: 109" messages, but that seems to happen for all the content processes. I'm going to forward this to Jim since he's really the triage owner for IPC. Maybe someone like Bob could look at this?
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Updated•7 years ago
|
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 55•6 years ago
|
||
Over the last 7 days there are 32 failures present on this bug. These happen on Windows 7, windows10-64 and windows10-64-ccov. Here is the most recent log example: https://treeherder.mozilla.org/logviewer.html#?repo=autoland&job_id=168143419&lineNumber=12636 Here is a relevant part of that log: 06:51:22 INFO - == BloatView: ALL (cumulative) LEAK AND BLOAT STATISTICS, tab process 4784 06:51:22 INFO - |<----------------Class--------------->|<-----Bytes------>|<----Objects---->| 06:51:22 INFO - | | Per-Inst Leaked| Total Rem| 06:51:22 INFO - 0 |TOTAL | 77 0| 49784 0| 06:51:22 INFO - nsTraceRefcnt::DumpStatistics: 1049 entries 06:51:22 INFO - TEST-PASS | leakcheck | tab process: no leaks detected! 06:51:22 ERROR - 776 ERROR TEST-UNEXPECTED-FAIL | leakcheck | tab process: missing output line for total leaks! 06:51:22 INFO - TEST-INFO | leakcheck | missing output line from log file c:\users\genericworker\appdata\local\temp\tmpbqabpe.mozrunner\runtests_leaks_tab_pid5904.log 06:51:22 INFO - == BloatView: ALL (cumulative) LEAK AND BLOAT STATISTICS, tab process 7012
Comment hidden (Intermittent Failures Robot) |
Comment 57•6 years ago
|
||
I wonder if this is related to bug 1348361. The last time this spiked up it seemed like the preallocated process was involved. The timing isn't quite right, though. This started happening more on maybe the 11th or 13th, and bug 1348361 didn't land until the 16th. Bug 1445249 (one of the predecessor bugs to that) did land on the 13th. Maybe that's a red herring.
Comment hidden (Intermittent Failures Robot) |
Comment 59•6 years ago
|
||
Over the last 7 days there are 41 failures on this bug. These happen on windows10-64 and Windows 7 Here is a recent log example: https://treeherder.mozilla.org/logviewer.html#?repo=mozilla-inbound&job_id=170657195&lineNumber=15114 Relevant part of the log: 20:55:41 INFO - nsTraceRefcnt::DumpStatistics: 926 entries 20:55:41 INFO - TEST-PASS | leakcheck | tab process: no leaks detected! 20:55:41 ERROR - 619 ERROR TEST-UNEXPECTED-FAIL | leakcheck | tab process: missing output line for total leaks! 20:55:41 INFO - TEST-INFO | leakcheck | missing output line from log file c:\users\genericworker\appdata\local\temp\tmp6vih7r.mozrunner\runtests_leaks_tab_pid8336.log
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 63•6 years ago
|
||
There are 63 failures in the past 7 days, all of them on Windows 7 and Windows10-64 debug. Recent log failure: https://treeherder.mozilla.org/logviewer.html#?repo=autoland&job_id=172030105&lineNumber=24311 Relevant part of the log: 06:43:27 INFO - Stopping web server 06:43:27 INFO - Stopping web socket server 06:43:27 INFO - Stopping ssltunnel 06:43:27 INFO - TEST-INFO | leakcheck | default process: leak threshold set at 0 bytes 06:43:27 INFO - TEST-INFO | leakcheck | plugin process: leak threshold set at 0 bytes 06:43:27 INFO - TEST-INFO | leakcheck | tab process: leak threshold set at 1000 bytes 06:43:27 INFO - TEST-INFO | leakcheck | geckomediaplugin process: leak threshold set at 20000 bytes 06:43:27 INFO - TEST-INFO | leakcheck | gpu process: leak threshold set at 0 bytes 06:43:27 INFO - == BloatView: ALL (cumulative) LEAK AND BLOAT STATISTICS, default process 8136 06:43:27 INFO - |<----------------Class--------------->|<-----Bytes------>|<----Objects---->| 06:43:27 INFO - | | Per-Inst Leaked| Total Rem| 06:43:27 INFO - 0 |TOTAL | 41 0| 3384286 0| 06:43:27 INFO - nsTraceRefcnt::DumpStatistics: 1866 entries 06:43:27 INFO - TEST-PASS | leakcheck | default process: no leaks detected! 06:43:27 INFO - == BloatView: ALL (cumulative) LEAK AND BLOAT STATISTICS, tab process 3320 06:43:27 INFO - |<----------------Class--------------->|<-----Bytes------>|<----Objects---->| 06:43:27 INFO - | | Per-Inst Leaked| Total Rem| 06:43:27 INFO - 0 |TOTAL | 77 0| 36226 0| 06:43:27 INFO - nsTraceRefcnt::DumpStatistics: 822 entries 06:43:27 INFO - TEST-PASS | leakcheck | tab process: no leaks detected! 06:43:27 ERROR - 754 ERROR TEST-UNEXPECTED-FAIL | leakcheck | tab process: missing output line for total leaks! 06:43:27 INFO - TEST-INFO | leakcheck | missing output line from log file c:\users\task_1522909241\appdata\local\temp\tmp8igu5i.mozrunner\runtests_leaks_tab_pid3844.log 06:43:27 INFO - == BloatView: ALL (cumulative) LEAK AND BLOAT STATISTICS, tab process 4684 06:43:27 INFO - |<----------------Class--------------->|<-----Bytes------>|<----Objects---->| 06:43:27 INFO - | | Per-Inst Leaked| Total Rem| 06:43:27 INFO - 0 |TOTAL | 72 0| 30565 0| 06:43:27 INFO - nsTraceRefcnt::DumpStatistics: 693 entries 06:43:27 INFO - TEST-PASS | leakcheck | tab process: no leaks detected! 06:43:27 INFO - == BloatView: ALL (cumulative) LEAK AND BLOAT STATISTICS, tab process 4852 06:43:27 INFO - |<----------------Class--------------->|<-----Bytes------>|<----Objects---->| 06:43:27 INFO - | | Per-Inst Leaked| Total Rem| 06:43:27 INFO - 0 |TOTAL | 45 0| 166632 0| 06:43:27 INFO - nsTraceRefcnt::DumpStatistics: 1053 entries 06:43:27 INFO - TEST-PASS | leakcheck | tab process: no leaks detected! 06:43:27 INFO - == BloatView: ALL (cumulative) LEAK AND BLOAT STATISTICS, tab process 5292 06:43:27 INFO - |<----------------Class--------------->|<-----Bytes------>|<----Objects---->| 06:43:27 INFO - | | Per-Inst Leaked| Total Rem| 06:43:27 INFO - 0 |TOTAL | 76 0| 52061 0| 06:43:27 INFO - nsTraceRefcnt::DumpStatistics: 1024 entries 06:43:27 INFO - TEST-PASS | leakcheck | tab process: no leaks detected! 06:43:27 INFO - runtests.py | Running tests: end. 06:43:27 INFO - Buffered messages finished :jimm Can you please take a look?
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 69•6 years ago
|
||
:mccr8, could you take a look at this failure- it seems to be win7/win10 specific, predominately on browser-chrome (oddly we don't see this on devtools which is also browser chrome). Could this be a harness issue that we need to cleanup better or shutdown the browser differently? Maybe we need to wait longer as the majority of the failure cases the browser completes the tests and closes down in <30 seconds.
Comment 70•6 years ago
|
||
I looked through three logs. The processes that are not producing leak logs get created, then create and destroy a few windows and docshells. The windows are for URLs like about:blank and data:text/html,Test. Then there is the "*** End BrowserChrome Test Results ***" message, and a little after that there are messages like this: 12:38:21 INFO - GECKO(2832) | [Child 3504, Chrome_ChildThread] WARNING: pipe error: 109: file z:/build/build/src/ipc/chromium/src/chrome/common/ipc_channel_win.cc, line 346 12:38:21 INFO - GECKO(2832) | [Child 3504, Chrome_ChildThread] WARNING: pipe error: 109: file z:/build/build/src/ipc/chromium/src/chrome/common/ipc_channel_win.cc, line 346 12:38:21 INFO - GECKO(2832) | [Child 3504, Main Thread] WARNING: shutting down early because of crash!: file z:/build/build/src/dom/ipc/ContentChild.cpp, line 2355 I'm not sure why we're getting an abnormal shutdown. Maybe the parent shut down before the child and eventually the pipe breaks and the child shuts down? I'm afraid I don't know too much about child process creation and shutdown. You could see if it reproduces with bug 1348361 backed out. That was my only theory when this started showing up again. Blake, do you know of any other changes to how we create and destroy child processes that has landed around the 11th or 13th of March?
Comment hidden (Intermittent Failures Robot) |
Comment 72•6 years ago
|
||
trying to answer the question if bug 1348361 is the problem, Baseline: https://treeherder.mozilla.org/#/jobs?repo=try&revision=783a703e15002db5e2d80cfb40e67a49ab94ec66&filter-searchStr=browser-chrome backout of the 3 patches: https://treeherder.mozilla.org/#/jobs?repo=try&revision=711e4f9a38d6eb1935faf970a6ca3c4b9468584b&filter-searchStr=browser-chrome still waiting on results...
Comment 73•6 years ago
|
||
baseline had 12 instances backout had 6 instances
Comment hidden (Intermittent Failures Robot) |
Comment 75•6 years ago
|
||
:jmaher, :mccr8, :mrbkap do you guys have any updates for this bug?
Comment 76•6 years ago
|
||
thanks for checking Arthur- I haven't looked into this more and the rate is down <15/week so far this week and last week.
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Updated•6 years ago
|
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 83•6 years ago
|
||
My first instinct was that bug 1423261 could be related here, but the dates for checkins in that bug don't seem to line up with the dates in this bug. I don't really have any more ideas, sadly.
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 88•6 years ago
|
||
Update: In the last 7 days, there have been 31 failures. All of them occur on the debug build type and on the following platforms: - windows10-64 - 15 - windows7-32 - 15 - linux32 - 1 Here is a recent relevant log file and a snippet with the failure: https://treeherder.mozilla.org/logviewer.html#?job_id=185202671&repo=mozilla-inbound&lineNumber=2467 16:50:14 INFO - TEST-PASS | leakcheck | tab process: no leaks detected! 16:50:14 INFO - == BloatView: ALL (cumulative) LEAK AND BLOAT STATISTICS, tab process 5944 16:50:14 INFO - |<----------------Class--------------->|<-----Bytes------>|<----Objects---->| 16:50:14 INFO - | | Per-Inst Leaked| Total Rem| 16:50:14 INFO - 0 |TOTAL | 29 0| 348715 0| 16:50:14 INFO - nsTraceRefcnt::DumpStatistics: 1036 entries 16:50:14 INFO - TEST-PASS | leakcheck | tab process: no leaks detected! 16:50:14 ERROR - 60 ERROR TEST-UNEXPECTED-FAIL | leakcheck | tab process: missing output line for total leaks! 16:50:14 INFO - TEST-INFO | leakcheck | missing output line from log file c:\users\task_1530117000\appdata\local\temp\tmpp9mde8.mozrunner\runtests_leaks_tab_pid7148.log 16:50:14 INFO - runtests.py | Running tests: end. 16:50:14 INFO - Buffered messages finished 16:50:14 INFO - Running manifest: browser\base\content\test\tabcrashed\browser.ini 16:50:15 INFO - Z:\task_1530117000\build\tests\bin\pk12util.exe: PKCS12 IMPORT SUCCESSFUL :jmaher, this started to peak again on June 24th, with 19 failures in the last 3 days. Looking back over the last 30 days this had 73 failures. Can you please have a look?
Comment 89•6 years ago
|
||
It seems that this started to increase with this push: https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=f0f5158db88c8a9712b7e09fccaf10d004343e85&selectedJob=184865093 Intermittent failure view: https://treeherder.mozilla.org/intermittent-failures.html#/bugdetails?startday=2018-06-21&endday=2018-06-28&tree=trunk&bug=1368036
Comment 90•6 years ago
|
||
not sure if that is the root cause, but it does seem around that point. I find that windows7 and windows10 are the same frequency- predominately in the browser-chrome suite, but a mix of devtools and mochitest-plain are there as well. I did a bunch of retriggers to help figure this out with no luck: https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&filter-searchStr=browser-chrome%20win%20debug&tochange=f0f5158db88c8a9712b7e09fccaf10d004343e85&fromchange=721354ad7b1622ad9e681d69591a991b1d45a17d lets see if there is a pattern emerging in the forthcoming week.
Comment 91•6 years ago
|
||
I did many retriggers: https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&filter-searchStr=browser-chrome-e10s-2%20win%20debug&tochange=f0f5158db88c8a9712b7e09fccaf10d004343e85&fromchange=c24b5f6944bb62ba04d67af93702aed2743cecda&selectedJob=185371805 this is deceptive as we don't have a single test, but when this became more frequent, I see that we changed many of the browser_* files (which means tests shifted between chunks). I am done spending time on this bug for now, will revisit with more data.
Comment hidden (Intermittent Failures Robot) |
Comment 93•6 years ago
|
||
Over the last 7 days there are 41 failures on this bug. These happen on windows7-32, linux64, windows10-64 Here is the most recent log example: https://treeherder.mozilla.org/logviewer.html#?job_id=186814999&repo=mozilla-central&lineNumber=3393 11:09:08 INFO - nsTraceRefcnt::DumpStatistics: 817 entries 11:09:08 INFO - TEST-PASS | leakcheck | tab process: no leaks detected! 11:09:08 ERROR - 169 ERROR TEST-UNEXPECTED-FAIL | leakcheck | tab process: missing output line for total leaks! 11:09:08 INFO - TEST-INFO | leakcheck | missing output line from log file c:\users\task_1530874780\appdata\local\temp\tmplmf9cl.mozrunner\runtests_leaks_tab_pid5544.log 11:09:08 INFO - == BloatView: ALL (cumulative) LEAK AND BLOAT STATISTICS, tab process 5948 11:09:08 INFO - |<----------------Class--------------->|<-----Bytes------>|<----Objects---->| 11:09:08 INFO - | | Per-Inst Leaked| Total Rem| 11:09:08 INFO - 0 |TOTAL | 42 0| 35809 0|
Comment hidden (Intermittent Failures Robot) |
Comment 95•6 years ago
|
||
Mike, do you have any idea what is causing the increase of intermittents here? The 5 logs I checked all run different test folders before the hit the issue. Thank you in advance.
Comment 96•6 years ago
|
||
https://treeherder.mozilla.org/#/jobs?repo=try&revision=b260b960f8737d579b74daabc27cad3dcc878600
Comment 97•6 years ago
|
||
(In reply to Sebastian Hengst [:aryx] (needinfo on intermittent or backout) from comment #95) > Mike, do you have any idea what is causing the increase of intermittents > here? No, I don't. The "shutting down early because of crash!" message is probably what needs to be investigated here. That message gets logged when a content process (or plugin process, although I think that's less relevant here) gets ContentChild::ActorDestroy called on it with an AbnormalShutdown. AbnormalShutdown, as I understand it, occurs when there's some kind of foul-up communicating across the IPC protocol. So it might be worth understanding what's passing in the AbnormalShutdown. I've pushed a patch to try to try to get a stack when that occurs. https://treeherder.mozilla.org/#/jobs?repo=try&revision=b260b960f8737d579b74daabc27cad3dcc878600
Comment 98•6 years ago
|
||
https://treeherder.mozilla.org/#/jobs?repo=try&revision=5037c7a5d41a2ecd7831b7e52ed2131e39fceb55
Comment hidden (Intermittent Failures Robot) |
Comment 100•6 years ago
|
||
https://treeherder.mozilla.org/#/jobs?repo=try&revision=b1bf8bd31d3d18a6be2114728e02ba6d0b5fb5cb
Comment 101•6 years ago
|
||
https://treeherder.mozilla.org/#/jobs?repo=try&revision=0b3b1e73c2911f8beeee4e97c9fefcecea4df3a0
Comment hidden (Intermittent Failures Robot) |
Comment 103•6 years ago
|
||
https://treeherder.mozilla.org/#/jobs?repo=try&revision=1a8d17feda25a4a405177de8fba44891176db369
Comment 104•6 years ago
|
||
I did some IPC logging on a try run where the problem reproduced. Push: https://treeherder.mozilla.org/#/jobs?repo=try&revision=1a8d17feda25a4a405177de8fba44891176db369 Full log: https://taskcluster-artifacts.net/VGltnHSaQrCV0-fojgGTag/0/public/logs/live_backing.log Relevant snippet: 16:35:36 INFO - GECKO(6392) | [Child 5852, Main Thread] WARNING: XPCOM objects created/destroyed from static ctor/dtor: file z:/build/build/src/xpcom/base/nsTraceRefcnt.cpp, line 196 16:35:36 INFO - GECKO(6392) | nsStringStats 16:35:36 INFO - GECKO(6392) | => mAllocCount: 7968 16:35:36 INFO - GECKO(6392) | => mReallocCount: 301 16:35:36 INFO - GECKO(6392) | => mFreeCount: 7968 16:35:36 INFO - GECKO(6392) | => mShareCount: 6221 16:35:36 INFO - GECKO(6392) | => mAdoptCount: 902 16:35:36 INFO - GECKO(6392) | => mAdoptFreeCount: 908 16:35:36 INFO - GECKO(6392) | => Process ID: 5852, Thread ID: 7976 16:35:36 INFO - GECKO(6392) | [Child 4860: Chrome_ChildThread]: D/ipc Receive on link thread; seqno=0, xid=-1, shouldWakeUp=0 16:35:36 INFO - GECKO(6392) | [Child 4860: Main Thread]: D/ipc DispatchMessage: seqno=0, xid=-1 16:35:36 INFO - GECKO(6392) | [Child 7528: Chrome_ChildThread]: D/ipc Receive on link thread; seqno=0, xid=-1, shouldWakeUp=0 16:35:36 INFO - GECKO(6392) | [Child 7528: Chrome_ChildThread]: D/ipc Receive on link thread; seqno=0, xid=-1, shouldWakeUp=0 16:35:36 INFO - GECKO(6392) | [Child 7528: Chrome_ChildThread]: D/ipc Receive on link thread; seqno=0, xid=-1, shouldWakeUp=0 16:35:36 INFO - GECKO(6392) | [Child 7528: Main Thread]: D/ipc DispatchMessage: seqno=0, xid=-1 16:35:36 INFO - GECKO(6392) | [Child 7528: Chrome_ChildThread]: D/ipc Receive on link thread; seqno=0, xid=-1, shouldWakeUp=0 16:35:36 INFO - GECKO(6392) | [Child 7528: Main Thread]: D/ipc DispatchMessage: seqno=0, xid=-1 16:35:36 INFO - GECKO(6392) | [Child 7528: Main Thread]: D/ipc DispatchMessage: seqno=0, xid=-1 16:35:36 INFO - GECKO(6392) | [Child 7528: Main Thread]: D/ipc DispatchMessage: seqno=0, xid=-1 16:35:36 INFO - GECKO(6392) | --DOCSHELL 000001C89DF02800 == 4 [pid = 6392] [id = {bca66967-4d55-48a5-8e5e-7299d21028e7}] 16:35:36 INFO - GECKO(6392) | --DOCSHELL 000001C89886F000 == 3 [pid = 6392] [id = {15382fcc-a583-447c-ac57-37532be31594}] 16:35:36 INFO - GECKO(6392) | --DOCSHELL 000001C89CE5F000 == 2 [pid = 6392] [id = {e0111f91-baa5-49e1-952c-97524c7bf9d6}] 16:35:36 INFO - GECKO(6392) | --DOCSHELL 000001C898870800 == 1 [pid = 6392] [id = {5d6bd7a8-febb-46a9-a589-6e669a6f4c11}] 16:35:36 INFO - GECKO(6392) | [Parent 6392: Gecko_IOThread]: D/ipc Receive on link thread; seqno=0, xid=-1, shouldWakeUp=0 16:35:36 INFO - GECKO(6392) | [Parent 6392: Gecko_IOThread]: D/ipc Receive on link thread; seqno=0, xid=-1, shouldWakeUp=0 16:35:36 INFO - GECKO(6392) | [Parent 6392: Gecko_IOThread]: D/ipc Receive on link thread; seqno=0, xid=-1, shouldWakeUp=0 16:35:36 INFO - GECKO(6392) | [Parent 6392: Gecko_IOThread]: D/ipc Receive on link thread; seqno=0, xid=-1, shouldWakeUp=0 16:35:36 INFO - GECKO(6392) | [Parent 6392: Gecko_IOThread]: D/ipc Receive on link thread; seqno=0, xid=-1, shouldWakeUp=0 16:35:36 INFO - GECKO(6392) | [Parent 6392: Gecko_IOThread]: D/ipc Receive on link thread; seqno=0, xid=-1, shouldWakeUp=0 16:35:36 INFO - GECKO(6392) | [Parent 6392: Gecko_IOThread]: D/ipc Receive on link thread; seqno=0, xid=-1, shouldWakeUp=0 16:35:36 INFO - GECKO(6392) | [Parent 6392: Compositor]: D/ipc DispatchMessage: seqno=0, xid=-1 16:35:36 INFO - GECKO(6392) | [Parent 6392: Gecko_IOThread]: D/ipc Receive on link thread; seqno=0, xid=-1, shouldWakeUp=0 16:35:36 INFO - GECKO(6392) | [Parent 6392: Gecko_IOThread]: D/ipc Receive on link thread; seqno=0, xid=-1, shouldWakeUp=0 16:35:36 INFO - GECKO(6392) | [Parent 6392: Gecko_IOThread]: D/ipc Receive on link thread; seqno=0, xid=-1, shouldWakeUp=0 16:35:36 INFO - GECKO(6392) | [Parent 6392: Gecko_IOThread]: D/ipc Receive on link thread; seqno=0, xid=-1, shouldWakeUp=0 16:35:36 INFO - GECKO(6392) | [Child 7164: Chrome_ChildThread]: D/ipc Receive on link thread; seqno=0, xid=-1, shouldWakeUp=0 16:35:36 INFO - GECKO(6392) | [Child 7164: Main Thread]: D/ipc DispatchMessage: seqno=0, xid=-1 16:35:36 INFO - GECKO(6392) | [Child 7164: Chrome_ChildThread]: D/ipc Receive on link thread; seqno=0, xid=-1, shouldWakeUp=0 16:35:36 INFO - GECKO(6392) | [Parent 6392: Compositor]: D/ipc DispatchMessage: seqno=0, xid=-1 16:35:36 INFO - GECKO(6392) | [Child 7164: Main Thread]: D/ipc DispatchMessage: seqno=0, xid=-1 16:35:36 INFO - GECKO(6392) | [Child 7164: Chrome_ChildThread]: D/ipc Receive on link thread; seqno=0, xid=-1, shouldWakeUp=0 16:35:36 INFO - GECKO(6392) | [Child 7164: Chrome_ChildThread]: D/ipc Receive on link thread; seqno=0, xid=-1, shouldWakeUp=0 16:35:36 INFO - GECKO(6392) | [Parent 6392: Gecko_IOThread]: D/ipc Receive on link thread; seqno=0, xid=-1, shouldWakeUp=0 16:35:36 INFO - GECKO(6392) | [Child 7164: Main Thread]: D/ipc DispatchMessage: seqno=0, xid=-1 16:35:36 INFO - GECKO(6392) | [Parent 6392: Compositor]: D/ipc DispatchMessage: seqno=0, xid=-1 16:35:36 INFO - GECKO(6392) | [Parent 6392: Gecko_IOThread]: D/ipc Receive on link thread; seqno=0, xid=-1, shouldWakeUp=0 16:35:36 INFO - GECKO(6392) | [Child 7164: Chrome_ChildThread]: D/ipc Receive on link thread; seqno=0, xid=-1, shouldWakeUp=0 16:35:36 INFO - GECKO(6392) | [Child 7164: Chrome_ChildThread]: D/ipc Receive on link thread; seqno=0, xid=-1, shouldWakeUp=0 16:35:36 INFO - GECKO(6392) | [Parent 6392: Compositor]: D/ipc DispatchMessage: seqno=0, xid=-1 16:35:36 INFO - GECKO(6392) | [Child 7164: Chrome_ChildThread]: D/ipc Receive on link thread; seqno=0, xid=-1, shouldWakeUp=0 16:35:36 INFO - GECKO(6392) | [Child 7164: Chrome_ChildThread]: D/ipc Receive on link thread; seqno=0, xid=-1, shouldWakeUp=0 16:35:36 INFO - GECKO(6392) | [Parent 6392: Compositor]: D/ipc DispatchMessage: seqno=0, xid=-1 16:35:36 INFO - GECKO(6392) | [Child 7164: Chrome_ChildThread]: D/ipc Receive on link thread; seqno=0, xid=-1, shouldWakeUp=0 16:35:36 INFO - GECKO(6392) | [Parent 6392: Compositor]: D/ipc DispatchMessage: seqno=0, xid=-1 16:35:36 INFO - GECKO(6392) | [Child 7164: Chrome_ChildThread]: D/ipc Receive on link thread; seqno=0, xid=-1, shouldWakeUp=0 16:35:36 INFO - GECKO(6392) | [Child 7164: Chrome_ChildThread]: D/ipc Receive on link thread; seqno=0, xid=-1, shouldWakeUp=0 16:35:36 INFO - GECKO(6392) | [Child 7164: Chrome_ChildThread]: D/ipc Receive on link thread; seqno=0, xid=-1, shouldWakeUp=0 16:35:36 INFO - GECKO(6392) | [Parent 6392: Compositor]: D/ipc DispatchMessage: seqno=0, xid=-1 16:35:36 INFO - GECKO(6392) | [Child 7164: Chrome_ChildThread]: D/ipc Receive on link thread; seqno=0, xid=-1, shouldWakeUp=0 16:35:36 INFO - GECKO(6392) | [Child 7164: Chrome_ChildThread]: D/ipc Receive on link thread; seqno=0, xid=-1, shouldWakeUp=0 16:35:36 INFO - GECKO(6392) | [Child 7164: Main Thread]: D/ipc DispatchMessage: seqno=0, xid=-1 16:35:36 INFO - GECKO(6392) | [Parent 6392: Gecko_IOThread]: D/ipc Receive on link thread; seqno=0, xid=-1, shouldWakeUp=0 16:35:36 INFO - GECKO(6392) | [Parent 6392: Compositor]: D/ipc DispatchMessage: seqno=0, xid=-1 16:35:36 INFO - GECKO(6392) | [Child 7164: Chrome_ChildThread]: D/ipc Receive on link thread; seqno=0, xid=-1, shouldWakeUp=0 16:35:36 INFO - GECKO(6392) | [Child 7164: Main Thread]: D/ipc DispatchMessage: seqno=0, xid=-1 16:35:36 INFO - GECKO(6392) | [Parent 6392: Compositor]: D/ipc DispatchMessage: seqno=0, xid=-1 16:35:36 INFO - GECKO(6392) | [Child 7164: Chrome_ChildThread]: D/ipc Receive on link thread; seqno=0, xid=-1, shouldWakeUp=0 16:35:36 INFO - GECKO(6392) | [Child 7164: Main Thread]: D/ipc DispatchMessage: seqno=0, xid=-1 16:35:36 INFO - GECKO(6392) | [Child 7164: Chrome_ChildThread]: D/ipc Receive on link thread; seqno=0, xid=-1, shouldWakeUp=0 16:35:36 INFO - GECKO(6392) | [Parent 6392: Compositor]: D/ipc DispatchMessage: seqno=0, xid=-1 16:35:36 INFO - GECKO(6392) | [Child 7164: Chrome_ChildThread]: D/ipc Receive on link thread; seqno=0, xid=-1, shouldWakeUp=0 16:35:36 INFO - GECKO(6392) | [Child 7164: Main Thread]: D/ipc DispatchMessage: seqno=0, xid=-1 16:35:36 INFO - GECKO(6392) | [Child 7164: Main Thread]: D/ipc DispatchMessage: seqno=0, xid=-1 16:35:36 INFO - GECKO(6392) | [Child 7164: Chrome_ChildThread]: D/ipc Receive on link thread; seqno=0, xid=-1, shouldWakeUp=0 16:35:36 INFO - GECKO(6392) | [Parent 6392, Gecko_IOThread] WARNING: pipe error: 109: file z:/build/build/src/ipc/chromium/src/chrome/common/ipc_channel_win.cc, line 346 16:35:36 INFO - GECKO(6392) | [Parent 6392: Gecko_IOThread]: D/ipc OnChannelErrorFromLink 16:35:36 INFO - GECKO(6392) | [Parent 6392: Compositor]: D/ipc DispatchMessage: seqno=0, xid=-1 16:35:36 INFO - GECKO(6392) | [Child 7164: Chrome_ChildThread]: D/ipc Receive on link thread; seqno=0, xid=-1, shouldWakeUp=0 16:35:36 INFO - GECKO(6392) | [Child 7164: Main Thread]: D/ipc DispatchMessage: seqno=0, xid=-1 16:35:36 INFO - GECKO(6392) | [Child 7164: Chrome_ChildThread]: D/ipc Receive on link thread; seqno=0, xid=-1, shouldWakeUp=0 16:35:36 INFO - GECKO(6392) | [Child 7164: Main Thread]: D/ipc DispatchMessage: seqno=0, xid=-1 16:35:36 INFO - GECKO(6392) | [Parent 6392: Gecko_IOThread]: D/ipc Receive on link thread; seqno=0, xid=-1, shouldWakeUp=0 16:35:36 INFO - GECKO(6392) | [Child 7164: Chrome_ChildThread]: D/ipc Receive on link thread; seqno=0, xid=-1, shouldWakeUp=0 16:35:36 INFO - GECKO(6392) | [Parent 6392: Gecko_IOThread]: D/ipc Receive on link thread; seqno=0, xid=-1, shouldWakeUp=0 16:35:36 INFO - GECKO(6392) | [Child 7164: Main Thread]: D/ipc DispatchMessage: seqno=0, xid=-1 16:35:36 INFO - GECKO(6392) | [Child 7164: Main Thread]: D/ipc DispatchMessage: seqno=0, xid=-1 16:35:36 INFO - GECKO(6392) | [Child 7164: Main Thread]: D/ipc DispatchMessage: seqno=0, xid=-1 16:35:36 INFO - GECKO(6392) | [Child 7164: Main Thread]: D/ipc DispatchMessage: seqno=0, xid=-1 16:35:36 INFO - GECKO(6392) | [Child 7164: Main Thread]: D/ipc DispatchMessage: seqno=0, xid=-1 16:35:36 INFO - GECKO(6392) | [Child 7164: Main Thread]: D/ipc DispatchMessage: seqno=0, xid=-1 16:35:36 INFO - GECKO(6392) | [Child 7164: Main Thread]: D/ipc DispatchMessage: seqno=0, xid=-1 16:35:36 INFO - GECKO(6392) | [Child 7164: Main Thread]: D/ipc DispatchMessage: seqno=0, xid=-1 16:35:36 INFO - GECKO(6392) | [Child 7164: Main Thread]: D/ipc DispatchMessage: seqno=0, xid=-1 16:35:36 INFO - GECKO(6392) | [Child 7164: Main Thread]: D/ipc DispatchMessage: seqno=0, xid=-1 16:35:36 INFO - GECKO(6392) | [Child 7164: Main Thread]: D/ipc DispatchMessage: seqno=0, xid=-1 16:35:36 INFO - GECKO(6392) | [Child 7164: Main Thread]: D/ipc DispatchMessage: seqno=0, xid=-1 16:35:36 INFO - GECKO(6392) | [Parent 6392: Main Thread]: D/ipc DispatchMessage: seqno=0, xid=-1 16:35:36 INFO - GECKO(6392) | [Parent 6392: Main Thread]: D/ipc DispatchMessage: seqno=0, xid=-1 16:35:36 INFO - GECKO(6392) | [Parent 6392: Main Thread]: D/ipc DispatchMessage: seqno=0, xid=-1 16:35:36 INFO - GECKO(6392) | [Parent 6392: Gecko_IOThread]: D/ipc Receive on link thread; seqno=0, xid=-1, shouldWakeUp=0 16:35:36 INFO - GECKO(6392) | [Parent 6392: Main Thread]: D/ipc DispatchMessage: seqno=0, xid=-1 16:35:36 INFO - GECKO(6392) | [Parent 6392: Main Thread]: D/ipc DispatchMessage: seqno=0, xid=-1 16:35:36 INFO - GECKO(6392) | [Parent 6392: Compositor]: D/ipc DispatchMessage: seqno=0, xid=-1 16:35:36 INFO - GECKO(6392) | [Child 4860: Chrome_ChildThread]: D/ipc Receive on link thread; seqno=0, xid=-1, shouldWakeUp=0 16:35:36 INFO - GECKO(6392) | [Child 4732: Chrome_ChildThread]: D/ipc Receive on link thread; seqno=0, xid=-1, shouldWakeUp=0 16:35:36 INFO - GECKO(6392) | [Child 4860: Main Thread]: D/ipc DispatchMessage: seqno=0, xid=-1 16:35:36 INFO - GECKO(6392) | [Child 7528: Chrome_ChildThread]: D/ipc OnChannelErrorFromLink 16:35:36 INFO - GECKO(6392) | [Child 4732: Main Thread]: D/ipc DispatchMessage: seqno=0, xid=-1 16:35:36 INFO - GECKO(6392) | [Child 7164: Chrome_ChildThread]: D/ipc Receive on link thread; seqno=0, xid=-1, shouldWakeUp=0 16:35:36 INFO - GECKO(6392) | --- Mutex : mozilla::mscom::LiveSet::mMutex (currently acquired) 16:35:36 INFO - GECKO(6392) | calling context 16:35:36 INFO - GECKO(6392) | [stack trace unavailable] 16:35:36 INFO - GECKO(6392) | [Child 7528, Main Thread] WARNING: shutting down early because of crash!: file z:/build/build/src/dom/ipc/ContentChild.cpp, line 2365 16:35:36 INFO - GECKO(6392) | [Child 7164: Main Thread]: D/ipc DispatchMessage: seqno=0, xid=-1 16:35:36 INFO - GECKO(6392) | Assertion failure: false (OH SNAP - here we go.), at z:/build/build/src/dom/ipc/ContentChild.cpp:2366 I suspect we want someone who's more familiar with IPC to look over this to see if they can make heads or tails of it. I'm afraid I can't.
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 113•6 years ago
|
||
Update: There have been 93 failures in the last 7 days. Debug is the only build type. Failures per platform and build type: - windows10-64: 55 - windows7-32: 27 - windows10-64-msvc: 5 - windows10-64-ccov: 3 - windows7-32-msvc: 3 Recent log file: https://treeherder.mozilla.org/logviewer.html#?job_id=193466986&repo=mozilla-inbound&lineNumber=27334
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 119•6 years ago
|
||
Over the last 7 days there are 63 failures on this bug. These happen on windows10-64, windows10-64-msvc, windows7-32, windows7-32-msvc Here is the latest log example: https://treeherder.mozilla.org/logviewer.html#?job_id=194806861&repo=autoland&lineNumber=26098 08:46:19 INFO - 08:46:19 INFO - |<----------------Class--------------->|<-----Bytes------>|<----Objects---->| 08:46:19 INFO - | | Per-Inst Leaked| Total Rem| 08:46:19 INFO - 0 |TOTAL | 42 0| 56624 0| 08:46:19 INFO - 08:46:19 INFO - nsTraceRefcnt::DumpStatistics: 835 entries 08:46:19 INFO - 08:46:19 INFO - TEST-PASS | leakcheck | tab process: no leaks detected! 08:46:19 INFO - 08:46:19 ERROR - TEST-UNEXPECTED-FAIL | leakcheck | tab process: missing output line for total leaks! 08:46:19 INFO - TEST-INFO | leakcheck | missing output line from log file c:\users\task_1534753478\appdata\local\temp\tmplbm942.mozrunner\runtests_leaks_tab_pid6032.log 08:46:19 INFO - 08:46:19 INFO - == BloatView: ALL (cumulative) LEAK AND BLOAT STATISTICS, tab process 6084 08:46:19 INFO - 08:46:19 INFO - |<----------------Class--------------->|<-----Bytes------>|<----Objects---->| 08:46:19 INFO - | | Per-Inst Leaked| Total Rem| 08:46:19 INFO - 0 |TOTAL | 36 0| 171873 0| 08:46:19 INFO -
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 125•6 years ago
|
||
Over the last 7 days there are 139 failures on this bug. These happen on windows7-32 and windows10-64. Here is the most recent log example: https://treeherder.mozilla.org/logviewer.html#?job_id=196142528&repo=autoland&lineNumber=32407 18:05:12 INFO - nsTraceRefcnt::DumpStatistics: 907 entries 18:05:12 INFO - 18:05:12 INFO - TEST-PASS | leakcheck | tab process: no leaks detected! 18:05:12 INFO - 18:05:12 ERROR - TEST-UNEXPECTED-FAIL | leakcheck | tab process: missing output line for total leaks! 18:05:12 INFO - TEST-INFO | leakcheck | missing output line from log file c:\users\task_1535391153\appdata\local\temp\tmpugasq5.mozrunner\runtests_leaks_tab_pid6660.log 18:05:12 INFO - 18:05:12 INFO - == BloatView: ALL (cumulative) LEAK AND BLOAT STATISTICS, tab process 6716 18:05:12 INFO - 18:05:12 INFO - |<----------------Class--------------->|<-----Bytes------>|<----Objects---->| 18:05:12 INFO - | | Per-Inst Leaked| Total Rem| 18:05:12 INFO - 0 |TOTAL | 37 0| 324885 0| 18:05:12 INFO -
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 132•6 years ago
|
||
Over the last 7 days there are 157 failures on this bug. These happen on windows7-32 and windows10-64. Log: https://treeherder.mozilla.org/logviewer.html#?job_id=197542296&repo=autoland&lineNumber=17747 INFO - 05:15:24 INFO - TEST-PASS | leakcheck | tab process: no leaks detected! 05:15:24 INFO - 05:15:24 ERROR - TEST-UNEXPECTED-FAIL | leakcheck | tab process: missing output line for total leaks! 05:15:24 INFO - TEST-INFO | leakcheck | missing output line from log file c:\users\task_1536122844\appdata\local\temp\tmptqrz9k.mozrunner\runtests_leaks_tab_pid4892.log 05:15:24 INFO - 05:15:24 INFO - == BloatView: ALL (cumulative) LEAK AND BLOAT STATISTICS, tab process 5908 05:15:24 INFO - 05:15:24 INFO - |<----------------Class--------------->|<-----Bytes------>|<----Objects---->| 05:15:24 INFO - | | Per-Inst Leaked| Total Rem| 05:15:24 INFO - 0 |TOTAL | 28 0| 34239 0| 05:15:24 INFO - 05:15:24 INFO - nsTraceRefcnt::DumpStatistics: 929 entries 05:15:24 INFO - 05:15:24 INFO - TEST-PASS | leakcheck | tab process: no leaks detected! jimm: Can you please take a look at this bug?
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 141•6 years ago
|
||
Update: There have been 165 failures in the last week. All the failures occur on the debug build type. The most affected platforms are windows7-32 and windows10-64. Summary: Intermittent Windows TC leakcheck | tab process: missing output line for total leaks! Recent relevant log file: https://treeherder.mozilla.org/logviewer.html#?job_id=199018973&repo=mozilla-central&lineNumber=33296
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 148•6 years ago
|
||
218 failures for the past week occurring on windows10-64 and windows7-32 debug. Recent log file: https://treeherder.mozilla.org/logviewer.html#?job_id=200623450&repo=autoland&lineNumber=28214
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 165•6 years ago
|
||
There are 216 failures in the last 7 days. They occur on windows10-64 and windows7-32 debug. This has a very high failure rate. As I see in IFV this started to increase from the 7th of September. Recent failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=203954806&repo=autoland&lineNumber=28219 jimm: I see that you are the triage owner, can you please take a look into this or point to someone that can? Thanks!
Comment 166•6 years ago
|
||
Looking through the history of this bug I see that there were many attempts of finding the root cause and fixing it. The last one was from mconley, 3 months ago, Comment 104. jmaher: Can we ni someone else on this bug more familiar with IPC? Thank you!
Comment 167•6 years ago
|
||
:jimm is the right person, I would like to leave this for him, but 3 months of a needinfo seems long
Comment 168•6 years ago
|
||
:selena, do you know if :jimm is on leave or has been on pto a lot in the last 3 months- we have some of our most frequent intermittents as tracked down to IPC related. Maybe there is someone else besides :jimm who could help out?
Updated•6 years ago
|
Updated•6 years ago
|
Updated•6 years ago
|
Comment 170•6 years ago
|
||
From https://taskcluster-artifacts.net/eHydZWVNST6GLWnu8PNTdw/0/public/logs/live_backing.log, where pid 6032 is the one with the missing leak log: 08:44:40 INFO - GECKO(8120) | [Parent 8120, Gecko_IOThread] WARNING: pipe error: 109: file z:/build/build/src/ipc/chromium/src/chrome/common/ipc_channel_win.cc, line 346 08:44:40 INFO - GECKO(8120) | [Child 6032, Chrome_ChildThread] WARNING: pipe error: 109: file z:/build/build/src/ipc/chromium/src/chrome/common/ipc_channel_win.cc, line 346 08:44:40 INFO - GECKO(8120) | [Child 6032, Main Thread] WARNING: shutting down early because of crash!: file z:/build/build/src/dom/ipc/ContentChild.cpp, line 2400 Normally I'd expect to see the broken pipe error from one side when the other one exits early/unexpectedly (see also comment #70) but this seems to be from both sides at once. Slightly later in the log, around 08:44:42-43, there are similar collections of pipe error messages for other child pids, but none of the “shutting down early”; these might be toplevels other than PContent. It's probably also significant that this happens only on Windows, where we use Windows named pipes instead of POSIX local-domain sockets. (I see that there's been an occasional Linux or Mac failure assigned to this bug, but I suspect those are unrelated.)
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 174•6 years ago
|
||
Hi Matt, are there any updates here? Failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=204656914&repo=mozilla-inbound&lineNumber=27802 00:33:40 INFO - GECKO(4852) | MEMORY STAT | vsize 1483MB | vsizeMaxContiguous 130912257MB | residentFast 131MB | heapAllocated 26MB 00:33:40 INFO - TEST-OK | toolkit/components/extensions/test/mochitest/test-oop-extensions/test_ext_storage_manager_capabilities.html | took 2727ms 00:33:40 INFO - TEST-START | Shutdown 00:33:40 INFO - Passed: 15 00:33:40 INFO - Failed: 0 00:33:40 INFO - Todo: 0 00:33:40 INFO - Mode: e10s 00:33:40 INFO - Slowest: 2727ms - /tests/toolkit/components/extensions/test/mochitest/test-oop-extensions/test_ext_storage_manager_capabilities.html 00:33:40 INFO - SimpleTest FINISHED 00:33:40 INFO - TEST-INFO | Ran 1 Loops 00:33:40 INFO - SimpleTest FINISHED 0:33:41 INFO - nsTraceRefcnt::DumpStatistics: 929 entries 00:33:41 INFO - 00:33:41 INFO - TEST-PASS | leakcheck | tab process: no leaks detected! 00:33:41 INFO - 00:33:41 ERROR - TEST-UNEXPECTED-FAIL | leakcheck | tab process: missing output line for total leaks! 00:33:41 INFO - TEST-INFO | leakcheck | missing output line from log file c:\users\task_1539216339\appdata\local\temp\tmpuzcuui.mozrunner\runtests_leaks_tab_pid6116.log 00:33:41 INFO - 00:33:41 INFO - == BloatView: ALL (cumulative) LEAK AND BLOAT STATISTICS, tab process 7684
Comment hidden (Intermittent Failures Robot) |
Updated•6 years ago
|
Updated•6 years ago
|
Updated•6 years ago
|
Comment hidden (Intermittent Failures Robot) |
Comment 177•6 years ago
|
||
(In reply to Jed Davis [:jld] (⏰UTC-6) from comment #170) > Normally I'd expect to see the broken pipe error from one side when the > other one exits early/unexpectedly (see also comment #70) but this seems to > be from both sides at once. It's not. I did a Try run with extra logging, and they're different channels, and also neither one is the main (PContent) channel. The channel where the child gets a pipe error immediately before or sometimes *after* the “shutting down early because of crash” is the process hang monitor; the one where the parent gets a pipe error right before that is the profiler. Also, there sometimes *isn't* a pipe error message about the PContent channel (which is definitely getting some kind of error, because that's how we end up in the “because of crash” case), but in one test run there was. I also tried changing that warning to a MOZ_CRASH, to try to reproduce this on opt builds (the early exit won't cause the leak checking to fail, because if I understand correctly we don't do that kind of leak checking on opt builds, so it makes sense that we never see it there), in the hope that I could get more repros per unit CPU time, but all that got me was a bunch of crashes with no backtrace or file/line info.
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Updated•6 years ago
|
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 184•6 years ago
|
||
There are 98 failures asociated to this bug in the last 7 days. These are occurring on Windows 7 and 10 on the debug builds. :jimm any updates on this?
Comment hidden (Intermittent Failures Robot) |
Comment 186•6 years ago
|
||
(In reply to Andrei Ciure[:andrei_ciure_] from comment #184) > There are 98 failures asociated to this bug in the last 7 days. These are > occurring on Windows 7 and 10 on the debug builds. > > :jimm any updates on this? Not yet, I have someone looking at it.
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 192•6 years ago
|
||
In the last 7 days there have been 111 occurrences Windows 7 and Windows 10 debug.
Comment hidden (Intermittent Failures Robot) |
Comment 194•6 years ago
|
||
Jim let us know when there are any update here, this continues to have a high occurrence rate. Thank you.
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 201•6 years ago
|
||
(In reply to Andreea Pavel [:apavel] from comment #194) > Jim let us know when there are any update here, this continues to have a > high occurrence rate. > Thank you. We haven't made any progress on this. Reproducing seems to be a problem. I currently don't have anyone working on this.
Comment hidden (Intermittent Failures Robot) |
Comment 203•6 years ago
|
||
(In reply to Jim Mathies [:jimm] from comment #201) > (In reply to Andreea Pavel [:apavel] from comment #194) > > Jim let us know when there are any update here, this continues to have a > > high occurrence rate. > > Thank you. > > We haven't made any progress on this. Reproducing seems to be a problem. I > currently don't have anyone working on this. Thank you Jim. I looked over some failure logs, the tests where this fails differ, the only common thing is that this fails on windows debug. (failures on other platforms are missclassifications)
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 212•6 years ago
|
||
This bug has failed 194 times in the last 7 days. It occurs on Windows10-64, Windows7-32 on debug build types. Log: https://treeherder.mozilla.org/logviewer.html#?job_id=210502906&repo=autoland&lineNumber=27629 TEST-PASS | leakcheck | tab process: no leaks detected! 06:16:10 INFO - 06:16:10 INFO - == BloatView: ALL (cumulative) LEAK AND BLOAT STATISTICS, tab process 5936 06:16:10 INFO - 06:16:10 INFO - |<----------------Class--------------->|<-----Bytes------>|<----Objects---->| 06:16:10 INFO - | | Per-Inst Leaked| Total Rem| 06:16:10 INFO - 0 |TOTAL | 40 0| 39217 0| 06:16:10 INFO - 06:16:10 INFO - nsTraceRefcnt::DumpStatistics: 937 entries 06:16:10 INFO - 06:16:10 INFO - TEST-PASS | leakcheck | tab process: no leaks detected! 06:16:10 INFO - 06:16:10 INFO - == BloatView: ALL (cumulative) LEAK AND BLOAT STATISTICS, tab process 6012 06:16:10 INFO - 06:16:10 INFO - |<----------------Class--------------->|<-----Bytes------>|<----Objects---->| 06:16:10 INFO - | | Per-Inst Leaked| Total Rem| 06:16:10 INFO - 0 |TOTAL | 36 0| 51765 0| 06:16:10 INFO - 06:16:10 INFO - nsTraceRefcnt::DumpStatistics: 869 entries 06:16:10 INFO - 06:16:10 INFO - TEST-PASS | leakcheck | tab process: no leaks detected! 06:16:10 INFO - 06:16:10 ERROR - TEST-UNEXPECTED-FAIL | leakcheck | tab process: missing output line for total leaks! 06:16:10 INFO - TEST-INFO | leakcheck | missing output line from log file c:\users\task_1541656647\appdata\local\temp\tmpwg7f7m.mozrunner\runtests_leaks_tab_pid6396.log 06:16:10 INFO - 06:16:10 INFO - == BloatView: ALL (cumulative) LEAK AND BLOAT STATISTICS, tab process 6408 06:16:10 INFO - 06:16:10 INFO - |<----------------Class--------------->|<-----Bytes------>|<----Objects---->| 06:16:10 INFO - | | Per-Inst Leaked| Total Rem| 06:16:10 INFO - 0 |TOTAL | 41 0| 33637 0| 06:16:10 INFO - 06:16:10 INFO - nsTraceRefcnt::DumpStatistics: 912 entries 06:16:10 INFO - 06:16:10 INFO - TEST-PASS | leakcheck | tab process: no leaks detected! 06:16:10 INFO - 06:16:10 INFO - == BloatView: ALL (cumulative) LEAK AND BLOAT STATISTICS, tab process 6724 06:16:10 INFO - 06:16:10 INFO - |<----------------Class--------------->|<-----Bytes------>|<----Objects---->| 06:16:10 INFO - | | Per-Inst Leaked| Total Rem| 06:16:10 INFO - 0 |TOTAL | 41 0| 34442 0| 06:16:10 INFO - 06:16:10 INFO - nsTraceRefcnt::DumpStatistics: 915 entries 06:16:10 INFO - 06:16:10 INFO - TEST-PASS | leakcheck | tab process: no leaks detected! 06:16:10 INFO - 06:16:10 INFO - == BloatView: ALL (cumulative) LEAK AND BLOAT STATISTICS, tab process 6772 jimm: Can you please take a look at this bug?
Comment 213•6 years ago
|
||
I did another Try push with instrumentation: https://treeherder.mozilla.org/#/jobs?repo=try&revision=833186543279cbf1a6ed1349f864ebd73abc1ed4 It doesn't tell me much, but it does show that the error is associated with reading, not writing. And, both with these as well as the pipe errors that don't cause this bug, sometimes we get error 109 as part of the OS event that prompts us to read (or however that part works), but more often that's not returned until the actual read operation; I have no idea if that's meaningful. I have a vague idea: this might be a race between the parent and child closing the channel, where if the parent closes first then the child sees a broken pipe error and assumes the browser crashed, but most of the time the timing works out so that the child closes first and the parent handles the error indication more gracefully. But that doesn't explain why we never see this on Unix, because an end-of-file on read would be reported as an error if I'm reading [1] correctly. Maybe the timing is different, but it's surprising that we *never* see this there (and also seem to get a lot less of the spurious broken pipes). And I don't yet understand how IPC channels are closed in a way that *isn't* considered an error, or if that's even possible. (As always with IPC, there isn't a lot of documentation, and the people who created it are no longer here.) [1] https://searchfox.org/mozilla-central/rev/d850d799a0009f851b5535580e0a8b4bb2c591d7/ipc/chromium/src/chrome/common/ipc_channel_posix.cc#369
Comment hidden (Intermittent Failures Robot) |
Updated•6 years ago
|
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 217•6 years ago
|
||
(In reply to Jed Davis [:jld] ⟨⏰|UTC-7⟩ ⟦he/him⟧ from comment #213) > I did another Try push with instrumentation: > https://treeherder.mozilla.org/#/ > jobs?repo=try&revision=833186543279cbf1a6ed1349f864ebd73abc1ed4 > > It doesn't tell me much, but it does show that the error is associated with > reading, not writing. And, both with these as well as the pipe errors that > don't cause this bug, sometimes we get error 109 as part of the OS event > that prompts us to read (or however that part works), but more often that's > not returned until the actual read operation; I have no idea if that's > meaningful. > > > I have a vague idea: this might be a race between the parent and child > closing the channel, where if the parent closes first then the child sees a > broken pipe error and assumes the browser crashed, but most of the time the > timing works out so that the child closes first and the parent handles the > error indication more gracefully. > > But that doesn't explain why we never see this on Unix, because an > end-of-file on read would be reported as an error if I'm reading [1] > correctly. Maybe the timing is different, but it's surprising that we > *never* see this there (and also seem to get a lot less of the spurious > broken pipes). > > And I don't yet understand how IPC channels are closed in a way that *isn't* > considered an error, or if that's even possible. (As always with IPC, there > isn't a lot of documentation, and the people who created it are no longer > here.) > > [1] > https://searchfox.org/mozilla-central/rev/ > d850d799a0009f851b5535580e0a8b4bb2c591d7/ipc/chromium/src/chrome/common/ > ipc_channel_posix.cc#369 Thanks for looking Jed. This still has a high occurrence rate and since the tests that fail differ, disabling this is not the way to go.
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Updated•6 years ago
|
Comment hidden (Intermittent Failures Robot) |
Comment 226•6 years ago
|
||
(In reply to Jed Davis [:jld] ⟨⏰|UTC-7⟩ ⟦he/him⟧ from comment #213) > And I don't yet understand how IPC channels are closed in a way that *isn't* > considered an error, or if that's even possible. (As always with IPC, there > isn't a lot of documentation, and the people who created it are no longer > here.) We do: see bug 529005. This is GoodbyeMessage and GOODBYE_MESSAGE_TYPE, which I didn't notice at first because they're defined as part of our MessageChannel overlay[1], whereas HELLO_MESSAGE_TYPE is part of the channel code from Chromium[2]. It looks like this is taken into account in the error-handling path[3][4], but there was some uncertainty on the topic of simultaneous Close(). Also, I wonder if there might be OS-specific quirks around sending a message and then immediately closing the pipe before it's read. So, an idea for another Try run with logging: log when goodbye messages are sent and received, and when a channel error is/isn't turned into a channel close, and see if anything unusual happens here. [1] https://searchfox.org/mozilla-central/rev/d35199ef15ffa57d190886e20d5df6471faf0870/ipc/glue/ProtocolUtils.h#61 [2] https://searchfox.org/mozilla-central/rev/d35199ef15ffa57d190886e20d5df6471faf0870/ipc/chromium/src/chrome/common/ipc_channel.h#176 [3] https://searchfox.org/mozilla-central/rev/d35199ef15ffa57d190886e20d5df6471faf0870/ipc/glue/MessageChannel.cpp#2656 [4] https://searchfox.org/mozilla-central/rev/d35199ef15ffa57d190886e20d5df6471faf0870/ipc/glue/MessageChannel.cpp#2672-2673
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 234•6 years ago
|
||
There are 149 failures asociated to this bug in the last 7 days. These are occurring on Windows 7 and 10 debug only. Any updates on this?
Updated•6 years ago
|
Comment hidden (Intermittent Failures Robot) |
Comment 236•6 years ago
|
||
There are 113 total failures in the last 7 days and 682 total failures in the last 30 days, all on windows debug. Jed are you still working on this? do you have updates?
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 242•5 years ago
|
||
I tried to reproduce this and failed, and it looks like it may have been fixed.
The original problem here was spuriously getting missing output line for total leaks
without an actual crash; that error is expected if a process exits abnormally. Also, there was always the shutting down early because of crash!
error (indicating a content process thinking the parent process had crashed), and it was always just one content process, not a large number of processes.
All of the reports since about December 5th (between comment #237 and comment #238), as well as all the failures when I pushed to Try yesterday and today, have either had an actual PROCESS-CRASH
, or something where apparently all the child processes (including RDD and GPU) fail to write leak logs; in no case is there the shutting down early because of crash!
message.
Specifically, looking at the list in orangefactor, the last real failure is from revision 55b77d1533ab6 and the first lookalike is from revision d8a3f89d4bb67f. In between those there are a few IPC changes: bug 1487249, bug 1500944, bug 1509362, and bug 1512085.
Leaving the needinfo because I'd like to look at this some more (and maybe try bisecting to find the responsible commit) before declaring victory.
Comment hidden (Intermittent Failures Robot) |
Updated•5 years ago
|
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 254•5 years ago
|
||
(In reply to Jed Davis [:jld] ⟨⏰|UTC-6⟩ ⟦he/him⟧ from comment #242)
Leaving the needinfo because I'd like to look at this some more (and maybe try bisecting to find the responsible commit) before declaring victory.
I'm not going to have time for this. Resolving the bug so that people will stop assigning unrelated crashes to it.
Comment 255•5 years ago
|
||
Ideally leakcheck or something would be smarter and not flag this failure if there was already some other crash explicitly recognized.
Description
•