Closed
Bug 1322138
Opened 8 years ago
Closed 7 years ago
Intermittent test_crash.py TestCrash.test_crash_chrome_process | AssertionError: "Process crashed" does not match "Process killed because the connection to Marionette server is lost. Check gecko.log for errors (Reason: Connection timed out after 10s)"
Categories
(Testing :: Marionette Client and Harness, defect)
Tracking
(Not tracked)
RESOLVED
DUPLICATE
of bug 1376773
People
(Reporter: intermittent-bug-filer, Unassigned)
References
Details
(Keywords: intermittent-failure, Whiteboard: [stockwell unknown])
Filed by: philringnalda [at] gmail.com
https://treeherder.mozilla.org/logviewer.html#?job_id=5805635&repo=mozilla-central
https://archive.mozilla.org/pub/firefox/tinderbox-builds/mozilla-central-win64-pgo/1480908673/mozilla-central_win8_64_test_pgo-marionette-e10s-bm110-tests1-windows-build16.txt.gz
Comment 1•8 years ago
The test itself sets the socket timeout to 10s because the crashing code should crash Firefox immediately. It looks like the crash took longer than that for this PGO build, so Marionette killed the process because the socket connection was lost.
This really reminds me of the remaining problem on bug 1299216 for Windows 8/10 64-bit machines. Let's wait and see whether more reports like this one come in.
OS: Unspecified → Windows 8
Hardware: Unspecified → x86_64
Comment 2•8 years ago
Closing, as this intermittent has not been seen in the last 45 days.
Updated•8 years ago
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → WORKSFORME
Comment 3•8 years ago
This happened again today:
https://treeherder.mozilla.org/logviewer.html#?job_id=95212819&repo=autoland
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
Comment 11•8 years ago
So lately this is only happening on OS X 10.10 for opt (e10s) builds:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1322138&startday=2017-05-15&endday=2017-06-11&tree=all
Here is the common output from the gecko log:
http://mozilla-releng-blobs.s3.amazonaws.com/blobs/mozilla-inbound/sha512/e34db8e3ab27d214703bdba9dc5f5c2db3817890d36a453bdcd8f43054ea3a171d7a8331e3183a20abd2bb65b23db8ab7e5f949f17da36ddbb9f7201e61dfa7c
1497023126005 Marionette TRACE 173 -> [0,11,"executeScript",{"scriptTimeout":null,"newSandbox":true,"args":[],"filename":"test_crash.py","script":"\n // Copied from crash me simple\n Components.utils.import(\"resource://gre/modules/ctypes.jsm\");\n\n // ctypes checks for NULL pointer derefs, so just go near-NULL.\n var zero = new ctypes.intptr_t(8);\n var badptr = ctypes.cast(zero, ctypes.PointerType(ctypes.int32_t));\n var crash = badptr.contents;\n ","sandbox":null,"line":84}]
[GFX1-]: Receive IPC close with reason=AbnormalShutdown
[Child 1722] WARNING: pipe error: Broken pipe: file /builds/slave/m-in-m64-000000000000000000000/build/src/ipc/chromium/src/chrome/common/ipc_channel_posix.cc, line 709
** Unknown exception behavior: -2147483647
2017-06-09 08:46:35.020 plugin-container[1725:13582] *** CFMessagePort: bootstrap_register(): failed 1100 (0x44c) 'Permission denied', port = 0x973f, name = 'com.apple.tsm.portname'
See /usr/include/servers/bootstrap_defs.h for the error codes.
2017-06-09 08:46:35.026 plugin-container[1725:13582] *** CFMessagePort: bootstrap_register(): failed 1100 (0x44c) 'Permission denied', port = 0x4923, name = 'com.apple.CFPasteboardClient'
See /usr/include/servers/bootstrap_defs.h for the error codes.
2017-06-09 08:46:35.026 plugin-container[1725:13582] Failed to allocate communication port for com.apple.CFPasteboardClient; this is likely due to sandbox restrictions
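For reading TRACE lines like the first one in the log above, a small parser can help. The sketch below makes two assumptions: that the number before the arrow is a connection id, and that the packet is a JSON array of `[type, message id, command, parameters]`. The sample line is abbreviated, with the `script` value shortened to `"..."`, and `parse_trace` is a hypothetical helper, not part of any Marionette tooling.

```python
import json
import re

# Abbreviated copy of the TRACE line from the gecko log; the "script"
# value has been shortened to "..." for readability.
line = (
    '1497023126005 Marionette TRACE 173 -> [0,11,"executeScript",'
    '{"scriptTimeout":null,"newSandbox":true,"args":[],'
    '"filename":"test_crash.py","script":"...","sandbox":null,"line":84}]'
)

TRACE_RE = re.compile(r"^(\d+)\s+Marionette\s+TRACE\s+(\d+)\s+(->|<-)\s+(.*)$")


def parse_trace(text):
    """Split a TRACE line into its timestamp, direction, and decoded packet."""
    match = TRACE_RE.match(text)
    if match is None:
        raise ValueError("not a Marionette TRACE line")
    timestamp, conn_id, direction, payload = match.groups()
    # Assumed packet layout: [type, message id, command, parameters].
    packet_type, msg_id, command, params = json.loads(payload)
    return {
        "timestamp_ms": int(timestamp),
        "conn_id": int(conn_id),  # assumption: this number is a connection id
        "direction": direction,
        "type": packet_type,
        "msg_id": msg_id,
        "command": command,
        "params": params,
    }


parsed = parse_trace(line)
print(parsed["command"], parsed["params"]["filename"], parsed["params"]["line"])
```

Decoded this way, the line shows that the last command sent before the pipe errors was the `executeScript` call from `test_crash.py`, which is the intentional crash trigger.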
I wonder if the GFX-process-related behavior here is causing a crash during shutdown instead of a normal exit.
Milan, do you know someone who could help with that?
Flags: needinfo?(milan)
OS: Windows 8 → Mac OS X
Hardware: x86_64 → All
Comment 12•8 years ago
Actually the crashes started to happen on April 13th:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1322138&startday=2017-04-12&endday=2017-06-11&tree=all
(In reply to Henrik Skupin (:whimboo) from comment #11)
> So lately this is only happening on OS X 10.10 for opt (e10s) builds
> ...
> I wonder if the GFX process related behavior here is causing the crash
> during shutdown instead of normally exiting.
This shouldn't be related to the GFX process; we don't actually create those on OS X.
Nical, CompositorBridgeChild::ActorDestroy() is getting called with AbnormalShutdown as the ActorDestroyReason. Is this a side effect of something bad happening earlier?
Flags: needinfo?(milan) → needinfo?(nical.bugzilla)
Comment 14•8 years ago
AbnormalShutdown usually means the other process crashed (or the connection was lost for whatever other unexpected reason).
Flags: needinfo?(nical.bugzilla)
Comment 15•8 years ago
I wonder if the issue seen here lately could be related to bug 1371207 which is about a crash of the main thread, and which started to happen recently.
Comment 17•8 years ago
(In reply to Nicolas Silva [:nical] from comment #14)
> AbnormalShutdown usually means the other process crashed (or the connection
> was lost for whatever other unexpected reason).
So I assume this assertion is not something critical? I'm asking because I always see it in our content crash unit test for Marionette.
Flags: needinfo?(nical.bugzilla)
Comment 18•8 years ago
It means something went wrong in the other process (which most likely crashed), but it doesn't tell you how critical that is. You can expect to see this whenever a process crashes that has some gfx-related IPC. Gfx stuff is tricky to shut down properly when IPC goes nuts, so it is a good indicator for us when something else fails catastrophically in gfx-land right after, but it doesn't mean the root cause is actually gfx-related.
Flags: needinfo?(nical.bugzilla)
Comment 19•8 years ago
I see. Thank you for this explanation. So I doubt that it is important for us here given that this is a forced crash by the harness for testing purposes.
It actually should no longer occur with my upcoming changes for the unit test on bug 1223277.
Depends on: 1223277
Comment 20•8 years ago
Glad this is understood and there are patches in the works for bug 1223277! This has a lot of failures, but it looks like the real fix will land sooner rather than later, so there is no need to consider backing out.
Whiteboard: [stockwell needswork]
Updated•8 years ago
Whiteboard: [stockwell needswork] → [stockwell unknown]
Comment 28•8 years ago
We missed uplifting the patch on bug 1381403 to beta, so this is only fixed for 56. The last failures reported by OF are expected.
I will leave the bug open until we are clear about the real underlying issue.
Updated•8 years ago
Keywords: leave-open
Comment 35•7 years ago
Only happens on esr-52 and release. Nothing we can do about that.
Status: REOPENED → RESOLVED
Closed: 8 years ago → 7 years ago
Resolution: --- → FIXED
Comment 36•7 years ago
No, this is disabled and should actually be a dupe of bug 1376773.
Resolution: FIXED → DUPLICATE
Comment 38•7 years ago
Removing leave-open keyword from resolved bugs, per :sylvestre.
Keywords: leave-open (removed)
Updated•2 years ago
Product: Testing → Remote Protocol
Comment 39•2 years ago
Moving bug to Testing::Marionette Client and Harness component per bug 1815831.
Component: Marionette → Marionette Client and Harness
Product: Remote Protocol → Testing