Closed
Bug 1267106
Opened 9 years ago
Closed 9 years ago
Intermittent Windows/Mac e10s startup hang leading to REFTEST PROCESS-CRASH | reftest | application crashed [@ CrashingThread(void *)], | application timed out after 330 seconds with no output
Categories
(Core :: IPC, defect, P3)
Tracking
()
RESOLVED
FIXED
mozilla52
People
(Reporter: RyanVM, Assigned: jimm)
References
(Blocks 1 open bug)
Details
(Keywords: crash, intermittent-failure, Whiteboard: [e10s-orangeblockers])
Attachments
(2 files, 1 obsolete file)
76.66 KB,
image/png
|
Details | |
1.05 KB,
patch
|
jmaher
:
review+
|
Details | Diff | Splinter Review |
I'd noticed this on Ash but haven't had time to narrow it down. Then I noticed today that it was affecting Aurora as well. After a giant pile of retriggering, I can say with a high level of certainty that it's caused by bug 1235633. It also fits the regression range on Ash (and I assume it would on other trunk branches as well, but I'll leave that to the full-time sheriffs to confirm).
Screenshot always shows a blank window. Affects Win7/Win8 reftests & crashtests with e10s enabled.
https://treeherder.mozilla.org/logviewer.html#?job_id=2460175&repo=mozilla-aurora
Flags: needinfo?(wmccloskey)
Reporter | ||
Comment 1•9 years ago
|
||
(In reply to Ryan VanderMeulen [:RyanVM] from comment #0)
> but I'll leave that to the full-time sheriffs to confirm).
Going back through the history, I see failures matching this starred as "intermittent" without a bug being filed going back for about two weeks, so there's that...
![]() |
Assignee | |
Updated•9 years ago
|
Blocks: e10s-tests
Can you provide more information about the priority here Ryan? How often does it happen? Does it happen on inbound/central? I don't understand why it wasn't caught when my patch originally landed.
Flags: needinfo?(wmccloskey) → needinfo?(ryanvm)
Reporter | ||
Comment 3•9 years ago
|
||
It's intermittent. Yes, happens on production as well:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1267106&endday=2016-04-28&startday=2016-04-01&tree=trunk
Mostly PGO-only, for whatever that's worth. My main concern is whether this'll manifest in a user-facing way or not.
Flags: needinfo?(ryanvm)
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Reporter | ||
Comment 13•9 years ago
|
||
09:30:07 INFO - JavaScript error: , line 0: uncaught exception: undefined
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Reporter | ||
Comment 25•9 years ago
|
||
This is extremely frequent across the board on Windows these days. Would be great if we could get someone to look into it. Affects all Windows e10s suites that use the reftest harness.
status-firefox49:
--- → affected
status-firefox50:
--- → affected
status-firefox51:
--- → affected
Flags: needinfo?(jmathies)
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
![]() |
Assignee | |
Comment 30•9 years ago
|
||
Tracy, can you look into this? ping me if you have questions.
Flags: needinfo?(jmathies) → needinfo?(twalker)
Comment 33•9 years ago
|
||
I'm willing to believe that the OS X failures which started a few days ago are somehow different, but until someone explains how by filing a separate bug, I'm going to just dump them in here.
Summary: Intermittent Windows e10s startup hang leading to REFTEST PROCESS-CRASH | reftest | application crashed [@ CrashingThread(void *)] → Intermittent Windows/Mac e10s startup hang leading to REFTEST PROCESS-CRASH | reftest | application crashed [@ CrashingThread(void *)], | application timed out after 330 seconds with no output
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
![]() |
Assignee | |
Updated•9 years ago
|
Flags: needinfo?(jmathies)
Comment hidden (Intermittent Failures Robot) |
Comment 39•9 years ago
|
||
I looked at this and logs say that we are failing to launch the child process completely: which triggers some assertions, but we don't have stacks/reasons for failing to launch.
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 46•9 years ago
|
||
I dragged a handy mozilla-central push off into a corner and retriggered the Win7 opt e10s reftest-noaccel 25 times, and got this 13 times on it, so that suite is unacceptable and needs to be hidden. Not sure how many other flavors also need to be hidden.
Comment 47•9 years ago
|
||
Only 7 of 25 on PGO, but then, 28% is also completely unacceptable.
Comment hidden (Intermittent Failures Robot) |
Comment 49•9 years ago
|
||
Hiding it is temporarily blocked by bug 1301260.
Fun story: the Mac failures completely stopped September 1st, and since that time there have been 4 failures on regular accelerated reftest, and 245 on reftest-noaccel.
![]() |
Assignee | |
Updated•9 years ago
|
Flags: needinfo?(jmathies)
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 52•9 years ago
|
||
Win7 opt/PGO e10s Ru hidden on all trunk trees (and try, though it won't actually be running there until next week), click the "Excluded Jobs" link in the gray bar at the top of treeherder to see it.
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 55•9 years ago
|
||
Interesting. I was about to update the exclusion to hide it on aurora, since we merged whatever is busted there this morning, but 10 retriggers there gave me 10 green runs.
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
![]() |
Assignee | |
Comment 61•9 years ago
|
||
An update here - on Windows the child process *is* launching, but for some reason it fails to load up content and operate correctly. Will continue digging.
07:24:46 INFO - REFTEST INFO | Checking for orphan ssltunnel processes...
07:24:46 INFO - REFTEST INFO | Checking for orphan xpcshell processes...
07:24:47 INFO - REFTEST INFO | Running with e10s: True
07:24:47 INFO - REFTEST INFO | Application command: C:\slave\test\build\application\firefox\firefox.exe -marionette -profile c:\users\cltbld\appdata\local\temp\tmpwydbjz.mozrunner
07:24:51 INFO - 1475763891578 Marionette INFO Listening on port 2828
07:24:52 INFO - GeckoChildProcessHost::LaunchAndWaitForProcessHandle()
07:24:52 INFO - GeckoChildProcessHost::PrepareLaunch()
07:24:52 INFO - GeckoChildProcessHost::RunPerformAsyncLaunch()
07:24:52 INFO - GeckoChildProcessHost::PerformAsyncLaunch()
07:24:52 INFO - GeckoChildProcessHost::PerformAsyncLaunchInternal()
07:24:52 INFO - PerformAsyncLaunchInternal: GeckoProcessType_Content
07:24:52 INFO - call mSandboxBroker.LaunchApp...
07:24:52 INFO - ==> process 2160 launched child process 3124 ("C:\slave\test\build\application\firefox\firefox.exe" -contentproc --channel="2160.0.121296096\921340549" -greomni "C:\slave\test\build\application\firefox\omni.ja" -appomni "C:\slave\test\build\application\firefox\browser\omni.ja" -appdir "C:\slave\test\build\application\firefox\browser" 2160 "\\.\pipe\gecko-crash-server-pipe.2160" tab)
(timeout)
Flags: needinfo?(jmathies)
![]() |
Assignee | |
Comment 62•9 years ago
|
||
This is caused by a mysteriously missing 'load' event reftest-content.js registers for when it loads [1]. I couldn't find any existing bugs on frame scripts failing to fire 'load', so maybe this is something new. I need to dig a bit further.
try log [2]:
05:48:05 INFO - REFTEST INFO | Checking for orphan ssltunnel processes...
05:48:05 INFO - REFTEST INFO | Checking for orphan xpcshell processes...
05:48:06 INFO - REFTEST INFO | Running with e10s: True
05:48:06 INFO - REFTEST INFO | Application command: C:\slave\test\build\application\firefox\firefox.exe -marionette -profile c:\users\cltbld\appdata\local\temp\tmpffuyrr.mozrunner
05:48:06 INFO - **** firefox.exe child process launched
05:48:10 INFO - 1475930890470 Marionette INFO Listening on port 2828
05:48:11 INFO - GeckoChildProcessHost::LaunchAndWaitForProcessHandle()
05:48:11 INFO - GeckoChildProcessHost::PrepareLaunch()
05:48:11 INFO - GeckoChildProcessHost::RunPerformAsyncLaunch()
05:48:11 INFO - GeckoChildProcessHost::PerformAsyncLaunch()
05:48:11 INFO - GeckoChildProcessHost::PerformAsyncLaunchInternal()
05:48:11 INFO - PerformAsyncLaunchInternal: GeckoProcessType_Content
05:48:11 INFO - call mSandboxBroker.LaunchApp...
05:48:11 INFO - ==> process 3420 launched child process 1840 ("C:\slave\test\build\application\firefox\firefox.exe" -contentproc --channel="3420.0.1033849503\316985519" -greomni "C:\slave\test\build\application\firefox\omni.ja" -appomni "C:\slave\test\build\application\firefox\browser\omni.ja" -appdir "C:\slave\test\build\application\firefox\browser" 3420 "\\.\pipe\gecko-crash-server-pipe.3420" tab)
05:48:11 INFO - TabParent::SendLoadRemoteScript url='chrome://global/content/browser-content.js'
05:48:11 INFO - TabParent::SendLoadRemoteScript url='resource://gre/modules/addons/Content.js'
05:48:11 INFO - TabParent::SendLoadRemoteScript url='chrome://satchel/content/formSubmitListener.js'
05:48:11 INFO - **** firefox.exe child process launched
05:48:11 INFO - TabParent::SendLoadRemoteScript url='chrome://gfxsanity/content/gfxFrameScript.js'
05:48:11 INFO - dropping into loop->Run()
05:48:11 INFO - TabChild::RecvLoadRemoteScript url='chrome://global/content/browser-content.js'
05:48:11 INFO - TabChild::RecvLoadRemoteScript url='resource://gre/modules/addons/Content.js'
05:48:11 INFO - TabChild::RecvLoadRemoteScript url='chrome://satchel/content/formSubmitListener.js'
05:48:11 INFO - TabChild::RecvLoadRemoteScript url='chrome://gfxsanity/content/gfxFrameScript.js'
05:48:12 INFO - TabParent::SendLoadRemoteScript url='chrome://global/content/browser-content.js'
05:48:12 INFO - TabParent::SendLoadRemoteScript url='resource://gre/modules/addons/Content.js'
05:48:12 INFO - TabParent::SendLoadRemoteScript url='chrome://satchel/content/formSubmitListener.js'
05:48:12 INFO - TabParent::SendLoadRemoteScript url='chrome://global/content/browser-child.js'
05:48:12 INFO - TabParent::SendLoadRemoteScript url='chrome://global/content/select-child.js'
05:48:12 INFO - TabParent::SendLoadRemoteScript url='chrome://browser/content/tab-content.js'
05:48:12 INFO - TabParent::SendLoadRemoteScript url='chrome://browser/content/content.js'
05:48:12 INFO - TabParent::SendLoadRemoteScript url='chrome://browser/content/content-UITour.js'
05:48:12 INFO - TabParent::SendLoadRemoteScript url='chrome://global/content/manifestMessages.js'
05:48:12 INFO - TabParent::SendLoadRemoteScript url='chrome://browser/content/content-sessionStore.js'
05:48:12 INFO - TabParent::SendLoadRemoteScript url='chrome://marionette/content/listener.js'
05:48:12 INFO - TabChild::RecvLoadRemoteScript url='chrome://global/content/browser-content.js'
05:48:12 INFO - TabChild::RecvLoadRemoteScript url='resource://gre/modules/addons/Content.js'
05:48:12 INFO - TabChild::RecvLoadRemoteScript url='chrome://satchel/content/formSubmitListener.js'
05:48:12 INFO - TabChild::RecvLoadRemoteScript url='chrome://global/content/browser-child.js'
05:48:12 INFO - TabChild::RecvLoadRemoteScript url='chrome://global/content/select-child.js'
05:48:12 INFO - TabChild::RecvLoadRemoteScript url='chrome://browser/content/tab-content.js'
05:48:12 INFO - TabChild::RecvLoadRemoteScript url='chrome://browser/content/content.js'
05:48:12 INFO - TabChild::RecvLoadRemoteScript url='chrome://browser/content/content-UITour.js'
05:48:12 INFO - TabChild::RecvLoadRemoteScript url='chrome://global/content/manifestMessages.js'
05:48:12 INFO - TabChild::RecvLoadRemoteScript url='chrome://browser/content/content-sessionStore.js'
05:48:12 INFO - TabChild::RecvLoadRemoteScript url='chrome://marionette/content/listener.js'
05:48:14 INFO - REFTEST INFO | function OnRefTestLoad(win)
05:48:14 INFO - TabParent::SendLoadRemoteScript url='chrome://global/content/browser-content.js'
05:48:14 INFO - TabParent::SendLoadRemoteScript url='resource://gre/modules/addons/Content.js'
05:48:14 INFO - TabParent::SendLoadRemoteScript url='chrome://satchel/content/formSubmitListener.js'
05:48:14 INFO - TabChild::RecvLoadRemoteScript url='chrome://global/content/browser-content.js'
05:48:14 INFO - TabChild::RecvLoadRemoteScript url='resource://gre/modules/addons/Content.js'
05:48:14 INFO - TabChild::RecvLoadRemoteScript url='chrome://satchel/content/formSubmitListener.js'
05:48:14 INFO - REFTEST INFO | function RegisterMessageListenersAndLoadContentScript()
05:48:14 INFO - REFTEST INFO | Loading reftest-content.js frame script...
05:48:14 INFO - TabParent::SendLoadRemoteScript url='chrome://reftest/content/reftest-content.js'
05:48:14 INFO - TabChild::RecvLoadRemoteScript url='chrome://reftest/content/reftest-content.js'
05:48:14 INFO - REFTEST INFO | [CONTENT] reftest-content.js: i'm loaded!
05:53:44 ERROR - REFTEST ERROR | reftest | application timed out after 330 seconds with no output
[1] http://searchfox.org/mozilla-central/rev/c635b8c61d648bb8a0317c19f8905b3be8132a8a/layout/tools/reftest/reftest-content.js#1157
[2] https://treeherder.mozilla.org/#/jobs?repo=try&revision=29f88c9234fbbeee8a19d1d06489b6b205e29d53
Comment hidden (Intermittent Failures Robot) |
![]() |
Assignee | |
Comment 64•9 years ago
|
||
Assignee: nobody → jmathies
![]() |
Assignee | |
Comment 65•9 years ago
|
||
This isn't a browser bug, it's a harness bug. The load of reftest-content.js frame script sometimes happens after the load event for content fires so reftest-content.js never gets started and we hang on test startup. The fix is to check to see if the load event has fired, and if so, go ahead and get the test suite running. Otherwise, keep the old load event as a starting trigger.
https://treeherder.mozilla.org/#/jobs?repo=try&revision=e36bf38727ce3bdadb91ae29525c2f5fbd0f5b5e
Attachment #8799781 -
Attachment is obsolete: true
Attachment #8799889 -
Flags: review?(jmaher)
Comment 66•9 years ago
|
||
Comment on attachment 8799889 [details] [diff] [review]
fix
Review of attachment 8799889 [details] [diff] [review]:
-----------------------------------------------------------------
good find!
Attachment #8799889 -
Flags: review?(jmaher) → review+
![]() |
Assignee | |
Comment 67•9 years ago
|
||
![]() |
Assignee | |
Updated•9 years ago
|
Keywords: checkin-needed
Comment 68•9 years ago
|
||
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/bb296adbce78
Reftest harness should check readyState and continue if the document state is 'complete'. r=jmaher
Keywords: checkin-needed
![]() |
Assignee | |
Comment 69•9 years ago
|
||
(In reply to Phil Ringnalda (:philor) from comment #52)
> Win7 opt/PGO e10s Ru hidden on all trunk trees (and try, though it won't
> actually be running there until next week), click the "Excluded Jobs" link
> in the gray bar at the top of treeherder to see it.
Should I file a bug on undoing hiding of tests now that we (hopefully) have this fixed?
Flags: needinfo?(philringnalda)
Comment 70•9 years ago
|
||
Nope, that would be bug 1301905, you just have to wait until it has at least made it to mozilla-central and thus around to other branches, if not given it a day or so to bake after that, and then poke at me there.
Flags: needinfo?(philringnalda)
Comment 71•9 years ago
|
||
bugherder |
Status: NEW → RESOLVED
Closed: 9 years ago
status-firefox52:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla52
Reporter | ||
Comment 72•9 years ago
|
||
Thanks for tracking this down, Jim!
Whiteboard: [e10s-orangeblockers][checkin-needed-aurora][checkin-needed-beta]
Comment 73•9 years ago
|
||
bugherder uplift |
Whiteboard: [e10s-orangeblockers][checkin-needed-aurora][checkin-needed-beta] → [e10s-orangeblockers][checkin-needed-beta]
Comment 74•9 years ago
|
||
Whiteboard: [e10s-orangeblockers][checkin-needed-beta] → [e10s-orangeblockers]
Updated•9 years ago
|
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
You need to log in
before you can comment on or make changes to this bug.
Description
•