Closed Bug 831010 Opened 9 years ago Closed 7 years ago

Marionette start_session fails if called too early after gecko launch


(Testing :: Marionette, defect)

Not set


(Not tracked)



(Reporter: jgriffin, Assigned: jlal)




(3 files)

A recent patch of mine on bug 826111 was backed out because it caused desktop Mn tests to go orange.  The cause was a change of time.sleep(5) to time.sleep(1) in wait_for_port.  The error this produced was:

Apparently, we are depending on something in newSession that isn't available right away at browser launch, and we're not detecting or waiting on this properly.
Summary: Marionette start_session fails if called too early after browser launch → Marionette start_session fails if called too early after gecko launch
This happens in b2g as well. For b2g, we need to figure out when gaia is ready to be used, and only then should we respond to newSession
Assignee: nobody → jlal
So even with the measures we take in the gaia-ui-tests I believe we hit similar problems (or the same problem) though that example seems to go one step further

 - The loadFrameScript call almost always happens before the system app loads (the iframe is not even generated yet) 

 - Some portions of these bugs are "crashes" and don't actually fall into this category of bug I am trying to fix (hangs basically without a crash)

 - Waiting for the system app iframe to become available seems to help (I have no idea why yet)
A little early to tell but another symtom here is the systemapp frame is never created in (some) cases of failure!
So shell.js is never "started" (see the start method) probably a settings db bug or something further up.
Hrm so maybe this is totally unrelated to the title of this bug but the problem I see is actually the settingsdb is not opened yet. Log: you can see  "Try to open database:settings 5" but never actually successfully opening. I also see indications that the settings db is upgrading.
The settingsdb is a red herring... Increasing the timeout does eventually show that shell.js starts but this does not seem to effect marionette actually by increasing the timeout I think I have a better hint at what is going on. It appears that the marionette server is unresponsive (I have literally no idea why) and the listener is never getting created.
Looks like loadFrameScript is misbehaving or something (not clear yet) from this log its clear that we do in fact call loadFrameScript but its never running any of the meaningful bits of loading the scripts (a few not just the listener) with the exception of preload.js
See Also: → 1037924
This is likely the same problem that we are seeing in bug 1037924 and the thing that was preventing us from un-hiding Gij.
My latest try:

To debug this I have been been doing something like this to debug:

-> hardcode a gecko build (in make b2g) to the gecko try build


b2g: node_modules/.bin/mozilla-download
	curl | tar xj

-> Create a pull request in gaia and observe results in for your PR.

-> Analyze failures which have "before suite" related issues ( I usually trace the logs from newSession down)

-> Add more printf/debug/statements repeat
Attached file normal_settings.log
Normal settings db startup log
Attached file abnormal_settings.log
Settings db startup logs inn the case of session startup failures
It's hard to tell if this is a symptom or the problem but clearly startup is broken here. Trace through "language.current" in both logs and you will see in the error case we never communicate back the settings data for language.current (which in turn prevents shell.js from starting). I thought this was a race (it may still be) but I am allowing a 6 minute! timeout here for shell.js to start.
Ah forgot to turn on SettingsRequestManager DEBUG=true (/me feeling more motivated to make these "loggers" friendly to runtime)
Attached file abnormal2.log
Attached log with "Cannot find lock ID" errors... I think this is the bug (or at least a bug) ! When the lock ID cannot be found we essentially just freeze without a response to the lock requester... Now to figure out why we cannot find the lock id...
This is interesting... In the constructor of the "SettingsLock" from SettingsManager.js we send "CreateLock" I clearly see a bunch of these objects being created (more then 3) but only 3 actually get processed by SettingsRequestManager.jsm. So these messages are getting lost somehow...
Depends on: 1089405
Note the updated context is here 1089405 once that bug lands this should go away... going to leave this open until we are sure this gone (give it a week of TBPL).
have not seen this since landing 1089405 marking as fixed.
Closed: 7 years ago
Resolution: --- → FIXED
Blocks: 1091685
You need to log in before you can comment on or make changes to this bug.