Closed
Bug 991262
Opened 11 years ago
Closed 10 years ago
Intermittent issue with blank homescreen during gaia-ui test
Categories
(Core Graveyard :: DOM: Apps, defect)
Tracking
(Not tracked)
RESOLVED
WORKSFORME
People
(Reporter: bsilverberg, Unassigned)
References
Details
Attachments
(7 files)
This issue appears when running test_homescreen_delete_app.py multiple times. At some point (often the 2nd run), when the test starts the homescreen on the device remains blank, just showing the background. The test waits for quite some time and then fails with the following message(s):
TEST-START test_homescreen_delete_app.py
test_delete_app (test_homescreen_delete_app.TestDeleteApp) ...
ERROR
======================================================================
ERROR: None
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/bsilverberg/.virtualenvs/gaia/lib/python2.7/site-packages/marionette_client-0.7.5-py2.7.egg/marionette/marionette_test.py", line 145, in run
self.setUp()
File "/Users/bsilverberg/gitRepos/gaia/tests/python/gaia-ui-tests/gaiatest/tests/functional/homescreen/test_homescreen_delete_app.py", line 20, in setUp
self.apps.switch_to_displayed_app()
File "/Users/bsilverberg/gitRepos/gaia/tests/python/gaia-ui-tests/gaiatest/gaia_test.py", line 74, in switch_to_displayed_app
self.marionette.switch_to_frame(self.displayed_app.frame)
File "/Users/bsilverberg/.virtualenvs/gaia/lib/python2.7/site-packages/marionette_client-0.7.5-py2.7.egg/marionette/marionette.py", line 895, in switch_to_frame
response = self._send_message('switchToFrame', 'ok', element=frame.id, focus=focus)
File "/Users/bsilverberg/.virtualenvs/gaia/lib/python2.7/site-packages/marionette_client-0.7.5-py2.7.egg/marionette/marionette.py", line 604, in _send_message
"Connection timed out", status=ErrorCodes.TIMEOUT)
TimeoutException: TimeoutException: Connection timed out
Most recent errors/exceptions are:
04-02 15:41:37.180 I/Gecko ( 1267): 1396467697194 Marionette INFO sendToClient: {"from":"0","error":{"message":"Unable to locate element: div.appWindow.active","status":7,"stacktrace":null}}, {3b555fd2-7ddf-492d-b7b9-a7555065cec0}, {3b555fd2-7ddf-492d-b7b9-a7555065cec0}
04-02 15:41:37.340 I/Gecko ( 1267): 1396467697356 Marionette INFO sendToClient: {"from":"0","error":{"message":"Unable to locate element: div.appWindow.active","status":7,"stacktrace":null}}, {1ff09116-47c9-4199-b41a-9627a6a111cd}, {1ff09116-47c9-4199-b41a-9627a6a111cd}
04-02 15:41:37.620 I/Gecko ( 1267): 1396467697630 Marionette INFO sendToClient: {"from":"0","error":{"message":"Unable to locate element: div.appWindow.active","status":7,"stacktrace":null}}, {8c505fe1-5e63-46cb-8f35-3b083190d405}, {8c505fe1-5e63-46cb-8f35-3b083190d405}
04-02 15:41:38.580 I/Gecko ( 1267): A coding exception was thrown and uncaught in a Task.
04-02 15:41:38.580 I/Gecko ( 1267): Full message: TypeError: aIdsApp is undefined
TEST-UNEXPECTED-FAIL | test_homescreen_delete_app.py test_homescreen_delete_app.TestDeleteApp.test_delete_app | The B2G process has restarted after crashing during the tests so Marionette can't respond due to either a Gecko, Gaia or Marionette error. Above, the 5 most recent errors are listed. Check logcat for all errors if these errors are not the cause of the failure.
----------------------------------------------------------------------
Ran 1 test in 394.265s
FAILED (errors=1)
After this the homescreen is unresponsive to taps and flicks, but if you tap the home button a few times the homescreen appears, but with overly large icons.
See Stephen's video of this at https://www.youtube.com/watch?v=124hCO_EOpY&feature=youtu.be
I also reproduced this locally and am attaching the logcat from the run on my device which is a Hamachi, with the following build:
Gaia 6274b3645b3d41df4aa5a55398b5d106edf1c528
Gecko https://hg.mozilla.org/mozilla-central/rev/4941a2ac0786
BuildID 20140402040201
Version 31.0a1
ro.build.version.incremental=eng.archermind.20131114.105818
ro.build.date=Thu Nov 14 10:58:33 CST 2013
Comment 1•11 years ago
|
||
Jonathan, would you mind having someone from your team investigate this from the (potential) Marionette side, and/or help us figure out where to go next? Thx :-)
Flags: needinfo?(jgriffin)
Updated•11 years ago
|
Severity: normal → major
OS: Mac OS X → Gonk (Firefox OS)
Hardware: x86 → ARM
Comment 2•11 years ago
|
||
this was obtained by running MOZ_IGNORE_NUWA_PROCESS=1 ./get_about_memory.py right before the bug occurred.
Comment 3•11 years ago
|
||
There's an error being triggered here:
04-02 15:41:38.580 I/Gecko ( 1267): *************************
04-02 15:41:38.580 I/Gecko ( 1267): A coding exception was thrown and uncaught in a Task.
04-02 15:41:38.580 I/Gecko ( 1267):
04-02 15:41:38.580 I/Gecko ( 1267): Full message: TypeError: aIdsApp is undefined
04-02 15:41:38.580 I/Gecko ( 1267): Full stack: this.OperatorAppsRegistry._installOperatorApps/<@resource://gre/modules/OperatorApps.jsm:317:23
04-02 15:41:38.580 I/Gecko ( 1267): TaskImpl_run@resource://gre/modules/Task.jsm:298:1
04-02 15:41:38.580 I/Gecko ( 1267): resolve@resource://gre/modules/commonjs/sdk/core/promise.js:118:11
04-02 15:41:38.580 I/Gecko ( 1267): then@resource://gre/modules/commonjs/sdk/core/promise.js:43:43
04-02 15:41:38.580 I/Gecko ( 1267): resolve@resource://gre/modules/commonjs/sdk/core/promise.js:185:11
04-02 15:41:38.580 I/Gecko ( 1267): TaskImpl_run@resource://gre/modules/Task.jsm:307:9
04-02 15:41:38.580 I/Gecko ( 1267): Handler.prototype.process@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:748:11
04-02 15:41:38.580 I/Gecko ( 1267): this.PromiseWalker.walkerLoop@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:627:7
04-02 15:41:38.580 I/Gecko ( 1267):
04-02 15:41:38.580 I/Gecko ( 1267): *************************
Moving this bug over to gecko.
Component: Gaia::UI Tests → DOM: Apps
Flags: needinfo?(jgriffin)
Product: Firefox OS → Core
Comment 4•11 years ago
|
||
Sounds like a dupe of bug 991246.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → DUPLICATE
Comment 5•11 years ago
|
||
(In reply to Myk Melez [:myk] [@mykmelez] from comment #4)
> Sounds like a dupe of bug 991246.
>
> *** This bug has been marked as a duplicate of bug 991246 ***
Although our failure has gwagner's filed bug in it, as well, he says it "shouldn't be a problem...always shows up during startup, but works."
I'm cautiously reopening so we can see, post-bug 991246 landing, if it still happens.
Status: RESOLVED → REOPENED
Resolution: DUPLICATE → ---
Comment 6•11 years ago
|
||
Bob/team, does this still happen? I note there's also bug 992589 still out.
Comment 7•11 years ago
|
||
This does not reproduces for me. Tested on the latest Hamachi build:
Gaia 650e8c2c611ed07495d3bf3769f44a0efd88a492
Gecko https://hg.mozilla.org/mozilla-central/rev/5811efc11011
BuildID 20140408160203
Version 31.0a1
Reporter | ||
Comment 8•11 years ago
|
||
I just reproduced it on the second run of test_homescreen_delete_app with today's build:
Gaia 650e8c2c611ed07495d3bf3769f44a0efd88a492
Gecko https://hg.mozilla.org/mozilla-central/rev/5811efc11011
BuildID 20140408160203
Version 31.0a1
ro.build.version.incremental=eng.archermind.20131114.105818
ro.build.date=Thu Nov 14 10:58:33 CST 2013
Comment 9•11 years ago
|
||
I saw it too on the same build from Comment 8. TestCleanupGaia.test_cleanup_gaia fails in b2g.hamachi.mozilla-central.unittests fails because of this I believe.
Comment 10•10 years ago
|
||
This can also be reproduced when test_killall.py in the unittest folder is executed alone. The test may complete successfully, but afterwards, the phone shows no icons.
Phone contained this version:
Gaia 553da99ab09b6b894d9f95bb06b16b6e1ddbf0a1
Gecko https://hg.mozilla.org/mozilla-central/rev/5b6e82e7bbbf
BuildID 20140415040201
Version 31.0a1
ro.build.version.incremental=eng.tclxa.20131223.163538
ro.build.date=Mon Dec 23 16:36:04 CST 2013
Comment 11•10 years ago
|
||
Also, test_cleanup_gaia (unit test) fails (bug 997552) because of this bug. When the script is executed , the homescreen hangs upon restart.
Comment 15•10 years ago
|
||
Can you get a dmesg at the same point in the process that the b2g-info from comment 12 was taken?
This is from a single, recent run of test_cleanup_gaia.py, directly after a branch-new flash.
Updated•10 years ago
|
Flags: needinfo?(bkelly)
This is the corresponding adb shell dmesg output of that recent, single run of test_cleanup_gaia.py, after a fresh flash of today's Hamachi m-c/master build.
Comment 18•10 years ago
|
||
I noticed that the end of the first test_homescreen_delete_app.py leaves the homescreen in the "move-or-kill app" mode. The next test restart then fails to launch homescreen.
If you manually enter the "move-or-kill app" mode, then similarly the next test restart fails to launch homescreen.
I also say bug 998438 in "move-or-kill app" mode, but not sure if its related.
Comment 19•10 years ago
|
||
Ok, I can work around the problem by adding a time.sleep(5) here:
https://github.com/mozilla-b2g/gaia/blob/master/tests/python/gaia-ui-tests/gaiatest/gaia_test.py#L722
It seems that perhaps the code to wait for the system to be ready is no longer adequate. This could be related to the major window manager refactor in bug 907013.
Alive, what do you think of the code at the above link? Is it still correct for trying to block tests until the system is ready?
Flags: needinfo?(bkelly) → needinfo?(alive)
See Also: → app-window-manager
Comment 20•10 years ago
|
||
The 'appWindow.active' is the first opened app which should be FTU or homescreen.
I need to know what does 'ready' mean for test so I can't tell more here.
What does test really do after it thinks device is 'ready'?
Flags: needinfo?(alive)
Comment 21•10 years ago
|
||
(In reply to Alive Kuo [:alive][NEEDINFO!][God bless Taiwan.] from comment #20)
> The 'appWindow.active' is the first opened app which should be FTU or
> homescreen.
> I need to know what does 'ready' mean for test so I can't tell more here.
> What does test really do after it thinks device is 'ready'?
9/10 the first thing a test does is launch an app so we need to wait until we can use the AppWindowManager safely to launch an app. but also it's good to wait for the RIL and other processes to have started up. This is much more complex than the js tests are dealing with because the device is slower and more hardware and diverse test cases. desktopb2g doesn't have these dependencies so it's a lot harder to deal with on the device.
I don't think bug 907013 is to blame because I only put the appWindow.active wait into gaiatest (linked to in comment#19) after that was merged by Alive.
This wait ( appWindow.active == displayed) used to only resolve when the FTU or homescreen was showing but now I see it resolving even when the Firefox OS animation is still present. The timing has definitely changed here recently, it could be several things.
I don't mind changing what we wait for here provided it is reliable. but we can't really wait for a Javascript event because there are too many spurious javascript errors in the same timeframe that Marionette catches instead. Which is why we are using appWindow.active, it's at the DOM level and not related to javascript errors.
Comment 22•10 years ago
|
||
Have we tried the patch in comment 19 yet? I'm also wondering if we can get around this by changing the selector a bit? Are we positive that we are getting the right appWindow? We could also try something like:
locator = (By.CSS_SELECTOR, 'div.appWindow.active.homescreen iframe')
(In reply to Kevin Grandon :kgrandon from comment #22)
> Have we tried the patch in comment 19 yet? I'm also wondering if we can get
> around this by changing the selector a bit? Are we positive that we are
> getting the right appWindow? We could also try something like:
>
> locator = (By.CSS_SELECTOR, 'div.appWindow.active.homescreen iframe')
:mdas and/or :njpark, can we get help with trying the patch, first, and then -- if needed -- the other suggestion(s) here? Thanks!
Flags: needinfo?(npark)
Comment 24•10 years ago
|
||
(In reply to Kevin Grandon :kgrandon from comment #22)
> Have we tried the patch in comment 19 yet? I'm also wondering if we can get
> around this by changing the selector a bit? Are we positive that we are
> getting the right appWindow? We could also try something like:
>
> locator = (By.CSS_SELECTOR, 'div.appWindow.active.homescreen iframe')
This will fail because the first app you get is actually the FTU. but it depends on the build (eg NOFTU=1) you're running too so we need to keep this flexible. Waiting for the first app, whether it be FTU or homescreen, has in the past been a pretty reliable way to know we're OK to start the setup.
One thing perhaps worth trying is to wait for the splash screen element to have cleared, then wait for the appWindow.active.render.
If you go adding sleeps please add them behind a device-only condition so we're not just adding unnecessary test duration to Travis and TBPL.
Comment 25•10 years ago
|
||
(In reply to Stephen Donner [:stephend] from comment #23)
> (In reply to Kevin Grandon :kgrandon from comment #22)
> > Have we tried the patch in comment 19 yet? I'm also wondering if we can get
> > around this by changing the selector a bit? Are we positive that we are
> > getting the right appWindow? We could also try something like:
> >
> > locator = (By.CSS_SELECTOR, 'div.appWindow.active.homescreen iframe')
>
> :mdas and/or :njpark, can we get help with trying the patch, first, and then
> -- if needed -- the other suggestion(s) here? Thanks!
Adding extra sleep in line 722 didn't work, and when the locator value is changed, the script failed with nosuchelementfound exception, because strangely, when that change is added, the screen is in FTE, not showing the homescreen. this is in start_b2g, and clearing the FTE flag happens in cleanup_gaia method, which gets invoked after start_b2g is completed.
Also, test_homescreen_delete_app.py fails consistently now, as well as test_cleanup_gaia.py
Flags: needinfo?(npark)
Comment 26•10 years ago
|
||
(In reply to npark from comment #25)
> (In reply to Stephen Donner [:stephend] from comment #23)
> > (In reply to Kevin Grandon :kgrandon from comment #22)
> > > Have we tried the patch in comment 19 yet? I'm also wondering if we can get
> > > around this by changing the selector a bit? Are we positive that we are
> > > getting the right appWindow? We could also try something like:
> > >
> > > locator = (By.CSS_SELECTOR, 'div.appWindow.active.homescreen iframe')
> >
> > :mdas and/or :njpark, can we get help with trying the patch, first, and then
> > -- if needed -- the other suggestion(s) here? Thanks!
>
> Adding extra sleep in line 722 didn't work, and when the locator value is
> changed, the script failed with nosuchelementfound exception, because
> strangely, when that change is added, the screen is in FTE, not showing the
> homescreen. this is in start_b2g, and clearing the FTE flag happens in
> cleanup_gaia method, which gets invoked after start_b2g is completed.
>
> Also, test_homescreen_delete_app.py fails consistently now, as well as
> test_cleanup_gaia.py
Oh, comment 24 explains exactly why it happens.
Comment 27•10 years ago
|
||
It turns out you need to add sleep of minimum 25 seconds here:
https://github.com/mozilla-b2g/gaia/blob/master/tests/python/gaia-ui-tests/gaiatest/gaia_test.py#L717.
I measured the time it takes between when self.marionette.start_session() gets invoked until FTE is shown, by manually pausing the script by adding raw_input"Pausing Here" command after start_session(). on today's latest master, it takes 25-26 seconds before it shows FTE screen.
The odd thing is, during the bootup, it shows "based on mozilla technology" logo twice. first with black background, then a moment of dark blank screen, then with white background. After that, FTE shows up. I thought it used to just show once.
When I added sleep of 27 seconds, it was passing consistently. This is after removing sleep(10) inside the setUp() in gaia_test.py for 990580.
There is a possible race condition here, probably with something that either start_b2g or setUp() invokes.
Comment 28•10 years ago
|
||
Hey Alive - We were investigating this in SF today, but were wondering if you could take over as it's getting late in the day.
Basically there is a race condition between FTU, app window manager/factory, and the runapp CLI command. This is only really visibile when running automated tests, or using the runapp CLI command. Normally I'd say that the tests should just be fixed to use the interface like a normal human, but due to the fact that we have this documented CLI command, we should ensure it works properly.
I started working on a patch here, but it's not quite finished, and I'm sure you might have some better ways of doing things: https://github.com/mozilla-b2g/gaia/pull/18509
Please let me know if you can take this, otherwise I might give it another shot again tomorrow. Thanks!
Flags: needinfo?(alive)
Comment 29•10 years ago
|
||
Thanks Kevin and Alive.. Doing this in Gaia will help when we move the js tests onto device, too!
I do feel like something has regressed here in the last 2-3 weeks or so, but hard to see what it is. If we can shore it up via a patch like this, I'd be happy.
Comment 30•10 years ago
|
||
(In reply to Zac C (:zac) from comment #24)
> One thing perhaps worth trying is to wait for the splash screen element to
> have cleared, then wait for the appWindow.active.render.
I just experimented with this and the #os-logo element behaves like this; 1. not in dom, 2. in dom 3. 'hide' class attached 4. deleted from dom.
Waiting for that is a recipe for unreliability and #1 and #4 match each other, so this option is out.
Comment 31•10 years ago
|
||
I am trapped by other bugs today. Will take this tomorrow.
Comment 32•10 years ago
|
||
I think this was resolved by the resolution of bug 990580.
Kevin/Alive do you think there is more work here that can be done long term? should we break it out into another bug, or re-title this one?
Comment 33•10 years ago
|
||
If it's working for you guys I'm fine with closing this for now. I'm not convinced that all of the edge cases have been solved, but if there are remaining issues please reopen this bug.
Status: REOPENED → RESOLVED
Closed: 11 years ago → 10 years ago
Flags: needinfo?(alive)
Resolution: --- → WORKSFORME
My money's on bug 992752 - the timeframe, the changes, and the problem as described seem to fit us, precisely, here.
Updated•7 years ago
|
Product: Core → Core Graveyard
You need to log in
before you can comment on or make changes to this bug.
Description
•