991262 - Intermittent issue with blank homescreen during gaia-ui test

Reporter

Description

•

11 years ago

Attached file Logcat of the run — Details

This issue appears when running test_homescreen_delete_app.py multiple times. At some point (often the 2nd run), when the test starts the homescreen on the device remains blank, just showing the background. The test waits for quite some time and then fails with the following message(s): TEST-START test_homescreen_delete_app.py test_delete_app (test_homescreen_delete_app.TestDeleteApp) ... ERROR ====================================================================== ERROR: None ---------------------------------------------------------------------- Traceback (most recent call last): File "/Users/bsilverberg/.virtualenvs/gaia/lib/python2.7/site-packages/marionette_client-0.7.5-py2.7.egg/marionette/marionette_test.py", line 145, in run self.setUp() File "/Users/bsilverberg/gitRepos/gaia/tests/python/gaia-ui-tests/gaiatest/tests/functional/homescreen/test_homescreen_delete_app.py", line 20, in setUp self.apps.switch_to_displayed_app() File "/Users/bsilverberg/gitRepos/gaia/tests/python/gaia-ui-tests/gaiatest/gaia_test.py", line 74, in switch_to_displayed_app self.marionette.switch_to_frame(self.displayed_app.frame) File "/Users/bsilverberg/.virtualenvs/gaia/lib/python2.7/site-packages/marionette_client-0.7.5-py2.7.egg/marionette/marionette.py", line 895, in switch_to_frame response = self._send_message('switchToFrame', 'ok', element=frame.id, focus=focus) File "/Users/bsilverberg/.virtualenvs/gaia/lib/python2.7/site-packages/marionette_client-0.7.5-py2.7.egg/marionette/marionette.py", line 604, in _send_message "Connection timed out", status=ErrorCodes.TIMEOUT) TimeoutException: TimeoutException: Connection timed out Most recent errors/exceptions are: 04-02 15:41:37.180 I/Gecko ( 1267): 1396467697194 Marionette INFO sendToClient: {"from":"0","error":{"message":"Unable to locate element: div.appWindow.active","status":7,"stacktrace":null}}, {3b555fd2-7ddf-492d-b7b9-a7555065cec0}, {3b555fd2-7ddf-492d-b7b9-a7555065cec0} 04-02 15:41:37.340 I/Gecko ( 1267): 1396467697356 Marionette INFO sendToClient: {"from":"0","error":{"message":"Unable to locate element: div.appWindow.active","status":7,"stacktrace":null}}, {1ff09116-47c9-4199-b41a-9627a6a111cd}, {1ff09116-47c9-4199-b41a-9627a6a111cd} 04-02 15:41:37.620 I/Gecko ( 1267): 1396467697630 Marionette INFO sendToClient: {"from":"0","error":{"message":"Unable to locate element: div.appWindow.active","status":7,"stacktrace":null}}, {8c505fe1-5e63-46cb-8f35-3b083190d405}, {8c505fe1-5e63-46cb-8f35-3b083190d405} 04-02 15:41:38.580 I/Gecko ( 1267): A coding exception was thrown and uncaught in a Task. 04-02 15:41:38.580 I/Gecko ( 1267): Full message: TypeError: aIdsApp is undefined TEST-UNEXPECTED-FAIL | test_homescreen_delete_app.py test_homescreen_delete_app.TestDeleteApp.test_delete_app | The B2G process has restarted after crashing during the tests so Marionette can't respond due to either a Gecko, Gaia or Marionette error. Above, the 5 most recent errors are listed. Check logcat for all errors if these errors are not the cause of the failure. ---------------------------------------------------------------------- Ran 1 test in 394.265s FAILED (errors=1) After this the homescreen is unresponsive to taps and flicks, but if you tap the home button a few times the homescreen appears, but with overly large icons. See Stephen's video of this at https://www.youtube.com/watch?v=124hCO_EOpY&feature=youtu.be I also reproduced this locally and am attaching the logcat from the run on my device which is a Hamachi, with the following build: Gaia 6274b3645b3d41df4aa5a55398b5d106edf1c528 Gecko https://hg.mozilla.org/mozilla-central/rev/4941a2ac0786 BuildID 20140402040201 Version 31.0a1 ro.build.version.incremental=eng.archermind.20131114.105818 ro.build.date=Thu Nov 14 10:58:33 CST 2013

Stephen Donner [:stephend] Not actively reading bugmail

Comment 1

•

11 years ago

Jonathan, would you mind having someone from your team investigate this from the (potential) Marionette side, and/or help us figure out where to go next? Thx :-)

Flags: needinfo?(jgriffin)

Stephen Donner [:stephend] Not actively reading bugmail

Updated

•

11 years ago

Severity: normal → major

OS: Mac OS X → Gonk (Firefox OS)

Hardware: x86 → ARM

No-Jun Park [:njpark]

Comment 2

•

11 years ago

Attached file about-memory-0.tar.gz — Details

this was obtained by running MOZ_IGNORE_NUWA_PROCESS=1 ./get_about_memory.py right before the bug occurred.

Jonathan Griffin (:jgriffin)

Comment 3

•

11 years ago

There's an error being triggered here: 04-02 15:41:38.580 I/Gecko ( 1267): ************************* 04-02 15:41:38.580 I/Gecko ( 1267): A coding exception was thrown and uncaught in a Task. 04-02 15:41:38.580 I/Gecko ( 1267): 04-02 15:41:38.580 I/Gecko ( 1267): Full message: TypeError: aIdsApp is undefined 04-02 15:41:38.580 I/Gecko ( 1267): Full stack: this.OperatorAppsRegistry._installOperatorApps/<@resource://gre/modules/OperatorApps.jsm:317:23 04-02 15:41:38.580 I/Gecko ( 1267): TaskImpl_run@resource://gre/modules/Task.jsm:298:1 04-02 15:41:38.580 I/Gecko ( 1267): resolve@resource://gre/modules/commonjs/sdk/core/promise.js:118:11 04-02 15:41:38.580 I/Gecko ( 1267): then@resource://gre/modules/commonjs/sdk/core/promise.js:43:43 04-02 15:41:38.580 I/Gecko ( 1267): resolve@resource://gre/modules/commonjs/sdk/core/promise.js:185:11 04-02 15:41:38.580 I/Gecko ( 1267): TaskImpl_run@resource://gre/modules/Task.jsm:307:9 04-02 15:41:38.580 I/Gecko ( 1267): Handler.prototype.process@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:748:11 04-02 15:41:38.580 I/Gecko ( 1267): this.PromiseWalker.walkerLoop@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:627:7 04-02 15:41:38.580 I/Gecko ( 1267): 04-02 15:41:38.580 I/Gecko ( 1267): ************************* Moving this bug over to gecko.

Component: Gaia::UI Tests → DOM: Apps

Flags: needinfo?(jgriffin)

Product: Firefox OS → Core

Myk Melez [:myk] [@mykmelez]

Comment 4

•

11 years ago

Sounds like a dupe of bug 991246.

Status: NEW → RESOLVED

Closed: 11 years ago

Resolution: --- → DUPLICATE

Stephen Donner [:stephend] Not actively reading bugmail

Comment 5

•

11 years ago

(In reply to Myk Melez [:myk] [@mykmelez] from comment #4) > Sounds like a dupe of bug 991246. > > *** This bug has been marked as a duplicate of bug 991246 *** Although our failure has gwagner's filed bug in it, as well, he says it "shouldn't be a problem...always shows up during startup, but works." I'm cautiously reopening so we can see, post-bug 991246 landing, if it still happens.

Status: RESOLVED → REOPENED

Resolution: DUPLICATE → ---

Stephen Donner [:stephend] Not actively reading bugmail

Comment 6

•

11 years ago

Bob/team, does this still happen? I note there's also bug 992589 still out.

[:AndreiH]

Comment 7

•

11 years ago

This does not reproduces for me. Tested on the latest Hamachi build: Gaia 650e8c2c611ed07495d3bf3769f44a0efd88a492 Gecko https://hg.mozilla.org/mozilla-central/rev/5811efc11011 BuildID 20140408160203 Version 31.0a1

Bob Silverberg [:bsilverberg]

Reporter

Comment 8

•

11 years ago

I just reproduced it on the second run of test_homescreen_delete_app with today's build: Gaia 650e8c2c611ed07495d3bf3769f44a0efd88a492 Gecko https://hg.mozilla.org/mozilla-central/rev/5811efc11011 BuildID 20140408160203 Version 31.0a1 ro.build.version.incremental=eng.archermind.20131114.105818 ro.build.date=Thu Nov 14 10:58:33 CST 2013

No-Jun Park [:njpark]

Comment 9

•

11 years ago

I saw it too on the same build from Comment 8. TestCleanupGaia.test_cleanup_gaia fails in b2g.hamachi.mozilla-central.unittests fails because of this I believe.

Stephen Donner [:stephend] Not actively reading bugmail

Updated

•

11 years ago

Blocks: 801898

No-Jun Park [:njpark]

Comment 10

•

11 years ago

This can also be reproduced when test_killall.py in the unittest folder is executed alone. The test may complete successfully, but afterwards, the phone shows no icons. Phone contained this version: Gaia 553da99ab09b6b894d9f95bb06b16b6e1ddbf0a1 Gecko https://hg.mozilla.org/mozilla-central/rev/5b6e82e7bbbf BuildID 20140415040201 Version 31.0a1 ro.build.version.incremental=eng.tclxa.20131223.163538 ro.build.date=Mon Dec 23 16:36:04 CST 2013

No-Jun Park [:njpark]

Comment 11

•

11 years ago

Also, test_cleanup_gaia (unit test) fails (bug 997552) because of this bug. When the script is executed , the homescreen hangs upon restart.

No-Jun Park [:njpark]

Updated

•

11 years ago

Blocks: 997552

Stephen Donner [:stephend] Not actively reading bugmail

Comment 12

•

11 years ago

Attached file adb shell b2g-info output — Details

Stephen Donner [:stephend] Not actively reading bugmail

Comment 13

•

11 years ago

Attached file logcat from taking a screenshot while in this "empty homescreen state" — Details

Stephen Donner [:stephend] Not actively reading bugmail

Comment 14

•

11 years ago

Attached image screenshot of homescreen, after taking screenshot of homescreen :-( — Details

Ben Kelly [:bkelly, not reviewing]

Comment 15

•

11 years ago

Can you get a dmesg at the same point in the process that the b2g-info from comment 12 was taken?

Stephen Donner [:stephend] Not actively reading bugmail

Comment 16

•

11 years ago

Attached file adb shell b2g-info.txt — Details

This is from a single, recent run of test_cleanup_gaia.py, directly after a branch-new flash.

Stephen Donner [:stephend] Not actively reading bugmail

Updated

•

11 years ago

Flags: needinfo?(bkelly)

Stephen Donner [:stephend] Not actively reading bugmail

Comment 17

•

11 years ago

Attached file adb shell dmesg.txt — Details

This is the corresponding adb shell dmesg output of that recent, single run of test_cleanup_gaia.py, after a fresh flash of today's Hamachi m-c/master build.

Ben Kelly [:bkelly, not reviewing]

Comment 18

•

11 years ago

I noticed that the end of the first test_homescreen_delete_app.py leaves the homescreen in the "move-or-kill app" mode. The next test restart then fails to launch homescreen. If you manually enter the "move-or-kill app" mode, then similarly the next test restart fails to launch homescreen. I also say bug 998438 in "move-or-kill app" mode, but not sure if its related.

Ben Kelly [:bkelly, not reviewing]

Comment 19

•

11 years ago

Ok, I can work around the problem by adding a time.sleep(5) here: https://github.com/mozilla-b2g/gaia/blob/master/tests/python/gaia-ui-tests/gaiatest/gaia_test.py#L722 It seems that perhaps the code to wait for the system to be ready is no longer adequate. This could be related to the major window manager refactor in bug 907013. Alive, what do you think of the code at the above link? Is it still correct for trying to block tests until the system is ready?

Flags: needinfo?(bkelly) → needinfo?(alive)

Updated

•

11 years ago

Comment 20

•

11 years ago

The 'appWindow.active' is the first opened app which should be FTU or homescreen. I need to know what does 'ready' mean for test so I can't tell more here. What does test really do after it thinks device is 'ready'?

Flags: needinfo?(alive)

Dimi Lee [:dimi]

Updated

•

11 years ago

Blocks: 987760

Zac C (:zac)

Comment 21

•

11 years ago

(In reply to Alive Kuo [:alive][NEEDINFO!][God bless Taiwan.] from comment #20) > The 'appWindow.active' is the first opened app which should be FTU or > homescreen. > I need to know what does 'ready' mean for test so I can't tell more here. > What does test really do after it thinks device is 'ready'? 9/10 the first thing a test does is launch an app so we need to wait until we can use the AppWindowManager safely to launch an app. but also it's good to wait for the RIL and other processes to have started up. This is much more complex than the js tests are dealing with because the device is slower and more hardware and diverse test cases. desktopb2g doesn't have these dependencies so it's a lot harder to deal with on the device. I don't think bug 907013 is to blame because I only put the appWindow.active wait into gaiatest (linked to in comment#19) after that was merged by Alive. This wait ( appWindow.active == displayed) used to only resolve when the FTU or homescreen was showing but now I see it resolving even when the Firefox OS animation is still present. The timing has definitely changed here recently, it could be several things. I don't mind changing what we wait for here provided it is reliable. but we can't really wait for a Javascript event because there are too many spurious javascript errors in the same timeframe that Marionette catches instead. Which is why we are using appWindow.active, it's at the DOM level and not related to javascript errors.

Jason Smith [:jsmith]

Updated

•

11 years ago

No longer blocks: 987760

Kevin Grandon :kgrandon

Comment 22

•

11 years ago

Have we tried the patch in comment 19 yet? I'm also wondering if we can get around this by changing the selector a bit? Are we positive that we are getting the right appWindow? We could also try something like: locator = (By.CSS_SELECTOR, 'div.appWindow.active.homescreen iframe')

Stephen Donner [:stephend] Not actively reading bugmail

Comment 23

•

11 years ago

(In reply to Kevin Grandon :kgrandon from comment #22) > Have we tried the patch in comment 19 yet? I'm also wondering if we can get > around this by changing the selector a bit? Are we positive that we are > getting the right appWindow? We could also try something like: > > locator = (By.CSS_SELECTOR, 'div.appWindow.active.homescreen iframe') :mdas and/or :njpark, can we get help with trying the patch, first, and then -- if needed -- the other suggestion(s) here? Thanks!

Flags: needinfo?(npark)

Zac C (:zac)

Comment 24

•

11 years ago

(In reply to Kevin Grandon :kgrandon from comment #22) > Have we tried the patch in comment 19 yet? I'm also wondering if we can get > around this by changing the selector a bit? Are we positive that we are > getting the right appWindow? We could also try something like: > > locator = (By.CSS_SELECTOR, 'div.appWindow.active.homescreen iframe') This will fail because the first app you get is actually the FTU. but it depends on the build (eg NOFTU=1) you're running too so we need to keep this flexible. Waiting for the first app, whether it be FTU or homescreen, has in the past been a pretty reliable way to know we're OK to start the setup. One thing perhaps worth trying is to wait for the splash screen element to have cleared, then wait for the appWindow.active.render. If you go adding sleeps please add them behind a device-only condition so we're not just adding unnecessary test duration to Travis and TBPL.

No-Jun Park [:njpark]

Comment 25

•

11 years ago

(In reply to Stephen Donner [:stephend] from comment #23) > (In reply to Kevin Grandon :kgrandon from comment #22) > > Have we tried the patch in comment 19 yet? I'm also wondering if we can get > > around this by changing the selector a bit? Are we positive that we are > > getting the right appWindow? We could also try something like: > > > > locator = (By.CSS_SELECTOR, 'div.appWindow.active.homescreen iframe') > > :mdas and/or :njpark, can we get help with trying the patch, first, and then > -- if needed -- the other suggestion(s) here? Thanks! Adding extra sleep in line 722 didn't work, and when the locator value is changed, the script failed with nosuchelementfound exception, because strangely, when that change is added, the screen is in FTE, not showing the homescreen. this is in start_b2g, and clearing the FTE flag happens in cleanup_gaia method, which gets invoked after start_b2g is completed. Also, test_homescreen_delete_app.py fails consistently now, as well as test_cleanup_gaia.py

Flags: needinfo?(npark)

No-Jun Park [:njpark]

Comment 26

•

11 years ago

(In reply to npark from comment #25) > (In reply to Stephen Donner [:stephend] from comment #23) > > (In reply to Kevin Grandon :kgrandon from comment #22) > > > Have we tried the patch in comment 19 yet? I'm also wondering if we can get > > > around this by changing the selector a bit? Are we positive that we are > > > getting the right appWindow? We could also try something like: > > > > > > locator = (By.CSS_SELECTOR, 'div.appWindow.active.homescreen iframe') > > > > :mdas and/or :njpark, can we get help with trying the patch, first, and then > > -- if needed -- the other suggestion(s) here? Thanks! > > Adding extra sleep in line 722 didn't work, and when the locator value is > changed, the script failed with nosuchelementfound exception, because > strangely, when that change is added, the screen is in FTE, not showing the > homescreen. this is in start_b2g, and clearing the FTE flag happens in > cleanup_gaia method, which gets invoked after start_b2g is completed. > > Also, test_homescreen_delete_app.py fails consistently now, as well as > test_cleanup_gaia.py Oh, comment 24 explains exactly why it happens.

No-Jun Park [:njpark]

Comment 27

•

11 years ago

It turns out you need to add sleep of minimum 25 seconds here: https://github.com/mozilla-b2g/gaia/blob/master/tests/python/gaia-ui-tests/gaiatest/gaia_test.py#L717. I measured the time it takes between when self.marionette.start_session() gets invoked until FTE is shown, by manually pausing the script by adding raw_input"Pausing Here" command after start_session(). on today's latest master, it takes 25-26 seconds before it shows FTE screen. The odd thing is, during the bootup, it shows "based on mozilla technology" logo twice. first with black background, then a moment of dark blank screen, then with white background. After that, FTE shows up. I thought it used to just show once. When I added sleep of 27 seconds, it was passing consistently. This is after removing sleep(10) inside the setUp() in gaia_test.py for 990580. There is a possible race condition here, probably with something that either start_b2g or setUp() invokes.

Kevin Grandon :kgrandon

Comment 28

•

11 years ago

Hey Alive - We were investigating this in SF today, but were wondering if you could take over as it's getting late in the day. Basically there is a race condition between FTU, app window manager/factory, and the runapp CLI command. This is only really visibile when running automated tests, or using the runapp CLI command. Normally I'd say that the tests should just be fixed to use the interface like a normal human, but due to the fact that we have this documented CLI command, we should ensure it works properly. I started working on a patch here, but it's not quite finished, and I'm sure you might have some better ways of doing things: https://github.com/mozilla-b2g/gaia/pull/18509 Please let me know if you can take this, otherwise I might give it another shot again tomorrow. Thanks!

Flags: needinfo?(alive)

Zac C (:zac)

Comment 29

•

11 years ago

Thanks Kevin and Alive.. Doing this in Gaia will help when we move the js tests onto device, too! I do feel like something has regressed here in the last 2-3 weeks or so, but hard to see what it is. If we can shore it up via a patch like this, I'd be happy.

Zac C (:zac)

Comment 30

•

11 years ago

(In reply to Zac C (:zac) from comment #24) > One thing perhaps worth trying is to wait for the splash screen element to > have cleared, then wait for the appWindow.active.render. I just experimented with this and the #os-logo element behaves like this; 1. not in dom, 2. in dom 3. 'hide' class attached 4. deleted from dom. Waiting for that is a recipe for unreliability and #1 and #4 match each other, so this option is out.

(inactive after 6/18) Alive Kuo [:alive]

Comment 31

•

11 years ago

I am trapped by other bugs today. Will take this tomorrow.

Zac C (:zac)

Comment 32

•

11 years ago

I think this was resolved by the resolution of bug 990580. Kevin/Alive do you think there is more work here that can be done long term? should we break it out into another bug, or re-title this one?

Kevin Grandon :kgrandon

Comment 33

•

11 years ago

If it's working for you guys I'm fine with closing this for now. I'm not convinced that all of the edge cases have been solved, but if there are remaining issues please reopen this bug.

Status: REOPENED → RESOLVED

Closed: 11 years ago → 11 years ago

Flags: needinfo?(alive)

Resolution: --- → WORKSFORME

Stephen Donner [:stephend] Not actively reading bugmail

Comment 34

•

11 years ago

My money's on bug 992752 - the timeframe, the changes, and the problem as described seem to fit us, precisely, here.

BMO Automation

Updated

•

7 years ago

Product: Core → Core Graveyard

Logcat of the run 11 years ago Bob Silverberg [:bsilverberg] 415.95 KB, text/plain		Details
about-memory-0.tar.gz 11 years ago No-Jun Park [:njpark] 2.06 MB, application/x-gzip		Details
adb shell b2g-info output 11 years ago Stephen Donner [:stephend] Not actively reading bugmail 922 bytes, text/plain		Details
logcat from taking a screenshot while in this "empty homescreen state" 11 years ago Stephen Donner [:stephend] Not actively reading bugmail 3.99 KB, text/plain		Details
screenshot of homescreen, after taking screenshot of homescreen :-( 11 years ago Stephen Donner [:stephend] Not actively reading bugmail 265.09 KB, image/png		Details
adb shell b2g-info.txt 11 years ago Stephen Donner [:stephend] Not actively reading bugmail 887 bytes, text/plain		Details
adb shell dmesg.txt 11 years ago Stephen Donner [:stephend] Not actively reading bugmail 44.59 KB, text/plain		Details