Closed Bug 955626 Opened 11 years ago Closed 10 years ago

Try to wait for b2g startup without execute_script

Categories

(Firefox OS Graveyard :: Gaia::UI Tests, defect, P3)

Other
Gonk (Firefox OS)
defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: zcampbell, Assigned: zcampbell)

References

Details

Attachments

(1 file)

46 bytes, text/x-github-pull-request
I'd like to revisit this as a task.

My rationale for making this a P1 is three-fold:

1) The failures are logged/interpreted by those who don't work on the suite as framework intermittents instead of Gaia bugs and thus reflect badly on what is otherwise a reliable test suite. The failures are not related to the reliability of the test suite.

2) Intermittents on Travis and TBPL cost devs and sheriffs productivity and confidence, as they have to re-run the suite. In some cases the exception will be an early warning sign of broken Gaia functionality, but in that scenario our test cases will pick up the breakage during the test run anyway, without compromising stability.

3) The gaiatest package is shared widely, including externally, and `start_b2g` failing on unrelated problems can block a lot of test coverage.

Since the AppWindowManager was finalised we might be able to reliably wait for a DOM property (for example the 'active' class on the loaded iframe) instead of using execute_script.
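
A rough sketch of what waiting on the class could look like, purely as an illustration. `FakeFrame` and `wait_for_class` are hypothetical names and not part of gaiatest or Marionette; the only assumption about the real element is that it exposes its `class` attribute through something like `get_attribute`.

```python
import time


class FakeFrame(object):
    """Hypothetical stand-in for the app iframe element (not a Marionette class)."""

    def __init__(self, ready_after):
        self._ready_at = time.time() + ready_after

    def get_attribute(self, name):
        if name != "class":
            return None
        # The fake frame gains the 'active' class once it has "loaded".
        if time.time() >= self._ready_at:
            return "appWindow active"
        return "appWindow"


def wait_for_class(element, class_name, timeout=5.0, interval=0.05):
    """Poll the element's class attribute until class_name is present.

    Returns True if the class appeared before the timeout, False otherwise.
    """
    deadline = time.time() + timeout
    while True:
        classes = (element.get_attribute("class") or "").split()
        if class_name in classes:
            return True
        if time.time() >= deadline:
            return False
        time.sleep(interval)
```

The point of the design is that the wait only reads an attribute, so it does not depend on script execution being available during startup.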

It may also be worth revisiting Bebe's try/except loop idea. It was considered not ideal at the time, but the importance of this issue has changed and it might be worth it now.
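
For reference, a generic try/except retry loop might look roughly like the sketch below. This is not Bebe's original proposal, whose details are not recorded here; `retry` and its parameters are hypothetical names chosen for illustration.

```python
import time


def retry(action, attempts=3, delay=0.1, exceptions=(Exception,)):
    """Call action() until it succeeds, re-raising the last error if it never does."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except exceptions:
            # Out of attempts: let the original exception propagate.
            if attempt == attempts:
                raise
            time.sleep(delay)
```

The trade-off is that a loop like this can hide a genuine startup failure for a few attempts, which is presumably part of why it was not considered ideal.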
See Also: → 924912
Assignee: nobody → zcampbell
Attached file github pr
Attachment #8355218 - Flags: review?(dave.hunt)
Attachment #8355218 - Flags: review?(bob.silverberg)
Surprised by the adhoc test results, hard to see the relation. They seem to be a keyboard problem. Unless it is starting up much faster and the keyboard is lazy loaded/not initialized. Retriggered, but I'll put this aside for a bit and work on some intermittents.
Dropping this off P1 now.
Priority: P1 → P3
Comment on attachment 8355218
github pr

I don't think it's worth reviewing at this stage due to the failures. Please re-request review once these are addressed.
Attachment #8355218 - Flags: review?(dave.hunt)
Attachment #8355218 - Flags: review?(bob.silverberg)
(In reply to Dave Hunt (:davehunt) from comment #5)
> 
> I don't think it's worth reviewing at this stage due to the failures. Please
> re-request review once these are addressed.

Yeah sorry about that, was planning to debug it locally, but the future happened sooner than I anticipated..
Seems fine locally now; I'll rebase and rebuild the adhoc.

Fear it might be our old friend the update toaster!
I took a look at this as well and I also saw a failure locally trying to connect to cell data on one of the tests I ran:

```
test_browser_cell_data (test_browser_cell_data.TestBrowserCellData) ... ERROR

======================================================================
ERROR: None
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/bsilverberg/.virtualenvs/gaia/lib/python2.7/site-packages/marionette_client-0.7.2-py2.7.egg/marionette/marionette_test.py", line 127, in run
    self.setUp()
  File "/Users/bsilverberg/gitRepos/gaia/tests/python/gaia-ui-tests/gaiatest/tests/functional/browser/test_browser_cell_data.py", line 17, in setUp
    self.data_layer.connect_to_cell_data()
  File "/Users/bsilverberg/gitRepos/gaia/tests/python/gaia-ui-tests/gaiatest/gaia_test.py", line 298, in connect_to_cell_data
    result = self.marionette.execute_async_script("return GaiaDataLayer.connectToCellData()", special_powers=True)
  File "/Users/bsilverberg/.virtualenvs/gaia/lib/python2.7/site-packages/marionette_client-0.7.2-py2.7.egg/marionette/marionette.py", line 1080, in execute_async_script
    filename=os.path.basename(frame[0]))
  File "/Users/bsilverberg/.virtualenvs/gaia/lib/python2.7/site-packages/marionette_client-0.7.2-py2.7.egg/marionette/marionette.py", line 584, in _send_message
    self._handle_error(response)
  File "/Users/bsilverberg/.virtualenvs/gaia/lib/python2.7/site-packages/marionette_client-0.7.2-py2.7.egg/marionette/marionette.py", line 633, in _handle_error
    raise ScriptTimeoutException(message=message, status=status, stacktrace=stacktrace)
TEST-UNEXPECTED-FAIL | test_browser_cell_data.py test_browser_cell_data.TestBrowserCellData.test_browser_cell_data | ScriptTimeoutException: timed out
----------------------------------------------------------------------
Ran 1 test in 46.419s

FAILED (errors=1)
```

I notice that the last adhoc run [1] also seemed to have a few errors connecting to cell data, e.g., test_cost_control_data_alert_mobile, test_sms_send, test_enable_cell_data_via_settings_app. I wonder if this is related to this patch? Maybe the condition we are waiting for isn't waiting long enough for the OS to be in a state where we can attempt to connect to cell data?


[1] http://qa-selenium.mv.mozilla.com:8080/job/b2g.hamachi.mozilla-central.ui.adhoc/69/consoleFull
It looks suspiciously like a pattern doesn't it?

Will debug locally a bit more.
Bob, I wasn't able to replicate locally.

After talking to Hsin-yi (RIL owner) about enabling ril.data, the only thing I could imagine going wrong is a race between us pushing the APN settings from the JSON file and the RIL setting them up itself, or cases where a carrier uses shared bandwidth on another carrier's network and we were pushing completely wrong settings.

I've removed the APN settings from CI and I'll let the RIL do the work itself.

Failing that, in the connect_to_cell_data method we can wait for mozMobileConnection.data.state == 'registered', which will tell us that it is ready to go.
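
Something along these lines is what I have in mind, as a sketch only. `wait_for_data_state` and the `get_state` callable are hypothetical; in the suite, `get_state` would presumably read mozMobileConnection.data.state through the data layer, but that wiring is an assumption and is not shown.

```python
import time


def wait_for_data_state(get_state, expected="registered", timeout=30.0, interval=0.5):
    """Poll get_state() until it returns the expected state.

    Raises RuntimeError, including the last state seen, if the timeout expires.
    """
    deadline = time.time() + timeout
    state = get_state()
    while state != expected:
        if time.time() >= deadline:
            raise RuntimeError(
                "data state still %r after %.1fs" % (state, timeout))
        time.sleep(interval)
        state = get_state()
    return state
```

Reporting the last state in the error should make it easier to tell a slow registration apart from a connection that never started.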

I'll kick off another adhoc of this soon.
I have been doing some research on this, including debugging as per the above comment and debugging the 'ondataerror' event, which is supposed to fire when the RIL connection fails. For the former I found no problem: the APN settings are always set correctly before we start the connection. For the latter I found it doesn't fire before the timeout, which leads me to believe that sometimes it just takes a long time to connect.

I'm going to debug a bit more along that principle.
Trying another adhoc run with cell data timeout increased.

http://qa-selenium.mv.mozilla.com:8080/job/b2g.hamachi.mozilla-central.ui.adhoc/80/
I'm still unsure why this is affecting the cell data connection, but I will keep debugging.
https://github.com/mozilla-b2g/gaia/commit/5f4b2af6143dc64fd0d3b08893289351bb007f87
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED