Closed Bug 1272589 Opened 4 years ago Closed 2 years ago

Intermittent test_toolbars.py TestLocationBar.test_load_url | TimeoutException: Timed out after 5.7 seconds

Categories

(Testing :: Firefox UI Tests, defect)

49 Branch
defect
Not set

Tracking

(firefox49 wontfix, firefox50 wontfix, firefox51 wontfix, firefox52 fix-optional, firefox53 fix-optional, firefox54 fix-optional)

RESOLVED INCOMPLETE
Tracking Status
firefox49 --- wontfix
firefox50 --- wontfix
firefox51 --- wontfix
firefox52 --- fix-optional
firefox53 --- fix-optional
firefox54 --- fix-optional

People

(Reporter: whimboo, Assigned: whimboo)

References

()

Details

(Keywords: intermittent-failure, regression)

https://treeherder.mozilla.org/logviewer.html#?job_id=27787841&repo=mozilla-inbound#L37992

 18:52:17     INFO - TEST-UNEXPECTED-ERROR | test_toolbars.py TestLocationBar.test_load_url | TimeoutException: TimeoutException: Timed out after 5.7 seconds

 18:52:17     INFO - Traceback (most recent call last):

 18:52:17     INFO -   File "/home/worker/workspace/build/venv/local/lib/python2.7/site-packages/marionette/marionette_test.py", line 351, in run

 18:52:17     INFO -     testMethod()

 18:52:17     INFO -   File "/home/worker/workspace/build/tests/firefox-ui/tests/puppeteer/test_toolbars.py", line 133, in test_load_url

 18:52:17     INFO -     Wait(self.marionette).until(lambda mn: mn.get_url() == data_uri)

 18:52:17     INFO -   File "/home/worker/workspace/build/venv/local/lib/python2.7/site-packages/marionette_driver/wait.py", line 143, in until

 18:52:17     INFO -     cause=last_exc)

18:52:17 INFO - TEST-INFO took 13435ms
Single failure only. Lets reopen if it appears again.
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → WORKSFORME
Actually happened again on mozilla-central. So not WFM.
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
It would be great if we could improve the call to Wait().until() and add a specific failure message, so it's not a generic timeout message. Johannes would you be interested to get started with this bug?
Flags: needinfo?(mozilla_dev)
Summary: TEST-UNEXPECTED-ERROR | test_toolbars.py TestLocationBar.test_load_url | TimeoutException: TimeoutException: Timed out after 5.7 seconds → TEST-UNEXPECTED-ERROR | test_toolbars.py TestLocationBar.test_load_url | TimeoutException: Timed out after 5.7 seconds
This looks to be a perma fail for Linux64 debug builds now. Maybe I should have a look at this one.
Assignee: nobody → hskupin
Status: REOPENED → ASSIGNED
Flags: needinfo?(mozilla_dev)
Here is the output from the initial log:

 18:52:09     INFO -  1463079129214	Marionette	TRACE	conn140 -> [0,151,"sendKeysToElement",{"id":"dc36bd79-23d6-46fb-bce0-4e65cb94da8c","value":["d","a","t","a",":","t","e","x","t","/","h","t","m","l",",","<","t","i","t","l","e",">","T","i","t","l","e","<","/","t","i","t","l","e",">",""]}]
 18:52:09     INFO -  1463079129222	Marionette	DEBUG	Element does not have a correct accessibility role and may not be manipulated via the accessibility API: id: urlbar, tagName: textbox, className:
 18:52:09     INFO -  1463079129784	Marionette	TRACE	conn140 <- [1,151,null,{}]
 18:52:10     INFO -  1463079130513	Marionette	TRACE	conn140 -> [0,152,"getContext",null]
 18:52:10     INFO -  1463079130524	Marionette	TRACE	conn140 <- [1,152,null,{"value":"chrome"}]
 18:52:10     INFO -  1463079130681	Marionette	TRACE	conn140 -> [0,153,"setContext",{"value":"content"}]
 18:52:10     INFO -  1463079130692	Marionette	TRACE	conn140 <- [1,153,null,{}]
 18:52:10     INFO -  1463079130725	Marionette	TRACE	conn140 -> [0,154,"getCurrentUrl",null]
 18:52:14     INFO -  --DOCSHELL 0x7f328d112800 == 5 [pid = 919] [id = 54]
 18:52:14     INFO -  --DOMWINDOW == 47 (0x7f328a85cc00) [pid = 919] [serial = 207] [outer = (nil)] [url = about:blank]
 18:52:14     INFO -  ++DOCSHELL 0x7ff7f1917800 == 1 [pid = 1453] [id = 1]
 18:52:15     INFO -  ++DOMWINDOW == 1 (0x7ff7f1979800) [pid = 1453] [serial = 1] [outer = (nil)]
 18:52:15     INFO - ++DOMWINDOW == 2 (0x7ff7effbd400) [pid = 1453] [serial = 2] [outer = 0x7ff7f1979800] 
 18:52:14     INFO -  --DOMWINDOW == 47 (0x7f328a85cc00) [pid = 919] [serial = 207] [outer = (nil)] [url = about:blank]
 18:52:16     INFO - 1463079136259 Marionette TRACE conn140 <- [1,154,null,{"value":"about:blank"}] 

As you can see Marionette entered the data url via sendKeysToElement() and the final Return key should have triggered loading the URL, but we seem to get `about:blank` to be loaded. 

Here is the Orange Factor graph:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1272589&startday=2016-07-01&endday=2016-08-05&tree=all

It looks like we have a large spike since Aug 3rd. Checking treeherder for the first occurrence gives me:

https://treeherder.mozilla.org/#/jobs?repo=autoland&filter-searchStr=Firefox%20UI&fromchange=a9cc7c343752&tochange=186bab1ed189 

The only related bug I see could be bug 1290192 where I upgraded to the Ubuntu 16.04 docker images. Since then we had those failures. I just triggered some more jobs for tc-Fxfn-r-e10s on Linux64 debug to see how this intermittent failure visualizes.
Ok, so it's definitely a regression from the upgrade to Ubuntu 16.04.
Blocks: 1290192
Keywords: regression
:whimboo, are you looking into this?
Flags: needinfo?(hskupin)
I did a debug session in a one click loaner on TaskCluster and as noticed the tests for debug builds are getting executed very slowly there! So the underlying issue is Wait().until(), which has default values of interval=.1 and timeout=5 for our tests. But here is what we get:

> TEST-START | test_toolbars.py TestLocationBar.test_load_url
> time.struct_time(tm_year=2016, tm_mon=8, tm_mday=9, tm_hour=9, tm_min=57, tm_sec=58, tm_wday=1, tm_yday=222, tm_isdst=0)
> *** get_url(): about:blank
> time.struct_time(tm_year=2016, tm_mon=8, tm_mday=9, tm_hour=9, tm_min=58, tm_sec=3, tm_wday=1, tm_yday=222, tm_isdst=0)
> *** get_url(): data:text/html,<title>Title</title>
> TEST-PASS | test_toolbars.py TestLocationBar.test_load_url | took 16699ms

As you can see our callback gets called the first time at 9:57:58 and a second time at 9:58:03, which makes it 5 seconds! This is actually the timeout value for our call and results in a test failure. The interval of 0.1s is totally mood here and nothing we absolutely can trust on.

I will play locally with Wait().until() and see if we can do some performance enhancements. If not I would bump up the default timeout for UI (chrome) interaction from 5s to 10s for all of our fx ui tests, which should fix this issue and other intermittent failures for debug builds.
Flags: needinfo?(hskupin)
Depends on: 1293614
The patch on bug 1293614 made this nearly perma failure a rarely occurring one. Lets see how it goes the next days.
Summary: TEST-UNEXPECTED-ERROR | test_toolbars.py TestLocationBar.test_load_url | TimeoutException: Timed out after 5.7 seconds → Intermittent test_toolbars.py TestLocationBar.test_load_url | TimeoutException: Timed out after 5.7 seconds
Greg, we are experiencing massive slowdowns in running our Marionette tests for Firefox Linux64 debug builds on Taskcluster. In this case we are loading a data URL and waiting for it to be loaded. The simple check for the current URL takes about 5s which is absolutely long. Do you think that could be related to bug 1291940?
Flags: needinfo?(gps)
I don't think so. I'd look at an intermittent network issue related to DNS or accidental usage of a non machine local service first.
Flags: needinfo?(gps)
I wouldn't speculate for a intermittent network issue given that all tests of this job are using external data and all pass. For this particular test we even use only local test data. So no network access at all.

What I wonder is why marionette.get_url() takes that long for returning the URL. I might use a one-click loaner the next days to investigate that.
It looks like that moving to large desktop-test instances via bug 1281241 fixed that intermittent issue.
Depends on: 1281241
Since we moved to the large desktop-test instances in AWS this intermittent test failure is gone. Hurray for multi-cores!
Status: ASSIGNED → RESOLVED
Closed: 3 years ago3 years ago
Resolution: --- → FIXED
We still have this problem with very slow running builds like ASAN:
https://treeherder.mozilla.org/#/jobs?repo=autoland&filter-searchStr=Firefox%20UI&bugfiler&selectedJob=3438129

I wish we could set the timeouts based on the cpu load of the application.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
No more failures in the last 2 months. Lets close this bug.
Status: REOPENED → RESOLVED
Closed: 3 years ago2 years ago
Resolution: --- → INCOMPLETE
You need to log in before you can comment on or make changes to this bug.