Intermittent test_safe_browsing_warning_pages.py TestSafeBrowsingWarningPages.test_warning_pages | TimeoutException: Timed out after 300.2 seconds (outage of support.mozilla.org)

RESOLVED WORKSFORME

Status

Testing
Firefox UI Tests
RESOLVED WORKSFORME
7 months ago
6 months ago

People

(Reporter: Treeherder Bug Filer, Unassigned)

Tracking

({intermittent-failure})

Version 3
intermittent-failure

Firefox Tracking Flags

(Not tracked)

Details

(Reporter)

Description

7 months ago
treeherder
Filed by: wkocher [at] mozilla.com

https://treeherder.mozilla.org/logviewer.html#?job_id=88415215&repo=autoland

https://queue.taskcluster.net/v1/task/DWV-yEr-TfuwlZflhKPWtg/runs/0/artifacts/public/logs/live_backing.log

This started failing across inbound and autoland around the same time.
That is not a very helpful failure message. It looks like we forgot to pass the message argument to the Wait().until() call.

[task 2017-04-03T21:20:08.850324Z] 21:20:08     INFO - TEST-UNEXPECTED-ERROR | test_safe_browsing_warning_pages.py TestSafeBrowsingWarningPages.test_warning_pages | TimeoutException: Timed out after 300.2 seconds
[task 2017-04-03T21:20:08.851735Z] 21:20:08     INFO - Traceback (most recent call last):
[task 2017-04-03T21:20:08.852406Z] 21:20:08     INFO -   File "/home/worker/workspace/build/venv/local/lib/python2.7/site-packages/marionette_harness/marionette_test/testcases.py", line 166, in run
[task 2017-04-03T21:20:08.852524Z] 21:20:08     INFO -     testMethod()
[task 2017-04-03T21:20:08.852940Z] 21:20:08     INFO -   File "/home/worker/workspace/build/tests/firefox-ui/tests/functional/security/test_safe_browsing_warning_pages.py", line 60, in test_warning_pages
[task 2017-04-03T21:20:08.853038Z] 21:20:08     INFO -     self.check_report_button(unsafe_page)
[task 2017-04-03T21:20:08.853177Z] 21:20:08     INFO -   File "/home/worker/workspace/build/tests/firefox-ui/tests/functional/security/test_safe_browsing_warning_pages.py", line 90, in check_report_button
[task 2017-04-03T21:20:08.853229Z] 21:20:08     INFO -     expected.element_stale(button))
[task 2017-04-03T21:20:08.853307Z] 21:20:08     INFO -   File "/home/worker/workspace/build/venv/local/lib/python2.7/site-packages/marionette_driver/wait.py", line 150, in until
[task 2017-04-03T21:20:08.853589Z] 21:20:08     INFO -     cause=last_exc)
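
To illustrate why the missing message argument matters, here is a minimal, self-contained sketch of a polling wait. It is not the marionette_driver implementation; `wait_until` and `WaitTimeout` are hypothetical names invented for this example. It only shows how an optional message turns a bare "Timed out after N seconds" into something that says what was being waited for.

```python
import time


class WaitTimeout(Exception):
    """Raised when the condition never becomes truthy (hypothetical name)."""


def wait_until(condition, timeout=5.0, interval=0.1, message=""):
    """Poll `condition` until it returns a truthy value or `timeout` expires.

    `message` is appended to the timeout error so the log explains
    what the test was waiting for.
    """
    start = time.monotonic()
    while True:
        result = condition()
        if result:
            return result
        elapsed = time.monotonic() - start
        if elapsed >= timeout:
            text = "Timed out after {:.1f} seconds".format(elapsed)
            if message:
                text += " with message: " + message
            raise WaitTimeout(text)
        time.sleep(interval)


if __name__ == "__main__":
    try:
        # The condition never holds, similar to a button that never goes stale.
        wait_until(lambda: False, timeout=0.3,
                   message="Report button did not become stale")
    except WaitTimeout as exc:
        print(exc)
```

In the real test the equivalent change would presumably be to pass a descriptive `message` to the existing `Wait().until()` call in `check_report_button`.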

So we are waiting here for the report button to become stale. A timeout means that either the action we perform does not trigger a page load, or a networking issue prevents the target page from loading.
This happened because support.mozilla.org was barely reachable throughout the day. Details can be found in bug 1351498.
Depends on: 1351498

Comment 3

7 months ago
26 failures in 134 pushes (0.194 failures/push) were associated with this bug yesterday.   

Repository breakdown:
* mozilla-central: 11
* mozilla-inbound: 6
* mozilla-beta: 6
* autoland: 3

Platform breakdown:
* linux64: 15
* linux64-nightly: 5
* linux32: 5
* linux32-nightly: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1353182&startday=2017-04-04&endday=2017-04-04&tree=all

Comment 4

7 months ago
39 failures in 867 pushes (0.045 failures/push) were associated with this bug in the last 7 days. 

This is the #46 most frequent failure this week.  

** This failure happened more than 30 times this week! Resolving this bug is a high priority. **

** Try to resolve this bug as soon as possible. If unresolved for 2 weeks, the affected test(s) may be disabled. ** 

Repository breakdown:
* mozilla-central: 12
* mozilla-inbound: 11
* mozilla-beta: 10
* autoland: 6

Platform breakdown:
* linux64: 21
* linux32: 8
* linux64-nightly: 5
* osx-10-10: 2
* osx-10-9: 1
* osx-10-11: 1
* linux32-nightly: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1353182&startday=2017-04-03&endday=2017-04-09&tree=all
I thought this bug was caused by our recent big change in url-classifier, but it seems it was not.
So far the intermittent failure only occurred on April 3, 4, and 5 and then dropped to 0 from April 6, so comment 2 is likely the root cause.
Thanks for verifying this, Thomas.

Since this problem should go away once SUMO becomes reliable again, I'll mark this as WONTFIX.
Status: NEW → RESOLVED
Last Resolved: 7 months ago
Resolution: --- → WONTFIX
We have to keep this open to be able to star the failures in Treeherder. We can close it as WFM once it appears to work fine again. If we close it now, sheriffs might simply file a different bug.
Status: RESOLVED → REOPENED
Resolution: WONTFIX → ---
Blocks: 1350867
I checked Orange Factor and the failures are indeed gone with builds from April 6th:

https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1353182&startday=2017-04-03&endday=2017-04-12&tree=all

That means it was fixed on the SUMO side.

Francois, I wonder if we could get rid of this remote dependency by pointing some preference values to a local HTTP server, or whether you would really like to keep the checks against the real site.
Status: REOPENED → RESOLVED
Last Resolved: 7 months ago
Flags: needinfo?(francois)
Resolution: --- → WORKSFORME
Summary: Intermittent test_safe_browsing_warning_pages.py TestSafeBrowsingWarningPages.test_warning_pages | TimeoutException: Timed out after 300.2 seconds → Intermittent test_safe_browsing_warning_pages.py TestSafeBrowsingWarningPages.test_warning_pages | TimeoutException: Timed out after 300.2 seconds (outage of support.mozilla.org)
(In reply to Henrik Skupin (:whimboo) from comment #8)
> Francois, I wonder if we could get rid of this remote dependency by turning
> some preference values to a local HTTP server, or if you really would like
> to have the checks against the real site.

I think we only care that the button works.

In our test harness, is there a way to capture the request to SUMO and redirect it to some local webserver that will return an empty page (but still a 200)?
Flags: needinfo?(francois) → needinfo?(hskupin)
No, but I assume there is a pref with formatting options that Firefox reads and from which the target URL (SUMO in this case) is built? We could easily update this pref. Maybe this is `app.support.baseURL;https://support.mozilla.org/1/firefox/%VERSION%/%OS%/%LOCALE%/`? If so, we can switch it to our local server in a new bug.
Flags: needinfo?(hskupin) → needinfo?(francois)
(In reply to Henrik Skupin (:whimboo) from comment #10)
> No, but I assume there is a pref with formatting options, which is retrieved
> by Firefox and the target url in this case SUMO is build from it? We could
> easily update this pref. May this is
> `app.support.baseURL;https://support.mozilla.org/1/firefox/%VERSION%/%OS%/
> %LOCALE%/`? If yes, we can get this switched to our local server on a new
> bug.

Yes, it does use that pref.

The code that opens SUMO ("Why was this page blocked?" button) is here: https://searchfox.org/mozilla-central/rev/d4eaa9c2fa54d553349ac88f0c312155a4c6e89e/browser/base/content/browser.js#3195

and it uses this code:

        openHelpLink("phishing-malware", false, "current");

The openHelpLink() function is defined here: https://searchfox.org/mozilla-central/rev/d4eaa9c2fa54d553349ac88f0c312155a4c6e89e/browser/base/content/utilityOverlay.js#906

and uses getHelpLinkUrl(): https://searchfox.org/mozilla-central/rev/d4eaa9c2fa54d553349ac88f0c312155a4c6e89e/browser/base/content/utilityOverlay.js#901

which does use the app.support.baseURL pref.
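
As a rough illustration of how the final help URL comes together, the sketch below substitutes the %NAME% placeholders of the pref value and appends the help topic. This is a Python approximation for discussion only, not the Firefox code; the function names and the example version, OS, and locale values are made up.

```python
def format_base_url(template, values):
    """Replace each %NAME% placeholder in `template` with its value."""
    url = template
    for name, value in values.items():
        url = url.replace("%" + name + "%", value)
    return url


def help_link_url(template, values, topic):
    """Build the help URL: formatted base URL followed by the topic."""
    return format_base_url(template, values) + topic


# Pref value as quoted in this bug.
TEMPLATE = "https://support.mozilla.org/1/firefox/%VERSION%/%OS%/%LOCALE%/"
# Example values are assumptions chosen only for illustration.
VALUES = {"VERSION": "55.0", "OS": "Linux", "LOCALE": "en-US"}

if __name__ == "__main__":
    print(help_link_url(TEMPLATE, VALUES, "phishing-malware"))
    # With the pref pointed at a local server, only the base changes.
    print(help_link_url("http://127.0.0.1:8000/", VALUES, "phishing-malware"))
```

Because the topic is simply appended to the formatted base, changing the pref alone should be enough to move the request away from SUMO.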

So if we can redirect all SUMO requests to a local server that always returns a 200, we should eliminate all of the false positives.
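
A local server of that kind could look roughly like the following sketch, which uses only the Python standard library. It is not the server the Firefox UI test harness ships; `AlwaysOkHandler`, `start_stub_server`, and `fetch_status` are hypothetical names. Every GET is answered with a 200 and a near-empty page, regardless of path.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen


class AlwaysOkHandler(BaseHTTPRequestHandler):
    """Answer every GET with 200 and a minimal HTML page."""

    def do_GET(self):
        body = b"<!DOCTYPE html><title>stub</title>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep test output quiet


def start_stub_server():
    """Start the stub on a free local port and return (server, base_url)."""
    server = HTTPServer(("127.0.0.1", 0), AlwaysOkHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    base_url = "http://127.0.0.1:{}/".format(server.server_address[1])
    return server, base_url


def fetch_status(url):
    """Return the HTTP status code for a GET request to `url`."""
    with urlopen(url, timeout=5) as response:
        return response.status


if __name__ == "__main__":
    server, base_url = start_stub_server()
    try:
        print(fetch_status(base_url + "1/firefox/55.0/Linux/en-US/phishing-malware"))
    finally:
        server.shutdown()
        server.server_close()
```

The test would then set app.support.baseURL to the returned base URL before clicking the button, so the page load no longer depends on support.mozilla.org being reachable.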
Flags: needinfo?(francois)
Great. So I filed bug 1357372.