Intermittent test_safe_browsing_initial_download.py TestSafeBrowsingInitialDownload.test_safe_browsing_initial_download | AssertionError: 'goog-badbinurl-proto.pset' not found in [] for test jobs on try

REOPENED
Unassigned

Status

()

Toolkit
Safe Browsing
P5
normal
REOPENED
5 months ago
9 days ago

People

(Reporter: aryx, Unassigned)

Tracking

(Blocks: 1 bug, {intermittent-failure})

Firefox Tracking Flags

(Not tracked)

Details

+++ This bug was initially created as a clone of Bug #1381745 +++

https://treeherder.mozilla.org/logviewer.html#?job_id=119254596&repo=autoland

13:38:26     INFO - TEST-UNEXPECTED-FAIL | test_safe_browsing_initial_download.py TestSafeBrowsingInitialDownload.test_safe_browsing_initial_download | AssertionError: 'goog-badbinurl-proto.pset' not found in []
13:38:26     INFO - Traceback (most recent call last):
13:38:26     INFO -   File "Z:\task_1501333514\build\venv\lib\site-packages\marionette_harness\marionette_test\testcases.py", line 156, in run
13:38:26     INFO -     testMethod()
13:38:26     INFO -   File "Z:\task_1501333514\build\tests\firefox-ui\tests\functional\security\test_safe_browsing_initial_download.py", line 111, in test_safe_browsing_initial_download
13:38:26     INFO -     self.assertIn(f, files_on_disk_google4)

Comment 1

5 months ago
2 failures in 1008 pushes (0.002 failures/push) were associated with this bug in the last 7 days.   

Repository breakdown:
* autoland: 2

Platform breakdown:
* windows10-64: 1
* linux32: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1385613&startday=2017-07-24&endday=2017-07-30&tree=all
Low failure rate. No "download error" or invalid API keys in the log.
Priority: -- → P5

Comment 3

4 months ago
25 failures in 888 pushes (0.028 failures/push) were associated with this bug in the last 7 days.   

Repository breakdown:
* try: 17
* autoland: 4
* mozilla-inbound: 3
* mozilla-central: 1

Platform breakdown:
* windows7-32: 10
* windows10-64: 9
* linux64: 3
* osx-10-10: 2
* linux32: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1385613&startday=2017-07-31&endday=2017-08-06&tree=all
Most of the cases we got update error = NS_BINDING_ABORTED (cancel update due to timed out)
Some intermittents occur on debug build because missing google api
https://treeherder.mozilla.org/logviewer.html#?repo=try&job_id=120181189&lineNumber=19468
(that's weird, I thought we disabled the tests on debug build)
http://searchfox.org/mozilla-central/rev/b52285fffc13f36eca6b47de735d4e4403b3859e/testing/firefox-ui/tests/functional/security/manifest.ini#12

And even opt mac os 10.10
https://treeherder.mozilla.org/logviewer.html#?repo=try&job_id=120176838&lineNumber=661
after we landed bug 1384024
(In reply to Thomas Nguyen[:tnguyen] ni plz from comment #4)
> Some intermittents occur on debug build because missing google api
> https://treeherder.mozilla.org/logviewer.
> html#?repo=try&job_id=120181189&lineNumber=19468
> (that's weird, I thought we disabled the tests on debug build)
> http://searchfox.org/mozilla-central/rev/
> b52285fffc13f36eca6b47de735d4e4403b3859e/testing/firefox-ui/tests/functional/
> security/manifest.ini#12

This is a PGO/OPT build. Also please note that in some cases we got a couple of wrongly starred test jobs lately.
(In reply to Henrik Skupin (:whimboo) from comment #5)
> (In reply to Thomas Nguyen[:tnguyen] ni plz from comment #4)
> > Some intermittents occur on debug build because missing google api
> > https://treeherder.mozilla.org/logviewer.
> > html#?repo=try&job_id=120181189&lineNumber=19468
> > (that's weird, I thought we disabled the tests on debug build)
> > http://searchfox.org/mozilla-central/rev/
> > b52285fffc13f36eca6b47de735d4e4403b3859e/testing/firefox-ui/tests/functional/
> > security/manifest.ini#12
> 
> This is a PGO/OPT build. Also please note that in some cases we got a couple
> of wrongly starred test jobs lately.

Yes, then in this pgo case, google api is missing, so we could not make an update.
The correct link of debug build with the same reason (missing google api key)
https://treeherder.mozilla.org/logviewer.html#?repo=try&job_id=121124920&lineNumber=19515
(In reply to Thomas Nguyen[:tnguyen] ni plz from comment #6)
> The correct link of debug build with the same reason (missing google api key)
> https://treeherder.mozilla.org/logviewer.
> html#?repo=try&job_id=121124920&lineNumber=19515

As it looks like we fail to produce a correct entry for debug in mozinfo.json on Windows TC workers. I filed bug 1388012 for that.
(In reply to Henrik Skupin (:whimboo) from comment #5)
> (In reply to Thomas Nguyen[:tnguyen] ni plz from comment #4)
> > Some intermittents occur on debug build because missing google api
> > https://treeherder.mozilla.org/logviewer.
> > html#?repo=try&job_id=120181189&lineNumber=19468
> > (that's weird, I thought we disabled the tests on debug build)
> > http://searchfox.org/mozilla-central/rev/
> > b52285fffc13f36eca6b47de735d4e4403b3859e/testing/firefox-ui/tests/functional/
> > security/manifest.ini#12
> 
> This is a PGO/OPT build. Also please note that in some cases we got a couple
> of wrongly starred test jobs lately.

Does that mean that PGO builds somehow aren't getting the API key?

I looked through all of the places that control whether or not secrets are available and they all make the Google key available to Try builds (scm_level = 1):

https://searchfox.org/mozilla-central/search?q=min_scm_level&case=false&regexp=false&path=
(In reply to François Marier [:francois] from comment #8)
> > This is a PGO/OPT build. Also please note that in some cases we got a couple
> > of wrongly starred test jobs lately.
> 
> Does that mean that PGO builds somehow aren't getting the API key?
>
> I looked through all of the places that control whether or not secrets are
> available and they all make the Google key available to Try builds
> (scm_level = 1):

Sorry, but this is outside of my knowledge. Maybe some Releng folks can help here. I will CC Justin in case he knows or can forward to the appropriate person.
Flags: needinfo?(bugspam.Callek)
Ok, afaict these secrets are NOT present on *try* windows builds, but are present on production jobs windows builds.

I'm not sure offhand if try and production are different or what the story is.  They are baked into the windows image itself (via OCC).

I'm going to needinfo wcosta for now, who hopefully will have more information here.
Flags: needinfo?(bugspam.Callek) → needinfo?(wcosta)
(In reply to Justin Wood (:Callek) from comment #10)
> Ok, afaict these secrets are NOT present on *try* windows builds, but are
> present on production jobs windows builds.
> 
> I'm not sure offhand if try and production are different or what the story
> is.  They are baked into the windows image itself (via OCC).
> 
> I'm going to needinfo wcosta for now, who hopefully will have more
> information here.

Hypothetically, if I was completely lost with OCC, who* should I ask help for? 

* Assuming pete and rob are in pto
Flags: needinfo?(wcosta)
Depends on: 1388012
(In reply to Wander Lairson Costa [:wcosta] from comment #11)
> * Assuming pete and rob are in pto

Rob is back now! Lets see if he has an idea.
Flags: needinfo?(rthijssen)
in occ, the builder secrets are downloaded at the beginning of the ami creation run:
- code to perform the download and decryption:
  https://github.com/mozilla-releng/OpenCloudConfig/blob/41975ff5/userdata/DynamicConfig.ps1#L42-L59
- manifest of required secrets:
  https://github.com/mozilla-releng/OpenCloudConfig/blob/master/userdata/Manifest/releng-secrets.json
- encrypted secrets (*.gpg):
  https://github.com/mozilla-releng/OpenCloudConfig/tree/master/userdata/Configuration/FirefoxBuildResources

i think it's unlikely that the secrets are actually missing on the failed runs because they are put on the ami very early in the dsc run. if an ami was missing secrets, all builds for the level would fail (rather than intermittent failures) since all spot instances created from the ami should suffer from the missing files, rather than just an occasional instance.

perhaps a try run that includes a debug logging statement to list the contents of c:\builds whenever the assertion at https://dxr.mozilla.org/mozilla-central/rev/5322c03f4c8587fe526172d3f87160031faa6d75/testing/firefox-ui/tests/functional/security/test_safe_browsing_initial_download.py#111 would tell us if the api key file is actually missing?
Flags: needinfo?(rthijssen)

Comment 14

4 months ago
9 failures in 949 pushes (0.009 failures/push) were associated with this bug in the last 7 days.   

Repository breakdown:
* mozilla-inbound: 4
* try: 2
* autoland: 2
* mozilla-beta: 1

Platform breakdown:
* windows7-32: 3
* windows10-64: 3
* linux64: 2
* windows10-64-nightly: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1385613&startday=2017-08-14&endday=2017-08-20&tree=all

Comment 15

4 months ago
6 failures in 908 pushes (0.007 failures/push) were associated with this bug in the last 7 days.   

Repository breakdown:
* try: 6

Platform breakdown:
* windows7-32: 3
* windows10-64: 2
* linux64-stylo: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1385613&startday=2017-08-21&endday=2017-08-27&tree=all
Looks like that recently this only happens on try. So I'm pretty sure that the fix on bug 1388012 also fixed this bug.
Assignee: nobody → cmanchester
Status: NEW → RESOLVED
Last Resolved: 4 months ago
status-firefox57: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla57
I had the same failures today with a try build which is based on latest mozilla-central, but it still doesn't work:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=82117d502f6e&selectedJob=126477428

Francois, would you mind having a look if the values have changed since bug 1388012 got fixed?
Assignee: cmanchester → nobody
Status: RESOLVED → REOPENED
status-firefox57: fixed → ---
Flags: needinfo?(francois)
Resolution: FIXED → ---
Summary: Intermittent test_safe_browsing_initial_download.py TestSafeBrowsingInitialDownload.test_safe_browsing_initial_download | AssertionError: 'goog-badbinurl-proto.pset' not found in [] due to outage of Google SafeBrowsing version 4 servers → Intermittent test_safe_browsing_initial_download.py TestSafeBrowsingInitialDownload.test_safe_browsing_initial_download | AssertionError: 'goog-badbinurl-proto.pset' not found in [] for test jobs on try
Target Milestone: mozilla57 → ---
Here's the reason why that Try failed:

 SafeBrowsing: 10:12:44 GMT-0700 (PDT): Provider: google4 updateURL=
 SafeBrowsing: 10:12:44 GMT-0700 (PDT): Provider: google4 gethashURL=
 SafeBrowsing: 10:12:44 GMT-0700 (PDT): getLists: browser.safebrowsing.provider.google4.lists
 ...
 SafeBrowsing: 10:12:44 GMT-0700 (PDT): Invalid update url goog-phish-proto
 SafeBrowsing: 10:12:44 GMT-0700 (PDT): Invalid update url goog-malware-proto
 SafeBrowsing: 10:12:44 GMT-0700 (PDT): Invalid update url goog-unwanted-proto
 SafeBrowsing: 10:12:44 GMT-0700 (PDT): Invalid update url goog-badbinurl-proto
 SafeBrowsing: 10:12:44 GMT-0700 (PDT): Invalid update url goog-downloadwhite-proto
 SafeBrowsing: 10:12:44 GMT-0700 (PDT): Invalid update url goog-passwordwhite-proto

In other words, something is setting both of these prefs to the empty string:

browser.safebrowsing.provider.google4.updateURL
browser.safebrowsing.provider.google4.gethashURL

and also these ones:

browser.safebrowsing.provider.google.updateURL
browser.safebrowsing.provider.google.gethashURL

Probably this code: https://searchfox.org/mozilla-central/rev/cd82cacec2cf734768827ff85ba2dba90a534c5e/toolkit/components/url-classifier/SafeBrowsing.jsm#304-310

which means that the Google API key is missing from that build.
Flags: needinfo?(francois)
Depends on: 1394601
(In reply to François Marier [:francois] from comment #18)
> the Google API key is missing from that build.

It would be great if we could detect that condition at the start of the test and then fail early with a good error message.

Unfortunately, I can't think of a way to peak at the API key from within marionette.
(In reply to François Marier [:francois] from comment #19)
> Unfortunately, I can't think of a way to peak at the API key from within
> marionette.

The easiest way would be to just use `Services.urlFormatter.formatURL()` as you do in SafeBrowsing.jsm, and just execute that via a call to `self.marionette.execute_script()`.

Comment 21

3 months ago
9 failures in 939 pushes (0.01 failures/push) were associated with this bug in the last 7 days.   

Repository breakdown:
* try: 8
* mozilla-beta: 1

Platform breakdown:
* windows10-64: 5
* windows7-32: 2
* osx-10-10: 1
* linux32-devedition: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1385613&startday=2017-08-28&endday=2017-09-03&tree=all
I had a check for try results on Treeherder but wasn't able to see this specific failure at all anymore since Aug 31. Maybe we wait some more days and then close this bug as WFM. Not sure what could have fixed it.
It happened again to me:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=8ae29de1cbc1&selectedJob=129324154

Francois, can you please have a look if something has been changed?
Flags: needinfo?(francois)
It's the same problem:

  SafeBrowsing: 19:47:46 GMT+0000 (Coordinated Universal Time): Missing Google API key, clearing updateURL and gethashURL.

but now it's easier to diagnose because we can just search for "Missing" and it will turn up the log line I recently added.
Flags: needinfo?(francois)
Chris, maybe you have an idea why the Google API key is missing on try for tests based on artifact builds?
Blocks: 1278699
Flags: needinfo?(cmanchester)

Comment 26

3 months ago
2 failures in 924 pushes (0.002 failures/push) were associated with this bug in the last 7 days.   

Repository breakdown:
* try: 2

Platform breakdown:
* windows7-32: 1
* windows10-64: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1385613&startday=2017-09-04&endday=2017-09-10&tree=all
I haven't had a chance to test this out on try, but this does seem to be an issue with the opt artifact build mozconfigs, which don't end up including `ac_add_options --with-google-api-keyfile=...`.
Flags: needinfo?(cmanchester)

Comment 28

3 months ago
3 failures in 1032 pushes (0.003 failures/push) were associated with this bug in the last 7 days.    

Repository breakdown:
* try: 3

Platform breakdown:
* windows10-64: 2
* windows7-32: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1385613&startday=2017-09-11&endday=2017-09-17&tree=all

Comment 29

2 months ago
1 failures in 885 pushes (0.001 failures/push) were associated with this bug in the last 7 days.    

Repository breakdown:
* try: 1

Platform breakdown:
* windows7-32: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1385613&startday=2017-09-25&endday=2017-10-01&tree=all

Comment 30

2 months ago
1 failures in 824 pushes (0.001 failures/push) were associated with this bug in the last 7 days.    

Repository breakdown:
* try: 1

Platform breakdown:
* windows10-64: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1385613&startday=2017-10-02&endday=2017-10-08&tree=all

Comment 31

2 months ago
1 failures in 947 pushes (0.001 failures/push) were associated with this bug in the last 7 days.    

Repository breakdown:
* try: 1

Platform breakdown:
* windows7-32: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1385613&startday=2017-10-09&endday=2017-10-15&tree=all

Comment 32

a month ago
2 failures in 912 pushes (0.002 failures/push) were associated with this bug in the last 7 days.    

Repository breakdown:
* mozilla-inbound: 1
* mozilla-central: 1

Platform breakdown:
* osx-10-10: 2

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1385613&startday=2017-10-23&endday=2017-10-29&tree=all
4 failures in 792 pushes (0.005 failures/push) were associated with this bug in the last 7 days.    

Repository breakdown:
* try: 4

Platform breakdown:
* windows10-64: 2
* windows7-32: 1
* osx-10-10: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1385613&startday=2017-11-27&endday=2017-12-03&tree=all
You need to log in before you can comment on or make changes to this bug.