Open Bug 1385613 Opened 7 years ago Updated 1 year ago

Intermittent firefox-ui\tests\functional\safebrowsing\test_safe_browsing_initial_download.py TestSafeBrowsingInitialDownload.test_safe_browsing_initial_download | AssertionError: 'goog-badbinurl-proto.pset' not found in [] (or test_initial_download.py)

Categories

(Toolkit :: Safe Browsing, defect, P5)

defect

Tracking

()

REOPENED

People

(Reporter: aryx, Unassigned)

References

Details

(Keywords: intermittent-failure, regression)

+++ This bug was initially created as a clone of Bug #1381745 +++

https://treeherder.mozilla.org/logviewer.html#?job_id=119254596&repo=autoland

13:38:26     INFO - TEST-UNEXPECTED-FAIL | test_safe_browsing_initial_download.py TestSafeBrowsingInitialDownload.test_safe_browsing_initial_download | AssertionError: 'goog-badbinurl-proto.pset' not found in []
13:38:26     INFO - Traceback (most recent call last):
13:38:26     INFO -   File "Z:\task_1501333514\build\venv\lib\site-packages\marionette_harness\marionette_test\testcases.py", line 156, in run
13:38:26     INFO -     testMethod()
13:38:26     INFO -   File "Z:\task_1501333514\build\tests\firefox-ui\tests\functional\security\test_safe_browsing_initial_download.py", line 111, in test_safe_browsing_initial_download
13:38:26     INFO -     self.assertIn(f, files_on_disk_google4)
Low failure rate. No "download error" or invalid API keys in the log.
Priority: -- → P5
Most of the cases we got update error = NS_BINDING_ABORTED (cancel update due to timed out)
Some intermittents occur on debug build because missing google api
https://treeherder.mozilla.org/logviewer.html#?repo=try&job_id=120181189&lineNumber=19468
(that's weird, I thought we disabled the tests on debug build)
http://searchfox.org/mozilla-central/rev/b52285fffc13f36eca6b47de735d4e4403b3859e/testing/firefox-ui/tests/functional/security/manifest.ini#12

And even opt mac os 10.10
https://treeherder.mozilla.org/logviewer.html#?repo=try&job_id=120176838&lineNumber=661
after we landed bug 1384024
(In reply to Thomas Nguyen[:tnguyen] ni plz from comment #4)
> Some intermittents occur on debug build because missing google api
> https://treeherder.mozilla.org/logviewer.
> html#?repo=try&job_id=120181189&lineNumber=19468
> (that's weird, I thought we disabled the tests on debug build)
> http://searchfox.org/mozilla-central/rev/
> b52285fffc13f36eca6b47de735d4e4403b3859e/testing/firefox-ui/tests/functional/
> security/manifest.ini#12

This is a PGO/OPT build. Also please note that in some cases we got a couple of wrongly starred test jobs lately.
(In reply to Henrik Skupin (:whimboo) from comment #5)
> (In reply to Thomas Nguyen[:tnguyen] ni plz from comment #4)
> > Some intermittents occur on debug build because missing google api
> > https://treeherder.mozilla.org/logviewer.
> > html#?repo=try&job_id=120181189&lineNumber=19468
> > (that's weird, I thought we disabled the tests on debug build)
> > http://searchfox.org/mozilla-central/rev/
> > b52285fffc13f36eca6b47de735d4e4403b3859e/testing/firefox-ui/tests/functional/
> > security/manifest.ini#12
> 
> This is a PGO/OPT build. Also please note that in some cases we got a couple
> of wrongly starred test jobs lately.

Yes, then in this pgo case, google api is missing, so we could not make an update.
The correct link of debug build with the same reason (missing google api key)
https://treeherder.mozilla.org/logviewer.html#?repo=try&job_id=121124920&lineNumber=19515
(In reply to Thomas Nguyen[:tnguyen] ni plz from comment #6)
> The correct link of debug build with the same reason (missing google api key)
> https://treeherder.mozilla.org/logviewer.
> html#?repo=try&job_id=121124920&lineNumber=19515

As it looks like we fail to produce a correct entry for debug in mozinfo.json on Windows TC workers. I filed bug 1388012 for that.
(In reply to Henrik Skupin (:whimboo) from comment #5)
> (In reply to Thomas Nguyen[:tnguyen] ni plz from comment #4)
> > Some intermittents occur on debug build because missing google api
> > https://treeherder.mozilla.org/logviewer.
> > html#?repo=try&job_id=120181189&lineNumber=19468
> > (that's weird, I thought we disabled the tests on debug build)
> > http://searchfox.org/mozilla-central/rev/
> > b52285fffc13f36eca6b47de735d4e4403b3859e/testing/firefox-ui/tests/functional/
> > security/manifest.ini#12
> 
> This is a PGO/OPT build. Also please note that in some cases we got a couple
> of wrongly starred test jobs lately.

Does that mean that PGO builds somehow aren't getting the API key?

I looked through all of the places that control whether or not secrets are available and they all make the Google key available to Try builds (scm_level = 1):

https://searchfox.org/mozilla-central/search?q=min_scm_level&case=false&regexp=false&path=
(In reply to François Marier [:francois] from comment #8)
> > This is a PGO/OPT build. Also please note that in some cases we got a couple
> > of wrongly starred test jobs lately.
> 
> Does that mean that PGO builds somehow aren't getting the API key?
>
> I looked through all of the places that control whether or not secrets are
> available and they all make the Google key available to Try builds
> (scm_level = 1):

Sorry, but this is outside of my knowledge. Maybe some Releng folks can help here. I will CC Justin in case he knows or can forward to the appropriate person.
Flags: needinfo?(bugspam.Callek)
Ok, afaict these secrets are NOT present on *try* windows builds, but are present on production jobs windows builds.

I'm not sure offhand if try and production are different or what the story is.  They are baked into the windows image itself (via OCC).

I'm going to needinfo wcosta for now, who hopefully will have more information here.
Flags: needinfo?(bugspam.Callek) → needinfo?(wcosta)
(In reply to Justin Wood (:Callek) from comment #10)
> Ok, afaict these secrets are NOT present on *try* windows builds, but are
> present on production jobs windows builds.
> 
> I'm not sure offhand if try and production are different or what the story
> is.  They are baked into the windows image itself (via OCC).
> 
> I'm going to needinfo wcosta for now, who hopefully will have more
> information here.

Hypothetically, if I was completely lost with OCC, who* should I ask help for? 

* Assuming pete and rob are in pto
Flags: needinfo?(wcosta)
(In reply to Wander Lairson Costa [:wcosta] from comment #11)
> * Assuming pete and rob are in pto

Rob is back now! Lets see if he has an idea.
Flags: needinfo?(rthijssen)
in occ, the builder secrets are downloaded at the beginning of the ami creation run:
- code to perform the download and decryption:
  https://github.com/mozilla-releng/OpenCloudConfig/blob/41975ff5/userdata/DynamicConfig.ps1#L42-L59
- manifest of required secrets:
  https://github.com/mozilla-releng/OpenCloudConfig/blob/master/userdata/Manifest/releng-secrets.json
- encrypted secrets (*.gpg):
  https://github.com/mozilla-releng/OpenCloudConfig/tree/master/userdata/Configuration/FirefoxBuildResources

i think it's unlikely that the secrets are actually missing on the failed runs because they are put on the ami very early in the dsc run. if an ami was missing secrets, all builds for the level would fail (rather than intermittent failures) since all spot instances created from the ami should suffer from the missing files, rather than just an occasional instance.

perhaps a try run that includes a debug logging statement to list the contents of c:\builds whenever the assertion at https://dxr.mozilla.org/mozilla-central/rev/5322c03f4c8587fe526172d3f87160031faa6d75/testing/firefox-ui/tests/functional/security/test_safe_browsing_initial_download.py#111 would tell us if the api key file is actually missing?
Flags: needinfo?(rthijssen)
Looks like that recently this only happens on try. So I'm pretty sure that the fix on bug 1388012 also fixed this bug.
Assignee: nobody → cmanchester
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla57
I had the same failures today with a try build which is based on latest mozilla-central, but it still doesn't work:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=82117d502f6e&selectedJob=126477428

Francois, would you mind having a look if the values have changed since bug 1388012 got fixed?
Assignee: cmanchester → nobody
Status: RESOLVED → REOPENED
Flags: needinfo?(francois)
Resolution: FIXED → ---
Summary: Intermittent test_safe_browsing_initial_download.py TestSafeBrowsingInitialDownload.test_safe_browsing_initial_download | AssertionError: 'goog-badbinurl-proto.pset' not found in [] due to outage of Google SafeBrowsing version 4 servers → Intermittent test_safe_browsing_initial_download.py TestSafeBrowsingInitialDownload.test_safe_browsing_initial_download | AssertionError: 'goog-badbinurl-proto.pset' not found in [] for test jobs on try
Target Milestone: mozilla57 → ---
Here's the reason why that Try failed:

 SafeBrowsing: 10:12:44 GMT-0700 (PDT): Provider: google4 updateURL=
 SafeBrowsing: 10:12:44 GMT-0700 (PDT): Provider: google4 gethashURL=
 SafeBrowsing: 10:12:44 GMT-0700 (PDT): getLists: browser.safebrowsing.provider.google4.lists
 ...
 SafeBrowsing: 10:12:44 GMT-0700 (PDT): Invalid update url goog-phish-proto
 SafeBrowsing: 10:12:44 GMT-0700 (PDT): Invalid update url goog-malware-proto
 SafeBrowsing: 10:12:44 GMT-0700 (PDT): Invalid update url goog-unwanted-proto
 SafeBrowsing: 10:12:44 GMT-0700 (PDT): Invalid update url goog-badbinurl-proto
 SafeBrowsing: 10:12:44 GMT-0700 (PDT): Invalid update url goog-downloadwhite-proto
 SafeBrowsing: 10:12:44 GMT-0700 (PDT): Invalid update url goog-passwordwhite-proto

In other words, something is setting both of these prefs to the empty string:

browser.safebrowsing.provider.google4.updateURL
browser.safebrowsing.provider.google4.gethashURL

and also these ones:

browser.safebrowsing.provider.google.updateURL
browser.safebrowsing.provider.google.gethashURL

Probably this code: https://searchfox.org/mozilla-central/rev/cd82cacec2cf734768827ff85ba2dba90a534c5e/toolkit/components/url-classifier/SafeBrowsing.jsm#304-310

which means that the Google API key is missing from that build.
Flags: needinfo?(francois)
Depends on: 1394601
(In reply to François Marier [:francois] from comment #18)
> the Google API key is missing from that build.

It would be great if we could detect that condition at the start of the test and then fail early with a good error message.

Unfortunately, I can't think of a way to peak at the API key from within marionette.
(In reply to François Marier [:francois] from comment #19)
> Unfortunately, I can't think of a way to peak at the API key from within
> marionette.

The easiest way would be to just use `Services.urlFormatter.formatURL()` as you do in SafeBrowsing.jsm, and just execute that via a call to `self.marionette.execute_script()`.
I had a check for try results on Treeherder but wasn't able to see this specific failure at all anymore since Aug 31. Maybe we wait some more days and then close this bug as WFM. Not sure what could have fixed it.
It happened again to me:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=8ae29de1cbc1&selectedJob=129324154

Francois, can you please have a look if something has been changed?
Flags: needinfo?(francois)
It's the same problem:

  SafeBrowsing: 19:47:46 GMT+0000 (Coordinated Universal Time): Missing Google API key, clearing updateURL and gethashURL.

but now it's easier to diagnose because we can just search for "Missing" and it will turn up the log line I recently added.
Flags: needinfo?(francois)
Chris, maybe you have an idea why the Google API key is missing on try for tests based on artifact builds?
Blocks: 1278699
Flags: needinfo?(cmanchester)
I haven't had a chance to test this out on try, but this does seem to be an issue with the opt artifact build mozconfigs, which don't end up including `ac_add_options --with-google-api-keyfile=...`.
Flags: needinfo?(cmanchester)
(In reply to François Marier [:francois] from comment #24)
> It's the same problem:
> 
>   SafeBrowsing: 19:47:46 GMT+0000 (Coordinated Universal Time): Missing
> Google API key, clearing updateURL and gethashURL.
> 
> but now it's easier to diagnose because we can just search for "Missing" and
> it will turn up the log line I recently added.

Francois, what should we do for failures of this type appearing locally when running the test? Should we skip it if no key is available? How could we determine that?
Flags: needinfo?(francois)
(In reply to Henrik Skupin (:whimboo) from comment #39)
> Francois, what should we do for failures of this type appearing locally when
> running the test? Should we skip it if no key is available? How could we
> determine that?

Good question. I would avoid disabling the test when the key is missing because then it wouldn't be as easy to detect that the key is missing in our build environment.

Maybe there's a way to show a message that explains that this test is expected to fail unless you add a Google API key to your mozconfig?

  ac_add_options --with-google-api-keyfile=/home/someone/.gapi
Flags: needinfo?(francois)
I like to work on this bug.  Where can I get started?
Flags: needinfo?(hskupin)
(In reply to François Marier [:francois] from comment #40)
> (In reply to Henrik Skupin (:whimboo) from comment #39)
> > Francois, what should we do for failures of this type appearing locally when
> > running the test? Should we skip it if no key is available? How could we
> > determine that?
> 
> Good question. I would avoid disabling the test when the key is missing
> because then it wouldn't be as easy to detect that the key is missing in our
> build environment.
> 
> Maybe there's a way to show a message that explains that this test is
> expected to fail unless you add a Google API key to your mozconfig?
> 
>   ac_add_options --with-google-api-keyfile=/home/someone/.gapi

Is there a Gecko API we could use to determine if there is a Google API key set? I don't want to always show this message because in cases the key is present and we still fail, should trigger an investigation because that shouldn't happen.
Flags: needinfo?(hskupin) → needinfo?(francois)
Summary: Intermittent test_safe_browsing_initial_download.py TestSafeBrowsingInitialDownload.test_safe_browsing_initial_download | AssertionError: 'goog-badbinurl-proto.pset' not found in [] for test jobs on try → Intermittent testing\firefox-ui\tests\functional\safebrowsing\test_safe_browsing_initial_download.py TestSafeBrowsingInitialDownload.test_safe_browsing_initial_download | AssertionError: 'goog-badbinurl-proto.pset' not found in []
(In reply to Henrik Skupin (:whimboo) from comment #42)
> Is there a Gecko API we could use to determine if there is a Google API key
> set? I don't want to always show this message because in cases the key is
> present and we still fail, should trigger an investigation because that
> shouldn't happen.

This is how it was added to about:support in bug 1336920: https://searchfox.org/mozilla-central/rev/403038737ba75af3842ba6b43b6e2fb47eb06609/toolkit/modules/Troubleshoot.jsm#225-226
Flags: needinfo?(francois)
This failure doesn't seem to happen anymore and I also cannot reproduce it. So I will wait with the implementation of the Google API key check until it's visible again.
Summary: Intermittent testing\firefox-ui\tests\functional\safebrowsing\test_safe_browsing_initial_download.py TestSafeBrowsingInitialDownload.test_safe_browsing_initial_download | AssertionError: 'goog-badbinurl-proto.pset' not found in [] → Intermittent firefox-ui\tests\functional\safebrowsing\test_safe_browsing_initial_download.py TestSafeBrowsingInitialDownload.test_safe_browsing_initial_download | AssertionError: 'goog-badbinurl-proto.pset' not found in [] (or test_initial_download.py)
Note: updated summary so that intermittent try build failures can find this bug easily as well.

Bugbug thinks this bug is a regression, but please revert this change in case of error.

Keywords: regression
Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.