Closed Bug 1501040 Opened 11 months ago Closed 11 months ago

Perma beta raptor-firefox-tp6 JavaScript error: resource://gre/modules/addons/XPIProvider.jsm, line 2600: TypeError: addon is null; can't access its "seen" property

Categories

(Testing :: Raptor, defect, P1)

defect

Tracking

(firefox-esr60 unaffected, firefox63 unaffected, firefox64+ fixed, firefox65+ fixed)

RESOLVED FIXED
mozilla65
Tracking Status
firefox-esr60 --- unaffected
firefox63 --- unaffected
firefox64 + fixed
firefox65 + fixed

People

(Reporter: intermittent-bug-filer, Assigned: rwood)

References

Details

(Keywords: intermittent-failure, Whiteboard: [stockwell disable-recommended])

Attachments

(1 file)

Filed by: archaeopteryx [at] coole-files.de

https://treeherder.mozilla.org/logviewer.html#?job_id=207024905&repo=mozilla-beta

https://queue.taskcluster.net/v1/task/EZ_cCZIZSYyeeALKKxBw0A/runs/0/artifacts/public/logs/live_backing.log

In bug 1495310 Talos tp6 got turned off for beta and Raptor Firefox tp6 turned on because Talos tp6 was broken. Raptor tp6 fails on beta due to addon signing issues. The test of Raptor tp6 accidentally used raptor chrome.

If Raptor tp6 gets turned off for tp6, there won't be a tp6 measurement.

09:50:32     INFO -  Application command: /home/cltbld/workspace/build/application/firefox/firefox -profile /tmp/tmpozcCGe.mozrunner
09:50:32     INFO -  PID 29152 | 1540227032333	addons.xpi-utils	WARN	Add-on raptor@mozilla.org is not correctly signed.
09:50:32     INFO -  PID 29152 | 1540227032333	addons.xpi-utils	WARN	Add-on raptor@mozilla.org is not correctly signed.
09:50:32     INFO -  PID 29152 | 1540227032333	addons.xpi-utils	WARN	addMetadata: Add-on raptor@mozilla.org is invalid: Error: Extension raptor@mozilla.org is not correctly signed(resource://gre/modules/addons/XPIDatabase.jsm:2426:17) JS Stack trace: addMetadata@XPIDatabase.jsm:2426:17
09:50:33    ERROR -  PID 29152 | JavaScript error: resource://gre/modules/addons/XPIProvider.jsm, line 2600: TypeError: addon is null; can't access its "seen" property
10:02:58     INFO -  raptor-main application timed out after 750 seconds
10:03:14     INFO -  raptor-mitmproxy Stopping mitmproxy playback, klling process 29145
10:03:14     INFO -  Replaying from files: ['/home/cltbld/workspace/testing/raptor/amazon.mp']
10:03:24     INFO -  raptor-mitmproxy Successfully killed the mitmproxy playback process
10:03:24     INFO -  raptor-mitmproxy Turning off the browser proxy
10:03:24     INFO -  raptor-main removing webext /home/cltbld/workspace/build/tests/raptor/webext/raptor
10:03:24    ERROR -  Traceback (most recent call last):
10:03:24     INFO -    File "/home/cltbld/workspace/build/tests/raptor/raptor/raptor.py", line 349, in <module>
10:03:24     INFO -      main()
10:03:24     INFO -    File "/home/cltbld/workspace/build/tests/raptor/raptor/raptor.py", line 327, in main
10:03:24     INFO -      raptor.run_test(next_test, timeout=int(next_test['page_timeout']))
10:03:24     INFO -    File "/home/cltbld/workspace/build/tests/raptor/raptor/raptor.py", line 259, in run_test
10:03:24     INFO -      self.profile.addons.remove_addon(webext_id)
10:03:24     INFO -    File "/home/cltbld/workspace/build/venv/lib/python2.7/site-packages/mozprofile/addons.py", line 329, in remove_addon
10:03:24     INFO -      path = self.get_addon_path(addon_id)
10:03:24     INFO -    File "/home/cltbld/workspace/build/venv/lib/python2.7/site-packages/mozprofile/addons.py", line 108, in get_addon_path
10:03:24     INFO -      raise IOError('Add-on not found: %s' % addon_id)
10:03:24     INFO -  IOError: Add-on not found: raptor@mozilla.org
Robert, please take a look and decide what should be done with tp6 on beta. Thank you.
Flags: needinfo?(rwood)
:aryx, is this perma fail?  why did it work on a beta simulation last week?
Flags: needinfo?(aryx.bugmail)
It's a permafail and the job added last week was "Rap-C-..." which is raptor-chrome (which passes), we need raptor-firefox.
Flags: needinfo?(aryx.bugmail)
:aswan for our testing web extensions on mozilla-beta, how do we sign them?  you have been converting many talos extensions to be web extensions, here is our raptor web extension: https://searchfox.org/mozilla-central/source/testing/raptor/webext/raptor
Flags: needinfo?(aswan)
Ideally we wouldn't sign something that's used in automation.  How were talos-powers and pageloader handled prior to switching from talos to raptor?
Flags: needinfo?(aswan) → needinfo?(jmaher)
I believe it was a pref and env var:
pref: security.turn_off_all_security_so_that_viruses_can_take_over_this_computer=true
env: self.env['MOZ_DISABLE_NONLOCAL_CONNECTIONS'] = '1'

I see the pref set for raptor, but not the env var, although for tp6 I thought we had to turn off the env var to start with since we would spoof dns for all traffic that was recorded.

:aswan, does this sound right?
Flags: needinfo?(jmaher)
Yes, the combination of that pref and that environment variable (plus a few other prefs like xpinstall.signatures.enabled and extensions.legacy.enabled) will allow unsigned privileged extensions to run on beta.  But as you say, one of the talos tests (tp5 or tp6 or both?) uses mitmproxy and so cannot set the DISABLE_NONLOCAL_CONNECTIONS env var.
I'm a little lost here, are we talking about a test that was previously running on beta with talos or about enabling a new test that we haven't ever run on beta before?
thanks for the clarification :aswan.

tp6 was perma failing on beta (windows only), we will be running tp6 in raptor soon, so we decided to run raptor-tp6 on mozilla-beta instead of talos tp6.  Now we find that raptor tp6 is failing due to signing.  I will look into this with :rwood when he is back from pto; I think we need to figure out why tp6 worked on beta 2 weeks ago running in talos and why this fails now in raptor.  Maybe this is related to some of the addons -> extension work.


here is a push from Monday showing talos tp6 failures:
https://treeherder.mozilla.org/#/jobs?repo=mozilla-beta&searchStr=tp6&revision=aba25ceab10f829e07c36ab1fd9fdbb83d4774c2

from the log of win10, I see:
14:54:17     INFO -  PID 6520 | 1540220057112	addons.xpi-utils	WARN	Add-on pageloader@mozilla.org is not correctly signed.
14:54:17     INFO -  PID 6520 | 1540220057115	addons.xpi-utils	WARN	Add-on talos-powers@mozilla.org is not correctly signed.
[taskcluster:error] Aborting task...


this would explain why we hit max timeout and aborted the tasks- the addons were not properly signed- but why on windows only?  no errors like that in the linux logs.
Ugh, this is starting to show up on Raptor jobs on Windows 10 on try now [1] (in a totally unrelated change):

20:43:34     INFO -  PID 7288 | 1540413814455	addons.xpi-utils	WARN	Add-on raptor@mozilla.org is not correctly signed.
20:43:34     INFO -  PID 7288 | 1540413814455	addons.xpi-utils	WARN	Add-on raptor@mozilla.org is not correctly signed.
20:43:34     INFO -  PID 7288 | 1540413814455	addons.xpi-utils	WARN	addMetadata: Add-on raptor@mozilla.org is invalid: Error: Extension raptor@mozilla.org is not correctly signed(resource://gre/modules/addons/XPIDatabase.jsm:2426:17) JS Stack trace: addMetadata@XPIDatabase.jsm:2426:17

So something has definitely changed in Firefox, that is not letting our webext load.

[1] https://treeherder.mozilla.org/#/jobs?repo=try&revision=550a25adb6816e772a2a46757773533f54826000&selectedJob=207615797
Flags: needinfo?(rwood)
Also on OSX on try too, so it's not just Win.

14:13:36     INFO -  PID 584 | 1540415616659	addons.xpi-utils	WARN	Add-on raptor@mozilla.org is not correctly signed.
14:13:36     INFO -  PID 584 | 1540415616659	addons.xpi-utils	WARN	Add-on raptor@mozilla.org is not correctly signed.
14:13:36     INFO -  PID 584 | 1540415616659	addons.xpi-utils	WARN	addMetadata: Add-on raptor@mozilla.org is invalid: Error: Extension raptor@mozilla.org is not correctly signed(resource://gre/modules/addons/XPIDatabase.jsm:2426:17) JS Stack trace: addMetadata@XPIDatabase.jsm:2426:17
Ah. So this was broken in Talos on beta even before Raptor was turned on in Beta. This looks like the change that originally introduced this problem on Talos (and seen on Raptor when it was turned on):

https://hg.mozilla.org/releases/mozilla-beta/rev/a6996c5491a5ff1f4aee11ceb090a3710c3a365a#l10.15
(In reply to Robert Wood [:rwood] from comment #12)
> Ah. So this was broken in Talos on beta even before Raptor was turned on in
> Beta. This looks like the change that originally introduced this problem on
> Talos (and seen on Raptor when it was turned on):
> 
> https://hg.mozilla.org/releases/mozilla-beta/rev/a6996c5491a5ff1f4aee11ceb090a3710c3a365a#l10.15

SO if I am understanding correctly, it looks like for R64 this build config was changed:

-# Disable enforcing that add-ons are signed by the trusted root
-MOZ_REQUIRE_SIGNING=${MOZ_REQUIRE_SIGNING-0}
+# Enable enforcing that add-ons are signed by the trusted root
+MOZ_REQUIRE_SIGNING=${MOZ_REQUIRE_SIGNING-1}

And that is causing our talos (and raptor) webext not to be able to load.

:aswan, :jmaher, any idea about this setting or how we can get around it? There's no bug associated with that merge/change so I really don't know whom to ask or how to find out why this setting was turned on etc.
Flags: needinfo?(jmaher)
Flags: needinfo?(aswan)
Signing has been required on beta for some time, I suspect that's a patch that gets applied automatically every time we uplift from central to beta.  As I said earlier, I have no idea how this ever worked, other bits of automation use the combination Joel mentioned in comment 6 but tp6 wants to use non-local network connections so I have no idea how it ever worked on beta.
Flags: needinfo?(aswan)
Thanks Andrew. 

Maybe there was something in the merge commit immediately before that, that caused this change in behaviour - before the merge for R64 talos tp6 was running fine on beta:

https://treeherder.mozilla.org/#/jobs?repo=mozilla-beta&searchStr=talos%2Ctp6&tochange=a6996c5491a5ff1f4aee11ceb090a3710c3a365a&fromchange=2704bbd93c5655757d331b7463bcbf49ab497956
Ryan, can you help us understand what's going on here?
Flags: needinfo?(ryanvm)
I don't know what you're asking me for here. All comment 15 is saying is that it worked when 63 was on Beta and doesn't with 64. I have no idea what might have changed over the course of that development cycle to make that be the case. With some hackery, you could probably try to bisect it on Try, but I don't have cycles to help there.

And yes, that MOZ_REQUIRE_SIGNING change is something we do on every merge day, it's not new to 64.
Flags: needinfo?(ryanvm)
Sorry for the vague question Ryan.  The MOZ_REQUIRE_SIGNING change appears to be a red herring.

Here's an old (arbitrarily chosen) beta run where the tests passed:
https://treeherder.mozilla.org/#/jobs?repo=mozilla-beta&searchStr=talos%2Ctp6&revision=095138ddfce66ad816470b4f0132e700d859f00a

But signing was required in that build:
https://hg.mozilla.org/releases/mozilla-beta/file/095138ddfce66ad816470b4f0132e700d859f00a/build/mozconfig.common
Ahh: in the merge was the change in Bug 1493867 to convert talos pageloader to a webext... so talos tp6 on beta (green) before that was when it was the old pageloader add-on.
Its still not clear to me how the old pageloader extension worked on beta.  Even if there's some signed copy of it laying around somewhere, it was a bootstrapped extension, which we don't allow to run on beta.
(In reply to Robert Wood [:rwood] from comment #22)

> https://treeherder.mozilla.org/#/jobs?repo=try&revision=43b51d27b7a11a6038716e689870401d385a45ce

I can reproduce the issue in Raptor tp6 on try by setting the MOZ_REQUIRE_SIGNING=${MOZ_REQUIRE_SIGNING-1} config to emulate beta, but interesting on try that doesn't cause the same failure on talos tp6.
Assignee: nobody → rwood
Status: NEW → ASSIGNED
(In reply to Robert Wood [:rwood] from comment #36)
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=13d3c81d191e0172011d6ee7d357c9fa30c932bc

Good news: This solution seems to fix the issue without having to add an additional firefox startup/shutdown before the test.

On the try pushes I set the signing requirement in mozconfig.common like it is on mozilla-beta (+MOZ_REQUIRE_SIGNING=${MOZ_REQUIRE_SIGNING-1}) and this patch allows the unsigned webext to load anyway. I can't guarantee it would definitely fix the issue on mozilla-beta though until we run it there, but I think it's worth a go.
Pushed by rwood@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/236d1f315ea7
Raptor fix so unsigned webext will load on mozilla-beta; r=jmaher
Blocks: 1481867
https://hg.mozilla.org/mozilla-central/rev/236d1f315ea7
Status: ASSIGNED → RESOLVED
Closed: 11 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla65
Comment on attachment 9021719 [details]
Bug 1501040 - Raptor fix so unsigned webext will load on mozilla-beta; r?jmaher

[Beta/Release Uplift Approval Request]

Feature/Bug causing the regression: Bug 1501040

User impact if declined: 

Is this code covered by automated tests?: Yes

Has the fix been verified in Nightly?: Yes

Needs manual test from QE?: No

If yes, steps to reproduce: 

List of other uplifts needed: None

Risk to taking this patch: Low

Why is the change risky/not risky? (and alternatives if risky): 

String changes made/needed:
Attachment #9021719 - Flags: approval-mozilla-beta?
This looks good on mozilla-central.
Whiteboard: [stockwell disable-recommended] → [stockwell disable-recommended] [checkin-needed-beta]
Comment on attachment 9021719 [details]
Bug 1501040 - Raptor fix so unsigned webext will load on mozilla-beta; r?jmaher

test-only changes don't need approval to be uplifted
Attachment #9021719 - Flags: approval-mozilla-beta?
https://hg.mozilla.org/releases/mozilla-beta/rev/560ec5d63826
Flags: in-testsuite+
Whiteboard: [stockwell disable-recommended] [checkin-needed-beta] → [stockwell disable-recommended]
Depends on: 1504843
Issue in a script, fixed!
Version: Version 3 → Trunk
Priority: P5 → P1
You need to log in before you can comment on or make changes to this bug.