Closed Bug 1344670 Opened 3 years ago Closed 3 years ago

Permafailure on aurora as beta simulation /2dcontext/transformations/2d.transformation.transform.multiply.html | application crashed [@ mozilla::net::nsSocketTransport::InitiateSocket]

Categories

(Core :: Networking, defect, blocker)

Tracking

()

RESOLVED FIXED
Tracking Status
firefox-esr52 --- unaffected
firefox53 blocking fixed
firefox54 --- wontfix
firefox55 --- wontfix

People

(Reporter: intermittent-bug-filer, Assigned: cbook)

References

(Depends on 1 open bug)

Details

(Keywords: intermittent-failure)

Attachments

(1 file)

I was hitting this on all platforms (Mac, Linux and Windows) in web platform tests during uplift simulation runs with aurora-as-beta builds.

Any idea what could have caused this?
Flags: needinfo?(xeonchen)
Flags: needinfo?(mcmanus)
Flags: needinfo?(drno)
I think Gary is the right contact here. Generically speaking, it means some test is accessing the global internet, and one way that would happen is if the proxy redirection wasn't working as expected.
Flags: needinfo?(mcmanus)
I did a build matching the try run and can confirm that, for reasons I don't know, it's choosing not to use the proxy for example.com in a RELEASE_OR_BETA world. I bet Gary can figure it out.
This is making wpt basically permafail across the board post-uplift. I'm inclined to call it a b1 blocker.
Blocks: 1344705
That's a lot of tests failing and no one is sure why at this point. I'm holding off on the beta 1 builds for now.
Gary: this is blocking the reopening of beta after merge day. Can you take a look?

Also, in general I agree it's a p1 blocker, so I'm updating the severity to blocker.
Severity: normal → blocker
CC'ing James and ms2ger, who are maintaining the wpt tests and the runner. Maybe something has changed in the harness for Firefox 53.
The proxyInfo in the call stack is null, I'm still looking for the root cause.
Flags: needinfo?(xeonchen)
In the web-platform-tests on m-c, it seems no connection to example.com is made.
In [1], the hostname was replaced from |%(server)s| to |example.com|, where the comment says "Make sure self support doesn't hit the network.".

Mythmon, do you have any idea why this breaks in the beta channel?

[1] https://hg.mozilla.org/mozilla-central/rev/1fec53b169d0
Flags: needinfo?(mcooper)
Blocks: 1326225
[1] is the first revision that breaks this test.

[1] https://hg.mozilla.org/releases/mozilla-beta/rev/d89512dab048
(In reply to Gary Chen [:xeonchen] (use ni? please) from comment #11)
> In [1], the hostname was replaced from |%(server)s| to |example.com|, where
> the comment says "Make sure self support doesn't hit the network.".

This change doesn't look correct to me. We should not have hard-coded `example.com` here. Could it be that those prefs are used by the Normandy addon which is only active for release but not nightly builds? If yes that might be the reason why we haven't seen these failures on aurora and central.
Gijs: you reviewed bug 1326225. Could you take a look? Thanks!
Flags: needinfo?(gijskruitbosch+bugs)
Good news: the web platform tests don't use the proxy code. That stuff is unduly complicated.

Why would we run tests in beta that we don't run in nightly? That doesn't seem right. (What's Normandy?)
(In reply to Henrik Skupin (:whimboo) from comment #13)
> (In reply to Gary Chen [:xeonchen] (use ni? please) from comment #11)
> > In [1], the hostname was replaced from |%(server)s| to |example.com|, where
> > the comment says "Make sure self support doesn't hit the network.".
> 
> This change doesn't look correct to me. We should not have hard-coded
> `example.com` here. Could it be that those prefs are used by the Normandy
> addon which is only active for release but not nightly builds? If yes that
> might be the reason why we haven't seen these failures on aurora and central.

No, the code using this pref was simply broken before 1fec53b169d0. This is a revealed bug in the original code, but also, this should be disabled on 53 - bug 1344060. 

It looks like that cset also changed the pre-existing self-support pref, so it might work to just change back the selfsupport.url pref to:


user_pref("browser.selfsupport.url", "https://%(server)s/selfsupport-dummy/");


or to go whole-hog and use hardcoded 'localhost'.
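
For context, `%(server)s` in that pref line is a Python-style named placeholder. A minimal sketch of how such a template line could be filled in, assuming the harness substitutes a local server address into it; the `render_pref_line` helper and the `127.0.0.1:8888` value are hypothetical and not taken from the harness:

```python
# Hypothetical helper: fills the Python-style %(server)s placeholder
# in a single prefs template line. Not the harness's actual code.
PREF_TEMPLATE = (
    'user_pref("browser.selfsupport.url", '
    '"https://%(server)s/selfsupport-dummy/");'
)


def render_pref_line(template, server):
    """Return the template with %(server)s replaced by the given host[:port]."""
    return template % {"server": server}


if __name__ == "__main__":
    # The address below is an assumed local test server, chosen for illustration.
    print(render_pref_line(PREF_TEMPLATE, "127.0.0.1:8888"))
```

With a placeholder like this, the pref always points at whatever server the harness itself started, rather than at a hostname that a given harness may or may not resolve locally.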


Really though, what doesn't make any sense is why this is even happening - example.com is supposed to point to localhost for all our tests. Why isn't it so for the wpt tests? It's only a matter of time until other browser/platform code that uses that hostname hits this.
Flags: needinfo?(gijskruitbosch+bugs)
(In reply to :Gijs from comment #16)
> (In reply to Henrik Skupin (:whimboo) from comment #13)
> > (In reply to Gary Chen [:xeonchen] (use ni? please) from comment #11)
> > > In [1], the hostname was replaced from |%(server)s| to |example.com|, where
> > > the comment says "Make sure self support doesn't hit the network.".
> > 
> > This change doesn't look correct to me. We should not have hard-coded
> > `example.com` here. Could it be that those prefs are used by the Normandy
> > addon which is only active for release but not nightly builds? If yes that
> > might be the reason why we haven't seen these failures on aurora and central.
> 
> No, the code using this pref was simply broken before 1fec53b169d0. This is
> a revealed bug in the original code, but also, this should be disabled on 53
> - bug 1344060. 

Since this bug was/is in the regression range from comment #9, I pushed a try build based on beta without this patch, just to be sure: https://treeherder.mozilla.org/#/jobs?repo=try&revision=20a74c50c014f667a346af5630cd07dd6528adc8
(In reply to :Gijs from comment #16)
> Really though, what doesn't make any sense is why this is even happening -
> example.com is supposed to point to localhost for all our tests. Why isn't
> it so for the wpt tests? It's only a matter of time until other
> browser/platform code that uses that hostname hits this.

Where is the mapping defined for other test harnesses? Could you point that out? Thanks.
example.com isn't supported in web-platform-tests; the set of supported hostnames isn't related to the set in mochitest because it's an agreed set for all the browsers that use it (localhost also isn't expected to work although it may do by accident). 

Can't we disable whatever this feature is in prefs_general.js or in a wpt-specific pref? It doesn't sound like something that wpt is testing.
(In reply to Henrik Skupin (:whimboo) from comment #18)
> (In reply to :Gijs from comment #16)
> > Really though, what doesn't make any sense is why this is even happening -
> > example.com is supposed to point to localhost for all our tests. Why isn't
> > it so for the wpt tests? It's only a matter of time until other
> > browser/platform code that uses that hostname hits this.
> 
> Where is the mapping defined for other test harnesses? Could you point that
> out? Thanks.

https://dxr.mozilla.org/mozilla-beta/source/build/pgo/server-locations.txt

(In reply to James Graham [:jgraham] from comment #19)
> example.com isn't supported in web-platform-tests; the set of supported
> hostnames isn't related to the set in mochitest because it's an agreed set
> for all the browsers that use it (localhost also isn't expected to work
> although it may do by accident). 

Where is this set defined?

> Can't we disable whatever this feature is in prefs_general.js or in a
> wpt-specific pref? It doesn't sound like something that wpt is testing.

You could disable it in a wpt-specific test by setting browser.selfsupport.enabled to false.

I don't think disabling it everywhere (using prefs_general) is a good idea, given that, inasmuch as possible, we need to test what we ship.
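
The wpt-only override mentioned above could be sketched roughly as follows, assuming the suite reads an extra prefs file made of `user_pref(...)` lines. The `format_user_pref` helper is hypothetical, and which file the resulting line belongs in is not settled in this bug:

```python
import json


def format_user_pref(name, value):
    """Serialize one pref as a user_pref(...) line.

    Hypothetical helper. JSON encoding produces valid JS literals
    for bool, int and str values.
    """
    return "user_pref(%s, %s);" % (json.dumps(name), json.dumps(value))


if __name__ == "__main__":
    # Turn self support off for one suite only, leaving the general prefs alone.
    print(format_user_pref("browser.selfsupport.enabled", False))
```

Scoping the override to one suite keeps the feature enabled everywhere else, which fits the "test what we ship" concern.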
https://dxr.mozilla.org/mozilla-central/source/testing/web-platform/tests/README.md and http://web-platform-tests.org/introduction.html document that.

We already disable a lot of things that try to contact external servers, so I don't think disabling this is particularly weird.
Attached patch: fix wpt suite
The suggestion from Gijs does indeed do the trick according to the try run [1], so I'm providing a patch.


[1] https://treeherder.mozilla.org/#/jobs?repo=try&revision=d8a69ba260361fe870831dcdfe79803e8dfb4161
Attachment #8844441 - Flags: review?(gijskruitbosch+bugs)
Attachment #8844441 - Flags: review?(gijskruitbosch+bugs) → review+
(In reply to James Graham [:jgraham] from comment #22)
> https://dxr.mozilla.org/mozilla-central/source/testing/web-platform/tests/
> README.md and http://web-platform-tests.org/introduction.html document that.
> 
> We already disable a lot of things that try to contact external servers, so
> I don't think disabling this is particularly weird.

What about the locations we ship with web-platform-tests?

https://dxr.mozilla.org/mozilla-central/source/testing/web-platform/harness/wptrunner/browsers/server-locations.txt
Hoping that works. If not, this is where we schedule the mozilla-beta on-push linux tests: https://hg.mozilla.org/releases/mozilla-beta/file/tip/taskcluster/taskgraph/target_tasks.py#l142

It's possible we need to add a line to add a match_run_on_projects check: https://hg.mozilla.org/releases/mozilla-beta/file/tip/taskcluster/taskgraph/target_tasks.py#l61
(In reply to Aki Sasaki [:aki] from comment #26)
> Hoping that works. If not, this is where we schedule the mozilla-beta
> on-push linux tests:
> https://hg.mozilla.org/releases/mozilla-beta/file/tip/taskcluster/taskgraph/
> target_tasks.py#l142
> 
> It's possible we need to add a line to add a match_run_on_projects check:
> https://hg.mozilla.org/releases/mozilla-beta/file/tip/taskcluster/taskgraph/
> target_tasks.py#l61

The fix worked; setting the beta tree back in operation.
Flags: needinfo?(drno)
Flags: needinfo?(mcooper)
Resolving per comment #27
Assignee: nobody → cbook
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Depends on: 1345650
I filed bug 1345650 for the less-immediate "so what's the right way to fix this for the future" problem, given that the partial backout has resolved this on beta.
Is this wontfix on 54/55 then in favor of whatever fix comes out of bug 1345650?
Flags: needinfo?(gijskruitbosch+bugs)
(In reply to Ryan VanderMeulen [:RyanVM] from comment #30)
> Is this wontfix on 54/55 then in favor of whatever fix comes out of bug
> 1345650?

I guess so, yeah. I'm not convinced the revert here is what we 'should' do. If people have concrete opinions/ideas, feel free to chime in in bug 1345650.
Flags: needinfo?(gijskruitbosch+bugs)