Flip the network.http.spdy.websockets pref to false for Fx65 release users to work around bug 1523427

Status: REOPENED
Type: enhancement
Priority: P2
Severity: critical
Opened: 7 months ago
Updated: 2 months ago

People: (Reporter: RyanVM, Assigned: mythmon)
Tracking: Trunk
Firefox Tracking Flags: (firefox65: blocking, fixed)
Details: (Whiteboard: [necko-triaged])

Per bug 1523427, the feature causing problems can be disabled via setting the network.http.spdy.websockets pref to false. I've confirmed with Dragana that this pref doesn't require a restart to take effect.

This is high urgency as it blocks wider rollout of Fx65.

We should make the recipe affect the beta and release channels for versions 65.0 and 65.0b*. For now, let's leave the recipe as not affecting a future 65.0.1 version as there's a low-risk looking fix that we can possibly take as a ride-along fix.

Mythmon, are you the right person to take this or should the NI be redirected to someone else?

Flags: needinfo?(mcooper)

Confirmed in bug 1523427 #c22 that flipping the pref is an effective workaround.

I've prepared a Normandy recipe for this which is awaiting approval. Ryan, can you review it?

https://delivery-console.prod.mozaws.net/recipe/672/

Flags: needinfo?(mcooper)

Can we roll out this recipe slowly rather than at 100%? I am worried about scenarios that may have gone untested, since network.http.spdy.websockets was true for the entire beta 65 cycle. How about 25% for a day, then 100% a day later?
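
A staged rollout like the one proposed here is commonly implemented by hashing a stable client identifier into a fraction and comparing it with the rollout rate. The sketch below is illustrative only and is not Normandy's actual sampling code; `sample_fraction`, `in_rollout`, and the salt string are hypothetical names.

```python
import hashlib


def sample_fraction(client_id, salt="pref-flip-rollout"):
    """Map a client ID to a stable pseudo-random fraction in [0, 1)."""
    digest = hashlib.sha256(f"{salt}:{client_id}".encode("utf-8")).digest()
    # Use the first 8 bytes of the hash as an unsigned 64-bit integer.
    return int.from_bytes(digest[:8], "big") / 2**64


def in_rollout(client_id, rate):
    """Return True if this client falls inside the given rollout rate."""
    return sample_fraction(client_id) < rate


# Day 1 would use rate=0.25, day 2 would use rate=1.0.
day_one = [cid for cid in range(10_000) if in_rollout(cid, 0.25)]
day_two = [cid for cid in range(10_000) if in_rollout(cid, 1.0)]
```

Because each client's fraction never changes, every client enrolled at 25% stays enrolled when the rate is raised to 100%.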

Hi Dragana, is there a way to narrow down the users that might be impacted by bug 1523427 rather than turning it off for the entire release population? This is to mitigate the risk mentioned in comment 3.

Flags: needinfo?(dd.mozilla)

We did something similar in Bug 1471672.

As noted in comment 3, there is a risk here of some undetected breakage in the websockets code having been introduced between early November and now. We are doing our best to mitigate that risk by a combination of manual testing performed by the QA team in Las Vegas and a Try push off mozilla-release with the pref flipped. That Try push isn't showing any obvious breakage:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=83ddb75d6a294c59d15c46671dee49756bfadb42

QA also didn't notice any observable change in behavior with the pref flipped but was able to confirm that it resolved the originally-reported issue.

We had some offline discussions about this and came to the following conclusions:

  • This recipe needs to target both 64.* and 65.* to ensure that the rule is already in place for users updating from 64 to 65 after we unthrottle updates (otherwise Fx64 clients will see that the recipe doesn't apply and disregard it). This isn't risky because the pref was added by bug 1434137 and will therefore be a no-op for users on 64.
  • Rolling this out gradually or with a restricted audience doesn't seem feasible to me, since we don't have any real way of detecting which users are in an environment with an affected proxy server, and this blocks wider rollout of Fx65. The risk mitigations above give me confidence in flipping the pref as a short-term fix.
Flags: needinfo?(dd.mozilla)

(In reply to Michael Cooper [:mythmon] from comment #2)

This is now approved and published with the changes noted in comment 6. I was also able to confirm that the recipe properly sets network.http.spdy.websockets to false in the following cases:

  • Firefox 64 new profile.
  • Firefox 65 migrated from 64 with previously-enrolled profile.
  • Firefox 65 new profile.

(In reply to Ritu Kothari (:ritu) from comment #4)

There is no way to narrow down the users that might be impacted by bug 1523427.

The pref turns off the websocket over http/2 feature. Firefox will still support websockets over http1. With the pref turned on, if a server does not support the websocket over http/2 feature, Firefox will use websockets over http1. Therefore both code paths were tested in nightly and beta.

The websocket over http/2 feature is new and not many servers support it (this might be the reason we discovered the problem so late). Therefore, I believe that nightly 65 and beta 65 users have mostly used the websockets over http1 code path. The risk of turning off the pref is very low.
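
The fallback behavior described in this comment can be summarized as a small decision function. This is a sketch of the logic as described here, not Necko's implementation; the function name, parameter names, and return strings are hypothetical.

```python
def choose_websocket_transport(pref_enabled, server_supports_ws_over_h2):
    """Pick the websocket code path for a new connection.

    pref_enabled models network.http.spdy.websockets.
    server_supports_ws_over_h2 models whether the server offers
    websockets over http/2.
    """
    if pref_enabled and server_supports_ws_over_h2:
        return "websockets over http/2"
    # Pref off, or server lacks support: use the http1 path.
    return "websockets over http1"
```

With the pref set to false, every call returns the http1 path, which is also the path taken for servers without http/2 websocket support when the pref is true.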

(In reply to Dragana Damjanovic [:dragana] from comment #8)

The details are very helpful, and I'm glad to hear the risk is minuscule. Thanks Dragana!

Assignee: nobody → mcooper
Whiteboard: [necko-triaged]
Priority: -- → P2

Hi mythmon, I've just uplifted the real fix for bug 1523427 to mozilla-release for Monday's 65.0.1 build. Can we please go ahead and update the recipe to versions <65.0.1? Thanks!

Flags: needinfo?(mcooper)

I've updated the recipe to exclude 65.0.1 and above.

Flags: needinfo?(mcooper)

Approved, thanks for the update.

Hey :mythmon -- can we close this bug out at this point?

Flags: needinfo?(mcooper)

I'd like to keep this bug open as long as the Normandy recipe is still active.

This dashboard shows that we still have about 6% of users that have the hotfix active. Disabling the hotfix today would cause these users to revert to the default (broken) state. I think in a couple more weeks we can disable this.

Flags: needinfo?(mcooper)

Is this going to affect 66 at all? Or are we only keeping this alive for 65 users?

Flags: needinfo?(mcooper)

This recipe affects only versions >= 64 and < 65.0.1. Users of 65.0.1 and above are not affected by this recipe.
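
The version range above can be illustrated with a small check. This is a sketch under stated assumptions, not the recipe's actual filter expression: `version_key` and `recipe_applies` are hypothetical helpers, the parser handles only versions shaped like 65.0, 65.0.1, or 65.0b12, and the beta and release channel restriction is taken from the original request in this bug.

```python
import re

_VERSION_RE = re.compile(r"^(\d+)\.(\d+)(?:\.(\d+))?(?:b(\d+))?$")


def version_key(version):
    """Turn a version string such as '65.0b12' into a sortable tuple."""
    match = _VERSION_RE.match(version)
    if match is None:
        raise ValueError(f"unsupported version string: {version!r}")
    major, minor, patch, beta = match.groups()
    # A beta sorts before the final release with the same number.
    stage = (0, int(beta)) if beta is not None else (1, 0)
    return (int(major), int(minor), int(patch or 0), stage)


def recipe_applies(version, channel):
    """Return True for beta/release builds with 64 <= version < 65.0.1."""
    if channel not in {"beta", "release"}:
        return False
    key = version_key(version)
    # Lower bound: any 64.* build. Upper bound: strictly below 65.0.1.
    return key[0] >= 64 and key < version_key("65.0.1")
```

Under these assumptions, `recipe_applies("65.0", "release")` is true, while `recipe_applies("65.0.1", "release")` and `recipe_applies("66.0", "release")` are both false.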

Flags: needinfo?(mcooper)

Yes, I meant, do we want it to affect 66. I'll assume no then.

The real fix for this issue shipped in 65.0.1+.

Status: NEW → RESOLVED
Closed: 2 months ago
Resolution: --- → FIXED

If we haven't killed this recipe yet, we can now.

Flags: needinfo?(mcooper)

In the past week, we've seen 600K users that are still benefiting from this recipe being active. Specifically, that's users in the clients_daily table that were running 65.0 (the build without the fix) and had the hotfix active. For scale, that's about 0.4% of the users we've seen in the last week.

Ryan, do you still want to disable this recipe?

Flags: needinfo?(mcooper)

Ugh. I guess keep it around, but it would be nice if we had a clearer-cut policy around this as I don't think we should be supporting this indefinitely. Fx65 is almost 3 releases behind current now.

Re-opening since we are keeping the experiment active.

Status: RESOLVED → REOPENED
Resolution: FIXED → ---