Closed Bug 1055145 Opened 6 years ago Closed 6 years ago

Remove push server fallback from Loop Client

Categories

(Hello (Loop) :: Client, defect, P1)

defect

Tracking

(firefox35 fixed, firefox36 fixed, firefox37 fixed)

RESOLVED FIXED
mozilla37
Iteration:
37.2
Tracking Status
firefox35 --- fixed
firefox36 --- fixed
firefox37 --- fixed
Blocking Flags:
backlog Fx35+

People

(Reporter: abr, Assigned: pkerr)

References

Details

(Whiteboard: [tech-debt])

Attachments

(2 files, 1 obsolete file)

After Bug 1055143 is deployed to the production servers, we need to remove the fall-back to "services.push.serverURL" that was introduced in Bug 1055139. See the description on that bug for the rationale.
Whiteboard: [tech-debt]
backlog: --- → Fx36+
backlog: Fx36+ → Fx37+
is this somethign we can do in fx38 or do we need in fx37
Flags: needinfo?(adam)
(In reply to sescalante from comment #1)
> is this somethign we can do in fx38 or do we need in fx37

The problem is that this is kind of a ticking timebomb: if we ramp up to heavy load and then have some hiccup in the HTTP endpoint that distributes push server names, all the loop clients will fall back to the push server used by other services (e.g., "find my phone") and almost certainly knock it over. We need to remove the fallback before our load reaches that point.

So I'd say 37 is kind of the latest we could do. I'd prefer to do it earlier, and maybe even consider an uplift.
Flags: needinfo?(adam)
based on adam's info - do we want to uplift into 36?  which is when it's most likely we get to volume with marketing assists?
backlog: Fx37+ → Fx36+
Flags: needinfo?(mreavy)
Priority: -- → P1
Yeah, let's take this for fx36. I'll add it to the trello.
Flags: needinfo?(mreavy)
Adam -- Depending on the size and risk of the patch, would you like to get this into Fx35?
Flags: needinfo?(adam)
(In reply to Maire Reavy [:mreavy] (Plz needinfo me) from comment #5)
> Adam -- Depending on the size and risk of the patch, would you like to get
> this into Fx35?

If we can get a small patch in 35, that would be great. I should think this is pretty low risk, but let's evaluate after we have a working patch.

Because of the desire to uplift, the goal of this patch should be to make as small a change as is necessary to excise this code: if it looks like this is going to be more than 10 or so lines of change, or if it looks like it touches a file other than MozLoopPushHandler.jsm, please ping me.
Flags: needinfo?(adam)
Let's try to get this into Fx35 (assuming a small, low risk patch is achievable).
backlog: Fx36+ → Fx35+
Paul -- If you could take this after bug 1104279, that'd be great.  Be sure to note Abr's advice in Comment 6. Thanks!
Assignee: nobody → pkerr
In the code path where the fallback URL would have been used to create a websocket connection to the PushServer, I am going to kick off the retry manager to periodically attempt to open the connection. I feel this matches how the MozLoopPushHandler handles all connectivity issues: instead of returning an error through the registration callbacks.
Status: NEW → ASSIGNED
Attachment #8539491 - Flags: review?(adam)
Comment on attachment 8539491 [details] [diff] [review]
Remove push server fallback from Loop Client

Review of attachment 8539491 [details] [diff] [review]:
-----------------------------------------------------------------

Looks mostly good -- one thing to fix in the implementation; one thing to fix in tests. r+, assuming those changes are made.

::: browser/components/loop/MozLoopPushHandler.jsm
@@ +281,5 @@
>      }
>  
>      let pushServerURLFetchError = () => {
> +      console.warn("MozLoopPushHandler - Could not retrieve push server URL from Loop server, will retry");
> +      this._retryOperation(() => this._openSocket());

You also need to add a "this._retryEnd()" in the success case (e.g., further down in this function, just before "performOpen()")

::: browser/components/loop/test/xpcshell/test_looppush_initialize.js
@@ +3,5 @@
>  {
>    let dummyCallback = () => {};
>    let mockWebSocket = new MockWebSocketChannel();
> +  let pushServerRequestCount = 0;
> +  

WS

@@ -104,5 @@
>    function run_test() {
>      setupFakeLoopServer();
>  
> -    Services.prefs.setCharPref("services.push.serverURL", kServerPushUrl);
> -    Services.prefs.setCharPref("loop.server", kLoopServerUrl);

I don't think you want to remove the kLoopServerUrl override -- otherwise you'll get whacked on the test cluster for trying to attach to the real loop server (right?).
Attachment #8539491 - Flags: review?(adam) → review+
Comment on attachment 8539491 [details] [diff] [review]
Remove push server fallback from Loop Client

Approval Request Comment
[Feature/regressing bug #]: Loop

[User impact if declined]: It will try to fall back to the old push server

[Describe test coverage new/current, TBPL]: patch is mostly tests.  About to land; will verify before uplift

[Risks and why]: very low risk

[String/UUID change made/needed]: none
Attachment #8539491 - Flags: approval-mozilla-beta?
Attachment #8539491 - Flags: approval-mozilla-aurora?
Attachment #8539491 - Attachment is obsolete: true
Attachment #8539491 - Flags: approval-mozilla-beta?
Attachment #8539491 - Flags: approval-mozilla-aurora?
Comment on attachment 8539567 [details] [diff] [review]
Remove push server fallback from Loop Client

Carry forward r+ abr.
Attachment #8539567 - Flags: review+
Attachment #8539567 - Flags: approval-mozilla-beta?
Attachment #8539567 - Flags: approval-mozilla-aurora?
Flags: in-testsuite+
https://hg.mozilla.org/mozilla-central/rev/bc185d5c0f6c
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla37
Attachment #8539567 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Attachment #8539567 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
bc2 permafail due to network access
https://hg.mozilla.org/releases/mozilla-beta/rev/bde0494c6bbd
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
silly mcmerge
Status: REOPENED → RESOLVED
Closed: 6 years ago6 years ago
Resolution: --- → FIXED
Flags: qe-verify-
Comment on attachment 8541320 [details] [diff] [review]
Part 2: Fix failing tests: prevent spurious network access attemtps

Removed all usage of the services.push.serverURL from any tests since the loop client no longer makes any use of this pref.

Replaced explicit assignment of the loop_fxa.sjs as a hard-coded server path in mochitests to assignment in testing/profiles/prefs_general.js. Along with this change, removed explicit assignment and reset of loop.server in tests.

Add /push-server-config stub to login_fxa.sjs.

Restore longer timeout settings after running push server tests.
Attachment #8541320 - Flags: review?(MattN+bmo)
FYI, the original patch alone with EARLY_BETA effectively re-enabled still has this problem (i.e. it's not the change from EARLY_BETA to late beta, which there was speculation about): https://tbpl.mozilla.org/?tree=Try&rev=fa9263fd3426
Paul-- Do you want to land patch 2 ("fix failing tests...") on Nightly and Aurora as well as Beta?  I believe you do;  so I'm reopening this bug.  Please let me know if I'm correct.

MattN, Mike, Abr -- Are any of you available for a review today (Friday, Dec 26) so we can get this resolved for Fx35?  We're looking for a review on the second patch of this bug.  (I'm not sure what everyone's PTO schedules are;  so I'm doing a needinfo.)

As background, although we landed the fix for this bug on Nightly and Aurora, we couldn't land this fix on Beta (Fx35) due to test failures which Pkerr's second patch now fixes.
Status: RESOLVED → REOPENED
Flags: needinfo?(pkerr)
Flags: needinfo?(mdeboer)
Flags: needinfo?(adam)
Flags: needinfo?(MattN+bmo)
Resolution: FIXED → ---
Yes, that is correct, since what is fixed is more systematic within loop mochitests and not necessarily specific to this bug, which seems to trigger the problem in beta.
Flags: needinfo?(pkerr)
Mark -- If you happen to be about, we need a review on the second patch of this bug so that both patches can land on Beta (Fx35).

I'll take a review from any Loop peer on the second patch.  My plan is to land the second patch on fx-team as we have an r+ and then uplift it to Aurora.  We should then be able to uplift both the first and second patch from this bug to Beta and have it stick.  (The first patch has landed and stuck on Nightly and Aurora, but it had to be backed out of Beta due to test failures which the second patch now fixes.)

NOTE: Last Beta is this coming Monday (29th of Dec).  Thanks!
Flags: needinfo?(standard8)
Comment on attachment 8541320 [details] [diff] [review]
Part 2: Fix failing tests: prevent spurious network access attemtps

Review of attachment 8541320 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for cleaning this up!

::: browser/components/loop/test/mochitest/browser_fxa_login.js
@@ +141,5 @@
>      ok(true, "The login promise should be rejected due to non-JSON params");
>      caught = true;
>    });
>    ok(caught, "Should have caught the rejection");
> +  Services.prefs.setCharPref("loop.server", loopServerUrl);

I think you may also want this in a registerCleanupFunction so that it still gets reset for subsequent test files in the event an exception is thrown before this line is reached.
Attachment #8541320 - Flags: review?(MattN+bmo) → review+
Flags: needinfo?(standard8)
Flags: needinfo?(mdeboer)
Flags: needinfo?(adam)
Flags: needinfo?(MattN+bmo)
I think patch 2 may cover most of bug 1115186.
Blocks: 1115186
Status: REOPENED → ASSIGNED
OS: Mac OS X → All
Hardware: x86 → All
Comment on attachment 8541320 [details] [diff] [review]
Part 2: Fix failing tests: prevent spurious network access attemtps

Approval Request Comment
[Feature/regressing bug #]: this one/existing issues

[User impact if declined]: inability to land patch 1 from this bug on Beta (and use retries on server failure).

[Describe test coverage new/current, TBPL]: on fx-team.  Will look to uplift after a merge to aurora/beta.

[Risks and why]: Low risk - test changes only to fix a lot of existing "we're-just lucky" instances of avoiding trying access real servers during tests.  Those cause beta to fail with part 1 applied, but likely would bite us in other instances or unrelated changes and become orange or perma-orange.

[String/UUID change made/needed]: none
Attachment #8541320 - Flags: approval-mozilla-beta?
Attachment #8541320 - Flags: approval-mozilla-aurora?
Try run based off of FIREFOX_35_0b6_RELEASE tag on mozilla-beta.

https://tbpl.mozilla.org/?tree=Try&rev=e19ef55f6ce4

bc2 tests pass. Fail on bc1 does not seem be to related to loop tests.
https://hg.mozilla.org/mozilla-central/rev/5a6300422191
Status: ASSIGNED → RESOLVED
Closed: 6 years ago6 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Attachment #8541320 - Flags: approval-mozilla-beta?
Attachment #8541320 - Flags: approval-mozilla-beta+
Attachment #8541320 - Flags: approval-mozilla-aurora?
Attachment #8541320 - Flags: approval-mozilla-aurora+
(In reply to Paul Kerr [:pkerr] from comment #32)
> Try run based off of FIREFOX_35_0b6_RELEASE tag on mozilla-beta.
> 
> https://tbpl.mozilla.org/?tree=Try&rev=e19ef55f6ce4
> 
> bc2 tests pass. Fail on bc1 does not seem be to related to loop tests.

And I had green beta (with patch 1) and inbound tries with this too
Iteration: --- → 37.2
You need to log in before you can comment on or make changes to this bug.