Closed Bug 714304 Opened 8 years ago Closed 8 years ago

Implement handling of "upgrade required" response

Categories

(Firefox for Android :: Android Sync, defect, P3)

ARM
Android
defect

Tracking

()

RESOLVED FIXED
mozilla15
Tracking Status
firefox14 --- fixed
blocking-fennec1.0 --- +

People

(Reporter: rnewman, Assigned: liuche)

References

Details

(Whiteboard: [qa+])

Attachments

(2 files)

I want to make sure that "orphaned" Sync accounts on Android devices don't end up being a punishing load on our infrastructure. We intend to have a get-out clause for this by conditionally returning HTTP 426 Upgrade Required, and have all Android Sync clients watch for this and self-disable.

Alongside auto-update checks, this should keep our poor infra safe.
Do we use HTTP 426 today? If not, I have hesitations about introducing it.

My main concern is I don't feel it is appropriate for our use. AFAICT 426 is intended to be utilized for switching the protocol on the channel (e.g. to TLS or to some other non-HTTP protocol). I don't believe we would be doing this, so the use of 426 isn't appropriate.
Please suggest something better.
(In reply to Gregory Szorc [:gps] from comment #1)
> Do we use HTTP 426 today? If not, I have hesitations about introducing it.

I don't believe we do.

> My main concern is I don't feel it is appropriate for our use. AFAICT 426 is
> intended to be utilized for switching the protocol on the channel (e.g. to
> TLS or to some other non-HTTP protocol). I don't believe we would be doing
> this, so the use of 426 isn't appropriate.

You are correct that existing uses of 426 are typically associated with an Upgrade: header for protocol switching. That said, it's semantically closest to what we are asserting to the client: "you need to make some change in how you make requests before we will proceed".

The alternative would be to send some other 4xx or 5xx response with something like an X-Weave-Piss-Off header, which I consider to be less meaningful.

I did my research and discussed with atoll before settling on this as "good enough". I would be ecstatic to find something better. Suggestions welcome.

Requirements: be understood by clients to mean "client isn't good enough, don't try again"; doesn't require parsing another header to be understood; doesn't collide with some other Sync code.
I think gps's concern is valid, since 426 will be consumed by Necko someday before it ever reaches us, if we do implement it.
I'm not sure exactly what you are trying to do here. But, I'll throw out the following suggestions:

1) For REST services, you can expose newer instances/versions of the service under a new URI space. Old/incompatible clients that hit an unsupported API get a 3XX or 410 response.

2) Bake some kind of version exchange in the headers. Some will argue URIs should be used for this in a REST service.

3) Return HTTP 403 Forbidden. The response body could contain JSON that tells clients to disable.

Either way, it sounds like this will require a new Sync HTTP storage API version, no?
(In reply to Gregory Szorc [:gps] from comment #5)
> I'm not sure exactly what you are trying to do here.

I'm attempting to identify a response which it's acceptable for a client to interpret as "never sync again".

Background, which I was too lazy to put in Comment 0: because Sync now runs independently of Fennec, we will gradually accrue users who install Firefox on their Android device, set up Sync, and then never use or upgrade Fennec again. 

Unlike XUL Sync, their Sync client will be happily running in the background for so long as their phone is on, until they uninstall Fennec. Given the number of Weave 0.1ish users we still have, this is something I want to avoid. Old clients will have bugs, security issues, or might otherwise benefit from being told to go away. We could do that with ordinary 4xx or 5xx responses, but then we end up with the situation we have now -- old Sync 1.7 clients that make Request URI Too Long requests over and over, getting a 400 back and retrying a little later.

The solution is to have a way of simply and unambiguously saying "go away and don't come back". Backoff isn't suited for this (easily mis-handled or lost by the client, can't be distinguished as "upgrade required").


> Either way, it sounds like this will require a new Sync HTTP storage API
> version, no?

This is exactly what I want to avoid, because I want to do this *this week*, and I want this to be implementable in Zeus. That is, it should be as simple as

  if (userAgent == "AndroidSync 0.1") {
    response.status = 498;
    response.addHeader("X-Weave-Alert", "Piss off and upgrade");
    // Yadda yadda.
    return;
  }

That means a decent hack now, and a proper solution later. Toby and I have already discussed this for Sync 2.0, I think.

"Decent hack" seems to imply a status code or header that's not already in use, and won't be, and minimal header parsing.

We *could* squat a code -- HTTP 468 Piss Off -- but using Upgrade Required without an Upgrade header seemed quite straightforward.
Since you are looking for a decent hack now and that hack only needs to work on the Android client you are currently writing, I would recommend a custom HTTP header over 426 or custom HTTP status code. Just because it is a hack doesn't give you leeway to disobey HTTP specs. I would also combine this HTTP header with a 403 response, since that has the semantics most closely related to what you are trying to accomplish. Of course, the client would look for the header first and the 403 should never come into play.

I would recommend making the header name as specific as possible. e.g. X-Weave-Client-Upgrade-Needed instead of X-Weave-Stop-Requesting: Your client is too old.

A proper solution can be devised when the storage API version is bumped.
atoll, what are you thoughts? You suggested a status code over a header, so I want to make sure your base concerns are addressed.
HTTP/1.1 400 Bad Request
X-Weave-Alert: ERR_UPGRADE_REQUIRED
X-Weave-Backoff: 2419200

8


Where 8 is the numeric error code labeled ERR_UPGRADE_REQUIRED in the wiki, and triggers localized upgrade text.
wfm.
Summary: Implement handling of HTTP 426 → Implement handling of "upgrade required" response
Attached patch Doc change.Splinter Review
Camping this response code.
Assignee: nobody → rnewman
Status: NEW → ASSIGNED
Attachment #585021 - Flags: review?(telliott)
While I don't like using HTTP 400, we use it today and in a very similar manner. So, I think this is a sufficient stop-gap until the next sync protocol version is cut.
Comment on attachment 585021 [details] [diff] [review]
Doc change.

Approach seems least disruptive for now. I'm assuming that we're just going to handle this at the zeus layer.

We didn't discuss anything formal for this in the 2.0 spec, though we should. We have the concept of "account needs purging", but nothing akin to "gtfo"
Attachment #585021 - Flags: review?(telliott) → review+
Doc change landed:

https://hg.mozilla.org/services/docs/rev/79125b1cedf9

This bug will track the Android Sync implementation. We can address the Zeus bit should that ever be necessary.
tracking-fennec: --- → ?
Priority: -- → P3
rnewman, my understanding is we need this for 12, not necessarily 11. Please confirm.
tracking-fennec: ? → 12+
Yup, not something I'm tracking for a particular release; we'll just get it in as soon as we can, so we don't have to block User-Agents for bad versions.
Blocks: 745431
tracking-fennec: 12+ → ---
blocking-fennec1.0: --- → +
I'm terrified of all of these sync bugs where we are still implementing features.  Is this critical to our release?  Why must this bug block if it were the last bug?
(In reply to Damon Sicore (:damons) from comment #17)
> I'm terrified of all of these sync bugs where we are still implementing
> features.  Is this critical to our release?  Why must this bug block if it
> were the last bug?

We discussed this at Monday's triage meeting.

Speaking generally:

All of these items are in the warpath now because we spent the last two or three months implementing super-duper high-priority product desires like form history sync, rather than finishing the protocol implementation.

I do not want to ship a Sync client that has known gaps and undocumented/untested/unaddressed behavior in sizable portions of its state machine.

That leads to situations where we have to make a hard decision like "permanently block *all* Fennec clients from syncing until they upgrade", because the alternative is possible data loss or account damage.

Speaking specifically:

This particular bug is to hard-stop a client if it's outdated and known-bad (which, coincidentally, is the approach we'd take for the above).

The alternative is for us to have a more expensive 503 barrier permanently configured into our LBs, which is not something I want to impose on services ops.

This is low-effort, low-risk, high-reward.
Whiteboard: [not started]
Assignee: rnewman → liuche
Whiteboard: [not started] → [work started]
https://hg.mozilla.org/integration/mozilla-inbound/rev/9882d0013809

Gonna flag this for Aurora approval momentarily, because we wont address it in triage until Monday.
Whiteboard: [work started]
Target Milestone: --- → mozilla15
Attached patch Patch.Splinter Review
Attachment #621228 - Flags: review+
Attachment #621228 - Flags: approval-mozilla-aurora?
QA: to verify this you'll need ops help.

First:

  adb shell setprop log.tag.SyncAccounts VERBOSE

Set up a Sync account against stage. You'll see in Settings that Fennec is checked in the Sync account pane, which means it's syncing. Verify that syncing has occurred normally.

Now ask ops to configure the stage setup to return Upgrade Required (code 16 on a 400).

Start capturing a log on the device, and initiate a sync.

You should see the sync fail immediately, with the Fennec checkbox being *unchecked*.

You'll see in the log

  "Setting authority ... to not sync automatically."

Automatic syncs will now *not* occur after any duration. Forced syncs can be used if you first check the checkbox. Don't test that right now.

Now disable the stage 400 configuration, and install another build of Fennec. (It must be the same channel -- if you're testing a Nightly, install another Nightly. The same build is fine, I think.)

After installation, the Fennec checkbox should be checked again, and syncs should proceed normally.
Whiteboard: [qa+]
https://hg.mozilla.org/mozilla-central/rev/9882d0013809
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Attachment #621228 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
This should be safe, and non-harmful even if it doesn't work, so not waiting on explicit QA verification.

https://hg.mozilla.org/releases/mozilla-aurora/rev/8c8e96d70ab5
Product: Mozilla Services → Android Background Services
Blocks: 895526
Product: Android Background Services → Firefox for Android
You need to log in before you can comment on or make changes to this bug.