Closed Bug 1179274 Opened 9 years ago Closed 8 years ago

Provide way to distinguish between "Sync now" Sync requests and scheduled Sync requests.

Categories

(Firefox :: Sync, defect, P3)

defect

Tracking

()

RESOLVED WONTFIX

People

(Reporter: bobm, Unassigned)

References

Details

Attachments

(1 file)

Users running the "Sync now" option are exposed to errors that are ordinarily ignored by a scheduled sync.  Distinguishing between "Sync now" requests and scheduled syncs will allow us to measure how often users see an error bar in this condition.  Also useful: we'll be able to determine how often, and how many users use the "Sync now" feature.

Instrumenting this may be as simple as making a cosmetic alteration to HTTP requests, or might require server modification.
QA Contact: kthiessen
I think we can do this fairly simply on the info/collections request we make - I think that happens exactly once per sync. Simplest way seems to be a header - something like "sync-forced: [1|0]" (X- headers are out of style now, right?)

Bob, sound OK?

But, where you say: "will allow us to measure how often users see an error bar in this condition" - I don't see how you can deduce this as you don't know whether the sync succeeded or not.
Flags: needinfo?(bobm)
(In reply to Mark Hammond [:markh] from comment #1)
> I think we can do this fairly simply on the info/collections request we make
> - I think that happens exactly once per sync. Simplest way seems to be a
> header - something like "sync-forced: [1|0]" (X- headers are out of style
> now, right?)

It would be best if we could get a NOOP type of parameter from the URL so we could pick it up out of the logs easily.  As discussed on IRC, setting forced=true or something similar.  That way we could potentially draw further conclusions about when a Sync happened due to the polling interval, score threshold, or Sync now button.
 
> But, where you say: "will allow us to measure how often users see an error
> bar in this condition" - I don't see how you can deduce this as you don't
> know whether the sync succeeded or not.

My initial thought was that this would be appended to every URL request, and we could just count the errors code returned to requests with the flag.  As it happens, as explained in the IRC conversation, the Sync code is mixed into the Fx code base in such a manner as to make this impractical to implement.

We're missing the ability to separate Sync traffic by sessions.  We could make reasonable guesses based on source IP, UA, and time frames after a GET on info/collections.  But, they're kind of painful, non-time series friendly approximations.  Perhaps we could implement some server side logic involving some attribute of the token to generate session information?  That might be hard because token's are presently valid for an hour.

Here's my next bad idea: make some kind of "I'm done syncing now" request at the end of a successful Sync.
Flags: needinfo?(bobm)
> make some kind of "I'm done syncing now" request at the end of a successful Sync.

I like this idea of puting bookends around the existing sync protocol, having each device say "I'm starting to sync now" and "I finished syncing now" or "I error out and am aborting this sync".  Not only useful for diagnosing server problems, but it could also be nice information to display in a "connected devices" dashboard.  These are your devices, and the last time each synced successfully.
Assignee: nobody → markh
(In reply to Ryan Kelly [:rfkelly] from comment #3)
> I like this idea of puting bookends around the existing sync protocol,
> having each device say "I'm starting to sync now" and "I finished syncing
> now" or "I error out and am aborting this sync".  Not only useful for
> diagnosing server problems, but it could also be nice information to display
> in a "connected devices" dashboard.  These are your devices, and the last
> time each synced successfully.

That sounds like a good idea, but I think we should tackle it in a different bug based around a real use-case (ie, if we decide to actually show that last-sync in the devices dashboard.) IIUC we'd need a new endpoint to hit, and it would double the number of requests made for a typical noop sync (currently we hit info/collections and that's it).
This patch adds a "reason=" param to pretty-much every request made by Sync (but *not* to FxA related requests, including the tokenserver). The values for reason are currently "syncnow", "schedule", "about-synctabs" and "syncedtabs".

Instead of adding another optional param to the sync method, I changed this to be a single "options" object where engineNamesToSync and reason are valid (but invalid values are ignored).

This also changes the lock-check semantics slightly - instead of handling the exception thrown by the lock methods we now check this.locked and return early if we are locked - this is because both the login process and the sync itself want access to .reason, making it tricky to set early enough but only when we aren't already syncing. It's not particularly pretty, but it seems quite safe - a worst case scenario is that we still throw due to the lock and we've now set a different syncOptions, the consequences of which seem small.

Richard, what do you think?
Attachment #8702462 - Flags: review?(rnewman)
(In reply to Mark Hammond [:markh] from comment #1)
> I think we can do this fairly simply on the info/collections request we make
> - I think that happens exactly once per sync. Simplest way seems to be a
> header - something like "sync-forced: [1|0]" (X- headers are out of style
> now, right?)

We already use URL parameters on the first info/collections request in at least one situation:

    // Ping the server with a special info request once a day.
    let infoURL = this.service.infoURL;
    let now = Math.floor(Date.now() / 1000);
    let lastPing = Svc.Prefs.get("lastPing", 0);
    if (now - lastPing > 86400) { // 60 * 60 * 24
      infoURL += "?v=" + WEAVE_VERSION;
      Svc.Prefs.set("lastPing", now);
    }

But whatever ops wants is best regardless :)
Comment on attachment 8702462 [details] [diff] [review]
0001-Bug-1179274-add-a-reason-URL-param-to-most-requests-.patch

Review of attachment 8702462 [details] [diff] [review]:
-----------------------------------------------------------------

I dislike having the reason attached to every request, because it complicates things more than I think is worthwhile for the value. Having URL concat scattered through various files is a bad smell.

Here are three counter-proposals.

0. Do this only for info/collections, using a similar mechanism to in my last comment. This isolates the change and makes me a lot happier. If we're not getting huge benefit from annotating every request, then let's not do so.

If we do see a benefit from annotating every request:

1. Do this through the token mechanism instead. Return a decorated token (after all, it's opaque), or decorate it ourselves. The Sync server sees the token on every request. Voila! Minimal client changes.

2. Do this through an intermediate sync object (a "session"). This is a bigger refactor on desktop, but basically free on Android and iOS: rather than Weave.Service.sync(), we'd do Weave.Something.begin("syncNow").sync(). Then the session's request/resource methods would be responsible for decorating for that particular syncing session, which gets rid of the groaty syncOptions state.

::: services/sync/modules/engines.js
@@ +1441,5 @@
> +      let url = this.engineURL;
> +      if (this.service.syncOptions.reason) {
> +        url += "?reason=" + this.service.syncOptions.reason;
> +      }
> +      let up = new Collection(url, null, this.service);

I am surprised this works correctly.

::: services/sync/modules/service.js
@@ +535,5 @@
>     * Obtain a Resource instance with authentication credentials.
>     */
>    resource: function resource(url) {
> +    if (this.syncOptions.reason) {
> +      url += "?reason=" + this.syncOptions.reason;

This will break when I do

  Weave.Service.resource(Weave.Service.storageURL + "bookmarks/abcdef?full=1");
Attachment #8702462 - Flags: review?(rnewman) → review-
Bob, do (0) or (1) above work for you? I'd prefer to avoid (2) as it becomes more work for little gain IMO.
Flags: needinfo?(bobm)
Ideally, I think we'd like to be able to tell a "sync now" request just by looking at the URL.  Part of the idea here is to determine how often we serve errors to "sync now" requests, which is difficult to tell reliably from in-app logging (which would be required to e.g. extract this information from the token).
(In reply to Ryan Kelly [:rfkelly] from comment #9)
> Ideally, I think we'd like to be able to tell a "sync now" request just by
> looking at the URL.  Part of the idea here is to determine how often we
> serve errors to "sync now" requests, which is difficult to tell reliably
> from in-app logging (which would be required to e.g. extract this
> information from the token).

Bob also said that one potential use-case is that we may be able to prioritize syncs other than "scheduled" ones in the theory that if we have a choice between requests to reject, scheduled would be a good one to choose. IIUC, this is why he'd prefer it on every request and not just info/collections.

It does sound like (2) is the way to go, although I honestly don't think this is worth the "bigger refactor" Richard mentions when it's not at all clear to me it would make things cleaner - indeed, I can only picture it making things worse. Richard, can you flesh that idea out some more? It sounds like you don't want the "session" object to be owned by the "service" (ie, you don't want |this.service.session| to be a thing) but all of the places we need to add this already have a reference to the service - having them hold a different reference to a "session" just to pick up the reason string sounds uglier than what I had - so can you please clarify?
Flags: needinfo?(bobm) → needinfo?(rnewman)
(In reply to Mark Hammond [:markh] from comment #10)

> Bob also said that one potential use-case is that we may be able to
> prioritize syncs other than "scheduled" ones in the theory that if we have a
> choice between requests to reject, scheduled would be a good one to choose.
> IIUC, this is why he'd prefer it on every request and not just
> info/collections.

Interesting.

Perhaps simply prioritizing writes would be enough?

Otherwise, this might warrant a little more thought. I'm not convinced that the same mechanism is right for the instrumentation purposes in Comment 0 and also for prioritization. If we really do want to do some kind of load/priority stuff, having the server guess what's important from those rudimentary states on each individual request might not be good enough.

I also suspect that doing prioritization even half right will be intertwined with how we address race resistance: better to finish an existing sync than to start a forced sync from another device, no?


> It does sound like (2) is the way to go, although I honestly don't think
> this is worth the "bigger refactor" Richard mentions when it's not at all
> clear to me it would make things cleaner - indeed, I can only picture it
> making things worse. Richard, can you flesh that idea out some more?

The desktop codebase is (still) very singleton-heavy. Service + State + Record + … all combine to describe both the persistent state of the user's account and Sync configuration, but also the in-flight state of the current sync, and some state that blurs the two (e.g., failures, backoff).

You can imagine a world in which all of the state that's needed to authenticate, decorate, and construct requests is wrapped up in an object with a smaller scope, derived from FxA data, Sync data, and user action, capturing what it needs from the broader environment in order to make a single sync happen.

(We started down this route a little with the authenticator mechanism: we get one from your FxA + token server client, and we hold on to it in the service to construct requests.)

Compare and contrast this web of state to the iOS codebase: it's literally impossible to construct a request to a Sync storage server without advancing through an explicit FxA state machine, then through the Sync state machine, getting a Ready state out the end.

That state can give you a collection client for a particular collection, with a strongly typed payload, and can post and get records from that collection.

https://github.com/mozilla/firefox-ios/blob/98ae79cf85117fc5cfd288af4455de3e73c30e2a/Sync/StorageClient.swift#L590

It looks like this in use:

https://github.com/mozilla/firefox-ios/blob/98ae79cf85117fc5cfd288af4455de3e73c30e2a/Sync/Synchronizers/LoginsSynchronizer.swift#L152

The way this bug would look on iOS would be: thread the reason/kind of sync in early, and hide it in the Sync15StorageClient. The actual syncing code -- the stuff that calls .post and .getSince -- wouldn't need to think about it at all, because they never touch the HTTP layer.

To do similar on desktop would involve splitting out the in-the-act stuff out of Service, so we'd have a storage client rather than calling Service.resource directly from all over the place. (Go grep for "storageURL" in s/sync/m!)

Probably the right barometer is: if we have more than one place adding parameters to a storage URL, we're doing something wrong, and we should introduce a factory.

We don't need to do a full rearrangement of Service, but we shouldn't have more than one place that needs to know what kind of sync is happening. That might require extracting a storage client, or a session, or something along those lines.
Flags: needinfo?(rnewman)
Flags: firefox-backlog+
I feel it would be easier to start with just measuring how often users are jamming the button.
Priority: -- → P3
Blocks: 1241563
Priorities have shifted, so I'm not actively working on this. Bob, please re-bump this bug if you feel there are concrete steps ops would like to take (along the lines of comment 2) that you are blocked on by not having this.
Assignee: markh → nobody
(In reply to Mark Hammond [:markh] from comment #13)
> Priorities have shifted, so I'm not actively working on this. Bob, please
> re-bump this bug if you feel there are concrete steps ops would like to take
> (along the lines of comment 2) that you are blocked on by not having this.

This matters much less now that the UI has changed and we're no longer concerned about displaying error bars.  Though I still think it would be interesting to know how often we let the user down in this context, and in general how Syncs are triggered, there's nothing operationally I'm planning on doing with that information.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → WONTFIX
Component: Firefox Sync: Cross-client → Sync
Product: Cloud Services → Firefox
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: