Closed Bug 1314879 Opened 8 years ago Closed 8 years ago

[desktop] Add a unique GUID to each command sent via the clients collection.

Categories

(Firefox :: Sync, defect, P1)

defect

Tracking

()

RESOLVED INCOMPLETE

People

(Reporter: markh, Assigned: markh)

References

Details

(Whiteboard: [send-tab-funnel])

Attachments

(1 file)

This is for the desktop implementation of bug 1314878
Comment on attachment 8808067 [details]
Bug 1314879 - Add a unique flowID GUID to each command sent via the clients collection.

https://reviewboard.mozilla.org/r/90988/#review90718

::: services/sync/modules/engines/clients.js:493
(Diff revision 1)
>      }
>  
>      let action = {
>        command: command,
>        args: args,
> +      flowID: Utils.makeGUID(), // used for telemetry.

This is a very simple patch, which adds a GUID to each command sent by desktop. Bug 1289536 is where we will record telemetry for this, and obviously we want to end up with the same "flowID" on Android and iOS. I wont land this yet, but want an general thumbs-up on the approach so I can open bugs for those other platforms.
Assignee: nobody → markh
Priority: -- → P1
Comment on attachment 8808067 [details]
Bug 1314879 - Add a unique flowID GUID to each command sent via the clients collection.

https://reviewboard.mozilla.org/r/90988/#review91044

::: services/sync/modules/engines/clients.js:493
(Diff revision 1)
>      }
>  
>      let action = {
>        command: command,
>        args: args,
> +      flowID: Utils.makeGUID(), // used for telemetry.

Questions:

- What else will you need to send with every command in order to make this useful? The sender ID? The client-side timestamp at which the command was created? The server-side timestamp of the client record into which this command was inserted? I'd rather not churn this four times if we can do it once.
- Do you need this flowID (and the rest) for every command, or only for displayURI? Does it hurt to include it on every command?
- Generalize? Should this be `commandMeta` instead of a plain field?

Please also file a bug to update the object format docs.
(In reply to Richard Newman [:rnewman] from comment #3)
> - What else will you need to send with every command in order to make this
> useful? The sender ID? The client-side timestamp at which the command was
> created? The server-side timestamp of the client record into which this
> command was inserted? I'd rather not churn this four times if we can do it
> once.

I think we only need this GUID, although telemetry will record the timestamp of when we recorded the event in telemetry.

The intent of this is to be able to determine how many such commands went missing - in aggregate we probably only need raw counts, but Alex strongly wants to be able to identify is missing commands are clustered around individual users or wide-spread.

The sender and receiver IDs (and platform etc info) are included in the ping.

The server timestamp is interesting, but I'm not sure we need that and would complicate things a little (ie, we could only then record the telemetry after posting to the server, which will require additional housekeeping to get right)

> - Do you need this flowID (and the rest) for every command, or only for
> displayURI? Does it hurt to include it on every command?

We've only identified we need it for displayURI, but I can see it being valuable for "repair" related commands we are discussing. I believe the low frequency of these commands being exchanged make it reasonable to do for every command.

> - Generalize? Should this be `commandMeta` instead of a plain field?

That's an interesting idea. The telemetry feature we will be using here will be limited to reporting an object with only string names and values, but that might be OK here, and would probably mean we could add more metadata without changing the telemetry code.

> Please also file a bug to update the object format docs.

Will do, thanks. I'll circle back to this later, but added the above notes while this bug is fresh in both our minds.
(In reply to Mark Hammond [:markh] from comment #4)
> The sender and receiver IDs (and platform etc info) are included in the ping.

Actually, that's wrong - there's nothing in the ping that will tell us the specific client the command was sent to.

> That's an interesting idea. The telemetry feature we will be using here will
> be limited to reporting an object with only string names and values, but
> that might be OK here, and would probably mean we could add more metadata
> without changing the telemetry code.

So yeah, I think I'll use this dictionary with { flowID, clientID }
I was actually thinking *sender* client ID — in the general case, a client receiving a command doesn't know who sent it!

*You* can figure that out by pairing up the flowID, but the client can't, and it seems like a useful capability:

{ from, to, when, flowID }

would let clients self-report a lot of stuff, as well as being able to figure out how quickly we can get commands through.
(In reply to Richard Newman [:rnewman] from comment #6)
> I was actually thinking *sender* client ID — in the general case, a client
> receiving a command doesn't know who sent it!
> 
> *You* can figure that out by pairing up the flowID, but the client can't,
> and it seems like a useful capability:

From telemetry's POV, we don't need to pair the flowID to discover the sender - we know because the sender will record it in its ping, which includes the current device/user info.

> { from, to, when, flowID }
> 
> would let clients self-report a lot of stuff, as well as being able to
> figure out how quickly we can get commands through.

But yeah, that does make sense - we can include additional metadata, but don't need to report all of it explicitly in telemetry.

(The reason the to ID would be good for telemetry is that we would, with some effort, then be capable of determining when the "to" client has never again synced after having been sent the command - we shouldn't consider that as a "failure", just an oddity.

Re "when", I'm not sure how useful that would be in practice - the local clock time is unreliable, the "most recent" last-modified of the clients collection doesn't seem particularly useful, and the last-modified time of the collection we are about to write is obviously in the future so impossible to note. What did you have in mind here?
Blocks: 1314878
(In reply to Mark Hammond [:markh] from comment #7)

> Re "when", I'm not sure how useful that would be in practice - the local
> clock time is unreliable, the "most recent" last-modified of the clients
> collection doesn't seem particularly useful, and the last-modified time of
> the collection we are about to write is obviously in the future so
> impossible to note. What did you have in mind here?

It's one thing to rely on a clock that's wrong 5% of the time for conflict avoidance, and another to use a clock that's right 95% of the time (and close 99% of the time) for logging, telemetry, heuristics, or UI.

Furthermore, we know the exact server timestamp mere milliseconds before we do the upload: we routinely download destination client records in order to concatenate our commands and reupload with XIUS. So we _can_ use a pretty good server time here: the timestamp header from that download, plus the realtime clock since the download.

Adding 'when' here would allow UI like "sent on Wednesday" for sent tabs, measuring round-trip times for fix commands, evaluating end-to-end times for send tab, and so on. I'm actually a little surprised that we haven't timestamped commands before now…
(In reply to Richard Newman [:rnewman] from comment #8)
> (In reply to Mark Hammond [:markh] from comment #7)
> 
> > Re "when", I'm not sure how useful that would be in practice - the local
> > clock time is unreliable, the "most recent" last-modified of the clients
> > collection doesn't seem particularly useful, and the last-modified time of
> > the collection we are about to write is obviously in the future so
> > impossible to note. What did you have in mind here?
> 
> It's one thing to rely on a clock that's wrong 5% of the time for conflict
> avoidance, and another to use a clock that's right 95% of the time (and
> close 99% of the time) for logging, telemetry, heuristics, or UI.
> 
> Furthermore, we know the exact server timestamp mere milliseconds before we
> do the upload:

Yeah - I opened bug 1310623 to record server timestamps, but didn't really want to block this.

> Adding 'when' here would allow UI like "sent on Wednesday" for sent tabs,
> measuring round-trip times for fix commands, evaluating end-to-end times for
> send tab, and so on. I'm actually a little surprised that we haven't
> timestamped commands before now…

Note that for telemetry, the when *will* be recorded - so if telemetry is the concern, then adding the timestamp to the command itself would simply be redundant.

Relatedly, a similar thing is coming up for this "repair" work - eg, how can we differentiate a "before" validation from an "after" one? The "when" sucks here, as even though there may only be 5% with a wrong clock, those 5% would show the exact opposite results - ie, we'd see 5% of users go *backwards* after a repair - it would be awesome to get some confidence around whether these are simply local clock issues vs functional repair issues.  Given both (ie, "send tab", "repair") of these things are also recording a "flowID", I was wondering if some kind of "flow sequence" would be even better? But as above, I'm primarily thinking of this sequence being recorded only in telemetry and keeping the commands themselves with only the data they need to actually work correctly.
The usual approach would be some flow ID plus a sequence number… but of course, you often can't rely on persistence of those sequence numbers, so you can get duplicates in any situation in which your action and persistence are separate. (Sequence numbers are great for messages, of course.)

The best way to distinguish before and after is to use a server timestamp. Conveniently enough, every time we touch the Sync server we get one, so I would suggest tracking the timestamp at which /storage/bookmarks was fetched for each validation.

(services.sync.bookmarks.lastSync would be enough, except that it advances on upload and presumably doesn't advance if we can't merge….)

Similarly, sending a tab involves touching the server twice, which again gives you the opportunity to grab a timestamp.

Feel free to land just the flowID, just do so in a way that doesn't involve too much format churn when you add more stuff!
Comment on attachment 8808067 [details]
Bug 1314879 - Add a unique flowID GUID to each command sent via the clients collection.

https://reviewboard.mozilla.org/r/90988/#review92764

r+ to the general approach; see comments in bug.
Attachment #8808067 - Flags: review?(rnewman) → review+
Whiteboard: [send-tab-funnel]
Splitting this from bug 1289536 made for less cohesive patches, so I plan on landing 2 parts in bug 1289536.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → INCOMPLETE
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: