Many send-tab telemetry events are missing metadata about the peer device
Categories
(Firefox :: Sync, defect, P1)
Tracking
()
Tracking | Status | |
---|---|---|
firefox79 | --- | fixed |
People
(Reporter: rfkelly, Assigned: eoger)
References
(Blocks 1 open bug)
Details
Attachments
(2 files)
The sync ping contains "command-sent" and "command-received" events designed to measure the success rate of send-tab. Ideally, for each such event we would be able to locate the peer device (that is, the device to which our device sent a tab, or from which our device received a tab) as an item in the devices
list from the sync ping, and thereby find out its basic device metadata like type and version.
This redash query explores what percentage of events do not have info about the peer device in their sync ping devices list. Key points:
- Around 20% of events contain the id of the peer device, but do not have it as an entry in the devices list of the containing sync ping.
- A further 15% to 20% of events do not have any info about the peer device at all, not even its id.
The second part is like due to Bug 1639835, but that first 20% is a bit of a mystery.
I was able to find an example of such a ping in my local archive, which I'm happy to share for reference:
15/05/2020 14:15
{
"os": {
"installYear": 2019,
"hasSuperfetch": true,
"hasPrefetch": true,
"name": "Windows_NT",
"version": "10.0",
"locale": "en-AU",
"servicePackMajor": 0,
"servicePackMinor": 0,
"windowsBuildNumber": 19041,
"windowsUBR": 264
},
"why": "shutdown",
"devices": [
{
"id": "04dca4827ac13596a49ce634d0a32f55e2ef4c42962680705be09dad8bbd1df5",
"type": "desktop"
}
],
"version": 1,
"syncs": [
// omitted
],
"uid": "333412aa09dc23fd3a671315e275578b",
"syncNodeType": "spanner",
"deviceID": "04dca4827ac13596a49ce634d0a32f55e2ef4c42962680705be09dad8bbd1df5",
"sessionStartDate": "2020-05-15T11:00:00.0+10:00",
"events": [
[
9968860,
"sync",
"open-uri",
"command-sent",
"83499eeabbcc046a7c3c948d0d5753b133b2bc80a24edc8138bf7228e9335fd9",
{
"flowID": "d50437b4-8fa5-402d-9c19-5d8bf98ddebb",
"serverTime": "1589515308.06"
}
],
[
10026313,
"sync",
"open-uri",
"command-received",
"aabea7d669d1f567a60536ab7e2f800b08b56ae58853f575405f87aa7bc5c71a"
],
[
10168188,
"sync",
"open-uri",
"command-sent",
"aabea7d669d1f567a60536ab7e2f800b08b56ae58853f575405f87aa7bc5c71a",
{
"flowID": "808a2d1e-f576-4553-a9f0-56761b0bce44",
"serverTime": "1589515910.14"
}
]
],
"histograms": {
// omitted
}
}
This ping shows me sending a tab to device 83499eeabbcc046a7c3c948d0d5753b133b2bc80a24edc8138bf7228e9335fd9
, and both sending to and receiving from device aabea7d669d1f567a60536ab7e2f800b08b56ae58853f575405f87aa7bc5c71a
. Neither of those devices appear in the devices
list, which includes only the current device 04dca4827ac13596a49ce634d0a32f55e2ef4c42962680705be09dad8bbd1df5
.
I understand that we can't necessarily include the full list of peer devices in every sync ping, because sometimes it's very large and has lots of duplicates. But is there something we can do to ensure that it at least includes all the devices that are referenced elsewhere in the ping?
Comment 1•4 years ago
|
||
The severity field is not set for this bug.
:markh, could you have a look please?
For more information, please visit auto_nag documentation.
Assignee | ||
Comment 2•4 years ago
|
||
Updated•4 years ago
|
Assignee | ||
Comment 3•4 years ago
|
||
I made another finding today: for OAuth clients such as Fenix and iOS, the device lastAccessTime
property never ever gets updated, most likely because we never call /certificate/sign when using OAuth.
The lastAccessTime
property is critical when we send a telemetry ping: we ditch "stale" devices based on that.
I've noticed that the /account/attached_clients
FxA endpoint reports a proper last access time for iOS/Fenix devices, most likely because the server merges information from multiple sources (including the OAuth db) to report a more precise time.
What we should probably do is update the /devices
endpoint with the same mechanism, however :rfkelly noted that "a bunch of technical debt in the fxa-auth-server might be about to come due" and the work is do is not going to be a 1 line fix here.
Reporter | ||
Comment 4•4 years ago
|
||
:rfkelly noted that "a bunch of technical debt in the fxa-auth-server might be about to come due"
Things might be slightly better than I feared, which is always a nice discovery!
The core issue here is the way that we bolted OAuth clients on to the existing devices API:
- Depending on the type of client, a device record may be associated with a sessionToken, an OAuth refreshToken, or both.
- The
devices
table in the FxA auth database contains the device id, core device metadata such as supported commands, and nullablesessionTokenId
andrefreshTokenId
columns.- The
sessionTokenId
references rows in thesessionTokens
table which is in the same database. - The
refreshTokenId
references rows in therefreshTokens
table, which for historical reasons is in a separate database (the oauth-server db).
- The
- The correct
lastAccessTime
for a device is the later of the last-access time for its sessionToken and the last-access time for its refresh token (if any). - When handling the
/attached_clients
endpoint, the FxA server does a query to the auth db for devices and sessions, a separate query to the oauth db for refresh tokens, and merges the two together. Thus, it reports a correctlastAccessTime
for all clients. - At the time this was implemented, we couldn't update the
/devices
endpoint to do the same. Thus, it does not report a correctlastAccessTime
for OAuth clients, which may not have a sessionToken or may not use it regularly.
The reason why /devices
could not be updated at the time are explained in this comment:
// XXX: Ideally, we would call oauthdb.listAuthorizedClients here, and perform a similar merging
// to what we do in /account/attached_clients. That's awkward to do because this route can be called
// with a refreshToken, but oauthdb currently requires a sessionToken. Let's defer that to followup.
The good news is, those reasons no longer hold! Now that the oauth-server codebase has been more fully merged into the auth-server codebase, we don't have to worry about "the oauthdb currently requires a sessionToken" and we can directly query the contents of the oauth db as needed.
Here's what I think we need to do:
- In the code for the
/devices
endpoint, make a query tooauthdb.listAuthorizedClients
in addition to the existing call torequest.app.devices
. - Match the resulting oauth client records to the list of devices using the shared
refreshTokenId
column, similar to how it's done in/attached_clients
. Discard any oauth client records that do not correspond to a device record. - Merge properties of the device record and oauth client record using the same logic used for
/attached_clients
. This will give a correct last-access time, and may also produce e.g. more accurate names or device types.
It's probably worth factoring out "list all devices, merging data from auth and oauth dbs" into some sort of shared helper that can be used by both of these routes, perhaps in ./lib/devices.js
. But I haven't investigated what that might look like in any detail.
Assignee | ||
Comment 5•4 years ago
|
||
Reporter | ||
Comment 6•4 years ago
|
||
The above pull-request was merged, but is not yet live in production. It should go out with the next deploy, FxA train-176.
(Triage: in current spring => P1)
Comment 7•4 years ago
|
||
I think we also still want something like that phabricator patch too? Ed, you are going to persist with that still?
Updated•4 years ago
|
Comment 9•4 years ago
|
||
bugherder |
Assignee | ||
Updated•4 years ago
|
Comment 10•4 years ago
|
||
Unfortunately, I don't think this patch helped. We now include a (very large -- every one of my pings has over 50 devices in them now...) list of devices... but the only devices with usable information in them are the devices that we were previously including.
Specifically, prior to this patch, my list looked like:
[{
"id": "90f50e807e35fc9f12673673dc2ebe1d05e888b9f600579036660199ae452472",
"syncID": "95c2c6978380fd9a0b04c094e71faa565f6cce84aaefe5ebe82c5aebaef08a8c",
"type": "mobile", "os": "iOS", "version": "27.0"
}, {
"id": "0374965ba8f65a01eaeafadcc43026925dbba47e35a766105b2dc2a624b2f65d",
"syncID": "0fe877fe7298246b45f3fddbce9547ea16bb9e63855cbae6c1984f0708ca5725",
"type": "desktop", "os": "Darwin", "version": "80.0a1"
}, {
"id": "d94b178af788cef9447bff331a1bed3bde2e221793ddbf566698e10def68bfd9",
"syncID": "5db36368b081fe2c8205f2cc0ab784cf0968864228a709ab83bed136da39f66f",
"type": "desktop", "os": "WINNT", "version": "79.0"
}]
And afterwards it looks like:
[{
"id": "90f50e807e35fc9f12673673dc2ebe1d05e888b9f600579036660199ae452472",
"syncID": "95c2c6978380fd9a0b04c094e71faa565f6cce84aaefe5ebe82c5aebaef08a8c",
"type": "mobile", "os": "iOS", "version": "27.0"
}, {
"id": "0374965ba8f65a01eaeafadcc43026925dbba47e35a766105b2dc2a624b2f65d",
"syncID": "0fe877fe7298246b45f3fddbce9547ea16bb9e63855cbae6c1984f0708ca5725",
"type": "desktop", "os": "Darwin", "version": "80.0a1"
}, {
"id": "d94b178af788cef9447bff331a1bed3bde2e221793ddbf566698e10def68bfd9",
"syncID": "5db36368b081fe2c8205f2cc0ab784cf0968864228a709ab83bed136da39f66f",
"type": "desktop", "os": "WINNT", "version": "79.0"
},
{ "id": "33aa63c10da1c90f5c8d48167e6efac49ae805933b68789e1743dbf641c49b1d", "type": "desktop" },
{ "id": "964c1567462c445e958400e80650370271e88b7120b17de05b2379610a350e71", "type": "mobile" },
{ "id": "e4ea71d9a55bb56ba4fcee89900923e2cfba564a849763050134ed5a46c808b5", "type": "desktop" },
{ "id": "57cfe7ca0a869d1bbd1142b69692c31864b6137f29dcfbe496c19b238ac19528", "type": "mobile" },
{ "id": "1c72e830cb937c67f663dfa0c1a35e6bbf3b26dc4507157b8eb26475bf483833", "type": "mobile" },
{ "id": "b19410ff122129245ac82aa756ed5ae0cd3793019ce51a5f2ea5657dacef9502", "type": "desktop" },
{ "id": "759461aadaac986767e4354bc9fdb3722a922b5c0b78878ece90eb03b6b66c97", "type": "mobile" },
{ "id": "fb06c88441f33b8d8e756bb258a7a65b5b65e3391ba768dff3110bcc04a97ad0", "type": "desktop" },
{ "id": "39fe4fe1ccc531000488fead7db54069b81ee289b50768b288358cbffba6029f", "type": "mobile" },
{ "id": "706a9f3155252950094e140123da702b811c7f730bba2acde0465fd4be1602e0", "type": "desktop" },
{ "id": "5ee866da368fadcd3dc26449ef3f3a5eae966d969995bc9157d16f4aac7127cb", "type": "desktop" },
{ "id": "40c78192e0a2d67e843df1da17b1fd8f6fc3f0dd0376f6de3518e5f3eac7f044", "type": "mobile" },
{ "id": "8d6375bd1cef89d0a1a6eb7ccf862425c56ada2f8f9328a33bf5dc3ec3938931", "type": "desktop" },
{ "id": "c4937e0d5b9d5437a75c409305107e7c3302ba9b5e4ebfaf2478d3dd4ec4160a", "type": "desktop" },
// ... literally like 40 more along the same lines
Unfortunately, for the purposes of the send-tab telemetry, this data is not useful without the os
and version
information. So I think we still need to fix this issue.
Comment 11•4 years ago
|
||
(In reply to Thom Chiovoloni [:tcsc] (please ni?) from comment #10)
Unfortunately, I don't think this patch helped. We now include a (very large -- every one of my pings has over 50 devices in them now...) list of devices... but the only devices with usable information in them are the devices that we were previously including.
It did help - knowing the device is desktop vs mobile is helpful, especially because we know only desktop records the telemetry. While it would be great if the other devices has os
and version
, there's no regression here.
Description
•