Glean SDK Profile Backup and Restore
Categories: Data Platform and Tools :: Glean: SDK, task
Tracking: Not tracked
People: Reporter: chutten, Assigned: mconley
References: Blocks 1 open bug
Details: Whiteboard: [fidefe-device-migration]
Firefox is building a profile backup+restore mechanism. Data Science requires specific behaviour from the client_id on restore:
- The profile's client_id shouldn't change, even if a backup is restored over top of it
- The backed-up client_id should be reported alongside the current client_id in a ping reporting on the restoration
This means we need some way to back up the client_id and perform logic on restoration to report it.
...but what else should we back up and restore? seq? first_run_date? Are there user-lifetime metrics we want to bring with us? Or, like the client_id, just report them so we know what they were and what they now are?
This bug is about designing and implementing backup and restore logic for the Glean SDK that:
- Satisfies Data Science's client_id behaviour requirements
- Ensures that the SDK operates nicely after restoration
- Instruments the SDK's restoration sufficiently that we can validate things went well within the SDK
And, most importantly, this bug is about working closely with the Firefox devs working on bug 1885955, so that the APIs and behaviour are what they need and expect.
Comment 1 • Assignee • 25 days ago
> The profile's client_id shouldn't change, even if a backup is restored over top of it
I want to expand on this a bit to avoid confusion: the restoration mechanism for backups does not overwrite anything, or restore over top of existing data.
Let's say we have a computer with a Firefox profile A. Let's also say that we have a Firefox with a profile B. These profiles might be on the same or different devices. In some cases A and B might actually be the same device and profile, but just at different points in time!
Let's say that A has created a backup for itself. We'll call that the A-Backup.
In order to recover from the backup, one must use a running instance of Firefox, and tell it to recover from the backup archive. Let's say that B is being used to do that - to initiate recovery of the A-Backup.
What happens is that B creates a new empty profile, we'll call it C, and then copies the contents of the A-Backup into C.
What data science wants is that C always inherits the client ID of the profile that initiated recovery, so in this case, B. C then becomes the default user profile on the device that it's running on. It is known that B and C will then share client IDs, but the expectation is that it's unlikely that the user will go back to actively using B.
We expect the common case is that B is actually just A, but in the future after the backup was created - OR, that B is a very recently created user profile on a new machine that is being used to recover from the A-Backup.
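The A/B/C flow above can be sketched in a few lines of Python. This is only an illustration of the described behaviour, not Firefox or Glean code: the `Profile` class, `create_backup`, `recover`, and the `backed_up_client_id` field name are all hypothetical.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Profile:
    """Hypothetical stand-in for a Firefox profile (not a real Firefox or Glean type)."""
    name: str
    client_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    contents: dict = field(default_factory=dict)


def create_backup(profile):
    """Snapshot a profile's contents together with the client_id it had at backup time."""
    return {"client_id": profile.client_id, "contents": dict(profile.contents)}


def recover(initiator, backup, new_name="C"):
    """Recover `backup` from within `initiator` (profile B in the example above).

    Returns the new profile and the payload of a hypothetical restoration ping.
    """
    # A new, empty profile is created; nothing existing is overwritten.
    restored = Profile(name=new_name)
    # The contents of the backup are copied into the new profile.
    restored.contents = dict(backup["contents"])
    # The new profile inherits the initiator's client_id, not the backup's.
    restored.client_id = initiator.client_id
    # The backed-up client_id is reported alongside the current one.
    ping = {
        "client_id": restored.client_id,
        "backed_up_client_id": backup["client_id"],
    }
    return restored, ping


a = Profile("A", contents={"bookmarks": ["https://example.com"]})
b = Profile("B")
a_backup = create_backup(a)
c, restore_ping = recover(b, a_backup)
```

After this runs, C carries A's contents but B's client ID, and the ping payload is the only place A's client ID shows up.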
Hopefully all of this pseudo-algebra made things clearer both for you and me and for future historians, instead of complicating things. :)
Comment 2 • Assignee • 25 days ago
I spoke with nflorez, our team's data scientist, and she writes that things like profile age should match the client ID they are associated with. So what I expect is that most of the metadata about B should be copied over to C, but no backlogged pings.
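A rough sketch of that expectation, assuming the SDK's state could be viewed as a plain dictionary. The keys used here (`client_id`, `first_run_date`, `pending_pings`) and the function name are placeholders for illustration; which metadata actually gets carried over is still an open question in this bug.

```python
import copy


def inherit_metadata(initiator_state):
    """Return the state the recovered profile (C) would start with.

    Everything is copied from the initiating profile (B) except its
    backlogged pings, which are deliberately left behind.
    """
    state = copy.deepcopy(initiator_state)
    state["pending_pings"] = []  # hypothetical key: no backlogged pings in C
    return state


# Placeholder values standing in for profile B's state.
b_state = {
    "client_id": "b-client-id",
    "first_run_date": "2024-01-15",
    "pending_pings": [{"name": "baseline"}],
}
c_state = inherit_metadata(b_state)
```

The deep copy keeps B's own state untouched, so B's backlogged pings stay with B.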
Comment 3 • Reporter • 22 days ago
You want me to assign this to you for the proposal writing part, Mike?
Comment 4 • Assignee • 22 days ago
Yeah, I'll take this for now while I get this document off the ground.
Comment 5 • Assignee • 22 days ago
I have a draft here: https://docs.google.com/document/d/1sXNWmgImAu3XfNAq7w-VP_MbDPZ2IyhzPsdZEWy82Gw/edit
but I think I've taken it about as far as I can without some additional guidance or feedback.
Comment 6 • Reporter • 21 days ago
As I said in the doc:
> Seems like a good overview and explanation of what's required. May need some more technical wrangling about who's passing what file paths and who's doing what I/O. For that, I'll summon Travis.
Travis: does the proposal contain sufficient detail to design+impl the necessary APIs in the Glean SDK? I figure it won't need to concern itself with File I/O and can leave the file management to FOG. Instead it could probably return and consume some structured or opaque data as you'd like.
Comment 7 • 21 days ago
> Travis: does the proposal contain sufficient detail to design+impl the necessary APIs in the Glean SDK? I figure it won't need to concern itself with File I/O and can leave the file management to FOG. Instead it could probably return and consume some structured or opaque data as you'd like.
I believe so; the requirements seem pretty clear to me. Not needing to handle storage concerns is great, and having a small-ish external API surface to get() and put() backup data seems quite bearable to me. You have my blessings as SDK tech-lead on this approach.
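To make the division of labour concrete, here is a minimal sketch of what such a get()/put() surface might look like, with the caller (FOG's role) doing all file I/O. Nothing here is the actual Glean SDK API: `SdkState`, `get_backup_data`, `put_backup_data`, the JSON encoding, and the file name are all assumptions made for illustration.

```python
import json
import os
import tempfile


class SdkState:
    """Hypothetical stand-in for the SDK's internal state; not the Glean API."""

    def __init__(self, client_id, first_run_date):
        self.client_id = client_id
        self.first_run_date = first_run_date

    def get_backup_data(self) -> bytes:
        """Return an opaque blob for the caller to store however it likes."""
        payload = {"client_id": self.client_id, "first_run_date": self.first_run_date}
        return json.dumps(payload).encode("utf-8")

    def put_backup_data(self, blob: bytes) -> dict:
        """Consume a blob from get_backup_data() without adopting its client_id.

        Returns the values a restoration ping could report.
        """
        backed_up = json.loads(blob.decode("utf-8"))
        return {
            "client_id": self.client_id,
            "backed_up_client_id": backed_up["client_id"],
        }


# The caller owns the file; the SDK stand-in never sees a path.
source = SdkState("a-client-id", "2023-06-01")
target = SdkState("b-client-id", "2024-01-15")

with tempfile.TemporaryDirectory() as backup_dir:
    path = os.path.join(backup_dir, "glean-backup.bin")
    with open(path, "wb") as f:
        f.write(source.get_backup_data())
    with open(path, "rb") as f:
        report = target.put_backup_data(f.read())
```

Keeping the blob opaque means the storage format could change inside the SDK without the caller needing to know.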
Comment 8 • Assignee • 15 days ago
Okay, sounds like we've got high-level sign-off on this proposal? What's generally the next step for making this kind of change in Glean?
Comment 9 • 15 days ago
The next step is to get this work assigned and implemented. We are always open to outside contributions, but this one seems to be a little bit of a deep cut to ask of an outside contributor, so let me bring this up in the next Glean SDK meeting (and/or in our team channel) and I'll get back to this bug with an ETA on when we can get this done.