Expose endpoint for getting hashed_fxa_uid
Categories
(Cloud Services :: Server: Firefox Accounts, enhancement)
Tracking
(Not tracked)
People
(Reporter: tcsc, Unassigned)
References
(Blocks 1 open bug)
Details
This is used to hash certain things in telemetry, such as device IDs.
Right now this is provided by the token server; however, after the sync/FxA decoupling, non-sync users won't have it.
We could still get a token for them just to get this value, but doing so seems dodgy. rfkelly suggested at one point that a good way to do this would be an endpoint that returned it from FxA.
Comment 1•5 years ago
I think Ryan was even suggesting existing endpoints might be able to return it. Vlad, do you remember any discussions around this?
Comment 2•5 years ago
:tcsc or :markh - are there any documents about what you need? I'm not aware of any existing endpoints on the FxA side of things that return the hashed FxA uid. IIRC, Ryan said the token server might.
Comment 3•5 years ago
The tokenserver does, but we want it for non-sync users. Ryan had some thoughts here, but I can't recall the specific details (eg, I think he was suggesting we add this to an existing end-point rather than creating a new one). Over the next couple of days I hope to determine if we can just wait for him to get back before moving forward here :)
Comment 4•5 years ago
I'm kicking this to future milestones waiting on Ryan. If this is critical, please let me know.
Comment 5•5 years ago
Here's a github issue where we kicked this topic around a bit previously:
Comment 6•5 years ago
I don't think we have a good written-down understanding of the intended privacy/security properties of the hashed_fxa_uid identifier, or to be honest, of how well it actually meets them. Here's what I can reconstruct from memory:
- We want to be able to submit sync-related telemetry pings, and correlate pings that came from the same user on different devices.
- We do not want to be able to easily correlate those telemetry pings back to a real user account.
- So, we have tokenserver hmac the fxa uid with a server-side secret key and give that value to the client when connecting to sync.
- Sync clients include this hashed_fxa_uid in their sync telemetry ping, allowing us to achieve (1) while making (2) harder.
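As a rough illustration of the hmac step described above, here is a minimal Python sketch. The choice of HMAC-SHA256, the hex encoding, and the key and uid values are all assumptions made for the example; the thread does not specify the exact construction that tokenserver uses.

```python
import hashlib
import hmac


def hashed_fxa_uid(uid: str, secret_key: bytes) -> str:
    """Keyed hash of an FxA uid (assumed here: HMAC-SHA256, hex-encoded)."""
    return hmac.new(secret_key, uid.encode("utf-8"), hashlib.sha256).hexdigest()


# Placeholder values for illustration only.
SERVER_SECRET = b"example-server-side-secret"
EXAMPLE_UID = "0123456789abcdef0123456789abcdef"

# The same uid always maps to the same identifier, so pings from
# different devices of one user can be correlated.
device_a_id = hashed_fxa_uid(EXAMPLE_UID, SERVER_SECRET)
device_b_id = hashed_fxa_uid(EXAMPLE_UID, SERVER_SECRET)
```

Anyone without the secret key cannot recompute the identifier from a uid, but anyone holding the key can, which is the limitation discussed later in this comment.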
The important considerations around handling of hashed_fxa_uid are thus:
- Who is able to reverse a hashed_fxa_uid back to the underlying uid (and thus link any sync telemetry ping back to a real user)?
- Who is able to find out a particular user's hashed_fxa_uid (and thus identify all the sync telemetry pings from that user)?
Whatever we do here, we should try not to change the answer to those two questions, or be very deliberate about it if we do.
We want to replace this ad-hoc sync-specific telemetry mechanism with general-purpose ecosystem telemetry in the short- to mid-term, and I'm cc'ing :m_and_m who is leading that effort. I'll also note that a similar hmacing-with-a-secret-key approach was proposed for ecosystem telemetry and was rejected, because in practice it won't stop Mozilla operations folks from having both capabilities listed above.
But, it's the scheme we have for now in sync.
We're working to make it possible for users to sign in to Firefox and use services other than Sync, such as Monitor and FPN. Part of this "decoupling" work is to decouple signed-in-to-Firefox telemetry from sync-specific telemetry. Given that ecosystem telemetry isn't ready yet, we want to keep using the hashed_fxa_uid mechanism in order to avoid losing visibility into browser-side FxA metrics.
So, we'd like a way to obtain the hashed_fxa_uid without accessing the Sync service (since accessing the Sync service just to get some metrics stuff, might itself generate misleading metrics about usage of the Sync service!).
Given our ambitions to replace hashed_fxa_uid with ecosystem telemetry, I think it'd be easy to spend too much effort on designing an elegant solution here. I had a couple of less-than-elegant ideas.
1) Return it as part of the user's profile data
The simplest option I can think of here is to have fxa-profile-server return a new hashed_uid field as part of the user's profile data bundle. Our clients are calling that endpoint anyway, and desktop has robust support for storing and querying the data it returns. This would involve copying the hmacing logic from tokenserver into fxa-profile-server, along with the corresponding secret key.
Pros:
- Small change, if a bit weird.
- Client can avoid making an extra request, since it's fetching profile data already.
Cons:
- Anyone with a "profile"-scoped access token can find out the hashed_fxa_uid for that user.
  - In theory such tokens are only held by internal Mozilla RPs, but still, it's a change to the boundaries of the system.
2) Return it as part of the fxaccounts:login webchannel message
The login data returned during webchannel login already contains the raw uid. We could update fxa-auth-server to calculate hashed_fxa_uid and return it in the response for /account/create, /account/login and /session/reauth requests. The web content on accounts.firefox.com could then forward it on to the browser as part of the webchannel message.
Pros:
- Keeps current access restrictions intact
  - (the data in fxaccounts:login is already sufficient to find out hashed_fxa_uid, by asking the tokenserver for it)
- Client can avoid making an extra request, since it receives and stores this data already
Cons:
- More complicated change, and still pretty weird.
- Only available to webchannel clients
Do either of these alternatives seem like clearly the right way to go?
Comment 7•5 years ago
Thanks Ryan,
Returning it as part of the profile data seems ideal to me because:
- We fetch the profile even for users already signed in - the webchannel message will leave users already signed in somewhat stuck.
- This also means we could possibly avoid persisting this hashed UID - we don't persist it now. We (almost-)always fetch the profile soon after startup, so should have it for all but very short sessions - but that's true today too - we happily submit a ping without the hashed IDs if we haven't connected to the token server in the current session.
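The "submit a ping without the hashed IDs" behaviour mentioned above could look roughly like the following Python sketch. The function name and the ping field names (`uid`, `deviceID`) are hypothetical and chosen only for illustration; they are not taken from the actual sync ping schema.

```python
from typing import Optional


def build_ping(payload: dict,
               hashed_uid: Optional[str],
               hashed_device_id: Optional[str]) -> dict:
    """Attach hashed identifiers only when they are known for this session."""
    ping = dict(payload)  # copy so the caller's payload is not mutated
    if hashed_uid is not None:
        ping["uid"] = hashed_uid  # hypothetical field name
    if hashed_device_id is not None:
        ping["deviceID"] = hashed_device_id  # hypothetical field name
    return ping


# Very short session: the profile (or token) was never fetched, so the
# ping is still sent, just without the identifiers.
early_ping = build_ping({"syncs": []}, None, None)

# Normal session: the hashed uid arrived with the profile data.
later_ping = build_ping({"syncs": []}, "abc123", "def456")
```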
So, we'd like a way to obtain the hashed_fxa_uid without accessing the Sync service (since accessing the Sync service just to get some metrics stuff, might itself generate misleading metrics about usage of the Sync service!).
It's worth asking the question: are we sure that the tradeoffs here aren't OK? Non-sync users hitting the token server just to get the hashed UID sure smells wrong, but depending on various things (eg, those misleading metrics, the extra load and storage on the token server), it might still be a reasonable pragmatic decision given our desire to change this in the short-medium term.
The main winner here would be FxA not needing to do much (any?) extra work, so I suppose I should have let Shane propose that, but I've already written it, so... ;)
Comment 8•5 years ago
It's worth asking the question: are we sure that the tradeoffs here aren't OK? Non-sync users hitting the token server just to get the hashed UID
sure smells wrong, but depending on various things (eg, those misleading metrics, the extra load and storage on the token server),
it might still be a reasonable pragmatic decision given our desire to change this in the short-medium term.
Hmm, interesting. I've been thinking about adding a kind of "check whether this is an existing sync user" endpoint on tokenserver for other reasons [1], and that might be a useful point at which to have firefox obtain the hashed_fxa_uid even if it's not going to actually sync.
[1] For example, to know whether we need to show the CWTS screen for users who are signing in.
Comment 9•5 years ago
(eg, those misleading metrics, the extra load and storage on the token server)
So I'm not too worried about extra load and storage on the tokenserver - if we hadn't made this change, we'd expect the majority of these users to end up fully syncing anyway. I'm also not terribly worried about inflating server-side tokenserver metrics, because I don't believe we monitor those for KPIs etc. Here are my concerns about continuing to get this data from tokenserver:
- Allocating users to a storage node when they don't actually intend to sync, might affect the load-balancing of users across nodes, causing some nodes to be underutilized or inflating our measures of how many users can be supported by a given node.
- FxA metrics might report that the user has used the "sync" service when they really haven't, affecting segmentation and analysis of retention, conversion rates, and other important metrics.
- Due to the above, marketing platform might think the user has used the "sync" service when they really haven't, which might result in the user getting a confusing onboarding experience, or missing out on sync upsell opportunities.
I think we can mitigate these concerns while shipping a fairly minimal change by doing the following:
- Adding a new "look but don't touch" endpoint to tokenserver, which reports the status of a user but doesn't assign them to a storage node if they don't already have one.
  - I'm thinking GET /1.0/sync/1.5/userinfo versus the existing GET /1.0/sync/1.5/ but am open to bikeshedding.
  - The browser can call this instead of the existing tokenserver endpoint for users who have not enabled sync.
  - In the future we might be able to use this to report more information about the user, e.g. whether they've made any sync data choices.
- When Firefox generates a BrowserID assertion in order to talk to said endpoint, have it send service=5882386c6d801776 in the query parameters rather than service=sync (or rather than sending no service parameter, which I think may be the current behaviour).
  - This will avoid confusing the FxA metrics, which pull the list of used services from values seen in this field.
  - This will require a partial revert of this fxa server-side change, but c'est la vie.
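To make the "look but don't touch" idea concrete, here is a small in-memory Python sketch contrasting the two behaviours. The class, its method names, the response fields, and the round-robin node assignment are all invented for illustration and do not reflect the real tokenserver code; the hash construction (HMAC-SHA256, hex) is likewise an assumption.

```python
import hashlib
import hmac
from typing import Dict, List


class TokenServerSketch:
    """Toy model of tokenserver node allocation (hypothetical, in-memory)."""

    def __init__(self, secret_key: bytes, nodes: List[str]) -> None:
        self._key = secret_key
        self._nodes = nodes
        self._assignments: Dict[str, str] = {}

    def _hash(self, uid: str) -> str:
        return hmac.new(self._key, uid.encode("utf-8"), hashlib.sha256).hexdigest()

    def get_token(self, uid: str) -> dict:
        """Existing-style request: allocates a storage node if needed."""
        if uid not in self._assignments:
            index = len(self._assignments) % len(self._nodes)
            self._assignments[uid] = self._nodes[index]
        return {"node": self._assignments[uid], "hashed_fxa_uid": self._hash(uid)}

    def get_userinfo(self, uid: str) -> dict:
        """Proposed-style request: reports status, never allocates a node."""
        return {
            "has_node": uid in self._assignments,
            "hashed_fxa_uid": self._hash(uid),
        }


server = TokenServerSketch(b"example-secret", ["node-a", "node-b"])
info_before = server.get_userinfo("uid-1")  # non-sync user: nothing allocated
token = server.get_token("uid-1")           # user later enables sync
info_after = server.get_userinfo("uid-1")
```

In this sketch a non-sync user gets the same hashed identifier from either call, but only the token request changes the node assignments.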
I'll try to timebox a prototype on the tokenserver side to see if this is reasonable in practice.
Comment 10•5 years ago
This will require a partial revert of this fxa server-side change, but c'est la vie.
Actually, I don't think it will - that change made the service parameter default to sync, but still allows you to specify a different value explicitly.
Comment 11•5 years ago
ni? myself to circle back to this with a concrete proposal
Comment 12•5 years ago
Given the way that ecosystem telemetry is shaping up for Q1, I think we should go with an alternative approach in this bug: do nothing, and instead just wait for ecosystem telemetry.
That might mean we spend a quarter or two without having the hashed uid in the metrics ping for non-sync users, which just doesn't seem like that big of a deal relative to other things we could be working on.
However, if we really do need this capability with some urgency and can't wait for ecosystem telemetry, then I think we should go with this approach from Comment 6:
- Return it as part of the user's profile data
That is broadly what we're going to have to do in an ecosystem-telemetry world anyway, it's just that FxA will be giving us back a super-secret-derived-and-encrypted identifier rather than a hmaced uid. So at least this approach would be moving us in the general direction of ecosystem telemetry.
This bug is actionable, if it's urgent enough for someone from the FxA team (or myself or Vlad) to make it a priority in one of our upcoming sprints. But I will lean on product and data to let us know how urgent it is in practice.
Comment 13•5 years ago
- Return it as part of the user's profile data
In practice, this looks like:
- Server-side:
- Add a new "hashed_uid" field to the profile-server, access to which is gated by the scope "profile:hashed_uid" in the same way that other profile fields currently work.
- Add the hmac key to the profile-server's config, and teach it how to calculate the "hashed_uid" field based on this value.
- Do some manual QA to assure ourselves that profile-server and tokenserver calculate the same value for the same account.
- Client-side:
- Read the "hashed_uid" field as part of the user's profile data, persisting it to disk alongside the raw uid.
- Shove the hashed_uid into the sync ping whenever we send one
Which I don't think is a huge amount of work, but it's non-zero.
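The server-side steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, scope matching is simplified to exact string membership, "profile:uid" is used as a stand-in for an existing per-field scope, and the hash is assumed to be a hex HMAC-SHA256. The real profile-server and tokenserver may differ on every one of these points.

```python
import hashlib
import hmac
from typing import Iterable

HASHED_UID_SCOPE = "profile:hashed_uid"  # scope name proposed in this comment


def compute_hashed_uid(uid: str, hmac_key: bytes) -> str:
    """Assumed construction: hex HMAC-SHA256 of the uid."""
    return hmac.new(hmac_key, uid.encode("utf-8"), hashlib.sha256).hexdigest()


def profile_response(uid: str, scopes: Iterable[str], hmac_key: bytes) -> dict:
    """Build a profile body, including each field only if its scope is granted."""
    granted = set(scopes)
    body = {}
    if "profile:uid" in granted:  # stand-in for an existing per-field scope
        body["uid"] = uid
    if HASHED_UID_SCOPE in granted:
        body["hashed_uid"] = compute_hashed_uid(uid, hmac_key)
    return body


# Manual-QA style check: both services must agree for the same account,
# which in this sketch means sharing the same key and construction.
KEY = b"example-shared-hmac-key"
UID = "0123456789abcdef0123456789abcdef"
from_profile_server = profile_response(UID, ["profile:uid", HASHED_UID_SCOPE], KEY)
from_tokenserver = compute_hashed_uid(UID, KEY)
values_match = from_profile_server["hashed_uid"] == from_tokenserver
```

On the client side, the returned "hashed_uid" would then be stored alongside the raw uid and attached to the sync ping, as described in the list above.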
Comment 14•5 years ago
For what it's worth, I think all of this plan makes sense strategically. It moves in a direction that aligns with future goals and plans, and is a decent fallback in case something goes horribly wrong with ecosystem telemetry.
Comment 15•4 years ago
I think we should go with an alternative approach in this bug: do nothing, and instead just wait for ecosystem telemetry.
This didn't work out; ecosystem telemetry got canned. I do not intend to work on this bug any further.
Comment 17•3 years ago
The status quo isn't ideal but does work - we can reopen this if we need it later.