Closed Bug 568156 Opened 10 years ago Closed 9 years ago

Use Sync client version as the User-Agent for Sync requests

Categories

(Firefox :: Sync, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
mozilla5

People

(Reporter: justin, Assigned: rnewman)

References

Details

(Whiteboard: [qa-])

Attachments

(2 files, 4 obsolete files)

Need this for critical security updates, to be able to offer new functionality to only new version, and who knows what else, but we should know what version each user is on.
We do ping with the client version at most once a day when it syncs. It'll show up in the info/collections?v=1.2.3 query
perfect - didnt know we did this.  perhaps we can hack up a maintenance script to find this and stick it in the db or ldap?
Let's be careful here.  I know why we want this information, but I'm also starting to wonder if we should go the other way, and override UA to be something very generic.  One core pillar about the Weave project has been giving the user as much control over their own data as possible, and reducing what we know about users to the bare minimum.  I'm not sure this qualifies.

The more information we provide about a user, the more the server operator knows about the client.  Looking at tech like panopticlick, we could (if we were evil) combine the fingerprint from the browser with the knowledge about what clients the users has to get a lot of detail about the user.  We've done security design from the perspective of having hostile/compromised servers, and this feels like a step against that, so we should discuss this in detail, probably on the public dev list.
(In reply to comment #3)
>  We've done
> security design from the perspective of having hostile/compromised servers, and
> this feels like a step against that, so we should discuss this in detail,
> probably on the public dev list.

I agree with Mike here. We specifically discussed this when we were implementing the client heartbeat for metrics tracking - in fact, I was still under the (mistaken) impression that we weren't sending anything after the ? in the query.

That said, I also see the value this particular piece of data. Talking about this on weave-dev sounds like a good next step.
how is this any different than when we send the ff ua string?  weave is a client and the uses for having that data are infact quite focused on keeping the user *more* secure if we needed to do something specific on the server side to a specific version of the client.  I agree having a public discussion is good (isn't this bug public?) but we should frame it as something that will help us offer better support and a more secure service rather than a metric we are gathering to keep track of users (which it isn't, and we aren't).
(In reply to comment #5)
> how is this any different than when we send the ff ua string?

That's why I mentioned overriding the UA for Weave requests.

> weave is a
> client and the uses for having that data are infact quite focused on keeping
> the user *more* secure if we needed to do something specific on the server side
> to a specific version of the client.  I agree having a public discussion is
> good (isn't this bug public?) but we should frame it as something that will
> help us offer better support and a more secure service rather than a metric we
> are gathering to keep track of users (which it isn't, and we aren't).

That's one potential use-case.  But if it were a hostile/compromised server, it would make users less secure, as it would be easier to target attacks.  This type of data cuts both ways.  The point of Weave has been to make it possible to use a service without needing to trust them with your data or your security.  I'd rather go even further in the other direction, and figure out how to make collection names opaque to the server.

This is public, but few people watch these components.
Duplicate of this bug: 580264
I'd like to revisit this discussion, as you may notice from the duplicate bug
in comment 7.  Having an understanding of the client version will increase our
security monitoring capabilities.  We've had several situations where we've
identified false positive security events that are due to bugs in the client
software. Once these bugs are fixed we have no way of differentiating older,
unfixed clients, from actual attacks.

There is a trade off here, we do expose a small amount of client version
information (no more than a user agent string), but in return we are able to
better protect the entire infrastructure against attackers.
a) as noted in comment 3, I would much rather obscure the UA for all requests than expose even more information to the server.
b) why wouldn't attackers identify as old clients to look like false positives?  Relying on what the client claims to be feels fragile at best.

I do understand the infra/infrasec perspective here, but I still feel like this runs directly counter to the direction we should go from a product perspective.  From that perspective, the client should be as anonymous and generic as possible in how it syncs.  We probably should go further than we do now, as I've said here.
Yes, in a one off situation an attacker could pose as an older version.  But that's not really how we plan to use the client version information. This information can be used to diagnose large quantities of events that we say from numerous clients of a particular version. This way can slowly eliminate the bugs which cause security false positive events without continually bothering dev teams with issues that have already been fixed in recent releases (and we keep seeing because some people haven't updated).

I personally don't think the version data is disclosing too much information. We may need to all get together in a room and hash this one out since we have differing opinions.  There is a trade-off here and we need to make a decision on what item is more important - protecting the client's version from reporting to the server or enhanced security monitoring and bug detection.

I will schedule a meeting for next week (sorry out the rest of this week) and we can finalize this issue with some discussion.  Everyone on this bug will receive the invite.
I'm on an island next week, with no phone.

Ultimately, Ragavan has to set the product direction, and we need to live within that space.  I think our core goals of privacy, and user-centric data policies, have to win here, but he needs to make that call, not me.
(In reply to comment #11)
> I'm on an island next week, with no phone.
> 
> Ultimately, Ragavan has to set the product direction, and we need to live
> within that space.  I think our core goals of privacy, and user-centric data
> policies, have to win here, but he needs to make that call, not me.

Ragavan, do you want to make a call on this? 

My points which are in line with some of the other comments:

1. We are seeing some very odd issues with the client/server interaction beyond just a single use case. Having the version in the log entry sent to us will be very helpful and help us figure out what problems to address. This is beyond just security. 

2. To Justin's point, we are running a service and this will only be used as part of this service. Since this is a service, what if we had a major issue with a client and didn't want it syncing with our environment, how could we enforce this? Again, we are responsible for this data and if there is a fundamental problem we might not want that data on our servers from a given client version. So we can't enforce this now but we should think about it. 

I think we are all on the same page with regards to privacy and the new log policy will no-doubt be in place with this service. Metrics is just secondary but not our main reason for having the client report version numbers.
This is on me - I'll update this bug with thoughts/comments before the end of this week.
(In reply to comment #12)
> (In reply to comment #11)
> > I'm on an island next week, with no phone.
> > 
> > Ultimately, Ragavan has to set the product direction, and we need to live
> > within that space.  I think our core goals of privacy, and user-centric data
> > policies, have to win here, but he needs to make that call, not me.
> 
> Ragavan, do you want to make a call on this? 
> 
> My points which are in line with some of the other comments:
> 
> 1. We are seeing some very odd issues with the client/server interaction beyond
> just a single use case. Having the version in the log entry sent to us will be
> very helpful and help us figure out what problems to address. This is beyond
> just security. 

I completely understand how useful this would be for debugging client changes.  This would make my life easier, probably even moreso than for infrasec.  No one is questioning the potential benefits or use-cases.  The question is whether they violate the basic product requirements of user privacy being paramount.

> 2. To Justin's point, we are running a service and this will only be used as
> part of this service. Since this is a service, what if we had a major issue
> with a client and didn't want it syncing with our environment, how could we
> enforce this? Again, we are responsible for this data and if there is a
> fundamental problem we might not want that data on our servers from a given
> client version. So we can't enforce this now but we should think about it. 

What you're really talking about here is "require unique client ID/version strings as part of the API calls" for all clients.  It's an open system, so we'd have to require this from all clients.

We're running a service, but the service operational limits cannot be the primary decision factor.  The success of Sync, and of how we build out services in general, will be in changing how people run services, and making very different tradeoffs from a privacy perspective.  This means we sometimes walk away from wins (like not encrypting data on the client, and not running a proxy server for decrypting data for mobile devices).

> I think we are all on the same page with regards to privacy and the new log
> policy will no-doubt be in place with this service. Metrics is just secondary
> but not our main reason for having the client report version numbers.

Remember, we're not the only people running servers.
(In reply to comment #14)
> I completely understand how useful this would be for debugging client changes. 
> This would make my life easier, probably even moreso than for infrasec.  No one
> is questioning the potential benefits or use-cases.  The question is whether
> they violate the basic product requirements of user privacy being paramount.

What are the requirements for user privacy at this point? 

Per the FF Sync Privacy Policy Definition: 
"“Non-Personal Information” is information that cannot by itself be directly associated with a specific person or entity. Non-Personal Information includes but is not limited to your computer’s configuration and the *version* of Firefox Sync you use."

So we are really saying is that the version should be considered personal information?
I promised a response by the end of the week, but I haven't had a chance to talk to Chris yet. I have a fairly good understanding of Mike's position on this, but want to talk to Chris as well. 

I'll sync up with Chris early next week and also talk to legal and resolve this.
I talked to Julie, she doesn't think it is an issue from the perspective of our Privacy Policy, but is going to think about it some more.

https://bugzilla.mozilla.org/show_bug.cgi?id=582483 tracks that. 

Also, @clyon and @mcoates, as Mardak mentions in comment #1, we currently include the client version in the daily heartbeat ping. Is that not sufficient? Any reason this should be included in each sync request?
The daily hearbeat version information would not be easily available to correlate with each event that we receive into arcsight. Any sort of automatic correlation between the two data sets would not be feasible considering the large number of events we receive.  Having the version number available with each request (via header or url argument) is the optimal solution for us.
I spoke with zandr today, we may already have access to this information via the http request object. If that is the case we could update our CEF log points to include this data with client.

Zandr, is it possible to get the client's version everytime they send a request?
(In reply to comment #19)

> Zandr, is it possible to get the client's version everytime they send a
> request?

It would be possible to modify the client to do this in the future yes, subject to, well, the rest of the conversation in this bug.

There's no technical barrier, no.
Ah, ok. Perhaps I didn't understand earlier.  I was thinking we already had the information available on the server and could just correlate the version info with each request that goes to CEF logging (e.g not requiring a client change).

If a client code change is required then we are back at the original discussion.  I thought I found something that worked without client modification.
That's right. Once a day the client appends the ?v= query string, so you need to find that request in the log to get a particular user's version (and if the user has multiple clients, there's lots of opportunity for things to get messy)

So we'll get that once/day/client for each user. Metrics counts total numbers of version pings for each version, but they don't tie it up with username to my knowledge.
Spoke to cslyon about this, going to append this to info/collections, as this is the start of each sync pass, and possibly the only call.
OS: Mac OS X → All
Priority: -- → P2
Hardware: x86 → All
Summary: Need to capture the client version number and update on each sync → include client and client version in each info/collections call
Target Milestone: --- → 1.6
From what I can see in the code, we hit info/collections with a ?v= parameter no more than once per 24 hours. That doesn't seem to be congruent with the difference between Comment 22 and Comment 23.

Is there any action still required on this? Five months of silence says "no", so I'm stripping the P2, but I'll keep this open until I hear otherwise.
Priority: P2 → --
Target Milestone: 1.6 → ---
So, the basic idea was to add this info (client name and version) to all info/collection calls, since that's effectively the start of every sync pass.  I think we still want this for forensics, and we'd strip the v= thing out (with some coordination around metrics).
Attached patch WIP. v1 (obsolete) — Splinter Review
This proposed patch puts everything and the kitchen sink in the URL, and preserves the once-daily "v" parameter for backward compatibility.

Now, out of the enormous pile of values, split up for easy reading:

  ver=1.7                 # Weave ver
  channel=rel             # Weave channel
  appID=%7Bec8030f7-c20a-464f-9b0e-13a3a9e97384%7D  # Fx ID
  appName=Firefox         # App name
  appVer=4.0
  appBuild=20110318052756
  plVer=2.0               # Platform (Gecko) version
  plBuild=20110318052756

which ones do we want?

Note also that I've included a pref to turn off uploading these version strings for the privacy-conscious.
Assignee: nobody → rnewman
Attachment #522792 - Flags: feedback?(mconnor)
Whiteboard: [has patch][needs feedback mconnor]
Comment on attachment 522792 [details] [diff] [review]
WIP. v1

a) I think we want this as a header on each info/collections call, not a special version string.
b) I think we want just client and sync version.  We'll get UA for the browser anyway.
Attachment #522792 - Flags: feedback?(mconnor) → feedback-
> a) I think we want this as a header on each info/collections call, not a
> special version string.

Ops folk: would you prefer this as a header, rather than as a query parameter?

It's more invasive to the code, but no big deal.

> b) I think we want just client and sync version.  We'll get UA for the browser
> anyway.

Can you clarify precisely which "client version"? There are four in the strawman.

App version and platform version should be in the UA, yes, and in future they will be very coarse indeed (from what I can glean from dev-planning). I would guess, then that you mean "appBuild" when you say "client version", yes?
Any reason we couldn't modify the user-agent on this call? That would seem to be the quietest solution.
(In reply to comment #29)
> Any reason we couldn't modify the user-agent on this call? That would seem to
> be the quietest solution.

Unknown. Depends on whether the header change we makes sticks (which it doesn't for Authorization, apparently!), and whether we can fetch the current user agent if it hasn't been set yet.

I'm happy to explore this channel if it's desirable; still waiting for the people who'll actually be consuming this to weigh in with their opinions.
OK, new patch coming shortly to do this for info/collections:

User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:2.0b13pre) Gecko/20110320 Firefox/4.0b13pre Sync/1.7 (rel; 20110331184041)

Any objections?
Attached patch User-agent version. v1 (obsolete) — Splinter Review
This version augments the user-agent string for info/collections requests. One more test!
Attachment #522792 - Attachment is obsolete: true
Attachment #523497 - Flags: review?(philipp)
Whiteboard: [has patch][needs feedback mconnor] → [has patch][needs review philikon]
I'd like to make this something all clients can implement.  Modifying the UA seems like the trickiest one to do reliably across arbitrary platforms, and means we have to implement special parsing for each client, instead of just parsing a known header.
(In reply to comment #33)
> I'd like to make this something all clients can implement.  Modifying the UA
> seems like the trickiest one to do reliably across arbitrary platforms, and
> means we have to implement special parsing for each client, instead of just
> parsing a known header.

However, the UA does appear in Apache logs, which helps with the "poking about with grep" case. Other headers would be useful for metrics consumption, but less immediately impactful on logs.

Trivial for me to do whatever; petef seemed happy with UA.

Ops gets to make the call; they're the ones who have to use this!
(In reply to comment #34)
> Ops gets to make the call; they're the ones who have to use this!

Infrasec wanted this info as well. However, we'll use it via CEF logs from code. So either way works with us as long as it can be reliably parsed in Sync server code and passed in our CEF logs.
Can we get a final decision from Ops on this matter please?

Note: I'm not too happy with the User-Agent header fiddling. It makes our client side implementation just a tad more complicated than seems necessary. I would prefer that we either set a separate header or append a GET parameter to the URL.
Whiteboard: [has patch][needs review philikon] → [has patch]
Comment on attachment 523497 [details] [diff] [review]
User-agent version. v1

Removing the review flag on this patch for now until we have a decision from Ops.

Commenting on the code that's independent of that decision:

>+  _appBuildID: Cc["@mozilla.org/xre/app-info;1"]
>+                 .getService(Ci.nsIXULAppInfo).appBuildID,
>+
>+  // Compose a UA string fragment from the various available identifiers.
>+  _appInfoString: function _appInfoString()
>+    " Sync/" + WEAVE_VERSION +
>+    " (" + WEAVE_CHANNEL + "; " + this._appBuildID + ")",
>+    

Please use Svc.AppInfo.appBuildID directly. (At some point we can replace a bunch of the Svc.* lazy getters with their Services.jsm equivalents.)

WEAVE_CHANNEL seems pointless at this point. We should get rid of it. If anything we might want to replace it with the value for the 'app.update.channel' pref.
Attachment #523497 - Flags: review?(philipp)
(In reply to comment #36)
> Can we get a final decision from Ops on this matter please?
> 
> Note: I'm not too happy with the User-Agent header fiddling. It makes our
> client side implementation just a tad more complicated than seems necessary. I
> would prefer that we either set a separate header or append a GET parameter to
> the URL.

I don't think anyone from ops is on this bug. Who do you want to make this call?
(In reply to comment #38)
> I don't think anyone from ops is on this bug. Who do you want to make this
> call?

Oh, thanks for catching this. CCing them. I don't care who makes the call, as long as we have a solution that works for infrasec, ops and us.
Ops requests that the User-Agent header for sync requests to the sync servers include the sync version.  Discussed in person and confirmed that this is an acceptable solution to Pete, Jeff, Toby, and Philipp. cc'ing Jennifer Arguello for review.

FAQ:

Q: Only sync requests will include the Sync version tag in user-agent, not all requests, right?
A: Correct.

Q: Wouldn't a header also work?
A: An extra header is also a technically valid solution, but we regardless prefer an additional version tag in user-agent.

Q: Should we modify user-agent for only info/collections requests, or for all sync requests?
A: For all sync requests.  Improves our ability to debug issues, very small cost in bytes, doing it for all requests is easier on the developers.
Whiteboard: [has patch]
Attached patch User-agent version. v2 (obsolete) — Splinter Review
Adds the header on every request. Tests extended to check both _fetchInfo (GET) and a manual call to Resource.post().

Header fragment is computed once, in AsyncResource.prototype.
Attachment #523497 - Attachment is obsolete: true
Attachment #524431 - Flags: review?(philipp)
Summary: include client and client version in each info/collections call → Include Sync client version in the User-Agent on Sync requests
Comment on attachment 524431 [details] [diff] [review]
User-agent version. v2

># HG changeset patch
># Parent f9299646f8eb451456a564ce82a3a31d17aa9dcd
># User Richard Newman <rnewman@mozilla.com>
>Bug 568156: include client information in every info/collections request.

I just renamed the bug to reflect the updated attack plan, so plz update this :)

>+    // Compose a UA string fragment from the various available identifiers.
>+    if (Svc.Prefs.get("sendVersionInfo", true)) {
>+      let ua = this._uaAddition;
>+      try {
>+        ua = channel.getRequestHeader("user-agent") + " " + ua;
>+      } catch (ex) {
>+        // Throws if header not present.

Should only catch that specific error here:

  catch (ex if ex.result == Cr.NS_ERROR_FOOTGUN)

>+function run_test() {
>+  test_resource_user_agent();

Please also add one for AsyncResource.

r=me with those fixes.
Attachment #524431 - Flags: review?(philipp) → review+
Welcome back to Episode 43 of The Bug That Will Not Die®!

There seems to be a lot of conflation of requirements going on here. Let's restate, in increasing order of fuzziness.

From a developer standpoint, I'd like to have enough logs available that I can ask atoll "hey, can you correlate HTTP failure codes with Sync code releases?".

From an ops standpoint, I believe the chaps would like to be able to selectively block clients, as well as support debugging. We devs are not entirely sure that the former is best performed via client headers (which are likely to be infrequently updated or omitted), but hey. This proposal is better than nothing.

My understanding of the infrasec requirements is that they'd like to be able to correlate Sync clients and versions to particular CEF log entries. That requires Toby and Tarek to process whatever the client sends. This proposal is slightly easier than earlier ones.


To support debugging, we're broadly in agreement that logging something like:

  User-Agent: Fx-Sync/1.7.1pre.2011010101.mobile

(where "mobile" = Svc.Prefs.get("client.type") and 2011... = appBuildID)

is the right balance between debuggability and user privacy. Fewer details are present in this line than in the current Firefox User-Agent.


I'm aware that this might not be the ideal solution for CEF logging, because arbitrary clients might be sending Crap® in the User-Agent field. Sorry.

Additional proposals are welcome, but this is what we intend to land today.

Philipp, new patch incoming shortly!
Summary: Include Sync client version in the User-Agent on Sync requests → Use Sync client version as the User-Agent for Sync requests
Attached patch User-agent version. v3 (obsolete) — Splinter Review
Implements previous comment.
Attachment #524431 - Attachment is obsolete: true
Attachment #524465 - Flags: review?(philipp)
Attachment #524465 - Attachment is patch: true
Attachment #524465 - Attachment mime type: application/octet-stream → text/plain
Comment on attachment 524465 [details] [diff] [review]
User-agent version. v3

Actually, how 'bout I remember to reflect review comments? *sigh*
Attachment #524465 - Attachment is obsolete: true
Attachment #524465 - Flags: review?(philipp)
Attachment #524480 - Flags: review?(philipp)
Attachment #524480 - Flags: review?(philipp) → review+
STRs for QA: set up a custom Sync server whose access logs you can look at. When you sync against that, the access logs should show "Sync/1.8.0.<number> (desktop)" as the user agent instead of "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:2.2a1pre) Gecko/20110408 Firefox/4.2a1pre"
Status: NEW → ASSIGNED
Will the version of Sync be bound in any way to the version of Firefox?  Seems like we'll be losing some important information about what client the Sync was running on.
Daniel, what weave metrics graphs will be impacted if the Firefox version is unavailable in the weave logs?

Mike, could you work with Daniel to ensure that we no longer require the affected graphs?
We can certainly remove graphs, but I'd like to understand whether we are preventing the possibility of useful metrics around client name and version in the future.
We're thinking about this, see if we can preserve some useful information whilst still minimizing identifiable info.
Attached patch Minor update. v1Splinter Review
Adding product name and version back, because that's how bikesheds are supposed to look.
Attachment #524718 - Flags: review?(philipp)
Attachment #524718 - Flags: review?(philipp) → review+
Whiteboard: [fixed in services]
Whiteboard: [fixed in services]
Automated tests for this should be adequate. q-.
Whiteboard: [fixed in services] → [fixed in services][qa-]
http://hg.mozilla.org/mozilla-central/rev/6795af6d25c4
http://hg.mozilla.org/mozilla-central/rev/58d70dd8b7a8
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Whiteboard: [fixed in services][qa-] → [qa-]
Target Milestone: --- → mozilla5
Blocks: 669275
Component: Firefox Sync: Backend → Sync
Product: Cloud Services → Firefox
You need to log in before you can comment on or make changes to this bug.