Closed Bug 444351 Opened 12 years ago Closed 11 years ago

do not send client UUID / GUID with crash reports


(Toolkit :: Crash Reporting, defect)

1.9.0 Branch
Not set





(Reporter: beltzner, Assigned: ted)



(Keywords: verified1.9.0.6, verified1.9.1)


(1 file)

Remove the client GUID from being sent by default with crash reports.
Flags: wanted1.9.0.x?
Flags: blocking1.9.1?
What's the impetus here?
Here's a patch against trunk.
Assignee: nobody → ted.mielczarek
Attachment #328727 - Flags: review?(benjamin)
Attachment #328727 - Flags: review?(benjamin) → review+
Beltzner, why should this block?
Any reason this hasn't been checked in yet?
There was some fervent discussion happening, mostly between ss and shaver (although I can't remember where), and I wasn't clear on the outcome. shaver: did you two ever reach consensus?
Shaver, decision please?
Assignee: ted.mielczarek → shaver
Flags: blocking1.9.1? → blocking1.9.1+
Can't see why we need to muck with this on the stable branch. Besides, the client UUID was occasionally tremendously useful with talkback reports -- not to have the UUID itself, but to know whether a particular 1000-report crash was from 900 unique users or 1 user (maybe developing a broken addon?).
Flags: wanted1.9.0.x? → wanted1.9.0.x-
ted, please push this.
I don't think this patch should get pushed to mozilla-central (or any branches) until a decision has been made. Has this even been discussed in a public forum like the newsgroups? It seems like this is a governance issue related to privacy and data, not simply a module owner decision. We should get further feedback.

(And no, I don't think Shaver and I ever reached consensus.)
I've heard arguments on both sides. We're balancing the reward of being able to associate the number of users experiencing crashes versus the privacy risk of being able to associate multiple crashes with a user and build up a user profile.

My decision as module owner is that the risk is much greater than the reward here. If you'd like to challenge that decision, please do so in the newsgroups posthaste.
(In reply to comment #7)
> Can't see why we need to muck with this on the stable branch. Besides, the
> client UUID was occasionally tremendously useful with talkback reports -- not
> to have the UUID itself, but to know whether a particular 1000-report crash was
> from 900 unique users or 1 user (maybe developing a broken addon?).

I can think of several ways to accomplish this without the privacy problems associated with a GUID:

* Send a random 4-bit identifier instead of a GUID.  If all the identifiers for a given crash signature are the same, we can guess that they're all from the same person.  But we can't tell the difference between 100 users hitting a crash and 10000 users hitting a crash.

* Send a number that is a function of the number of crashes reported by the user in the last month.  (For example, the buckets could be "1", "2 to 4", "5 to 15", and "16 or more").  If a given crash signature keeps showing up in the "16 or more" bucket, we can infer that the crash affects a small number of users a lot.
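
As a rough illustration of the two suggestions above, here is a minimal Python sketch. The function names are hypothetical and nothing here reflects actual crash reporter code; the bucket boundaries are the ones given in the example.

```python
import secrets


def random_4bit_id():
    """Return a random identifier in the range 0..15 (4 bits).

    Hypothetical helper: with only 16 possible values, many users share
    each one, so it cannot single out an individual.
    """
    return secrets.randbelow(16)


def crash_count_bucket(crashes_last_month):
    """Map a user's crash count for the last month to a coarse bucket.

    Bucket boundaries follow the example in the comment:
    "1", "2 to 4", "5 to 15", "16 or more".
    """
    if crashes_last_month < 1:
        raise ValueError("a submitted report implies at least one crash")
    if crashes_last_month == 1:
        return "1"
    if crashes_last_month <= 4:
        return "2 to 4"
    if crashes_last_month <= 15:
        return "5 to 15"
    return "16 or more"
```

Either value would be sent in place of the GUID. The bucket says how crash-prone the reporting user is without saying who the user is.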
In a lot of "hallway" conversations we have talked about getting rid of e-mail address collection and going to a system like jesse suggests in comment 11.

The latter comment 11 suggestion of rotating the "submitter id" either after a number of reports or monthly based seems like the best approach.

On the macro analysis level we have gotten a lot of value in the past, and will in the upcoming new MTBF report, from knowing how many users have submitted reports versus the total number of reports submitted, and from comparing those numbers release to release.  On the micro analysis level, understanding whether a specific crash is coming from one user, from just a few users, or is broad based has played an important part in isolating and reproducing many crashes.  Quickly understanding how broad based the crash is helps direct the next level of analysis and which path might yield the most information.  For example:

 * if it is just a few users crashing, understanding similarities in their configuration is often an effective next step.  
 * if it is many users, looking at configuration similarities is not so effective and we look to other analysis techniques.

Please don't break either one of these macro or micro analysis tools.
The MTBF report *should* be using the time-since-last-crash number, and shouldn't need to use the submitter ID at all. Is there a bug where it is being implemented?

Yes, it is valuable to know whether a crash comes from a few or many users. But you can perform automated regression testing to find similarities in configuration without any unique IDs. It is sufficient to compare the DLL list and other semi-unique characteristics of the report; data which is already collected and available. 

I don't believe that the "number of unique users experiencing this crash" number is a sufficient benefit to offset the risk to user privacy of the unique ID.
Today's platform meeting decided to go ahead with this patch and remove GUIDs from crash reports, and continue investigating other less invasive ways of getting crash-per-user data, such as system signatures.
Assignee: shaver → ted.mielczarek
Pushed to m-c:

I'll push to 1.9.1 once this cycles green on trunk. We should consider taking this on the 1.9.0 branch as well.
Closed: 11 years ago
Resolution: --- → FIXED
> But you can perform automated regression testing to find similarities in
> configuration without any unique IDs. It is sufficient to compare the DLL list
> and other semi-unique characteristics of the report; data which is already
> collected and available. 

Could an attacker that has access to the database use this same approach to develop a "user profile"?  Isn't this approach just a way of doing post processing on the data to create something just like a UUID, or a proxy for the UUID?  If we are going to develop tools like those suggested in bug 472358, couldn't an attacker use those tools to associate a collection of crashes with a specific user, then use the same system configuration or other attributes to go find other reports from the same user?  If we don't develop those tools, couldn't an attacker develop them for themselves?

If it's technically feasible to post process a collection of crash reports to determine if they come from the same user and create a proxy for a UUID or an actual "post processed" UUID, it's not clear to me what we have really gained by removing the UUID.  It has only increased the processing burden, for us and for attackers, of creating the post processed user identification.

I agree that there may be some privacy risks here, so we should articulate them and find some solutions that prevent possible problems.

ted is worried about an AOL style disclosure where a collection of URLs was enough to develop the browsing profile of specific individuals.  that seems a valid concern and that kind of attack seems theoretically possible if an attacker gets full access to the database right now.

here is how an attack like that might happen.

find a crash report with a url that has user info embedded in the url

take the crash report with that url
get the config info for the pc that submitted that crash.
start matching that config info to that in other crash reports.
if a match is found then print URL

no UIDs involved but my browsing history is revealed using a script that does the steps above and it's directly associated to me as a person.
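
The matching steps above can be sketched on made-up data to show why configuration details act as a proxy for a UUID. This is only an illustration: the report fields (`os`, `version`, `modules`, `url`) and both function names are assumptions, not the actual Socorro schema or tooling.

```python
import hashlib
from collections import defaultdict


def config_fingerprint(report):
    """Hash the configuration fields of a report into a pseudo-identifier.

    The field names are hypothetical. The module list is sorted so that
    load order does not change the result.
    """
    parts = [report["os"], report["version"]] + sorted(report["modules"])
    return hashlib.sha256("\n".join(parts).encode("utf-8")).hexdigest()[:12]


def group_urls_by_config(reports):
    """Group report URLs by configuration fingerprint, with no UUID involved."""
    groups = defaultdict(list)
    for report in reports:
        groups[config_fingerprint(report)].append(report["url"])
    return dict(groups)


# Synthetic reports: the first two share a configuration, the third does not.
reports = [
    {"os": "Windows NT 5.1", "version": "3.0.5",
     "modules": ["a.dll", "b.dll"], "url": "http://example.com/user/alice"},
    {"os": "Windows NT 5.1", "version": "3.0.5",
     "modules": ["b.dll", "a.dll"], "url": "http://example.org/page"},
    {"os": "Mac OS X 10.5", "version": "3.0.5",
     "modules": ["c.dylib"], "url": "http://example.net/"},
]

linked = group_urls_by_config(reports)
```

Here the first two reports fall into the same group, so the URL carrying user info is tied to the other URL even though no identifier was ever sent. How unique a real configuration is depends on how many fields are compared.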

The problem in this case is that we have made URL collection and transmission "opt-out" as part of breakpad/socorro and automated the process of gathering and transmitting the full detailed host name and path information in every url.  

we have much more precise data now with that system, but we all feel a bit too uneasy to publish that data, and have it hang around on the servers with the rest of every crash report.

I've said in several other forums, we really should audit each of the pieces of information that we are collecting, identify the pieces of information that are really sensitive, figure out if we need it still, and figure out if it should be opt in.

some candidates for removal are

e-mail address - could be sensitive - not used in analysis, misleads users into thinking that we will likely contact them about a specific crash when in practice we don't do this.  many users don't provide this now.  get rid of e-mail addresses.

IP address -  could be sensitive - not used in any analysis techniques we use now.

urls - could be very sensitive and reveal an individual's browsing history - used in analysis, really valuable in finding reproducible sites and common content patterns that cause crashes.  consider making this opt-in again, where the user types in the url and discloses just the parts of the site they are comfortable with.   Add back the right disclosure so we are more comfortable with having URL data made public again.

uuid - by itself it seems not valuable since the number isn't connected to an individual.  could be reconstructed in post processing by looking at other configuration details.  what do we gain if we remove?

continue with looking at each of the data types we collect.

process list...
time of crash...
Flags: wanted1.9.0.x- → wanted1.9.0.x?
Attachment #328727 - Flags: approval1.9.0.6?
Comment on attachment 328727 [details] [diff] [review]
stop sending UserID

Pretty small change, it's all code removal. If we're serious about this we should take it on 1.9.0.
Blocks: 392608
Comment on attachment 328727 [details] [diff] [review]
stop sending UserID

Approved for, a=dveditz for release-drivers.

Code freeze is tonight though -- please land ASAP
Attachment #328727 - Flags: approval1.9.0.6? → approval1.9.0.6+
QA: To verify, please ensure crash submission works (and shows up in Socorro), the UUID does not display in the crash reporter after a crash, and confirm with IT that the UUID is not present in the database (you can file an Ops bug asking for information for a specific crash report ID).
Checked in to 1.9.0:
Checking in toolkit/crashreporter/nsExceptionHandler.cpp;
/cvsroot/mozilla/toolkit/crashreporter/nsExceptionHandler.cpp,v  <--  nsExceptionHandler.cpp
new revision: 1.40; previous revision: 1.39

To clarify what ss said, you can verify this client-side by using the crash me now extension, then clicking "Details" in the crash reporter window, and verifying that "UserID" is not in the list of data shown. Verifying it server-side would require an IT ticket.
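
For anyone who wants to script the client-side check, here is a minimal Python sketch. It assumes the text from the "Details" pane has been copied out as `Key: Value` lines; the `parse_details` helper and the sample text are illustrative only and are not part of the crash reporter.

```python
def parse_details(text):
    """Parse 'Key: Value' lines into a dict, skipping lines without a key."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(": ")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


# Illustrative sample in the shape the Details pane uses.
details = """ProductName: Firefox
Theme: classic/1.0
Vendor: Mozilla
Version: 3.2a1pre

This report also contains technical information about the state of the application when it crashed."""

fields = parse_details(details)
user_id_present = "UserID" in fields
```

With the patch applied, `user_id_present` should be false for a report from a fixed build. This says nothing about the server side, which still needs the IT ticket.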
Keywords: fixed1.9.0.6
Ted, I tested using the latest nightly with the crash me extension and here are my results.

BuildID: 20090107033209
CrashTime: 1231434676
InstallTime: 1231348391
ProductName: Firefox
SecondsSinceLastCrash: 1112360
StartupTime: 1231434666
Theme: classic/1.0
Vendor: Mozilla
Version: 3.2a1pre

This report also contains technical information about the state of the application when it crashed.

Now when I viewed the crash report online at

I see a UUID in the details tab.  UUID	c08ec64f-27d5-41ba-935a-ae5d22090108

Is this my client UUID or something different?
Crap, sorry for the bug spam.  After submitting I noticed that the UUID is actually that of the crash report haha.  Sorry again about the bug spam!
For, I used This is verified but I need to check with server ops that the UUID is not in the DB.
Al, have you already filed a bug for the server-side verification?
Yes (although I don't recall the bug number) and it was verified.
Keywords: fixed1.9.1
Please do *not* remove the fixed1.9.1 keyword from bugs that have been fixed on the 1.9.1 branch (see comment 17 of this bug).
Keywords: fixed1.9.1
I removed it because it's verified1.9.1 now, but feel free to keep it.
Natch, fyi it was getting verified for, not for 1.9.1.
Sam and Henrik, it was Bug 474084 and it is verified.
Ok, so let's do the same thing for trunk and 1.9.1. I've filed bug 475531 for that.
Depends on: 474084, 475531
There are no user_id entries in the database for trunk and 1.9.1 builds. The ID is also not visible in the crash reporter details window. Marking as verified.
Target Milestone: --- → mozilla1.9.2a1
Flags: wanted1.9.0.x? → wanted1.9.0.x+