Closed Bug 984288 Opened 10 years ago Closed 10 years ago

Decide how to store the User Identifier in a way that protects users' privacy.

Categories

(Hello (Loop) :: Server, defect)

defect
Not set
normal

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: rhubscher, Assigned: rhubscher)

References

Details

(Whiteboard: [qa!])

Attachments

(1 file)

In Bug 980289, we talked about user privacy and user identification.

JR Conlin was saying that storing the user email may be a user privacy breach:

For a long time, until the User Agent sends a new SimplePush URL, we link the user with a list of SimplePush URLs.
From a user email address we could learn the number of devices and their location (even if we don't store the user's IP address, it could be found in server logs by matching the storage date with the log date).
In case that's a valid concern, I was thinking of building a UUID from the user email address.

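    import hashlib
    import sys
    from uuid import UUID

    # Read the email address from stdin.
    text = sys.stdin.read()
    # Use the first 128 bits (32 hex digits) of the SHA-256 digest as a UUID.
    print(UUID(hashlib.sha256(text.encode("utf-8")).hexdigest()[:32]))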

Is that a secure way to do it?
Maybe we could simply store the sha256 hash instead.
Flags: needinfo?(rlb)
I'm still not understanding the problem here.  Could you provide a little more context?

In general, if you want to store something that is opaque but can be used for comparison later, the thing to do is to use something like PBKDF2, with a separate salt per entry, to create a digest of the thing that is difficult to compute. With plain hashes, you can identify a known element in a set of N hashes in ~O(1) time, since you just hash the known thing once and then do equality tests. With salted PBKDF2, the cost is more like O(N), since you have to compute a different hash for each member in the set. So you can still verify that a given URL belongs to <email address>, but it's difficult to find all the URLs for <email address>.

Of course, if you don't need these IDs to be linkable to email addresses, just use random ones.
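
The plain hash versus PBKDF2 comparison above can be sketched roughly as follows. This is only an illustration, not code from the Loop server: the function names, the 16-byte salt, and the iteration count are assumptions chosen for the example.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor, not taken from the bug


def plain_digest(email):
    """Unsalted SHA-256: the same email always yields the same digest."""
    return hashlib.sha256(email.encode("utf-8")).hexdigest()


def pbkdf2_record(email):
    """Return (salt, digest) for an email, using a fresh random salt."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", email.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def pbkdf2_matches(email, salt, digest):
    """Check a single stored record against a candidate email."""
    candidate = hashlib.pbkdf2_hmac("sha256", email.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, digest)


def find_records(email, records):
    """Finding every record for an email costs one PBKDF2 run per record."""
    return [
        index
        for index, (salt, digest) in enumerate(records)
        if pbkdf2_matches(email, salt, digest)
    ]
```

With `plain_digest`, anyone holding the stored digests can hash a guessed address once and look it up directly. With `pbkdf2_record`, two records for the same address have different digests, so `find_records` has to redo the expensive computation for every row.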
Flags: needinfo?(rlb)
rbarnes: In essence, that's the approach I was planning on taking, but as a backup. Originally, the Firefox Accounts assertion had a token that was being returned. This is apparently no longer the case. It was suggested that storing the user identifier (email?) returned from the validator would be fine since it does not disclose user private information. 

As it stands, I believe that the user identifier looks a good deal like an email address and uses user-selected data as the local part (e.g. rbarnes@firefox...). Tied to the location data that I'm storing, it's a fairly simple matter to determine who you are.

I don't store a constant stream of location data, and in fact, I only store a limited set of data on user request. I'll admit that there is a heightened level of paranoia, but I was taught a long time ago to make systems secure from yourself, and this seems like a point of data that could be used against someone should the data store be compromised. 

None of this really solves issues that might occur due to pen registers or other possible complications.

I will be talking with privacy in the upcoming week or so, and hopefully a number of these issues may be addressed. I will continue to update here as need be.
Some IRC conversation with JR has cleared this up for me.

As I understand the problem, we have a database of stuff that needs to be indexed by a user ID that comes from FxA.  We want to make sure that if someone downloads the database of (ID, stuff) mappings, they can't tell which stuff belongs to whom.  We trust processing more than data storage.

In light of that, JR's suggestion of just using a MAC seems fine.  The MAC key resides on the processing node, and the MAC is used as the key in the DB.  If the DB is downloaded, then the attacker can't go from MAC to ID without the MAC key.  That seems to me to meet the security goals, given the above assumption about trust(processing) > trust(storage).

The only consideration in this case is how the MAC key is maintained.  On the one hand, if the attacker can get the MAC key along with the DB, then he wins.  On the other hand, if you ever lose the MAC key, then all your data is dead.
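
As a minimal sketch of the MAC approach described above, assuming HMAC-SHA256 and a key that lives only on the processing node: the `user_db_key` helper, the in-memory `store`, and the example URL are hypothetical and are not taken from the linked PR.

```python
import hashlib
import hmac
import os


def user_db_key(user_id, mac_key):
    """Derive the opaque database key for a user identifier with HMAC-SHA256."""
    return hmac.new(mac_key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical key handling: a real deployment would load a persistent key
# from the processing node's configuration instead of generating one here.
mac_key = os.urandom(32)

# Stand-in for the database: MAC of the user ID -> list of SimplePush URLs.
store = {}
store.setdefault(user_db_key("alice@example.com", mac_key), []).append(
    "https://push.example.com/update/abc"
)

# Whoever holds the key can still look a user up with one equality test.
urls = store.get(user_db_key("alice@example.com", mac_key), [])
```

Someone who copies `store` without `mac_key` sees only 64-character hex strings and cannot test a guessed email address against them. The flip side is the one noted above: if the key is regenerated or lost, existing rows can no longer be matched to any user.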
Attached file link to github PR
Attachment #8392831 - Flags: review?(rlb)
Comment on attachment 8392831
link to github PR

Couple of minor comments in the PR, but otherwise looks good to me.
Attachment #8392831 - Flags: review?(rlb) → review+
Fixed by https://github.com/mozilla/loop-server/commit/413b252f4127bb8cd2fad8fe68bfcda7ebebc4cd
Assignee: nobody → rhubscher
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
OK. Verified in code.
Status: RESOLVED → VERIFIED
QA Contact: jbonacci
Whiteboard: [qa!]