Closed Bug 1192833 Opened 10 years ago Closed 2 years ago

(k-randomization reporting) [META] Investigate k-rounds randomization reporting

Tracking

(Not tracked)

Status:

RESOLVED INCOMPLETE

Iteration:

43.2 - Sep 7

People

(Reporter: mzhilyaev, Assigned: mzhilyaev)

References

Details

(Whiteboard: [story])

maxim zhilyaev

Assignee

Description

•

10 years ago

Investigate privacy and extraction precision aspects of user data collection scheme where users reports RR-randomized categorical vectors k times. The reports are then anonymized and mixed. This scheme allows for sqrt(k) reduction of noise. It also appears that its privacy guarantee is proportional to L*k/N, where N is the number of distinct users, and L is length of categorical vector. For large audiences this method has a promise of delivering accurate user aggregates without significant loss in privacy.

Kevin Ghim

Updated

•

10 years ago

Blocks: 1189754

maxim zhilyaev

Assignee

Comment 1

•

10 years ago

the resulting proves are here: https://docs.google.com/document/d/1_kquDsJfN_MTgBjo4mOAmxKdmiHurFryJGKGBsIVT04/edit?pli=1# noise is reduced by sqrt(k) privacy guarantee is: L*k / N, where L - length of bit vector reported k - randomization rounds N - distinct users

Hermina

Updated

•

10 years ago

Blocks: 1193831
No longer blocks: 1189754

maxim zhilyaev

Assignee

Comment 2

•

10 years ago

The results listed above could be incorrect. I might have discovered an attack that will invalidate privacy guarantee limits above.

Hermina

Updated

•

10 years ago

Blocks: Sprint_CS_S1

Hermina

Updated

•

10 years ago

Points: --- → 2

Hermina

Updated

•

10 years ago

Blocks: Sprint_CS_S2

Hermina

Updated

•

10 years ago

Iteration: --- → 43.2 - Sep 7

Doug Turner (:dougt)

Updated

•

10 years ago

Flags: needinfo?(ekr)

maxim zhilyaev

Assignee

Comment 3

•

10 years ago

Notes to reviewer: The doc is mostly finished, the section on "Relaxing Privacy bounds" is still being worked on: https://docs.google.com/document/d/1_kquDsJfN_MTgBjo4mOAmxKdmiHurFryJGKGBsIVT04/edit?pli=1#heading=h.3b41pubv2rqo There seems to be a way to relax required collection size significantly without opening to high-cardinality attack. Stay tuned.

maxim zhilyaev

Assignee

Comment 4

•

10 years ago

The research is complete and ready for review. Final version https://docs.google.com/document/d/1_kquDsJfN_MTgBjo4mOAmxKdmiHurFryJGKGBsIVT04/edit?pli=1#heading=h.ivaimes7hdrg Main finding: L - length of bit vector reported k - randomization rounds N - distinct users p,q - parameters of randomization 1. noise is reduced by sqrt(k) 2. Breaking point If N*q^L >= 1, the observer can't determine the presence of an outlier - privacy guarantee is 0 regardless of k. 3. Tipping point If N >= (p/q)^L, the observer may suspect the collection contains a real outlier, but the observer can't distinguish synthetic output from a real user and the noise of the collection, hence attack against RR is impossible. Privacy guarantee is 2k, but the observer can't conduct the attack against randomized responses from the distinct user. 4. When L is large, it's possible to decrease p to cover up for privacy and increase k to cover up for precision. Hence, enabling prescribed precision without privacy loss. 5. Regardless of L. Mutually exclusive categorical bit vectors occupy only 2 bits of privacy. Categorical vectors that have no more than S bits, occupy 2*S bits of privacy. 6. It's possible to collect search queries and user histories by randomizing string representations and running a dictionary attack against randomized corpus of strings 7. Privacy bounds can be further reduced case by case bases to satisfy infrastructure cost limitations.

maxim zhilyaev

Assignee

Comment 5

•

10 years ago

(In reply to maxim zhilyaev from comment #4) > > 3. Tipping point > If N >= (p/q)^L, the observer may suspect the collection contains a real > outlier, but the observer can't distinguish synthetic output from a real > user and the noise of the collection, hence attack against RR is impossible. > Privacy guarantee is 2k, but the observer can't conduct the attack against > randomized responses from the distinct user. > Correction: Privacy guarantee is k*ln(2)

Hermina

Updated

•

10 years ago

Blocks: 1202820

Hermina

Updated

•

10 years ago

No longer blocks: 1202820

Hermina

Updated

•

10 years ago

Blocks: 1202820

Hermina

Updated

•

10 years ago

Blocks: 1205313

Hermina

Updated

•

10 years ago

No longer blocks: 1205313

Doug Turner (:dougt)

Updated

•

10 years ago

Flags: needinfo?(ekr)

u597032

Updated

•

2 years ago

Status: NEW → RESOLVED

Closed: 2 years ago

Resolution: --- → INCOMPLETE

You need to log in before you can comment on or make changes to this bug.

Bugzilla

(k-randomization reporting) [META] Investigate k-rounds randomization reporting

Categories

(Content Services Graveyard :: General, defect)

Tracking

(Not tracked)

People

(Reporter: mzhilyaev, Assigned: mzhilyaev)

References

Details

(Whiteboard: [story])

Crash Data

Security

(public)

User Story

Description

Updated

Comment 1

Updated

Comment 2

Updated

Updated

Updated

Updated

Updated

Comment 3

Comment 4

Comment 5

Updated

Updated

Updated

Updated

Updated

Updated

Updated