Closed Bug 952677 Opened 12 years ago Closed 12 years ago

Remove whiteboard from sensitive groups

Tracking

(Not tracked)

Status:

RESOLVED FIXED

People

(Reporter: ekyle, Unassigned)

References

Details

Attachments

(2 files, 2 obsolete files)

Changes to remove the whiteboard 12 years ago Kyle Lahnakoski [:ekyle] 1.05 MB, patch		Details \| Diff \| Splinter Review
Changes to screen whiteboard 12 years ago Kyle Lahnakoski [:ekyle] 482.20 KB, patch	mcote : review+	Details \| Diff \| Splinter Review
Fix problems found my look_for_leaks.py 12 years ago Kyle Lahnakoski [:ekyle] 372 bytes, patch		Details \| Diff \| Splinter Review
Fix whiteboard problem 12 years ago Kyle Lahnakoski [:ekyle] 5.99 KB, patch	mcote : review+	Details \| Diff \| Splinter Review

Kyle Lahnakoski [:ekyle]

Reporter

Description

•

12 years ago

The action item from the security review for the private cluster was to ensure the whiteboard for very sensitive bug groups was removed. This is the patch that deals with that This patch also completes the code that monitors the public cluster for leaked private data into the public cluster. The code works using a combination of the metrics cluster and the public cluster, but the metrics cluster has some serious data damage: Upgrading the code for the private cluster is necessary.

Kyle Lahnakoski [:ekyle]

Reporter

Comment 1

•

12 years ago

Attached patch Changes to remove the whiteboard (obsolete) — Details — Splinter Review

The changes are here > https://github.com/klahnakoski/Bugzilla-ETL/compare/clear_whiteboard?expand=1&w=1 The test_incremental_etl_catches_tracking_flags is here > https://github.com/klahnakoski/Bugzilla-ETL/blob/a6cd09a2276a4728cd86d365130f73db34d57cf7/tests/test_etl.py#L369 The battery of tests to ensure the public cluster has no private data is here > https://github.com/klahnakoski/Bugzilla-ETL/blob/a6cd09a2276a4728cd86d365130f73db34d57cf7/tests/resources/python/look_for_leaks.py

Attachment #8350809 - Flags: review?(mcote)

Mark Côté [:mcote]

Comment 2

•

12 years ago

Can you give me an incremental diff over the patch from bug 952101? Looks like I've already reviewed most of this, and it'll be very tedious to tease out what has changed.

Kyle Lahnakoski [:ekyle]

Reporter

Comment 3

•

12 years ago

Attached patch Changes to screen whiteboard — Details — Splinter Review

My apologies for giving you the wrong diff last time, and thank you for looking at this on the holidays. The majority of this patch is converting from "<screened>" to "[screened]": This is to take advantage of the whiteboard parsing that exists in ES already, and to standardize all screened fields to the same

Attachment #8350809 - Attachment is obsolete: true

Attachment #8350809 - Flags: review?(mcote)

Attachment #8351182 - Flags: review?(mcote)

Mark Côté [:mcote]

Comment 4

•

12 years ago

Comment on attachment 8351182 [details] [diff] [review] Changes to screen whiteboard Review of attachment 8351182 [details] [diff] [review]: ----------------------------------------------------------------- Seems fine, just some nits below. I didn't point out all the spacing problems, just a few. Most are a lack of sufficient horizontal whitespace. ::: bzETL/bz_etl.py @@ +193,5 @@ > #RECENT PUBLIC BUGS > possible_public_bugs=get_recent_private_bugs(db, param) > possible_public_bugs=set(Q.select(possible_public_bugs, "bug_id")) > + if param.allow_private_bugs: > + #PRIVVATE BUGS Typo. ::: bzETL/extract_bugzilla.py @@ +92,5 @@ > > +def get_screened_whiteboard(db): > + if not SCREENED_BUG_GROUP_IDS: > + groups = db.query("SELECT id FROM groups WHERE {{where}}", { > + "where":db.esfilter2sqlwhere({"terms":{"name": SCREENED_WHITEBOARD_BUG_GROUPS}}) Spacing. @@ +232,5 @@ > param.bugs_columns_SQL = SQL(",\n".join([lower(c) for c in bugs_columns])) > + param.bug_filter = db.esfilter2sqlwhere({"terms": {"b.bug_id": param.bug_list}}) > + param.screened_whiteboard = db.esfilter2sqlwhere({"and":[ > + {"exists":"m.bug_id"}, > + {"terms":{"m.group_id": SCREENED_BUG_GROUP_IDS}} Spacing. ::: bzETL/util/files.py @@ +70,4 @@ > > def write(self, data): > if not self.parent.exists: self.parent.create() > + with open(self._filename, "wb") as file: Just noticed this--probably bad to overwrite the built-in "file" type. ::: bzETL/util/queries/Q.py @@ +11,4 @@ > > import sys > import __builtin__ > +import math Sorting. @@ +166,5 @@ > + if isinstance(data, FlatList): > + if isinstance(field_name, basestring): > + # RETURN LIST OF VALUES > + if field_name.find(".") < 0: > + if data.path[0]==field_name: Spacing (and below). @@ +396,5 @@ > + # else: > + # # THIS COMPILES PYTHON TO MAKE A FUNCTION > + # where = CNV.esfilter2where(where) > + # > + # return [d for i, d in enumerate(data) if where(d, i, data)] Why is this here? ::: bzETL/util/queries/flat_list.py @@ +14,5 @@ > + IT IS EXPECTED len(data[i]) == len(path)+1 (data[i][0] IS THE ORIGINAL ROW OBJECT) > + """ > + list.__init__(self) > + self.data=data > + self.path=path Spacing. And remove leading and trailing blank lines. This class is kind of weird--you're just storing a couple extra attributes? You don't need to declare a class to just set arbitrary attributes... I guess this serves as some kind of documentation of what you're doing though? Maybe? ::: tests/test_etl.py @@ +165,5 @@ > ref = File(self.settings.comments_reference.filename).read() > if can != ref: > + for i, c in enumerate(can): > + found = -1 > + if can[i]!=ref[i]: Spacing. ::: tests/util/compare_es.py @@ +124,4 @@ > else: > c.attach_id=CNV.value2int(c.attach_id) > > + bug.attachments=Q.sort(bug.attachments, "attach_id") Might as well clean up spacing while you're here. ::: tests/util/elasticsearch.py @@ +53,5 @@ > + "from":0, > + "size":200000, > + "sort":[], > + "facets":{} > + }) Wonder if it would be better to build up the dict first with the common elements, then set the differing elements (and call the Q.select) in the if/else clause. Would make it more obvious what is different between the two, as well as making it easier to change common elements in the future.

Attachment #8351182 - Flags: review?(mcote) → review+

Kyle Lahnakoski [:ekyle]

Reporter

Comment 5

•

12 years ago

> ::: bzETL/util/files.py > @@ +70,4 @@ > > > > def write(self, data): > > if not self.parent.exists: self.parent.create() > > + with open(self._filename, "wb") as file: > > Just noticed this--probably bad to overwrite the built-in "file" type. Fixed > @@ +396,5 @@ > > + # else: > > + # # THIS COMPILES PYTHON TO MAKE A FUNCTION > > + # where = CNV.esfilter2where(where) > > + # > > + # return [d for i, d in enumerate(data) if where(d, i, data)] > > Why is this here? Oops! Fixed. > ::: bzETL/util/queries/flat_list.py > @@ +14,5 @@ > > + IT IS EXPECTED len(data[i]) == len(path)+1 (data[i][0] IS THE ORIGINAL ROW OBJECT) > > + """ > > + list.__init__(self) > > + self.data=data > > + self.path=path > > Spacing. And remove leading and trailing blank lines. > > This class is kind of weird--you're just storing a couple extra attributes? > You don't need to declare a class to just set arbitrary attributes... I > guess this serves as some kind of documentation of what you're doing though? > Maybe? The FlatList is the result of a nested filter query. When querying a set of documents, each with possible members in a list(), I must keep track of the original document and each of the members that passes the filter criterion. Taken a step further, we can filter children of children, each row in the FlatList is a path through the tree. self.path is the set of names necessary to interpret the self.data; it tracks the namespace so a later "select" clause can pull the attributes out of the result. Nested queries look like this: https://github.com/klahnakoski/Bugzilla-ETL/blob/cccb89f3ab27c4abc1f61baa2199aa1a781e71b0/tests/resources/python/look_for_leaks.py#L102 > private_attachments = Q.run({ > "from": bugs_w_private_attachments, > "select": "attachments.attach_id", > "where": {"or": [ > {"exists": "bug_group"}, > {"terms": {"attachments.attachments\.isprivate": ['1', True, 1]}} > ]} > }) This does not use it's full potential (you can also select bug attributes), but I think you can see the double nested for loop and special case checking this query replaces. It also demonstrates the advantages of Struct of normal dict() access: Struct gracefully handles the case when "attachments" does not exist, and Struct also handles quoting of document paths (like "attachments.attachments\.isprivate") FlatList will evolve as I write code that needs more features. Even it's name is kinda dumb. > ::: tests/util/elasticsearch.py > @@ +53,5 @@ > > + "from":0, > > + "size":200000, > > + "sort":[], > > + "facets":{} > > + }) > > Wonder if it would be better to build up the dict first with the common > elements, then set the differing elements (and call the Q.select) in the > if/else clause. Would make it more obvious what is different between the > two, as well as making it easier to change common elements in the future. Good idea. Fixed.

Kyle Lahnakoski [:ekyle]

Reporter

Comment 6

•

12 years ago

Attached patch Fix problems found my look_for_leaks.py (obsolete) — Details — Splinter Review

Here is the changeset: https://github.com/klahnakoski/Bugzilla-ETL/commit/711810f08951a731dc543c10a0973fc34ed17c6b The incremental ETL was not screening properly. This was caught my the look_for_leaks.py that I run client-side. It is fixed, and I added a test to confirm (which I should have done earlier).

Attachment #8357749 - Flags: review?(mcote)

Kyle Lahnakoski [:ekyle]

Reporter

Comment 7

•

12 years ago

Attached patch Fix whiteboard problem — Details — Splinter Review

Actual diff! Here is the changeset: https://github.com/klahnakoski/Bugzilla-ETL/commit/711810f08951a731dc543c10a0973fc34ed17c6b The incremental ETL was not screening properly. This was caught my the look_for_leaks.py that I run client-side. It is fixed, and I added a test to confirm (which I should have done earlier).

Attachment #8357853 - Flags: review?(mcote)

Kyle Lahnakoski [:ekyle]

Reporter

Updated

•

12 years ago

Attachment #8357749 - Attachment is obsolete: true

Attachment #8357749 - Flags: review?(mcote)

Mark Côté [:mcote]

Comment 8

•

12 years ago

Comment on attachment 8357853 [details] [diff] [review] Fix whiteboard problem Review of attachment 8357853 [details] [diff] [review]: ----------------------------------------------------------------- ::: bzETL/extract_bugzilla.py @@ +573,4 @@ > {"exists": "m.bug_id"}, > {"terms": {"m.group_id": SCREENED_BUG_GROUP_IDS}} > ]}) > + param.whiteboard_field=STATUS_WHITEBOARD_FIELD_ID Spacing.

Attachment #8357853 - Flags: review?(mcote) → review+

Kyle Lahnakoski [:ekyle]

Reporter

Comment 9

•

12 years ago

The latest deployment is still leaking whiteboard into the private cluster. This is not affecting the public cluster (because all private items are removed), and is not worse than what Metrics does already. Work continues.

Kyle Lahnakoski [:ekyle]

Reporter

Comment 10

•

12 years ago

Continuous review of whiteboard is showing all sensitive groups' whiteboard as "[screened]"

Kyle Lahnakoski [:ekyle]

Reporter

Comment 11

•

12 years ago

The problem was caused by bugs being part of two bug_groups: one sensitive, on not. The fix is to reduce the join to only the sensitive groups: https://github.com/klahnakoski/Bugzilla-ETL/commit/bd520f1423d9bed618471795515678cf628afcdf#diff-e7f5c575eeda495b16d7432bbe35b033L560 Here is the test that reveals the problem, and confirms the fix: https://github.com/klahnakoski/Bugzilla-ETL/commit/bd520f1423d9bed618471795515678cf628afcdf#diff-469bf8513f3c4ac87f66514572ce704fR455 This is not a patch, it is part of the main patch in https://bugzilla.mozilla.org/show_bug.cgi?id=879835#c27 More explicit about using "attachments.ispatch" (used in metrics cluster) vs "attachments_ispatch" (used in new clusters) https://github.com/klahnakoski/Bugzilla-ETL/commit/453afe1bfb3c45c289218510cd52e7749c4b0df1#diff-f0d2fd7c40ef1414520726f5c1f63583L152

Status: NEW → RESOLVED

Closed: 12 years ago

Resolution: --- → FIXED

You need to log in before you can comment on or make changes to this bug.

Bugzilla

Remove whiteboard from sensitive groups

Categories

(Testing :: General, defect)

Tracking

(Not tracked)

People

(Reporter: ekyle, Unassigned)

References

Details

Crash Data

Security

(public)

User Story

Attachments

(2 files, 2 obsolete files)

Description

Comment 1

Comment 2

Comment 3

Comment 4

Comment 5

Comment 6

Comment 7

Updated

Comment 8

Comment 9

Comment 10

Comment 11

Attachment

General

Description

File Name

Content Type