Closed
Bug 952677
Opened 11 years ago
Closed 10 years ago
Remove whiteboard from sensitive groups
Categories
(Testing :: General, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: ekyle, Unassigned)
References
Details
Attachments
(2 files, 2 obsolete files)
482.20 KB,
patch
|
mcote
:
review+
|
Details | Diff | Splinter Review |
5.99 KB,
patch
|
mcote
:
review+
|
Details | Diff | Splinter Review |
The action item from the security review for the private cluster was to ensure the whiteboard for very sensitive bug groups was removed. This is the patch that deals with that This patch also completes the code that monitors the public cluster for leaked private data into the public cluster. The code works using a combination of the metrics cluster and the public cluster, but the metrics cluster has some serious data damage: Upgrading the code for the private cluster is necessary.
Reporter | ||
Comment 1•11 years ago
|
||
The changes are here > https://github.com/klahnakoski/Bugzilla-ETL/compare/clear_whiteboard?expand=1&w=1 The test_incremental_etl_catches_tracking_flags is here > https://github.com/klahnakoski/Bugzilla-ETL/blob/a6cd09a2276a4728cd86d365130f73db34d57cf7/tests/test_etl.py#L369 The battery of tests to ensure the public cluster has no private data is here > https://github.com/klahnakoski/Bugzilla-ETL/blob/a6cd09a2276a4728cd86d365130f73db34d57cf7/tests/resources/python/look_for_leaks.py
Attachment #8350809 -
Flags: review?(mcote)
Comment 2•11 years ago
|
||
Can you give me an incremental diff over the patch from bug 952101? Looks like I've already reviewed most of this, and it'll be very tedious to tease out what has changed.
Reporter | ||
Comment 3•11 years ago
|
||
My apologies for giving you the wrong diff last time, and thank you for looking at this on the holidays. The majority of this patch is converting from "<screened>" to "[screened]": This is to take advantage of the whiteboard parsing that exists in ES already, and to standardize all screened fields to the same
Attachment #8350809 -
Attachment is obsolete: true
Attachment #8350809 -
Flags: review?(mcote)
Attachment #8351182 -
Flags: review?(mcote)
Comment 4•11 years ago
|
||
Comment on attachment 8351182 [details] [diff] [review] Changes to screen whiteboard Review of attachment 8351182 [details] [diff] [review]: ----------------------------------------------------------------- Seems fine, just some nits below. I didn't point out all the spacing problems, just a few. Most are a lack of sufficient horizontal whitespace. ::: bzETL/bz_etl.py @@ +193,5 @@ > #RECENT PUBLIC BUGS > possible_public_bugs=get_recent_private_bugs(db, param) > possible_public_bugs=set(Q.select(possible_public_bugs, "bug_id")) > + if param.allow_private_bugs: > + #PRIVVATE BUGS Typo. ::: bzETL/extract_bugzilla.py @@ +92,5 @@ > > +def get_screened_whiteboard(db): > + if not SCREENED_BUG_GROUP_IDS: > + groups = db.query("SELECT id FROM groups WHERE {{where}}", { > + "where":db.esfilter2sqlwhere({"terms":{"name": SCREENED_WHITEBOARD_BUG_GROUPS}}) Spacing. @@ +232,5 @@ > param.bugs_columns_SQL = SQL(",\n".join([lower(c) for c in bugs_columns])) > + param.bug_filter = db.esfilter2sqlwhere({"terms": {"b.bug_id": param.bug_list}}) > + param.screened_whiteboard = db.esfilter2sqlwhere({"and":[ > + {"exists":"m.bug_id"}, > + {"terms":{"m.group_id": SCREENED_BUG_GROUP_IDS}} Spacing. ::: bzETL/util/files.py @@ +70,4 @@ > > def write(self, data): > if not self.parent.exists: self.parent.create() > + with open(self._filename, "wb") as file: Just noticed this--probably bad to overwrite the built-in "file" type. ::: bzETL/util/queries/Q.py @@ +11,4 @@ > > import sys > import __builtin__ > +import math Sorting. @@ +166,5 @@ > + if isinstance(data, FlatList): > + if isinstance(field_name, basestring): > + # RETURN LIST OF VALUES > + if field_name.find(".") < 0: > + if data.path[0]==field_name: Spacing (and below). @@ +396,5 @@ > + # else: > + # # THIS COMPILES PYTHON TO MAKE A FUNCTION > + # where = CNV.esfilter2where(where) > + # > + # return [d for i, d in enumerate(data) if where(d, i, data)] Why is this here? ::: bzETL/util/queries/flat_list.py @@ +14,5 @@ > + IT IS EXPECTED len(data[i]) == len(path)+1 (data[i][0] IS THE ORIGINAL ROW OBJECT) > + """ > + list.__init__(self) > + self.data=data > + self.path=path Spacing. And remove leading and trailing blank lines. This class is kind of weird--you're just storing a couple extra attributes? You don't need to declare a class to just set arbitrary attributes... I guess this serves as some kind of documentation of what you're doing though? Maybe? ::: tests/test_etl.py @@ +165,5 @@ > ref = File(self.settings.comments_reference.filename).read() > if can != ref: > + for i, c in enumerate(can): > + found = -1 > + if can[i]!=ref[i]: Spacing. ::: tests/util/compare_es.py @@ +124,4 @@ > else: > c.attach_id=CNV.value2int(c.attach_id) > > + bug.attachments=Q.sort(bug.attachments, "attach_id") Might as well clean up spacing while you're here. ::: tests/util/elasticsearch.py @@ +53,5 @@ > + "from":0, > + "size":200000, > + "sort":[], > + "facets":{} > + }) Wonder if it would be better to build up the dict first with the common elements, then set the differing elements (and call the Q.select) in the if/else clause. Would make it more obvious what is different between the two, as well as making it easier to change common elements in the future.
Attachment #8351182 -
Flags: review?(mcote) → review+
Reporter | ||
Comment 5•10 years ago
|
||
> ::: bzETL/util/files.py > @@ +70,4 @@ > > > > def write(self, data): > > if not self.parent.exists: self.parent.create() > > + with open(self._filename, "wb") as file: > > Just noticed this--probably bad to overwrite the built-in "file" type. Fixed > @@ +396,5 @@ > > + # else: > > + # # THIS COMPILES PYTHON TO MAKE A FUNCTION > > + # where = CNV.esfilter2where(where) > > + # > > + # return [d for i, d in enumerate(data) if where(d, i, data)] > > Why is this here? Oops! Fixed. > ::: bzETL/util/queries/flat_list.py > @@ +14,5 @@ > > + IT IS EXPECTED len(data[i]) == len(path)+1 (data[i][0] IS THE ORIGINAL ROW OBJECT) > > + """ > > + list.__init__(self) > > + self.data=data > > + self.path=path > > Spacing. And remove leading and trailing blank lines. > > This class is kind of weird--you're just storing a couple extra attributes? > You don't need to declare a class to just set arbitrary attributes... I > guess this serves as some kind of documentation of what you're doing though? > Maybe? The FlatList is the result of a nested filter query. When querying a set of documents, each with possible members in a list(), I must keep track of the original document and each of the members that passes the filter criterion. Taken a step further, we can filter children of children, each row in the FlatList is a path through the tree. self.path is the set of names necessary to interpret the self.data; it tracks the namespace so a later "select" clause can pull the attributes out of the result. Nested queries look like this: https://github.com/klahnakoski/Bugzilla-ETL/blob/cccb89f3ab27c4abc1f61baa2199aa1a781e71b0/tests/resources/python/look_for_leaks.py#L102 > private_attachments = Q.run({ > "from": bugs_w_private_attachments, > "select": "attachments.attach_id", > "where": {"or": [ > {"exists": "bug_group"}, > {"terms": {"attachments.attachments\.isprivate": ['1', True, 1]}} > ]} > }) This does not use it's full potential (you can also select bug attributes), but I think you can see the double nested for loop and special case checking this query replaces. It also demonstrates the advantages of Struct of normal dict() access: Struct gracefully handles the case when "attachments" does not exist, and Struct also handles quoting of document paths (like "attachments.attachments\.isprivate") FlatList will evolve as I write code that needs more features. Even it's name is kinda dumb. > ::: tests/util/elasticsearch.py > @@ +53,5 @@ > > + "from":0, > > + "size":200000, > > + "sort":[], > > + "facets":{} > > + }) > > Wonder if it would be better to build up the dict first with the common > elements, then set the differing elements (and call the Q.select) in the > if/else clause. Would make it more obvious what is different between the > two, as well as making it easier to change common elements in the future. Good idea. Fixed.
Reporter | ||
Comment 6•10 years ago
|
||
Here is the changeset: https://github.com/klahnakoski/Bugzilla-ETL/commit/711810f08951a731dc543c10a0973fc34ed17c6b The incremental ETL was not screening properly. This was caught my the look_for_leaks.py that I run client-side. It is fixed, and I added a test to confirm (which I should have done earlier).
Attachment #8357749 -
Flags: review?(mcote)
Reporter | ||
Comment 7•10 years ago
|
||
Actual diff! Here is the changeset: https://github.com/klahnakoski/Bugzilla-ETL/commit/711810f08951a731dc543c10a0973fc34ed17c6b The incremental ETL was not screening properly. This was caught my the look_for_leaks.py that I run client-side. It is fixed, and I added a test to confirm (which I should have done earlier).
Attachment #8357853 -
Flags: review?(mcote)
Reporter | ||
Updated•10 years ago
|
Attachment #8357749 -
Attachment is obsolete: true
Attachment #8357749 -
Flags: review?(mcote)
Comment 8•10 years ago
|
||
Comment on attachment 8357853 [details] [diff] [review] Fix whiteboard problem Review of attachment 8357853 [details] [diff] [review]: ----------------------------------------------------------------- ::: bzETL/extract_bugzilla.py @@ +573,4 @@ > {"exists": "m.bug_id"}, > {"terms": {"m.group_id": SCREENED_BUG_GROUP_IDS}} > ]}) > + param.whiteboard_field=STATUS_WHITEBOARD_FIELD_ID Spacing.
Attachment #8357853 -
Flags: review?(mcote) → review+
Reporter | ||
Comment 9•10 years ago
|
||
The latest deployment is still leaking whiteboard into the private cluster. This is not affecting the public cluster (because all private items are removed), and is not worse than what Metrics does already. Work continues.
Reporter | ||
Comment 10•10 years ago
|
||
Continuous review of whiteboard is showing all sensitive groups' whiteboard as "[screened]"
Reporter | ||
Comment 11•10 years ago
|
||
The problem was caused by bugs being part of two bug_groups: one sensitive, on not. The fix is to reduce the join to only the sensitive groups: https://github.com/klahnakoski/Bugzilla-ETL/commit/bd520f1423d9bed618471795515678cf628afcdf#diff-e7f5c575eeda495b16d7432bbe35b033L560 Here is the test that reveals the problem, and confirms the fix: https://github.com/klahnakoski/Bugzilla-ETL/commit/bd520f1423d9bed618471795515678cf628afcdf#diff-469bf8513f3c4ac87f66514572ce704fR455 This is not a patch, it is part of the main patch in https://bugzilla.mozilla.org/show_bug.cgi?id=879835#c27 More explicit about using "attachments.ispatch" (used in metrics cluster) vs "attachments_ispatch" (used in new clusters) https://github.com/klahnakoski/Bugzilla-ETL/commit/453afe1bfb3c45c289218510cd52e7749c4b0df1#diff-f0d2fd7c40ef1414520726f5c1f63583L152
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•