Closed Bug 952677 Opened 11 years ago Closed 10 years ago

Remove whiteboard from sensitive groups

Categories

(Testing :: General, defect)

x86_64
Windows 7
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: ekyle, Unassigned)

References

Details

Attachments

(2 files, 2 obsolete files)

The action item from the security review for the private cluster was to ensure the whiteboard for very sensitive bug groups was removed.  This is the patch that deals with that

This patch also completes the code that monitors the public cluster for leaked private data into the public cluster.  The code works using a combination of the metrics cluster and the public cluster, but the metrics cluster has some serious data damage:  Upgrading the code for the private cluster is necessary.
Can you give me an incremental diff over the patch from bug 952101?  Looks like I've already reviewed most of this, and it'll be very tedious to tease out what has changed.
My apologies for giving you the wrong diff last time, and thank you for looking at this on the holidays.  

The majority of this patch is converting from "<screened>" to "[screened]":  This is to take advantage of the whiteboard parsing that exists in ES already, and to standardize all screened fields to the same
Attachment #8350809 - Attachment is obsolete: true
Attachment #8350809 - Flags: review?(mcote)
Attachment #8351182 - Flags: review?(mcote)
Comment on attachment 8351182 [details] [diff] [review]
Changes to screen whiteboard

Review of attachment 8351182 [details] [diff] [review]:
-----------------------------------------------------------------

Seems fine, just some nits below.  I didn't point out all the spacing problems, just a few.  Most are a lack of sufficient horizontal whitespace.

::: bzETL/bz_etl.py
@@ +193,5 @@
>      #RECENT PUBLIC BUGS
>      possible_public_bugs=get_recent_private_bugs(db, param)
>      possible_public_bugs=set(Q.select(possible_public_bugs, "bug_id"))
> +    if param.allow_private_bugs:
> +        #PRIVVATE BUGS

Typo.

::: bzETL/extract_bugzilla.py
@@ +92,5 @@
>  
> +def get_screened_whiteboard(db):
> +    if not SCREENED_BUG_GROUP_IDS:
> +        groups = db.query("SELECT id FROM groups WHERE {{where}}", {
> +            "where":db.esfilter2sqlwhere({"terms":{"name": SCREENED_WHITEBOARD_BUG_GROUPS}})

Spacing.

@@ +232,5 @@
>          param.bugs_columns_SQL = SQL(",\n".join([lower(c) for c in bugs_columns]))
> +        param.bug_filter = db.esfilter2sqlwhere({"terms": {"b.bug_id": param.bug_list}})
> +        param.screened_whiteboard = db.esfilter2sqlwhere({"and":[
> +            {"exists":"m.bug_id"},
> +            {"terms":{"m.group_id": SCREENED_BUG_GROUP_IDS}}

Spacing.

::: bzETL/util/files.py
@@ +70,4 @@
>  
>      def write(self, data):
>          if not self.parent.exists: self.parent.create()
> +        with open(self._filename, "wb") as file:

Just noticed this--probably bad to overwrite the built-in "file" type.

::: bzETL/util/queries/Q.py
@@ +11,4 @@
>  
>  import sys
>  import __builtin__
> +import math

Sorting.

@@ +166,5 @@
> +    if isinstance(data, FlatList):
> +        if isinstance(field_name, basestring):
> +            # RETURN LIST OF VALUES
> +            if field_name.find(".") < 0:
> +                if data.path[0]==field_name:

Spacing (and below).

@@ +396,5 @@
> +    # else:
> +    #     # THIS COMPILES PYTHON TO MAKE A FUNCTION
> +    #     where = CNV.esfilter2where(where)
> +    #
> +    # return [d for i, d in enumerate(data) if where(d, i, data)]

Why is this here?

::: bzETL/util/queries/flat_list.py
@@ +14,5 @@
> +        IT IS EXPECTED len(data[i]) == len(path)+1 (data[i][0] IS THE ORIGINAL ROW OBJECT)
> +        """
> +        list.__init__(self)
> +        self.data=data
> +        self.path=path

Spacing.  And remove leading and trailing blank lines.

This class is kind of weird--you're just storing a couple extra attributes?  You don't need to declare a class to just set arbitrary attributes... I guess this serves as some kind of documentation of what you're doing though?  Maybe?

::: tests/test_etl.py
@@ +165,5 @@
>          ref = File(self.settings.comments_reference.filename).read()
>          if can != ref:
> +            for i, c in enumerate(can):
> +                found = -1
> +                if can[i]!=ref[i]:

Spacing.

::: tests/util/compare_es.py
@@ +124,4 @@
>          else:
>              c.attach_id=CNV.value2int(c.attach_id)
>  
> +    bug.attachments=Q.sort(bug.attachments, "attach_id")

Might as well clean up spacing while you're here.

::: tests/util/elasticsearch.py
@@ +53,5 @@
> +            "from":0,
> +            "size":200000,
> +            "sort":[],
> +            "facets":{}
> +        })

Wonder if it would be better to build up the dict first with the common elements, then set the differing elements (and call the Q.select) in the if/else clause.  Would make it more obvious what is different between the two, as well as making it easier to change common elements in the future.
Attachment #8351182 - Flags: review?(mcote) → review+
> ::: bzETL/util/files.py
> @@ +70,4 @@
> >  
> >      def write(self, data):
> >          if not self.parent.exists: self.parent.create()
> > +        with open(self._filename, "wb") as file:
> 
> Just noticed this--probably bad to overwrite the built-in "file" type.

Fixed


> @@ +396,5 @@
> > +    # else:
> > +    #     # THIS COMPILES PYTHON TO MAKE A FUNCTION
> > +    #     where = CNV.esfilter2where(where)
> > +    #
> > +    # return [d for i, d in enumerate(data) if where(d, i, data)]
> 
> Why is this here?

Oops!  Fixed.

> ::: bzETL/util/queries/flat_list.py
> @@ +14,5 @@
> > +        IT IS EXPECTED len(data[i]) == len(path)+1 (data[i][0] IS THE ORIGINAL ROW OBJECT)
> > +        """
> > +        list.__init__(self)
> > +        self.data=data
> > +        self.path=path
> 
> Spacing.  And remove leading and trailing blank lines.
> 
> This class is kind of weird--you're just storing a couple extra attributes? 
> You don't need to declare a class to just set arbitrary attributes... I
> guess this serves as some kind of documentation of what you're doing though?
> Maybe?

The FlatList is the result of a nested filter query.  When querying a set of documents, each with possible members in a list(), I must keep track of the original document and each of the members that passes the filter criterion.  Taken a step further, we can filter children of children, each row in the FlatList is a path through the tree.  self.path is the set of names necessary to interpret the self.data; it tracks the namespace so a later "select" clause can pull the attributes out of the result.

Nested queries look like this: https://github.com/klahnakoski/Bugzilla-ETL/blob/cccb89f3ab27c4abc1f61baa2199aa1a781e71b0/tests/resources/python/look_for_leaks.py#L102

>   private_attachments = Q.run({
>       "from": bugs_w_private_attachments,
>       "select": "attachments.attach_id",
>       "where": {"or": [
>           {"exists": "bug_group"},
>           {"terms": {"attachments.attachments\.isprivate": ['1', True, 1]}}
>       ]}
>   })

This does not use it's full potential (you can also select bug attributes), but I think you can see the double nested for loop and special case checking this query replaces.  It also demonstrates the advantages of Struct of normal dict() access: Struct gracefully handles the case when "attachments" does not exist, and Struct also handles quoting of document paths (like "attachments.attachments\.isprivate")

FlatList will evolve as I write code that needs more features.  Even it's name is kinda dumb.



> ::: tests/util/elasticsearch.py
> @@ +53,5 @@
> > +            "from":0,
> > +            "size":200000,
> > +            "sort":[],
> > +            "facets":{}
> > +        })
> 
> Wonder if it would be better to build up the dict first with the common
> elements, then set the differing elements (and call the Q.select) in the
> if/else clause.  Would make it more obvious what is different between the
> two, as well as making it easier to change common elements in the future.

Good idea.  Fixed.
Here is the changeset:
https://github.com/klahnakoski/Bugzilla-ETL/commit/711810f08951a731dc543c10a0973fc34ed17c6b

The incremental ETL was not screening properly.  This was caught my the look_for_leaks.py that I run client-side.  It is fixed, and I added a test to confirm (which I should have done earlier).
Attachment #8357749 - Flags: review?(mcote)
Actual diff!

Here is the changeset:
https://github.com/klahnakoski/Bugzilla-ETL/commit/711810f08951a731dc543c10a0973fc34ed17c6b

The incremental ETL was not screening properly.  This was caught my the look_for_leaks.py that I run client-side.  It is fixed, and I added a test to confirm (which I should have done earlier).
Attachment #8357853 - Flags: review?(mcote)
Attachment #8357749 - Attachment is obsolete: true
Attachment #8357749 - Flags: review?(mcote)
Comment on attachment 8357853 [details] [diff] [review]
Fix whiteboard problem

Review of attachment 8357853 [details] [diff] [review]:
-----------------------------------------------------------------

::: bzETL/extract_bugzilla.py
@@ +573,4 @@
>          {"exists": "m.bug_id"},
>          {"terms": {"m.group_id": SCREENED_BUG_GROUP_IDS}}
>      ]})
> +    param.whiteboard_field=STATUS_WHITEBOARD_FIELD_ID

Spacing.
Attachment #8357853 - Flags: review?(mcote) → review+
The latest deployment is still leaking whiteboard into the private cluster.  This is not affecting the public cluster (because all private items are removed), and is not worse than what Metrics does already.

Work continues.
Continuous review of whiteboard is showing all sensitive groups' whiteboard as "[screened]"
The problem was caused by bugs being part of two bug_groups: one sensitive, on not.  The fix is to reduce the join to only the sensitive groups:   https://github.com/klahnakoski/Bugzilla-ETL/commit/bd520f1423d9bed618471795515678cf628afcdf#diff-e7f5c575eeda495b16d7432bbe35b033L560

Here is the test that reveals the problem, and confirms the fix: https://github.com/klahnakoski/Bugzilla-ETL/commit/bd520f1423d9bed618471795515678cf628afcdf#diff-469bf8513f3c4ac87f66514572ce704fR455

This is not a patch, it is part of the main patch in https://bugzilla.mozilla.org/show_bug.cgi?id=879835#c27

More explicit about using "attachments.ispatch" (used in metrics cluster) vs "attachments_ispatch" (used in new clusters)
https://github.com/klahnakoski/Bugzilla-ETL/commit/453afe1bfb3c45c289218510cd52e7749c4b0df1#diff-f0d2fd7c40ef1414520726f5c1f63583L152
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: