Closed Bug 1109803 Opened 6 years ago Closed 3 years ago

Allowing starring/classifying intermittent failures against security/private bugs

Categories

(Tree Management :: Treeherder, defect, P5)

defect

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: mccr8, Unassigned)

References

(Blocks 1 open bug)

Details

Sometimes we get intermittent oranges that indicate a security issue, like a crash on a jemalloc-poison value, or a use-after-free detected by ASan.  A sheriff notices the crash is happening, then files a bug.  It gets starred a few times, then somebody notices that it is security related, and hides it.

The problem is that now the bug will only rarely get starred if it recurs, because it must be manually starred by a sheriff who notices the failure and remembers that there is some hidden bug for it.

One solution is to file a private bug, and just leave the other one for starring public.  This has the drawback that somebody can just search TBPL for intermittent ASan failures, which may provide hints as to security issues.

Another solution would be to somehow integrate these bugs a little more into the tree herder starring infrastructure.  Doing something like showing the subject line for hidden intermittent oranges to logged in sheriffs seems like it isn't exposing a lot of information.  It could even just have some kind of "requires manual star" annotation so the starring bot wouldn't need posting access to these bugs.  Anyways, this would require a lot of very careful security considerations for what kind of access a random bot has to bugs (maybe this isn't really fixable in a safe way), but it would be nice if we could somehow get these bugs starred.
Priority: -- → P4
Priority: P4 → P5
I think there are two ways automatic classification could be allowed:

1) Continue to use bugs. Would need:
 - a bugzilla account with security permissions
 - treeherder to use that account when fetching intermittent failures to put in the bugs cache
 - orangefactor to use that account when commenting on the bug with the daily/weekly summaries
 - figuring out how to not reveal too much information about the bug when displaying it as a suggestion in treeherder
 - figuring out how to not reveal too much information about the bug when displaying stats about it in orangefactor

2) Wait until the "bugs are no longer the source of truth" crash-stats like model. 

The problems with #1 are:
 - Treeherder/OrangeFactor then have credentials for an account with security bug access. OrangeFactor is EOL and still using Python 2.6 and no doubt has vulnerabilities.
 - one way to secure those credentials would be to have BMO only allow them to perform specific actions to bugs with the intermittent-failure keyword. However the chances of us getting that feature added are slim
 - even if the credentials are secured, the treeherder bugs cache still contains sensitive bugs, which means we can no longer allow read-only replica access to it (eg for activedata, redash)
 - orangefactor currently has no authentication mechanism, so we wouldn't be able to show stats for only sheriffs
(...and so on. As such, this option isn't viable IMO.)

#2 is bug 1179263 and very much long term (and would need lots of other pieces).

An alternative for the meantime would be to:
- make the bug filer tool warn if a bug about to be filed contains certain strings
- for failure lines that contain those same set of strings, in addition to any existing bug suggestions also show a "this failure might have private intermittent-failure bugs filed: <pre-filled BMO quicksearch link>", that sheriffs could click
Bugs are not going to be the source of truth in the future. For now we don't have the resources to try and tackle this.
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → WONTFIX
The Softvision contractors don't have access to restricted bugs, that's why a search approach won't work for them and they will likely create a new bug for every instance of the failure.

There are also 15 of them which makes them less used to filing bugs which should be restricted, so a warning after a match with keywords would already be helpful.

In the medium term, Bugherder (which is used to update bugs after they land on central, beta, etc.) will need similar bug access to support updates of restricted bugs.
Blocks: 1178522
Status: RESOLVED → REOPENED
Resolution: WONTFIX → ---
(In reply to Sebastian Hengst [:aryx][:archaeopteryx] (needinfo on intermittent or backout) from comment #3)
> The Softvision contractors don't have access to restricted bugs

I can see how that's problematic :-(

It's worth noting though that in its current state this bug isn't really actionable, so will need someone (likely from a team with more headcount & full understanding of the use-cases/threat models) to flesh it out first. 

One of the big pre-requisites is deciding what threat models this is hoping to address, and ensuring that any proposed solution does so without just moving them elsewhere. 

For example, all log data on S3 is public, as is all pulse notifications, Treeherder's API and so on. Therefore unless those are made private too, having a bug marked as private possibly doesn't gain much over leaving it open and having the patches created in a different bug. (See also bug 1136954 for prior discussion on how much work it would be to make certain things private.)

> In the medium term, Bugherder (which is used to update bugs after they land
> on central, beta, etc.) will need similar bug access to support updates of
> restricted bugs.

Fwiw this will require a significant rewrite of Bugherder to be able to do this (which is unfortunate given that the sole maintainer of Bugherder no longer works at Mozilla), since it currently is a static site that makes all requests client side.(In reply to Sebastian Hengst [:aryx][:archaeopteryx] (needinfo on intermittent or backout) from comment #3)
> The Softvision contractors don't have access to restricted bugs, that's why
> a search approach won't work for them and they will likely create a new bug
> for every instance of the failure.
> 
> There are also 15 of them which makes them less used to filing bugs which
> should be restricted, so a warning after a match with keywords would already
> be helpful.
> 
> In the medium term, Bugherder (which is used to update bugs after they land
> on central, beta, etc.) will need similar bug access to support updates of
> restricted bugs.

(In reply to Sebastian Hengst [:aryx][:archaeopteryx] (needinfo on intermittent or backout) from comment #3)
> The Softvision contractors don't have access to restricted bugs, that's why
> a search approach won't work for them and they will likely create a new bug
> for every instance of the failure.
> 
> There are also 15 of them which makes them less used to filing bugs which
> should be restricted, so a warning after a match with keywords would already
> be helpful.
> 
> In the medium term, Bugherder (which is used to update bugs after they land
> on central, beta, etc.) will need similar bug access to support updates of
> restricted bugs.
Eugh there's something up with either my Nightly build or Bugzilla, I clicked the reply button several times since it didn't response, but there only appeared to be one copy of the text when I pressed submit?!
I'm having the same issue with replies. They'll show up eventually. I don't know if it is Nightly or Bugzilla that is having the issue.
(In reply to Ed Morley [:emorley] from comment #4)
> For example, all log data on S3 is public, as is all pulse notifications,
> Treeherder's API and so on. Therefore unless those are made private too,
> having a bug marked as private possibly doesn't gain much over leaving it
> open and having the patches created in a different bug.

To give a more detailed example scenario...

If Treeherder supported classifying failures against a private bug, what should non-authorised users see in Treeherder for a failure that's been classified as such by a sheriff? The alternatives seem to be:
 1) The bug number but with the summary redacted
 2) An empty classification of "intermittent" but no bug number
 3) No classification at all

Now (1) and (2) make it immediately obvious that the failure was related to a security bug. 

And (3) would confuse all other Treeherder users, plus still make it pretty obvious that it's related to a security bug (since the sheriffed repositories are expected to have zero unclassified failures, so anything outstanding on older pushes is would be unusual).

So either way, the malicious actor now knows which jobs failed in a way that's related to a security bug. They can then either:
 a) look at the failure message in the Treeherder job details panel
 b) follow the link to the raw public log on S3

For (a), one approach might be to hide the error lines entirely from the Treeherder API unless authorised. However this will:
 * confuse other Treeherder users
 * not prevent leaks before the failure was classified (the error summary will need to be visible until it's classified against a private bug)
 * not solve anything unless the raw log is also made private

For (b), there seem to be two approaches:

 i) Have all logs in S3 be private. This would:
    * need significant changes to the taskcluster and treeherder pieces, and break things like the taskcluster unified log viewer (which is client side) unless it's rewritten to use auth
    * mean that contributors can't see any log on Treeherder at all (or most of them, depending on whether we let them view logs for their own try pushes etc)

 ii) Have most logs on S3 be public, but retrospectively hide the ones relating to a failure that was classified against a security bug. However:
    * this would presumably require significant changes on the taskcluster side, and may not even be possible
    * it doesn't stop leaking the logs prior to classification (and marking the log as private would only serve to paint a target on it, had someone downloaded it prior)
    * this would still require most of the same taskcluster/treeherder changes as (i), albeit avoid the issue of stopping contributors from being able to view logs


Therefore without handling all of the above, classifying failures against private bugs would only serve to paint a target against jobs/failures that can then have their logs freely downloaded and analysed.
See Also: → 1369067
Summary: Try to come up with some way of allowing starring of intermittent sec bugs → Allowing starring/classifying intermittent failures against security/private bugs
Flags: needinfo?(dburns)
Based on comment 7 this is a non-trivial issue to fix that realistically we can't do anytime soon without TC/hg.m.o updates first. Closing this for now and if the necessary dependencies get worked on then we can discuss about reopening this issue.
Status: REOPENED → RESOLVED
Closed: 3 years ago3 years ago
Flags: needinfo?(dburns)
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.