Open Bug 1136954 Opened 7 years ago Updated 2 years ago

Build a private/hidden try server for security sensitive bug testing

Categories

(Release Engineering :: General, defect, P3)

Tracking

(Not tracked)

People

(Reporter: Gavin, Unassigned)

References

Details

(Keywords: sec-want)

The goal is to have a try server that can be used for testing security fixes.

Requirements as I see them:
- all capabilities of the normal try server (same test jobs/configuration/etc.)
- backed by a private scm_level_3 Hg repository that is effectively write-only
- all build artifacts (logs, builds) should not be public (i.e. not uploaded to FTP)
- corresponding treeherder page behind LDAP auth (probably keyed off scm_level_3?)

Chris, do you have a good sense for how much work this is to set up?
Flags: needinfo?(catlee)
(In reply to :Gavin Sharp [email: gavin@gavinsharp.com] from comment #0)
> - all build artifacts (logs, builds) should not be public (i.e. not uploaded
> to FTP)

I'm presuming you'd be fine with the treeherder nodes (eg log parsers) having access to wherever the logs ended up? 

> - corresponding treeherder page behind LDAP auth (probably keyed off
> scm_level_3?)

This would need to be tied into Treeherder's Persona auth. However, the ability to determine someone's LDAP permissions from Treeherder's Persona handling is something we want for other use cases too (ie retriggers/cancellation without having to auth twice, using the self-serve HTTP auth).
(In reply to Ed Morley [:edmorley] from comment #1)
> I'm presuming you'd be fine with the treeherder nodes (eg log parsers)
> having access to wherever the logs ended up? 

Sure, as long as the treeherder nodes are reasonably secured (I assume they are).
Keywords: sec-want
This would be useful for B2G device image builds as well, which Legal has told us previously can't be run on our Try infra.
This would be a bunch of extra setup to do, and I don't think we're going to have time to get to it any time soon.
Flags: needinfo?(catlee)
(In reply to Chris AtLee [:catlee] from comment #4)
> This would be a bunch of extra setup to do, and I don't think we're going to
> have time to get to it any time soon.

So then the question is who do we talk to in order to get this approved and added to, say, Q2 goals?
Flags: needinfo?(catlee)
(In reply to Ryan VanderMeulen [:RyanVM UTC-5] from comment #3)
> This would be useful for B2G device image builds as well, which Legal has
> told us previously can't be run on our Try infra.

Would L3 access be enough for that, or would it also be necessary to limit push access to contributors under MoCo NDA?
(In reply to Al Billings [:abillings] from comment #5)
> (In reply to Chris AtLee [:catlee] from comment #4)
> > This would be a bunch of extra setup to do, and I don't think we're going to
> > have time to get to it any time soon.
> 
> So then the question is who do we talk to in order to get this approved and
> added to, say, Q2 goals?

Doug. Q2 probably won't happen either by the looks of things.

I'm going to suggest we plan for this to happen once we're comfortably switched over to Taskcluster, which is currently underway.
Flags: needinfo?(catlee)
Adding Doug to this bug since his name has been invoked.
Flags: needinfo?(dougt)
I agree with the priorities here.
Flags: needinfo?(dougt)
QA Contact: pmoore → mshal
There has been no further action as of yet. I was just reminded of this because a dev accidentally pushed a security bug to a try server *with* tests and that's the scenario that this is meant to address (among others).

Chris, how is the Taskcluster work going?
Flags: needinfo?(catlee)
See Also: → 1140647
We'll have Linux Firefox and Android builds available on TC by EOQ. We're hoping to have OSX builds and some Linux tests available in Q3.
Depends on: bb-to-tc
Flags: needinfo?(catlee)
What is the gating issue on tests?
Duplicate of this bug: 1207939
(In reply to Chris AtLee [:catlee] from comment #4)
> This would be a bunch of extra setup to do, and I don't think we're going to
> have time to get to it any time soon.

Since it has been six months, I figure it may be time to ask if we can get this on a quarter's goals. This keeps coming up as an issue.
Flags: needinfo?(catlee)
Duplicate of this bug: 1290512
Summary: build a try-private for security bug testing → Build a private try server for security bug testing
Duplicate of this bug: 1373889
Summary: Build a private try server for security bug testing → Build a private/hidden try server for security sensitive bug testing
This comes up a few times a year around security bugs. Any chance of it happening?
Greg, what are our options now for having private tasks, logs and artifacts in Taskcluster?

Last time we spoke about this I think we agreed that the public existence of tasks wasn't a problem.

I don't recall if we wanted to try and hide the task payload so that references to a private hg repo and revision would also be private.

We would also need a private view for treeherder. Ed, do you know how much work would be involved to do that?
Flags: needinfo?(garndt)
Flags: needinfo?(emorley)
Flags: needinfo?(catlee)
Priority: -- → P3
(In reply to Chris AtLee [:catlee] from comment #18)
> We would also need a private view for treeherder. Ed, do you know how much
> work would be involved to do that?

Off the top of my head, adding support to Treeherder would require:
1) Figuring out how to make each of these data sources available for consumption by downstream tooling in a secure way:
  - Repo push data (eg hg.mozilla.org pushlog or the GitHub API for a private GitHub repo)
  - Job/artifact metadata (I'm presuming this would be via TC API using TC auth, so easy)
  - Artifact content (such as logs on S3) - using IAM credentials?
2) Updating Treeherder ETL for importing each of these. The approaches used for (1) will affect the complexity here.
3) Updating Treeherder schema/API/UI to support only displaying data to the appropriate logged in users (we already use Taskcluster auth for SSO, and we could limit by entire repo to save having to make schema modifications on the massive jobs table for example)
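
As a rough illustration of the "limit by entire repo" idea in (3), the check could live at the repo level rather than on individual job rows. This is only a sketch: the `PRIVATE_REPOS` mapping, the `try-private` repo name and the `can_view` helper are hypothetical and not part of Treeherder.

```python
# Hypothetical mapping of private repo name -> group required to view it.
# Repos not listed here are treated as public.
PRIVATE_REPOS = {"try-private": "scm_level_3"}


def can_view(repo, user_groups):
    """Return True if a user holding user_groups may see data for repo."""
    required_group = PRIVATE_REPOS.get(repo)
    # Public repos need no group; private repos need the matching group.
    return required_group is None or required_group in user_groups


def visible_repos(all_repos, user_groups):
    """Filter a list of repo names down to those the user may see."""
    return [repo for repo in all_repos if can_view(repo, user_groups)]
```

Because the decision depends only on the repo name, no per-job column would be needed under this assumption.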

From (1), the repo push data piece seems like the most unknown. 

However, importing this data into the main Treeherder tables would completely break our ability to make the Treeherder data public (eg the read-only mirror made available via Redash), which for me at least makes it a non-starter.

Given that a new "private try" would presumably be Taskcluster-only, how about we not ingest it into Treeherder at all, and just use the Taskcluster task group inspector instead? (eg https://tools.taskcluster.net/task-group-inspector/#/dpENsFW-Qd6Q8y725uXbGQ?_k=zqy23l)

This would also solve the "how to ingest push data" problem: other than for running the job itself, the system wouldn't have to worry about the repo/commits. They'd just be linked directly to a single-push view (rather than a Treeherder-like view showing multiple pushes on one repo).
Flags: needinfo?(emorley)
Would running an entirely independent locked-down Treeherder instance be a crazy idea otherwise?
I think interpreting the results of a push without Treeherder would be extremely difficult.
(In reply to Ryan VanderMeulen [:RyanVM] from comment #20)
> Would running an entirely independent locked-down Treeherder instance be a
> crazy idea otherwise?

That is one option, though you'd lose much of the benefit of Treeherder, especially once the auto-classification feature progresses, since you won't have enough of a data set. It's then twice as much for me to dev-ops.

(In reply to Chris AtLee [:catlee] from comment #21)
> I think interpreting the results of a push without Treeherder would be
> extremely difficult.

It would be harder, yes. But perhaps that could be offset by auto-retriggering tests that failed, and using retriggers to work around the hassle of identifying intermittent failures?

I really think keeping the scope of this system as small as possible is the way to actually make it viable.
I think the non-log-parsing parts will need some extra work too. For example both Taskcluster and Treeherder feed the normally public S3 log URL to the taskcluster unified log viewer [1], which then fetches the log incrementally client side as it's scrolled through. Presumably there would need to be a TC proxy that checks the user's scopes and 302s to the signed S3 log URL to facilitate this.
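
A minimal sketch of what such a proxy's decision logic might look like, under several assumptions: the scope string is modelled on Taskcluster's `queue:get-artifact:<name>` pattern, a held scope ending in `*` is treated as a prefix match, and `sign_url` is a hypothetical HMAC stand-in for real S3 URL signing. The host name and the 300-second expiry are placeholders.

```python
import hashlib
import hmac
import time
from urllib.parse import urlencode


def satisfies(held_scopes, required_scope):
    """Return True if any held scope grants required_scope.

    A held scope ending in '*' matches every scope sharing that prefix.
    """
    for held in held_scopes:
        if held == required_scope:
            return True
        if held.endswith("*") and required_scope.startswith(held[:-1]):
            return True
    return False


def sign_url(base_url, secret, expires_at):
    """Hypothetical stand-in for S3 signing: HMAC over the URL and expiry."""
    message = f"{base_url}\n{expires_at}".encode()
    signature = hmac.new(secret, message, hashlib.sha256).hexdigest()
    return f"{base_url}?{urlencode({'expires': expires_at, 'sig': signature})}"


def handle_log_request(held_scopes, artifact_name, secret, now=None):
    """Return (status, headers) for a request to fetch a private log."""
    required = f"queue:get-artifact:{artifact_name}"
    if not satisfies(held_scopes, required):
        return 403, {}
    now = int(time.time()) if now is None else now
    # Placeholder host; a real deployment would point at the private bucket.
    base = f"https://private-logs.example.invalid/{artifact_name}"
    return 302, {"Location": sign_url(base, secret, now + 300)}
```

The log viewer would then follow the redirect and fetch the log from the short-lived URL, so the storage location itself never needs to be public.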
(In reply to Chris AtLee [:catlee] from comment #18)
> Greg, what are our options now for having private tasks, logs and artifacts
> in Taskcluster?

Private artifacts are supported by all workers.  However, we should come up with a solid namespacing of these artifacts so that we can assess who gets the scopes to read/write artifacts there.

Private logs are only supported by docker-worker, but I do not think it would be hard to add them to the other workers.


> I don't recall if we wanted to try and hide the task payload so that
> references to a private hg repo and revision would also be private.

I don't recall either, but I'm not sure if knowing a revision is at all helpful to an external party.
Flags: needinfo?(garndt)
We'd likely want to stand up a separate Mercurial server because I don't want to taint hg.mozilla.org with private repos: it's far easier assuming that all repo data on a machine is public.

We have also considered moving to a different storage model for the try and review repos that could facilitate "private" storage. However, I'd need to talk to someone about the security model to flesh out access requirements, because depending on the requirements, it may not be worth doing this just for private data.

Doing either is a fair bit of work. We're currently figuring out the architecture for hg.mo post SCL3. So if private try runs will be a requirement in the next few years, I'd appreciate someone telling us so we can plan for it in the future architecture.
So I was thinking about this and I wonder if there's an easier interim solution: punt on treating the job results/logs/error summaries/bug suggestions/binaries/... as private, and instead for now only hide the version control side (diffs, commit messages).

ie:
* stand up a private-hg.mozilla.org that's behind auth (eg Auth0 SSO for hgweb, SSH key + correct LDAP group for Hg read/write)
* adjust taskcluster so that a relevant SSH key is used to pull from this repo when building
* adjust whatever generates the Hg push pulse notifications to sanitise the commit messages/files changed for each push (eg just include author, datetime, SHA and no more)
* Treeherder then ingests the push/job result/parses the log as normal (except the push has certain fields blanked out)
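
The sanitising step in the third bullet could be sketched roughly as below. The message shape (a list of changeset dicts with `author`, `date`, `node`, `desc` and `files` keys) is an assumption made for illustration and may not match the real pulse payload.

```python
# Fields that are safe to publish, per the proposal: author, datetime, SHA.
ALLOWED_FIELDS = ("author", "date", "node")


def sanitise_changeset(changeset):
    """Keep only allowlisted fields, dropping commit message, files, etc."""
    return {key: changeset[key] for key in ALLOWED_FIELDS if key in changeset}


def sanitise_push(changesets):
    """Sanitise every changeset in a push before it is published."""
    return [sanitise_changeset(changeset) for changeset in changesets]


if __name__ == "__main__":
    push = [
        {
            "author": "dev@example.com",
            "date": "2017-11-24T12:00:00Z",
            "node": "abc123",
            "desc": "Bug NNNNNNN - sensitive description",
            "files": ["dom/foo.cpp"],
        }
    ]
    print(sanitise_push(push))
```

Using an allowlist rather than a blocklist means any field added to the notification later stays hidden until someone explicitly decides it is safe to expose.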

This saves having to deal with everything else, but still gives much of the benefit without having to wait another year or two.

Thoughts?
Depends on: 1420510
Duplicate of this bug: 551745