Closed Bug 1220692 Opened 9 years ago Closed 8 years ago

Provide some scoping on access to Balrog

Categories

(Taskcluster :: Services, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: dustin, Unassigned)

> figure out what this balrog workerType is doing and why the task that uses it (testing/taskcluster/tasks/builds/b2g_aries_spark_ota_base.yml) doesn't appear in these task graphs

There is a "balrog" workerType defined and used in in-tree task graphs.  It accesses the balrogVPNProxy feature.  I believe it's associated with the builds that failed in bug 1220338.

The question is, how can we allow this operation to succeed without opening universal access to balrog?  Do these balrog changes need to occur for every try push, or just for level-3 tree pushes?
Looping Wander in on this, as he's worked on the aries tasks.  I believe that balrog access is only required for pushes to m-c, but Wander can correct me if I'm wrong.
Flags: needinfo?(wcosta)
There is also a "funsize-balrog" worker, used by funsize.
One option here is to upgrade the balrog proxy so that, aside from just getting you VPN access to balrog, it can somehow filter acceptable balrog requests via scopes. That way you can grant a task the ability to post updates to balrog for OTA updates to a dogfooding device (or whatever aries is) without also granting it the ability to upgrade all Firefox users to Edge.

Another option may be to just extend balrog so that it acts as a worker, consuming tasks from the queue and making the given updates, checking scopes along the way.
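
As a rough illustration of the first option, the proxy could map each incoming balrog request to a required scope and forward it only if the task holds a matching scope. This is a minimal Python sketch, not the actual balrogVPNProxy code: the `balrog:<method>:<product>:<channel>` scope format and all function names are hypothetical. The only thing borrowed from Taskcluster is its rule that a scope ending in `*` matches any scope beginning with that prefix.

```python
from typing import Iterable


def scope_satisfies(held: str, required: str) -> bool:
    """Taskcluster-style match: a trailing '*' in the held scope is a prefix wildcard."""
    if held.endswith("*"):
        return required.startswith(held[:-1])
    return held == required


def required_scope(method: str, product: str, channel: str) -> str:
    # Hypothetical scope format, invented for this sketch.
    return f"balrog:{method.lower()}:{product.lower()}:{channel.lower()}"


def is_request_allowed(task_scopes: Iterable[str], method: str,
                       product: str, channel: str) -> bool:
    """Return True if any scope held by the task covers this balrog request."""
    needed = required_scope(method, product, channel)
    return any(scope_satisfies(held, needed) for held in task_scopes)


# A task that may only post B2G nightly (OTA) updates.
aries_scopes = ["balrog:post:b2g:nightly"]
print(is_request_allowed(aries_scopes, "POST", "B2G", "nightly"))      # allowed
print(is_request_allowed(aries_scopes, "POST", "Firefox", "release"))  # rejected
```

The same check would apply under the second option; it would simply run inside a balrog-side worker rather than in the proxy.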
(In reply to Greg Arndt [:garndt] from comment #1)
> Looping Wander in on this as he's worked on the aries tasks.  I believe that
> balrog access is only required for pushes to m-c but Wander can correct me
> if I'm wrong.

Balrog access is required for OTA (aka nightly) builds.
Flags: needinfo?(wcosta)
Blocks: 1226240
Nick, what do you think here?  What's the best way to control access to balrog at a finer-grained resolution?
Flags: needinfo?(nthomas)
In talking with catlee, I think the way forward here is to eliminate the VPN and have releng handle Balrog updates going forward. I'm not clear on the details. We probably still will have tasks that need special scopes to accomplish this, but I wanted to mention it in the event that it affects the details for implementation. cc'ing jlund for details.
Flags: needinfo?(jlund)
Yeah, one option in that case is to make the balrog API publicly accessible, so TC can "push" changes to balrog via a proxy.  The other option is for balrog to "pull" changes by executing tasks.  We could probably pre-generate those tasks in the decision task, but could also create them dynamically in whatever code is currently making the balrog calls.
I'm guessing the VPN proxy creates a network path, while the worker has secrets which grant it access to the API. Balrog has a model where permissions can be limited to products and/or HTTP method; see https://github.com/mozilla/balrog/blob/master/auslib/db.py#L1055. This is still somewhat broad access if you grant product 'B2G', e.g. something in taskcluster could stomp on buildbot and vice versa, at least while we're in transition. Later it might mean you can't protect some parts of B2G. Or you might want to modify nightly-style updates but shouldn't have access to beta/release.

AFAIK the main use case is updates for level 3 repos, either directly in builds or via funsize. Try support would only be there for people working on changes (recently gerard-majax/lissy was doing this, but ideally this should talk to another server, or something).

We would probably have to do another sec review to make the balrog API publicly accessible; the previous one assumed it was behind VPN.
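
To make the "somewhat broad access" concern concrete, here is a minimal Python sketch of a permission limited by product and HTTP method. It is not the code in auslib/db.py: the `Permission` class and `is_allowed` function are hypothetical stand-ins that only mirror the product/method idea described above.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Permission:
    """Hypothetical stand-in for a balrog permission; None means 'no restriction'."""
    products: Optional[FrozenSet[str]] = None
    methods: Optional[FrozenSet[str]] = None


def is_allowed(perm: Permission, product: str, method: str) -> bool:
    """Check a request against the product and HTTP-method limits only."""
    if perm.products is not None and product not in perm.products:
        return False
    if perm.methods is not None and method not in perm.methods:
        return False
    return True


# Grant intended for Taskcluster nightly-style B2G updates.
tc_b2g = Permission(products=frozenset({"B2G"}), methods=frozenset({"POST", "PUT"}))

print(is_allowed(tc_b2g, "B2G", "POST"))      # True
print(is_allowed(tc_b2g, "Firefox", "POST"))  # False
print(is_allowed(tc_b2g, "B2G", "DELETE"))    # False
```

Note that `is_allowed` takes no channel argument: under these assumptions a 'B2G' grant covers nightly and beta/release alike, and cannot tell a taskcluster submission from a buildbot one.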
Flags: needinfo?(nthomas)
I feel a little silly - it didn't occur to me that there were credentials involved, too :)

So, the balrogVPNProxy feature allows network access to balrog, as Nick said.  It does not do any sort of authentication to the service itself.  For funsize tasks, that comes from encrypted environment variables.  For other services, I don't know.

The net result is that this docker-worker feature is just one layer of protection (network access), so having it granted to a broader-than-desired set of tasks isn't such a big deal.

This bug, then, can be considered a lower-priority feature request: reconfigure balrog access from docker worker to not require on-image credentials, but to instead rely on the proxy to attach appropriate credentials to the request based on the scope.  One option would be to set up balrog to accept TC creds and translate scopes into balrog access.
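
A minimal Python sketch of that proxy-side idea follows, under stated assumptions: the scope names, the account names, and the `CREDENTIALS_BY_SCOPE` table are all hypothetical, and HTTP Basic auth is assumed purely for illustration rather than taken from balrog's actual configuration. The point is only that the credentials live in the proxy, never on the task image.

```python
import base64
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical mapping held only by the proxy: scope -> (balrog username, password).
CREDENTIALS_BY_SCOPE: Dict[str, Tuple[str, str]] = {
    "balrog:submit:b2g-nightly": ("tc-b2g-nightly", "example-password"),
    "balrog:submit:funsize": ("tc-funsize", "example-password"),
}


def select_credentials(task_scopes: Iterable[str]) -> Optional[Tuple[str, str]]:
    """Return the credentials for the first task scope the proxy recognises."""
    for scope in task_scopes:
        if scope in CREDENTIALS_BY_SCOPE:
            return CREDENTIALS_BY_SCOPE[scope]
    return None


def attach_credentials(headers: Dict[str, str],
                       task_scopes: Iterable[str]) -> Dict[str, str]:
    """Copy the request headers, adding Basic auth chosen from the task's scopes."""
    creds = select_credentials(task_scopes)
    if creds is None:
        raise PermissionError("task holds no scope that grants balrog access")
    username, password = creds
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    outgoing = dict(headers)  # leave the caller's headers untouched
    outgoing["Authorization"] = f"Basic {token}"
    return outgoing
```

A task holding `balrog:submit:funsize` would have its request forwarded with the funsize account attached, while a task with no recognised scope is refused before anything reaches balrog.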
(In reply to Selena Deckelmann :selenamarie :selena from comment #6)
> In talking with catlee, I think the way forward here is to eliminate the VPN
> and have releng handle Balrog updates going forward. I'm not clear on the
> details. We probably still will have tasks that need special scopes to
> accomplish this, but I wanted to mention it in the event that it affects the
> details for implementation. cc'ing jlund for details.

Having releng (buildbot infra) submit to balrog outside of TC has worked as a proof of concept with nightly promotion. However, it's a stopgap and, IMO, something we should try to avoid extending if we can.

happy to help here where requested :)
Flags: needinfo?(jlund)
(The issue with on-image credentials is that those images are generally running untrusted code, so there's a risk of credential disclosure.)
Assignee: dustin → nobody
Component: Integration → Platform and Services
This seems to have attracted very little interest.  I suppose our current balrog security model is adequate, then (network access is available to any level-3 job, and credentials are stored in private docker images).
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → WORKSFORME
RelEng are working on a balrog worker based on scriptworker that will submit to Balrog directly, without requiring credentials in a private docker image.
Component: Platform and Services → Services