Open Bug 1324744 Opened 3 years ago Updated 5 months ago

one click loaner has user root for the shell and user worker for the interactive display- not very easy to hack around

Categories

(Firefox Build System :: Task Configuration, task)

task
Not set

Tracking

(Not tracked)

REOPENED

People

(Reporter: jmaher, Unassigned, Mentored)

Details

I got a one click loaner for a linux64 debug bc7 task, and after selecting option 2, I had to do a few things:
1) ~/workspace/build/tests was set as root, I switched it to worker so I could edit and hack
2) After 30 minutes of hacking around I am still not able to get |mach mochitest browser/base/content/test/general/browser_addCertException.js| to work
Thanks for filing, this reminded me that there was another similar bug on file. I believe they are the same issue so duping to that one. There is a workaround in comment 1.

As for the root issue, that's expected. Since you run in the interactive loaner as root, any files that get created (like the tests folder) belong to root. However, you said you weren't able to edit those files? That's another bug then, as you should already be running as root and so should be able to modify anything :/
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → DUPLICATE
Duplicate of bug: 1312739
should I file a new bug for the interactive desktop being logged in as worker while the files are unpacked as root, or should we use this bug?
I guess we can reuse this, though it would probably belong under the Taskcluster product somewhere. Fwiw, I am unable to reproduce the root issue.
the confusion is:
1) get a shell and setup with the wizard. this downloads and unpacks things as user/group 'root'
2) open up an interactive display to run the tests, find out that we are logged in as 'worker' and cannot edit the files, and su offers no clear way out because it asks for a password.
Status: RESOLVED → REOPENED
Resolution: DUPLICATE → ---
Summary: one click loaner doesn't work well for |mach mochitest <path>| → one click loaner has user root for the shell and user worker for the interactive display- not very easy to hack around
Thanks, makes sense. Jonas, could we log in to the interactive display as root?

Alternatively, we could run the wizard as the worker user, but this seems a bit hacky and might cause other problems.
Component: General → Tools
Flags: needinfo?(jopsen)
Product: Testing → Taskcluster
The interactive display doesn't "run as" anything -- it just provides access to the X session.  I suspect that running that X session as root will cause a lot of test failures.

Everything (including the wizard) should run as the worker user.  Early on in the task startup, we chown some things to root and then drop root privileges.  The wizard should follow suit.
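A minimal sketch of the "chown to root, then drop root privileges" pattern described above, in Python (the language mozharness uses). The helper name `drop_privileges` is hypothetical and this is not the actual task-startup code; it only illustrates the ordering that matters: supplementary groups and the gid have to be changed before the uid, because once `setuid()` gives up root the process can no longer change them.

```python
import os
import pwd


def drop_privileges(username: str) -> None:
    """Permanently switch this process from root to ``username`` (hypothetical helper)."""
    entry = pwd.getpwnam(username)
    if os.getuid() == entry.pw_uid:
        # Already running as the target user: nothing to drop.
        return
    if os.getuid() != 0:
        raise PermissionError(f"must be root to switch to {username!r}")
    # Groups first, then gid, then uid: after setuid() we are no longer root
    # and could not change the group settings any more.
    os.setgroups(os.getgrouplist(username, entry.pw_gid))
    os.setgid(entry.pw_gid)
    os.setuid(entry.pw_uid)
    # Make tools that consult the environment see the new user's home.
    os.environ["HOME"] = entry.pw_dir
    os.environ["USER"] = entry.pw_name
```

If the wizard called something like `drop_privileges("worker")` before downloading and unpacking, the files it creates would be owned by worker, which is what the interactive display session expects.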
Component: Tools → Task Configuration
The wizard just runs with whatever privileges the 'taskcluster-interactive-shell' script has, which I believe is called from somewhere in the tools repo:
https://dxr.mozilla.org/mozilla-central/source/testing/docker/desktop1604-test/taskcluster-interactive-shell

It's also a nice feature that developers are able to install whatever packages/tools they need into the loaners, I think we'd want to keep that ability.
so then we could chown the files after setup?
The wizard could potentially add a line to /etc/sudoers to allow open sudo in that case, before dropping privs.
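The two suggestions above (chown the files after setup; have the wizard grant open sudo before dropping privileges) could look roughly like this Python sketch, run as root at the end of setup. Assumptions, all unconfirmed for these images: the helper names are hypothetical, sudo is installed, and /etc/sudoers contains an `#includedir /etc/sudoers.d` line (the Ubuntu default). sudo skips drop-in files whose names contain a '.' or end in '~', so the file is named after the user.

```python
import os
import pwd


def chown_tree(root: str, username: str) -> None:
    """Give ``root`` and everything under it to ``username`` (like ``chown -R``)."""
    entry = pwd.getpwnam(username)
    os.chown(root, entry.pw_uid, entry.pw_gid)
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            # follow_symlinks=False changes a symlink itself, not its target,
            # so a stray link cannot redirect ownership outside the tree.
            os.chown(os.path.join(dirpath, name), entry.pw_uid, entry.pw_gid,
                     follow_symlinks=False)


def grant_sudo(username: str, sudoers_dir: str = "/etc/sudoers.d") -> str:
    """Write a sudoers drop-in giving ``username`` passwordless sudo."""
    path = os.path.join(sudoers_dir, username)
    with open(path, "w") as f:
        f.write(f"{username} ALL=(ALL) NOPASSWD:ALL\n")
    # 0440 is the conventional mode for sudoers files; sudo rejects
    # world-writable ones.
    os.chmod(path, 0o440)
    return path
```

For example, `chown_tree(os.path.expanduser("~worker"), "worker")` followed by `grant_sudo("worker")`. Unlike `chown -R worker`, `chown_tree` also sets the group, which seemed the less surprising choice for a home directory.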
Flags: needinfo?(jopsen)
How to approach this bug: get a one-click loaner and experiment with the shell access.  Based on comments above, think about ways to make that more user-friendly, then implement that.
Mentor: dustin
Hi, can I work on this bug?
@evavranici let's focus on the other bug I assigned to you earlier, for now.  There aren't a lot of open bugs available, so I'd like to make sure they are shared fairly.
(In reply to Dustin J. Mitchell [:dustin] pronoun: he from comment #12)
> @evavranici let's focus on the other bug I assigned to you earlier, for now.
> There aren't a lot of open bugs available, so I'd like to make sure they are
> shared fairly.

Sure.
Product: TaskCluster → Firefox Build System
This was made worse by bug 1323302, because if you now try to run the tests from the interactive shell, they fail with "Running Firefox Nightly as root in a regular user's session is not supported". What I had to do was choose option 4 (exit), then run `su - worker` and `run-wizard` again.
(I also had to run `chown -R worker ~worker`.)

Since we're collecting workarounds: I actually needed `su -p - worker`. The -p passes the environment down, and the mozharness scripts in general need the environment.
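The manual workaround above amounts to "rerun the wizard as worker, but keep the environment". A hedged Python sketch of that second half, assuming Python 3.9+ (for the `user`, `group` and `extra_groups` arguments of `subprocess`) and a root caller; `run_as` is a hypothetical helper, not part of any existing loaner tooling.

```python
import os
import pwd
import subprocess


def run_as(username: str, argv: list) -> subprocess.CompletedProcess:
    """Run ``argv`` as ``username`` while keeping the caller's environment."""
    entry = pwd.getpwnam(username)
    env = dict(os.environ)       # keep the variables mozharness relies on (like su -p)
    env["HOME"] = entry.pw_dir   # ...but point HOME/USER at the target user
    env["USER"] = entry.pw_name
    kwargs = {}
    if os.getuid() != entry.pw_uid:
        # Switching users needs root. user=/group=/extra_groups= need
        # Python 3.9+ and take effect only in the child process.
        kwargs = dict(user=entry.pw_uid, group=entry.pw_gid,
                      extra_groups=os.getgrouplist(username, entry.pw_gid))
    return subprocess.run(argv, env=env, check=True, **kwargs)
```

Usage would be something like `run_as("worker", ["run-wizard"])`, assuming `run-wizard` is on the inherited PATH as the comment above implies. Overriding HOME and USER is a design choice so that files written under `~` land in worker's home; drop those two lines to mirror `su -p` more literally.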

I see vague directions in earlier comments, but no clear path for action. jmaher, can you suggest the immediate next steps or redirect? I expect to spend a good deal of time in interactive loaners this quarter and this is a bit of a show-stopper.

rwood, do you already know a better flow for working on Raptor in automation that gets around this?

tarek, you should be aware of this and know the work-arounds.

Flags: needinfo?(tarek)
Flags: needinfo?(rwood)
Flags: needinfo?(jmaher)

raptor runs on real hardware, so the interactive loaners (only for linux aws machines) will not be 100% identical. For hardware we need to fall back on the old-fashioned way of taking a machine out of automation and setting up a VPN connection to it. Each OS has a different solution.

:nalexander, can you help outline some of the intended work you plan to do? Maybe what would be an ideal environment?

As for fixing interactive loaners, we should determine the need and priority; this is something the taskcluster team (:coop) would need to invest time in, or at least help with through reviews and technical discussions. I would like to hear whether this is on a roadmap and where (I suspect it isn't in the H2 plans).

:coop, can you comment on state of interactive loaners across OS's (now that almost all jobs are using generic-worker) and any work planned?

Flags: needinfo?(jmaher) → needinfo?(coop)

Yep, I've never run Raptor on an interactive loaner, so I don't have input here, sorry. I also don't work on the CI hardware setup/integration; that is in the great hands of jmaher's team (thankfully!).

Flags: needinfo?(rwood)

(In reply to Joel Maher ( :jmaher ) (UTC-4) from comment #17)

> :coop, can you comment on state of interactive loaners across OS's (now that almost all jobs are using generic-worker) and any work planned?

We hope to have time to address interactive loaner issues in Q4. As jobs migrate from docker-worker to generic-worker in Q3/Q4, it will be easier for us to fix things in one place.

Flags: needinfo?(coop)

The issue discussed here is completely orthogonal to workers.

(In reply to Mike Hommey [:glandium] from comment #20)

> The issue discussed here is completely orthogonal to workers.

I'm not sure if that's replying to my comment, but the base configuration for interactive shells and loaners happens via the workers. I'm glad people have discovered workarounds, but if we want that stuff to work out-of-the-box, we'll need to fix those configs.

The move to generic-worker is an attempt to stop implementing functionality like this twice (or previously three times). There is known divergence between the existing worker implementations here.

What makes tasks run as non-root is not the worker, it's the script that runs in it. e.g. run-task.

(In reply to Joel Maher ( :jmaher ) (UTC-4) from comment #17)

> raptor runs on real hardware, so the interactive loaners (only for linux aws machines) will not be 100% identical. For hardware we need to fall back on the old-fashioned way of taking a machine out of automation and setting up a VPN connection to it. Each OS has a different solution.
>
> :nalexander, can you help outline some of the intended work you plan to do? Maybe what would be an ideal environment?

Filed https://bugzilla.mozilla.org/show_bug.cgi?id=1562991 to track getting some hardware allocated for this work.

Flags: needinfo?(tarek)