Closed Bug 1795063 Opened 2 years ago Closed 2 years ago

Migrate spidermonkey builds from AWS -> GCP

Categories

(Firefox Build System :: Task Configuration, task)

Tracking

(firefox-esr102 fixed, firefox108 fixed)

RESOLVED FIXED
108 Branch

People

(Reporter: ahal, Assigned: ahal)

Attachments

(2 files)

When I attempted the switch they all failed:
https://treeherder.mozilla.org/jobs?repo=try&revision=2c2e74e1232cc4427e5172bc4591eae5e754ec35

Failure log:
https://firefoxci.taskcluster-artifacts.net/Ru6NBH_WQJaa6iZGaS-BvQ/0/public/logs/live_backing.log

Relevant lines:

[task 2022-10-13T15:27:56.955Z] in directory /builds/worker/workspace/obj-spider, running ['setarch', 'x86_64', '-R', 'make', 'check']
[task 2022-10-13T15:27:56.956Z] setarch: failed to set personality to x86_64: Operation not permitted
[task 2022-10-13T15:27:56.966Z] exit status 0 for '(make-nonempty)'
[task 2022-10-13T15:27:56.966Z] exit status 1 for 'make check'

From a very quick search, it looks like this error can occur when docker runs the container in an unprivileged context. I wonder if we need to create a separate privileged pool in GCP or something.

So it does look like running this in docker's privileged mode would work. However:

A) Do we want to do that, given the security hole it opens up? And
B) Why wasn't this failing before, given that the AWS b-linux pool is also not configured to run docker in privileged mode?

Can we ask the spidermonkey team how critical disabling ASLR is?

Are we using the same version of docker on both platforms?

> Are we using the same version of docker on both platforms?

Good question, that could be the difference. I'd guess the GCP image is using a newer version of docker, I'll see if I can tease it out of the task logs.

Oh right duh, probably can't find that within the container.

Dave, do you know what versions of docker are installed in the following images?

  • docker-worker-v20220907-hvm-builder-trusted-bug1789506
  • monopacker-docker-worker-trusted-current-gcp-2022-08-12b

Steve, do you know if it's possible to remove the setarch -R calls (leaving ASLR enabled) in these tasks? If not, we likely need to grant these tasks privileged access to the docker host, which would involve creating a new pool and possibly introduce security risks.

Flags: needinfo?(sphink)
Flags: needinfo?(dhouse)

We can drop it. The sole purpose is to make failures more reproducible: if a failure (or the timing of a failure) depends on the particular randomization for a given run, then with ASLR disabled it's more likely to fail in the same way on a retrigger. But I don't know how common that is these days. Given that it sounds like it could be problematic, I'm fine with dropping it (or trying to do it and just logging a warning on failure) and seeing whether it proves to be an annoyance.
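For reference, the "try it and just log a warning on failure" option could look roughly like the sketch below. This is not the patch that landed (that one simply removed the call). The function name `maybe_disable_aslr` and the `probe` parameter are made up for illustration. The idea is to probe whether `setarch x86_64 -R` is permitted by running it against `true`, and fall back to the plain command if not.

```python
import subprocess
import sys

# Prefix taken from the failing log line: ['setarch', 'x86_64', '-R', 'make', 'check']
SETARCH_PREFIX = ["setarch", "x86_64", "-R"]


def maybe_disable_aslr(command, probe=subprocess.call):
    """Return `command` wrapped in `setarch -R` if that works here, else unchanged.

    `probe` is a hypothetical injection point (defaults to subprocess.call) so
    the decision logic can be exercised without a real setarch binary.
    """
    try:
        # Run a no-op under setarch to see whether personality() is permitted.
        status = probe(
            SETARCH_PREFIX + ["true"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except OSError:
        # setarch is missing or could not be executed at all.
        status = 1

    if status == 0:
        return SETARCH_PREFIX + list(command)

    print(
        "warning: could not disable ASLR; running with ASLR enabled",
        file=sys.stderr,
    )
    return list(command)
```

A caller would then run `subprocess.call(maybe_disable_aslr(["make", "check"]))` instead of hard-coding the `setarch` prefix.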

If we did want to do the work to support it, I would not recommend the big hammer of running the whole container in privileged mode (--privileged). It looks like you can alter the seccomp profile to allow only personality(ADDR_NO_RANDOMIZE).

Flags: needinfo?(sphink)

Let me know which way you want to go. I can write the patch to remove the setarch call (or I'm happy to review instead!). I wouldn't know where to put the docker run --security-opt seccomp=FILE.json alteration, though.
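To make the seccomp idea concrete, here is a rough sketch of generating FILE.json by adding one rule to an existing profile. Several things are assumptions: that setarch x86_64 -R ends up calling personality(PER_LINUX | ADDR_NO_RANDOMIZE), i.e. the value 0x0040000; that the rule layout matches docker's seccomp profile JSON format; and that `base` would really be docker's default profile loaded from disk rather than the empty placeholder used here. The helper name `allow_no_randomize` is made up.

```python
import copy
import json

# ADDR_NO_RANDOMIZE from <linux/personality.h>; PER_LINUX is 0, so this is
# also the full argument we assume setarch -R passes to personality().
ADDR_NO_RANDOMIZE = 0x0040000


def allow_no_randomize(profile):
    """Return a copy of `profile` that also allows personality(ADDR_NO_RANDOMIZE)."""
    updated = copy.deepcopy(profile)
    updated.setdefault("syscalls", []).append(
        {
            "names": ["personality"],
            "action": "SCMP_ACT_ALLOW",
            # Only allow the call when argument 0 equals ADDR_NO_RANDOMIZE.
            "args": [
                {"index": 0, "value": ADDR_NO_RANDOMIZE, "op": "SCMP_CMP_EQ"}
            ],
        }
    )
    return updated


# Placeholder base profile for illustration only; a real one would be the
# default profile with this single rule appended.
base = {"defaultAction": "SCMP_ACT_ERRNO", "syscalls": []}
print(json.dumps(allow_no_randomize(base), indent=2))
```

The resulting JSON would be what gets passed via --security-opt seccomp=FILE.json. Where that option would be set for these workers is still the open question.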

> If we did want to do the work to support it, I would not recommend the big hammer of running the whole container in privileged mode (--privileged). It looks like you can alter the seccomp profile to allow only personality(ADDR_NO_RANDOMIZE).

Oh nice, I didn't know this was possible! Though to support this, relops would probably need to generate a new image and they are pretty swamped at the moment, so my preference would be to remove setarch for now (and file a bug to investigate adding it back at some point in the future).

I can write the patch (probably sometime next week), that way I can test whether there are any other issues running on GCP at the same time.

See Also: → 1795718
Assignee: nobody → ahal
Status: NEW → ASSIGNED
Pushed by ahalberstadt@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/586b7867bbfe
Stop disabling ASLR in spidermonkey builds from automation, r=sfink
https://hg.mozilla.org/integration/autoland/rev/4d47085454d8
Migrate spidermonkey tasks from AWS -> GCP, r=MasterWayZ
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → 108 Branch

(In reply to Andrew Halberstadt [:ahal] from comment #4)

> Oh right duh, probably can't find that within the container.
>
> Dave, do you know what versions of docker are installed in the following images?
>
>   • docker-worker-v20220907-hvm-builder-trusted-bug1789506
>   • monopacker-docker-worker-trusted-current-gcp-2022-08-12b

You're right. The gcp d-w host is running a newer docker version, 19.03.13:

  • docker-worker-v20220907-hvm-builder-trusted-bug1789506

    docker-worker-v20220907-hvm-builder-trusted-bug1789506:
      aws:
        us-east-1: ami-06ff4e113eb9b82f1

    $ docker version
    Client:
     Version:           18.06.3-ce
     API version:       1.38
     Go version:        go1.10.3
     Git commit:        d7080c1
     Built:             Wed Feb 20 02:27:13 2019
     OS/Arch:           linux/amd64
     Experimental:      false

  • monopacker-docker-worker-trusted-current-gcp-2022-08-12b

    monopacker-docker-worker-trusted-gcp-2022-08-12b:
      fxci-level3-gcp: projects/taskcluster-imaging/global/images/docker-worker-gcp-trusted-u1804-2022-08-12b

    $ docker version
    Client: Docker Engine - Community
     Version:           19.03.13
     API version:       1.40
     Go version:        go1.13.15
     Git commit:        4484c46d9d
     Built:             Wed Sep 16 17:02:36 2020
     OS/Arch:           linux/amd64
     Experimental:      false
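
If this comparison needs repeating across more images, a small helper along these lines could pull the client version out of pasted `docker version` output. It is only a sketch: the function name `client_version` is made up, and the sample strings are trimmed copies of the two outputs above.

```python
import re


def client_version(output):
    """Extract the first 'Version:' field from `docker version` output as a tuple."""
    match = re.search(r"^\s*Version:\s*(\d+)\.(\d+)\.(\d+)", output, re.MULTILINE)
    if match is None:
        raise ValueError("no Version line found")
    return tuple(int(part) for part in match.groups())


# Trimmed from the two outputs pasted above.
aws_output = "Client:\n Version:           18.06.3-ce\n API version:       1.38\n"
gcp_output = (
    "Client: Docker Engine - Community\n"
    " Version:           19.03.13\n"
    " API version:       1.40\n"
)

print(client_version(aws_output), client_version(gcp_output))
```

Comparing the tuples (rather than the raw strings) keeps a suffix like "-ce" from affecting the ordering.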
Flags: needinfo?(dhouse)

Comment on attachment 9298874 [details]
Bug 1795063 - Stop disabling ASLR in spidermonkey builds from automation, r?sfink

ESR Uplift Approval Request

  • If this is not a sec:{high,crit} bug, please state case for ESR consideration: This patch is needed in order for spidermonkey tasks to run in GCP. Without it, we'll need to continue running these in AWS on esr102.
  • User impact if declined: None
  • Fix Landed on Version: 108
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): This does not affect shippable builds.
Attachment #9298874 - Flags: approval-mozilla-esr102?

Comment on attachment 9298874 [details]
Bug 1795063 - Stop disabling ASLR in spidermonkey builds from automation, r?sfink

Approved for 102.7esr.

Attachment #9298874 - Flags: approval-mozilla-esr102? → approval-mozilla-esr102+