Closed Bug 1589703 Opened 6 years ago Closed 6 years ago

temporarily switch some dev GCP workers to using the CI Taskcluster instance

Categories

(Release Engineering :: General, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: mtabara, Assigned: mtabara)

References

Details

The dedicated Firefox-CI Taskcluster instance is coming soon, and before it goes live we want to ensure the GCP scriptworkers still work properly. They will be using a different Taskcluster ROOT URL and new client IDs/tokens.

The ideal scenario is that we take the opportunity to fix our naming in GCP as a whole and create two new pools. But since the go/no-go meeting is on Monday and we're under time pressure, let's validate the new instance with at least two workers.

Plan: switch the beetmover/bouncer workers for the DEV environment to use the new Firefox CI Taskcluster. What that means:
a) we can keep the same TC client IDs in the new Firefox CI instance
b) pass the new tokens to :oremj
c) override TASKCLUSTER_URL with the new value directly from init_worker.sh.

Upon completing these changes, we should be able to run a try push against the dev workers and test a staging release to validate these two workers.
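For context on why a different root URL matters: every service endpoint a worker calls is derived from it. A minimal sketch, assuming the URL conventions of the taskcluster-urls library (the legacy `https://taskcluster.net` deployment puts each service on its own subdomain, while newer deployments serve everything under `<rootUrl>/api/<service>/<version>/`); the helper `api_url` and the task ID `abc123` are hypothetical.

```python
# Sketch of how a Taskcluster root URL maps to service API endpoints,
# modelled on the conventions of the taskcluster-urls library.
# `api_url` is a hypothetical helper, not part of any real client.

LEGACY_ROOT_URL = "https://taskcluster.net"


def api_url(root_url: str, service: str, version: str, path: str) -> str:
    """Build the API URL for `service` under the given deployment root URL."""
    root_url = root_url.rstrip("/")
    path = path.lstrip("/")
    if root_url == LEGACY_ROOT_URL:
        # Legacy deployment: each service lives on its own subdomain.
        return f"https://{service}.taskcluster.net/{version}/{path}"
    # Newer deployments: every service hangs off <root>/api/<service>/<version>.
    return f"{root_url}/api/{service}/{version}/{path}"


if __name__ == "__main__":
    for root in (LEGACY_ROOT_URL, "https://stage.taskcluster.nonprod.cloudops.mozgcp.net"):
        # Hypothetical task ID, just to show the shape of the URL.
        print(api_url(root, "queue", "v1", "task/abc123"))
```

Hardcoded `https://taskcluster.net` defaults therefore have to change along with the root URL.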

Depends on: 1589760

(In reply to Mihai Tabara [:mtabara]⌚️GMT from comment #0)

> The dedicated Firefox-CI Taskcluster instance is coming soon, and before it goes live we want to ensure the GCP scriptworkers still work properly. They will be using a different Taskcluster ROOT URL and new client IDs/tokens.
>
> The ideal scenario is that we take the opportunity to fix our naming in GCP as a whole and create two new pools. But since the go/no-go meeting is on Monday and we're under time pressure, let's validate the new instance with at least two workers.
>
> Plan: switch the beetmover/bouncer workers for the DEV environment to use the new Firefox CI Taskcluster. What that means:
> a) we can keep the same TC client IDs in the new Firefox CI instance

Done, I've created TC clients matching the ones we currently have in Taskcluster.

> b) pass the new tokens to :oremj

Done.

> c) override TASKCLUSTER_URL with the new value directly from init_worker.sh.

I have a PR prepped to push to the Docker Hub registry as soon as the new credentials are in place.

> Upon completing these changes, we should be able to run a try push against the dev workers and test a staging release to validate these two workers.

Pending the workers being updated first.

Status update: @oremj updated the credentials on the CloudOps side, and I pushed

diff --git a/docker.d/init.sh b/docker.d/init.sh
index 6e3812d..46a40c1 100755
--- a/docker.d/init.sh
+++ b/docker.d/init.sh
@@ -22,6 +22,8 @@ if [ "$ENV" == "prod" ]; then
   test_var_set 'ED25519_PRIVKEY'
 fi
 
+# XXX: bug 1589703 temp hack to test the new Firefox CI cluster
+export TASKCLUSTER_ROOT_URL=https://stage.taskcluster.nonprod.cloudops.mozgcp.net

to our `dev-{signingscript, beetmoverscript, balrogscript, bouncerscript}`.

Tweaking TASKCLUSTER_ROOT_URL in docker.d/init.sh has no effect, I presume, since that env var is not consumed. However, I found some interesting bits in scriptworker itself that were worth fixing; I pushed the fixes under https://github.com/MihaiTabara/scriptworker/tree/bug1589703, mainly:

mtabara@mozspace:[bug1589703]~/work/mozilla/clones/git/scriptworker$ git diff mozilla/master
diff --git a/scriptworker/constants.py b/scriptworker/constants.py
index b507f09..891517f 100644
--- a/scriptworker/constants.py
+++ b/scriptworker/constants.py
@@ -30,7 +30,7 @@ STATUSES = {
 # When adding new complex config, make sure all `list`s are `tuple`s, and all
 # `dict`s are `frozendict`s!  (This should get caught by config tests.)
 DEFAULT_CONFIG = frozendict({
-    "taskcluster_root_url": "https://taskcluster.net",
+    "taskcluster_root_url": "https://stage.taskcluster.nonprod.cloudops.mozgcp.net",
     # Worker identification
     "provisioner_id": "test-dummy-provisioner",
     "worker_group": "test-dummy-workers",
diff --git a/scriptworker/cot/verify.py b/scriptworker/cot/verify.py
index 9d0eda4..7a2664d 100644
--- a/scriptworker/cot/verify.py
+++ b/scriptworker/cot/verify.py
@@ -2060,7 +2060,7 @@ async def _async_verify_cot_cmdln(opts, tmp):
         context.queue = context.queue or Queue(
             session=session,
             options={
-                'rootUrl': os.environ.get('TASKCLUSTER_ROOT_URL', 'https://taskcluster.net'),
+                'rootUrl': os.environ.get('TASKCLUSTER_ROOT_URL', 'https://stage.taskcluster.nonprod.cloudops.mozgcp.net'),
             },
         )
         context.task = await context.queue.task(opts.task_id)
@@ -2135,7 +2135,7 @@ async def _async_create_test_workdir(task_id, path, queue=None):
         context.queue = queue or context.queue or Queue(
             session=session,
             options={
-                'rootUrl': os.environ.get('TASKCLUSTER_ROOT_URL', 'https://taskcluster.net'),
+                'rootUrl': os.environ.get('TASKCLUSTER_ROOT_URL', 'https://stage.taskcluster.nonprod.cloudops.mozgcp.net'),
             },
         )
         context.task = await context.queue.task(task_id)

==> cut a new GitHub release based on this.
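Why the init.sh export alone presumably did nothing: judging from the diff above, the worker reads `taskcluster_root_url` from its config (defaults in `DEFAULT_CONFIG`), while only the command-line helpers in cot/verify.py read the `TASKCLUSTER_ROOT_URL` env var. A minimal sketch of those two lookup paths, assuming config-file values are overlaid on the defaults; `worker_root_url`, `cli_root_url`, and the trimmed `DEFAULT_CONFIG` are hypothetical stand-ins, not scriptworker's API.

```python
import os
from types import MappingProxyType

NEW_ROOT_URL = "https://stage.taskcluster.nonprod.cloudops.mozgcp.net"

# Hypothetical, trimmed stand-in for scriptworker's DEFAULT_CONFIG (the real
# one is a frozendict in scriptworker/constants.py with many more keys).
DEFAULT_CONFIG = MappingProxyType({"taskcluster_root_url": NEW_ROOT_URL})


def worker_root_url(config_overrides):
    """Worker path (assumed): config-file values overlay the defaults.

    This sketch does not consult environment variables, matching the
    presumption that exporting TASKCLUSTER_ROOT_URL in init.sh has no effect.
    """
    config = dict(DEFAULT_CONFIG)
    config.update(config_overrides)
    return config["taskcluster_root_url"]


def cli_root_url(environ=None):
    """CLI path, as in cot/verify.py: env var first, hardcoded default second."""
    environ = os.environ if environ is None else environ
    return environ.get("TASKCLUSTER_ROOT_URL", NEW_ROOT_URL)


if __name__ == "__main__":
    fake_env = {"TASKCLUSTER_ROOT_URL": "https://example.invalid"}
    print(worker_root_url({}))     # default from DEFAULT_CONFIG
    print(cli_root_url(fake_env))  # the env var wins for the CLI helpers
```

This split is why changing the hardcoded defaults in both places, rather than only exporting the env var, was needed for the dev workers.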

And then I tweaked the dev workers to point to that in https://github.com/MihaiTabara/scriptworker-scripts/tree/firefoxci

diff --git a/bouncerscript/requirements/base.in b/bouncerscript/requirements/base.in
index a791f64..d3d8491 100644
--- a/bouncerscript/requirements/base.in
+++ b/bouncerscript/requirements/base.in
@@ -1,2 +1,2 @@
-scriptworker
+https://github.com/MihaiTabara/scriptworker/archive/bug1589703.zip
 mozilla-version
diff --git a/bouncerscript/setup.py b/bouncerscript/setup.py
index 8eacb5a..9970501 100644
--- a/bouncerscript/setup.py
+++ b/bouncerscript/setup.py
@@ -7,10 +7,6 @@ project_dir = os.path.abspath(os.path.dirname(__file__))
 with open(os.path.join(project_dir, 'version.txt')) as f:
     version = f.read().rstrip()

-# We allow commented lines in this file
-with open(os.path.join(project_dir, 'requirements/base.in')) as f:
-    requirements = [line.rstrip('\n') for line in f if not line.startswith('#')]
-

 setup(
     name='bouncerscript',
@@ -31,7 +27,10 @@ setup(
         ],
     },
     license='MPL2',
-    install_requires=requirements,
+    install_requires=[
+        "scriptworker",
+        "mozilla-version",
+    ],
     classifiers=(
         'Programming Language :: Python :: 3.6',
         'Programming Language :: Python :: 3.7',
diff --git a/docker.d/init.sh b/docker.d/init.sh
index 6e3812d..46a40c1 100755
--- a/docker.d/init.sh
+++ b/docker.d/init.sh
@@ -22,6 +22,8 @@ if [ "$ENV" == "prod" ]; then
   test_var_set 'ED25519_PRIVKEY'
 fi

+# XXX: bug 1589703 temp hack to test the new Firefox CI cluster
+export TASKCLUSTER_ROOT_URL=https://stage.taskcluster.nonprod.cloudops.mozgcp.net
 #
 # Validate content of certain variables
 #

Note to self:

  • I've regenerated the deps via pip-compile-multi --upgrade --generate-hashes base --generate-hashes test
  • the setup.py change was needed so that tox passes without complaining about install_requires not containing valid requirement strings (the archive URL read from requirements/base.in is not one). That works because tox first installs the requirements from requirements/test.txt and then executes setup.py in that environment, so the plain "scriptworker" entry is already satisfied by the fork installed from the archive URL.
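To illustrate the tox note above: the removed setup.py code passed every non-comment line of base.in straight into install_requires, so the archive URL ended up there. The check below is a simplified approximation of a PEP 508 requirement string, not a full parser; `read_requirements` and `invalid_requirements` are hypothetical helpers showing which entry setuptools would object to.

```python
import re

# Simplified approximation of a PEP 508 requirement: a project name, optional
# extras, then optionally a version specifier, marker, or "@ url" reference.
# This is an illustration only, not a complete PEP 508 grammar.
_REQ_RE = re.compile(
    r"^[A-Za-z0-9](?:[A-Za-z0-9._-]*[A-Za-z0-9])?"  # project name
    r"\s*(?:\[[^\]]*\])?"                          # optional extras
    r"\s*(?:[<>=!~;@].*)?$"                        # optional specifier/marker/URL
)


def read_requirements(lines):
    """Mirror the removed setup.py logic: keep every non-comment line."""
    return [line.rstrip("\n") for line in lines if not line.startswith("#")]


def invalid_requirements(requirements):
    """Return the entries that the simplified check above rejects."""
    return [req for req in requirements if not _REQ_RE.match(req)]


# Contents of bouncerscript/requirements/base.in after the change in this bug.
base_in = [
    "https://github.com/MihaiTabara/scriptworker/archive/bug1589703.zip\n",
    "mozilla-version\n",
]
print(invalid_requirements(read_requirements(base_in)))
```

PEP 508 also permits a direct reference of the form `name @ url`, which this pattern accepts; the approach taken here instead hardcodes the package names and relies on tox having already installed the fork from requirements/test.txt.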

GCP dev workers bouncerscript and beetmoverscript are now talking to the new Firefox CI instance:

Still debugging balrogscript and signingscript, as we're currently hitting some auth issues.

(In reply to Mihai Tabara [:mtabara]⌚️GMT from comment #5)

> GCP dev workers bouncerscript and beetmoverscript are now talking to the new Firefox CI instance:
>
> Still debugging balrogscript and signingscript, as we're currently hitting some auth issues.

Signingscript is now working too.

The balrogscript TC client seems to have been rotated again after three days, unlike the others, so I recreated it and passed the new credentials to Jeremy. Once he updates their side, we should see it running jobs successfully.

Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Assignee: nobody → mtabara