Closed Bug 1716392 Opened 4 years ago Closed 4 years ago

Automate biweekly runs of passwordmgr-related-realms-updater

Categories

(Cloud Services :: General, task)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: brian, Assigned: cvalaas)

Attachments

(1 file)

The team behind the https://github.com/mozilla/passwordmgr-remote-settings-updater tool needs it to run once biweekly.

There is a docker image for it available at https://hub.docker.com/repository/docker/mozilla/passwordmgr-related-realms-updater

This could be implemented similarly to the existing https://github.com/mozilla-services/cloudops-infra/tree/master/projects/ccadb2onecrl job.

Previous work on this was tracked in https://jira.mozilla.com/browse/SE-1735

Sven is handing this off to Chris, who should be able to get help from teammates on Services SRE for setting this up. In particular, I'm cc'ing Wei, since I know he can help with provisioning credentials and setting permissions in Remote Settings.

I already provisioned credentials for this, but the permissions need to be updated, since the scope has been expanded. The username is related-realms-publisher, and the password is in hiera-sops/app/kinto.{stage,prod}.yaml. I've filed https://github.com/mozilla-services/remote-settings-permissions/pull/235 to expand the permissions of the bot user to both the collections the script is supposed to update. (Originally we planned to use that bot user only for one of the collections, hence the name.)

The Jenkins job should be pretty similar to the ccadb2onecrl one that Brian linked above. One difference is that we want to run the Docker image twice in this case, once for stage and once for prod, so there will be an additional stage in the Jenkinsfile.

One more thing – the ticket description says "biweekly runs". In my opinion, we could make that daily, or maybe Monday to Friday. The script checks whether there were any changes, and only updates the data if actually needed. I don't see any advantage in running this only every two weeks.

Tim, I think you requested to run this every fortnight. Do you have any concerns about running this daily instead?
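To illustrate the "only update when something changed" behaviour mentioned above, here is a minimal Python sketch. It is not taken from the updater script; the function name `needs_update` and the record shape are hypothetical, and the real script may compare its data differently.

```python
import json


def needs_update(current_records: list[dict], new_records: list[dict]) -> bool:
    """Return True when the new data differs from what is already published.

    Hypothetical helper: the real updater script may use a different check.
    """
    def canonical(records: list[dict]) -> list[str]:
        # Serialize each record with sorted keys so key order is ignored,
        # then sort the serialized records so list order is ignored too.
        return sorted(json.dumps(record, sort_keys=True) for record in records)

    return canonical(current_records) != canonical(new_records)


if __name__ == "__main__":
    published = [{"id": "a", "domains": ["example.com", "example.org"]}]
    fetched = [{"domains": ["example.com", "example.org"], "id": "a"}]
    # Same content in a different key order: no update should be needed.
    print(needs_update(published, fetched))
```

With a check like this, a daily run that finds no differences would not touch the collection, which is why a more frequent schedule adds little load.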

Sven, my only concern about running this job daily is that it would force Dimi or me to review the collection on the day the review request comes in. I haven't tested how the script behaves when a review request is already outstanding on Remote Settings, so I'm not sure whether Remote Settings would create a duplicate request or update the pending one.

Other than that, I don't have any concerns about running this job daily. Let me know if you need any other information from me!

Hi folkx!

What are the values for FX_REMOTE_SETTINGS_WRITER_SERVER[1] for stage and prod? And are they secrets?

thanks!

[1] https://github.com/mozilla/passwordmgr-remote-settings-updater/blob/main/README.md

Flags: needinfo?(tgiles)

Hey :cvalaas, FX_REMOTE_SETTINGS_WRITER_SERVER should be "https://settings-writer.prod.mozaws.net/v1" for prod, and "https://settings-writer.stage.mozaws.net/v1" for stage. These are not secrets as they are seen in the Remote Settings documentation.

Let me know if you need more information from me, thanks!
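For reference, a rough Python sketch of what "run the image once for stage and once for prod" could look like as two `docker run` invocations. The image name and tag are the ones that appear in the Jenkins log in this bug, and the server URLs are the ones quoted above. The helper name `docker_run_command` is hypothetical, and the credentials are deliberately left out because the variable names the script expects for them are not given here.

```python
import shlex

# Image name and tag as they appear in the Jenkins log in this bug.
IMAGE = "mozilla/passwordmgr-related-realms-updater:v0.0.1"

# Writer endpoints quoted in this bug; these are not secrets.
SERVERS = {
    "stage": "https://settings-writer.stage.mozaws.net/v1",
    "prod": "https://settings-writer.prod.mozaws.net/v1",
}


def docker_run_command(env_name: str) -> list[str]:
    """Build the argv for one run of the updater against one environment.

    Hypothetical helper; credentials would have to be passed in addition.
    """
    server = SERVERS[env_name]
    return [
        "docker", "run", "--rm",
        "-e", f"FX_REMOTE_SETTINGS_WRITER_SERVER={server}",
        IMAGE,
    ]


if __name__ == "__main__":
    # Stage first, then prod, mirroring the two stages in the Jenkinsfile.
    for env_name in ("stage", "prod"):
        print(shlex.join(docker_run_command(env_name)))
```

The sketch only prints the commands rather than executing them, so it can be checked without Docker installed.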

Flags: needinfo?(tgiles)
Blocks: 1686071

Hi :cvalaas, do you think this work will be done by the end of the week? I didn't realize that QA was starting their testing next week and I need to have this data in Remote Settings so that they can test Bug 1686071. Just trying to keep my QA contact in the loop!

Thanks for the help!

Flags: needinfo?(cvalaas)

:tgiles possibly!

Unfortunately, I don't know what I don't know (and this is my first time through this process), but seems doable!

Sven mentioned you in the PR (https://github.com/mozilla-services/cloudops-infra/pull/3214) I made for this:

It looks like the Docker image still has the name passwordmgr-related-realms-updater. It would probably make sense to coordinate with @TGiles to get that renamed to be in line with the project name here.

Are you able to rename the docker image to passwordmgr-remote-settings-updater ?

Flags: needinfo?(cvalaas) → needinfo?(tgiles)

:cvalaas, thanks for the transparency, I appreciate it! :)

I'm not able to see the actual PR unfortunately, so thank you for quoting the relevant information! I don't know why I don't have access to that org, but I probably don't need access in the long run.

I'm looking into renaming the docker image and will keep you posted. I'm surprised the circleCI configuration doesn't handle this naming, I know I didn't have to specify a name before.

Keeping NI open so I don't forget to follow up

I can't see this project (https://github.com/mozilla/passwordmgr-remote-settings-updater) in Circle-CI, so maybe it's controlled in that UI somewhere?

Looks like it is set in the Circle CI UI (the DOCKERHUB_REPO variable): https://app.circleci.com/settings/project/github/mozilla/passwordmgr-remote-settings-updater/environment-variables

I can try changing that and see what happens, although I don't know if any other steps are needed (like on the Dockerhub side).

Wish I was more help here, but I'm not too familiar with our Docker and CircleCI setup; I leaned on Sven's help getting that part of the repository set up. Hopefully changing the DOCKERHUB_REPO variable and rebuilding will be all that needs to happen, fingers crossed at least hah

Flags: needinfo?(tgiles)

like on the Dockerhub side

Yes, you'd have to create a new repo with the new name on the dockerhub side and grant the appropriate permissions. So good to learn how to do that, but possibly a bad rabbit hole to go down right now if the initial run of this is time-sensitive.

Yeah, the initial run is relatively time-sensitive. I need to make sure I have the data that this job generates by end of day Friday, so I can review and merge it into Remote Settings so QA isn't crunched when testing this next week.

Please let me know if there's anything else I can do to help out!

I've set up the Jenkins job and done a few test runs. After solving some issues of my own making, I've hit this error:

[...]
2021-06-22 22:06:37,216 INFO Running: docker pull mozilla/passwordmgr-related-realms-updater:v0.0.1
2021-06-22 22:06:43,092 INFO Running: docker create --name 54f81ab5-59b6-4185-8eac-b236578bb773 mozilla/passwordmgr-related-realms-updater:v0.0.1
2021-06-22 22:06:43,915 INFO Running: docker cp 54f81ab5-59b6-4185-8eac-b236578bb773:/app/version.json - | tar xO
2021-06-22 22:06:43,968 ERROR Error: No such container:path: 54f81ab5-59b6-4185-8eac-b236578bb773:/app/version.json
tar: This does not look like a tar archive
tar: Exiting with failure status due to previous errors

2021-06-22 22:06:43,969 INFO Running: docker rm 54f81ab5-59b6-4185-8eac-b236578bb773
[...]

Obviously it's looking for a version.json file which doesn't exist in the image.
I see that the Circle CI config (https://github.com/mozilla/passwordmgr-remote-settings-updater/blob/507a47e889fe8dd4a356cda3e08aec15774b65ce/.circleci/config.yml#L23) is supposed to create that file, so I'm guessing that maybe the Dockerfile (https://github.com/mozilla/passwordmgr-remote-settings-updater/blob/main/Dockerfile) needs a COPY ./version.json /app/version.json line in it?

Flags: needinfo?(sven)

I think you are right that the file needs to be explicitly copied, and we should also provide a placeholder version.json file inside the repo so the image can be built locally. However, the error message looks like something is trying to extract version.json as a tar archive. I don't understand why this is happening, and it will fail even when the file exists.

Flags: needinfo?(sven)

I've gone ahead and created a quick PR for adding the COPY step and the version.json. Didn't want to push to main without making sure this is what needs to happen. Docker on my Windows machine is acting up, so I can't verify the changes right now. I'm going to get Docker set up on my Mac and see if I can verify some of the changes at least.

Just looked at the docker cp help. It extracts the files as a tar archive, so that explains the piping thru tar.
I think we can push the PR to main, do another release, and try Jenkins again.
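To make the tar point concrete: when `docker cp` writes to `-`, the file arrives wrapped in a tar stream, and `tar xO` unwraps it to stdout. Below is a small Python sketch of the same unwrapping step using the standard `tarfile` module. It does not call Docker; `make_tar_stream` just builds a stand-in archive in memory, and both function names and the placeholder JSON content are made up for illustration.

```python
import io
import tarfile


def make_tar_stream(name: str, payload: bytes) -> bytes:
    """Build a one-file tar archive, standing in for `docker cp ... -` output."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        info = tarfile.TarInfo(name=name)
        info.size = len(payload)
        archive.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


def read_member_from_tar_stream(stream: bytes, name: str) -> bytes:
    """Return the contents of one file from a tar archive held in memory."""
    with tarfile.open(fileobj=io.BytesIO(stream), mode="r:") as archive:
        member = archive.extractfile(name)
        if member is None:
            # extractfile() returns None for members that are not regular files.
            raise FileNotFoundError(name)
        return member.read()


if __name__ == "__main__":
    stream = make_tar_stream("version.json", b'{"version": "placeholder"}')
    print(read_member_from_tar_stream(stream, "version.json").decode())
```

In the failing run there was no tar stream at all, because the file was missing from the image, which is why tar complained that its input did not look like an archive.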

Jenkins was able to get the image running. The command (node /app/update-script.js) ran for about 10 minutes before exiting (this was running against stage). No output was seen.

Part of the Jenkins logs:

[...]
$ docker top 27960c46c463e76e23276b85f244489f45623d47905b6fb9677544ac69e37e6c -eo pid,comm
ERROR: The container started but didn't run the expected command. Please double check your ENTRYPOINT does execute the command passed as docker run argument, as required by official docker images (see https://github.com/docker-library/official-images#consistency for entrypoint consistency requirements).
Alternatively you can force image entrypoint to be disabled by adding option --entrypoint=''.
[...]
[Pipeline] sh

+ node /app/update-script.js

wrapper script does not seem to be touching the log file in /home/jenkins/slave/workspace/pipelines/utils/passwordmgr-remote-settings-updater@tmp/durable-37614c23
(JENKINS-48300: if on an extremely laggy filesystem, consider -Dorg.jenkinsci.plugins.durabletask.BourneShellScript.HEARTBEAT_CHECK_INTERVAL=86400)
$ docker stop --time=1 27960c46c463e76e23276b85f244489f45623d47905b6fb9677544ac69e37e6c
[...]

Should there be output? How long should the script take?

Flags: needinfo?(tgiles)

Strange. Thanks for the update. Yeah there should be console.log output during the process...and it shouldn't take 10 minutes to run, maybe a minute or so. I'm debugging the script now and will keep you posted!

Flags: needinfo?(tgiles)

Found out Jenkins doesn't like ENTRYPOINTs. So after some quick changes we got this working.
It's currently set to run on the 1st and the 15th of each month at 17:00 (UTC, I'd presume).
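A side note on that schedule: in standard five-field cron syntax it would be written something like `0 17 1,15 * *` (the actual expression used in the job is not quoted in this bug). Running on the 1st and 15th is close to, but not exactly, every two weeks. The Python sketch below lists the scheduled dates and the gaps between them; the function name `scheduled_dates` is hypothetical.

```python
from datetime import date, timedelta

RUN_DAYS = (1, 15)  # days of the month on which the job is scheduled


def scheduled_dates(start: date, count: int) -> list[date]:
    """Return the next `count` scheduled dates on or after `start`."""
    found: list[date] = []
    day = start
    while len(found) < count:
        if day.day in RUN_DAYS:
            found.append(day)
        day += timedelta(days=1)
    return found


if __name__ == "__main__":
    runs = scheduled_dates(date(2021, 7, 1), 4)
    # Number of days between consecutive runs.
    gaps = [(later - earlier).days for earlier, later in zip(runs, runs[1:])]
    print(runs)
    print(gaps)
```

The gap from the 1st to the 15th is always 14 days, while the gap from the 15th to the next 1st depends on the length of the month.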

I think this can be closed?

Sounds good to me. Thanks for all the help Chris!

Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED

Hey :cvalaas, are you able to see when this job runs? I haven't seen any updates from the "passwordmgr-related-realms-updater" account since we initially resolved this. I'm not sure if there's an issue in the update script or an environment issue or what, but I think a good first step is being able to determine if the Jenkins job is running as expected.

Flags: needinfo?(cvalaas)

Hello, I'm out for a month or so(?) probably, so I'm CC'ing :thealy to get this assigned to someone else.

Flags: needinfo?(cvalaas) → needinfo?(thealy)

I'll probably be taking ownership of the job soon, so I took a quick look. The job is running every two weeks, but it's failing every time. It should notify cvalaas in Slack when it fails, but it looks like the Jenkins/Slack integration is broken as well.

The reason for the failure appears to be the branch configuration in Jenkins. The job was configured to use the branch /refs/heads/master of cloudops-infra. I changed the branch name to refs/heads/master, without the leading slash, and now it seems to be working fine.
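As an illustration of the fix, here is a tiny Python sketch that normalizes a branch ref the way I did by hand and checks that the result is a fully qualified branch ref. This is not part of the Jenkins configuration; both function names are hypothetical.

```python
def normalize_ref(ref: str) -> str:
    """Strip whitespace and leading slashes: '/refs/heads/master' -> 'refs/heads/master'."""
    return ref.strip().lstrip("/")


def is_well_formed_branch_ref(ref: str) -> bool:
    """Return True if `ref` is a fully qualified branch ref with no leading slash."""
    prefix = "refs/heads/"
    return ref.startswith(prefix) and len(ref) > len(prefix)


if __name__ == "__main__":
    configured = "/refs/heads/master"
    print(is_well_formed_branch_ref(configured))
    print(normalize_ref(configured))
    print(is_well_formed_branch_ref(normalize_ref(configured)))
```

A check like this could run before the job is saved, so that a stray leading slash is caught before a scheduled run fails.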

I'll try to get the Slack notifications fixed as well.

Tom, I think this is fixed for the time being. I'll be in touch about taking ownership of the Jenkins job.

Flags: needinfo?(thealy)

The job seems to be working fine from my end, just to confirm. I received the review requests from the automated "passwordmgr-related-realms-updater" account, guess we'll see in two weeks if the job is green or not. Thanks all for the help!

Attached image Jenkins runs

For what it's worth, here is a history of the Jenkins runs of the job. Runs #8 and #9 were successful, but they were manually triggered and did not run from the master branch of the cloudops-infra repo. All scheduled runs after that failed because of the branch misconfiguration. Today, the manually triggered runs succeeded (I accidentally triggered the job twice).
