Checkout is the #1 reason for taskcluster failures

RESOLVED FIXED in 3.27

Status

NSS
Test
RESOLVED FIXED
2 years ago
a year ago

People

(Reporter: mt, Assigned: mt)

Tracking

(Depends on: 1 bug)

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(2 attachments, 2 obsolete attachments)

(Assignee)

Description

2 years ago
Created attachment 8780357
retry_checkout-1.patch

Maybe this would be better if the checkout was retried first. Maybe then we can reduce the number of automatic retries.
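The idea here is to retry the checkout inside the task before relying on taskcluster's automatic task retries. The patch itself isn't reproduced on this page, so what follows is only a minimal bash sketch under stated assumptions. The `retry` and `hg_clone_once` helpers are hypothetical, and the 5-attempt / 10-second policy is arbitrary. The name `hg_clone` matches the call that appears in the build.sh review further down, but this body is an assumption and not the contents of retry_checkout-1.patch.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: not the actual NSS checkout.sh code.
set -u

# retry ATTEMPTS DELAY CMD [ARGS...]
# Runs CMD until it succeeds, giving up after ATTEMPTS tries.
# Sleeps DELAY seconds between tries.
retry() {
  local attempts="$1"; shift
  local delay="$1"; shift
  local n=1
  until "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      echo "retry: '$*' failed after $n attempts" >&2
      return 1
    fi
    echo "retry: attempt $n of '$*' failed; retrying in ${delay}s" >&2
    sleep "$delay"
    n=$((n + 1))
  done
  return 0
}

# Hypothetical single clone attempt. A failed attempt may leave a partial
# checkout behind, and hg clone refuses a non-empty destination, so clear
# the target directory first.
hg_clone_once() {
  rm -rf "$2"
  hg clone "$1" "$2"
}

# Hypothetical hg_clone REPO DIR: up to 5 attempts, 10 seconds apart.
hg_clone() {
  retry 5 10 hg_clone_once "$1" "$2"
}
```

With a helper like this, a call such as `hg_clone https://hg.mozilla.org/projects/nspr nspr` absorbs transient network failures in the checkout step. Only persistent failures would then fail the task and trigger taskcluster's own retry, which is what would make it possible to lower the automatic retry count.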

Tim, I imagine that we would need to rebuild the docker images with this, so no real rush.
Attachment #8780357 - Flags: review?(ttaubert)
(In reply to Martin Thomson [:mt:] from comment #0)
> Maybe this would be better if the checkout was retried first. Maybe then we
> can reduce the number of automatic retries.

That's one approach, we probably should do that anyway.

One additional thing we should do is fix bug 1277203: on m-c, most of the task runners keep a local cache of the repository and share it with the docker container. That would reduce checkout times and likely reduce the number of failures.

> Tim, I imagine that we would need to rebuild the docker images with this, so
> no real rush.

Unfortunately, yeah. I'd love to convert our CI to build images automatically (like m-c does) so everyone can update the docker image. And we could test changes to docker images on try too. I'll probably morph bug 1275501 to deal with this.
Attachment #8780357 - Flags: review?(ttaubert) → review+
Franziskus can help with rebuilding the ARM docker image; I don't have access to the RPis from home, unfortunately.
Depends on: 1277203
Pushed a new docker image v0.0.22 with the checkout.sh changes:

https://hub.docker.com/r/ttaubert/nss-ci/tags/

You also need to update the docker image version here:

http://searchfox.org/nss/source/automation/taskcluster/decision_task.yml#60
http://searchfox.org/nss/source/automation/taskcluster/decision_task.yml#67

With a try run I think this should be good to go :)
(Assignee)

Comment 4

a year ago
Created attachment 8780950
retry_checkout-1.patch

Carry r+

Try: https://treeherder.mozilla.org/#/jobs?repo=nss-try&revision=d79fe0683124
Attachment #8780357 - Attachment is obsolete: true
Attachment #8780950 - Flags: review+
(Assignee)

Comment 5

a year ago
https://hg.mozilla.org/projects/nss/rev/fb22fa026a30
Assignee: nobody → martin.thomson
Status: NEW → RESOLVED
Last Resolved: a year ago
Resolution: --- → FIXED
Target Milestone: --- → 3.27
(Assignee)

Comment 6

a year ago
I'm seeing this problem happen again on Windows check-ins.  Those don't use the same setup, do they?  Is there anything we can do to fix those?
Flags: needinfo?(ttaubert)
Cloning NSPR should probably be done in the same way (scripts/build.sh and windows/build.sh). On Windows, the NSS checkout is currently done in _build_base.yml; this should probably be moved to a script so that it can be retried.
Status: RESOLVED → REOPENED
Flags: needinfo?(ttaubert)
Resolution: FIXED → ---
(Assignee)

Comment 8

a year ago
Created attachment 8781348
bug1294604-1.patch

This should help with NSPR checkout.

I'm not sure how the Windows builds work: they don't appear to use docker images, so when does _build_base.yml get read? How would I ensure that the script is present so that it can be run? Are the commands run with cmd.exe?
Attachment #8781348 - Flags: review?(franziskuskiefer)
Comment on attachment 8781348
bug1294604-1.patch

Review of attachment 8781348:
-----------------------------------------------------------------

Windows doesn't use docker but a generic worker on normal Windows machines with mozilla-build (afaik). I'm not sure how to deal with that, but we could try to check out NSS on Windows in the same way. We should be in the mozilla-build env when we execute the |command| from _build_base.yml. (We don't have the repo there yet, so we'd have to rewrite the loop.)

::: automation/taskcluster/scripts/build.sh
@@ +12,5 @@
>      exec su worker $0
>  fi
>  
>  # Clone NSPR if needed.
> +hg_clone https://hg.mozilla.org/projects/nspr nspr

we should do the same on Windows [1]

[1] http://searchfox.org/nss/rev/0557da6ac1ddfa0a62bf9a1489e484b0c80ba9b8/automation/taskcluster/windows/build.sh#9
Attachment #8781348 - Flags: review?(franziskuskiefer) → review+
(Assignee)

Comment 10

a year ago
Created attachment 8781854
bug1294604-1.patch

Let's try this out.

Try: https://treeherder.mozilla.org/#/jobs?repo=nss-try&revision=2a920f1ae9f4ac4fe6bf80077c44fa2077560ee6

The first few attempts at this ran afoul of strange Windows cmd.exe quoting rules. The commands are run with cmd.exe, not PowerShell and not bash, so single quotes are out.
Attachment #8781348 - Attachment is obsolete: true
Attachment #8781854 - Flags: review?(franziskuskiefer)
Comment on attachment 8781854
bug1294604-1.patch

Review of attachment 8781854:
-----------------------------------------------------------------

lgtm, try run is looking good as well.
Attachment #8781854 - Flags: review?(franziskuskiefer) → review+
(Assignee)

Comment 12

a year ago
https://hg.mozilla.org/projects/nss/rev/cc982d5a9904
Status: REOPENED → RESOLVED
Last Resolved: a year ago
Resolution: --- → FIXED