Bug 1294604 (closed). Opened 8 years ago, closed 8 years ago.

Checkout is the #1 reason for taskcluster failures

Product/Component: NSS :: Test
Type: defect
Priority: Not set
Severity: normal
Tracking: Not tracked
Status: RESOLVED FIXED
Reporter: mt
Assigned: mt
References: Depends on 1 open bug
Attachments: 2 files, 2 obsolete files

Attached patch retry_checkout-1.patch (obsolete)
Maybe this would be better if the checkout was retried first. Maybe then we can reduce the number of automatic retries.

Tim, I imagine that we would need to rebuild the docker images with this, so no real rush.
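
For illustration, the retry idea could be sketched as a small shell wrapper. This is a minimal sketch and not the contents of retry_checkout-1.patch: the function name `retry` and the `RETRY_DELAY` variable are hypothetical, chosen only for this example.

```shell
#!/bin/sh
# Hypothetical helper (not from the patch): run a command up to N times.
# Usage: retry <attempts> <command> [args...]
# RETRY_DELAY is an assumed variable: seconds to wait between attempts.
retry() {
    local attempts="$1"
    shift
    local i=1
    while [ "$i" -le "$attempts" ]; do
        # Stop as soon as the command succeeds.
        if "$@"; then
            return 0
        fi
        echo "attempt $i of $attempts failed: $*" >&2
        i=$((i + 1))
        # Wait before the next attempt, but not after the last one.
        if [ "$i" -le "$attempts" ]; then
            sleep "${RETRY_DELAY:-10}"
        fi
    done
    return 1
}
```

A checkout step could then call something like `retry 3 hg clone <repo> <dir>`. A real script would probably also need to remove any partial checkout before retrying, since hg refuses to clone into a non-empty directory.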
Attachment #8780357 - Flags: review?(ttaubert)
(In reply to Martin Thomson [:mt:] from comment #0)
> Maybe this would be better if the checkout was retried first. Maybe then we
> can reduce the number of automatic retries.

That's one approach, and we should probably do that anyway.

We should also fix bug 1277203. On m-c, most of the task runners keep a local cache of the repository and share it with the docker container. Doing the same here would reduce checkout times and likely the number of failures.

> Tim, I imagine that we would need to rebuild the docker images with this, so
> no real rush.

Unfortunately, yeah. I'd love to convert our CI to build images automatically (like m-c does) so everyone can update the docker image. And we could test changes to docker images on try too. I'll probably morph bug 1275501 to deal with this.
Attachment #8780357 - Flags: review?(ttaubert) → review+
Franziskus can help with rebuilding the ARM docker image; unfortunately, I don't have access to the RPis from home.
Pushed a new docker image v0.0.22 with the checkout.sh changes:

https://hub.docker.com/r/ttaubert/nss-ci/tags/

You also need to update the docker image version here:

http://searchfox.org/nss/source/automation/taskcluster/decision_task.yml#60
http://searchfox.org/nss/source/automation/taskcluster/decision_task.yml#67

With a try run I think this should be good to go :)
https://hg.mozilla.org/projects/nss/rev/fb22fa026a30
Assignee: nobody → martin.thomson
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → 3.27
I'm seeing this problem happen again on Windows check-ins.  Those don't use the same setup, do they?  Is there anything we can do to fix those?
Flags: needinfo?(ttaubert)
Cloning NSPR should probably be done in the same way (scripts/build.sh and windows/build.sh). On Windows the NSS checkout is currently done in _build_base.yml. This should probably be moved to a script so that it can be retried.
Status: RESOLVED → REOPENED
Flags: needinfo?(ttaubert)
Resolution: FIXED → ---
Attached patch bug1294604-1.patch (obsolete)
This should help with NSPR checkout.

I'm not sure how the Windows tasks work: they don't appear to use docker images, so when does _build_base.yml get read? How would I ensure that the script is present so that it can be run? Are the commands run with cmd.exe?
Attachment #8781348 - Flags: review?(franziskuskiefer)
Comment on attachment 8781348
bug1294604-1.patch

Review of attachment 8781348:
-----------------------------------------------------------------

Windows doesn't use docker but a generic worker on normal windows machines with mozilla-build (afaik). I'm not sure how to deal with that. But we could try to check out NSS on Windows in the same way. We should be in the mozilla-build env when we execute the |command| from _build_base.yml. (We don't have the repo there yet so have to rewrite the loop.)

::: automation/taskcluster/scripts/build.sh
@@ +12,5 @@
>      exec su worker $0
>  fi
>  
>  # Clone NSPR if needed.
> +hg_clone https://hg.mozilla.org/projects/nspr nspr

we should do the same on Windows [1]

[1] http://searchfox.org/nss/rev/0557da6ac1ddfa0a62bf9a1489e484b0c80ba9b8/automation/taskcluster/windows/build.sh#9
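
For readers without the patch open, a helper like `hg_clone` might look roughly like the following. This is a sketch under assumptions, not the code that landed: only the name `hg_clone` and its argument order (repository URL, then destination) come from the diff above, while `HG_CLONE_TRIES`, `HG_CLONE_DELAY`, and the cleanup step are hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch of an hg_clone helper; the real one is in the patch.
# Usage: hg_clone <repository-url> <destination-directory>
# HG_CLONE_TRIES and HG_CLONE_DELAY are assumed names, not from the patch.
hg_clone() {
    local repo="$1"
    local dir="$2"
    local tries="${HG_CLONE_TRIES:-3}"
    local i=1
    while [ "$i" -le "$tries" ]; do
        # A failed clone can leave a partial directory behind; remove it
        # so the next attempt starts from a clean state.
        rm -rf "$dir"
        if hg clone "$repo" "$dir"; then
            return 0
        fi
        echo "hg clone of $repo failed (attempt $i of $tries)" >&2
        i=$((i + 1))
        # Pause before retrying, but not after the final attempt.
        if [ "$i" -le "$tries" ]; then
            sleep "${HG_CLONE_DELAY:-10}"
        fi
    done
    return 1
}
```

Note that this sketch deletes the destination directory before every attempt, so it only suits fresh checkouts and not a shared repository cache.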
Attachment #8781348 - Flags: review?(franziskuskiefer) → review+
Let's try this out.

Try: https://treeherder.mozilla.org/#/jobs?repo=nss-try&revision=2a920f1ae9f4ac4fe6bf80077c44fa2077560ee6

The first few attempts ran afoul of cmd.exe's strange quoting rules. The commands are run with cmd.exe, not PowerShell or bash, so single quotes are out.
Attachment #8781348 - Attachment is obsolete: true
Attachment #8781854 - Flags: review?(franziskuskiefer)
Comment on attachment 8781854
bug1294604-1.patch

Review of attachment 8781854:
-----------------------------------------------------------------

lgtm, try run is looking good as well.
Attachment #8781854 - Flags: review?(franziskuskiefer) → review+
https://hg.mozilla.org/projects/nss/rev/cc982d5a9904
Status: REOPENED → RESOLVED
Closed: 8 years ago → 8 years ago
Resolution: --- → FIXED