Closed
Bug 1497309
Opened 7 years ago
Closed 7 years ago
Please upgrade generic-worker on localprovisioner/nss-macos-10-12 to version 11.0.1
Categories
(NSS :: Build, defect)
NSS
Build
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: pmoore, Assigned: pmoore)
Details
It looks like the mac workers are currently running generic-worker 10.2.3:
https://tools.taskcluster.net/groups/V-l02xoETLKBx7O7RCsZNQ/tasks/MvhqwpY4Q024AHGm8CgZ8A/runs/0/logs/public%2Flogs%2Flive_backing.log#L12
This release is a little over a year old[1], so it would be great to upgrade to a more recent release, such as 11.0.1.
Is there a staging environment I could test this on?
Thanks!
Updated•7 years ago (Assignee)
Flags: needinfo?(franziskuskiefer)
Comment 1•7 years ago (Assignee)
Comment 2•7 years ago
No staging unfortunately. But traffic is low enough to do it on the existing machine (you should still have access) and nss-try. You could also use the second machine for testing if you want (it's not used right now).
Ping me if you need any help.
Flags: needinfo?(franziskuskiefer)
Comment 3•7 years ago (Assignee)
Thanks Franziskus.
Funnily enough, a Sentry crash report just came in for worker nss1-1/macosstadium on host administrators-Mac-mini-98.local, so an upgrade would probably be good.
https://sentry.prod.mozaws.net/operations/generic-worker/issues/4849688/events/24847726/json/
runtime error: invalid memory address or nil pointer dereference
/home/travis/go/src/runtime/panic.go in gopanic at line 489
/home/travis/gopath/src/github.com/taskcluster/generic-worker/sentry.go in func1 at line 36
/home/travis/gopath/src/github.com/getsentry/raven-go/client.go in CapturePanicAndWait at line 745
/home/travis/gopath/src/github.com/taskcluster/generic-worker/sentry.go in ReportCrashToSentry at line 47
/home/travis/gopath/src/github.com/taskcluster/generic-worker/main.go in HandleCrash at line 499
/home/travis/gopath/src/github.com/taskcluster/generic-worker/main.go in func1 at line 505
/home/travis/go/src/runtime/asm_amd64.s in call32 at line 514
/home/travis/go/src/runtime/panic.go in gopanic at line 489
/home/travis/gopath/src/github.com/taskcluster/generic-worker/main.go in func1 at line 982
/home/travis/go/src/runtime/asm_amd64.s in call32 at line 514
/home/travis/go/src/runtime/panic.go in gopanic at line 489
/home/travis/go/src/runtime/panic.go in panicmem at line 63
/home/travis/go/src/runtime/signal_unix.go in sigpanic at line 290
/home/travis/gopath/src/github.com/taskcluster/taskcluster-client-go/http.go in String at line 46
/home/travis/gopath/src/github.com/taskcluster/generic-worker/artifacts.go in uploadArtifact at line 438
/home/travis/gopath/src/github.com/taskcluster/generic-worker/artifacts.go in uploadLog at line 419
/home/travis/gopath/src/github.com/taskcluster/generic-worker/main.go in func3 at line 1014
/home/travis/gopath/src/github.com/taskcluster/generic-worker/main.go in Run at line 1126
/home/travis/gopath/src/github.com/taskcluster/generic-worker/main.go in FindAndRunTask at line 674
/home/travis/gopath/src/github.com/taskcluster/generic-worker/main.go in RunWorker at line 569
/home/travis/gopath/src/github.com/taskcluster/generic-worker/main.go in main at line 332
/home/travis/go/src/runtime/proc.go in main at line 185
This error originated in taskcluster-client-go, which was called from this line in generic-worker:
https://github.com/taskcluster/generic-worker/blob/v10.2.3/artifacts.go#L438
438 log.Print(t.CallSummary.String())
Based on the date+time of release of generic-worker v10.2.3, we can map to the corresponding commit of taskcluster-client-go, and find the line that caused the failure:
https://github.com/taskcluster/taskcluster-client-go/blob/1b02af5dfac584c998413247dda6ef1a8e2175b5/http.go#L46
46 return fmt.Sprintf("\nCALL SUMMARY\n============\nRequest Headers:\n%#v\nRequest Body:\n%v\nResponse Headers:\n%#v\nResponse Body:\n%v\nAttempts: %v", cs.HTTPRequest.Header, cs.HTTPRequestBody, cs.HTTPResponse.Header, cs.HTTPResponseBody, cs.Attempts)
From this we can determine that the problem was that one of the following values was nil:
* cs.HTTPRequest
* cs.HTTPResponse
From inspecting the code, there are several failure cases where cs.HTTPResponse would be nil. The most likely candidate I see is here:
* https://github.com/taskcluster/taskcluster-client-go/blob/1b02af5dfac584c998413247dda6ef1a8e2175b5/http.go#L116-L120
If the HTTP connection cannot be established after several retries, the HTTP response can be nil.
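A minimal sketch of that failure mode follows. The `CallSummary` type here is a simplified, hypothetical stand-in for the one in taskcluster-client-go, and `summarize` is an illustrative helper that is not part of either codebase; the point is only that formatting the summary dereferences a nil pointer when no response was received.

```go
package main

import (
	"fmt"
	"net/http"
)

// CallSummary is a simplified, hypothetical stand-in for the
// taskcluster-client-go type; only the fields needed here are included.
type CallSummary struct {
	HTTPRequest  *http.Request
	HTTPResponse *http.Response
	Attempts     int
}

// String dereferences both pointers unconditionally, like the line quoted above.
func (cs *CallSummary) String() string {
	return fmt.Sprintf("Request Headers:\n%#v\nResponse Headers:\n%#v\nAttempts: %v",
		cs.HTTPRequest.Header, cs.HTTPResponse.Header, cs.Attempts)
}

// summarize (hypothetical helper) calls String and converts a panic into an error.
func summarize(cs *CallSummary) (s string, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	return cs.String(), nil
}

func main() {
	// The request exists, but no response was ever received.
	cs := &CallSummary{HTTPRequest: &http.Request{}, Attempts: 5}
	_, err := summarize(cs)
	fmt.Println(err)
}
```

In the real worker there was no such recovery around the log call, so the panic propagated up to the crash handler seen in the stack trace.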
The fix is to not assume that cs.HTTPRequest and cs.HTTPResponse are non-nil.
This was done in the following commit:
https://github.com/taskcluster/taskcluster-client-go/commit/1f7c623bcb7e78c1f1237e7c5535b3f4409ff23e
In other words, this worker crash has already been fixed, and the release timestamps show that the fix would have made it into generic-worker 10.7.8.
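For illustration, a nil-safe `String` could look roughly like the sketch below. This is an assumption about the general shape of such a guard, not a copy of the linked commit, and the `CallSummary` type is again a simplified stand-in rather than the library's definition.

```go
package main

import (
	"fmt"
	"net/http"
)

// CallSummary is a simplified, hypothetical stand-in for the
// taskcluster-client-go type.
type CallSummary struct {
	HTTPRequest      *http.Request
	HTTPRequestBody  string
	HTTPResponse     *http.Response
	HTTPResponseBody string
	Attempts         int
}

// String checks each pointer before reading its headers, so a summary
// can still be produced when the request or response is missing.
func (cs *CallSummary) String() string {
	var reqHeader, respHeader http.Header
	if cs.HTTPRequest != nil {
		reqHeader = cs.HTTPRequest.Header
	}
	if cs.HTTPResponse != nil {
		respHeader = cs.HTTPResponse.Header
	}
	return fmt.Sprintf(
		"\nCALL SUMMARY\n============\nRequest Headers:\n%#v\nRequest Body:\n%v\nResponse Headers:\n%#v\nResponse Body:\n%v\nAttempts: %v",
		reqHeader, cs.HTTPRequestBody, respHeader, cs.HTTPResponseBody, cs.Attempts,
	)
}

func main() {
	// Neither a request nor a response is set; String must not panic.
	cs := &CallSummary{Attempts: 5}
	fmt.Println(cs.String())
}
```

With a guard like this, a failed upload would still be logged with empty headers instead of crashing the worker.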
Updated•7 years ago (Assignee)
Assignee: nobody → pmoore
Comment 4•7 years ago (Assignee)
Currently no tasks are running, so I will perform the upgrade now...
Comment 5•7 years ago (Assignee)
Upgrade performed. Now testing:
https://treeherder.mozilla.org/#/jobs?repo=nss-try&revision=3f29f34404baa4c7f84007901a92d91122c3e124&group_state=expanded
Comment 6•7 years ago (Assignee)
And this time with `-t all -u all`:
https://treeherder.mozilla.org/#/jobs?repo=nss-try&group_state=expanded&revision=a570f4b78a1798dc9740d0073438a38a9dd25789
Comment 7•7 years ago (Assignee)
Upgraded to generic-worker 11.0.1:
> administrators-Mac-mini-98:~ administrator$ /usr/local/bin/generic-worker --version
> generic-worker 11.0.1 [ revision: https://github.com/taskcluster/generic-worker/commits/a0a5271ddad42022606ca47a6c16f84c128223ab ]
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Summary: Please upgrade generic-worker on localprovisioner/nss-macos-10-12 → Please upgrade generic-worker on localprovisioner/nss-macos-10-12 to version 11.0.1