Generic-worker panic 10-64-hw workers gw version 16.2.0
Categories
(Taskcluster :: Workers, defect)
Tracking
(Not tracked)
People
(Reporter: markco, Unassigned)
Generic-worker was exiting with exit code 69 continuously on up to 15% of workers (see https://bugzilla.mozilla.org/show_bug.cgi?id=1595261#c2).
It seems to be caused by leading zero bytes in one of three files: tasks-resolved-count.txt, file-caches.json, or directory-caches.json.
As a workaround, the gw wrapper script will remove the tasks-resolved-count.txt file after each exit, and it will remove the cache JSON files if the gw exit code is 69. Below is an explanation of how the files were identified.
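The cleanup rule described above can be sketched in Go as follows. This is only an illustration of the logic, not the actual wrapper script (which is not shown in this bug); the names filesToRemove and cleanUp and the state directory argument are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
)

// corruptStateExitCode is the exit code reported in this bug.
const corruptStateExitCode = 69

// filesToRemove returns the state files to delete for a given
// generic-worker exit code (hypothetical helper, not part of generic-worker).
func filesToRemove(exitCode int) []string {
	// The task counter is removed after every exit.
	files := []string{"tasks-resolved-count.txt"}
	// The cache JSON files are removed only when the worker exited with 69.
	if exitCode == corruptStateExitCode {
		files = append(files, "file-caches.json", "directory-caches.json")
	}
	return files
}

// cleanUp deletes the selected files from dir and ignores files that
// are already missing.
func cleanUp(dir string, exitCode int) error {
	for _, name := range filesToRemove(exitCode) {
		err := os.Remove(filepath.Join(dir, name))
		if err != nil && !errors.Is(err, os.ErrNotExist) {
			return fmt.Errorf("removing %s: %w", name, err)
		}
	}
	return nil
}

func main() {
	// Demonstration against an empty temporary directory, so nothing
	// real is deleted.
	dir, err := os.MkdirTemp("", "gw-state")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	defer os.RemoveAll(dir)

	fmt.Println(filesToRemove(0))
	fmt.Println(filesToRemove(corruptStateExitCode))
	if err := cleanUp(dir, corruptStateExitCode); err != nil {
		fmt.Fprintln(os.Stderr, err)
	}
}
```

Removing the cache JSON files only on exit code 69 keeps the caches intact across normal restarts, at the cost of losing them whenever the worker hits this panic.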
Nov 09 15:58:59 T-W1064-MS-123.mdc1.mozilla.com generic-worker UTC Could not load file directory-caches.json into object *main.CacheMap - is it json?#15
Nov 09 15:58:59 T-W1064-MS-123.mdc1.mozilla.com generic-worker UTC goroutine 1 [running]: runtime/debug.Stack(0x0, 0xc042098980, 0x0) #11/home/travis/.gimme/versions/go1.10.8.src/src/runtime/debug/stack.go:24 +0xae main.HandleCrash(0x921ac0, 0xc0420f6980) #11/home/travis/gopath/src/github.com/taskcluster/generic-worker/main.go:351 +0x2d main.RunWorker.func1(0xc04246de30) #11/home/travis/gopath/src/github.com/taskcluster/generic-worker/main.go:370 +0x59 panic(0x921ac0, 0xc0420f6980) #11/home/travis/.gimme/versions/go1.10.8.src/src/runtime/panic.go:502 +0x237 main.(*CacheMap).LoadFromFile(0xd635f0, 0x9db409, 0x15, 0xc0420784d0, 0xc) #11/home/travis/gopath/src/github.com/taskcluster/generic-worker/mounts.go:146 +0x4a5 main.(*MountsFeature).Initialise(0xd81bc8, 0x1f, 0xc04246dae8) #11/home/travis/gopath/src/github.com/taskcluster/generic-worker/mounts.go:155 +0xaa main.initialiseFeatures(0xc0420c8000, 0xc04237a440) #11/home/travis/gopath/src/github.com/taskcluster/generic-worker/main.go:87 +0x471 main.RunWorker(0x0) #11/home/travis/gopath/src/github.com/taskcluster/generic-worker/main.go:407 +0x36c main.main() #11/home/travis/gopath/src/github.com/taskcluster/generic-worker/main.go:157 +0x7ba#015
Nov 09 16:12:17 T-W1064-MS-201.mdc1.mozilla.com-1 generic-worker UTC goroutine 1 [running]: runtime/debug.Stack(0x0, 0xc042098948, 0x0) #11/home/travis/.gimme/versions/go1.10.8.src/src/runtime/debug/stack.go:24 +0xae main.HandleCrash(0x921ac0, 0xc0423f5f60) #11/home/travis/gopath/src/github.com/taskcluster/generic-worker/main.go:351 +0x2d main.RunWorker.func1(0xc04246be30) #11/home/travis/gopath/src/github.com/taskcluster/generic-worker/main.go:370 +0x59 panic(0x921ac0, 0xc0423f5f60) #11/home/travis/.gimme/versions/go1.10.8.src/src/runtime/panic.go:502 +0x237 main.(*CacheMap).LoadFromFile(0xd635f8, 0x9d80d4, 0x10, 0xc042078a20, 0x8) #11/home/travis/gopath/src/github.com/taskcluster/generic-worker/mounts.go:146 +0x4a5 main.(*MountsFeature).Initialise(0xd81bc8, 0x1f, 0xc04246bae8) #11/home/travis/gopath/src/github.com/taskcluster/generic-worker/mounts.go:154 +0x66 main.initialiseFeatures(0xc0420c8000, 0xc04237a340) #11/home/travis/gopath/src/github.com/taskcluster/generic-worker/main.go:87 +0x471 main.RunWorker(0x0) #11/home/travis/gopath/src/github.com/taskcluster/generic-worker/main.go:407 +0x36c main.main() #11/home/travis/gopath/src/github.com/taskcluster/generic-worker/main.go:157 +0x7ba#015
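For illustration, the following Go sketch shows why a JSON file that starts with zero bytes cannot be loaded. The sample content is made up, map[string]interface{} stands in for the real CacheMap type (whose definition is not shown here), and leadingNULs is a hypothetical helper.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// leadingNULs counts the zero bytes at the start of data
// (hypothetical helper for diagnosing a corrupted state file).
func leadingNULs(data []byte) int {
	n := 0
	for n < len(data) && data[n] == 0 {
		n++
	}
	return n
}

func main() {
	// Made-up content simulating a cache file with two leading zero bytes.
	data := []byte("\x00\x00{}")

	// Stand-in for *main.CacheMap; the real type is an assumption here.
	var caches map[string]interface{}
	if err := json.Unmarshal(data, &caches); err != nil {
		fmt.Printf("could not load (%d leading zero bytes): %v\n", leadingNULs(data), err)
	}
}
```

A zero byte is not valid at the start of a JSON value, so decoding fails before any of the real content is read.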
tasks-resolved-count.txt:
From pmoore in slack:
and it looks like the file tasks-resolved-count.txt has two leading zero bytes
328 func ReadTasksResolvedFile() uint {
329     b, err := ioutil.ReadFile("tasks-resolved-count.txt")
330     if err != nil {
331         return 0
332     }
333     i, err := strconv.Atoi(string(b))
334     if err != nil {
335         panic(err)
336     }
337     return uint(i)
338 }
the failure is at line 335
we see this in papertrail: main.ReadTasksResolvedFile(0x8ffac0) #011/home/travis/gopath/src/github.com/taskcluster/generic-worker/main.go:335
it reads the content of file tasks-resolved-count.txt and tries to convert it to an integer
strconv.Atoi: parsing "\x00\x00": invalid syntax
see https://golang.org/pkg/strconv/#hdr-Numeric_Conversions
The most common numeric conversions are Atoi (string to int) and Itoa (int to string).
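A more tolerant way to read the counter could look like the sketch below. This is not the fix adopted in generic-worker; parseCount is a hypothetical function, and treating a file that contains only zero bytes as a count of 0 is an assumption.

```go
package main

import (
	"bytes"
	"fmt"
	"strconv"
)

// parseCount converts the content of tasks-resolved-count.txt to a uint,
// ignoring surrounding zero bytes and whitespace (hypothetical helper).
func parseCount(b []byte) (uint, error) {
	trimmed := bytes.Trim(b, "\x00 \t\r\n")
	if len(trimmed) == 0 {
		// Assumption: a file with no digits left counts as zero tasks.
		return 0, nil
	}
	n, err := strconv.ParseUint(string(trimmed), 10, 0)
	if err != nil {
		return 0, err
	}
	return uint(n), nil
}

func main() {
	// The plain conversion fails on the content seen in this bug.
	if _, err := strconv.Atoi("\x00\x00"); err != nil {
		fmt.Println("Atoi:", err)
	}

	// The tolerant version returns an error instead of panicking and
	// accepts zero-byte padding.
	n, err := parseCount([]byte("\x00\x0012\n"))
	fmt.Println(n, err)
}
```

Silently accepting a corrupted file would hide whatever is writing the zero bytes, so a real change would probably also log the unexpected content.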
Updated•5 years ago
Comment 1•5 years ago
I don't know what touched this file, but I have a feature in place in bug 1596044 to prevent unprivileged processes from modifying its contents.
This may help, if whatever is touching the file is not a privileged process. If a privileged process is modifying this file, it may not help.
Comment 2•5 years ago
Note that bug 1596044 won't solve the root cause of this issue; it is just a measure to protect the generic-worker state files and reduce the chance that something could interfere with them. Since several processes run on these machines with elevated privileges, it can't guarantee protection.
Comment 3•5 years ago
Mark, it might be worth trying with generic-worker 16.5.6 to see if this helps.
Comment 4•5 years ago (Reporter)
I opened Bug 1606337 for the upgrade. It may be some time before I can get to it.
Updated•4 years ago