Open Bug 1595642 Opened 5 years ago Updated 3 years ago

Generic-worker panic 10-64-hw workers gw version 16.2.0

Categories

(Taskcluster :: Workers, defect)

Tracking

(Not tracked)

People

(Reporter: markco, Unassigned)

Generic-worker was exiting with exit code 69 continuously on up to 15% of workers (see https://bugzilla.mozilla.org/show_bug.cgi?id=1595261#c2).

It seems to be caused by leading zero bytes in one of three files: tasks-resolved-count.txt, file-caches.json, or directory-caches.json.
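A minimal sketch of how one might check the three files for leading zero bytes. The helper name leadingZeroBytes is hypothetical (it is not part of generic-worker), and the sketch assumes the files sit in the current working directory and that Go 1.16+ is available for os.ReadFile.

```go
package main

import (
	"fmt"
	"os"
)

// leadingZeroBytes is a hypothetical helper: it returns how many
// NUL (0x00) bytes b starts with.
func leadingZeroBytes(b []byte) int {
	n := 0
	for n < len(b) && b[n] == 0 {
		n++
	}
	return n
}

func main() {
	// The three state files named in this bug. Looking them up in the
	// current directory is an assumption made for illustration.
	names := []string{
		"tasks-resolved-count.txt",
		"file-caches.json",
		"directory-caches.json",
	}
	for _, name := range names {
		b, err := os.ReadFile(name)
		if err != nil {
			fmt.Printf("%s: %v\n", name, err)
			continue
		}
		fmt.Printf("%s: %d leading zero byte(s)\n", name, leadingZeroBytes(b))
	}
}
```

A non-zero count for any of the files would be consistent with the corruption described here; it says nothing about which process wrote the bytes.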

As a workaround, the gw wrapper script will remove the tasks-resolved-count.txt file after each exit, and it will remove the cache JSON files if the gw exit code is 69. Below is an explanation of how the files were identified.
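The cleanup rule above can be sketched as follows. This is only an illustration in Go, not the actual wrapper script: the function names staleStateFiles and cleanUp are hypothetical, and the demo runs against a temporary directory rather than a real worker directory.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// staleStateFiles (hypothetical) lists the files the workaround deletes
// for a given generic-worker exit code.
func staleStateFiles(exitCode int) []string {
	files := []string{"tasks-resolved-count.txt"} // removed after every exit
	if exitCode == 69 {
		// Cache state is only discarded when the worker exited with 69.
		files = append(files, "file-caches.json", "directory-caches.json")
	}
	return files
}

// cleanUp (hypothetical) removes the selected files from dir and
// ignores files that are already absent.
func cleanUp(dir string, exitCode int) error {
	for _, name := range staleStateFiles(exitCode) {
		err := os.Remove(filepath.Join(dir, name))
		if err != nil && !errors.Is(err, fs.ErrNotExist) {
			return fmt.Errorf("removing %s: %w", name, err)
		}
	}
	return nil
}

func main() {
	// Demo in a throwaway directory with simulated corrupt files.
	dir, err := os.MkdirTemp("", "gw-state-")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	defer os.RemoveAll(dir)

	for _, name := range staleStateFiles(69) {
		if err := os.WriteFile(filepath.Join(dir, name), []byte("\x00\x00"), 0o644); err != nil {
			fmt.Fprintln(os.Stderr, err)
			return
		}
	}
	if err := cleanUp(dir, 69); err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	entries, err := os.ReadDir(dir)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	fmt.Println("files left:", len(entries))
}
```

Deleting the cache JSON files only on exit code 69 keeps the cache metadata intact across normal exits, at the cost of losing it whenever the worker crashes with that code.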

Nov 09 15:58:59 T-W1064-MS-123.mdc1.mozilla.com generic-worker UTC Could not load file directory-caches.json into object *main.CacheMap - is it json?#15
Nov 09 15:58:59 T-W1064-MS-123.mdc1.mozilla.com generic-worker UTC goroutine 1 [running]: runtime/debug.Stack(0x0, 0xc042098980, 0x0) #11/home/travis/.gimme/versions/go1.10.8.src/src/runtime/debug/stack.go:24 +0xae main.HandleCrash(0x921ac0, 0xc0420f6980) #11/home/travis/gopath/src/github.com/taskcluster/generic-worker/main.go:351 +0x2d main.RunWorker.func1(0xc04246de30) #11/home/travis/gopath/src/github.com/taskcluster/generic-worker/main.go:370 +0x59 panic(0x921ac0, 0xc0420f6980) #11/home/travis/.gimme/versions/go1.10.8.src/src/runtime/panic.go:502 +0x237 main.(*CacheMap).LoadFromFile(0xd635f0, 0x9db409, 0x15, 0xc0420784d0, 0xc) #11/home/travis/gopath/src/github.com/taskcluster/generic-worker/mounts.go:146 +0x4a5 main.(*MountsFeature).Initialise(0xd81bc8, 0x1f, 0xc04246dae8) #11/home/travis/gopath/src/github.com/taskcluster/generic-worker/mounts.go:155 +0xaa main.initialiseFeatures(0xc0420c8000, 0xc04237a440) #11/home/travis/gopath/src/github.com/taskcluster/generic-worker/main.go:87 +0x471 main.RunWorker(0x0) #11/home/travis/gopath/src/github.com/taskcluster/generic-worker/main.go:407 +0x36c main.main() #11/home/travis/gopath/src/github.com/taskcluster/generic-worker/main.go:157 +0x7ba#015

Nov 09 16:12:17 T-W1064-MS-201.mdc1.mozilla.com-1 generic-worker UTC goroutine 1 [running]: runtime/debug.Stack(0x0, 0xc042098948, 0x0) #11/home/travis/.gimme/versions/go1.10.8.src/src/runtime/debug/stack.go:24 +0xae main.HandleCrash(0x921ac0, 0xc0423f5f60) #11/home/travis/gopath/src/github.com/taskcluster/generic-worker/main.go:351 +0x2d main.RunWorker.func1(0xc04246be30) #11/home/travis/gopath/src/github.com/taskcluster/generic-worker/main.go:370 +0x59 panic(0x921ac0, 0xc0423f5f60) #11/home/travis/.gimme/versions/go1.10.8.src/src/runtime/panic.go:502 +0x237 main.(*CacheMap).LoadFromFile(0xd635f8, 0x9d80d4, 0x10, 0xc042078a20, 0x8) #11/home/travis/gopath/src/github.com/taskcluster/generic-worker/mounts.go:146 +0x4a5 main.(*MountsFeature).Initialise(0xd81bc8, 0x1f, 0xc04246bae8) #11/home/travis/gopath/src/github.com/taskcluster/generic-worker/mounts.go:154 +0x66 main.initialiseFeatures(0xc0420c8000, 0xc04237a340) #11/home/travis/gopath/src/github.com/taskcluster/generic-worker/main.go:87 +0x471 main.RunWorker(0x0) #11/home/travis/gopath/src/github.com/taskcluster/generic-worker/main.go:407 +0x36c main.main() #11/home/travis/gopath/src/github.com/taskcluster/generic-worker/main.go:157 +0x7ba#015

tasks-resolved-count.txt:
From pmoore in slack:
and it looks like the file tasks-resolved-count.txt has two leading zero bytes
328 func ReadTasksResolvedFile() uint {
329     b, err := ioutil.ReadFile("tasks-resolved-count.txt")
330     if err != nil {
331         return 0
332     }
333     i, err := strconv.Atoi(string(b))
334     if err != nil {
335         panic(err)
336     }
337     return uint(i)
338 }
the failure is at line 335

we see this in papertrail: main.ReadTasksResolvedFile(0x8ffac0) #011/home/travis/gopath/src/github.com/taskcluster/generic-worker/main.go:335
it reads the content of file tasks-resolved-count.txt and tries to convert it to an integer
strconv.Atoi: parsing "\x00\x00": invalid syntax
see https://golang.org/pkg/strconv/#hdr-Numeric_Conversions
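To illustrate the failure and one possible mitigation, here is a hedged sketch. The first call reproduces the strconv.Atoi error quoted above. parseTasksResolved is a hypothetical, more forgiving parser (not the generic-worker implementation) that strips NUL bytes and whitespace before parsing and treats an effectively empty file as a count of 0, mirroring how the original returns 0 when the file is missing.

```go
package main

import (
	"bytes"
	"fmt"
	"strconv"
)

// parseTasksResolved is a hypothetical variant of the parsing step in
// ReadTasksResolvedFile. It trims NUL bytes and whitespace from both
// ends and returns an error instead of panicking on bad content.
func parseTasksResolved(b []byte) (uint, error) {
	cleaned := bytes.Trim(b, "\x00 \t\r\n")
	if len(cleaned) == 0 {
		// Nothing but padding: treat like a missing file (assumption).
		return 0, nil
	}
	n, err := strconv.ParseUint(string(cleaned), 10, 32)
	if err != nil {
		return 0, err
	}
	return uint(n), nil
}

func main() {
	// Same input as the corrupt file described in this bug.
	_, err := strconv.Atoi("\x00\x00")
	fmt.Println(err)

	n, err := parseTasksResolved([]byte("\x00\x0017\n"))
	fmt.Println(n, err)
}
```

Tolerating the zero bytes would hide the corruption rather than explain it, so whether this is preferable to a panic is a judgment call; it does not address whatever is writing to the file.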

Component: General → Workers

I don't know what touched this file, but I have a feature in place in bug 1596044 to prevent unprivileged processes from modifying its contents.

This may help, if whatever is touching the file is not a privileged process. If a privileged process is modifying this file, it may not help.

See Also: → 1596044

Note that bug 1596044 won't solve the root cause of this issue; it is just a measure to protect the generic-worker state files and reduce the chance that something could interfere with them. Since several processes run on these machines with elevated privileges, it can't guarantee protection.

Mark, it might be worth trying with generic-worker 16.5.6 to see if this helps.

Flags: needinfo?(mcornmesser)

I opened Bug 1606337 for the upgrade. It may be some time before I can get to it.

Depends on: 1606337
Flags: needinfo?(mcornmesser)
QA Whiteboard: [lang=go]

Not actively working on this right now.

Assignee: pmoore → nobody