Migrate valgrind task from AWS -> GCP
Categories: Firefox Build System :: Task Configuration, task
Tracking:
Release | Tracking | Status
---|---|---
firefox110 | --- | fixed
People: Reporter: ahal, Assignee: jcristau
Attachments: 1 file
This bug will track migrating the valgrind task to GCP.
Reporter • Updated 1 year ago
Comment 1 • Reporter • 1 year ago
When I run this on try, the task fails:
https://firefox-ci-tc.services.mozilla.com/tasks/SxEpRKsKR6mPgUqz9b2tJA
Log snippet:
[task 2022-12-02T17:06:41.092Z] 17:06:41 INFO - --29980-- memcheck GC: 31966 nodes, 31853 survivors (99.6%)
[task 2022-12-02T17:06:41.092Z] 17:06:41 INFO - --29980-- memcheck GC: 45206 new table size (stepup)
[task 2022-12-02T17:06:42.243Z] 17:06:42 INFO - --29980-- WARNING: Serious error when reading debug info
[task 2022-12-02T17:06:42.244Z] 17:06:42 INFO - --29980-- When reading debug info from /memfd:mozilla-ipc (deleted):
[task 2022-12-02T17:06:42.244Z] 17:06:42 INFO - --29980-- failed to stat64/stat this file
[task 2022-12-02T17:06:47.545Z] 17:06:47 INFO - PERFHERDER_DATA: {"framework": {"name": "build_metrics"}, "suites": [{"name": "valgrind", "value": 41.65303177700025, "lowerIsBetter": true, "shouldAlert": false, "subtests": [], "extraOptions": ["taskcluster-projects/887720501152/machineTypes/n2-custom-16-73728"]}]}
[task 2022-12-02T17:06:47.545Z] 17:06:47 INFO - TEST-PASS | valgrind-test | valgrind found no errors
[task 2022-12-02T17:06:47.545Z] 17:06:47 INFO - TEST-UNEXPECTED-FAIL | valgrind-test | non-zero exit code from Valgrind: -11
[task 2022-12-02T17:06:48.029Z] 17:06:48 ERROR - Return code: 2
I'm not sure what this means, or what could have changed in the host image to cause it. Mike, your name comes up most often in the blame for this task; do you know what's going on, or who I can ping to help me debug?
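The `-11` in the log is consistent with valgrind being killed by signal 11 (SIGSEGV) rather than exiting normally: Python's `subprocess` reports a child terminated by signal N as a negative `returncode` of -N. Assuming the harness passes through the raw `returncode` from Python (not confirmed in this bug), the hypothetical helper `describe_returncode` below is a minimal sketch of how to read those values.

```python
import signal


def describe_returncode(rc: int) -> str:
    """Describe a subprocess return code (hypothetical helper).

    In Python's subprocess module, a negative return code -N means the
    child was terminated by signal N; non-negative values are normal exits.
    """
    if rc < 0:
        return f"killed by {signal.Signals(-rc).name}"
    return f"exited with status {rc}"


if __name__ == "__main__":
    # -11 mirrors "non-zero exit code from Valgrind: -11";
    # 2 mirrors the harness's own "Return code: 2".
    print(describe_returncode(-11))
    print(describe_returncode(2))
```

On Linux this reads `-11` as `killed by SIGSEGV`, which matches the crash later seen under gdb.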
Comment 2 • Assignee • 1 year ago
Do we know the kernel version on the AWS workers? It's possible it's too old to support memfd, in which case we'd take a different code path there.
Comment 3 • Assignee • 1 year ago
Never mind: ignoring the stat failure on memfd doesn't actually stop valgrind from crashing.
Comment 4 • Assignee • 1 year ago
I've reproduced this in an interactive task; unfortunately, even after attaching gdb to the valgrind process, I don't get useful information:
Attaching to process 19563
Reading symbols from /usr/libexec/valgrind/memcheck-amd64-linux...
Reading symbols from /usr/lib/debug/.build-id/9b/1fa60c727acfa38c726ec45680af7bf2edd433.debug...
0x00000000580c2b17 in get_slowcase (img=0x100c75fcf0, off=<optimized out>) at m_debuginfo/image.c:810
810 m_debuginfo/image.c: No such file or directory.
(gdb) c
Continuing.
[Detaching after fork from child process 19589]
Program received signal SIGSEGV, Segmentation fault.
0x000000100ba1ba57 in ?? ()
(gdb) bt
#0 0x000000100ba1ba57 in ?? ()
#1 0x0000001008fadf30 in ?? ()
#2 0x0000001008fadf18 in ?? ()
#3 0x0000001008fadf30 in ?? ()
#4 0x0000000000001c10 in ?? ()
#5 0x0000000000000001 in ?? ()
#6 0x0000001009819db0 in ?? ()
#7 0x0000000000000000 in ?? ()
Mike or Julian, any advice on how to figure this out?
Comment 5 • 1 year ago
I suggest trying with valgrind-3.20.0. 3.19 has (severe) problems reading Dwarf5 debuginfo, and what seems to have happened here is a crash in the debuginfo reader. Dwarf5 support is much improved in 3.20.
Comment 6 • Assignee • 1 year ago
I tried valgrind-3.20.0 per Julian's advice, unfortunately that didn't improve things.
Then I tried running the task on a different worker type (gecko-t/t-linux-kvm-gcp instead of gecko-1/b-linux-gcp), and that appears to work.
Some differences between those pools:
- different VM image; I can't tell what the actual changes are
- machine-type n2-standard-16 (t-linux-kvm-gcp) vs n2-custom-16-73728 (b-linux-gcp)
- kvm and nested virtualization enabled in t-linux-kvm-gcp
- different disk configuration, hopefully irrelevant
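To make the machine-type difference concrete: GCP custom machine-type names encode the vCPU count and memory in MB (`n2-custom-16-73728` is 16 vCPUs with 73728 MB, i.e. 72 GB), while `n2-standard-16` is a predefined shape with 16 vCPUs and 64 GB (4 GB per vCPU). The helper `parse_machine_type` and its lookup table below are a hypothetical sketch covering only these two name patterns, not a Taskcluster or GCP API.

```python
import re
from typing import NamedTuple


class MachineShape(NamedTuple):
    family: str
    vcpus: int
    memory_mb: int


# Assumption: predefined n2-standard-N shapes provide 4 GB per vCPU.
_STANDARD_GB_PER_VCPU = {"n2": 4}


def parse_machine_type(name: str) -> MachineShape:
    """Parse 'FAMILY-custom-CPUS-MEMORY_MB' or 'FAMILY-standard-CPUS' names."""
    custom = re.fullmatch(r"([a-z0-9]+)-custom-(\d+)-(\d+)", name)
    if custom:
        family, cpus, mem = custom.groups()
        return MachineShape(family, int(cpus), int(mem))
    standard = re.fullmatch(r"([a-z0-9]+)-standard-(\d+)", name)
    if standard:
        family, cpus = standard.groups()
        gb = _STANDARD_GB_PER_VCPU[family] * int(cpus)
        # GCP's "GB" here is 1024 MB, so 64 GB -> 65536 MB.
        return MachineShape(family, int(cpus), gb * 1024)
    raise ValueError(f"unrecognised machine type: {name}")


if __name__ == "__main__":
    print(parse_machine_type("n2-custom-16-73728"))  # b-linux-gcp
    print(parse_machine_type("n2-standard-16"))      # t-linux-kvm-gcp
```

By this reading both pools have the same vCPU count and b-linux-gcp has 8 GB more memory, so CPU count alone would not seem to explain the crash.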
Comment 7 • Assignee • 1 year ago
For some reason when running on b-linux-gcp workers, valgrind crashes, but it
runs OK on t-linux-kvm-gcp, so use that.
Pushed by jcristau@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/5a65f0968b51 [ci] Migrate 'valgrind' tasks from AWS -> GCP, r=MasterWayZ,ahal,glandium
Comment 9 • 1 year ago
bugherder