Open Bug 979683 Opened 10 years ago Updated 2 years ago

Valgrind-on-TBPL: improve coverage

Categories: Testing :: General (defect)

Tracking: Not tracked

People: Reporter: n.nethercote; Assignee: Unassigned

References: Depends on 1 open bug

Details

Valgrind test coverage is currently poor. The Valgrind test job just runs the PGO profiling workload, which exercises hardly any code.

We should test more stuff. That likely depends on making the Valgrind run happen on a test machine (bug 977240) instead of on a build machine, as it currently does.

Running entire test suites (e.g. mochi) is probably infeasible, because Valgrind is too slow. When deciding what to run, it's probably worth keeping in mind the things that Valgrind can do that ASAN cannot:

- Valgrind can detect uses of undefined values (e.g. due to failure to initialize a value before reading it); a minimal example of this class of bug is sketched below.

- Valgrind can check dynamically-generated code.

The second of these suggests that focusing on tests that exercise the JITs heavily would be worthwhile.
But maybe the Cpp/Jit tests would be a good start? XPCShell is another idea. Except none of those start the browser, so maybe we'd want reftests and/or crashtests too.
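
To make the first of those points concrete, here is a minimal, hypothetical C++ sketch (not code from the tree, just an illustration) of the kind of defect Memcheck reports and ASan does not, since uninitialised reads are outside ASan's scope:

  // Hypothetical standalone example (not Firefox code): flags[2] and
  // flags[3] are never written, so the branch below depends on
  // uninitialised memory. Memcheck flags this; ASan stays silent.
  #include <cstdio>

  int main() {
    int flags[4];
    flags[0] = 1;
    flags[1] = 2;
    int sum = 0;
    for (int i = 0; i < 4; i++) {
      if (flags[i] > 0)  // uninitialised read for i >= 2
        sum += flags[i];
    }
    std::printf("sum = %d\n", sum);
    return 0;
  }

Built with -g and run under Valgrind (e.g. "valgrind ./uninit", binary name made up for the example), Memcheck reports "Conditional jump or move depends on uninitialised value(s)" at the if; natively and under ASan the program runs silently, printing whatever the stack garbage happens to produce.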

I'm not entirely convinced that we couldn't run the entire test suite through Valgrind on some kind of periodic basis (with proper timeouts set, of course!), but we'd certainly need to understand what the sheriffing expectations would be for such a job, given the frequency at which it would be running. We'd probably want to use a different chunking scheme than on other platforms too.
Actually, wouldn't it be worth running the entire set of test suites with valgrind, not on all pushes because it would require too much hardware, but once a day?
Well, currently the Valgrind job runs on every push and failure is grounds for backing out. This is an excellent thing. If slower, occasional tests are run *as well*, that'd be great, but I won't accept the removal of on-every-push testing.
I don't see any reason why the limited per-push job and periodic "full" run are mutually exclusive. My main concern is how long the "full" run would take and what that would mean from a sheriffing standpoint.
> If slower, occasional tests are run *as well*, that'd be great

Though then we get into the usual problem: if it doesn't run on every push, when it breaks it's not clear which patch is at fault, so there isn't a clear backout strategy, so it's red and nobody knows what to do, and the sheriffs get sick of that and hide it, and then nobody ever sees it, and so it isn't very useful.
> Though then we get into the usual problem: if it doesn't run on every push,
> when it breaks it's not clear which patch is at fault, so there isn't a
> clear backout strategy, so it's red and nobody knows what to do, and the
> sheriffs get sick of that and hide it, and then nobody ever sees it, and so
> it isn't very useful.

Definitely agreed!

We might want to run the jstests/jit-tests, along with the xpcshell ones: the former because we've been finding JS bugs with Valgrind for some time now, the latter because it has much lower startup overhead than the full browser.

Or another idea:

Create --enable-valgrind js shells and run jit-tests on them. Shortest time to start up.
From an entirely selfish point of view, I would love to invoke about:memory under Valgrind, so that all the data structure traversal done by the memory reporters gets checked.
I don't think running the tests with the expectation that sheriffs
will treat failures as they do other test failures is reasonable
unless you are willing to devote a *significant* number of machines to
the effort. It might be possible to run a full suite once per day,
with a semi-reasonable number of machines if they are run in
parallel.

I do think that running valgrind on as much of our test suite as possible
is useful even if we can't get full coverage of a single build in
the same time frame as our other test jobs. If we can find
issues that are missed by our ASan-enabled tests or our fuzzers before
they make it into a release build, that is useful even if the runs do
not provide immediate feedback on which patch caused the problem.

I have been running several test suites under valgrind
in a round-robin fashion in the bughunter system. As each chunk
completes, a new firefox build is created if the current build is
more than a day old, and the next test/chunk is run. This does not give
the type of coverage you want, but over time it does provide a set
of valgrind errors that are not caught by the other approaches.

Previously I ran with 20 chunks, a per-test
timeout of 1 day and a chunk timeout of 5 days, using debug builds with
jemalloc disabled.  Some chunks took only a couple of hours while
others took several days.

I recently switched to using --tool=memcheck
--vex-iropt-register-updates=allregs-at-each-insn
--smc-check=all-non-file --soname-synonyms=somalloc=NONE with a
recent trunk valgrind build and jemalloc-enabled debug firefox builds. I've
increased the chunking to 40, kept the per-test timeout of 1 day and
increased the chunk timeout to 7 days.

Since so much has changed in how the builds are generated and the
tests run, I just cleared the stored results and started over fresh so
I don't have results at the moment but should have some in a few
days. If you are interested, it is available at
<http://sisyphus.bughunter.ateam.phx1.mozilla.com/bughunter>. You will
need a MozillaVPN account with access to the machine and must be a
member of the security group. Ping me if you are interested.

At the very least, you can use the system to gauge what you might need
in order to extend the current valgrind testing beyond its present scope. Otherwise,
I will continue to use it to file valgrind issues as my schedule permits.
(In reply to Bob Clary [:bc:] from comment #8)
> I recently switched to using --tool=memcheck
> --vex-iropt-register-updates=allregs-at-each-insn
> --smc-check=all-non-file --soname-synonyms=somalloc=NONE with a
> recent trunk valgrind build and jemalloc-enabled debug firefox builds. I've
> increased the chunking to 40, kept the per-test timeout of 1 day and
> increased the chunk timeout to 7 days.

I am really surprised at the large timescales you mention.  I have often
completed runs of Mochitest on one core comfortably in less than 16 CPU
hours, sometimes significantly less.

Are your builds with -O (that is, -O1) or above?  Testing builds at -O0
is very slow because gcc does minimal register allocation and so Memcheck
winds up having to pointlessly check huge numbers of stack accesses.
-O1 and above gets you reasonable procedure-level register allocation and
significantly increases (doubles?) the performance of Memcheck.
To be more specific: if Memcheck is causing a per-thread slowdown vs
native of more than about 25:1, for long runs (hence, amortising out
the JITting costs) of -O or -O2 compiled code, I would like to
investigate.
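
(Rough worked arithmetic, not a measurement: a suite needing about 40 native
CPU minutes on one core would, at 25:1, come out around 40 x 25 = 1000 CPU
minutes, i.e. roughly 17 CPU hours under Memcheck, which is in the same
ballpark as the Mochitest figure above. Chunks that run for days therefore
suggest either a much larger effective ratio or overheads beyond Memcheck
itself.)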

Bob, can you make any ballpark estimate of the ratio native-cpu-time
vs memcheck-cpu-time for one of your long runs?  To see if it's in 
the 25:1 area?  Sounds like it's much higher.
These are debug builds with -g -O on a VM with 8 GB of RAM and 2 cores. I'll try to estimate the slowdown factor when I have a fuller set of builds. IIRC reftest at least was one where some of the chunks took a very long time.
Is this something we still want to do? :)
> Is this something we still want to do? :)

Yes. The hard part is finding someone with time, motivation and the appropriate skills :)
Severity: normal → S3