Closed Bug 493791 Opened 15 years ago Closed 14 years ago

Periodically run tests under valgrind

Categories

(Release Engineering :: General, defect, P4)

x86
Linux
defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: catlee, Assigned: catlee)

Details

(Whiteboard: [automation][unittest][oldbug])

Attachments

(5 files)

We should periodically run some or all of our tests under valgrind on Linux and possibly on Mac.
Mass move of bugs from Release Engineering:Future -> Release Engineering. See
http://coop.deadsquid.com/2010/02/kiss-the-future-goodbye/ for more details.
Component: Release Engineering: Future → Release Engineering
Priority: -- → P3
Whiteboard: [automation][unittest]
Keywords: student-project
Assignee: nobody → catlee
Priority: P3 → P4
Whiteboard: [automation][unittest] → [automation][unittest][oldbug]
Where should output from the valgrind tests go?  The regular branch-specific tinderbox?

Do we have suppression files we can use?

What constitutes a test failure here?
I'm not sure we want "periodic" tests reporting to TBPL.

This testing will require special builds (with --enable-valgrind --disable-jemalloc). https://developer.mozilla.org/en/Debugging_Mozilla_with_valgrind recommends optimization level "-g -O" as a tradeoff between speed and sanity.

To build a suppressions file, we'll need to run all the tests and see what Valgrind complains about.  It can't be the first step.  You can start with fuzzing/dom/known/mozilla-central/valgrind.txt if you want, but you'll need to add to it.  Expect to need suppressions that are specific to library versions, and try to test on the same OS version as the automated testing machine.

The easiest way to detect failures is to set --error-exitcode=77 and watch for that exit code.
(In reply to comment #3)
> I'm not sure we want "periodic" tests reporting to TBPL.
> 
> This testing will require special builds (with --enable-valgrind
> --disable-jemalloc).
> https://developer.mozilla.org/en/Debugging_Mozilla_with_valgrind recommends
> optimization level "-g -O" as a tradeoff between speed and sanity.

Can we use our existing debug builds, or existing optimized builds?  What does --enable-valgrind do?

> To build a suppressions file, we'll need to run all the tests and see what
> Valgrind complains about.  It can't be the first step.  You can start with
> fuzzing/dom/known/mozilla-central/valgrind.txt if you want, but you'll need to
> add to it.  Expect to need suppressions that are specific to library versions,
> and try to test on the same OS version as the automated testing machine.
> 
> The easiest way to detect failures is to set --error-exitcode=77 and watch for
> that exit code.
--enable-valgrind adds code to Firefox to help Valgrind understand what Firefox is doing.  For example, we let it know that it doesn't need to complain about "use of uninitialized memory" when we're doing conservative stack scanning in the JS GC.

http://mxr.mozilla.org/mozilla-central/search?string=MOZ_VALGRIND
http://mxr.mozilla.org/mozilla-central/search?string=JS_VALGRIND

Our existing builds (opt and debug) don't work because they're not built with the necessary flags (--enable-valgrind and --disable-jemalloc).  Also, our "opt" is too stripped for Valgrind and our "debug" is slower than needed.
--smc-check=all for the valgrind run is also necessary, realistically:

* JM doesn't notify Valgrind about code discards when patching
  pics and wotnot, even with --enable-valgrind, which can cause
  valgrind to fail to make forward progress in such situations

* on OSX, I suspect some of the system libraries of doing runtime
  code generation (particularly to do with graphics) and I've had
  to use this flag to get sane behaviour

The perf overhead from --smc-check=all is much smaller in 3.6.0
than in any previous version, primarily so as to handle the above
two scenarios better.  So I'd recommend using it rather than
earlier versions.
As a first step, I suggest getting the browser running under valgrind, starting up, visiting one page that has some JS code, and shutting down. This would have caught the bugs I'm about to file, and actually covers a substantial amount of code.
The 32-bit and 64-bit linux slaves currently have valgrind-3.2.1 installed.

The OSX 10.6 slaves don't currently have valgrind installed.
(In reply to comment #7)
> As a first step, I suggest getting the browser running under valgrind, starting
> up, visiting one page that has some JS code, and shutting down. This would have
> caught the bugs I'm about to file, and actually covers a substantial amount of
> code.

I suggested that an easy place to start would be running profileserver.py, the script that we use for PGO profiling, since it starts the browser and runs sunspider. Plus, since it's built on top of automation.py, it should be pretty easy to reuse the --debugger=valgrind support that's in there.
The suppressions file is just a sample that gets valgrind running without errors on my machine.  I'll fill in results from build machines later.
Attachment #491661 - Flags: feedback?(ted.mielczarek)
Attachment #491661 - Flags: feedback?(bhearsum)
Comment on attachment 491655 [details] [diff] [review]
Add debugger support to pgoserver.py

Build succeeded on try.  Log is here: http://stage.mozilla.org/pub/mozilla.org/firefox/tryserver-builds/catlee@mozilla.com-3cd17c61df08/tryserver-win32/tryserver-win32-build5073.txt.gz

Looks ok to me!.
Attachment #491655 - Flags: review?(ted.mielczarek)
Comment on attachment 491661 [details] [diff] [review]
Script for compiling / running valgrind build

Having the mozconfig part of this script isn't great if we end up running this on multiple branches, fine for now.

It would be good to call purge_builds.py here, this build will be of non-trivial size, and could be an issue especially on the 64-bit linux slaves. It might be good to have clobberer support, too...it would suck to have to clobber these manually when issues come up.

Everything else seems fine.
Attachment #491661 - Flags: feedback?(bhearsum) → feedback+
Attachment #491655 - Flags: review?(ted.mielczarek) → review+
Attachment #491661 - Flags: feedback?(ted.mielczarek) → feedback+
Keywords: checkin-needed
Whiteboard: [automation][unittest][oldbug] → [automation][unittest][oldbug] pgoserver.py patch needs checkin
Attachment #493995 - Flags: review?(bhearsum)
Attachment #494033 - Flags: review?(bhearsum)
Attachment #494036 - Flags: review?(bhearsum)
Comment on attachment 494036 [details] [diff] [review]
Script for compiling/running valgrind

I noticed that there's valgrind builders reporting to clobberer with branch set as "None" -- shouldn't it be set to `basename $HG_REPO` since properties.branch isn't used for this job?

Everything else seems ok.
Attachment #494036 - Flags: review?(bhearsum) → review+
Comment on attachment 493995 [details] [diff] [review]
valgrind project support in misc.py

Per IRC, it seems like we'll be firing 1 job per builder/scheduler master with this....I'm not entirely sure what the best solution is. We could live with it, but that seems like a lot of wasted cycles testing the same or similar things. We could skip adding the Scheduler for the builder masters, maybe?
Attachment #494033 - Flags: review?(bhearsum) → review+
Comment on attachment 493995 [details] [diff] [review]
valgrind project support in misc.py

Catlee reminded me that the schedulers are already not added to the builder masters.
Attachment #493995 - Flags: review?(bhearsum) → review+
Flags: needs-reconfig?
scratch that, pgoserver.py patch needs to land first.
Flags: needs-reconfig?
Comment on attachment 491655 [details] [diff] [review]
Add debugger support to pgoserver.py

On mozilla-central: changeset:   58657:8bc10b6a2222
Attachment #491655 - Flags: checked-in+
The profileserver.py patch landed without issue.
Comment on attachment 493995 [details] [diff] [review]
valgrind project support in misc.py

Checked in on default
changeset:   1217:a081db84352d
Attachment #493995 - Flags: review+
Comment on attachment 494036 [details] [diff] [review]
Script for compiling/running valgrind

changeset:   1041:a8b7c04281d7
Attachment #494036 - Flags: checked-in+
Comment on attachment 494033 [details] [diff] [review]
configs for valgrind project

Landed on default
changeset:   3378:27f10622e0dc
Attachment #494033 - Flags: checked-in+
Flags: needs-reconfig?
Whiteboard: [automation][unittest][oldbug] pgoserver.py patch needs checkin → [automation][unittest][oldbug]
Getting landed this morning.
Flags: needs-reconfig? → needs-reconfig+
Success!

http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1291740104.1291747847.11018.gz&fulltext=1
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
I see there's a valgrind-linux column on the m-c tinderbox page.  Is that the only place?

(In reply to comment #3)
> I'm not sure we want "periodic" tests reporting to TBPL.

Why not?  That's what happens with the nightly builds.  I think for a lot of people, if a test doesn't report to TBPL it doesn't exist.
(In reply to comment #28)
> I see there's a valgrind-linux column on the m-c tinderbox page.  Is that the
> only place?

Currently yes.

> (In reply to comment #3)
> > I'm not sure we want "periodic" tests reporting to TBPL.
> 
> Why not?  That's what happens with the nightly builds.  I think for a lot of
> people, if a test doesn't report to TBPL it doesn't exist.

I'm not opposed.  I think the concern is that they're not as regular as nightly builds since they run when we have spare cycles in the pool.

File a bug against webtools:Tinderboxpushlog if you want it to pick up the valgrind runs.
(In reply to comment #29)
>
> File a bug against webtools:Tinderboxpushlog if you want it to pick up the
> valgrind runs.

Done: bug 617431.
Product: mozilla.org → Release Engineering
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: