Closed Bug 397724 Opened 17 years ago Closed 17 years ago

Use trace-refcnt and trace-malloc to check for leaks while running mochitest/reftest

Categories

(Testing :: Mochitest, defect, P2)

defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: sicking, Assigned: coop)

References

Details

Attachments

(1 file)

It would be great to enable refcnt logging while running the mochitest and reftest test suites. This would be a much better test base than the handful of pages that we currently run. An additional benefit would be that we early get some testing of new features that are added.

Extra nice would be if we could make the test harness run the tests one at a time and shut down the browser in between. This would allow us to see which test that leaks. This could definitely be done as a second step though.
bug 393993 is about adding simpler leak-gauge.pl-based testing and it also tracks the current leaks in the test suite which need to be fixed (or ignored by the leak test harness) before we make the tests fail on leaks.
Depends on: 393993
I'm not sure shutting down in between each page is especially worthwhile.  It'll slow down the tests tremendously, and as the default behavior will be PASS, it should be obvious when a newly-committed group of tests causes leaks.  At that point, the committer of the tests should back them out and be able to determine what leaked.

The time spent by everyone waiting for tests to complete is worth more than the time of the person who committed the leaky tests.  Mochitest runs should optimize for speed, not for determining which of a set of (possibly only one!) tests leaks.
I think that by default all tests should be run in one go, but it should be easy for the developer to run all the tests one by one locally -- so that he can easily determine which test started to leak due to his patch.
Flags: blocking1.9?
just to capture some of the discussion from last week's leaks meeting, this requires debug builds with --enable-trace-malloc. Any other build options required or processing to be done after the runs? 
Assignee: nobody → ccooper
Yeah, we also need to collect the data output at browser shutdown and have it sent to tinderbox. We do that on the "Linux fxdbug-linux-tbox Dep" box right now.
Summary: Check for refcnt leaks while running mochitest/reftest → Use trace-refcnt and trace-malloc to check for leaks while running mochitest/reftest
Do we have a preference as to which platform gets tackled first for this?, i.e. is the OS for this bug accurate as WinXP?
we'll probably want to do all platforms. I usually start with linux for these setups and then move to mac and windows, but that's up to you.
OS: Windows XP → All
Hardware: PC → All
Most leaks are cross platform so whatever is easiest for you to start with would be great! Thanks a lot for your help here!
Priority: -- → P3
Re-assigning to server-ops to get the slaves cloned for this. Please assign it back once you're done.

Note that I just filed bug 402123 to have the slaves cloned for the staging environment, so feel free to do these simultaneously if that makes sense.
Assignee: ccooper → server-ops
Depends on: 402123
Can you list off for me which VMs need to be cloned?  I'm not seeing anything obvious from reading this bug report.  Thanks!
Assignee: server-ops → ccooper
Initially qm-centos5-01 and qm-winxp01. Let us know when you're ready to clone 'em and we can bring them down quietly and without a struggle.
Dave: mrz indicated in bug 402123 that only two of these slaves are VMs. qm-winxp01. We'll need a clone of those two, but we'll also need new hardware to mirror the other slaves.

VMs:
* qm-centos5-01
* qm-winxp01

Phyiscal boxes (One's an Apple Xserve and the other is an HP DL360):
* qm-xserve01
* qm-win2k3-01
Assignee: ccooper → server-ops
Tossing back to coop, clone is in another bug.
Assignee: server-ops → ccooper
+'ing.  Setting P4.  Would like to have something here that tests for leaks.
Flags: blocking1.9? → blocking1.9+
Priority: P3 → P4
Blocks: 399096
Depends on: 403636
Damon: I'll be bumping this to P2 and working on it as soon as I have hardware (bug 403636).
Priority: P4 → P3
I have two of four VMs/machines to work with now, so I'm going to start.
Status: NEW → ASSIGNED
Priority: P3 → P2
(In reply to comment #15)
> Damon: I'll be bumping this to P2 and working on it as soon as I have hardware
> (bug 403636).
> 

Thanks the help.
dbaron's BloatTest2 in tinderbox measures/reports 3 things:

1) Lk: bytes malloc-ed and not free-d
2) MH: Max Heap
3) A:  Allocations

In buildbot, we run the following tests as part of the unittest process:

a) make check
b) reftests
c) MochiTest
d) Mochichrome
e) BrowserChrome

What's the matrix of these that we want for this bug? "All" is a valid answer.
I think this bug is asking for 1-3 *plus* the nsTraceRefcnt bloat test (RLk), for b, c, d, and e.  (Though 3 really isn't particularly useful, and could easily be dropped if we need the space, but it's an easy side-effect of getting 1 and 2.)
The CentOS box is up:

http://qm-unittest02:2006

I fiddled with turning on the Mac OSX and WinXP boxes tonight, but I'm getting some weird errors.

To do:
* standardize tinderbox reporting between tests
* diagnose Mac and Windows XP connection errors
* setup Win 2K3 leak test slave
* land createTestingProfile.py script in a suitable place (bug 393410)
* fix runtests.pl to allow for malloc and shutdown leak logging
Coop, any updates here? This would be extra great to get running now that we have crashtest up and running and can host tests for assertions and leaks.
Mac and WinXP are up now too at: 

http://10.2.73.58:2006/ (internal only right now, sorry)

What's the end goal here, presentation-wise? Do I need to get these results into Tinderbox somehow or can they live here?

Note that CentOS is currently crashing in mochitest:

http://10.2.73.58:2006/Linux%20qm-leak-centos5-01%20dep%20Leak%2BUnit%20test/builds/445/step-mochitest%20leak/0 

...and WinXP is failing and exiting in mochitest:

http://10.2.73.58:2006/WinXP%20qm-leak-winxp01%20dep%20Leak%2BUnit%20test/builds/149/step-mochitest%20leak/0

To do:
* tinderbox reporting?
* setup Win 2K3 leak test slave and get IT to take an image
* have runtests.pl patch reviewed
Internal to what network?  I can't access it using the building S wireless network 'MozillaCorp'.
(In reply to comment #23)
> Internal to what network?

MPT VPN
The goal is to have this on the main tinderbox page, and at least reporting the amount of memory leaked, i.e. both the RLk and the Lk numbers of the current tinderboxes.

I guess since we'd shut down the browser between each test (right?) we'd need to add up the amount of leaked memory from all runs.

Ultimately I think we should have one box for each platform, and have them all run the full suite of tests. But a great first step would be to have just one platform up, running as many as you can of mochitest, reftest and crashtest.

Can you post the crash and exit logs for mochitest somewhere where it's more easily reachable? Ideally this is when tinderbox should be going orange :)

Is the runtest.pl patch attached anywhere?
(In reply to comment #25)
> Can you post the crash and exit logs for mochitest somewhere where it's more
> easily reachable? Ideally this is when tinderbox should be going orange :)

I'll just turn on the Tinderbox mailer again and send it to MozillaTest tree for now. If people could help diagnose the CentOS and WinXP failures, that would that would be very helpful. 

These slaves are named qm-leak* are are just appearing on the MozillaTest tree now.

> Is the runtest.pl patch attached anywhere?

I was going to file a separate bug, but I suppose I can do that here. Patch forthcoming.
I don't think it's necessary to shut down the browser between each test on Tinderbox.  We don't do that for our existing leak tests and we don't do that for "loading this page causes a crash on shutdown" regression tests.

It would be good to have a way to do that locally, though.
What I mean is that we'll likely shut down the browser between test suites, not individual test cases. So we'll run mochitest, shut down, run reftest, shut down, etc. Of course ideal would be if we didn't do that, but that's unlikely worth the extra work.
This patch allows me to set the (e.g.) following command-line args for runtests.pl and have them passed on to the browser for leak logging:

 --browser-arg=--trace-malloc=logs/mochitest-malloc.log --browser-arg=--shutdown-leaks=logs/mochitest-shutdownleaks.log

There args are handled in a similar way to the existing setenv option in runtests.pl. 

Any future browser args we need could also be passed thru using this option.
Attachment #293879 - Flags: review?(sayrer)
Comment on attachment 293879 [details] [diff] [review]
Pass browser args from the command line thru runtests.pl to the browser

Haven't seen sayrer around, so switching review request.
Attachment #293879 - Flags: review?(sayrer) → review?(jonas)
Comment on attachment 293879 [details] [diff] [review]
Pass browser args from the command line thru runtests.pl to the browser

Fix the tab in sub main. r=me with that
Attachment #293879 - Flags: review?(jonas) → review+
Landed with tab fix.

Checking in runtests.pl.in;
/cvsroot/mozilla/testing/mochitest/runtests.pl.in,v  <--  runtests.pl.in
new revision: 1.34; previous revision: 1.33
done
Depends on: 372581
These bugs are all up-and-running now and reporting to the MozillaTest tinderbox tree (qm-leak*) until they become stable enough to live elsewhere.

Specific bugs can be filed against the various machines from here out.
Status: ASSIGNED → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
No longer blocks: mlkTests
Moving a bunch of Core :: Testing bugs to Testing :: Mochitest to clear out the former, which is obsolete now that we have more specialized categories for such bugs; filter on the string "MochitestMmMm" to delete all these notifications.
Component: Testing → Mochitest
Flags: blocking1.9+
Product: Core → Testing
QA Contact: testing → mochitest
Version: Trunk → unspecified
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: