Use trace-refcnt and trace-malloc to check for leaks while running mochitest/reftest

RESOLVED FIXED

Status

P2
normal
RESOLVED FIXED
12 years ago
10 years ago

People

(Reporter: sicking, Assigned: coop)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(1 attachment)

It would be great to enable refcnt logging while running the mochitest and reftest test suites. This would be a much better test base than the handful of pages that we currently run. An additional benefit would be that we early get some testing of new features that are added.

Extra nice would be if we could make the test harness run the tests one at a time and shut down the browser in between. This would allow us to see which test that leaks. This could definitely be done as a second step though.

Comment 1

12 years ago
bug 393993 is about adding simpler leak-gauge.pl-based testing and it also tracks the current leaks in the test suite which need to be fixed (or ignored by the leak test harness) before we make the tests fail on leaks.
Depends on: 393993
I'm not sure shutting down in between each page is especially worthwhile.  It'll slow down the tests tremendously, and as the default behavior will be PASS, it should be obvious when a newly-committed group of tests causes leaks.  At that point, the committer of the tests should back them out and be able to determine what leaked.

The time spent by everyone waiting for tests to complete is worth more than the time of the person who committed the leaky tests.  Mochitest runs should optimize for speed, not for determining which of a set of (possibly only one!) tests leaks.

Comment 3

12 years ago
I think that by default all tests should be run in one go, but it should be easy for the developer to run all the tests one by one locally -- so that he can easily determine which test started to leak due to his patch.

Updated

12 years ago
Flags: blocking1.9?
just to capture some of the discussion from last week's leaks meeting, this requires debug builds with --enable-trace-malloc. Any other build options required or processing to be done after the runs? 
Assignee: nobody → ccooper
Yeah, we also need to collect the data output at browser shutdown and have it sent to tinderbox. We do that on the "Linux fxdbug-linux-tbox Dep" box right now.

Updated

12 years ago
Summary: Check for refcnt leaks while running mochitest/reftest → Use trace-refcnt and trace-malloc to check for leaks while running mochitest/reftest
(Assignee)

Comment 6

12 years ago
Do we have a preference as to which platform gets tackled first for this?, i.e. is the OS for this bug accurate as WinXP?
we'll probably want to do all platforms. I usually start with linux for these setups and then move to mac and windows, but that's up to you.
OS: Windows XP → All
Hardware: PC → All
Most leaks are cross platform so whatever is easiest for you to start with would be great! Thanks a lot for your help here!
(Assignee)

Updated

12 years ago
Priority: -- → P3
(Assignee)

Comment 9

12 years ago
Re-assigning to server-ops to get the slaves cloned for this. Please assign it back once you're done.

Note that I just filed bug 402123 to have the slaves cloned for the staging environment, so feel free to do these simultaneously if that makes sense.
Assignee: ccooper → server-ops
Depends on: 402123
Can you list off for me which VMs need to be cloned?  I'm not seeing anything obvious from reading this bug report.  Thanks!
Assignee: server-ops → ccooper
Initially qm-centos5-01 and qm-winxp01. Let us know when you're ready to clone 'em and we can bring them down quietly and without a struggle.
(Assignee)

Comment 12

12 years ago
Dave: mrz indicated in bug 402123 that only two of these slaves are VMs. qm-winxp01. We'll need a clone of those two, but we'll also need new hardware to mirror the other slaves.

VMs:
* qm-centos5-01
* qm-winxp01

Phyiscal boxes (One's an Apple Xserve and the other is an HP DL360):
* qm-xserve01
* qm-win2k3-01
Assignee: ccooper → server-ops
Tossing back to coop, clone is in another bug.
Assignee: server-ops → ccooper
+'ing.  Setting P4.  Would like to have something here that tests for leaks.
Flags: blocking1.9? → blocking1.9+
Priority: P3 → P4

Updated

12 years ago
Blocks: 399096
(Assignee)

Updated

12 years ago
Depends on: 403636
(Assignee)

Comment 15

12 years ago
Damon: I'll be bumping this to P2 and working on it as soon as I have hardware (bug 403636).
Priority: P4 → P3
(Assignee)

Comment 16

12 years ago
I have two of four VMs/machines to work with now, so I'm going to start.
Status: NEW → ASSIGNED
Priority: P3 → P2
(In reply to comment #15)
> Damon: I'll be bumping this to P2 and working on it as soon as I have hardware
> (bug 403636).
> 

Thanks the help.
(Assignee)

Comment 18

12 years ago
dbaron's BloatTest2 in tinderbox measures/reports 3 things:

1) Lk: bytes malloc-ed and not free-d
2) MH: Max Heap
3) A:  Allocations

In buildbot, we run the following tests as part of the unittest process:

a) make check
b) reftests
c) MochiTest
d) Mochichrome
e) BrowserChrome

What's the matrix of these that we want for this bug? "All" is a valid answer.
I think this bug is asking for 1-3 *plus* the nsTraceRefcnt bloat test (RLk), for b, c, d, and e.  (Though 3 really isn't particularly useful, and could easily be dropped if we need the space, but it's an easy side-effect of getting 1 and 2.)
(Assignee)

Comment 20

11 years ago
The CentOS box is up:

http://qm-unittest02:2006

I fiddled with turning on the Mac OSX and WinXP boxes tonight, but I'm getting some weird errors.

To do:
* standardize tinderbox reporting between tests
* diagnose Mac and Windows XP connection errors
* setup Win 2K3 leak test slave
* land createTestingProfile.py script in a suitable place (bug 393410)
* fix runtests.pl to allow for malloc and shutdown leak logging
Coop, any updates here? This would be extra great to get running now that we have crashtest up and running and can host tests for assertions and leaks.
(Assignee)

Comment 22

11 years ago
Mac and WinXP are up now too at: 

http://10.2.73.58:2006/ (internal only right now, sorry)

What's the end goal here, presentation-wise? Do I need to get these results into Tinderbox somehow or can they live here?

Note that CentOS is currently crashing in mochitest:

http://10.2.73.58:2006/Linux%20qm-leak-centos5-01%20dep%20Leak%2BUnit%20test/builds/445/step-mochitest%20leak/0 

...and WinXP is failing and exiting in mochitest:

http://10.2.73.58:2006/WinXP%20qm-leak-winxp01%20dep%20Leak%2BUnit%20test/builds/149/step-mochitest%20leak/0

To do:
* tinderbox reporting?
* setup Win 2K3 leak test slave and get IT to take an image
* have runtests.pl patch reviewed

Comment 23

11 years ago
Internal to what network?  I can't access it using the building S wireless network 'MozillaCorp'.
(In reply to comment #23)
> Internal to what network?

MPT VPN
The goal is to have this on the main tinderbox page, and at least reporting the amount of memory leaked, i.e. both the RLk and the Lk numbers of the current tinderboxes.

I guess since we'd shut down the browser between each test (right?) we'd need to add up the amount of leaked memory from all runs.

Ultimately I think we should have one box for each platform, and have them all run the full suite of tests. But a great first step would be to have just one platform up, running as many as you can of mochitest, reftest and crashtest.

Can you post the crash and exit logs for mochitest somewhere where it's more easily reachable? Ideally this is when tinderbox should be going orange :)

Is the runtest.pl patch attached anywhere?
(Assignee)

Comment 26

11 years ago
(In reply to comment #25)
> Can you post the crash and exit logs for mochitest somewhere where it's more
> easily reachable? Ideally this is when tinderbox should be going orange :)

I'll just turn on the Tinderbox mailer again and send it to MozillaTest tree for now. If people could help diagnose the CentOS and WinXP failures, that would that would be very helpful. 

These slaves are named qm-leak* are are just appearing on the MozillaTest tree now.

> Is the runtest.pl patch attached anywhere?

I was going to file a separate bug, but I suppose I can do that here. Patch forthcoming.

Comment 27

11 years ago
I don't think it's necessary to shut down the browser between each test on Tinderbox.  We don't do that for our existing leak tests and we don't do that for "loading this page causes a crash on shutdown" regression tests.

It would be good to have a way to do that locally, though.
What I mean is that we'll likely shut down the browser between test suites, not individual test cases. So we'll run mochitest, shut down, run reftest, shut down, etc. Of course ideal would be if we didn't do that, but that's unlikely worth the extra work.
(Assignee)

Comment 29

11 years ago
This patch allows me to set the (e.g.) following command-line args for runtests.pl and have them passed on to the browser for leak logging:

 --browser-arg=--trace-malloc=logs/mochitest-malloc.log --browser-arg=--shutdown-leaks=logs/mochitest-shutdownleaks.log

There args are handled in a similar way to the existing setenv option in runtests.pl. 

Any future browser args we need could also be passed thru using this option.
Attachment #293879 - Flags: review?(sayrer)
(Assignee)

Comment 30

11 years ago
Comment on attachment 293879 [details] [diff] [review]
Pass browser args from the command line thru runtests.pl to the browser

Haven't seen sayrer around, so switching review request.
Attachment #293879 - Flags: review?(sayrer) → review?(jonas)
Comment on attachment 293879 [details] [diff] [review]
Pass browser args from the command line thru runtests.pl to the browser

Fix the tab in sub main. r=me with that
Attachment #293879 - Flags: review?(jonas) → review+
(Assignee)

Comment 32

11 years ago
Landed with tab fix.

Checking in runtests.pl.in;
/cvsroot/mozilla/testing/mochitest/runtests.pl.in,v  <--  runtests.pl.in
new revision: 1.34; previous revision: 1.33
done

Updated

11 years ago
Depends on: 372581
(Assignee)

Comment 33

11 years ago
These bugs are all up-and-running now and reporting to the MozillaTest tinderbox tree (qm-leak*) until they become stable enough to live elsewhere.

Specific bugs can be filed against the various machines from here out.
Status: ASSIGNED → RESOLVED
Last Resolved: 11 years ago
Resolution: --- → FIXED
No longer blocks: 408905
Moving a bunch of Core :: Testing bugs to Testing :: Mochitest to clear out the former, which is obsolete now that we have more specialized categories for such bugs; filter on the string "MochitestMmMm" to delete all these notifications.
Component: Testing → Mochitest
Flags: blocking1.9+
Product: Core → Testing
QA Contact: testing → mochitest
Version: Trunk → unspecified
You need to log in before you can comment on or make changes to this bug.