Closed Bug 801911 Opened 11 years ago Closed 6 years ago

Run jstests and jit-tests in Valgrind tbpl builds on test slaves

Categories

(Release Engineering :: General, defect, P3)

All
Linux
defect

Tracking

(Not tracked)

RESOLVED INCOMPLETE

People

(Reporter: gkw, Unassigned)

References

(Depends on 1 open bug)

Details

(Keywords: valgrind, Whiteboard: [valgrind])

Attachments

(2 files, 5 obsolete files)

We should also expand to running jstests and jit-tests in Valgrind tbpl builds.
Keywords: valgrind
Whiteboard: [valgrind]
On the test slaves I assume?
Priority: -- → P3
(In reply to Chris AtLee [:catlee] from comment #1)
> On the test slaves I assume?

Yes.
Summary: Run jstests and jit-tests in Valgrind tbpl builds → Run jstests and jit-tests in Valgrind tbpl builds on test machines
Summary: Run jstests and jit-tests in Valgrind tbpl builds on test machines → Run jstests and jit-tests in Valgrind tbpl builds on test slaves
Attached file jstests run log (obsolete)
time python jstests.py -j1 -t 180 --valgrind --valgrind-args="--dsymutil=yes --smc-check=all-non-file --error-exitcode=77  --leak-check=full --gen-suppressions=all --show-possibly-lost=no --track-orns=yes --suppressions=/Users/fuzz3/trees/mozilla-central/build/valgrind/cross-architecture.sup --suppressions=/Users/fuzz3/fuzzing/known/mozilla-central/valgrind.txt" -s -d ../../../obj-ff-64-opt-mc/dist/Nigh.app/Contents/MacOS/js 2>&1 | tee ~/Desktop/jstestsVg-90cea19e27e2.txt


I might need to set a somewhat longer timeout.
> time python jstests.py -j1 -t 180 --valgrind --valgrind-args="--dsymutil=yes
> --smc-check=all-non-file --error-exitcode=77  --leak-check=full
> --gen-suppressions=all --show-possibly-lost=no --track-orns=yes
> --suppressions=/Users/fuzz3/trees/mozilla-central/build/valgrind/cross-
> architecture.sup
> --suppressions=/Users/fuzz3/fuzzing/known/mozilla-central/valgrind.txt" -s
> -d ../../../obj-ff-64-opt-mc/dist/Nigh.app/Contents/MacOS/js 2>&1 | tee
> ~/Desktop/jstestsVg-90cea19e27e2.txt
> 

Actual command:

time python jstests.py -j1 -t 180 --valgrind --valgrind-args="--dsymutil=yes --smc-check=all-non-file --error-exitcode=77  --leak-check=full --gen-suppressions=all --show-possibly-lost=no --track-orns=yes --suppressions=/Users/fuzz3/trees/mozilla-central/build/valgrind/cross-architecture.sup --suppressions=/Users/fuzz3/fuzzing/known/mozilla-central/valgrind.txt" -s -d ../../../obj-ff-64-opt-mc/dist/Nightly.app/Contents/MacOS/js 2>&1 | tee ~/Desktop/jstestsVg-90cea19e27e2.txt
Corrected again:

time python jstests.py -j1 -t 180 --valgrind --valgrind-args="--dsymutil=yes --smc-check=all-non-file --error-exitcode=77 --leak-check=full --gen-suppressions=all --show-possibly-lost=no --track-origins=yes --suppressions=/Users/fuzz3/trees/mozilla-central/build/valgrind/cross-architecture.sup --suppressions=/Users/fuzz3/fuzzing/known/mozilla-central/valgrind.txt" -s -d ../../../obj-ff-64-opt-mc/dist/Nightly.app/Contents/MacOS/js 2>&1 | tee ~/Desktop/jstestsVg-90cea19e27e2.txt

(This took about 11 hours to complete on a Mac)

I'm now retrying with a timeout of 1200 seconds, since there were some timeouts in the results.
time python jstests.py -j1 -t 1200 --valgrind --valgrind-args="--dsymutil=yes --smc-check=all-non-file --error-exitcode=77 --leak-check=full --gen-suppressions=all --show-possibly-lost=no --track-origins=yes --suppressions=/Users/fuzz3/trees/mozilla-central/build/valgrind/cross-architecture.sup --suppressions=/Users/fuzz3/fuzzing/known/mozilla-central/valgrind.txt" -s -d ../../../obj-ff-64-opt-mc/dist/Nightly.app/Contents/MacOS/js 2>&1 | tee ~/Desktop/2jstestsVg-90cea19e27e2.txt

This took about 14 hours; note that this was -j1 on Mac. Running with more than -j1 on Mac seems to cause race conditions on the .dSYM folders, so -j2 and above should be run on Linux instead. (I haven't tried running on Linux yet, though.)
Attachment #680370 - Attachment is obsolete: true
Valgrind is generally faster and more reliable on Linux than on Mac, and mochitests stress Valgrind hard.
time python jstests.py -j1 -t 1200 --valgrind --valgrind-args="--dsymutil=yes --smc-check=all-non-file --error-exitcode=77 --leak-check=full --gen-suppressions=all --show-possibly-lost=no --suppressions=/Users/fuzz3/trees/mozilla-central/build/valgrind/cross-architecture.sup --suppressions=/Users/fuzz3/fuzzing/known/mozilla-central/valgrind.txt" -s -d ../../../obj-ff-64-opt-mc/dist/Nightly.app/Contents/MacOS/js 2>&1 | tee ~/Desktop/3noTrackOriginsjstestsVg-90cea19e27e2.txt

This took about 510 mins (8.5 hrs) without --track-origins=yes.

With --track-origins=yes the same run took about 840 mins (14 hours), so enabling --track-origins=yes makes it about 65% longer.

This was tested with -j1 on Mac.
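The 65% figure is the ratio of the two wall-clock times. A quick check of that arithmetic, using only the two run times reported above (the helper name is illustrative):

```python
def percent_longer(with_flag_min, without_flag_min):
    """Percentage by which the slower run exceeds the faster one."""
    return (with_flag_min / without_flag_min - 1) * 100

# jstests under Valgrind, -j1 on Mac: 840 min with --track-origins=yes,
# 510 min without it.
print(round(percent_longer(840, 510)))
```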
Results from log in comment 8:

REGRESSIONS
    js1_8_1/regress/regress-452498-135-a.js
TIMEOUTS
    js1_8_1/extensions/regress-477187.js
    js1_8_1/regress/regress-479430-01.js
    js1_8_1/regress/regress-479430-02.js
    js1_8_1/regress/regress-479430-03.js
    js1_8_5/regress/regress-620376-1.js

I filed bug 810753 on the test failure in js1_8_1/regress/regress-452498-135-a.js. It is not exactly a test failure: the test expects exit code 6, which conflicts with --error-exitcode=77.
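Valgrind returns the value given to --error-exitcode only when it reports errors; otherwise the program's own exit code passes through. A minimal sketch of how a harness could keep a test-specified exit code (such as 6) apart from Valgrind's 77. The classify function and its result labels are hypothetical and are not how jstests.py actually handles this:

```python
# Mirrors --error-exitcode=77 in the commands in this bug.
VALGRIND_ERROR_EXITCODE = 77

def classify(returncode, expected_returncode=0):
    """Hypothetical result classifier for a test run under Valgrind."""
    # Check 77 first so that a Memcheck error is never mistaken for the
    # exit code the test itself is expected to produce.
    if returncode == VALGRIND_ERROR_EXITCODE:
        return "valgrind-error"
    if returncode == expected_returncode:
        return "pass"
    return "fail"

# A test expected to exit with 6, like regress-452498-135-a.js:
print(classify(6, expected_returncode=6))   # clean run under Valgrind
print(classify(77, expected_returncode=6))  # Valgrind reported errors
```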

I'm not yet sure whether the timeouts will all go away if we keep increasing the timeout. I should probably retest on Linux.
Are jstests run via js shell or xpcshell on tbpl?
(In reply to Gary Kwong [:gkw, :nth10sd] from comment #8)
> With --track-origins=yes taking about 840 mins (14 hours), enabling
> --track-origins=yes will take about 65% longer.

--track-origins=yes doesn't increase Memcheck's ability to detect problems.
It only improves the diagnostic information presented when an uninitialised
value is detected.

So I'd suggest running without it.  My view was always that --track-origins=yes
is something to use once you know you have a problem and are going about
debugging it.
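Since the runs in this bug differ mainly in -j, -t, --track-origins=yes and the suppression files, a small command builder keeps origin tracking off by default and switches it on only when debugging. This is a sketch under assumptions: build_jstests_cmd is a hypothetical helper, not part of jstests.py; the Valgrind and jstests.py flags are copied from the commands above.

```python
import shlex

def build_jstests_cmd(js_shell, suppressions, jobs=1, timeout=1200,
                      track_origins=False):
    """Hypothetical helper: return a jstests.py-under-Valgrind command line."""
    vg_args = [
        "--dsymutil=yes",
        "--smc-check=all-non-file",
        "--error-exitcode=77",
        "--leak-check=full",
        "--gen-suppressions=all",
        "--show-possibly-lost=no",
    ]
    # Origin tracking improves diagnostics for uninitialised-value reports
    # but detects no additional errors, so it is opt-in.
    if track_origins:
        vg_args.append("--track-origins=yes")
    vg_args += ["--suppressions=" + path for path in suppressions]
    argv = ["python", "jstests.py", f"-j{jobs}", "-t", str(timeout),
            "--valgrind", "--valgrind-args=" + " ".join(vg_args),
            "-s", "-d", js_shell]
    # shlex.join (Python 3.8+) quotes the space-separated --valgrind-args.
    return shlex.join(argv)

print(build_jstests_cmd(
    "../../../obj-ff-64-opt-mc-basedOn-9de611848111/dist/bin/js",
    ["/home/fuzz2lin/trees/mozilla-central/build/valgrind/cross-architecture.sup"],
    jobs=8))
```

The printed string can be pasted into a shell and piped through `2>&1 | tee <log>` as in the commands in this bug.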
> --track-origins=yes doesn't increase Memcheck's ability to detect problems.
> It only improves the diagnostic information presented when an uninitialised
> value is detected.

True, I guess I was only worried about the situation when we had a non-reproducible uninitialised value error case, but they are somewhat rare.
(In reply to Gary Kwong [:gkw, :nth10sd] from comment #10)
> Are jstests run via js shell or xpcshell on tbpl?

Neither: they are run in the browser.  The test runner passes the flag -reftest, the location of the manifest, and a specially crafted profile to firefox.  This causes firefox to enter a special testing mode which knows how to parse a manifest and run the tests it finds.
time python -u jit-test/jit_test.py --no-slow --tbpl -t 1200 --valgrind-all ../../obj-ff-64-opt-mc/dist/Nightly.app/Contents/MacOS/js 2>&1 | tee ~/Desktop/jittestVg-90cea19e27e2.txt

This relies on the patch in bug 810767.
> Created attachment 681347 [details]
> jit-test run log - now with timeout of 1200s (20mins)

This run took:

real    2576m15.311s
user    2509m48.824s
sys     52m58.068s

or about 43 hours or about 1.8 days to complete.
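The conversion from bash `time` output to hours and days is plain arithmetic. A small helper (hypothetical, not part of jit_test.py) that parses a field such as 2576m15.311s:

```python
import re

def time_field_to_hours(value):
    """Convert a bash `time` field such as '2576m15.311s' to hours."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", value)
    if match is None:
        raise ValueError("unexpected time format: " + value)
    minutes, seconds = int(match.group(1)), float(match.group(2))
    return (minutes * 60 + seconds) / 3600

hours = time_field_to_hours("2576m15.311s")  # "real" from the Mac run
print(f"{hours:.1f} hours = {hours / 24:.2f} days")
```

The same helper applied to the later Linux run's 1918m22.807s gives roughly 32 hours, matching the "about 1.3 days" reported for it.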
(In reply to Gary Kwong [:gkw, :nth10sd] from comment #15)
> > Created attachment 681347 [details]
> > jit-test run log - now with timeout of 1200s (20mins)
> 
> This run took:
> 
> real    2576m15.311s
> user    2509m48.824s
> sys     52m58.068s
> 
> or about 43 hours or about 1.8 days to complete.

(on a Mac Mini running Mac OS X 10.7.x)
(In reply to Gary Kwong [:gkw, :nth10sd] from comment #15)
> or about 43 hours or about 1.8 days to complete.

That's terrible.  How long does it take natively?  For single-threaded
code compiled at -O or above, I wouldn't expect to see a slowdown
factor of more than about 25x with Memcheck.
> That's terrible.  How long does it take natively?  For single-threaded
> code compiled at -O or above, I wouldn't expect to see a slowdown
> factor of more than about 25x with Memcheck.

I'm guessing running natively takes <= 1 hour, but I haven't tried it yet.
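A rough check of the slowdown, using the 2576m15.311s Valgrind run time: if a native run takes at most 60 minutes (the guess above), the slowdown is at least about 43x. Conversely, the 25x factor mentioned above would correspond to a native run of roughly 103 minutes. The function names are illustrative:

```python
# Wall-clock minutes for the Mac jit-test run under Valgrind (2576m15.311s).
VALGRIND_MINUTES = 2576 + 15.311 / 60

def slowdown(native_minutes):
    """Slowdown factor of the Valgrind run relative to a native run."""
    return VALGRIND_MINUTES / native_minutes

def native_minutes_for(slowdown_factor):
    """Native run time implied by a given slowdown factor."""
    return VALGRIND_MINUTES / slowdown_factor

print(round(slowdown(60)))            # assuming a 60-minute native run
print(round(native_minutes_for(25)))  # native time implied by 25x
```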

Two things I spotted at a quick glance of the log:

(1) The error messages are repeated 10 times per test. Is each test being run 10 times?
(2) Towards the middle of the log, the time taken "resets" from 86396.4s to 1.1s near 56%. Why is this happening?
For me, I think it takes more than an hour to run natively with --tbpl and --no-slow.

(1) I think --tbpl tests ten different configurations, whereas the default tests two.  Are there different flags in front of the test in each of the 10 log entries?
(2) The progress bar can only display 5 digits and things get weird around that barrier.  I forget the details, but the code in progressbar.py that was copied into jstests from jittests was extremely buggy in that case.
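The reset described above happened just below 86400 s, which is exactly one day. One mechanism consistent with that number, offered only as an assumption because the progressbar.py code is not shown in this bug, is reading `timedelta.seconds` (the seconds within the current day, excluding `days`) instead of `timedelta.total_seconds()`:

```python
from datetime import timedelta

# Elapsed time just over one day.
elapsed = timedelta(seconds=86401.1)

# .seconds excludes whole days, so a display built from it wraps to ~1.1 s.
wrapped = elapsed.seconds + elapsed.microseconds / 1e6
# .total_seconds() keeps the full elapsed time.
full = elapsed.total_seconds()

print(wrapped, full)
```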
> (1) I think --tbpl tests ten different configurations, whereas the default
> tests two.  Are there different flags in front of the test in each of the 10
> log entries?

We'd like to replicate tbpl behaviour - how do I tell if there are "different flags in front of the test in each of the 10 log entries"? The log in

https://bug801911.bugzilla.mozilla.org/attachment.cgi?id=681347

doesn't seem to show those flags.

> (2) The progress bar can only display 5 digits and things get weird around
> that barrier.  I forget the details, but the code in progressbar.py that was
> copied into jstests from jittests was extremely buggy in that case.

Do you mean code in progressbar.py that was copied into *jittests* from *jstests* instead? I think jstests is more old-school than jittests, isn't it?
(In reply to Gary Kwong [:gkw, :nth10sd] from comment #20)
> > (1) I think --tbpl tests ten different configurations, whereas the default
> > tests two.  Are there different flags in front of the test in each of the 10
> > log entries?
> 
> We'd like to replicate tbpl behaviour - how do I tell if there are
> "different flags in front of the test in each of the 10 log entries"? The
> log in
> 
> https://bug801911.bugzilla.mozilla.org/attachment.cgi?id=681347
> 
> doesn't seem to show those flags.

Ah, sorry, I thought you meant the failure log, not the raw test output.  I don't know where you can find the list then.
 
> > (2) The progress bar can only display 5 digits and things get weird around
> > that barrier.  I forget the details, but the code in progressbar.py that was
> > copied into jstests from jittests was extremely buggy in that case.
> 
> Do you mean code in progressbar.py that was copied into *jittests* from
> *jstests* instead? I think jstests is more old-school than jittests, isn't
> it?

I could be wrong, but I thought that the progressbar was backported from the newer, nicer-at-the-time jittests suite.  In any case, the code for both progressbars was basically identical when I started hacking on the jstest suite.
FAILURES:
    --no-jm /home/fuzz2lin/trees/mozilla-central/js/src/jit-test/tests/debug/Script-sourceMapURL.js
    --ion-eager /home/fuzz2lin/trees/mozilla-central/js/src/jit-test/tests/debug/Script-sourceMapURL.js
    --no-ion --no-jm --no-ti /home/fuzz2lin/trees/mozilla-central/js/src/jit-test/tests/debug/Script-sourceMapURL.js
    --no-ion --no-ti /home/fuzz2lin/trees/mozilla-central/js/src/jit-test/tests/debug/Script-sourceMapURL.js
    --no-ion --no-ti -a -d /home/fuzz2lin/trees/mozilla-central/js/src/jit-test/tests/debug/Script-sourceMapURL.js
    --no-ion --no-jm /home/fuzz2lin/trees/mozilla-central/js/src/jit-test/tests/debug/Script-sourceMapURL.js
    --no-ion /home/fuzz2lin/trees/mozilla-central/js/src/jit-test/tests/debug/Script-sourceMapURL.js
    --no-ion -a /home/fuzz2lin/trees/mozilla-central/js/src/jit-test/tests/debug/Script-sourceMapURL.js
    --no-ion -a -d /home/fuzz2lin/trees/mozilla-central/js/src/jit-test/tests/debug/Script-sourceMapURL.js
    --no-ion -d /home/fuzz2lin/trees/mozilla-central/js/src/jit-test/tests/debug/Script-sourceMapURL.js
    --ion-eager /home/fuzz2lin/trees/mozilla-central/js/src/jit-test/tests/ion/bug678625.js
    --ion-eager /home/fuzz2lin/trees/mozilla-central/js/src/jit-test/tests/ion/bug679493-2.js
    --ion-eager /home/fuzz2lin/trees/mozilla-central/js/src/jit-test/tests/ion/bug679493.js
    --ion-eager /home/fuzz2lin/trees/mozilla-central/js/src/jit-test/tests/ion/bug680619.js
    --ion-eager /home/fuzz2lin/trees/mozilla-central/js/src/jit-test/tests/jaeger/recompile/staticoverflow.js
TIMEOUTS:

real    1918m22.807s
user    1851m49.540s
sys     53m5.503s


Ran this on Linux with --track-origins=yes; it takes about 1.3 days (jit-test is not multithreaded; see bug 638219).

Will file bugs later.
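With "-q" removed, each failure line in the log above carries its flags, so the ten entries for Script-sourceMapURL.js can be seen to be ten different flag combinations. A sketch that groups such lines by test, assuming the format shown above (flags, then the test path, one per line); group_failures is a hypothetical helper:

```python
from collections import defaultdict

def group_failures(lines):
    """Map each failing test path to the flag sets it failed under."""
    by_test = defaultdict(list)
    for line in lines:
        parts = line.split()
        # Skip headers such as "FAILURES:" and blank lines.
        if not parts or not parts[-1].endswith(".js"):
            continue
        by_test[parts[-1]].append(" ".join(parts[:-1]) or "(no flags)")
    return dict(by_test)

sample = """\
FAILURES:
    --no-jm /home/fuzz2lin/trees/mozilla-central/js/src/jit-test/tests/debug/Script-sourceMapURL.js
    --ion-eager /home/fuzz2lin/trees/mozilla-central/js/src/jit-test/tests/debug/Script-sourceMapURL.js
    --ion-eager /home/fuzz2lin/trees/mozilla-central/js/src/jit-test/tests/ion/bug678625.js
"""
for test, flag_sets in group_failures(sample.splitlines()).items():
    print(len(flag_sets), test)
```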
Comment on attachment 681347 [details]
Mac jit-test run log - now with timeout of 1200s (20mins) (non-verbose and without --track-origins=yes)

This is no longer useful because this is non-verbose (does not show the CLI arguments for failing tests) and does not use --track-origins=yes. We seem to have some conditional jump errors that need that Valgrind flag to pinpoint the problem.
Attachment #681347 - Attachment description: jit-test run log - now with timeout of 1200s (20mins) → Mac jit-test run log - now with timeout of 1200s (20mins) (non-verbose and without --track-origins=yes)
Attachment #681347 - Attachment is obsolete: true
(In reply to Gary Kwong [:gkw] from comment #22)
> Created attachment 693096 [details]
> Linux jit-test run log - now with timeout of 1200s (20mins)

Command used was:

time python -u js/src/jit-test/jit_test.py --no-slow --tbpl -t 1200 --valgrind-all /home/fuzz2lin/Desktop/jsfunfuzz-dbg-64-mozilla-central-115787-edd45de440ba-Jc33kF/js-opt-64-edd45de440ba-linux 2>&1 | tee ~/Desktop/jittestVgWithTrackOrigins-edd45de440ba.txt
> Command used was:

Note to self: in a local patch, I also removed "-q" (to see which testcase and which CLI arguments were used), added "--track-origins=yes", and added suppression parameters for both the in-tree Valgrind suppression file and my local one.
Command used:

cd /home/fuzz2lin/trees/mozilla-central/js/src/tests ;
export LD_LIBRARY_PATH=/home/fuzz2lin/trees/mozilla-central/obj-ff-64-opt-mc-basedOn-9de611848111/dist/bin/ ;
time python jstests.py -j5 -t 1200 --valgrind --valgrind-args="--dsymutil=yes --smc-check=all-non-file --error-exitcode=77 --leak-check=full --gen-suppressions=all --show-possibly-lost=no --suppressions=/home/fuzz2lin/trees/mozilla-central/build/valgrind/cross-architecture.sup --suppressions=/home/fuzz2lin/trees/mozilla-central/build/valgrind/x86_64-redhat-linux-gnu.sup --suppressions=/home/fuzz2lin/fuzzing/known/mozilla-central/valgrind.txt" -s -d ../../../obj-ff-64-opt-mc-basedOn-9de611848111/dist/bin/js 2>&1 | tee ~/Desktop/noTrackOriginsjstestsVg-9de611848111.txt

Using -j8 on a 4-core Mac mini (8 threads with hyperthreading) running Ubuntu Linux 12.10 took about 52 mins.

Using -j5 took about 74 mins. (I did both runs just to make sure nothing bad showed up.)

The jstests look okay, no bugs found.
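Comparing the -j5 (74 min) and -j8 (52 min) Linux runs: the observed speedup is about 1.42x, below the 1.6x that perfect scaling with job count would give. That is plausibly because the machine has only 4 physical cores, so jobs beyond 4 run on hyperthreads. The arithmetic, with an illustrative helper name:

```python
def speedup(slower_minutes, faster_minutes):
    """How many times faster the second run finished than the first."""
    return slower_minutes / faster_minutes

observed = speedup(74, 52)  # -j5 vs -j8 wall-clock minutes
ideal = 8 / 5               # if run time scaled perfectly with job count
print(f"observed {observed:.2f}x, ideal {ideal:.2f}x")
```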
Attachment #680448 - Attachment is obsolete: true
Attachment #680496 - Attachment is obsolete: true
-j8 command:

cd /home/fuzz2lin/trees/mozilla-central/js/src/tests ;
export LD_LIBRARY_PATH=/home/fuzz2lin/trees/mozilla-central/obj-ff-64-opt-mc-basedOn-9de611848111/dist/bin/ ;
time python jstests.py -j8 -t 1200 --valgrind --valgrind-args="--dsymutil=yes --smc-check=all-non-file --error-exitcode=77 --leak-check=full --gen-suppressions=all --show-possibly-lost=no --suppressions=/home/fuzz2lin/trees/mozilla-central/build/valgrind/cross-architecture.sup --suppressions=/home/fuzz2lin/trees/mozilla-central/build/valgrind/x86_64-redhat-linux-gnu.sup --suppressions=/home/fuzz2lin/fuzzing/known/mozilla-central/valgrind.txt" -s -d ../../../obj-ff-64-opt-mc-basedOn-9de611848111/dist/bin/js 2>&1 | tee ~/Desktop/noTrackOriginsjstestsVg-j8-9de611848111.txt

These results are without --track-origins=yes.
Comment on attachment 694119 [details] [diff] [review]
Patch for jit-test local changes

I put it here in the bug for archival purposes.
Attachment #694119 - Attachment is obsolete: true
Product: mozilla.org → Release Engineering
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → INCOMPLETE
Component: General Automation → General