Closed
Bug 549224
Opened 14 years ago
Closed 1 year ago
[META] valgrind: mochitests-plain: fix all Memcheck-detectable memory errors (x64-Linux)
Categories
(Core :: General, defect)
RESOLVED
FIXED
People
(Reporter: jseward, Assigned: jseward)
Details
(Keywords: meta, valgrind)
Attachments
(2 files)
It would be nice to ship 1.9.3 with zero memory errors as detectable by run-time detection tools. As a starting point, this metabug covers errors detectable by Valgrind's Memcheck tool, for a complete run of mochitests-plain, on x86_64-Linux.

Using the techniques described at
https://developer.mozilla.org/en/Debugging_Mozilla_with_Valgrind#Tips_for_improving_performance_and_accuracy_of_Valgrind%27s.c2.a0Memcheck_tool
it is possible to complete a mochitest-plain run of Fx in less than 7 CPU hours on a fast machine with 3GB memory. This metabug tracks the individual bugs harvested from, or observable from, such runs.

Why x86_64-Linux and not a more common target? Because Valgrind can run Fx on that platform relatively fast (30 x slowdown), as per comments in the above-mentioned URL. Most of the bugs found are in cross-platform code, so this exercise is of value to all platforms. Future work may expand the set of platforms and tools for which this exercise is viable.

--- SETUP -----------------------------------------------

You must use the suppressions and mozconfig files attached to this bug.

Platform: Ubuntu 9.10 x86_64, but any recent 64-bit Linux would do.

Recent Valgrind trunk as per http://www.valgrind.org/downloads/repository.html. Don't use old versions, they are slower and less stable than trunk.

Make sure your /usr/include/valgrind/{valgrind,memcheck}.h are either installed from the Valgrind build, or are symlinks to it. The Fx build will need them for code-discard notifications.

I built Fx with vanilla FSF gcc-4.3.4 with "-g -O2". For unknown reasons (possible Valgrind bug) a build with gcc-4.4.x at -O2 segfaults when run on Valgrind, so don't use that. Note that gcc-4.4.1 is the default compiler on 9.10. Make sure you get a 64-bit build, not a 32-bit one.

Use the mozconfig file attached to this bug. The most critical things are to disable JEMalloc and to build at a high optimisation level.

Build; check the build works.
Make sure you have a local DBUS:

   killall dbus-daemon
   eval `dbus-launch` \
     && export DBUS_SESSION_BUS_ADDRESS && export DBUS_SESSION_BUS_PID

--- RUN -------------------------------------------------

Start up a VNC server -- I use 960 x 720 x 16-bit depth.

Run the entire suite using the following command. You will need to set DISPLAY correctly, and make --suppressions point to the attached .supp file.

   (DISPLAY=:1.0 make -C ff-opt mochitest-plain EXTRA_TEST_ARGS='--close-when-done --debugger=vTRUNK --debugger-args="--tool=memcheck --error-limit=no --stats=yes --vex-guest-chase-cond=yes --suppressions=/home/sewardj/MOZ/mochitest-mc.supp --trace-children=yes --child-silent-after-fork=yes '--trace-children-skip=/usr/bin/hg,/bin/rm,*/bin/certutil,*/bin/pk12util,*/bin/ssltunnel'"') 2>&1 | tee spew5-memcheck-jit-enabled

Progress is very non-linear. Most of the time goes in tests numbered 50000-65000 approximately. Once past the low 60ks it finishes quite rapidly (160k tests in total).

Peruse results. To rerun a specific test, for debugging purposes, add TEST_PATH to the above command line, eg:

   ... make -C ff-opt mochitest-plain \
       TEST_PATH=dom/tests/mochitest/whatwg/test_bug500328.html \
       EXTRA_TEST_ARGS=...

Mochitests prints the name of a test only after it's done. So if memcheck emits a bunch of complaints, look for the name immediately after those complaints, not before.
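Since a test's name appears only after its Memcheck complaints, attributing complaints to tests by eye is tedious on a log of this size. The following is a minimal sketch of automating that attribution, not part of the original procedure. The function name blame_next_test is hypothetical, the list of Memcheck report headers it recognises is not exhaustive, and the regex for the line carrying a test name is an assumption the caller must supply, because the exact mochitest log format is not given here.

```shell
# blame_next_test NAME_RE
# Reads a combined mochitest/Memcheck log on stdin. For every line matching
# NAME_RE (the line on which a finished test's name is printed; the pattern
# is supplied by the caller and is an assumption about the log format), it
# prints "<count><TAB><line>" when Memcheck reports were seen since the
# previous name line. Only a few common report headers are counted.
blame_next_test() {
    awk -v name_re="$1" '
        /^==[0-9]+== (Conditional jump or move depends on uninitialised value|Use of uninitialised value|Invalid (read|write) of size|Syscall param )/ {
            pending++      # a new top-level Memcheck report begins here
            next
        }
        $0 ~ name_re {
            if (pending > 0) printf "%d\t%s\n", pending, $0
            pending = 0    # later reports belong to the next test name
        }
    '
}
```

Usage would look like blame_next_test 'REGEX' < spew5-memcheck-jit-enabled, where REGEX is whatever matches the test-name lines in your log. Stack-frame lines ("at 0x..." and "by 0x...") are deliberately not counted, so each report contributes once.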
Comment 1•14 years ago (Assignee)
Comment 2•14 years ago (Assignee)
Comment 3•14 years ago (Assignee)
Some observations:

As of now (following filing of bug 549779) I'd guess I have found about 2/3 of the flaws detectable like this. All but one of them are to do with use of undefined values, most of which were created by stack allocations. They are all in cross-platform (XP) code.

It surprised me that there were not more invalid-address errors (reading/writing in a bad place). This might be because invalid-address errors are easier to track down, or it might be because uninitialised-value errors are regarded as less dangerous (a fallacy! they can just as easily lead to bizarre behaviour and crashing).

Some of these flaws have, I suspect, been around a long time. Some of them involve lengthy and obscure control-flow paths, which make them pretty hard to track down. Bug 549236 is an example of both.

I have a temporary rollup patch which "fixes" most of the problems so that further checking can go on without being swamped by noise from the so-far-discovered problems, until such time as proper fixes for them are made. Ping me if you want a copy.
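To check the split between undefined-value and invalid-address errors on your own run, a rough tally of the saved log can help. This is a sketch under stated assumptions, not the method used to produce the numbers above: the function name tally_memcheck_kinds is hypothetical, it counts report headers rather than distinct bugs (one flaw can produce many reports), and it recognises only the common Memcheck headers shown in the patterns.

```shell
# tally_memcheck_kinds
# Reads Memcheck output on stdin and prints two counts:
#   undefined-value  reports about uninitialised values
#   invalid-address  reports about invalid reads or writes
# These are counts of report headers, not of distinct underlying bugs.
tally_memcheck_kinds() {
    awk '
        /^==[0-9]+== (Conditional jump or move depends on|Use of) uninitialised value/ { undef++ }
        /^==[0-9]+== Syscall param .* uninitialised/ { undef++ }
        /^==[0-9]+== Invalid (read|write) of size/ { addr++ }
        END { printf "undefined-value %d\ninvalid-address %d\n", undef, addr }
    '
}
```

Run it as tally_memcheck_kinds < spew5-memcheck-jit-enabled. Reports hidden by the suppressions file never reach the log, so they are not counted.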
Updated•2 years ago
Severity: normal → S3
Comment 4•1 year ago
Closing inactive metabugs
Status: NEW → RESOLVED
Closed: 1 year ago
Resolution: --- → FIXED