valgrind/memgrind uninitialized memory issues for C-C thunderbird
Categories
(Thunderbird :: General, defect)
Tracking
(Not tracked)
People
(Reporter: ishikawa, Unassigned)
References
(Blocks 1 open bug)
Details
Attachments
(1 file)
Updated•3 years ago
|
Comment 3•6 months ago
|
||
Likely the specific issues are obsolete, but it'd definitely be nice to do a new run over with valgrind/memgrind and see what else pops up.
Reporter | ||
Comment 4•6 months ago
|
||
(In reply to Ben Campbell from comment #3)
Likely the specific issues are obsolete, but it'd definitely be nice to do a new run over with valgrind/memgrind and see what else pops up.
As of now, this is the situation of valgrind + C-C TB.
-
Last year, I could no longer interactively run C-C TB under valgrind due to Bug 1880148
I gave up on running C-C TB under valgrind and switched to
ASAN build to see if I can find issues. Bug 1929721 -
However, ASAN build is good at finding out buffer overrun, etc., but it cannot find the real uninitialized issue very well.
-
About a week ago, I was pleasantly surprised to find that I can, for now, run C-C TB + valgrind combination for an interactive session (!).
Details are in https://bugzilla.mozilla.org/show_bug.cgi?id=1880148#c14
So since then, I have been running GCC-compiled C-C TB under valgrind.
Basically, I can run xpcshell test with C-C TB + valgrind combination.
I found a suspicious issue. Bug 1952749 (I am not ruling out valgrind thread model may break some fundamental assumption about TB thread model.)
But mochitest is hopeless. There are still many timeout values that need tweaking to take care of the slowdown caused by valgrind.
This has been the case since we switched from mozmill to mochitest framework. Bug 1636945
At least now, C-C + TB combination is working and I can try experimenting the lengthening of timeout values. (It took me better part of several months to make timeout values of mozmill to make C-C TB run |make mozmill|. Ever since we have switched to mochitest, C-C TB + valgrind combination did not run smoothly very much, that is why I have not been able to extend the timeout values successfully.
So far, I found that
- some tests that usually timeout in normal time times even cannot be run successfully at all and probably simply better skipped.
I simply decided to skip all the tests under browser subdirectory. - I realized that there are many tests that simply fail due to a daemon that is started at the beginning of executing a new test fails due to the failure of binding to a TCP port.
I figureed this. When a test time out occurs, |mach| cannot seem to kill / terminate the daemons that were spawned under valgrind because they could not respond in a timely manner and |mach| seems to assume that the daemons died when timeout occurs and did not issue explicit kill command. The persistent daemon(s) interfere with the next test because at the beginning of next test, daemons are again spawned but they fail to start because they cannot bind to TPC port which is still occupied by the earlier still alive daemon. Failing to start the daemon finishes the test as failure and thus, basically, once a timeout occurs, no successive tests are executed meaningfully.
At least, I have figured it out. So I am either skipping the helpless test and/or lengthening the timeout value to make the first failling test work.
I realized timeout value for waiting for the test to produce any output at all had to be lengthened as well. Ouch. valgrind does slow down some actions very much.
But this is a great progress so far. At least, now I can figure out WHAT timeout value needs to be lengthened. Previous error mode of C-C + TB was very difficult to analyze and helpless. For example, Bug 1732848
However, at this pace, I don't think I can extend the timeout values by the end of 2025. The execution of |mach mochitest| when C-C TB is run under valgrind is very slow.
BTW, I am not using the built-in valgrind feature any more. Bug 1631197
The quoting mechanism |mach| that passes options to |valgrind| was once broken, then fixed (can't recall which bug number it was), and broken again and I cannot pass the hairy option of valgrind via |mach|.
I looked at the code and it was really convoluted and no wonder the fix is not easy.
So, I am using the wrapper program. But this poses a problem of NOT passing the information that the test is executed under valgrind, and I cannot use the information to increase the timeout value programmatically. Oh well. Once I figured out which timeout values need lengthening, I can probably come back to the quote mechanism again.
I am running a few tests at a time to check for timeout, etc. Still very slow progress.
In the past, there have been issues to try to run C-C TB under valgrind. E.g. Bug 1629433 Bug 1880148 Bug 1732848 Bug 1534109 Bug 1346534
Simply running C-C TB + valgrind sometimes become an adventure game, almost.
While somehow C-C TB + valgrind combination works, I wish I could verify the existence of the previously reported issues such as
Bug 809866
Bug 803816
Bug 1725959
Bug 1162079
Bug 1311202
Bug 965424
Bug 1735940
Bug 1020229 (this one)
ASAN build has its own problems by the way. Some issues make it crash prematurely and I could not list all the issues ASAN could find until
I locally replace the event-related library with an upstream version whoesale. Yeah, ASAN detected an issue INSIDE the even library in C-C + M-C tree and TB could not proceed further if the ASAN test mode is to crash it then and there.
In hind sight, maybe I could run it simply dump error info and continue, but once such an error that is detectable by ASAN build, I doubt the operation of C-C TB is trusted after that point. We need to remove the earler error to proceed to find the later issues.
In order to do that, I had to replace the event library, which is a major surgery. Bug 1909876 (ASAN detected heap-use-after-free and as such, I marked this as security-related.)
Several years ago, I read somewhere that FF tests under valgrind is executed monthly, and some people tested FF under valgrind while they surf the web.
It is hard to believe now with all the problems I have faced.
Nobody promised us a rose garden, but why is C-C TB so difficult to run under valgrind in contrast to FF (if what I read was real) ?
Hmm...
Description
•