Open Bug 803816 Opened 12 years ago Updated 1 year ago

Valgrind warnings about uninitialised memory use (Thunderbird) [meta]

Categories

(Thunderbird :: General, defect)

x86
All
defect

Tracking

(Not tracked)

People

(Reporter: ishikawa, Unassigned)

References

(Depends on 5 open bugs)

Details

(Keywords: meta)

Attachments

(11 files, 2 obsolete files)

TB 19.0a1 (built from the latest comm-central thunderbird source directory; I fetched it on 20th Oct, 2012 [JST])
caused valgrind to print many warnings about the use of uninitialised value(s).

(Actually, the binary built from the source fetched on 17th Sept, 18.0a1, also
had many similar warnings. I refreshed the source so that I could attach a log from
the latest source.)

I am attaching a log from the valgrind runs.
In the first run in the log, I started thunderbird; it asked whether to make
thunderbird the default e-mail client, and then showed the main window.
I quit thunderbird by clicking the close button of the main window.

On the next run, thunderbird did not ask about making thunderbird
the system default (I had already chosen it in the first run), and
when the main window appeared, I selected Help and then tried to show the version number
from there.
Unfortunately, thunderbird under valgrind crashed there. I am not sure if this is a problem in thunderbird. Thunderbird does not crash when I run it WITHOUT valgrind, so it is likely that valgrind has an issue there, or
it may be some problem related to XIM input of Japanese that I have been debugging (
"Using XIM with Firefox will cause firefox menu cannot popup correctly."
https://bugzilla.mozilla.org/show_bug.cgi?id=787943
cf. It was a surprise to realize that the XIM support API in libX11 seems to have been generating bogus timestamps for close to 15-20 years(?), but
only in the last 18 months or so has the problem caused Gnome and GDK-library programs to trip over it due to their strict timestamp checking. TB and FF are suffering from the bogus timestamps now. I think a libX11 fix is easy now. But I digress.)

Anyway, it is disturbing to see so many uninitialized value warnings.
(Some of them come from GC-related code in the JavaScript interpreter, and may be intentional to reduce memory access and CPU cycles during interpretation. But then, it would be best to initialize such memory areas in a debug build so that we can focus on REAL problems.)

TIA

PS: There are lines such as
OpenDis.c conn_buf_size =16384
GetProp:79: nbytes=1440
and similar. They are dump messages from a modified libX11 that I use to track XIM-related problems. I already found one more such problem while running thunderbird 19.0a1: I tried Japanese input, and TB suddenly died with a segmentation fault when I tried to show the version number using [HELP] -> [About TB].
But that is another issue.
I am afraid that the uninitialized values cause the different behavior.

In this log, when I clicked on [help] -> [about TB], I think I misclicked a button
that shows the help message by invoking a browser to show the mozilla web page, but
something went flaky and we got strange memory error warnings from valgrind.

On the second run, somehow [help] -> [about TB] DID show the version message 19.0a1. (This could not be done in the first run, and so I do suspect that the
uninitialized value causes some broken internal data structure.)

Then when I tried to close TB, it seemed to go into a hung state.
I had to kill valgrind from a different console.

So I think the uninitialized value error does seem to cause real problems.
Did you build with --enable-valgrind? You'd need to, or else you'd get lots of brokenness in the JS engine. It looks like you didn't, which makes these logs completely useless.
(In reply to Joshua Cranmer [:jcranmer] from comment #2)
> Did you build with --enable-valgrind?

No. Thank you for the pointer.
I have no idea if this switch is applicable to thunderbird.

But I just found today there is a blog post (posted 11 October)
"Valgrind builds are now green on TBPL as of this morning!"
http://garykwong.wordpress.com/author/nth10sd/

Reading the bugzilla entries there, 
now I have a feeling that thunderbird may need a lot of tweaking (?) since only the firefox build is mentioned in the blog post.

But I will try and see if it goes well with thunderbird.

TIA
(In reply to ISHIKAWA, chiaki from comment #3)
> (In reply to Joshua Cranmer [:jcranmer] from comment #2)
> > Did you build with --enable-valgrind?
> 
> No. Thank you for the pointer.
> I have no idea if this switch is applicable to thunderbird.

The switch enables special logging code in JS and jemalloc to make valgrind reports more accurate, so it is as applicable to TB as it is for FF.
I obsoleted two older log files (they did not use --enable-valgrind)

Here is a valgrind run, after thunderbird is rebuilt using
ac_add_options --enable-valgrind in my MOZCONFIG.

Looking at the first few warnings (and subsequent many) 
I am afraid that there may be a consistent miscalculation (introduced
for valgrind build?) regarding memory allocation area size.

I see so many warnings about 
Address ... is 4 bytes after allocated/re-allocate area

I quit the run by killing valgrind (since TB under valgrind is very slow
to quit after I hit control-C).

Just on a hunch, I am retrieving the latest comm-central and re-compiling (but it takes a while). So I will post it tomorrow.

TIA

PS: I will edit the title to fix the misspelling (if that is possible) and add "thunderbird" at the end.
Attachment #673542 - Attachment is obsolete: true
Attachment #673552 - Attachment is obsolete: true
Summary: Valgrind warnings about uninitializsed → Valgrind warnings about uninitialised memory use (Thunderbird)
Can you apply the exclusions in mozilla-central/build/valgrind?
Thanks for the report!

(1) Please also compile with --disable-jemalloc - most of the errors you saw were related to jemalloc. Thus, please ensure you have added:

ac_add_options --enable-optimize="-g -O -freorder-blocks"
ac_add_options --disable-jemalloc
ac_add_options --enable-valgrind

to your .mozconfig file.


(2) Ensure you have the following Valgrind parameters:

--smc-check=all-non-file --gen-suppressions=all --leak-check=full --num-callers=50 --show-possibly-lost=no --track-origins=yes

(--track-origins=yes is useful only if you see "Conditional jump or move depends on uninitialised value(s)" errors, else it makes your system very slow)

and also add the following parameters for known suppression files in the tree:

--suppressions=<path to your comm-central compiled objdir>/_valgrind/cross-architecture.sup

and

--suppressions=<path to your comm-central compiled objdir>/_valgrind/x86_64-redhat-linux-gnu.sup

(assuming you are running on 64-bit Linux. If on 32-bit, replace "x86_64-redhat-linux-gnu.sup" with "i386-redhat-linux-gnu.sup")


Please feel free to report your results after these steps, and let us know if you encounter any issues.
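
For illustration only, the options listed in (2) can be collected in a small shell function. This is a minimal sketch, not a command from the tree: the function name valgrind_cmdline, its three arguments, and the example paths are assumptions, and it only prints the command line (one argument per line) instead of running valgrind.

```shell
# Print the valgrind command line suggested above, one argument per line.
# All three arguments are placeholders chosen for this sketch.
valgrind_cmdline() {
    objdir=$1      # comm-central objdir (hypothetical path)
    arch_sup=$2    # x86_64-redhat-linux-gnu.sup or i386-redhat-linux-gnu.sup
    binary=$3      # program to run under valgrind
    printf '%s\n' \
        valgrind \
        --smc-check=all-non-file \
        --gen-suppressions=all \
        --leak-check=full \
        --num-callers=50 \
        --show-possibly-lost=no \
        --track-origins=yes \
        "--suppressions=$objdir/_valgrind/cross-architecture.sup" \
        "--suppressions=$objdir/_valgrind/$arch_sup" \
        "$binary"
}
```

For a 32-bit build one might call it as valgrind_cmdline "$HOME/objdir-tb" i386-redhat-linux-gnu.sup "$HOME/objdir-tb/mozilla/dist/bin/thunderbird" (paths hypothetical) and drop the --track-origins=yes line when the slowdown is not acceptable.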
Flags: needinfo?(ishikawa)
Version: 19 → Trunk
(In reply to Joshua Cranmer [:jcranmer] from comment #6)
> Can you apply the exclusions in mozilla-central/build/valgrind?

(In reply to Gary Kwong [:gkw, :nth10sd] from comment #7)
> Thanks for the report!
> Please also compile with --disable-jemalloc ...

Will do. Even when I applied the suppression files as suggested, the runtime errors reported by
jemalloc were still there. So --disable-jemalloc should remove these messages.

I left TB compiling with the new setup on a PC. It takes time :-( 
Hopefully, I can report the result tonight (JST).

TIA
Flags: needinfo?(ishikawa)
Thank you for the tips.

Here is a log. (I will attach another.)

I now don't see disturbing run-time memory access errors any more.

Here is a quote from the paragraphs I included at the beginning of the attached
file.

I ran using a self-built libX11 library since I was tracking down
uninitialized variable usage problem in XIM support of libX11 that
produced bogus timestamp (and that was the cause of the pull-down menu
not showing properly.)

Lines like the following are dump output from the modified libX11 with dump statements.
OpenDis.c conn_buf_size =16384
GetProp:98: nbytes=0
...
imDefLkup.c,740: putback k press   FIXED ev.time=0 ev.serial=9760

[System Info]
My system is Debian GNU/Linux, and
the libpango, which seems to have an issue, is the version shown below.

Debian Version:
ishikawa@debian-vm:/tmp$ uname -a
Linux debian-vm 3.2.0-2-686-pae #1 SMP Mon Apr 30 05:59:35 UTC 2012 i686 GNU/Linux

libpango:
ishikawa@debian-vm:/tmp$ LANG=C 
ishikawa@debian-vm:/tmp$ LC_ALL=C
ishikawa@debian-vm:/tmp$ aptitude search libpango
i A libpango-perl                   - Perl module to layout and render internati
v   libpango1-dbg-ruby              -                                           
v   libpango1-dbg-ruby1.8           -                                           
p   libpango1-ruby                  - Transitional package for ruby-pango       
p   libpango1-ruby1.8               - Transitional package for ruby-pango       
p   libpango1-ruby1.8-dbg           - Transitional package for ruby-pango-dbg   
i A libpango1.0-0                   - Layout and rendering of internationalized 
p   libpango1.0-0-dbg               - Pango library and debugging symbols       
c   libpango1.0-common              - Modules and configuration files for the Pa
i A libpango1.0-dev                 - Development files for the Pango           
p   libpango1.0-doc                 - Documentation files for the Pango         
i A libpangomm-1.4-1                - C++ Wrapper for pango (shared libraries)  
p   libpangomm-1.4-dbg              - C++ Wrapper for pango (debugging symbols) 
p   libpangomm-1.4-dev              - C++ Wrapper for pango (development files) 
p   libpangomm-1.4-doc              - C++ Wrapper for pango (documentation)     
ishikawa@debian-vm:/tmp$ 

It may be that the conservative Debian package system ships
somewhat old libraries, and that some of the memory issues are already
solved in newer versions.


Below is the log. We see two valgrind summaries: I think this is
because mozilla/dist/bin/thunderbird calls
mozilla/dist/bin/thunderbird-bin (?)


I ran thunderbird, and without doing much, I typed a Japanese text
string into the search box (near the upper right corner) and then quit
by closing the main window (by hitting close button of the main window
pane: not using the EXIT button of TB).
In this log, I ran thunderbird (after removing --track-origins=yes
from the valgrind options; I had left it in by mistake),
created a new message and typed a few Japanese sentences there, saved the message into the draft folder,
and quit TB using the quit button of TB itself.

Some memory issues with "_XIM*" routines are probably related to the Japanese input operation.

TIA.

PS: It would be great if mozilla could offer a "--enable-valgrind" version of the released TB versions, or run the thunderbird tests under valgrind a la the firefox test runs described in Gary's blog, which I quoted.

Thank you again.

I am more or less confident that TB itself is not to blame regarding the
original problem I was investigating namely
https://bugzilla.mozilla.org/show_bug.cgi?id=787943
("Using XIM with Firefox will cause firefox menu cannot popup correctly. ")
This seems to be caused mainly by the uninitialized variable use in libX11's XIM
support portion (at least for me, that is. Verification by Chinese users
who experienced the problems are still to happen.)

It is a long series of steps: installing a new version of valgrind that can grok
thunderbird and its libraries, figuring out that libX11 seems to be to blame,
compiling it with the necessary tools, and re-compiling XIM-related tools.
I ran the setup for a week and only once got a complaint from Gnome's metacity
window manager about a 0-value timestamp (and metacity was fuzzy about the problem itself, and its source states the problem can be ignored for now.)

Now that thunderbird does not produce obvious run-time memory access issues (although it seems to leave a few memory blocks behind [thinking maybe that exit() will take care of that]), I am confident that Bug 787943 is fixed.

PPS: it would be great if a valgrind hacker could take another look at jemalloc to
make it run under valgrind so that we don't have to run thunderbird without it, since it seems that jemalloc is compiled in by default in the distributed binary.


If someone would like me to try fixes to eliminate the few memory leaks
in the log, I will be happy to test them.

Thank you again.
OK, maybe I was too quick to say that there do not seem to be runtime memory access errors.

To obtain the attached log, I did open a message and saved an attachment to
/tmp and quit.

There are now so-called mismatched free / delete, etc. reports in the log: a runtime error or warning.

Maybe TB is trying to release already released memory or whatever?
It could be an artifact of valgrind or false warning, but
I doubt valgrind fails to handle object destructor of simple C++ program.

Anyway, someone in the know can figure out what is causing this runtime error message.

TIA
One thing that would probably be more useful is trying to run the unit tests under valgrind, since that's a fairly easily reproducible way to get memory reports.
(In reply to Joshua Cranmer [:jcranmer] from comment #12)
> One thing that would probably be more useful is trying to run the unit tests
> under valgrind, since that's a fairly easily reproducible way to get memory
> reports.

Well, I happily typed make check and got the response below.

./do-make.sh ( a wrapper to set up environment variables and use client.mk, etc.)
make: *** No rule to make target `check'. Stop.

Whoa?

Then I realized that I have the following line in my MOZCONFIG.
ac_add_options --disable-tests

Also, it seems that I need to invoke
make check under the OBJ directory instead of the original source directory.

But the problem is the required resources. Gary's blog mentioned something to the effect that a firefox test run under valgrind was attempted only infrequently, once a day, on a powerful server (and only for a limited set of tests).

I am not sure if my home PC is up to the task of full blown testing now...

But I will try and see if I can find some tips for people who may attempt the thunderbird valgrind tests on mozilla server farm.


TIA
make check is only one of the three test suites that Thunderbird uses; the other two are accessible via make xpcshell-tests and make mozmill, and it's the latter two that have the most useful tests.

Firefox does have several hours of test suites to run if they were all done linearly, while TB's takes about an hour linearly. For added speed, make xpcshell-tests in objdir/mailnews + make mozmill will catch about 90% of the tests that are likely to be useful.
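
A minimal sketch of that shortcut, assuming the objdir layout described above. The function name run_tb_tests and the MAKE override (handy for a dry run with MAKE=echo) are hypothetical and not part of the build system.

```shell
# Run the two most useful Thunderbird suites from an objdir.
# $1 is the path to the comm-central objdir (placeholder).
run_tb_tests() {
    objdir=$1
    # xpcshell tests restricted to mailnews, then the mozmill UI tests.
    "${MAKE:-make}" -C "$objdir/mailnews" xpcshell-tests &&
        "${MAKE:-make}" -C "$objdir" mozmill
}
```

Setting MAKE=echo before calling run_tb_tests prints the two make invocations without running them, which is a cheap way to check the paths first.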
I am afraid that my setup is not quite correct to run the thunderbird tests as of now.

What is the proper syntax for TEST_PATH...
In relation to TEST_PATH, it seems that TB tries to access
a web page (!?), and that this is done by invoking an EXTERNAL
WEB browser(!?); on my development PC running Debian GNU/Linux,
it seems to invoke a non-mozilla browser. This does not seem to be
right, and invoking an external browser is not quite right IMHO.
What is the proper method to invoke mozilla thunderbird for a valgrind
test?

[what happened.]

I am looking at "Debugging Mozilla with Valgrind"
   https://developer.mozilla.org/en-US/docs/Debugging_Mozilla_with_Valgrind

Can somebody suggest

 - what is the proper way to run thunderbird under valgrind in
   "make check" or similar command, and

 - what would be the first appropriate subset where
   thunderbird is properly exercised, i.e.,
   what is the appropriate target?
   make appropriate-target  blah blah ..  

[What I did so far.]

Just to be sure, I ran "make check" after placing
--enable-tests in my MOZCONFIG and re-built thunderbird.

I moved to my MOZOBJ directory 
  cd ~/TB-NEW/TB-3HG/objdir-tb3/

and then ran make check 
(without valgrind), that is.

It failed when it tried to enter xpcom test.
Many tests preceding it were related to js jit compiler, it seemed.

The failure in xpcom directory is because, on this particular machine, I have
moved runtests.py out of the way by gzipping it.

  -rw-r--r--  1 ishikawa ishikawa   753  Aug 13 00:41 runtests.py.gz

This is related to some make issues in idl-parser directory. 


Restoring runtests.py seems to have solved the
problem. make check passed this part.

[Problems to solve.]

But, I need to learn proper make command (and its arguments) to run
"THUNDERBIRD" (not firefox) easily under valgrind.

Otherwise, I may have to rename MOZOBJ/mozilla/dist/bin/thunderbird
and then install a shell script to run the above binary
under valgrind with the original thunderbird name.
I suspect there is already a mechanism to do that if firefox can be
run on a test machine. But I don't know the right way.

It seems if I simply type "make mochitests" under my MOZOBJ directory,
thunderbird is eventually invoked to display a web page in a tab (a
wonder for a mailer to access web page during testing...), and then it
stops since the http://mochi.test site is not accessible (?!).
[I thought it stopped, but it may be actually progressing very slowly.]

http://mochi.test:8888/tests/relative/path/test_mything.html?autorun=1&closeWhenDone=1&logFile=%2FTB-NEW%2FTB-3HG%2Fobjdir-tb3%2Fmozilla%2Fmochitest-plain.log&fileLevel=INFO&consoleLevel=INFO&failureFile=/TB-NEW/TB-3HG/objdir-tb3/mozilla/_tests/testing/mochitest/makefailures.json

Oh, I see, probably I have to specify file:// at the start of relative
path mentioned in  "Debugging Mozilla with Valgrind"
   https://developer.mozilla.org/en-US/docs/Debugging_Mozilla_with_Valgrind

Well, I did. But still thunderbird showed four (or five. Hmm. it
increased?) web display tabs where the message about inaccessibility
to http://mochi.test:8888/ is mentioned and then make mochi-test ended
there (?). Why 8888? Did thunderbird detect a proxy setup
automagically somewhere?

[And this set of five tabs in thunderbird remain even if the make
mochitest ... below finishes.]

When I add EXTRA_TEST_ARGS to the command line
EXTRA_TEST_ARGS='--debugger=valgrind --setpref=javascript.options.jit.chrome=false --setpref=javascript.options.jit.content=false'

I added --debugger-args=--smc-check=all-non-file and
--debugger-args=--trace-children=yes, also.

Then I still see *COPIOUS* output from valgrind, 

At the end, I see a warning (?) about mochitest-ipcplugins.log not being found.

>INFO | runtests.py | Running tests: end.
>grep: mochitest-ipcplugins.log: No such file or directory
>make[1]: Leaving directory `/TB-NEW/TB-3HG/objdir-tb3/mozilla'
>ishikawa@debian-vm:~/TB-NEW/TB-3HG/objdir-tb3$ 

OK, maybe I should be targeting "mochitest-plain" instead of "mochitest".

From what I have been doing, the progress will be at the snail's
pace. 

Am I doing the right thing, or going in the right direction, at least?

Is there a thunderbird-specific page similar to "Debugging Mozilla
with Valgrind" ?

Any tips will be appreciated.

TIA

PS: I wonder if I failed to install thunderbird-specific test files somehow.

PPS: Now, wait a second, does thunderbird invoke a SEPARATE browser
instead of its own? I see epiphany-browser warning right below the
start of valgrind when I think TB paused (!).

...
==26683== Rerun with --leak-check=full to see details of leaked memory
==26683==
==26683== For counts of detected and suppressed errors, rerun with: -v
==26683== Use --track-origins=yes to see where uninitialised values come from
==26683== ERROR SUMMARY: 18074 errors from 16 contexts (suppressed: 267 from 10)

(epiphany-browser:26720): Gtk-WARNING **: Loading IM context type 'uim' failed


This happened when I removed TEST_PATH argument:
 make mochitest-plain EXTRA_TEST_ARGS='--debugger=valgrind --debugger-args=--trace-children=yes --debugger-args=--smc-check=all-non-file --setpref=javascript.options.jit.chrome=false --setpref=javascript.options.jit.content=false'

TIA indeed.
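
When comparing several runs, the ERROR SUMMARY line quoted above is the quickest thing to look at. A minimal sketch of pulling the numbers out of one or more logs follows; the function name valgrind_error_summary and the log file name in the usage note are assumptions. It prints one line per summary it finds, so a log that contains two summaries (as happens when thunderbird starts thunderbird-bin) yields two lines.

```shell
# Print "errors contexts suppressed" for each valgrind ERROR SUMMARY line.
# Reads the files given as arguments, or standard input when none are given.
valgrind_error_summary() {
    sed -n 's/^==[0-9]*== ERROR SUMMARY: \([0-9]*\) errors from \([0-9]*\) contexts (suppressed: \([0-9]*\) from [0-9]*).*$/\1 \2 \3/p' "$@"
}
```

For the summary line in the excerpt above, valgrind_error_summary mochitest-valgrind.log (file name hypothetical) would print 18074 16 267.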
We still do not yet run Gecko platform tests under Valgrind yet - see bug 795124.
(In reply to Gary Kwong [:gkw, :nth10sd] from comment #16)
> We still do not yet run Gecko platform tests under Valgrind yet - see bug
> 795124.

OK, I understand the general problem.

What about the strange invocation of browser from thunderbird under valgrind?

For testing firefox, I think running javascript tests within 
firefox makes sense since after all, it is firefox running javascript.
Any memory-related errors reported by valgrind are about firefox.

But for testing thunderbird in my case, the thunderbird 
binary in question seems to invoke a different external executable to
run javascript and this makes no sense to me. Either I am screwing up things completely, and/or
there is a flaw in the testing framework ???

Any tips will be appreciated.

TIA
> 
> What about the strange invocation of browser from thunderbird under valgrind?
> 

Of course, I meant to say that thunderbird should run the javascript test using *its own*
built-in browser facility, I think.

It may be true that running "firefox" externally and then
use it to run javascript tests which *MAY* invoke thunderbird function via network transparent XPC magic may be what the original test developers may have intended, but

 - thunderbird may be misconfigured to run non-mozilla browser (as in my setup), and

 - running an extra browser under valgrind may not be a great idea because of the manner
   in which thunderbird is run. (I had --trace-children=yes, so a separately invoked
   browser probably runs under valgrind as well. This option, --trace-children, may not
   be necessary for testing. I am a little confused here. I will check again.)

Anyway, I found that running thunderbird under valgrind is not a breezy operation :-)
My PC is a somewhat outdated 2-core Xeon. Linux runs within VMplayer, with 3 GB of memory allocated to it, on 64-bit Windows 7.
So external invocation of a large program is better avoided.
Bug 749135 - On Thunderbird builders, disable Valgrind and reftest-no-accel builders 

I could locate the above bug: is this related to the fact that 
> We still do not yet run Gecko platform tests under Valgrind yet - see bug 795124.

TIA
(In reply to ISHIKAWA, chiaki from comment #11)
> Created attachment 674679 [details]
> valgrind log of thunderbird compiled with --enable-valgrind (take 3)
> 
> Ok, maybe I was too fast to say that there does not seem to be runtime
> memory access errors.

There is an allocator mismatch bug in MimeRebuffer::~MimeRebuffer(),
true.  I doubt it is a serious problem, but it would be nice to get it
fixed.

--------

Does Tbird have some kind of big automated test suite along the lines
of Mochitests?  Assuming yes, how does one run it from the command
line?  Probably the simplest thing to do now is run Tbird's entire
test suite on Valgrind.
I replaced 

  MOZOBJ/mozilla/dist/bin/thunderbird

with a binary built from the attached source program (a helper) so
that when the test framework invokes the "thunderbird" binary, it actually
invokes valgrind to run "thunderbird-bin" under the same directory.

It adds various options to valgrind.
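
The attached helper is a compiled program. Purely as an illustration of the idea, and not the attached source, the same redirection can be sketched as a shell function. The name run_under_valgrind and the VALGRIND override are hypothetical; the valgrind options are ones quoted elsewhere in this bug. Valgrind itself also reads additional options from the VALGRIND_OPTS environment variable, which may be a convenient way to vary options without rebuilding a wrapper.

```shell
# Run thunderbird-bin from the given directory under valgrind,
# passing any remaining arguments through unchanged.
run_under_valgrind() {
    bindir=$1      # directory holding thunderbird-bin (placeholder)
    shift
    # VALGRIND is a hypothetical override used only by this sketch,
    # e.g. VALGRIND=echo for a dry run.
    "${VALGRIND:-valgrind}" \
        --trace-children=yes \
        --smc-check=all-non-file \
        --leak-check=full \
        --num-callers=50 \
        "$bindir/thunderbird-bin" "$@"
}
```

A shell wrapper may not be accepted everywhere a binary is, so this is a sketch of the logic only, not a drop-in replacement for the compiled helper.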

After this change, I ran make mozmill (thank you Joshua for pointing
it out) under my MOZOBJ directory and got a log (attached in the next
post) (I ran script command and run "make mozmill" there.)

Since the test seems to run forever on my PC [maybe at least a few
hours?], I captured the first part while it is still running.

>==4333== Mismatched free() / delete / delete []
>==4333==    at 0x4025D7A: operator delete(void*) (vg_replace_malloc.c:480)
>==4333==    by 0x5E1436B: MimeRebuffer::~MimeRebuffer() (nsMimeRebuffer.cpp:22)

The mismatched free / delete / delete [] is recorded in the log again, so I think
the bug is real (would someone file a bug entry for it?), and since such a
message is in the log file, I am assured that the log contains the valgrind messages. [I am not quite sure how the python scripts handle the log output in the test framework, though.]
The log looks very clean (at this stage of the test run, which is still running) in terms of run-time memory errors (except for a few cases). This is great!

"make mozmill" is an interesting test.  Without valgrind it runs rather
quickly. ( I should learn how to write and add a test case to mozmill for
bugzilla Bug 567585 
https://bugzilla.mozilla.org/show_bug.cgi?id=567585 
"TB3 fails to raise an error when it tries to save an attachment to
write-protected directory. ", which is a long-standing issue for me
using thunderbird in a corporate setup where the access to file servers are
controlled by admins, and not myself. )

Anyway, I notice that the TIMEOUT is an issue here due to the slowdown
because of the memory monitoring done by valgrind.
See Bug 794627 - Run mochitests in Valgrind tbpl builds on test slaves
https://bugzilla.mozilla.org/show_bug.cgi?id=794627

- Making timeout longer for the thunderbird mozmill tests,

- Making it easy to specify the use of valgrind and its options
  (Reading the valgrind options from environmental variable(s) or
  files, say.)

should be on the TODO list of people who may want to run thunderbird 
under valgrind to track down memory problems.

TIA
Partial test run of "make mozmill" under MOZOBJ directory after the
replacement of MOZOBJ/mozilla/dist/bin/thunderbird with the compiled binary of the helper function posted above has been done.
(In reply to Julian Seward from comment #20)

> Does Tbird have some kind of big automated test suite along the lines
> of Mochitests?  Assuming yes, how does one run it from the command
> line?  Probably the simplest thing to do now is run Tbird's entire
> test suite on Valgrind.

The "make mozmill" tests run with my wrapper, albeit slowly and with timeouts, as you noted in a bugzilla entry about running firefox under valgrind.

However, I have not been able to figure out how to run xpcshell binary 
(MOZOBJ/mozilla/dist/bin/xpcshell)  under valgrind.

So I could certainly figure out how to run 

  make xpcshell-tests 

under MOZOBJ/mailnews as suggested by Joshua [and it ran successfully to completion], but I could not invoke xpcshell under valgrind easily. 

*BUT*, it seems that xpcshell-tests *MAY* invoke mostly common tests shared by Firefox (?).
If so, running "make mozmill" can give us a good 
clue about run-time memory errors of thunderbird for now.

If someone can figure out how to run xpcshell under valgrind when
make xpcshell-tests is called under MOZOBJ/mailnews 
using an extra wrapper a la the program I posted, I would appreciate it.
I tried to create a simple wrapper, but could not make it work.
xpcshell needs a wrapper to set up library path, etc, it seems.
But run-mozilla.sh that seems to do the job is shell script.
(For thunderbird, there is thunderbird and thunderbird-bin. "thunderbird" seems to handle
the library search path setup, and thunderbird-bin is the main executable although today
they seem to be the same binary. It used to be that thunderbird was a shell script to invoke thunderbird-bin, I think.)

The testing python scripts seem to invoke the external programs using Exec() and
it does not seem to like a shell script (format error and such).

I tried to include the shell program in the middle of the program invoking chain.
Then maybe I miscounted the arguments or something. The resulting wrapper made xpcshell fail to grok the arguments passed by
the test script, such as "-r argument" and "-a argument". I gave up.
 
TIA

PS: Oh well, this is a long and winding endeavor just to make sure that thunderbird is not
generating an XEvent with a bogus timestamp because an uninitialized variable is
accessed.
To this end, maybe I really need to make sure that jemalloc can be compiled and linked in,
and that valgrind understands its operation correctly so that it won't produce many false positives :-(

For now, the original problem mentioned in 
Using XIM with Firefox will cause firefox menu cannot popup correctly.
https://bugzilla.mozilla.org/show_bug.cgi?id=787943
looks more or less solved for me now, and for some Chinese users by patching libX11(!).
But a Chinese user contacted me about the strange generation of a large volume of XEvents from firefox under certain conditions. It seems a separate issue. Search continues.
For those of you, who may want to help us in running xpcshell under valgrind within the testing framework, here are some relevant links I found:

Bug 198531 - Valgrind shows nsSOAPEncoding leaks
(xpcshell was run manually under valgrind.)

Bug 551095 - Add target to run xpcshell tests under valgrind
(xpcshell was run for a few test targets using a script snippet
by a developer. It was desired that this mechanism was incorporated into the
general framework, I think.)

Bug 803739 - Run xpcshell tests in Valgrind tbpl builds on test slaves
(Gary's call for inclusion of all these tests.)
Thanks for the great work so far!

To run thunderbird-bin directly, I *think* you first need to set LD_LIBRARY_PATH to be the directory of your source. (or the directory where thunderbird-bin is located, you may have to play around)

I see "Conditional jump or move depends on uninitialised value(s)" errors in your logs.

You might need to have a run with --track-origins=yes added.
> There is an allocator mismatch bug in MimeRebuffer::~MimeRebuffer(),
> true.  I doubt it is a serious problem, but it would be nice to get it
> fixed.

Spun this off as bug 805748. Credit goes to Ishikawa, Chiaki.
Also, besides adding "--track-origins=yes", you might want to add "--gen-suppressions=all". This will generate suppression parameters to allow Valgrind to ignore them in the next run.

I'd recommend you consolidate these suppression parameters into a separate file of your own choice, and append the location as you did in:

valgrind --trace-children=yes --smc-check=all-non-file --leak-check=full --num-callers=50 --suppressions=$HOME/TB-NEW/TB-3HG/new-src/mozilla/build/valgrind/cross-architecture.sup ...
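
One possible way to do that consolidation, sketched under the assumption that the log was produced with --gen-suppressions=all, which prints each suggested suppression as a block delimited by lines containing only "{" and "}". The function name extract_suppressions and the file names in the usage note are hypothetical.

```shell
# Print every suppression block ({ ... }) found in the given valgrind logs.
# Reads the files given as arguments, or standard input when none are given.
extract_suppressions() {
    awk '
        $0 == "{" { inblock = 1 }   # a block starts at a line that is just "{"
        inblock   { print }         # copy everything inside the block
        $0 == "}" { inblock = 0 }   # and stop after the closing "}"
    ' "$@"
}
```

For example, extract_suppressions valgrind.log >> my-thunderbird.sup collects the blocks into a personal file that can then be passed with an additional --suppressions= option. The blocks still carry the placeholder name valgrind generates and may contain duplicates, so the file probably wants a manual review before reuse.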

Also, running with Valgrind consumes resources, and a decent multicore processor with >=4Gb ram is recommended. If you're on 2Gb, you'll likely swap a lot more than usual.
Flags: needinfo?(ishikawa)
(In reply to ISHIKAWA, chiaki from comment #22)
> Created attachment 675117 [details]
> Partial log file from "make mozmill" test run
> 
> Partial test run of "make mozmill" under MOZOBJ directory after the
> replacement of MOZOBJ/mozilla/dist/bin/thunderbird with the compiled binary
> of the helper function posted above has been done.

Also, what's the repository changeset hash that this was run with?

(run `hg identify` in the repository)
(In reply to Gary Kwong [:gkw, :nth10sd] from comment #25)
> Thanks for the great work so far!
> 
> To run thunderbird-bin directly, I *think* you first need to set
> LD_LIBRARY_PATH to be the directory of your source. (or the directory where
> thunderbird-bin is located, you may have to play around)
> 
> I see "Conditional jump or move depends on uninitialised value(s)" errors in
> your logs.
> 
> You might need to have a run with --track-origins=yes added.

Gary, thank you for the encouragement.

Now I have a problem.

-Timeouts due to very slow operation when --track-origins=yes is
specified.

If I add --track-origins=yes, the processing is way too slow and tests seem to fail even before thunderbird-bin reaches the stage of showing the main window at all!  I think the tester of thunderbird needs a faster machine (and maybe 64 bits version of linux to make valgrind run faster.)

Also, increasing the timeout does not seem to be easy.

I checked for hard-coded timeout value but found nothing like that. It is implicitly set somewhere, and I do not know python well enough to set it.  

(It is not as easy as with mochitest in Bug 794627 - Run mochitests in
Valgrind tbpl builds on test slaves.  Too bad :-(
The *.py files under MOZOBJ/mozilla/mail [in my case,
~/TB-NEW/TB-3HG/objdir-tb3/mozilla/_tests/mozmill/] do not have an explicit setting such as timeout=330.)

I did increase the memory allocation to my VMPlayer, but
--track-origins causes valgrind to crawl when it runs thunderbird-bin in a 32-bit linux environment.

>Also, what's the repository changeset hash that this was run with?
>
>(run `hg identify` in the repository)

Identity is: comm-central repository is /home/ishikawa/TB-NEW/TB-3HG/new-src
hg identify
1016cef82fd8+ tip

mozilla subdirectory repository is /home/ishikawa/TB-NEW/TB-3HG/new-src/mozilla
hg identify
a517f7ea5bef+ tip

Hope this helps.
Flags: needinfo?(ishikawa)
> If I add --track-origins=yes, the processing is way too slow and tests seem
> to fail even before thunderbird-bin reaches the stage of showing the main
> window at all!  I think the tester of thunderbird needs a faster machine
> (and maybe 64 bits version of linux to make valgrind run faster.)

> I did increase the memory allocation to my VMPlayer, but
> --track-origins causes valgrind to crawl when it runs thunderbird-bin in
> 32-bit linux environment.


Yes, 64-bit Linux would help. You might also want to increase the memory allocated to the VM from 3 GB to at least 4 GB, but I'm not sure about the specs of your host machine, so you may be limited there.

While running with --track-origins=yes may be slow, it is necessary to help solve "Conditional jump or move depends on uninitialised value(s)" errors. Some of them may be false positives, but some of them may be issues in the Thunderbird / Gecko platform, and if we're lucky/unlucky, may even be security bugs.

(which *may* qualify you for the bug bounty program in case they are serious security problems)

https://www.mozilla.org/security/bug-bounty.html
(In reply to Gary Kwong [:gkw, :nth10sd] from comment #30)
> > I did increase the memory allocation to my VMPlayer, but
> > --track-origins causes valgrind to crawl when it runs thunderbird-bin in
> > 32-bit linux environment.
> 

> 
> Yes, 64-bit Linux would help. You might also want to increase memory
> allocated to the VM to at least 4 GB from 3 GB, but I'm not sure about the
> specs of your host machine, so you may be limited there.
> 

Time to upgrade my installation, I suppose. (I tried 64-bit linux about 4 years ago, but back then 64-bit integration was not that good.)

My host has only 8GB. As a matter of fact, VMPlayer warns me that if I increase memory allocated to it, it causes SWAP to happen (on the host that is), and this may indeed slow things down further still.
> While running with --track-origins=yes may be slow, it is necessary to help
> solve "Conditional jump or move depends on uninitialised value(s)" errors.
> Some of them may be false positives, but some of them may be issues in the
> Thunderbird / Gecko platform, and if we're lucky/unlucky, may even be
> security bugs.
> 

Hmm. At least I now know which tests trigger this "Conditional jump ...", so I will try to selectively run those tests under valgrind with
--track-origins=yes.
The issue to solve is how to increase the TIMEOUT.
(Well, first, I will need to upgrade to 64-bit linux.)

> (which *may* qualify you for the bug bounty program in case they are serious
> security problems)
> 
> https://www.mozilla.org/security/bug-bounty.html

Whoa! 

Seriously, though, as I tried to debug through the original bug
"Using XIM with Firefox will cause firefox menu cannot popup correctly."
https://bugzilla.mozilla.org/show_bug.cgi?id=787943
I think I uncovered a few real cases of DoS problems, and
even memory access(es) outside the currently allocated area 
(due to uninitialized variable on the stack, etc.) outside TB, mind you, in code that has been in use for more than 10 years(!) widely [libX11 itself (!), and very popular XIM input front end].
I am not surprised if there are a few dubious cases in TB/FF today.

OK, off to install 64 bit linux

Thank you again for making the great software available.
Nowadays, I depend on thunderbird for office work, and so
when it encounters a serious problem, there is a disruption to
my work flow. [Thus I have begun investigating the problem before I get too busy for annual exhibition at work, not as something to do in my spare time :-) ]

I am attaching the full log in my next post.

TIA
Wait, should I post the full log (yet without --track-origins=yes)
into a separate bug and make it a security-related one?
Tips will be appreciated. (Well, a motivated cracker with plenty of time on his/her hands could have done something like this to obtain a log in a couple of weeks easily, though.)
You can post the full log here for the moment. No need to spawn new bugs yet unless there is a specific bug that we can isolate.
OK, for completeness's sake, I am attaching the full log
I obtained Saturday night.

(I removed GetProp:ll nbytes=nn debug dump lines.)

There are a few timeout errors and seemingly genuine errors (?).

The following is the top-level list of
test targets :
quoted from  MYOBJ/mozilla/_tests/mozmill/mozmilltests.list
(which is actually a symlink to
MYSRC/mail/test/mozmill/mozmilltests.list)

account
addrbook
attachment
cloudfile
composition
content-policy
content-tabs
cookies
crypto
folder-display
folder-pane
folder-tree-modes
folder-widget
im
instrumentation
junk-commands
keyboard
message-header
message-window
migration-to-rdf-ui-2
migration-to-rdf-ui-3
migration-to-rdf-ui-5
multiple-identities
newmailaccount
notification
override-main-menu-collapse
pref-window
quick-filter-bar
search-window
session-store
startup-firstrun
tabmail
utils

TODO:
I will try to figure out 
 - how to increase timeout, and
 - how to run individual test target with
   --track-origins so that we can learn more
   when "Conditional jump ..." appears.

Also, I will try to incorporate the suppressions, at least for the
invoking part of thunderbird, to have less clutter, but
these frivolous suppressions ought to be investigated, too, although
they are more or less concerned with external libraries (Correct?)

I am checking with x86_64 linux to see how it goes with valgrind.
I still need much larger timeout for mozmill to run.

One thing that makes me wonder.
gcc, g++ and valgrind in x86_64 may work faster because CPU can hold
many values in the larger number of registers (than in x86 mode) and reduce
the memory read/write so that there is less checking done by valgrind.
OK so far.
But suppose a function has a few local variables, and the compiler under x86_64
is clever enough to keep them in registers throughout the function's lifetime.
(In x86 mode, they are allocated on the stack.)
Will the uninitialized use of such a variable be detected by valgrind in x86_64 mode? (I am not familiar with how ordinary registers can be tracked for such
use before setting, if it is possible at all.)
> these frivolous suppressions ought to be investigated, too, although
> they are more or less concerned with external libraries (Correct?)

Bugs in external libraries should be reported in their respective bug trackers and not be filed in bugzilla.mozilla.org. Because there are so many of them, we merely suppress them here.
(In reply to Gary Kwong [:gkw, :nth10sd] from comment #35)

> Bugs in external libraries should be reported in their respective bug
> trackers and not be filed in bugzilla.mozilla.org. Because there are so many
> of them, we merely suppress them here.

OK, will do. Now I understand why sparing use of suppression files is encouraged.
Yes, I noticed that many of the problems seemed to be related to
external libraries, and quite a lot of them :-(
I am uploading my latest attempt to run make mozmill with better suppression.
This revealed a few easy to identify (but maybe hard to analyze) error cases aside from the ones which Gary already filed as bugs.

I will post them in the bugzilla.

Since the timing (the occurrence of timeouts) seems to vary from one run to the next, I left the previous log as an important reference.
 
The manner in which the wrapper is invoked, etc. is shown in the initial part of the log itself.

TIA
(In reply to Joshua Cranmer [:jcranmer] from comment #14)
> Firefox does have several hours of test suites to run if they were all done
> linearly, while TB's takes about an hour linearly. For added speed, make
> xpcshell-tests in objdir/mailnews + make mozmill will catch about 90% of the
> tests that are likely to be useful.

(In reply to Gary Kwong [:gkw, :nth10sd] from comment #16)
> We still do not yet run Gecko platform tests under Valgrind yet - see bug
> 795124.

(In reply to Julian Seward from comment #20)
> Does Tbird have some kind of big automated test suite along the lines
> of Mochitests?  Assuming yes, how does one run it from the command
> line?  Probably the simplest thing to do now is run Tbird's entire
> test suite on Valgrind.

I believe running TB's mozmill test provided a few real bug reports.

I tried running xpcshell-tests under valgrind, but could not.
The entire test proceeded very quickly, and I don't think valgrind was invoked in the
key places in the command chain despite my creating the wrapper for xpcshell binary.
 
Maybe efforts should be spent in the following entry for xpcshell-test?
Bug 803739 - Run xpcshell tests in Valgrind tbpl builds on test slaves
(Gary's call for inclusion of all these tests.)
Depends on: 809064
Depends on: 809060
One more valgrind log to classify the bugs.
A summary will follow after the next upload, which is an annotated
output of
  grep -6 uninitialised this_log_file

with "case ddd" annotations added.
Manually inserted case ddd labeling:
The base document is 
   grep -6 uninitialised the-previous-uploaded-log-file

TIA
Here is the classification of bugs found by valgrind.

I have uploaded  a new log file and a summary file.

Background: 

After a few more tweaks to the source files, and more entries in my
own suppression file, I ran the make mozmill test under valgrind
(memcheck) to check further for memory-related problems.

This is the log uploaded in this entry :
     attachment 679654 [details] 
     still more log of valgrind run for classifying the problems
     https://bug803816.bugzilla.mozilla.org/attachment.cgi?id=679654

Initially, there seemed to be so many problems that I was overwhelmed.
However, it turns out that the problems seem to originate only
from a relatively small set of functions.

I inserted the labels "case 1", "case 2", etc. to the output of

    grep -6 uninitialised the_log_file_of_valgrind

so that each uninitialized value usage warning is now marked with such
labels in the excerpted log.
(The grep output with the added label is also uploaded.)  
This is the   attachment 679655 [details] 
     Manually inserted case ddd labeling to excerpted portion of the previous log.
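
The labeling was done by hand, but the same excerpt-and-label step could be scripted. The following is a rough Python sketch, not the procedure actually used: it mimics grep with six lines of context and puts a running "case N:" label in front of every matching line. The function name label_cases is made up for illustration. The match is case-sensitive, like plain grep, so the "Uninitialised value was created by" lines are kept as context but do not get a label of their own.

```python
def label_cases(lines, needle="uninitialised", context=6):
    """Mimic `grep -<context> needle` and add a 'case N:' line before each hit."""
    hits = [i for i, line in enumerate(lines) if needle in line]

    # Collect the indexes of every line within `context` lines of a hit.
    keep = set()
    for i in hits:
        keep.update(range(max(0, i - context), min(len(lines), i + context + 1)))

    out = []
    case = 0
    previous = None
    for i in sorted(keep):
        if previous is not None and i != previous + 1:
            out.append("--")  # grep prints this between non-adjacent groups
        if needle in lines[i]:
            case += 1
            out.append("case %d:" % case)
        out.append(lines[i])
        previous = i
    return out


if __name__ == "__main__":
    import sys

    # Usage: python label_cases.py the_log_file_of_valgrind
    with open(sys.argv[1], errors="replace") as log:
        for line in label_cases(log.read().splitlines()):
            print(line)
```

Cases that turn out to be duplicates of each other still have to be merged by reading the stack traces, so this only removes the mechanical part of the work.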

Note:

(1)  memcpy() overlap problem is already discussed and patch was
produced in Bug 809321
Source and destination overlap in mempcy in nsMimeRebuffer.cpp
(But I missed patching the file before this run. So it still showed
up, but it is no longer there in a newer log. So I feel it is fixed now.)

(2) Also, undefined value issue of fieldType is also taken care of.
Bug 809064
Uninitialized value usage in ./mailnews/base/src/nsMsgDBView.cpp
Thus errors/warnings due to this issue are no longer in the log file.

Now, below, I list the summary of each case and the function that
caused it (a representative function in the stack path that generated
the uninitialized value).

In the case labels in the excerpted file, the title
"not sure RunScript -> Interpret" signifies that the interpreter
encounters certain undefined data. To figure out where the data was
created, we need to look at the "Uninitialised value was created by"
line and the stack trace produced by --track-origins=yes.

Although such errors are numerous, I finally found that the problems
are caused by only a few paths that contain the functions named
below.  Thus the issues are classified by this relatively small
number of function names.

Having a relatively small number of these functions means that
the developers who know the related code can dig into the
problem, and hopefully we can get rid of these uninitialized value
usage issues quickly.
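
Grouping the warnings by the function that created the uninitialized value can also be approximated with a script. The sketch below is an illustration in Python, not the method used for the classification that follows. It assumes the usual memcheck log layout ("==PID==" prefix, "at"/"by" stack frames) and skips allocator frames so that the first caller is counted. The name tally_origins and the list of allocator names are assumptions made for this example.

```python
import re
from collections import Counter

# A stack frame line such as:
#   ==1234==    by 0x5000: Foo::Make() (foo.cpp:5)
FRAME_RE = re.compile(r"^==\d+==\s+(?:at|by) 0x[0-9A-Fa-f]+: (.+) \([^()]*\)\s*$")
ORIGIN_MARK = "Uninitialised value was created by"
# Frames whose function name starts with one of these are skipped (assumed list).
ALLOCATORS = ("malloc", "calloc", "realloc", "operator new", "moz_xmalloc")


def tally_origins(lines):
    """Count, per function, how often it is the first non-allocator frame
    in an 'Uninitialised value was created by' stack trace."""
    counts = Counter()
    in_origin = False
    for line in lines:
        if ORIGIN_MARK in line:
            in_origin = True
            continue
        if not in_origin:
            continue
        match = FRAME_RE.match(line)
        if match is None:
            in_origin = False  # the trace ended before a usable frame was seen
            continue
        func = match.group(1)
        if func.startswith(ALLOCATORS):
            continue  # look at the caller of the allocator instead
        counts[func] += 1
        in_origin = False
    return counts
```

Calling tally_origins(open(path).read().splitlines()).most_common() on a log taken with --track-origins=yes would list the most frequent origin functions first.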

Classification of uninitialized value usage problem

case 1 - 14: nsImapMailDatabaseConstructor issue
     Bug 809866 nsImapMailDatabaseConstructor creates and leaves some data uninitialized (found by valgrind)

CASE 15: incorrect numbering? Please ignore.

CASE 16: bugzilla entry entered already. (nsNntpIncomingServer.cpp case)
     Bug 809060 - Uninitialized value usage in ./mailnews/news/src/nsNntpIncomingServer.cpp

CASE 17: n/a: looks like an external library issue.

CASE 18: a bugzilla entry is filed already: pl_base64_encode_buffer (nssb64e.c:182)
     Bug 805752 - Use of uninitialised value of size 4 in NSS
     Now assumed to be dupe of the following bug:
     Bug 806293 - Use of uninitialised value of size 4 in PR_ParseTimeStringToExplodedTime probably created by nsByteArray::GrowBuffer     

CASE 19: similar to case 18? but on a different line: pl_base64_encode_flush (nssb64e.c:253)

case 20, 21: CreateImage issue		(already filed)
     Bug 798989 Uninitialised value use in gfxUtils::GetYCbCrToRGBDestFormatAndSize (aSuggestedFormat)

case 22: nsMsgFilterList issue	
     Bug 809880 uninitialized data created by nsMsgFilterList::ParseCondition

case 23 - 29: OpenMailDBFromFile issue	
     Bug 809883 uninitialized data created by OpenMailDBFromFile

case 30 - 36: nsNewsDatabaseConstructor issue 
     Bug 809887 uninitialized data created by nsNewsDatabaseConstructor

CASE 37, 38: similar to CASE 16: bugzilla entry filed already. (nsNntpIncomingServer.cpp case)
     Bug 809060 - Uninitialized value usage in ./mailnews/news/src/nsNntpIncomingServer.cpp

CASE 39: a bugzilla entry filed already (PR_ParseTimeStringToExplodedTime (prtime.c:968))
     Bug 806293 - Use of uninitialised value of size 4 in PR_ParseTimeStringToExplodedTime probably created by nsByteArray::GrowBuffer     

CASE 40 - 53: A "new"ed object seems to have uninitialized data:
     Bug 809866 nsImapMailDatabaseConstructor creates and leaves some data uninitialized (found by valgrind)


By the time I post this, I will have filed bugzilla entries for the
uninitialized value usage issues in TB found so far (as of Nov 8th),
and let us hope the developers can get rid of the problems.

I will keep quiet for a while since the office work is getting busy now.

Thank you again for making the great software available on linux and windows.
My office work flow depends on the unbuggy operation of TB these days, and I hope
other developers/users can look into the problems uncovered by the (still incomplete) valgrind runs so far.

I would like to thank Gary and others for guiding me so far in running TB under valgrind.

IMHO, TB needs a better testing framework. I was advised to use chrome (?) for
creating test cases for the bug and patch I was struggling with (and am still
struggling with: Bug 567585 - TB3 fails to raise an error when it tries to save an attachment to write-protected directory.), but
now I have learned that TB could not (and still cannot?) run the browser-based testing harness.
Hmm...
More testing is always better, but we need to make it easier for people to contribute IMHO.
 
TIA
Depends on: 798989, 809866, 809883, 809887, 809321
Hi,

With much longer timeout setting than in the past, the "make mozmill"
testing ran very well.

In the attached log, near the end, we see

>INFO | (runtestlist.py) | Directories Run: 33, Passed: 908, Failed: 46

This means there were 46 failures. Since on my PC running
"make mozmill" WITHOUT valgrind also produces a similar number of errors
(!?), I am not that concerned. There could be errors in the tests
themselves (or some configuration issues).
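
To compare a run under valgrind with a run without it, the summary line can be pulled out of each log mechanically. A minimal Python sketch, assuming the line always has the shape quoted above; the function name parse_summary is made up for illustration:

```python
import re

# Matches the counts in a line such as:
#   INFO | (runtestlist.py) | Directories Run: 33, Passed: 908, Failed: 46
SUMMARY_RE = re.compile(r"Directories Run: (\d+), Passed: (\d+), Failed: (\d+)")


def parse_summary(line):
    """Return the three counts as a dict, or None if the line is not a summary."""
    match = SUMMARY_RE.search(line)
    if match is None:
        return None
    directories, passed, failed = (int(value) for value in match.groups())
    return {"directories": directories, "passed": passed, "failed": failed}


if __name__ == "__main__":
    line = ">INFO | (runtestlist.py) | Directories Run: 33, Passed: 908, Failed: 46"
    print(parse_summary(line))
```

If the failure counts of the two runs are close, the failures are more likely to come from the tests or the local configuration than from valgrind itself.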
 
However, I have noticed one new problem today, and so reporting it
separately.

Bug 814438 - Invalid Read of 4 bytes through bogus pointer in nssCertificate_Destroy (thunderbird)

(In the attached log, you can search for it by looking for "Memcheck:addr4" )

The problem seems to be timing-related and does not happen all the
time. I have run the testing maybe two dozen times (once per day),
and this is the first time this problem was noticed.

TIA
Depends on: 814438
Found another bug from the latest uploaded log:

Bug 814851 - Mismatched free() / delete / delete [] in SkMallocPixelRef::~SkMallocPixelRef() 

TIA
Depends on: 814851
Depends on: 845187
Keywords: meta
Summary: Valgrind warnings about uninitialised memory use (Thunderbird) → Valgrind warnings about uninitialised memory use (Thunderbird) [meta]
Depends on: 965424
Depends on: 1020229
(this wouldn't be OS specific)
OS: Linux → All
(In reply to Wayne Mery (:wsmwk, NI for questions) from comment #44)
> (this wouldn't be OS specific)

Right, I suspect.

I will post the latest log from the latest tree after DEC 20 for people to take a look at during the holiday season...
There were a few mysterious warnings coming up when there was this crash that I reported to the development mailing list (but that crash is now gone mysteriously after a few source upgrades and local Debian package upgrades. It may return anytime.)
Severity: normal → S3

Are some of the blocking bugs gone?
Do there need to be some new blocking bugs?

Good question.
Some of them are definitely gone.

However, I have not been able to fully run TB under valgrind for mochitest.
So I cannot say for sure what is gone and what is not gone, and if there are any new problems.
(Many tests time out and I am not even sure if I am covering half the tests. It was much better in the mozmill days.)

Unless a significant amount of manpower is used, I probably cannot figure out WHY TB NEEDED 1500 threads and other mysteries.
This is Bug 1629433.

I suspect there is some flawed logic in the high-level thread-creation routine of TB (maybe triggered by the slowdown of execution),
and the sheer number of resulting threads competing for locks, etc. slows down TB under valgrind unnecessarily. (Just a theory, but quite likely.)
