Closed Bug 479713 Opened 15 years ago Closed 15 years ago

"Conditional jump or move depends on uninitialised value" in nsHTMLDocument::MaybeEditingStateChanged

Categories

(Core :: DOM: Editor, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: jruderman, Assigned: MatsPalmgren_bugz)

Details

(Keywords: testcase, valgrind)

Attachments

(4 files)

1. Load editor/composer/src/crashtests/428844-1.html in Firefox under Valgrind.

Result:

==56620== Conditional jump or move depends on uninitialised value(s)
==56620==    at 0x1870962: nsHTMLDocument::MaybeEditingStateChanged() 
==56620==    by 0x1792BD5: nsGenericElement::doInsertChildAt(nsIContent*, unsigned int, int, nsIContent*, nsIDocument*, nsAttrAndChildArray&) 
==56620==    by 0x177668B: nsDocument::InsertChildAt(nsIContent*, unsigned int, int) 
==56620==    by 0x188096C: nsXMLContentSink::SetDocElement(int, nsIAtom*, nsIContent*) 
==56620==    by 0x1880D2F: nsXMLContentSink::HandleStartElement(unsigned short const*, unsigned short const**, unsigned int, int, unsigned int, int) 
==56620==    by 0x1880E9B: nsXMLContentSink::HandleStartElement(unsigned short const*, unsigned short const**, unsigned int, int, unsigned int) 
==56620==    by 0x154FA95: nsExpatDriver::HandleStartElement(unsigned short const*, unsigned short const**) 
==56620==    by 0x1575D59: doContent 
==56620==    by 0x1576CFD: contentProcessor 
==56620==    by 0x1574840: doProlog 
==56620==    by 0x15755A4: prologProcessor 
==56620==    by 0x156E02D: MOZ_XML_Parse 

See http://blog.mozilla.com/nnethercote/2009/02/17/valgrind-mac-os-x-update-feb-17-2009/#comment-25 for comments about running Valgrind on Mac.
Flags: in-testsuite+
Attached file valgrind output
I can reproduce this on x86_64 Linux.  It's nsHTMLDocument::mEditingState
it complains about, which is odd since nsHTMLDocument inherits
ZEROING_OPERATOR_NEW from nsIDocument.  I added a printf in 
nsHTMLDocument::GetEditingState() and there are several preceding calls
for the same document that does not trigger a valgrind error...
... adding an explicit assignment "mEditingState=eOff" just before said
printf makes valgrind silent.  So, at the time of the CTOR valgrind
thinks it's initialized, then for 10 or so calls to GetEditingState()
it thinks it's initialized, then all of sudden it thinks it's uninitialized...

The only possible explanations I can think of is either a bug in valgrind
itself, or we make bogus VALGRIND_* calls to misinform it about the
status of this memory.  AFAICT, jemalloc is only place we use VALGRIND_*
macros (http://mxr.mozilla.org/mozilla-central/search?string=VALGRIND_)
and I traced all of those and I don't see any suspicious calls...
... but using a suppressions file is better than messing around
in the code, IMO.
Valgrind (Memcheck) is pretty good at tracking definedness, and is usually only confused by highly optimised hand-written assembly code, such as found in glibc's memcpy() function (and Memcheck replaces such functions with its own, less gnarly versions).

However, interpreting these messages can be tricky because undefined values can be created and used in various ways without being complained about.  Memcheck will only complain if an undefined value is used in a way that could affect your program's behaviour.  The common cases are:
- in the condition of a conditional branch (could cause incorrect control flow);
- as a parameter to a syscall (could cause incorrect side-effects);
- as the address of a load/store (could cause seg fault or similar).
Things that it won't complain about include:
- copying undefined values;
- doing arithmetic on undefined values.

So there can be a big gap between the original defect (you forgot to initialise a value) and the error reporting point.

One way to help with diagnosing these problems is to use the VALGRIND_CHECK_MEM_IS_DEFINED or VALGRIND_CHECK_VALUE_IS_DEFINED macros.  They're much better than printf for working backwards and finding out where the value is defined or undefined.

Even better, if you have Valgrind 3.4.0, the --track-origins=yes option gives you information about where the undefined value originated, eg. the malloc'd block you forgot to initialise, or the stack variable you forgot to initialise.  It makes Memcheck even slower than normal, but is very helpful in these situations.  If you don't have Valgrind 3.4.0 it's worth getting just for this feature.

See how these help you.  If it's still a problem, I'll try to replicate and investigate it myself.
> Valgrind (Memcheck) is pretty good at tracking definedness

I agree, my experience is that valgrind rarely have false positives,
in fact this is the first time I have ever suspected so... and I was
wrong.

Thanks for your help; it allowed me to find the real error eventually.
Using --track-origins=yes I got this output.
It blames nsDocShell::EnsureEditorData()
http://mxr.mozilla.org/mozilla-central/source/docshell/base/nsDocShell.cpp#9154
which allocates a nsDocShellEditorData, I printed the address and extent
of nsDocShellEditorData and nsHTMLDocument but there's no overlap
or anything like that.  So, I looked a bit closer at nsDocShellEditorData
and saw that it has a nsIHTMLDocument::EditingState member... which isn't
initialized in the CTOR... doing so fixed it, so I'm assuming its
"uninitializedness" traveled to nsHTMLDocument::mEditingState through
some path...

--track-origins is a great feature, but is it possible to force the error
to show up earlier somehow?  (That is, to have every use of an
uninitialized value trigger an error.)
Assignee: nobody → mats.palmgren
Attached patch Patch rev. 1Splinter Review
Attachment #363811 - Flags: superreview?(roc)
Attachment #363811 - Flags: review?(roc)
OS: Mac OS X → All
Hardware: x86 → All
http://valgrind.org/docs/manual/mc-manual.html#mc-manual.value describes some of the constraints on detecting "use" of uninitialized memory.  I wonder if we could use a macro or compiler hack to help Valgrind detect bugs like this without introducing new false positives.
Attachment #363811 - Flags: superreview?(roc)
Attachment #363811 - Flags: superreview+
Attachment #363811 - Flags: review?(roc)
Attachment #363811 - Flags: review+
Mats (re comment #6): no, there's no way to do more eager checking without modifying Valgrind.  At the very least, you can't complain about copying undefined values, it happens all the time due to struct padding.  It might be possible to warn if an undefined value is used in an arithmetic operation, but it would slow things down a lot as every arithmetic op would require test on all the operands.  (And I bet there'd be false positives.)  --track-origins is the current best compromise.

Jesse (re comment #8): what would these hacks look like?
Add VALGRIND_CHECK_VALUE_IS_DEFINED for every argument to a constructor, using the precise (non-padded) size for structs.  Annotate out-params (we already do this) and "optional" params, if needed.
(In reply to comment #9)
> ... it happens all the time due to struct padding.

I see what you mean and I agree there would be a lot of false positives
from "low level C" parts of the application, e.g. libraries like GTK,
cairo, sqlite3 etc, and maybe JS too, but other parts of the code
like layout/ content/ dom/ etc would have few I think.

The ideal is of course if there is a way for valgrind to find
the exact layout of structs and know which parts are padding.
Can it figure that out if the code is compiled with debug info?

In this case though it was an undefined byte value passed as
argument to a function... it should be possible to flag this
directly without false positives?

With eager checking...
1, it's easier to find the origin of the "use"
2, some "uses" will otherwise not be detected until they
   "might conceivably affect the outcome of your program's computation"
   and we want to fix the code before that happens

--track-origins helps a lot with 1 and maybe 2 isn't that common
(assuming that our tests have good code coverage), so I'm starting to
think that eager checking is perhaps less valuable than I first thought.

A feature to eagerly check integral values when passed as arguments to
a function might be worth it though?
http://hg.mozilla.org/mozilla-central/rev/cf8a92034d2a

-> FIXED
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
I tried turning on naive eager checking (checking even copies) once.  I got hundreds of errors reported for /bin/true.  Almost all structs have padding in them.  As for debug info... in theory it would help, but it gets complicated pretty quickly.  Reading all that extra debug info takes a lot more time and space as well.

This is an area we've given a lot of consideration to, trading-off performance and likelihood of false positives.  It's probably not going to change any time soon.  Lots of people were happy when --track-origins=yes was added :)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: