Closed
Bug 547964
Opened 15 years ago
Closed 15 years ago
Potentially dangerous word-size error in definition of "nsnull"
Categories
(Core :: XPCOM, defect)
Tracking
()
RESOLVED
FIXED
mozilla1.9.3a5
People
(Reporter: jseward, Assigned: jseward)
References
Details
(Keywords: 64bit, valgrind, Whiteboard: [sg:moderate])
Attachments
(2 files, 1 obsolete file)
|
801 bytes,
patch
|
dbaron
:
review+
|
Details | Diff | Splinter Review |
|
1.79 KB,
patch
|
Details | Diff | Splinter Review |
Build m-c on 64-bit Linux (it must be 64-bit), start up with
"valgrind --track-origins=yes" and head over to
http://acid3.acidtests.org.
Somewhere at around 70/100 the following appears
Conditional jump or move depends on uninitialised value(s)
at 0x5C461E6: nsSVGElement::GetAnimatedLengthValues(float*, ...)
(nsSVGElement.cpp:1223)
by 0x5C85861: nsSVGRectElement::ConstructPath(gfxContext*)
(nsSVGRectElement.cpp:174)
by 0x5C345E5: nsSVGPathGeometryFrame::GeneratePath(gfxContext*,
gfxMatrix const*) (nsSVGPathGeometryFrame.cpp:502)
by 0x5C34A4C: nsSVGPathGeometryFrame::UpdateCoveredRegion()
(nsSVGPathGeometryFrame.cpp:254)
by 0x5C33DD0: nsSVGOuterSVGFrame::UpdateAndInvalidateCoveredRegion(nsIFrame*)
(nsSVGOuterSVGFrame.cpp:640)
by 0x5C3A749: nsSVGUtils::UpdateGraphic(nsISVGChildFrame*) (nsSVGUtils.cpp:684)
by 0x5C34F97: nsSVGPathGeometryFrame::DidSetStyleContext(nsStyleContext*)
(nsSVGPathGeometryFrame.cpp:97)
by 0x5782C7F: nsFrame::Init(nsIContent*, nsIFrame*, nsIFrame*) (nsFrame.cpp:366)
by 0x5C2B537: nsSVGGeometryFrame::Init(nsIContent*, nsIFrame*,
nsIFrame*) (nsSVGGeometryFrame.cpp:57)
by 0x5713487: nsCSSFrameConstructor::InitAndRestoreFrame(nsFrameConstructorState
const&, nsIContent*, nsIFrame*, nsIFrame*, nsIFrame*, int)
(nsCSSFrameConstructor.cpp:4512)
by 0x571A126: nsCSSFrameConstructor::ConstructFrameFromItemInternal(nsCSSFrameCon
structor::FrameConstructionItem&, nsFrameConstructorState&,
nsIFrame*, nsFrameItems&) (nsCSSFrameConstructor.cpp:3739)
Uninitialised value was created by a stack allocation
at 0x5C8580F: nsSVGRectElement::ConstructPath(gfxContext*) (nsSVGRectElement.cpp:171)
It has to do with this call:
void
nsSVGRectElement::ConstructPath(gfxContext *aCtx)
{
float x, y, width, height, rx, ry;
GetAnimatedLengthValues(&x, &y, &width, &height, &rx, &ry, nsnull);
to a loop in nsSVGElement::GetAnimatedLengthValues(float *aFirst, ...)
which iterates over this sequence of 6 float*'s:
float *f = aFirst;
...
va_list args;
va_start(args, aFirst);
while (f && i < info.mLengthCount) {
...
*f = info.mLengths[i++].GetAnimValue(this); // or (ctx)
...
f = va_arg(args, float*);
}
Valgrind complains about the null-check of "f" in "while (f && ..."
which is very odd, because the argument list appears to be properly
null-terminated by the use of nsnull at the call site. However ...
Looking at the assembly of the call site, we have the nsnull written
to the stack as a 32-bit store:
c3e874: c7 44 24 08 00 00 00 movl $0x0,0x8(%rsp)
c3e87b: 00
At first I thought this was a gcc bug. But no: it's because
nsnull is defined as an integer which has no indication of the
word size. Hence in the absence of any implicit
applicable (C/C++) wordsize-promotion rules at use sites, I
imagine it will just get treated as a 32-bit int. That's what
has happened in this case. xpcom/base/nscore.h:358:
/**
* The preferred symbol for null.
*/
#define nsnull 0
So what's the upshot in this case? It is that the arg list is not
(64-bit) null terminated, and so this loop -- which writes stack --
is terminated only by the second condition, i < info.mLengthCount.
If info.mLengthCount is taken from network data then we have a
potential remote stack overwrite scenario.
So: valgrind complains correctly because the loop does a 64-bit
null comparison, but only the lower half the sentinel value is
defined. And, if we print the sequence of f's in the loop we
get this
f = 0x7fff21b5ae1c
f = 0x7fff21b5ae18
f = 0x7fff21b5ae14
f = 0x7fff21b5ae10
f = 0x7fff21b5ae0c
f = 0x7fff21b5ae08
f = 0x7f6b00000000
^^^^^^^^ suspicious last value
Changing the use of nsnull in the call site to a bona-fide value,
"(void*)0", makes the printed output be as expected (last f = 0x0),
and stops Valgrind complaining.
As a final sanity check, putting this into the code:
printf("QQQQQQQQQQQ sizeof(nsnull) = %d, sizeof(void*) = %d\n",
sizeof(nsnull), sizeof(void*));
produces
QQQQQQQQQQQ sizeof(nsnull) = 4, sizeof(void*) = 8
How many other places in 64-bit Fx are affected by having a
nsnull which only happens to be 64-bit if we get lucky with the
C++ automatic-operand-widening rules?
Comment 1•15 years ago
|
||
This is only a problem when it is being passed as varargs (and then dereferenced as a pointer), right? We could create a static-checking warning for this case (passing literal 0 as a vararg) pretty easily.
| Assignee | ||
Comment 2•15 years ago
|
||
(In reply to comment #1)
> This is only a problem when it is being passed as varargs (and then
> dereferenced as a pointer), right?
It's certainly a problem in the varargs case, but I wouldn't like
to swear that there's not some other weird cases that are also affected.
> We could create a static-checking warning
> for this case (passing literal 0 as a vararg) pretty easily.
The obvious fix would be to declare nsnull as an object of the
correct size, perhaps (uintptr_t)0. That should remove all
ambiguity, no?
| Assignee | ||
Comment 3•15 years ago
|
||
Defining nsnull as (uintptr_t)0, (void*)0 or (char*)0 gives
various compilation failures. Defining it simply as NULL works,
though, at least on x86_64 Linux:
--- a/xpcom/base/nscore.h Sun Feb 21 21:47:21 2010 -0500
+++ b/xpcom/base/nscore.h Tue Feb 23 17:47:16 2010 +0100
@@ -358,7 +358,7 @@
/**
* The preferred symbol for null.
*/
-#define nsnull 0
+#define nsnull NULL
#include "nsError.h"
Updated•15 years ago
|
| Assignee | ||
Comment 4•15 years ago
|
||
This patch changes "#define nsnull 0" to "#define nsnull NULL".
If nsnull is an unadorned "0", it is a 32-bit constant even on 64-bit
platforms. At least on LP64 (all 64-bit Unix) and LLP64 (win64).
This is wrong, but we get away with it most of the time because in
the two idiomatic cases
T* p = nsnull;
if (p == nsnull) ...
C++'s type promotion rules for assignment and binary operators cause
it to be widened to 64-bits before use. At least, that's my _best
guess_ of why it works .. #include <disclaimer.h>
Defining nsnull as (void*)0 or (char*)0 causes many compilation
failures because C++ doesn't allow automatic conversion from these to
arbitrary pointer types.
Defining nsnull as (intptr_t)0 causes compilation failures because
the relevant .h is not always included.
It strikes me as safest to define nsnull simply as NULL, since it
seems inconceivable that a C/C++ compiler would supply a definition
for NULL which doesn't work properly.
This is known to work (not break the build, + produce a usable
browser) on x86_64-linux and x86-win32.
Assignee: nobody → jseward
| Assignee | ||
Updated•15 years ago
|
Attachment #428895 -
Flags: review?(benjamin)
I think an issue in the past, although maybe not anymore, was that there were some system headers that #defined NULL to be ((void*)0), which causes problems in C++. That might not be a real concern anymore.
It looks like gcc defines NULL to its internal constant __null. The C++ standard, section 18.1 [lib.support.types], says that NULL is a null pointer constant that could be the same as 0 or 0L, and 4.10 [conv.ptr] defines a 'null pointer constant' without reference to its size.
Either way, I don't think there's any guarantee that NULL is the size of a pointer. This patch happens to fix the SVG code for gcc, but I don't think there's any guarantee that it will for all compilers. It seems like it might be better to just make this SVG code explicitly cast to (void*).
(I'd noticed this warning before, but didn't figure out why it was happening.)
Comment 6•15 years ago
|
||
(In reply to comment #5)
>
> The C++
> standard, section 18.1 [lib.support.types], says that NULL is a null pointer
> constant that could be the same as 0 or 0L, and 4.10 [conv.ptr] defines a 'null
> pointer constant' without reference to its size.
>
> Either way, I don't think there's any guarantee that NULL is the size of a
> pointer.
Interpreting "null pointer constant" as something that's not pointer-sized would be obtuse and I doubt any compiler does it. SpiderMonkey has static assertions, does NSPR? This would be a perfect place for one.
(In reply to comment #5)
> Either way, I don't think there's any guarantee that NULL is the size of a
> pointer. This patch happens to fix the SVG code for gcc, but I don't think
> there's any guarantee that it will for all compilers. It seems like it might
> be better to just make this SVG code explicitly cast to (void*).
Or, really, ((float*)0). (Perhaps the code in question should have a macro for ((float*)0).)
Alternatively, if we really want to ensure that sizeof(NULL)==sizeof(void*), we should have a PR_STATIC_ASSERT testing for that so that platforms on which it isn't true detect that there's a bug in the SVG code.
(In reply to comment #6)
> Interpreting "null pointer constant" as something that's not pointer-sized
> would be obtuse and I doubt any compiler does it.
It looks like MSVC on Win64 does, #ifdef __cplusplus (though I'm not sure that the 64-bit version doesn't use different system headers).
| Assignee | ||
Comment 9•15 years ago
|
||
(In reply to comment #5)
> It seems like it might be better to just make this SVG code
> explicitly cast to (void*).
That's fixing the symptoms, not the root cause. If we did that we'd
still have the burden of auditing all present and future uses of
nsnull and adding casts where necessary.
| Assignee | ||
Comment 10•15 years ago
|
||
(In reply to comment #8)
> It looks like MSVC on Win64 does
What does printf("%d\n", sizeof(NULL)) on Win64 produce?
(I don't have a Win64 box to hand, else I'd check myself.)
Comment 11•15 years ago
|
||
Since we don't support 64-bit on 1.9.2 this isn't blocking, but we'd probably take the patch when reviewed and baked, so feel free to nominate.
blocking1.9.2: ? → -
Updated•15 years ago
|
status1.9.1:
--- → wanted
Whiteboard: [sg:moderate]
Comment 12•15 years ago
|
||
I know we invented nsnull because people were using NULL and it was doing the wrong thing in many cases. Then again that was ten or twelve years ago, plus we no longer support a half a dozen unix variants each with their own compiler.
Updated•15 years ago
|
| Assignee | ||
Comment 13•15 years ago
|
||
If hardwiring nsnull to be NULL is causing concern, a possible
compromise is to first do the hardwiring, and then if it causes
compilation failures on some mutant platform, move to
#if EL_MUTANTO_PLATFORM
# define nsnull 0 // as at present
#else
# define nsnull NULL
#endif
Comment 14•15 years ago
|
||
Comment on attachment 428895 [details] [diff] [review]
defines nsnull safely for 64-bit platforms
This happens to work because GCC #defines NULL to __null, but this is not a general solution: C++ compilers are allowed to define it to "0" or "0L".
As an expedient gcc-only solution, though, I guess this would be acceptable.
Attachment #428895 -
Flags: review?(benjamin) → review?(dbaron)
(In reply to comment #9)
> (In reply to comment #5)
> > It seems like it might be better to just make this SVG code
> > explicitly cast to (void*).
>
> That's fixing the symptoms, not the root cause. If we did that we'd
> still have the burden of auditing all present and future uses of
> nsnull and adding casts where necessary.
Are there any such cases other than varargs?
(In reply to comment #10)
> (In reply to comment #8)
> > It looks like MSVC on Win64 does
>
> What does printf("%d\n", sizeof(NULL)) on Win64 produce?
> (I don't have a Win64 box to hand, else I'd check myself.)
From irc.mozilla.org, #developers:
[2010-03-04 16:19:34] <dbaron> does anybody have a 64-bit windows build setup?
[2010-03-04 16:20:50] <dbaron> if so, what does the program #include <stdio.h> /* newline */ int main(void) { printf("%u\n", unsigned(sizeof(NULL))); return 0; } print?
[2010-03-04 16:21:27] <dbaron> well, to double-check, the program should probably really be:
[2010-03-04 16:21:52] <dbaron> #include <stdio.h> /* newline */ int main(void) { printf("%u %u\n", unsigned(sizeof(NULL)), unsigned(sizeof(void*))); return 0; }
[2010-03-04 16:28:11] <Bas> dbaron: ugh, seems I don't have the right tools in this Visual Studio install.
[2010-03-04 16:31:22] <Bas> dbaron: Got it.
[2010-03-04 16:31:45] <Bas> dbaron NULL is 4 (as expected since it'd preprocess to sizeof(0) aye?) and void* is 8.
(In reply to comment #13)
> If hardwiring nsnull to be NULL is causing concern, a possible
> compromise is to first do the hardwiring, and then if it causes
> compilation failures on some mutant platform, move to
>
> #if EL_MUTANTO_PLATFORM
> # define nsnull 0 // as at present
> #else
> # define nsnull NULL
> #endif
The issue isn't so much mutant platforms. It's rather that there are lots of system (etc.) headers that define NULL to various things, and which one you actually get depends on the order you include headers and which headers you include.
For example, after searching through /usr/include/ on my Ubuntu 9.10 system, and testing the results of including only one file in a C++ program, I found that a C++ file that includes only:
#include <linux/stddef.h>
would end up with NULL defined as 0. (There are a few other headers that might have similar problems, or even the ((void*)0) problem, but I couldn't actually test the results of including them without messing with pkgconfig more than I wanted to right now.)
Since the headers that are included vary from file to file, it's really hard to verify that NULL will be the right thing in every compilation unit.
| Assignee | ||
Comment 18•15 years ago
|
||
(In reply to comment #15)
> Are there any such cases other than varargs?
I don't know. I'm not sure how I'd find out either.
| Assignee | ||
Comment 19•15 years ago
|
||
This seems to be a swamp of far greater complexity than I had
initially expected. Given that ... I would be more than happy to make
up a patch which leaves nsnull exactly as it is, but casts the
offending uses to void* in as many places as I can find them.
Would that be generally considered a preferable solution?
I think it would. Casting to (float*) might fit better than (void*), but it doesn't matter much.
I think the offending uses are simply all the callers of GetAnimatedLengthValues, which are listed at:
http://mxr.mozilla.org/mozilla-central/search?string=GetAnimatedLengthValues
Comment on attachment 428895 [details] [diff] [review]
defines nsnull safely for 64-bit platforms
As we discussed on IRC, I don't think NULL fixes the problem.
I think we should either fix this in the SVG code with casts to (float*) (relatively simple), or find an appropriate way to #ifdef for LLP64 so that we can make nsnull either 0LL or 0L.
Attachment #428895 -
Flags: review?(dbaron) → review-
| Assignee | ||
Comment 22•15 years ago
|
||
Sigh, it looks like there are there are actually three functions which
suffer from this problem; they all use the same NULL-termination convention:
void GetAnimatedLengthValues(float *aFirst, ...);
void GetAnimatedNumberValues(float *aFirst, ...);
void GetAnimatedIntegerValues(PRInt32 *aFirst, ...);
| Assignee | ||
Comment 23•15 years ago
|
||
(In reply to comment #21)
> or find an appropriate way to #ifdef for LLP64 so that we
> can make nsnull either 0LL or 0L.
Does this fit the bill?
#if defined(_MSC_VER) && defined(_M_AMD64)
# define nsnull 0LL
#else
# define nsnull 0L
#endif
It gives an appropriately sized value for both win64 and everything
else (assuming win64 is the only 64-bit target which isn't LP64), and
it doesn't rely on potentially broken or mis-sized definitions of NULL.
| Assignee | ||
Comment 24•15 years ago
|
||
(In reply to comment #23)
> Does this fit the bill?
>
> #if defined(_MSC_VER) && defined(_M_AMD64)
Apparently this won't work on mingw64 in the future. Revised version:
#include <stdio.h>
#if defined(_WIN64)
# define mynull 0LL
#else
# define mynull 0L
#endif
int main (int argc, char** argv) {
printf("sizeof(void*) = %d, sizeof(mynull) = %d\n",
(int)sizeof(void*), (int)sizeof(mynull));
return 0;
}
I tested on amd64-linux and x86-linux and Bas on IRC tried on
win32 and win64. It produces the expected results
x86-linux and win32
sizeof(void*) = 4, sizeof(mynull) = 4
amd64-linux and win64
sizeof(void*) = 8, sizeof(mynull) = 8
The gating directly on _WIN64 should mean this is ok even on mingw64,
assuming it defines _WIN64.
| Assignee | ||
Comment 26•15 years ago
|
||
Defines nsnull correctly on all platforms as per comment 24.
Attachment #428895 -
Attachment is obsolete: true
| Assignee | ||
Comment 27•15 years ago
|
||
(In reply to comment #26)
Verified doesn't-break-the-build and produces-usable-browser
on win32 and amd64-linux. Verified fixes-the-original-problem
on amd64-linux.
| Assignee | ||
Updated•15 years ago
|
Attachment #438058 -
Flags: review?(dbaron)
Attachment #438058 -
Flags: review?(dbaron) → review+
Comment on attachment 438058 [details] [diff] [review]
second try
r=dbaron if you also add a:
PR_STATIC_ASSERT(sizeof(void*) == sizeof(nsnull));
either:
* somewhere in a C++ file in XPCOM, or
* next to the implementation of nsSVGElement::GetAnimatedLengthValues
(probably the latter makes more sense)
(PR_STATIC_ASSERT requires that you #include "prlog.h", which is better not done from a .h file because of the other things in prlog.h.)
| Assignee | ||
Comment 30•15 years ago
|
||
| Assignee | ||
Comment 31•15 years ago
|
||
(In reply to comment #30)
Re-verified as per comment 27. Also verified that the build breaks
if the assert is changed to something stupid, eg
PR_STATIC_ASSERT(sizeof(void*) == sizeof(nsnull)-1);
| Assignee | ||
Updated•15 years ago
|
Keywords: checkin-needed
Comment 32•15 years ago
|
||
Status: NEW → RESOLVED
Closed: 15 years ago
Keywords: checkin-needed
Resolution: --- → FIXED
Target Milestone: --- → mozilla1.9.3a5
Comment 33•15 years ago
|
||
The instant problem seems to have been resolved, but I want to point out a (theoretical) way it could come back and bite us still, and explain a little of what's going on here.
C++ makes a distinction between null pointer _values_ and null pointer _constants_. A null pointer value has some concrete pointer type and behaves as any other value of that type (so, in particular, it cannot be implicitly converted to some other pointer type) but it doesn't point to any object, is distinct from all pointers to objects of that type, etc.
A null pointer _constant_ is "an integral constant expression rvalue of integer type that evaluates to zero". That's a lot of jargon, but basically 0, 0L, 0LL, 0U, and (1 - 1) are all valid null pointer constants, whereas 0.0 is not. Notice that it specifically says null pointer constants have _integer_ type, and it does not matter _which_ integer type it is. This is why ((void *)0) is not a null pointer constant - it is only the null pointer _value_ for the type void*.
Now, a null pointer constant is implicitly converted to the appropriate null pointer value whenever it appears in a context that expects a pointer. That makes it safe to write a bare 0 almost everywhere one needs a null pointer - yes, even on L(L)P64 systems where "int" is not the same size, because the type of the integer constant with value 0 is irrelevant. There are only two exceptions that I see, having just skimmed the entire C++ standard for them:
In overload resolution and template instantiation, where the compiler will prefer f(int) to f(void*), t<int> to t<void*>, etc. Notice that this will also fail with 'nsnull' as presently defined, or with any correct definition of NULL - the only way to ensure the compiler does the right thing is to write an explicit cast.
In variadic function calls, because the arguments mapped to '...' are untyped and suffer the old K&R argument promotion rules; the 0 gets passed as its actual integer type, not as any pointer type. This was the problem in this bug. Technically speaking, in this case, the only way to ensure the compiler does the right thing is *also* to write an explicit cast at the call site - the patch checked in does not cover all possible problems. The remaining holes are:
- Pointers to different types do not necessarily all have the same representation, or even the same size. However, AFAIK the last system that took advantage of this license was the Cray and/or the 80286 (depending which track of CPU development you pay attention to). I suspect we would have much bigger problems if we tried to port Mozilla to something where pointers weren't all the same.
- Pointers do not necessarily follow the same argument passing convention as integers. I *do* think we need to worry about this. A realistic scenario is a chip similar to the 68000, which makes a distinction between address and data registers, and passes pointers in the former but integers in the latter. This design concept is still alive and well in the embedded space (heck, the 68k itself is still used there).
So I would have gone for explicit casts at all call sites, rather than mucking with the definition of nsnull.
But then, I would also argue for s/nsnull/0/ throughout the code base (not entirely a joke).
Comment 34•15 years ago
|
||
Also, I would argue very strongly against using variadic functions. They are almost always a bad idea, because they prevent the compiler from doing any type checking for you, and there are almost always better solutions.
In this case, for example, GetAnimatedLengthValues could easily be rewritten to take an array argument with a length parameter, and the callers would need little modification. Or, if we really don't want to touch the callers, we could easily rewrite several overloads each with a different number of arguments, and use an array based underlying implementation for all of them.
Comment 35•15 years ago
|
||
The assert causes a build regression, see Bug 565607
Updated•15 years ago
|
blocking2.0: ? → final+
Updated•13 years ago
|
Group: core-security
You need to log in
before you can comment on or make changes to this bug.
Description
•