Closed Bug 547964 Opened 15 years ago Closed 15 years ago

Potentially dangerous word-size error in definition of "nsnull"

Categories

(Core :: XPCOM, defect)

x86_64
Linux
defect
Not set
normal

Tracking


RESOLVED FIXED
mozilla1.9.3a5
Tracking Status
blocking2.0 --- final+
blocking1.9.2 --- -
status1.9.2 --- wanted
status1.9.1 --- wanted

People

(Reporter: jseward, Assigned: jseward)

References

Details

(Keywords: 64bit, valgrind, Whiteboard: [sg:moderate])

Attachments

(2 files, 1 obsolete file)

Build m-c on 64-bit Linux (it must be 64-bit), start up with "valgrind --track-origins=yes" and head over to http://acid3.acidtests.org. Somewhere at around 70/100 the following appears:

  Conditional jump or move depends on uninitialised value(s)
     at 0x5C461E6: nsSVGElement::GetAnimatedLengthValues(float*, ...) (nsSVGElement.cpp:1223)
     by 0x5C85861: nsSVGRectElement::ConstructPath(gfxContext*) (nsSVGRectElement.cpp:174)
     by 0x5C345E5: nsSVGPathGeometryFrame::GeneratePath(gfxContext*, gfxMatrix const*) (nsSVGPathGeometryFrame.cpp:502)
     by 0x5C34A4C: nsSVGPathGeometryFrame::UpdateCoveredRegion() (nsSVGPathGeometryFrame.cpp:254)
     by 0x5C33DD0: nsSVGOuterSVGFrame::UpdateAndInvalidateCoveredRegion(nsIFrame*) (nsSVGOuterSVGFrame.cpp:640)
     by 0x5C3A749: nsSVGUtils::UpdateGraphic(nsISVGChildFrame*) (nsSVGUtils.cpp:684)
     by 0x5C34F97: nsSVGPathGeometryFrame::DidSetStyleContext(nsStyleContext*) (nsSVGPathGeometryFrame.cpp:97)
     by 0x5782C7F: nsFrame::Init(nsIContent*, nsIFrame*, nsIFrame*) (nsFrame.cpp:366)
     by 0x5C2B537: nsSVGGeometryFrame::Init(nsIContent*, nsIFrame*, nsIFrame*) (nsSVGGeometryFrame.cpp:57)
     by 0x5713487: nsCSSFrameConstructor::InitAndRestoreFrame(nsFrameConstructorState const&, nsIContent*, nsIFrame*, nsIFrame*, nsIFrame*, int) (nsCSSFrameConstructor.cpp:4512)
     by 0x571A126: nsCSSFrameConstructor::ConstructFrameFromItemInternal(nsCSSFrameConstructor::FrameConstructionItem&, nsFrameConstructorState&, nsIFrame*, nsFrameItems&) (nsCSSFrameConstructor.cpp:3739)
  Uninitialised value was created by a stack allocation
     at 0x5C8580F: nsSVGRectElement::ConstructPath(gfxContext*) (nsSVGRectElement.cpp:171)

It has to do with this call:

  void
  nsSVGRectElement::ConstructPath(gfxContext *aCtx)
  {
    float x, y, width, height, rx, ry;

    GetAnimatedLengthValues(&x, &y, &width, &height, &rx, &ry, nsnull);

to a loop in nsSVGElement::GetAnimatedLengthValues(float *aFirst, ...) which iterates over this sequence of 6 float*'s:

  float *f = aFirst;
  ...
  va_list args;
  va_start(args, aFirst);

  while (f && i < info.mLengthCount) {
    ...
    *f = info.mLengths[i++].GetAnimValue(this);  // or (ctx)
    ...
    f = va_arg(args, float*);
  }

Valgrind complains about the null-check of "f" in "while (f && ...", which is very odd, because the argument list appears to be properly null-terminated by the use of nsnull at the call site. However ...

Looking at the assembly of the call site, we have the nsnull written to the stack as a 32-bit store:

  c3e874:  c7 44 24 08 00 00 00   movl   $0x0,0x8(%rsp)
  c3e87b:  00

At first I thought this was a gcc bug. But no: it's because nsnull is defined as an integer constant that carries no indication of the word size. Hence, in the absence of any applicable implicit (C/C++) wordsize-promotion rules at the use site, it simply gets treated as a 32-bit int. That's what has happened in this case. xpcom/base/nscore.h:358:

  /**
   * The preferred symbol for null.
   */
  #define nsnull 0

So what's the upshot in this case? The argument list is not (64-bit) null-terminated, and so this loop -- which writes to the stack -- is terminated only by the second condition, i < info.mLengthCount. If info.mLengthCount were taken from network data we would have a potential remote stack overwrite scenario.

So: valgrind complains correctly, because the loop does a 64-bit null comparison but only the lower half of the sentinel value is defined.
And, if we print the sequence of f's in the loop we get this:

  f = 0x7fff21b5ae1c
  f = 0x7fff21b5ae18
  f = 0x7fff21b5ae14
  f = 0x7fff21b5ae10
  f = 0x7fff21b5ae0c
  f = 0x7fff21b5ae08
  f = 0x7f6b00000000   <-- suspicious last value

Changing the use of nsnull in the call site to a bona-fide value, "(void*)0", makes the printed output be as expected (last f = 0x0), and stops Valgrind complaining.

As a final sanity check, putting this into the code:

  printf("QQQQQQQQQQQ sizeof(nsnull) = %d, sizeof(void*) = %d\n",
         sizeof(nsnull), sizeof(void*));

produces

  QQQQQQQQQQQ sizeof(nsnull) = 4, sizeof(void*) = 8

How many other places in 64-bit Fx are affected by having a nsnull which only happens to be 64-bit if we get lucky with the C++ automatic-operand-widening rules?
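To make the failure mode concrete, here is a minimal, self-contained sketch of the same pattern (hypothetical function and variable names, not the Mozilla code): a variadic function that walks float* arguments until it sees a null sentinel. When a bare literal 0 (or a 32-bit nsnull) is the sentinel and it ends up spilled to the stack, as the movl above shows, only the low 32 bits of that argument slot are written, so the 64-bit va_arg read can see garbage in the upper half.

  // Minimal sketch of the hazard (hypothetical names, not the Mozilla code).
  #include <cstdarg>
  #include <cstdio>

  // Writes `value` through each float* argument until it sees a null
  // sentinel -- the same convention GetAnimatedLengthValues uses.
  static void FillFloats(float value, float* aFirst, ...)
  {
    va_list args;
    va_start(args, aFirst);
    for (float* f = aFirst; f; f = va_arg(args, float*)) {
      *f = value;                 // each iteration writes through a pointer
    }
    va_end(args);
  }

  int main()
  {
    float x = 0.0f, y = 0.0f;

    // Safe: the sentinel has pointer type, so a full pointer-width zero is
    // passed and the loop stops exactly where intended.
    FillFloats(1.0f, &x, &y, (float*)0);

    // Hazardous on LP64/LLP64: a bare 0 (or a 32-bit nsnull) is passed as an
    // int; if it lands in a stack slot, the upper 32 bits of that slot may be
    // undefined, va_arg(args, float*) can yield a non-null garbage pointer,
    // and the loop can write past the intended arguments.
    // FillFloats(1.0f, &x, &y, 0);

    std::printf("x=%f y=%f\n", x, y);
    return 0;
  }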
This is only a problem when it is being passed as varargs (and then dereferenced as a pointer), right? We could create a static-checking warning for this case (passing literal 0 as a vararg) pretty easily.
(In reply to comment #1)
> This is only a problem when it is being passed as varargs (and then
> dereferenced as a pointer), right?

It's certainly a problem in the varargs case, but I wouldn't like to swear that there aren't some other weird cases that are also affected.

> We could create a static-checking warning
> for this case (passing literal 0 as a vararg) pretty easily.

The obvious fix would be to declare nsnull as an object of the correct size, perhaps (uintptr_t)0. That should remove all ambiguity, no?
Defining nsnull as (uintptr_t)0, (void*)0 or (char*)0 gives various compilation failures. Defining it simply as NULL works, though, at least on x86_64 Linux:

  --- a/xpcom/base/nscore.h   Sun Feb 21 21:47:21 2010 -0500
  +++ b/xpcom/base/nscore.h   Tue Feb 23 17:47:16 2010 +0100
  @@ -358,7 +358,7 @@
   /**
    * The preferred symbol for null.
    */
  -#define nsnull 0
  +#define nsnull NULL

   #include "nsError.h"
blocking1.9.2: --- → ?
blocking2.0: --- → ?
status1.9.2: --- → ?
This patch changes "#define nsnull 0" to "#define nsnull NULL".

If nsnull is an unadorned "0", it is a 32-bit constant even on 64-bit platforms, at least on LP64 (all 64-bit Unix) and LLP64 (win64). This is wrong, but we get away with it most of the time because in the two idiomatic cases

  T* p = nsnull;
  if (p == nsnull) ...

C++'s type promotion rules for assignment and binary operators cause it to be widened to 64 bits before use. At least, that's my _best guess_ of why it works .. #include <disclaimer.h>

Defining nsnull as (void*)0 or (char*)0 causes many compilation failures because C++ doesn't allow automatic conversion from these to arbitrary pointer types. Defining nsnull as (intptr_t)0 causes compilation failures because the relevant .h is not always included.

It strikes me as safest to define nsnull simply as NULL, since it seems inconceivable that a C/C++ compiler would supply a definition of NULL which doesn't work properly.

This is known to work (not break the build, and produce a usable browser) on x86_64-linux and x86-win32.
Assignee: nobody → jseward
Attachment #428895 - Flags: review?(benjamin)
I think an issue in the past was that there were some system headers that #defined NULL to be ((void*)0), which causes problems in C++. That might not be a real concern anymore.

It looks like gcc defines NULL to its internal constant __null. The C++ standard, section 18.1 [lib.support.types], says that NULL is a null pointer constant that could be the same as 0 or 0L, and 4.10 [conv.ptr] defines a 'null pointer constant' without reference to its size.

Either way, I don't think there's any guarantee that NULL is the size of a pointer. This patch happens to fix the SVG code for gcc, but I don't think there's any guarantee that it will for all compilers. It seems like it might be better to just make this SVG code explicitly cast to (void*).

(I'd noticed this warning before, but didn't figure out why it was happening.)
(In reply to comment #5)
> The C++ standard, section 18.1 [lib.support.types], says that NULL is a
> null pointer constant that could be the same as 0 or 0L, and 4.10
> [conv.ptr] defines a 'null pointer constant' without reference to its
> size.
>
> Either way, I don't think there's any guarantee that NULL is the size
> of a pointer.

Interpreting "null pointer constant" as something that's not pointer-sized would be obtuse and I doubt any compiler does it.

SpiderMonkey has static assertions, does NSPR? This would be a perfect place for one.
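For reference, a compile-time assertion of this sort needs no language support; a minimal sketch of the usual pre-C++11 trick (hypothetical macro name, not necessarily how NSPR or SpiderMonkey spell theirs) is:

  /* Fails to compile when the condition is false, by declaring an array
     type with a negative size.  Hypothetical name; NSPR's actual macro
     is PR_STATIC_ASSERT (see the review comments later in this bug).  */
  #define MY_STATIC_ASSERT(cond) \
    typedef int my_static_assert_failed[(cond) ? 1 : -1]

  /* Passes everywhere: an int is at least 2 bytes. */
  MY_STATIC_ASSERT(sizeof(int) >= 2);

  /* The check suggested here: it would refuse to compile on any 64-bit
     target while nsnull is still a plain "0" (sizeof 4 vs 8), which is
     exactly the point of adding it.                                     */
  /* MY_STATIC_ASSERT(sizeof(void*) == sizeof(nsnull)); */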
(In reply to comment #5)
> Either way, I don't think there's any guarantee that NULL is the size
> of a pointer. This patch happens to fix the SVG code for gcc, but I
> don't think there's any guarantee that it will for all compilers. It
> seems like it might be better to just make this SVG code explicitly
> cast to (void*).

Or, really, ((float*)0). (Perhaps the code in question should have a macro for ((float*)0).)

Alternatively, if we really want to ensure that sizeof(NULL) == sizeof(void*), we should have a PR_STATIC_ASSERT testing for that so that platforms on which it isn't true detect that there's a bug in the SVG code.
(In reply to comment #6)
> Interpreting "null pointer constant" as something that's not
> pointer-sized would be obtuse and I doubt any compiler does it.

It looks like MSVC on Win64 does, #ifdef __cplusplus (though I'm not sure that the 64-bit version doesn't use different system headers).
(In reply to comment #5)
> It seems like it might be better to just make this SVG code
> explicitly cast to (void*).

That's fixing the symptoms, not the root cause. If we did that we'd still have the burden of auditing all present and future uses of nsnull and adding casts where necessary.
(In reply to comment #8)
> It looks like MSVC on Win64 does

What does printf("%d\n", sizeof(NULL)) on Win64 produce? (I don't have a Win64 box to hand, else I'd check myself.)
Since we don't support 64-bit on 1.9.2 this isn't blocking, but we'd probably take the patch when reviewed and baked, so feel free to nominate.
blocking1.9.2: ? → -
Whiteboard: [sg:moderate]
I know we invented nsnull because people were using NULL and it was doing the wrong thing in many cases. Then again, that was ten or twelve years ago, and we no longer support half a dozen unix variants each with their own compiler.
Keywords: 64bit, valgrind
Blocks: 549224
If hardwiring nsnull to be NULL is causing concern, a possible compromise is to first do the hardwiring, and then, if it causes compilation failures on some mutant platform, move to:

  #if EL_MUTANTO_PLATFORM
  #  define nsnull 0     // as at present
  #else
  #  define nsnull NULL
  #endif
Comment on attachment 428895 [details] [diff] [review]
defines nsnull safely for 64-bit platforms

This happens to work because GCC #defines NULL to __null, but this is not a general solution: C++ compilers are allowed to define it to "0" or "0L". As an expedient gcc-only solution, though, I guess this would be acceptable.
Attachment #428895 - Flags: review?(benjamin) → review?(dbaron)
(In reply to comment #9)
> (In reply to comment #5)
> > It seems like it might be better to just make this SVG code
> > explicitly cast to (void*).
>
> That's fixing the symptoms, not the root cause. If we did that we'd
> still have the burden of auditing all present and future uses of
> nsnull and adding casts where necessary.

Are there any such cases other than varargs?
(In reply to comment #10)
> (In reply to comment #8)
> > It looks like MSVC on Win64 does
>
> What does printf("%d\n", sizeof(NULL)) on Win64 produce?
> (I don't have a Win64 box to hand, else I'd check myself.)

From irc.mozilla.org, #developers:

  [2010-03-04 16:19:34] <dbaron> does anybody have a 64-bit windows build setup?
  [2010-03-04 16:20:50] <dbaron> if so, what does the program  #include <stdio.h> /* newline */ int main(void) { printf("%u\n", unsigned(sizeof(NULL))); return 0; }  print?
  [2010-03-04 16:21:27] <dbaron> well, to double-check, the program should probably really be:
  [2010-03-04 16:21:52] <dbaron> #include <stdio.h> /* newline */ int main(void) { printf("%u %u\n", unsigned(sizeof(NULL)), unsigned(sizeof(void*))); return 0; }
  [2010-03-04 16:28:11] <Bas> dbaron: ugh, seems I don't have the right tools in this Visual Studio install.
  [2010-03-04 16:31:22] <Bas> dbaron: Got it.
  [2010-03-04 16:31:45] <Bas> dbaron: NULL is 4 (as expected since it'd preprocess to sizeof(0) aye?) and void* is 8.
(In reply to comment #13)
> If hardwiring nsnull to be NULL is causing concern, a possible
> compromise is to first do the hardwiring, and then, if it causes
> compilation failures on some mutant platform, move to:
>
>   #if EL_MUTANTO_PLATFORM
>   #  define nsnull 0     // as at present
>   #else
>   #  define nsnull NULL
>   #endif

The issue isn't so much mutant platforms. It's rather that there are lots of system (etc.) headers that define NULL to various things, and which one you actually get depends on the order you include headers and which headers you include.

For example, after searching through /usr/include/ on my Ubuntu 9.10 system, and testing the results of including only one file in a C++ program, I found that a C++ file that includes only:

  #include <linux/stddef.h>

would end up with NULL defined as 0. (There are a few other headers that might have similar problems, or even the ((void*)0) problem, but I couldn't actually test the results of including them without messing with pkgconfig more than I wanted to right now.)

Since the headers that are included vary from file to file, it's really hard to verify that NULL will be the right thing in every compilation unit.
(In reply to comment #15)
> Are there any such cases other than varargs?

I don't know. I'm not sure how I'd find out either.
This seems to be a swamp of far greater complexity than I had initially expected. Given that ... I would be more than happy to make up a patch which leaves nsnull exactly as it is, but casts the offending uses to void* in as many places as I can find them. Would that be generally considered a preferable solution?
I think it would. Casting to (float*) might fit better than (void*), but it doesn't matter much. I think the offending uses are simply all the callers of GetAnimatedLengthValues, which are listed at: http://mxr.mozilla.org/mozilla-central/search?string=GetAnimatedLengthValues
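Concretely, the change at each caller would presumably look something like the following, based on the call shown in comment 0 (illustrative sketch only, not a patch attached to this bug):

  void
  nsSVGRectElement::ConstructPath(gfxContext *aCtx)
  {
    float x, y, width, height, rx, ry;

    // Pass a pointer-typed sentinel so the terminator is pointer-width
    // on 64-bit targets (the cast is the illustrative change here).
    GetAnimatedLengthValues(&x, &y, &width, &height, &rx, &ry,
                            static_cast<float*>(nsnull));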
Comment on attachment 428895 [details] [diff] [review]
defines nsnull safely for 64-bit platforms

As we discussed on IRC, I don't think NULL fixes the problem. I think we should either fix this in the SVG code with casts to (float*) (relatively simple), or find an appropriate way to #ifdef for LLP64 so that we can make nsnull either 0LL or 0L.
Attachment #428895 - Flags: review?(dbaron) → review-
Sigh, it looks like there are actually three functions which suffer from this problem; they all use the same null-termination convention:

  void GetAnimatedLengthValues(float *aFirst, ...);
  void GetAnimatedNumberValues(float *aFirst, ...);
  void GetAnimatedIntegerValues(PRInt32 *aFirst, ...);
(In reply to comment #21)
> or find an appropriate way to #ifdef for LLP64 so that we
> can make nsnull either 0LL or 0L.

Does this fit the bill?

  #if defined(_MSC_VER) && defined(_M_AMD64)
  #  define nsnull 0LL
  #else
  #  define nsnull 0L
  #endif

It gives an appropriately sized value for both win64 and everything else (assuming win64 is the only 64-bit target which isn't LP64), and it doesn't rely on potentially broken or mis-sized definitions of NULL.
(In reply to comment #23)
> Does this fit the bill?
>
> #if defined(_MSC_VER) && defined(_M_AMD64)

Apparently this won't work on mingw64 in the future. Revised version:

  #include <stdio.h>

  #if defined(_WIN64)
  #  define mynull 0LL
  #else
  #  define mynull 0L
  #endif

  int main (int argc, char** argv)
  {
    printf("sizeof(void*) = %d, sizeof(mynull) = %d\n",
           (int)sizeof(void*), (int)sizeof(mynull));
    return 0;
  }

I tested on amd64-linux and x86-linux, and Bas on IRC tried it on win32 and win64. It produces the expected results:

  x86-linux and win32:    sizeof(void*) = 4, sizeof(mynull) = 4
  amd64-linux and win64:  sizeof(void*) = 8, sizeof(mynull) = 8

Gating directly on _WIN64 should mean this is ok even on mingw64, assuming it defines _WIN64.
Attached patch: second try (Splinter Review)
Defines nsnull correctly on all platforms as per comment 24.
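The attachment itself is not reproduced inline here; based on comment 24 and the nscore.h excerpt in comment 0, the revised hunk presumably looks something like this (a reconstruction for readability, not the literal patch):

  /**
   * The preferred symbol for null.
   */
  #if defined(_WIN64)
  #  define nsnull 0LL
  #else
  #  define nsnull 0L
  #endif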
Attachment #428895 - Attachment is obsolete: true
(In reply to comment #26)

Verified doesn't-break-the-build and produces-usable-browser on win32 and amd64-linux. Verified fixes-the-original-problem on amd64-linux.
Attachment #438058 - Flags: review?(dbaron)
Comment on attachment 438058 [details] [diff] [review]
second try

r=dbaron if you also add a:

  PR_STATIC_ASSERT(sizeof(void*) == sizeof(nsnull));

either:
 * somewhere in a C++ file in XPCOM, or
 * next to the implementation of nsSVGElement::GetAnimatedLengthValues

(probably the latter makes more sense)

(PR_STATIC_ASSERT requires that you #include "prlog.h", which is better not done from a .h file because of the other things in prlog.h.)
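Spelled out, the requested addition presumably amounts to something like the following near the GetAnimatedLengthValues implementation (a sketch of the review request; the exact placement and file path are assumptions, not taken from the landed patch):

  // In nsSVGElement.cpp (sketch only):
  #include "prlog.h"   // for PR_STATIC_ASSERT

  // Catch at compile time any platform where nsnull is narrower than a
  // pointer, since GetAnimatedLengthValues relies on a pointer-width
  // null sentinel in its vararg list.
  PR_STATIC_ASSERT(sizeof(void*) == sizeof(nsnull));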
(In reply to comment #30)

Re-verified as per comment 27. Also verified that the build breaks if the assert is changed to something stupid, e.g.

  PR_STATIC_ASSERT(sizeof(void*) == sizeof(nsnull)-1);
Keywords: checkin-needed
Status: NEW → RESOLVED
Closed: 15 years ago
Keywords: checkin-needed
Resolution: --- → FIXED
Target Milestone: --- → mozilla1.9.3a5
The instant problem seems to have been resolved, but I want to point out a (theoretical) way it could come back and bite us still, and explain a little of what's going on here.

C++ makes a distinction between null pointer _values_ and null pointer _constants_. A null pointer value has some concrete pointer type and behaves as any other value of that type (so, in particular, it cannot be implicitly converted to some other pointer type), but it doesn't point to any object, is distinct from all pointers to objects of that type, etc. A null pointer _constant_ is "an integral constant expression rvalue of integer type that evaluates to zero". That's a lot of jargon, but basically 0, 0L, 0LL, 0U, and (1 - 1) are all valid null pointer constants, whereas 0.0 is not. Notice that it specifically says null pointer constants have _integer_ type, and it does not matter _which_ integer type it is. This is why ((void *)0) is not a null pointer constant - it is only the null pointer _value_ for the type void*.

Now, a null pointer constant is implicitly converted to the appropriate null pointer value whenever it appears in a context that expects a pointer. That makes it safe to write a bare 0 almost everywhere one needs a null pointer - yes, even on L(L)P64 systems where "int" is not the same size as a pointer, because the type of the integer constant with value 0 is irrelevant. There are only two exceptions that I see, having just skimmed the entire C++ standard for them:

 * In overload resolution and template instantiation, where the compiler will prefer f(int) to f(void*), t<int> to t<void*>, etc. Notice that this will also fail with 'nsnull' as presently defined, or with any correct definition of NULL - the only way to ensure the compiler does the right thing is to write an explicit cast.

 * In variadic function calls, because the arguments mapped to '...' are untyped and suffer the old K&R argument promotion rules; the 0 gets passed as its actual integer type, not as any pointer type. This was the problem in this bug. Technically speaking, in this case, the only way to ensure the compiler does the right thing is *also* to write an explicit cast at the call site - the patch checked in does not cover all possible problems.

The remaining holes are:

 * Pointers to different types do not necessarily all have the same representation, or even the same size. However, AFAIK the last system that took advantage of this license was the Cray and/or the 80286 (depending which track of CPU development you pay attention to). I suspect we would have much bigger problems if we tried to port Mozilla to something where pointers weren't all the same.

 * Pointers do not necessarily follow the same argument passing convention as integers. I *do* think we need to worry about this. A realistic scenario is a chip similar to the 68000, which makes a distinction between address and data registers, and passes pointers in the former but integers in the latter. This design concept is still alive and well in the embedded space (heck, the 68k itself is still used there).

So I would have gone for explicit casts at all call sites, rather than mucking with the definition of nsnull. But then, I would also argue for s/nsnull/0/ throughout the code base (not entirely a joke).
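A minimal illustration of the first exception, overload resolution (hypothetical functions, just to show the behaviour; not Mozilla code):

  #include <cstdio>

  static void f(int)   { std::printf("f(int)\n"); }
  static void f(void*) { std::printf("f(void*)\n"); }

  int main()
  {
    f(0);                      // picks f(int): the literal is an int, even
                               // though it is also a null pointer constant
    f(static_cast<void*>(0));  // only an explicit cast selects f(void*)
    return 0;
  }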
Also, I would argue very strongly against using variadic functions. They are almost always a bad idea, because they prevent the compiler from doing any type checking for you, and there are almost always better solutions. In this case, for example, GetAnimatedLengthValues could easily be rewritten to take an array argument with a length parameter, and the callers would need little modification. Or, if we really don't want to touch the callers, we could easily rewrite several overloads each with a different number of arguments, and use an array based underlying implementation for all of them.
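A minimal sketch of that array-based alternative, reusing the hypothetical FillFloats toy from the earlier sketch rather than the real GetAnimatedLengthValues (names and signature are illustrative assumptions):

  // Illustrative only: a non-variadic shape for the same operation.
  // There is no sentinel to get wrong and every argument is
  // type-checked by the compiler.
  #include <cstddef>
  #include <cstdio>

  static void FillFloats(float value, float* const* aValues, size_t aCount)
  {
    for (size_t i = 0; i < aCount; ++i) {
      *aValues[i] = value;      // write through each supplied pointer
    }
  }

  int main()
  {
    float x, y, width, height, rx, ry;
    float* outs[] = { &x, &y, &width, &height, &rx, &ry };
    FillFloats(1.0f, outs, sizeof(outs) / sizeof(outs[0]));
    std::printf("%f %f\n", x, rx);
    return 0;
  }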
Depends on: 560582
The assert causes a build regression, see Bug 565607
blocking2.0: ? → final+
Group: core-security