Closed Bug 526038 Opened 15 years ago Closed 14 years ago

win/nsIconChannel.cpp miscompiled by -O2 optimization

Categories

(Core :: Graphics, defect)

1.9.2 Branch
x86
Windows XP
defect
Not set
normal

Tracking

()

RESOLVED WORKSFORME

People

(Reporter: swsnyder, Unassigned)

Details

Attachments

(5 files, 1 obsolete file)

With the source files from the Firefox 3.6b1 tarball, compiling ~/modules/libpr0n/decoders/icon/win/nsIconChannel.cpp with the -O2 optimization level (VS2005/SP1) causes a runtime protection fault when a null pointer is dereferenced.

The fault occurs here, on the assignment:

http://mxr.mozilla.org/mozilla1.9.2/source/modules/libpr0n/decoders/icon/win/nsIconChannel.cpp#644

Example:

--- mozilla/modules/libpr0n/decoders/icon/win/Makefile.in.original      2009-10-29 18:11:06.000000000 -0400
+++ mozilla/modules/libpr0n/decoders/icon/win/Makefile.in       2009-11-02 18:07:24.000000000 -0500
@@ -57,6 +57,7 @@

 # we don't want the shared lib, but we want to force the creation of a static lib.
 FORCE_STATIC_LIB = 1

 include $(topsrcdir)/config/rules.mk
+OS_COMPILE_CXXFLAGS += -O2
Does the same problem exist in VS2008?
(In reply to comment #1)
> Does the same problem exist in VS2008?

Unknown; I only have VS2005/SP1 (Win7SDK).
I don't understand -- is the variable actually null?  If so, that line will indeed cause a crash.
The variable _retval is defined as "nsIInputStream**", a pointer to a pointer.  _retval itself is not null, but the pointer that it points to is null.

It is on this 2nd dereferencing that the fault occurs.  It'll take me a couple of hours to come up with the exact x86 asm code, but it will look a lot like this:

mov eax, _retval        ; load valid pointer
mov ecx, [eax]          ; load null pointer
mov [ecx], inStream     ; <--- protection fault occurs here, on deref
Where that should be

mov [eax], inStream ?

I can believe a compiler bug, but it's just that this sequence appears all over the place.. (the NS_ADDREF(...)) stuff.  Could be something earlier tripping it up.
This graphic is a screen shot of the VS2005 debugger at the time of the fault.  It probably conveys the information better than my description.

Speaking of inadequate descriptions, I neglected to describe how to trigger the fault.  Select Tools>>Options... from the Firefox 3.6b1 menu.  The fault occurs in the course of displaying the 1st tab in the Options dialog box.

No problem seen when win/nsIconChannel.cpp is compiled with -O1 optimization.
see Windows Event Viewer, The fault occurs at C Runtime(mozcrt19.dll, msvcr80.dll, msvcr90.dll).

same problem ?
http://forums.mozillazine.org/viewtopic.php?f=42&t=1708215

(In reply to comment #1)
> Does the same problem exist in VS2008?
I tried ...
good: VS2005(jemalloc, no PGO)
bad : VS2008(jemalloc, no PGO)
bad : VS2005 express(no jemalloc, no PGO)
bad : VS2008 express(no jemalloc, no PGO)
(In reply to comment #7)
> Created an attachment (id=434854) [details]
> use memmove(), no memcpy()

I don't see from the code that the source and destination buffers overlap, so how does memmove() work better than memcpy()?
(In reply to comment #8)
> I don't see from the code that the source and destination buffers overlap,
Me too. But I have not verified it yet.

I think ...
1. This problem is caused by a compiler bug.
2. When buffers overlapping, memcpy() is not stable.
3. So, can not predict what will happen.
4. Then, at least, memmove() should be used for safety.
Attachment #434854 - Flags: review?(joe)
Comment on attachment 434854 [details] [diff] [review]
use memmove(), no memcpy()

These arrays aren't even close; one is on the heap, the other the stack. I'm going to need some convincing to switch from memcpy to memmove. (Other than "this works.")
Attachment #434854 - Flags: review?(joe) → review-
I think I'm seeing this with SeaMonkey trunk as well (self-compiled build):
<http://crash-stats.mozilla.com/report/index/26aad785-e7b8-41e8-9f65-b3ec32100604>
Found this bug through:
<http://forums.mozillazine.org/viewtopic.php?p=8510915#p8510915>

Note:
1. SeaMonkey doesn't build with libxul (yet).
2. The SeaMonkey Download Manager shows an icon matching the download's file type and a trash can icon for each download.
3. If there is no entry in the Download Manager: no crash. If a new download is added or the Download Manager is opened and contains entries from previous runs (e.g. profile previously used with official nightlies): crash. 100% reproducible.

I'm using --enable-optimize="-O2 -GALFT -GS- -Gs -Zc:wchar_t- -fp:fast -arch:SSE".
(In reply to comment #11)
> I'm using --enable-optimize="-O2 -GALFT -GS- -Gs -Zc:wchar_t- -fp:fast
> -arch:SSE".

This "fixes" the reported problem by forcing -O1 optimization for the icon code only, overriding the -O2 switch in your global opt flags:

diff -Nru mozilla.original/modules/libpr0n/decoders/icon/win/Makefile.in mozilla/modules/libpr0n/decoders/icon/win/Makefile.in
--- mozilla.original/modules/libpr0n/decoders/icon/win/Makefile.in      2010-01-05 22:35:27.000000000 -0500
+++ mozilla/modules/libpr0n/decoders/icon/win/Makefile.in       2010-01-07 10:03:20.000000000 -0500
@@ -59,4 +59,5 @@
 FORCE_STATIC_LIB = 1

 include $(topsrcdir)/config/rules.mk
+OS_COMPILE_CXXFLAGS += -O1
(In reply to comment #12)
> (In reply to comment #11)
> > I'm using --enable-optimize="-O2 -GALFT -GS- -Gs -Zc:wchar_t- -fp:fast
> > -arch:SSE".
> 
> This "fixes" the reported problem by forcing -O1 optimization for the icon code
> only, overriding the -O2 switch in your global opt flags:

Confirmed, thanks! I first tried removing the L from GALFT but that only changed the stack trace (so that it ended in msvcr80.dll/memcpy instead of imgicon.dll/imgicon.dll@0x2d95).
Comment on attachment 409839 [details]
JPEG showing state of code at fault

This screenshot shows _retval is 0x00000000, which is bad. But you'd need to be able to walk the stack to find out why it is zero.
On VS2005, the source code of memcpy and memmove in C-runtime library is the same when BLD_ASM is set to the value 1.

Although the compiler may replace the function call of memcpy with inline instructions, if it will result in better performance.
(In reply to comment #15)
> On VS2005, the source code of memcpy and memmove in C-runtime library is the
> same when BLD_ASM is set to the value 1.

Sorry, original C-runtime libraries (CRTs) have been built with BLD_ASM=0.

My private builds have always used CRTs which are built with BLD_ASM=1. Even if using such CRTs, the builds often crashed when I opened "Options" dialog and clicked "Applications" button.

The crash has not occurred yet since I disabled the optimization of nsIconChannel::MakeInputStream() about 3 months ago in the following way.

#ifdef _MSC_VER
#pragma optimize("", off)
#endif
nsresult nsIconChannel::MakeInputStream(nsIInputStream** _retval, PRBool nonBlocking)
{
 (snip)
}
#ifdef _MSC_VER
#pragma optimize("", on)
#endif
(In reply to comment #15)
I had the wrong idea about BLD_ASM. BLD_ASM is used to specify whether to rebuild the assembler sources.
Attached patch use #pragma function(memcpy) (obsolete) — Splinter Review
inline memcpy causes a crash.
memmove is not inline instruction.
and /O1 not contains /Oi, /O2 contains /Oi.
Attachment #459836 - Flags: review?(joe)
Comment on attachment 459836 [details] [diff] [review]
use #pragma function(memcpy)

I'm okay with this (with some added comments to explain them), but first I'd like to see precisely where things go wrong - a comparison, for example, of the -O1 and the -O2 generated asm, and where it goes wrong. Once I have that I'll be more than happy to grant this r+.

(Probably we should put these #pragmas in #ifdefs for the precise version of MSVC that is broken, too.)
Attachment #459836 - Flags: review?(joe)
VS2005: crash
VS2008: crash
VS2010: icon disappeared

asm code is created by VS2005.
but I still don't understand asm code. sorry.
Attachment #459836 - Attachment is obsolete: true
Attachment #461212 - Flags: review?(joe)
Comment on attachment 461212 [details] [diff] [review]
use #pragma function(memcpy)

This is going to require some pretty in-depth ASM debugging, which I don't have the time to do right now :(

(The reason I don't just r+ this and let it in is because I worry that we've got a latent bug that just happens to be triggered on various MSVC versions. Until we can prove that's not the case, I don't want to be papering over things like this.)
Attachment #461212 - Flags: review?(joe)
I fixed a memory corruption issue with this code in bug 590116. Could someone confirm whether this still happens?
The problem I reported in comment 11 (SM trunk) is gone, so this is WFM for me, but Steve reported the bug for 1.9.2 and needs to verify his case (FF 3.6 branch) from comment 6.
(In reply to comment #25)
> The problem I reported in comment 11 (SM trunk) is gone, so this is WFM for me,
> but Steve reported the bug for 1.9.2 and needs to verify his case (FF 3.6
> branch) from comment 6.

I can no longer reproduce the problem I reported, so I can't test your fix.

This is with the v3.6.13 Build #3 tarball.

Beyond using the contemporary Firefox code, I am also using a different Windows SDK.  For the original report I was using the WinSDK v6.0a that came with VS2005.  I'm still using the same compiler, but now with WinSDK v7.0.

These 2 changes are all I can think of in the past 13 months to explain why I can no longer miscompile nsIconChannel::MakeInputStream() at -O2 optimization.
In that case it's very likely bug 590116 fixed it.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → WORKSFORME
(In reply to comment #27)
> In that case it's very likely bug 590116 fixed it.

Oh, OK.  I didn't realize that the fix was already checked in for v1.9.2.x.

I was trying to miscompile the code as a baseline to testing what I thought was a proposed fix.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: