Closed Bug 137478 Opened 22 years ago Closed 20 years ago

win32 crash in jpeg\jdapimin.c: jpeg_consume_input() [jpg]

Categories

(Core :: Graphics: ImageLib, defect)

x86
Windows 2000
defect
Not set
critical

Tracking

()

RESOLVED FIXED

People

(Reporter: jrgmorrison, Assigned: pavlov)

References

Details

(Keywords: crash)

Attachments

(2 files, 2 obsolete files)

I've run into a crash in '-O2' optimized builds on msvc/win32.

The strange thing is that it happens on one win2k machine, but not
another. Here's the state of affairs:

1) build on machine A with '-O2' -- crashes as soon as it sees a JPEG
2) build again on machine A wiht '-O1' -- no crash
3) run build from step (1) on machine B -- no crash
4) build on machine B with '-O2' -- no crash when run on machine B
5) run build from (4) on machine A -- crash on any JPEG, when run on A

I also have another win98 system building with '-O2' and it doesn't crash
either. [Obviously, a debug, unoptimized build doesn't crash either].

All machines are running MSVC 6.0 with Service Pack 5 and the MSVC++ Processor
Pack for that service pack.

One thing I should note is that it is the newest (and fastest) machine which
crashes, which I suppose points to some asm or processor problem.  Let me know
what details you need on the exact processor being used on the crashing
system.

The crash is inn jpeg\jdapimin.c: jpeg_consume_input()
 
 switch (cinfo->global_state) {
  case DSTATE_START:
    /* Start-of-datastream actions: reset appropriate modules */
    (*cinfo->inputctl->reset_input_controller) (cinfo);
    /* Initialize application's data source module */
    (*cinfo->src->init_source) (cinfo);                  //XXXX<-- crash here
    cinfo->global_state = DSTATE_INHEADER;
    /*FALLTHROUGH*/

and |*cinfo->src| says that it has value "0x02" in the debugger (but some
inspection of the actually assembly and registers would confirm whether that
value was correct, or whether that was the kind of bogus value that the msvc
debugger gives sometimes with optimized builds).


Stack trace:

jpeg_consume_input(jpeg_decompress_struct * 0x01c0d240) line 314 + 4 bytes
jpeg_read_header(jpeg_decompress_struct * 0x01c0d240, unsigned char 1) line 268
nsJPEGDecoder::WriteFrom(nsJPEGDecoder * const 0x01c0d220, nsIInputStream * 
0x029bd3a8, unsigned int 4008, unsigned int * 0x0012fc6c) line 229 + 11 bytes
imgRequest::OnDataAvailable(imgRequest * const 0x10068c62, nsIRequest * 
0x029bd370, nsISupports * 0x1002a542, nsIInputStream * 0x029bd3a8, unsigned int 
26428578, unsigned int 43775256) line 751
nsCOMPtr_base::assign_with_AddRef(nsCOMPtr_base * const 0x01c0d240, nsISupports 
* 0x029c7f20) line 73
nsStreamListenerTee::OnDataAvailable(nsStreamListenerTee * const 0x029bd3a8, 
nsIRequest * 0x029c7be0, nsISupports * 0x00000000, nsIInputStream * 0x00000000, 
unsigned int 0, unsigned int 4008) line 56 + 34 bytes
nsHttpChannel::OnDataAvailable(nsHttpChannel * const 0x029c7be4, nsIRequest * 
0x029bf3ac, nsISupports * 0x00000000, nsIInputStream * 0x029c47cc, unsigned int 
0, unsigned int 4008) line 2876
nsOnDataAvailableEvent::HandleEvent(nsOnDataAvailableEvent * const 0x01c0d240) 
line 192 + 23 bytes
PL_HandleEvent(PLEvent * 0x029caa3c) line 596 + 4 bytes
PL_ProcessPendingEvents(PLEventQueue * 0x10042fe0) line 526 + 6 bytes
_md_EventReceiverProc(HWND__ * 0x000201a6, unsigned int 49315, unsigned int 0, 
long 15811040) line 1077 + 13 bytes
USER32! 77e13eb0()
nsAppShellService::Run(nsAppShellService * const 0x00fe2dd0) line 309
main1(int 4202558, char * * 0x00000004, nsISupports * 0x00253c90) line 1415 + 
10 bytes
main(int 0, char * * 0x00252cb8) line 1763 + 23 bytes
WinMain(HINSTANCE__ * 0x00400000, HINSTANCE__ * 0x00400000, char * 0x00133c42, 
HINSTANCE__ * 0x00400000) line 1781 + 23 bytes
MOZILLA! WinMainCRTStartup + 308 bytes
blocks bug 136999 "Windows builds should build with -O2"

pavlov: I have a system all set up with a build that has this crash. Can we 
take some time to have a look at this and figure out what is going wrong.
Blocks: 136999
I've seen this a while ago (same stack)
I've tried to #undef MMX support (no more inline assembly that can break opt
code)and it worked for me
Later on I pulled another tree and the problem was simply not there anymore
So now I run with -O2 and no crash, go figure ...

So did you try to #undef MMX support ?

(in jmorecfg.h:

/* Defines for MMX support. */

#if defined(XP_WIN32) && defined(_M_IX86)
#define HAVE_MMX_INTEL_MNEMONICS            <-------- comment out that line
#endif
)
Bernard: thanks. Yes, removing that define and compiling with -O2 cures the 
crash that I was experiencing.
I think I got it:

http://lxr.mozilla.org/mozilla1.0/source/modules/libimg/png/pngvcrd.c#33

see the comment: CPUID instruction will trash ebx ecx edx !
those registers might be used by optimized code (especially ecx) !

therefore the solution would be to sync jpeg mmxsupport() function with 
png_mmx_support()

for test and review
Bernard. Good catch, although unfortunately, that doesn't seem to cure this 
crash. It does seem like something that should be done though.

One thing I noted, and I'm really over my head at this point, but I was 
stepping through this in the debugger, and it seemed like those push/pops
had been optimized away (???). It doesn't appear that ebx,ecx,edx had been
restored to previous state at the end of mmxsupported() in jdapimin.c. (I
don't know; maybe I botched the patch somehow). 

cc: a couple of pixel heads :-)
assembler code: 

GLOBAL(void)
jpeg_CreateDecompress (j_decompress_ptr cinfo, int version, size_t structsize)
{
60F11000  push        ebp  
60F11001  mov         ebp,esp 
60F11003  push        ecx  
  int i;

#ifdef HAVE_MMX_INTEL_MNEMONICS
  static int cpuidDetected = 0;

  if(!cpuidDetected)
60F11004  mov         eax,dword ptr ds:[60F27334h] 
60F11009  test        eax,eax 
60F1100B  push        ebx  
60F1100C  push        esi  
60F1100D  push        edi  
60F1100E  jne         jpeg_CreateDecompress+63h (60F11063h) 
  {
	MMXAvailable = mmxsupport();
60F11010  mov         dword ptr [ebp-4],0 
60F11017  push        ebx  
60F11018  push        ecx  
60F11019  push        edx  
60F1101A  pushfd           
60F1101B  pop         eax  
60F1101C  mov         ecx,eax 
60F1101E  xor         eax,200000h 
60F11023  push        eax  
60F11024  popfd            
60F11025  pushfd           
60F11026  pop         eax  
60F11027  xor         eax,ecx 
60F11029  je          jpeg_CreateDecompress+4Bh (60F1104Bh) 
60F1102B  xor         eax,eax 
60F1102D  cpuid            
60F1102F  cmp         eax,1 
60F11032  jl          jpeg_CreateDecompress+4Bh (60F1104Bh) 
60F11034  xor         eax,eax 
60F11036  inc         eax  
60F11037  cpuid            
60F11039  and         edx,800000h 
60F1103F  cmp         edx,0 
60F11042  je          jpeg_CreateDecompress+4Bh (60F1104Bh) 
60F11044  mov         dword ptr [ebp-4],1 
60F1104B  mov         eax,dword ptr [ebp-4] 
60F1104E  pop         edx  
60F1104F  pop         ecx  
60F11050  pop         ebx  
60F11051  mov         eax,dword ptr [ebp-4] 
60F11054  mov         dword ptr [_MMXAvailable (60F27348h)],eax 
	cpuidDetected = 1;
60F11059  mov         dword ptr ds:[60F27334h],1 
  }
#endif

  /* For debugging purposes, zero the whole master structure.
   * But error manager pointer is already there, so save and restore it.
   */

  /* Guard against version mismatches between library and caller. */
  cinfo->mem = NULL;		/* so jpeg_destroy knows mem mgr not called */
  if (version != JPEG_LIB_VERSION)
60F11063  mov         eax,dword ptr [version] 
60F11066  cmp         eax,3Eh 
60F11069  mov         esi,dword ptr [cinfo] 

....


I use VC70. I don't have VC60.
as you can see, mmxsupport() is inlined but push and pops are there
(to step through the assembly code, I did right click | Go to disassembly after
I hit the breakpoint)
I verified that CPUID changes the registers and that they are now preserved

All that I think it's true:

1) From Comment #3 we know that this is somewhat mmx related.

2) From the stack trace the crash occurs before any data is actually decoded so
the only mmx specific code involved would be mmxsupport()

I suggest that you look at the assembly code to check if push and pops are there
then try to replace the line:

MMXAvailable = mmxsupport();

by:

MMXAvailable = 1; (or 0, it must be the result of the true mmxsupport() call)

maybe that way we would know if the problem is really with mmxsupport()/CPUID or not

Target Milestone: --- → mozilla1.1beta
The patch wfm.	Can other windows people verify?
Attachment #79515 - Attachment is obsolete: true
Tested swalker's updated patch. Seems to do the trick, I can render JPEG images
using an -O2 build without crashing.
Attachment #134032 - Flags: superreview?(tor)
Attachment #134032 - Flags: review?(tor)
Comment on attachment 134032 [details] [diff] [review]
Updated version of  Bernard's patch; sync the additions added when libpng 1.2.5 landed

According to the MS documentation, asm blocks generate their own 
clobber list and you don't need to manually preserve e[abcd]x or 
e[sd]i.  Just preserving the flags should be enough.
Attachment #134032 - Flags: superreview?(tor)
Attachment #134032 - Flags: superreview-
Attachment #134032 - Flags: review?(tor)
Attachment #134032 - Flags: review-
I assume __fastcall option /Gr isn't being used is it? It was my understanding
that if you make a call to a function and need register values preserved it's
the caller's responsibility to preserve them not the callee's, other than things
like esp and ebp. The only other thing I thought might be happening is the local
being stored in a register, in which case it might have gotten clobbered, but a
simple test I ran indicates that shouldn't happen. For simple assignment it was
referenced on the stack, for more complex operations a register was used and
then it was stored back to a location on the stack and the assembler referenced
that from then on.

I'm not sure if I have the Processor Pack, though. That could very well make a
difference. I'll check and report back.
"I assume __fastcall option /Gr isn't being used is it?"
yes, __fastcall isn't being used here

And I looked at MS doc again and I couldn't see any statement about __asm block
preserving any register

About the patch I have to say that it's just a matter of synchronizing jpeg
mmxsupport() with libpng png_mmx_support() and that there's no problem with
png_mmx_support() with -O2 ...

And I've looked at assembly code again and it's clear that no push/pop are
generated to preserve registers, you have to do this manually, hence the
push/pop for ebx,ecx,edx

I've noticed that with -O2 when push/pop for ebx/cx/dx are added the call to
mmxsupport() is not inlined. When push/pops are removed mmxsupport() is inlined
with -O2 and not inlined with -O1. Maybe the crash has nothing to do with
preserving the registers but is just about inlining or not inlining mmxsupport() ?
I wasn't implying that the __asm would cause the registers to be preserved, what
I was saying that in my test the register was stored back at the start of the
__asm block.

The inline issue may be the heart of it. With it inlined it's not actually a
function call, thus the optimizer may have tripped up didn't think about
preserving registers used. Forcing this to be non-inline would be an interesting
test.
+__declspec(noinline) int mmxsupport()

Does indeed not crash.
Summary: win32 '-O2' crash in jpeg\jdapimin.c: jpeg_consume_input() → win32 '-O2' crash in jpeg\jdapimin.c: jpeg_consume_input() [jpg]
*** Bug 235087 has been marked as a duplicate of this bug. ***
Removing -O2 from the summary because this bug exhibits itself using other
optimisation flags too. Also clearing target milestone of mozilla1.1beta and
removing defunct QA contact.

This bug exhibits itself regularly when compiling with MSVC++ .NET 2003 using
the generic "ac_add_options --enable-optimize" mozconfig option.
Summary: win32 '-O2' crash in jpeg\jdapimin.c: jpeg_consume_input() [jpg] → win32 crash in jpeg\jdapimin.c: jpeg_consume_input() [jpg]
Target Milestone: mozilla1.1beta → ---
Untested possible fix.
Attachment #142875 - Flags: review?(bugmail)
Just a comment that both of these patches with SSE2 IDCT and GL optimization crash SeaMonkey. They work fine on FireFox though.

I don't think that the code in jdapimin.c is the cause as I replaced both
routines with "return 1" and the thing still crashed. It could be that the
SSE2 and/or MMX code has a problem with GL optimization.

I'm going to play around with this stuff a little more next week.
http://msdn.microsoft.com/library/en-us/vclang/html/_core_Using_and_Preserving_Registers_in_Inline_Assembly.asp

Indicates that the compiler should be preserving registers used in asm-blocks.
However they also say that we should make sure to not use __fastcall, which I
think that we do by default.
Attached patch add staticSplinter Review
Attachment #142875 - Attachment is obsolete: true
Comment on attachment 143013 [details] [diff] [review]
add static

r=me if you've tested that this fixes the problem.

However it'd be great if someone that has MSVC *without* the processorpack
could try to compile with this change and make sure it still compiles.

It's not a big deal if it doesn't compile since we already require the
processorpack. But if it's required then it might be good to put an
announcement in the NGs or some such.
Attachment #143013 - Flags: review+
Attachment #143013 - Flags: superreview?(bryner)
Attachment #142875 - Flags: review?(bugmail)
Attachment #143013 - Flags: superreview?(bryner) → superreview+
Checked in.
Status: NEW → RESOLVED
Closed: 20 years ago
Resolution: --- → FIXED
This bug (or bug 125762, I'm not sure) may have caused major regression bug
247437 (see attached screenshot)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: