Closed Bug 718541 Opened 12 years ago Closed 12 years ago

Can't build a working optimized Windows from trunk since bug 715821 landed; resulting build crashes on startup and attempting to build installer crashes during build

Categories

(Core :: JavaScript Engine, defect)

x86
Windows 7
defect
Not set
blocker

Tracking

()

RESOLVED FIXED
mozilla12
Tracking Status
firefox12 - ---

People

(Reporter: wgianopoulos, Assigned: rain1)

References

Details

(Keywords: crash, regression)

Crash Data

Attachments

(2 files, 3 obsolete files)

The last time I was able to successfully build Firefox under windows from the trunk was a pull of the changeset used for the 3AM January 10th Nightly build.

Since I suspect it will be late tonight when I actually get my hg bisect to complete (since my attempt to do this using depend builds was completely unsuccessful, so I have to do clobbers on each test, it is taking awhile) but I wanted to get as much detail in the bug as possible before it got too late tonight to be thinking straight.

Details on my build environment to follow.
This issue is occurring using Microsoft Visual C++ 2008 Express.

The attachment sdocuments the SDKs being used.
Keywords: regression
One issue is that it fails in the installer build step with the following:

Problem signature:
  Problem Event Name:   APPCRASH
  Application Name:     xpcshell.exe
  Application Version:  12.0.0.4395
  Application Timestamp:        4f118138
  Fault Module Name:    mozjs.dll
  Fault Module Version: 0.0.0.0
  Fault Module Timestamp:       4f116fe9
  Exception Code:       c0000005
  Exception Offset:     00060745
  OS Version:   6.1.7601.2.1.0.256.48
  Locale ID:    1033
  Additional Information 1:     0a9e
  Additional Information 2:     0a9e372d3b4ad19135b953a78882e789
  Additional Information 3:     0a9e
  Additional Information 4:     0a9e372d3b4ad19135b953a78882e789
Summary: Can't build from trunk since sometime on January 10th → Can't build Windows from trunk since sometime on January 10th
I get a very similar startup crash from just running the binaries built without trying to install.  I will be posting an example soon, but was not necessarily from the exact same build so it might look less similar than you would expect.
Have you clobbered recently?
My entire effort to bisect this is based on clobber builds as I was entirely unable to figure anything out form dep builds that is why this is being filed so far after the issue.
Startup crash looks like this:

Problem signature:
  Problem Event Name:	APPCRASH
  Application Name:	firefox.exe
  Application Version:	12.0.0.4398
  Application Timestamp:	4f14d50a
  Fault Module Name:	mozjs.dll
  Fault Module Version:	0.0.0.0
  Fault Module Timestamp:	4f14ae95
  Exception Code:	c0000005
  Exception Offset:	0005fff5
  OS Version:	6.1.7601.2.1.0.768.3
  Locale ID:	1033
  Additional Information 1:	0a9e
  Additional Information 2:	0a9e372d3b4ad19135b953a78882e789
  Additional Information 3:	0a9e
  Additional Information 4:	0a9e372d3b4ad19135b953a78882e789
Assignee: nobody → general
Blocks: 715821
Severity: major → blocker
Component: Build Config → JavaScript Engine
QA Contact: build-config → general
So hg bisect resulted in:

The first bad revision is:
changeset:   83761:0b102e74b3b8
user:        Jeff Walden <jwalden@mit.edu>
date:        Fri Jan 06 00:13:20 2012 -0600
summary:     Bug 715821 - Make Object.prototype.__defineGetter__ and Object.prototype.__defineSetter__ perform their work by forwarding to Object.defineProperty.  This eliminates two calls to CheckRedeclaration, which is impeding property-storage-splitting work.  r=bhackett
Summary: Can't build Windows from trunk since sometime on January 10th → Can't build Windows from trunk since bug 715821 landed
Could this be a dupe of bug #718187?
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → DUPLICATE
Status: RESOLVED → UNCONFIRMED
Ever confirmed: false
Resolution: DUPLICATE → ---
I decided to dupe this the other way so that the open bug would be on the supported product.
Status: UNCONFIRMED → NEW
Ever confirmed: true
I should also mention that although this is a 32-bit build of Firefox and it fails when run on 32-bit systems, it is being built on a Windows 7 64-bit system with an AMD processor.

Built using standard 32-bit Mozilla Build tools.
Severity: blocker → critical
Crash Signature: [@ JS_ForwardGetPropertyTo]
Keywords: crash
There seems to be multiple developers hitting this, it feels like it needs tracking or some sort of investigation.
(In reply to Mark Banner (:standard8) from comment #13)
> There seems to be multiple developers hitting this, it feels like it needs
> tracking or some sort of investigation.

That was why I had the severity set to blocker.  Someone else lowered it.  Although we have a workaround by backing out the offending fix, it certainly sounds from bug 715821 comment 0 that there are follow-on fixes coming down the pike that are dependent on that bug. So, having people only being able to build by backing it out is not really a good plan going forward.
So, um, you're saying a change to __define[GS]etter__'s implementation is causing the build to fail?  That doesn't make much sense to me.  Maybe you can catch this in a debugger, or something?  It's really not obvious to me how a change like that would cause the build to break.
(In reply to Jeff Walden (remove +bmo to email) from comment #15)
> So, um, you're saying a change to __define[GS]etter__'s implementation is
> causing the build to fail?

Just to clarify: For the people on bug 718187 (which was duped to this one), including me, the build completed successfully, but the application (SeaMonkey) always crashes on startup, before anything is displayed (even before the Profile Manager comes up if requested via -P). We're all building 32-bit on Win 7 x64, but with different versions of MSVC. Note that I didn't personally track down the culprit so I can't tell whether bug 715821 caused this (I'm just watching the game).
I have a crash on startup w/ the same build settings (vc express 9, release build) that goes away when I back out the patch for bug 715821. The issue happens on three different machines running 3 different versions of windows, so I'm pretty sure it's vc express c++ optimizer that's causing the issue.

I crash here: http://mxr.mozilla.org/comm-central/source/mozilla/js/src/jsobj.cpp#5450

because obj2 is null, and apparently LookupPropertyWithFlagsInline returns true but null prop and obj2 (though debugging release builds with symbols can make debugging a bit tricky)

I'm happy to debug this, if you have any hints for me.
So LPWFI returned true, and it set prop to NULL?  That indicates that the lookup didn't hit any errors (OOM, thrown exception, etc.), and the property wasn't found.  But if prop is NULL, then you should have entered the if-block just above.  And there's clearly no way out of that if-block except by returning.

I just hopped off a cross-country flight on too little sleep, so I'm hesitant to have particular certainty in my judgment for the rest of the day.  But even still, I don't see how you could have |prop == NULL| and have hit that |if (obj2->isNative())|.
(In reply to Jens Hatlak (:InvisibleSmiley) from comment #16)
> (In reply to Jeff Walden (remove +bmo to email) from comment #15)
> > So, um, you're saying a change to __define[GS]etter__'s implementation is
> > causing the build to fail?
> 
> Just to clarify: For the people on bug 718187 (which was duped to this one),
> including me, the build completed successfully, but the application
> (SeaMonkey) always crashes on startup, before anything is displayed (even
> before the Profile Manager comes up if requested via -P). We're all building
> 32-bit on Win 7 x64, but with different versions of MSVC. Note that I didn't
> personally track down the culprit so I can't tell whether bug 715821 caused
> this (I'm just watching the game).

It is the installer step that crashes during the build, so if you do not create the installer it will not crash.  Also, that is for firefox.  I have no idea if trying to build the seamonkey installer works or not.
(In reply to Jeff Walden (remove +bmo to email) from comment #15)
> So, um, you're saying a change to __define[GS]etter__'s implementation is
> causing the build to fail?  That doesn't make much sense to me.  Maybe you
> can catch this in a debugger, or something?  It's really not obvious to me
> how a change like that would cause the build to break.

It is the step to build the Firefox installer that crashes.
Summary: Can't build Windows from trunk since bug 715821 landed → Can't build a working Windows from trunk since bug 715821 landed; Building installer crashes and resulting build crashes on startup
We are now getting reports of similar failures using MSVC 10.  Setting this back form Critical to blocker.
Severity: critical → blocker
If there is a legitimate reason to downgrade the severity, please explain it in a comment otherwise please leave it as it is.
Summary: Can't build a working Windows from trunk since bug 715821 landed; Building installer crashes and resulting build crashes on startup → Can't build a working Windows from trunk since bug 715821 landed; resulting build crashes on startup and attempting to build installer crashes during build
(In reply to Jeff Walden (remove +bmo to email) from comment #18)
> So LPWFI returned true, and it set prop to NULL? 

I could be wrong about prop being null, but I'm pretty sure obj2 was null.
> Someone else lowered it.
Sorry I thought at that time only very few people were affected by it - at that time mostly SeaMonkey developers.
I seem to crash here:

http://mxr.mozilla.org/comm-central/source/mozilla/js/src/jsobj.cpp#5478
5478     return js_GetPropertyHelperInline(cx, obj, receiver, id, JSGET_METHOD_BARRIER, vp);


>	mozjs.dll!js_GetProperty(JSContext * cx=0x03d1b040, JSObject * obj=0x03d1b040, JSObject * receiver=0x03d1b040, int id=64037120, JS::Value * vp=0x0036f708)  Line 5478 + 0x4 bytes	C++
 	mozjs.dll!JSObject::getGeneric(JSContext * cx=0x03197220, JSObject * receiver=0x03d1b040, int id=64037216, JS::Value * vp=0x0036f708)  Line 209 + 0x15 bytes	C++
 	mozjs.dll!JS_GetPropertyById(JSContext * cx=0x03197220, JSObject * obj=0x03d1b040, int id=64037216, JS::Value * vp=0x0036f708)  Line 4060 + 0x2b bytes	C++
 	xul.dll!XPCWrappedNativeScope::SetGlobal(XPCCallContext & ccx={...}, JSObject * aGlobal=0x03d125e0)  Line 269 + 0x15 bytes	C++
 	xul.dll!nsXPConnect::InitClassesWithNewWrappedGlobal(JSContext * aJSContext=0x03197220, nsISupports * aCOMObj=0x031d3244, const nsID & aIID={...}, nsIPrincipal * aPrincipal=0x031b0600, nsISupports * aExtraPtr=0x00000000, unsigned int aFlags=2, nsIXPConnectJSObjectHolder * * _retval=0x0036f91c)  Line 1351	C++
 	xpcshell.exe!main(int argc=5, char * * argv=0x002520f0, char * * envp=0x00252d20)  Line 1951	C++
 	xpcshell.exe!__tmainCRTStartup()  Line 586 + 0x17 bytes	C
 	kernel32.dll!774e3677() 	
 	[Frames below may be incorrect and/or missing, no symbols loaded for kernel32.dll]	
 	ntdll.dll!77ba9f02() 	
 	ntdll.dll!77ba9ed5() 	

Disassembly at the crash point:

726607B1  mov         esi,dword ptr [esp+14h] 
726607B5  mov         edi,dword ptr [esi] 
726607B7  mov         cl,byte ptr [edi+0Dh] 
726607BA  not         cl   
726607BC  test        cl,1 
726607BF  jne         js_GetProperty+3B0h (72660810h) 
726607C1  mov         edx,dword ptr [edi] 
726607C3  mov         eax,dword ptr [edx] 
726607C5  cmp         eax,offset js::ObjectProxyClass (727C8490h) 
726607CA  je          js_GetProperty+37Ah (726607DAh) 
726607CC  cmp         eax,offset js::OuterWindowProxyClass (727C8578h) 
726607D1  je          js_GetProperty+37Ah (726607DAh) 
726607D3  cmp         eax,offset js::FunctionProxyClass (727C8838h) 
726607D8  jne         js_GetProperty+39Ah (726607FAh) 
726607DA  mov         eax,dword ptr [esp+48h] 
726607DE  mov         ecx,dword ptr [esp+40h] 
726607E2  push        eax  
726607E3  push        ebx  
726607E4  push        ecx  
726607E5  push        esi  
726607E6  push        ebp  
726607E7  call        js::Proxy::get (72678300h) 
726607EC  add         esp,14h 
726607EF  movzx       eax,al
I believe I am reproducing this with this in my .mozconfig:

ac_add_options --disable-debug 
ac_add_options --enable-optimize

If I reverse them, I get a working build.

Windows 7, 64bit, VS2010

Building one final build to confirm, but I believe this info can be helpful.
I spoke too soon... I am definitely on to something though.

I reversed them, and still got a broken build.

Be back once I have this confirmed.
Adding:

ac_add_options --disable-optimize

to .mozconfig allows me to get a working build.
I'm crashing in the same place running xpcshell tests as comment #25 compiling 32 bit on a 64 bit system with VS 2008 (ver. 9.0.30729.1). Backing out Bug 715821 fixes the crash.

mozjs.dll!js_GetProperty(JSContext * cx=0x0341b040, JSObject * obj=0x0341b040, JSObject * receiver=0x0341b040, int id=54599936, JS::Value * vp=0x0040f5f4)  Line 5478 + 0x4 bytes	C++
mozjs.dll!JSObject::getGeneric(JSContext * cx=0x03040220, JSObject * receiver=0x0341b040, int id=54600032, JS::Value * vp=0x0040f5f4)  Line 209 + 0x15 bytes	C++
mozjs.dll!JS_GetPropertyById(JSContext * cx=0x03040220, JSObject * obj=0x0341b040, int id=54600032, JS::Value * vp=0x0040f5f4)  Line 4060 + 0x2b bytes	C++
xul.dll!XPCWrappedNativeScope::SetGlobal(XPCCallContext & ccx={...}, JSObject * aGlobal=0x034125e0)  Line 269 + 0x15 bytes	C++
xul.dll!nsXPConnect::InitClassesWithNewWrappedGlobal(JSContext * aJSContext=0x03040220, nsISupports * aCOMObj=0x0307e524, const nsID & aIID={...}, nsIPrincipal * aPrincipal=0x0305c180, nsISupports * aExtraPtr=0x00000000, unsigned int aFlags=2, nsIXPConnectJSObjectHolder * * _retval=0x0040f804)  Line 1351	C++
xpcshell.exe!main(int argc=20, char * * argv=0x002ca000, char * * envp=0x002c2f98)  Line 1950 + 0x2e bytes	C++
xpcshell.exe!__tmainCRTStartup()  Line 586 + 0x17 bytes	C
kernel32.dll!770d339a() 	
[Frames below may be incorrect and/or missing, no symbols loaded for kernel32.dll]	
ntdll.dll!77c29ef2() 	
ntdll.dll!77c29ec5()
I'll give building with msvc2008 a try and see what I can find.
Attached patch Workaround (obsolete) — Splinter Review
For those trying to build who encounter this issue this is probably a superior solution to backing out bug 715821.
(In reply to Philip Chee from comment #24)
> > Someone else lowered it.
> Sorry I thought at that time only very few people were affected by it - at
> that time mostly SeaMonkey developers.

The issue has more to do with what you are testing.  It does not impact non-optimized debug builds.  But if you are trying to do performance testing with an optimized build, then you run into this issue.
Summary: Can't build a working Windows from trunk since bug 715821 landed; resulting build crashes on startup and attempting to build installer crashes during build → Can't build a working optimized Windows from trunk since bug 715821 landed; resulting build crashes on startup and attempting to build installer crashes during build
This is a much better workaround because:

1.  It only disables optimization in the area change by bug 715821 instead of in the entire module.

2.  It removes the MSVC version checks because we have not really validated which versions are affected.
Attachment #589733 - Attachment is obsolete: true
Attachment #590051 - Attachment description: Workarond-Disspabl;e Optimization in code changed by bug 715821 → Workaday-Disable Optimization in code changed by bug 715821
Attachment #590051 - Attachment description: Workaday-Disable Optimization in code changed by bug 715821 → Workaround-Disable Optimization in code changed by bug 715821
Attachment #589733 - Attachment is obsolete: false
Attachment #590051 - Attachment is obsolete: true
(In reply to Bill Gianopoulos [:WG9s] from comment #33)
> Created attachment 590051 [details] [diff] [review]
> Workaround-Disable Optimization in code changed by bug 715821
> 
> This is a much better workaround because:
> 
> 1.  It only disables optimization in the area change by bug 715821 instead
> of in the entire module.
> 
> 2.  It removes the MSVC version checks because we have not really validated
> which versions are affected.

Reverted to previous patch.  I must have screwed up somehow when I tested this last night.  My automated build today failed with the supposedly better patch.
Looks to me like since the landing of this patch we have functions that sometimes return true/false, sometimes return JS_TRUE/JS_FALSE .

This seems wrong to me.
Actually these issues seem to be outside of the area modified by bug 715821.  Nevermind! ;-)
I'm hitting this same crash (also building with VC2008). It's certainly possible it's just a compiler bug, but it's probably worth working around if possible since it's affecting so many people.
FYI all, I see a lot of attention focused on VC2008... I had/have the same problem with VC2010 ... comment #28 got me working around ...
I'm hitting the same problem with VC10.

At js_GetProperty+34Ah (seemingly jumped to when https://mxr.mozilla.org/mozilla-central/source/js/src/jsobj.cpp#5371 returns false) I see

mov         ebp,dword ptr [esp+14h]

At this point esp+14h is 0.

mov         ecx,dword ptr [ebp]

and so it crashes.
Ignore my last comment. I did some uninlining, and the problem happens at https://mxr.mozilla.org/mozilla-central/source/js/src/jsobj.cpp#4991.

The (!obj->nativeEmpty()) check succeeds because *(obj.ptr) is non-null. However in the subsequent nativeLookup call, at https://mxr.mozilla.org/mozilla-central/source/js/src/jsobjinlines.h#1161 a new Shape is allocated with the exact same address as (obj.ptr), which causes things to go awry at https://mxr.mozilla.org/mozilla-central/source/js/src/jsscope.h#1075. I think this might be a dangling pointer somewhere.
Attached patch "fix" (obsolete) — Splinter Review
might be a compiler bug -- forcing nativeLookup to not be inlined fixes the crash.
(In reply to Siddharth Agarwal [:sid0] from comment #41)
> Created attachment 590788 [details] [diff] [review]
> "fix"
> 
> might be a compiler bug -- forcing nativeLookup to not be inlined fixes the
> crash.

Might it make sense to do this on MSVC only then?
Yeah, probably.  But you want MOZ_NEVER_INLINE from mozilla/Attributes.h, not Windows-specific declspec-ing.  And I wouldn't try to make it Windows-specific by implementing it once in the header for non-Windows, then again in the cpp for Windows -- that'd be a recipe for unreadability.
The patch wasn't meant to be in any sort of shape to check in, hence the scare quotes.
Attached patch workaround v1Splinter Review
Disables inlining this function on MSVC9 and above.
Attachment #590788 - Attachment is obsolete: true
Attachment #590871 - Flags: review?
Attachment #590871 - Flags: review? → review?(jwalden+bmo)
Attachment #589733 - Attachment is obsolete: true
(In reply to Siddharth Agarwal [:sid0] from comment #45)
> Created attachment 590871 [details] [diff] [review]
> workaround v1
> 
> Disables inlining this function on MSVC9 and above.

This seems to fix the issue that I originally reported for me.
FWIW, my VC10 w/ JS PGO enabled builds don't have this problem. We may want to re-test this once PGO gets turned on again for JS.
(In reply to Ryan VanderMeulen from comment #47)
> FWIW, my VC10 w/ JS PGO enabled builds don't have this problem. We may want
> to re-test this once PGO gets turned on again for JS.

That does not help people doing their own non-pgo builds.  The official builds don't see this problem at all.  This is only an issue for people doing their own optimized builds.
I've been running into a similar issue and think this bug is relevant to it, with --enable-optimize set my PGO builds are failing at the step where profiling is meant to occur.

TEST-UNEXPECTED-FAIL | automation.py | Exited with code -1073741819 during test run
INFO | automation.py | Application ran for: 0:00:00.132000
INFO | automation.py | Reading PID log: c:\users\shadow~1\appdata\local\temp\tmpvzvjpcpidlog
f:\mozilla-central\client.mk:222:0: command 'MOZ_PGO_INSTRUMENTED=1 OBJDIR=. JARLOG_DIR=./jarlog/en-US python ./_profile/pgo/profileserver.py' failed, return code -1073741819

if I attempt to manually execute firefox.exe via the shell I receive a crash.

Problem signature:
  Problem Event Name:	APPCRASH
  Application Name:	firefox.exe
  Application Version:	12.0.0.4406
  Application Timestamp:	4f1e9bf7
  Fault Module Name:	xul.dll
  Fault Module Version:	12.0.0.4406
  Fault Module Timestamp:	4f1e9bc4

If I edit js/src/makefile.in to allow PGO on JS, enable-optimize and everything else works perfectly fine, hopefully this information helps somebody figure it out.
start-msvc10.bat

"Mozilla tools directory: C:\mozilla-build\"
Visual C++ 10 Express directory: C:\Program Files (x86)\Microsoft
Visual Studio 10.0\VC\
Windows SDK directory: C:\Program Files (x86)\Microsoft
SDKs\Windows\v7.0A\
Windows SDK version: 7.0A
Setting environment for using Microsoft Visual Studio 2010 x86 tools.
Mozilla build environment: MSVC version 10.

Master@MARKSLAPTOP ~

$ hg version

Mercurial Distributed SCM (version 1.9.1)
(see http://mercurial.selenic.com for more information)
Copyright (C) 2005-2011 Matt Mackall and others
This is free software; see the source for copying conditions. There is
NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR
PURPOSE.

Master@MARKSLAPTOP ~

$ python -OO build/pymake/make.py -f client.mk

This is compiled with no .MOZCONFIG file, and is of clean source
compile, no custom changes, (baseline) attempt last 01/16.

The build seems to complete okay, but the execution runs in the background, then terminates without displaying any windows. It stops according to my MSDEV debug sessions with error message pointing to a problem in mozjs.dll.

Adding:
ac_add_options --disable-optimize
to .mozconfig allowed me to get a working build.

One week later now, working on my local changes and doing daily pull -U's I removed the 
ac_add_options --disable-optimize
to try to recreate the problem and re-join the groups degugging efforts and TWA / Trouble Went Away ....

:-(
Have you tried a clobber (clean) build, Mark?
Oh, sorry... I meant building now works for me with no problems as originally encountered ... no further need for ac_add_options --disable-optimize ... and nothing for me to debug as all is well
(In reply to Mark Capella from comment #52)
> Oh, sorry... I meant building now works for me with no problems as
> originally encountered ... no further need for ac_add_options
> --disable-optimize ... and nothing for me to debug as all is well

The previous comment was trying to say that if you clobber your obdir in order to ensure everything is rebuilt with optimization enabled, the issue will likely return for you.
Oh, new developer missed a nuance ... my recent rebuild picked up previously cached portions of compiled code and masked the original problem ... lemme figure out clobber / clean build and re-try ... write me eMail or IRQ in #developers if more appropriate
Attachment #590871 - Flags: review?(jwalden+bmo) → review+
(In reply to Ryan VanderMeulen from comment #47)
> FWIW, my VC10 w/ JS PGO enabled builds don't have this problem. We may want
> to re-test this once PGO gets turned on again for JS.

Not sure this will make a difference, since default builds aren't ever going to be PGO. PGO probably changes the codegen here so much that it avoids the problem.
http://hg.mozilla.org/integration/mozilla-inbound/rev/40f3a8423c89
Assignee: general → sagarwal
Status: NEW → ASSIGNED
Whiteboard: [inbound]
https://hg.mozilla.org/mozilla-central/rev/40f3a8423c89
Status: ASSIGNED → RESOLVED
Closed: 12 years ago12 years ago
Resolution: --- → FIXED
Whiteboard: [inbound]
Target Milestone: --- → mozilla12
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: