Last Comment Bug 555727 - NS_DECLARE_FRAME_PROPERTY breaks VC9 builds; also breaks opt builds on OS X 10.7
: NS_DECLARE_FRAME_PROPERTY breaks VC9 builds; also breaks opt builds on OS X 10.7
Status: RESOLVED FIXED
: regression
Product: Core
Classification: Components
Component: Layout: Misc Code (show other bugs)
: Trunk
: All All
: -- blocker (vote)
: mozilla15
Assigned To: Rafael Ávila de Espíndola (:espindola) (not reading bugmail)
:
Mentors:
: 732467 739123 (view as bug list)
Depends on:
Blocks: 551660
  Show dependency treegraph
 
Reported: 2010-03-29 09:10 PDT by Mike Pesce (:By-Tor)
Modified: 2012-07-05 10:10 PDT (History)
20 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
patch that does not help (1.88 KB, patch)
2010-04-02 16:19 PDT, Bill Gianopoulos [:WG9s]
no flags Details | Diff | Review
the fix (807 bytes, patch)
2010-04-03 08:18 PDT, Bill Gianopoulos [:WG9s]
no flags Details | Diff | Review
better patch (1.26 KB, patch)
2010-04-03 11:30 PDT, Bill Gianopoulos [:WG9s]
dbaron: review+
Details | Diff | Review
Patch for checkin (1.32 KB, patch)
2010-04-03 13:02 PDT, Bill Gianopoulos [:WG9s]
no flags Details | Diff | Review
patch, remove NS_PROPERTY_DESCRIPTOR_CONST to avoid excessive const-folding by eager optimizers (2.45 KB, patch)
2012-04-26 08:52 PDT, Jonathan Kew (:jfkthame)
no flags Details | Diff | Review
patch, ensure uniqueness of property descriptors (3.44 KB, patch)
2012-04-26 10:23 PDT, Jonathan Kew (:jfkthame)
no flags Details | Diff | Review
patch, un-constify property descriptors for apple gcc 4.2.1 (1.31 KB, patch)
2012-04-27 02:44 PDT, Jonathan Kew (:jfkthame)
no flags Details | Diff | Review
macros defined by clang (4.27 KB, text/plain)
2012-04-27 07:00 PDT, Rafael Ávila de Espíndola (:espindola) (not reading bugmail)
no flags Details
macros defined by llvm-gcc (3.93 KB, text/plain)
2012-04-27 07:01 PDT, Rafael Ávila de Espíndola (:espindola) (not reading bugmail)
no flags Details
macros defined by gcc (3.86 KB, text/plain)
2012-04-27 07:01 PDT, Rafael Ávila de Espíndola (:espindola) (not reading bugmail)
no flags Details
patch, un-constify property descriptors to work around apple llvm-gcc bug (1.26 KB, patch)
2012-04-27 09:17 PDT, Jonathan Kew (:jfkthame)
respindola: review-
Details | Diff | Review
select clang over llvm-gcc (7.54 KB, patch)
2012-04-27 10:26 PDT, Rafael Ávila de Espíndola (:espindola) (not reading bugmail)
ted: review+
Details | Diff | Review

Description Mike Pesce (:By-Tor) 2010-03-29 09:10:56 PDT
User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.3a4pre) Gecko/20100329 Minefield/3.7a4pre (.NET CLR 3.5.30729)
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.3a4pre) Gecko/20100329 Minefield/3.7a4pre (.NET CLR 3.5.30729)

I have been unable to get a working executable since changeset 4b8936ac4a31 and subsequent patches using vc9.  Since then, I have only gotten executables which crash with a close program dialog.  

The error log reports: Faulting application name: firefox.exe, version: 1.9.3.3740, time stamp: 0x4bb0c9e2
Faulting module name: xul.dll, version: 1.9.3.3740, time stamp: 0x4bb0c9cb
Exception code: 0xc0000005
Fault offset: 0x00224b1b
Faulting process id: 0x241c
Faulting application start time: 0x01cacf591d8fa84d
Faulting application path: C:\Users\Mike\mozilla-central\objdir-ff-release\dist\firefox\firefox.exe
Faulting module path: C:\Users\Mike\mozilla-central\objdir-ff-release\dist\firefox\xul.dll
Report Id: 5c3b0fe6-3b4c-11df-bdb8-001b2112456b

Reproducible: Always
Comment 1 Benjamin Smedberg [:bsmedberg] 2010-03-29 09:18:59 PDT
Did you narrow it down to http://hg.mozilla.org/mozilla-central/rev/4b8936ac4a31 in particular, or what was the last changeset that didn't display the crash?

Do you have symbols to get a good stacktrace?
Comment 2 Mike Pesce (:By-Tor) 2010-03-29 09:21:37 PDT
The build previous to that was the last good build.  Unfortunately, I can't give more info at the moment.
Comment 3 Kyle Huey [:khuey] (khuey@mozilla.com) 2010-03-29 09:43:12 PDT
There was Windows bustage on tinderbox that I fixed with http://hg.mozilla.org/mozilla-central/rev/90a30ea68797.
Comment 4 Mike Pesce (:By-Tor) 2010-03-29 09:48:16 PDT
Yes, but, as I said, all subsequent patches haven't worked.  So, it may have fixed vc8 without vc9.  Although, I'm not getting the compile time errors I did before the patch.
Comment 5 Kyle Huey [:khuey] (khuey@mozilla.com) 2010-03-29 09:50:58 PDT
Are you building from scratch each time or are these incremental builds?  If the latter, try the former.
Comment 6 Mike Pesce (:By-Tor) 2010-03-29 09:55:41 PDT
No, I've done completely clean builds since then to no avail.
Comment 7 Kyle Huey [:khuey] (khuey@mozilla.com) 2010-03-29 09:58:15 PDT
Hmm, ok.  I'll try to reproduce.
Comment 8 Kyle Huey [:khuey] (khuey@mozilla.com) 2010-03-29 10:09:00 PDT
(In reply to comment #7)
> Hmm, ok.  I'll try to reproduce.

Heh trying to build on MSVC 9 dies elsewhere for me and I don't have time to troubleshoot this further right now.
Comment 9 Mike Pesce (:By-Tor) 2010-03-29 10:11:15 PDT
Ok, I guess something is going on somewhere then.  heh
Comment 10 Philip Chee 2010-03-30 08:40:21 PDT
Same with SeaMonkey 2.1a1pre. => NEW
Comment 11 Tanner M. Young [:tmyoung] 2010-03-30 11:34:25 PDT
Currently, I get xul!gfxSize::gfxSize+0x310f as the crash point in xul.dll.

0:000:x86> g
ModLoad: 75fd0000 75fd9000   C:\Windows\syswow64\LPK.DLL
ModLoad: 75060000 750e0000   C:\Windows\SysWOW64\uxtheme.dll
ModLoad: 74970000 7497c000   C:\Windows\SysWOW64\dwmapi.dll
ModLoad: 00000000`778c0000 00000000`77a4a000   C:\Windows\syswow64\SETUPAPI.dll
ModLoad: 00000000`75720000 00000000`7573e000   C:\Windows\SysWOW64\USERENV.dll
ModLoad: 00000000`6f250000 00000000`6f30b000   C:\Windows\SysWOW64\PROPSYS.dll
ModLoad: 00000000`75fe0000 00000000`76064000   C:\Windows\syswow64\CLBCatQ.DLL
ModLoad: 6e5c0000 6e5fb000   C:\mozilla-central\objdir-ff-release\dist\bin\components\browsercomps.dll
ModLoad: 75750000 7578b000   C:\Windows\SysWOW64\mswsock.dll
ModLoad: 756b0000 756b5000   C:\Windows\SysWOW64\wshtcpip.dll
ModLoad: 00000000`75690000 00000000`756a9000   C:\Windows\SysWOW64\iphlpapi.dll
ModLoad: 00000000`75430000 00000000`75465000   C:\Windows\SysWOW64\dhcpcsvc.DLL
ModLoad: 00000000`75300000 00000000`7532c000   C:\Windows\SysWOW64\DNSAPI.dll
ModLoad: 00000000`756c0000 00000000`756c7000   C:\Windows\SysWOW64\WINNSI.DLL
ModLoad: 00000000`752b0000 00000000`752d2000   C:\Windows\SysWOW64\dhcpcsvc6.DLL
ModLoad: 00000000`70b70000 00000000`70b9b000   C:\Windows\SysWOW64\t2embed.dll
ModLoad: 00000000`6ecd0000 00000000`6edc4000   C:\Windows\SysWOW64\WindowsCodecs.dll
ModLoad: 00000000`74f10000 00000000`74f3c000   C:\Windows\SysWOW64\apphelp.dll
ModLoad: 00000000`71260000 00000000`7127f000   EhStorAPI.DLL
ModLoad: 00000000`71260000 00000000`7127f000   C:\Windows\SysWOW64\EhStorShell.dll
ModLoad: 00000000`75790000 00000000`757cb000   C:\Windows\SysWOW64\rsaenh.dll
ModLoad: 00000000`74d60000 00000000`74d6f000   C:\Windows\SysWOW64\NLAapi.dll
ModLoad: 00000000`74d50000 00000000`74d5f000   C:\Windows\SysWOW64\napinsp.dll
ModLoad: 00000000`74d10000 00000000`74d22000   C:\Windows\SysWOW64\pnrpnsp.dll
ModLoad: 00000000`74d40000 00000000`74d48000   C:\Windows\SysWOW64\winrnr.dll
ModLoad: 00000000`76620000 00000000`76669000   C:\Windows\syswow64\WLDAP32.dll
ModLoad: 00000000`77de0000 00000000`77de7000   C:\Windows\syswow64\PSAPI.DLL
(fa4.135c): Access violation - code c0000005 (first chance)
First chance exceptions are reported before any exception handling.
This exception may be expected and handled.
*** ERROR: Symbol file could not be found.  Defaulted to export symbols for C:\mozilla-central\objdir-ff-release\dist\bin\xul.dll - 
xul!gfxSize::gfxSize+0x310f:
6ba50358 a5              movs    dword ptr es:[edi],dword ptr [esi] es:002b:0033e86c=00000000 ds:002b:00000000=????????
0:000:x86> g
(fa4.135c): Access violation - code c0000005 (!!! second chance !!!)
xul!gfxSize::gfxSize+0x310f:
6ba50358 a5              movs    dword ptr es:[edi],dword ptr [esi] es:002b:0033e86c=00000000 ds:002b:00000000=????????
0:000:x86> g
(fa4.135c): Access violation - code c0000005 (first chance)
First chance exceptions are reported before any exception handling.
This exception may be expected and handled.
xul!gfxSize::gfxSize+0x310f:
6ba50358 a5              movs    dword ptr es:[edi],dword ptr [esi] es:002b:0033e86c=00000000 ds:002b:00000000=????????
Comment 12 Kyle Huey [:khuey] (khuey@mozilla.com) 2010-03-30 16:59:18 PDT
Fixed by backout.

http://hg.mozilla.org/mozilla-central/pushloghtml?changeset=9ba741f58e4f
Comment 13 Kyle Huey [:khuey] (khuey@mozilla.com) 2010-03-30 18:38:25 PDT
Multiple reports on IRC that backing out the patch for that bug did not work.
Comment 14 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2010-03-30 18:43:19 PDT
Mike Pesce, can you bisect to find the actual patch that caused the build bustage?

I guess we should reland the nsIPresShell patch. Sorry Kyle!
Comment 15 Kyle Huey [:khuey] (khuey@mozilla.com) 2010-03-30 18:57:58 PDT
I will reland Craig's patch in the morning.  I'm going to fire off a clobber build locally overnight to see if I can reproduce, but I develop with MSVC 9 and haven't had any trouble so far with that patch.
Comment 16 Tanner M. Young [:tmyoung] 2010-03-30 21:54:48 PDT
Microsoft (R) Windows Debugger Version 6.11.0001.404 AMD64
Copyright (c) Microsoft Corporation. All rights reserved.

CommandLine: C:\mozilla-central\objdir-ff-release\dist\bin\firefox.exe
Symbol search path is: SRV*c:\symbols*http://symbols.mozilla.org/firefox;srv*
Executable search path is: 
ModLoad: 00000000`00e50000 00000000`00e68000   firefox.exe
ModLoad: 00000000`77c50000 00000000`77dd6000   ntdll.dll
ModLoad: 00000000`77e10000 00000000`77f70000   ntdll32.dll
ModLoad: 00000000`75cb0000 00000000`75cf5000   C:\Windows\system32\wow64.dll
ModLoad: 00000000`75c60000 00000000`75cae000   C:\Windows\system32\wow64win.dll
ModLoad: 00000000`75bb0000 00000000`75bb9000   C:\Windows\system32\wow64cpu.dll
(1b04.1d34): Break instruction exception - code 80000003 (first chance)
ntdll!DbgBreakPoint:
00000000`77c96060 cc              int     3
0:000> g
ModLoad: 00000000`77b20000 00000000`77c4d000   WOW64_IMAGE_SECTION
ModLoad: 00000000`77550000 00000000`77660000   WOW64_IMAGE_SECTION
ModLoad: 00000000`77b20000 00000000`77c4d000   NOT_AN_IMAGE
ModLoad: 00000000`77a50000 00000000`77b1d000   NOT_AN_IMAGE
ModLoad: 00000000`77550000 00000000`77660000   C:\Windows\syswow64\kernel32.dll
ModLoad: 00000000`6d8e0000 00000000`6e2e8000   C:\mozilla-central\objdir-ff-release\dist\bin\xul.dll
ModLoad: 00000000`73190000 00000000`73203000   C:\mozilla-central\objdir-ff-release\dist\bin\mozsqlite3.dll
ModLoad: 00000000`730e0000 00000000`73183000   C:\Windows\WinSxS\x86_microsoft.vc90.crt_1fc8b3b9a1e18e3b_9.0.30729.4148_none_5090ab56bcba71c2\MSVCR90.dll
ModLoad: 00000000`72b00000 00000000`72be3000   C:\mozilla-central\objdir-ff-release\dist\bin\mozjs.dll
ModLoad: 00000000`10000000 00000000`1002a000   C:\mozilla-central\objdir-ff-release\dist\bin\nspr4.dll
ModLoad: 00000000`76400000 00000000`764c6000   C:\Windows\syswow64\ADVAPI32.dll
ModLoad: 00000000`764d0000 00000000`765c0000   C:\Windows\syswow64\RPCRT4.dll
ModLoad: 00000000`75de0000 00000000`75e40000   C:\Windows\syswow64\Secur32.dll
ModLoad: 00000000`75740000 00000000`75747000   C:\Windows\SysWOW64\WSOCK32.dll
ModLoad: 00000000`76070000 00000000`7609d000   C:\Windows\syswow64\WS2_32.dll
ModLoad: 00000000`762d0000 00000000`7637a000   C:\Windows\syswow64\msvcrt.dll
ModLoad: 00000000`77660000 00000000`77666000   C:\Windows\syswow64\NSI.dll
ModLoad: 00000000`74e90000 00000000`74ec2000   C:\Windows\SysWOW64\WINMM.dll
ModLoad: 00000000`77400000 00000000`774d0000   C:\Windows\syswow64\USER32.dll
ModLoad: 00000000`77670000 00000000`77700000   C:\Windows\syswow64\GDI32.dll
ModLoad: 00000000`75e40000 00000000`75f85000   C:\Windows\syswow64\ole32.dll
ModLoad: 00000000`777d0000 00000000`7785d000   C:\Windows\syswow64\OLEAUT32.dll
ModLoad: 00000000`74e50000 00000000`74e8d000   C:\Windows\SysWOW64\OLEACC.dll
ModLoad: 00000000`00170000 00000000`00188000   C:\mozilla-central\objdir-ff-release\dist\bin\smime3.dll
ModLoad: 00000000`00240000 00000000`002dd000   C:\mozilla-central\objdir-ff-release\dist\bin\nss3.dll
ModLoad: 00000000`001b0000 00000000`001c4000   C:\mozilla-central\objdir-ff-release\dist\bin\nssutil3.dll
ModLoad: 00000000`001e0000 00000000`001e7000   C:\mozilla-central\objdir-ff-release\dist\bin\plc4.dll
ModLoad: 00000000`002e0000 00000000`002e7000   C:\mozilla-central\objdir-ff-release\dist\bin\plds4.dll
ModLoad: 00000000`00300000 00000000`00321000   C:\mozilla-central\objdir-ff-release\dist\bin\ssl3.dll
ModLoad: 00000000`739b0000 00000000`739b6000   C:\mozilla-central\objdir-ff-release\dist\bin\mozalloc.dll
ModLoad: 00000000`768f0000 00000000`77400000   C:\Windows\syswow64\SHELL32.dll
ModLoad: 00000000`765c0000 00000000`76619000   C:\Windows\syswow64\SHLWAPI.dll
ModLoad: 00000000`75a00000 00000000`75a08000   C:\Windows\SysWOW64\VERSION.dll
ModLoad: 00000000`74ae0000 00000000`74b22000   C:\Windows\SysWOW64\WINSPOOL.DRV
ModLoad: 00000000`76380000 00000000`763f3000   C:\Windows\syswow64\COMDLG32.dll
ModLoad: 00000000`75860000 00000000`759fe000   C:\Windows\WinSxS\x86_microsoft.windows.common-controls_6595b64144ccf1df_6.0.6002.18005_none_5cb72f96088b0de0\COMCTL32.dll
ModLoad: 00000000`77860000 00000000`778c0000   C:\Windows\syswow64\IMM32.dll
ModLoad: 00000000`77700000 00000000`777c8000   C:\Windows\syswow64\MSCTF.dll
ModLoad: 00000000`75050000 00000000`75055000   C:\Windows\SysWOW64\MSIMG32.dll
ModLoad: 00000000`774d0000 00000000`7754d000   C:\Windows\syswow64\USP10.dll
ModLoad: 00000000`72a70000 00000000`72afe000   C:\Windows\WinSxS\x86_microsoft.vc90.crt_1fc8b3b9a1e18e3b_9.0.30729.4148_none_5090ab56bcba71c2\MSVCP90.dll
ModLoad: 00000000`71d80000 00000000`71ea3000   C:\Windows\WinSxS\x86_microsoft.vc90.debugcrt_1fc8b3b9a1e18e3b_9.0.30729.1_none_bb1f6aa1308c35eb\MSVCR90D.dll
ModLoad: 00000000`74e30000 00000000`74e37000   C:\mozilla-central\objdir-ff-release\dist\bin\xpcom.dll
(1b04.1d34): WOW64 breakpoint - code 4000001f (first chance)
First chance exceptions are reported before any exception handling.
This exception may be expected and handled.
ntdll32!DbgBreakPoint:
77e20004 cc              int     3
0:000:x86> g
ModLoad: 75fd0000 75fd9000   C:\Windows\syswow64\LPK.DLL
ModLoad: 75060000 750e0000   C:\Windows\SysWOW64\uxtheme.dll
ModLoad: 74970000 7497c000   C:\Windows\SysWOW64\dwmapi.dll
ModLoad: 00000000`778c0000 00000000`77a4a000   C:\Windows\syswow64\SETUPAPI.dll
ModLoad: 00000000`75720000 00000000`7573e000   C:\Windows\SysWOW64\USERENV.dll
ModLoad: 00000000`6f250000 00000000`6f30b000   C:\Windows\SysWOW64\PROPSYS.dll
ModLoad: 00000000`75fe0000 00000000`76064000   C:\Windows\syswow64\CLBCatQ.DLL
ModLoad: 730a0000 730db000   C:\mozilla-central\objdir-ff-release\dist\bin\components\browsercomps.dll
ModLoad: 75750000 7578b000   C:\Windows\SysWOW64\mswsock.dll
ModLoad: 756b0000 756b5000   C:\Windows\SysWOW64\wshtcpip.dll
ModLoad: 00000000`75690000 00000000`756a9000   C:\Windows\SysWOW64\iphlpapi.dll
ModLoad: 00000000`75430000 00000000`75465000   C:\Windows\SysWOW64\dhcpcsvc.DLL
ModLoad: 00000000`75300000 00000000`7532c000   C:\Windows\SysWOW64\DNSAPI.dll
ModLoad: 00000000`756c0000 00000000`756c7000   C:\Windows\SysWOW64\WINNSI.DLL
ModLoad: 00000000`752b0000 00000000`752d2000   C:\Windows\SysWOW64\dhcpcsvc6.DLL
ModLoad: 00000000`73280000 00000000`732ab000   C:\Windows\SysWOW64\t2embed.dll
ModLoad: 00000000`6ecd0000 00000000`6edc4000   C:\Windows\SysWOW64\WindowsCodecs.dll
ModLoad: 00000000`74f10000 00000000`74f3c000   C:\Windows\SysWOW64\apphelp.dll
ModLoad: 00000000`71260000 00000000`7127f000   EhStorAPI.DLL
ModLoad: 00000000`71260000 00000000`7127f000   C:\Windows\SysWOW64\EhStorShell.dll
ModLoad: 00000000`75790000 00000000`757cb000   C:\Windows\SysWOW64\rsaenh.dll
ModLoad: 00000000`74d60000 00000000`74d6f000   C:\Windows\SysWOW64\NLAapi.dll
ModLoad: 00000000`74d50000 00000000`74d5f000   C:\Windows\SysWOW64\napinsp.dll
ModLoad: 00000000`74d10000 00000000`74d22000   C:\Windows\SysWOW64\pnrpnsp.dll
ModLoad: 00000000`74d40000 00000000`74d48000   C:\Windows\SysWOW64\winrnr.dll
ModLoad: 00000000`76620000 00000000`76669000   C:\Windows\syswow64\WLDAP32.dll
ModLoad: 00000000`77de0000 00000000`77de7000   C:\Windows\syswow64\PSAPI.DLL
ModLoad: 00000000`74020000 00000000`74045000   C:\Program Files (x86)\Bonjour\mdnsNSP.dll
(1b04.1d34): Access violation - code c0000005 (first chance)
First chance exceptions are reported before any exception handling.
This exception may be expected and handled.
xul!nsIFrame::GetOverflowRect+0x2f:
6db00567 a5              movs    dword ptr es:[edi],dword ptr [esi] es:002b:0014e62c=00000000 ds:002b:00000000=????????
0:000:x86> g
(1b04.1d34): Access violation - code c0000005 (!!! second chance !!!)
xul!nsIFrame::GetOverflowRect+0x2f:
6db00567 a5              movs    dword ptr es:[edi],dword ptr [esi] es:002b:0014e62c=00000000 ds:002b:00000000=????????
0:000:x86> kp
ChildEBP RetAddr  
0014e618 6db0335d xul!nsIFrame::GetOverflowRect(void)+0x2f [c:\mozilla-central\layout\generic\nsframe.cpp @ 3946]
0014e644 6dc21244 xul!nsFrame::CheckInvalidateSizeChange(struct nsHTMLReflowMetrics * aNewDesiredSize = 0x0014e988)+0x26 [c:\mozilla-central\layout\generic\nsframe.cpp @ 3984]
0014e7a4 6db0307f xul!nsBlockFrame::Reflow(class nsPresContext * aPresContext = 0x0434df10, struct nsHTMLReflowMetrics * aMetrics = 0x0014e988, struct nsHTMLReflowState * aReflowState = 0x0014e874, unsigned int * aStatus = 0x0014e91c)+0x416 [c:\mozilla-central\layout\generic\nsblockframe.cpp @ 1163]
0014e950 6db03cc3 xul!nsFrame::BoxReflow(class nsBoxLayoutState * aState = 0x0014edec, class nsPresContext * aPresContext = 0x0434df10, struct nsHTMLReflowMetrics * aDesiredSize = 0x0014e988, class nsIRenderingContext * aRenderingContext = 0x0373c188, int aX = 360, int aY = 60, int aWidth = 11880, int aHeight = 900, int aMoveFrame = 1)+0x313 [c:\mozilla-central\layout\generic\nsframe.cpp @ 6535]
0014e9dc 6db80c36 xul!nsFrame::DoLayout(class nsBoxLayoutState * aState = 0x0014edec)+0x51 [c:\mozilla-central\layout\generic\nsframe.cpp @ 6297]
0014e9f0 6db7f397 xul!nsIFrame::Layout(class nsBoxLayoutState * aState = 0x00000002)+0x18 [c:\mozilla-central\layout\xul\base\src\nsbox.cpp @ 568]
0014eb08 6dc541a6 xul!nsSprocketLayout::Layout(class nsIFrame * aBox = 0x043809f0, class nsBoxLayoutState * aState = 0x0014edec)+0x4d1 [c:\mozilla-central\layout\xul\base\src\nssprocketlayout.cpp @ 525]
0014eb24 6db80c36 xul!nsBoxFrame::DoLayout(class nsBoxLayoutState * aState = 0x00000000)+0x2e [c:\mozilla-central\layout\xul\base\src\nsboxframe.cpp @ 954]
0014eb38 6db7f397 xul!nsIFrame::Layout(class nsBoxLayoutState * aState = 0x00000002)+0x18 [c:\mozilla-central\layout\xul\base\src\nsbox.cpp @ 568]
0014ec50 6dc541a6 xul!nsSprocketLayout::Layout(class nsIFrame * aBox = 0x043507d0, class nsBoxLayoutState * aState = 0x0014edec)+0x4d1 [c:\mozilla-central\layout\xul\base\src\nssprocketlayout.cpp @ 525]
0014ec6c 6db80c36 xul!nsBoxFrame::DoLayout(class nsBoxLayoutState * aState = 0x00000005)+0x2e [c:\mozilla-central\layout\xul\base\src\nsboxframe.cpp @ 954]
0014ec80 6db810de xul!nsIFrame::Layout(class nsBoxLayoutState * aState = 0x00000000)+0x18 [c:\mozilla-central\layout\xul\base\src\nsbox.cpp @ 568]
0014ed84 6dc541a6 xul!nsStackLayout::Layout(class nsIFrame * aBox = 0x04350468, class nsBoxLayoutState * aState = 0x0014edec)+0x213 [c:\mozilla-central\layout\xul\base\src\nsstacklayout.cpp @ 346]
0014eda0 6db80c36 xul!nsBoxFrame::DoLayout(class nsBoxLayoutState * aState = 0x04350468)+0x2e [c:\mozilla-central\layout\xul\base\src\nsboxframe.cpp @ 954]
0014edb4 6dc53b39 xul!nsIFrame::Layout(class nsBoxLayoutState * aState = 0x6dbb6ec2)+0x18 [c:\mozilla-central\layout\xul\base\src\nsbox.cpp @ 568]
0014ee24 6dbb6ec2 xul!nsBoxFrame::Reflow(class nsPresContext * aPresContext = 0x0434df10, struct nsHTMLReflowMetrics * aDesiredSize = 0x0014efe4, struct nsHTMLReflowState * aReflowState = 0x0014ee94, unsigned int * aStatus = 0x0014f140)+0x147 [c:\mozilla-central\layout\xul\base\src\nsboxframe.cpp @ 757]
0014ee5c 6dc6379b xul!nsContainerFrame::ReflowChild(class nsIFrame * aKidFrame = 0x04350468, class nsPresContext * aPresContext = 0x0434df10, struct nsHTMLReflowMetrics * aDesiredSize = 0x0014efe4, struct nsHTMLReflowState * aReflowState = 0x0014ee94, int aX = 0, int aY = 0, unsigned int aFlags = 0, unsigned int * aStatus = 0x0014f140, class nsOverflowContinuationTracker * aTracker = 0x00000000)+0x89 [c:\mozilla-central\layout\generic\nscontainerframe.cpp @ 736]
0014f034 6da455e4 xul!ViewportFrame::Reflow(class nsPresContext * aPresContext = 0x0434df10, struct nsHTMLReflowMetrics * aDesiredSize = 0x0014f104, struct nsHTMLReflowState * aReflowState = 0x0014f05c, unsigned int * aStatus = 0x0014f140)+0xb6 [c:\mozilla-central\layout\generic\nsviewportframe.cpp @ 286]
0014f170 6da4575a xul!PresShell::DoReflow(class nsIFrame * target = 0x04350070, int aInterruptible = 0)+0x119 [c:\mozilla-central\layout\base\nspresshell.cpp @ 7248]
0014f194 6da45c4e xul!PresShell::ProcessReflowCommands(int aInterruptible = 0)+0x72 [c:\mozilla-central\layout\base\nspresshell.cpp @ 7363]

This is what it just returned for me.  The debugger pointed to http://hg.mozilla.org/mozilla-central/annotate/06ff8717ce97/layout/generic/nsFrame.cpp#l3946

I don't know why that line, but I'm lucky to have gotten this far.  I hope this is the kind of stack that will be useful.  I'm done for the day, but ping me tomorrow in IRC or in this bug if more information is needed or a better stack.
Comment 17 Kyle Huey [:khuey] (khuey@mozilla.com) 2010-03-31 06:03:22 PDT
I relanded the patch.

If Tanner, Mike, Phillip, or somebody can, it would be helpful to take a clean tree built at the rev before this starts and manually apply the seven patches in the bug (starting with the patch at 2010-03-21 12:33 PDT, and ending at the patch at 2010-03-21 18:20 PDT) to narrow this down.
Comment 18 Kyle Huey [:khuey] (khuey@mozilla.com) 2010-03-31 07:25:10 PDT
I cannot reproduce this on my machine.  Building the changeset in question with an almost identical mozconfig to Mike's (minus jemalloc) on the same compiler works for me.
Comment 19 Mike Pesce (:By-Tor) 2010-03-31 11:54:41 PDT
Ok, I made a completely clean working build using http://hg.mozilla.org/mozilla-central/rev/e6d08c87089e we'll have to take things from there on IRC.
Comment 20 Kyle Huey [:khuey] (khuey@mozilla.com) 2010-03-31 12:21:50 PDT
(In reply to comment #19)
> Ok, I made a completely clean working build using
> http://hg.mozilla.org/mozilla-central/rev/e6d08c87089e we'll have to take
> things from there on IRC.

Ok, I'll be around tonight (EST) and hopefully will run into you.
Comment 21 Tanner M. Young [:tmyoung] 2010-04-01 01:46:48 PDT
Bas' theory worked for me.  The regression appears to be http://hg.mozilla.org/mozilla-central/rev/46e3ca39d3d0 (39965)

I built successfully for the first time updating to http://hg.mozilla.org/mozilla-central/rev/1f811ed26219 (39964) which is the revision immediately before the one that is listed above.  After building to http://hg.mozilla.org/mozilla-central/rev/46e3ca39d3d0 (39965) the crash resurfaced.

If possible, I would like By-Tor or someone who can reproduce this bug to confirm that compiling after running

hg update -r 1f811ed26219

creates a successful build, while compiling after running

hg update -r 46e3ca39d3d0 creates a broken build

I don't know of any dependency that breaks if we backout http://hg.mozilla.org/mozilla-central/rev/46e3ca39d3d0, (roc probably would) but what I quickly looked at seemed just fine.  Otherwise, I would say we have found the problem.
Comment 22 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2010-04-01 02:39:27 PDT
We can back that out to get things going again for people, but we need that patch. So I hope someone can figure out why this is blowing up on VC9.

One thing that could be tried is to rework the NS_DECLARE_FRAME_PROPERTY macro. Currently it's

#define NS_DECLARE_FRAME_PROPERTY(prop, dtor)                   \
  static const FramePropertyDescriptor* prop() {                \
    static const FramePropertyDescriptor descriptor = { dtor }; \
    return &descriptor;                                         \
  }

I wonder if inlining multiple copies of a function that returns the address of a static local variable is throwing the VC9 optimizer into a tailspin?

Maybe try something like

#define NS_DECLARE_FRAME_PROPERTY(prop, dtor)             \
  static const FramePropertyDescriptor prop##_descriptor; \
  static const FramePropertyDescriptor* prop() {          \
    prop##_descriptor.mDestructor = dtor;                 \
    return &prop##_descriptor;                            \
  }

Not optimal, but if it makes the problem go away we can come up with a better version.
Comment 23 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2010-04-01 02:40:08 PDT
Note that NS_DECLARE_FRAME_PROPERTY is used at global scope and in class declarations...
Comment 24 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2010-04-01 02:43:03 PDT
>   static const FramePropertyDescriptor prop##_descriptor; \

Lose the const...
Comment 25 Kyle Huey [:khuey] (khuey@mozilla.com) 2010-04-01 04:18:23 PDT
I'd still like to get either myself or roc a copy of a crashing build so we can figure out what's wrong with this.
Comment 26 Bas Schouten (:bas.schouten) 2010-04-01 07:08:29 PDT
Technically I'm not sure if this is actually valid, right? According to the optimizer that variable is not valid outside of the scope and it could do any number of annoying tricks which make returning the address an invalid operation? I'm currently trying stuff like roc's proposal.
Comment 27 Kyle Huey [:khuey] (khuey@mozilla.com) 2010-04-01 07:33:56 PDT
(In reply to comment #26)
> Technically I'm not sure if this is actually valid, right? According to the
> optimizer that variable is not valid outside of the scope and it could do any
> number of annoying tricks which make returning the address an invalid
> operation? I'm currently trying stuff like roc's proposal.

But it's static so it should still exist.
Comment 28 Julian Seward [:jseward] 2010-04-01 08:25:24 PDT
(In reply to comment #21)

> If possible, I would like By-Tor or someone who can reproduce this bug to
> confirm that compiling after running
> 
> hg update -r 1f811ed26219
> 
> creates a successful build, while compiling after running
> 
> hg update -r 46e3ca39d3d0 creates a broken build

Confirmed (WinXP, MSVC9, non-optimized, non-debug build).
Comment 29 Julian Seward [:jseward] 2010-04-01 08:58:12 PDT
(In reply to comment #28)
> Confirmed (WinXP, MSVC9, non-optimized, non-debug build).

The crash stack I have is shown below.  Is is similar but not
identical to the one in comment #16.  Looking at the code
for the first two frames:

nsrect.h:

60:  nsRect(const nsRect& aRect) {
61:    MOZ_COUNT_CTOR(nsRect);
62:    *this = aRect;
63:  }

So 'aRect' is null perhaps?  The caller is nsframe.cpp:

3941:  if (mOverflow.mType == NS_FRAME_OVERFLOW_LARGE) {
3942:    // there is an overflow rect, and it's not stored as deltas but as
3943:    // a separately-allocated rect
3944:    return *const_cast<nsIFrame*>(this)->GetOverflowAreaProperty(PR_FALSE);
3945:  }

I don't see how 3944 can cause a call to nsRect::nsRect, maybe some
copy constructor strangeness?  It does appear though that
nsIFrame::GetOverflowAreaProperty(PRBool aCreateIfNecessary) can
return nsnull.




Invalid read of size 4
   at 0x100031CD: nsRect::nsRect (nsrect.h:62)
   by 0x10427D75: nsIFrame::GetOverflowRect (nsframe.cpp:3944)
   by 0x1042B8F4: nsIFrame::FinishAndStoreOverflow (nsframe.cpp:5677)
   by 0x104CDC22: nsBox::SyncLayout (nsbox.cpp:632)
   by 0x104CCFD2: nsBox::EndLayout (nsbox.cpp:209)
   by 0x104CDA83: nsIFrame::Layout (nsbox.cpp:568)
   by 0x104CAF49: nsSprocketLayout::Layout (nssprocketlayout.cpp:521)
   by 0x106D6283: nsBoxFrame::DoLayout (nsboxframe.cpp:951)
   by 0x104CDA77: nsIFrame::Layout (nsbox.cpp:566)
   by 0x104CAF49: nsSprocketLayout::Layout (nssprocketlayout.cpp:521)
   by 0x106D6283: nsBoxFrame::DoLayout (nsboxframe.cpp:951)
   by 0x104CDA77: nsIFrame::Layout (nsbox.cpp:566)
   by 0x104CF21E: nsStackLayout::Layout (nsstacklayout.cpp:342)
   by 0x106D6283: nsBoxFrame::DoLayout (nsboxframe.cpp:951)
   by 0x104CDA77: nsIFrame::Layout (nsbox.cpp:566)
   by 0x106D5CC8: nsBoxFrame::Reflow (nsboxframe.cpp:748)
   by 0x1055E82B: nsRootBoxFrame::Reflow (nsrootboxframe.cpp:236)
   by 0x105407BF: nsContainerFrame::ReflowChild (nscontainerframe.cpp:736)
   by 0x106FE1A0: ViewportFrame::Reflow (nsviewportframe.cpp:285)
   by 0x10297C98: PresShell::DoReflow (nspresshell.cpp:7215)
   by 0x10298149: PresShell::ProcessReflowCommands (nspresshell.cpp:7348)
   by 0x10290AF3: PresShell::FlushPendingNotifications (nspresshell.cpp:4674)
   by 0x1029087D: PresShell::HandlePostedReflowCallbacks (nspresshell.cpp:4566)
   by 0x102978AD: PresShell::DidDoReflow (nspresshell.cpp:7097)
   by 0x102981BA: PresShell::ProcessReflowCommands (nspresshell.cpp:7360)
   by 0x10290AF3: PresShell::FlushPendingNotifications (nspresshell.cpp:4674)
   by 0x1051D228: nsRefreshDriver::Notify (nsrefreshdriver.cpp:224)
   by 0x10B9C908: nsTimerImpl::Fire (nstimerimpl.cpp:430)
   by 0x10B9CA46: nsTimerEvent::Run (nstimerimpl.cpp:519)
   by 0x10B8FE22: nsThread::ProcessNextEvent (nsthread.cpp:527)
   by 0x10B570DB: NS_ProcessNextEvent_P (nsthreadutils.cpp:250)
   by 0x10CF0611: mozilla::ipc::MessagePump::Run (messagepump.cpp:118)
   by 0x10BCD59A: MessageLoop::RunInternal (message_loop.cc:216)
   by 0x10BCD4F4: MessageLoop::RunHandler (message_loop.cc:199)
   by 0x10BCD442: MessageLoop::Run (message_loop.cc:173)
   by 0x10A92788: nsBaseAppShell::Run (nsbaseappshell.cpp:174)
   by 0x10A91F21: nsAppShell::Run (nsappshell.cpp:239)
   by 0x108F7430: nsAppStartup::Run (nsappstartup.cpp:182)
   by 0x10009F6C: XRE_main (nsapprunner.cpp:3548)
   by 0x401F39: NS_internal_main (nsbrowserapp.cpp:158)
   by 0x40186D: wmain (nswindowswmain.cpp:120)
   by 0x402649: __tmainCRTStartup (crtexe.c:583)
   by 0x7149E74: start_process (process.c:1025)
   by 0x701ED63: ??? (in /space2/sewardj/MOZ/WINE/Inst/lib/wine/ntdll.dll.so)
   by 0x701EF3F: call_thread_entry_point (signal_i386.c:2422)
   by 0x6FFB9C9: start_process (loader.c:2612)
   by 0x68BFEDC: ??? (in /space2/sewardj/MOZ/WINE/Inst/lib/libwine.so.1.0)
 Address 0x0 is not stack'd, malloc'd or (recently) free'd
Comment 30 neil@parkwaycc.co.uk 2010-04-01 09:23:20 PDT
(In reply to comment #29)
> 3941:  if (mOverflow.mType == NS_FRAME_OVERFLOW_LARGE) {
> 3942:    // there is an overflow rect, and it's not stored as deltas but as
> 3943:    // a separately-allocated rect
> 3944:    return
> *const_cast<nsIFrame*>(this)->GetOverflowAreaProperty(PR_FALSE);
> 3945:  }
> 
> I don't see how 3944 can cause a call to nsRect::nsRect, maybe some
> copy constructor strangeness?
"return" implicitly invokes the copy constructor, although compilers are specifically allowed to optimise it away in much the same way that they are allowed to optimise away the implicit copy constructor in Foo foo = bar;

> return *const_cast<nsIFrame*>(this)->GetOverflowAreaProperty(PR_FALSE);
Although it looks like this is failing because there is no overflow area property, this is unlikely because it only fails with the one compiler. So maybe it's the code used to retrieve the property that is confusing the compiler? It could perhaps be written something like this:
return *static_cast<nsRect*>(Properties().Get(OverflowAreaProperty()));
Comment 31 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2010-04-01 09:56:29 PDT
Can you step through or put printfs into nsIFrame::GetOverflowAreaProperty and see what path it's following and what it's returning?
Comment 32 Bas Schouten (:bas.schouten) 2010-04-01 10:27:08 PDT
(In reply to comment #31)
> Can you step through or put printfs into nsIFrame::GetOverflowAreaProperty and
> see what path it's following and what it's returning?

It returns with NULL because mLastEntry is NULL. I have no idea what that means though.
Comment 33 Philip Chee 2010-04-01 10:50:08 PDT
> If possible, I would like By-Tor or someone who can reproduce this bug to
> confirm that compiling after running

> hg update -r 1f811ed26219

> creates a successful build,
FWIW, I can confirm this part.
Comment 34 Julian Seward [:jseward] 2010-04-01 11:16:03 PDT
(In reply to comment #32)
> It returns with NULL because mLastEntry is NULL. I have no idea what that means
> though.

I don't see a mLastEntry in
nsIFrame::GetOverflowAreaProperty(PRBool).  Anyway, for me the NULL
is due to the "return nsnull;" right at the end of the function,
and because the caller passes aCreateifNecessary as FALSE.
Comment 35 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2010-04-01 13:54:04 PDT
If it's returning null because mLastEntry is null in FramePropertyTable, that means that we cached the previous lookup for the property list of the frame, and it didn't have a property list.

But if mOverflow.mType == NS_FRAME_OVERFLOW_LARGE, then we should have definitely stored an OverflowAreaProperty() for that frame.

Maybe there's a bug in FramePropertyTable? In FramePropertyTable::Get, try replacing "if (mLastFrame != aFrame)" with "if (1)" to bypass the cache.

It might be worth checking that OverflowAreaProperty() returns the same value every time we call it. If there's a compiler bug, I suspect it might be accidentally duplicating the static local.

Apart from these guesses, the thing to do is to check that when we set the frame's mOverflow.mType to NS_FRAME_OVERFLOW_LARGE, we are setting its OverflowAreaProperty(). If we are setting it, then we need to check that that property is never removed from the frame. If it's never removed from the frame, we need to see why it's not being found by the call to GetOverflowAreaProperty. This would all be pretty easy with record and replay and the right build, unfortunately it's already a holiday here and I'm not in the office...
Comment 36 Tanner M. Young [:tmyoung] 2010-04-01 20:03:17 PDT
> Maybe there's a bug in FramePropertyTable? In FramePropertyTable::Get, try
> replacing "if (mLastFrame != aFrame)" with "if (1)" to bypass the cache.

I quickly tried that and it didn't make any difference in the crash stack.
Comment 37 Julian Seward [:jseward] 2010-04-02 03:38:07 PDT
(In reply to comment #36)
> > Maybe there's a bug in FramePropertyTable? In FramePropertyTable::Get, try
> > replacing "if (mLastFrame != aFrame)" with "if (1)" to bypass the cache.
> 
> I quickly tried that and it didn't make any difference in the crash stack.

Ditto.

> It might be worth checking that OverflowAreaProperty() returns the same value
> every time we call it.

I decorated nsIFrame::GetOverflowAreaProperty as shown below, and got
the following output, which suggests that that it doesn't:


QQQ-1a: OverflowAreaProperty returned 00000000
QQQ3 017EE400
QQQ-1a: OverflowAreaProperty returned 017EE400
QQQ2 017EE400
QQQ-1a: OverflowAreaProperty returned 00000000
QQQ3 017EE418
QQQ-1a: OverflowAreaProperty returned 017EE418
QQQ2 017EE418
QQQ-1a: OverflowAreaProperty returned 00000000
QQQ3 017F47F0
QQQ-1a: OverflowAreaProperty returned 00000000
QQQ3 017F4830
QQQ-1a: OverflowAreaProperty returned 017F4830
QQQ2 017F4830
QQQ-1a: OverflowAreaProperty returned 00000000
QQQ4
(crash)


nsRect*
nsIFrame::GetOverflowAreaProperty(PRBool aCreateIfNecessary) 
{
  if (!((mOverflow.mType == NS_FRAME_OVERFLOW_LARGE) ||
        aCreateIfNecessary)) {
    printf("QQQ1\n");
    return nsnull;
  }

  FrameProperties props = Properties();
  void *value = props.Get(OverflowAreaProperty());
  printf("QQQ-1a: OverflowAreaProperty returned %p\n", value);

  if (value) {
    printf("QQQ2 %p\n", value);
    return (nsRect*)value;  // the property already exists
  } else if (aCreateIfNecessary) {
    // The property isn't set yet, so allocate a new rect, set the property,
    // and return the newly allocated rect
    nsRect*  overflow = new nsRect(0, 0, 0, 0);
    props.Set(OverflowAreaProperty(), overflow);
    printf("QQQ3 %p\n", overflow);
    return overflow;
  }

  NS_NOTREACHED("Frame abuses GetOverflowAreaProperty()");
  printf("QQQ4\n");
  return nsnull;
}
Comment 38 neil@parkwaycc.co.uk 2010-04-02 04:35:41 PDT
I think this is a union aliasing bug.
> // It's a "small" overflow area so we store the deltas for each edge
> // directly in the frame, rather than allocating a separate rect.
> // Note that we do NOT store in this way if *all* the deltas are zero,
> // as that would be indistinguishable from the complete absence of
> // an overflow rect.
> DeleteProperty(nsGkAtoms::overflowAreaProperty);
> mOverflow.mDeltas.mLeft   = l;
> mOverflow.mDeltas.mTop    = t;
> mOverflow.mDeltas.mRight  = r;
> mOverflow.mDeltas.mBottom = b;
This is a union, so we're expecting these values to overwrite the mType. But that's an aliasing error, so the compiler optimizes the overwrite away.
Comment 39 Chris Pearce (:cpearce) 2010-04-02 13:42:21 PDT
[09:34]	<cpearce>	does comm-central build & run when built with VC2008 atm? I know mozilla-central still doesn't...
[09:35]	<cjones>	cpearce, eh? i build m-c exclusively with vs2008, no issues
[09:35]	<bent>	me too
[09:36]	<cpearce>	bug 555727, you can build but FF crashes on startup...
[09:36]	<cpearce>	are you guys using vs2008 pro?
[09:36]	<cjones>	i am, yeah
[09:36]	<bent>	yes...
[09:36]	<cpearce>	I'm using express.
[09:36]	<bent>	hm
[09:36]	<sewardj-laptop>	cpearce: i'm using express too
[09:37]	<cpearce>	ho humm
[09:37]	<sewardj-laptop>	cjones: so you have a successful build with vs2008 pro in the last 24 hours?
[09:37]	<cjones>	sewardj-laptop, last hour in fact
[09:37]	<sewardj-laptop>	that can actually start up?
[09:38]	<sewardj-laptop>	cpearce: then maybe we should record that in 555727
[...]
[09:38]	<sdwilsh>	yeah, mine works too from this morning
[...]
[09:40]	* sdwilsh	is on pro


Looks like VC Express is the common factor, it seems to work when built from VS2008 Pro.
Comment 40 Bas Schouten (:bas.schouten) 2010-04-02 14:53:51 PDT
I work on VS2008 pro and have the issue... could be that service packs are a factor.
Comment 41 Bas Schouten (:bas.schouten) 2010-04-02 15:08:02 PDT
FYI:

Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 15.00.30729.01 for 80x86
Copyright (C) Microsoft Corporation.  All rights reserved.
Comment 42 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2010-04-02 15:09:34 PDT
(In reply to comment #37)
> I decorated nsIFrame::GetOverflowAreaProperty as shown below, and got
> the following output, which suggests that that it doesn't:

I think roc's question was about whether OverflowAreaProperty() itself is returning different things, not things that use it.  (If it were compiled in multiple translation units such that it returned different statics, this could happen...)

(In reply to comment #38)
> This is a union, so we're expecting these values to overwrite the mType. But
> that's an aliasing error, so the compiler optimizes the overwrite away.

Why is that an aliasing error?  We're talking about two values in the same union, which compilers should handle correctly.
Comment 43 Bill Gianopoulos [:WG9s] 2010-04-02 16:08:17 PDT
I wonder if the issue really is not that the area is not being properly overwritten, but instead that when this is not a pointer, due to some optimization path it is being dereferenced as a pointer anyway.
Comment 44 Bill Gianopoulos [:WG9s] 2010-04-02 16:18:10 PDT
I tried bypassing the optimization for small overflow areas and always doing the code for larger overflow and that does not appear to help at all.
Comment 45 Bill Gianopoulos [:WG9s] 2010-04-02 16:19:51 PDT
Created attachment 436799 [details] [diff] [review]
patch that does not help

For reference, this is the patch I applied that did not help.
Comment 46 Tanner M. Young [:tmyoung] 2010-04-02 16:26:58 PDT
  if (l <= NS_FRAME_OVERFLOW_DELTA_MAX &&
      t <= NS_FRAME_OVERFLOW_DELTA_MAX &&
      r <= NS_FRAME_OVERFLOW_DELTA_MAX &&
      b <= NS_FRAME_OVERFLOW_DELTA_MAX &&
      (l | t | r | b) != 0) {
    // It's a "small" overflow area so we store the deltas for each edge
    // directly in the frame, rather than allocating a separate rect.
    // Note that we do NOT store in this way if *all* the deltas are zero,
    // as that would be indistinguishable from the complete absence of
    // an overflow rect.
+   mOverflow.mType = NS_FRAME_OVERFLOW_NONE;
    Properties().Delete(OverflowAreaProperty());
    mOverflow.mDeltas.mLeft   = l;
    mOverflow.mDeltas.mTop    = t;
    mOverflow.mDeltas.mRight  = r;
    mOverflow.mDeltas.mBottom = b;
  }

The above code didn't help either.
Comment 47 Ted Mielczarek [:ted.mielczarek] 2010-04-02 17:26:06 PDT
(In reply to comment #41)
> FYI:
> 
> Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 15.00.30729.01 for 80x86
> Copyright (C) Microsoft Corporation.  All rights reserved.

FWIW, VC9 and VC9SP1 have the exact same version number (down to the .01).
Comment 48 Philip Chee 2010-04-02 18:47:37 PDT
> Looks like VC Express is the common factor, it seems to work when built from
> VS2008 Pro.

I have both installed and my c-c build is still crashing on startup. I'm pretty sure that mozilla-build is picking up VC9Pro. One possible issue is that I didn't do a default install of VS2008. I used the custom installl and deselected bits that I did not expect to use such as MSSQL, Silverlight, etc. Possibly I'm missing some library?
Comment 49 ot 2010-04-02 21:15:59 PDT
VC9(x64 compiler) and VC10(X86 and X64 compiler) crash, too. Everything clash in nsIFrame::GetOverflowRect().

Microsoft(R) C/C++ Optimizing Compiler Version 15.00.30729.01 for x64
Copyright (C) Microsoft Corporation.  All rights reserved.

Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 16.00.30128.01 for 80x86
Copyright (C) Microsoft Corporation.  All rights reserved.

Microsoft (R) C/C++ Optimizing Compiler Version 16.00.30128.01 for x64
Copyright (C) Microsoft Corporation.  All rights reserved.
Comment 50 Bill Gianopoulos [:WG9s] 2010-04-03 06:54:52 PDT
In the course of muddling through this code trying to follow what might be happening, I had to reread the following convoluted if over a dozen times to figure out what it was trying to do and if that made sense:

nsRect*
nsIFrame::GetOverflowAreaProperty(PRBool aCreateIfNecessary)
{
  if (!((mOverflow.mType == NS_FRAME_OVERFLOW_LARGE) ||
        aCreateIfNecessary)) {

Shouldn't his just be:

  if (mOverflow.mtype != NS_FRAME_OVERFLOW_LARGE && !aCreateIfNecessary) {

Which seems much clearer (at least to me) and fits on one-line.
Comment 51 Kyle Huey [:khuey] (khuey@mozilla.com) 2010-04-03 07:11:19 PDT
!(X || Y) is equivalent to (!X && !Y), yes.

I'm not sure it's more readable though.  Does changing the written condition affect the problem?
Comment 52 Bill Gianopoulos [:WG9s] 2010-04-03 07:24:14 PDT
(In reply to comment #51)
> !(X || Y) is equivalent to (!X && !Y), yes.
> 

Yes, but just as in English grammar, avoiding double negatives is usually advisable.

> I'm not sure it's more readable though.  Does changing the written condition
> affect the problem?

Well, no the compiler can certainly parse that correctly.  Just me that had to look at it a few times, despite the fact that at one time (obviously long ago) I wrote something to convert FORTRAN if statements from a non-standard format to ANSI standard format that ended up creating things that looked exactly like this if.
Comment 53 Bill Gianopoulos [:WG9s] 2010-04-03 08:18:12 PDT
Created attachment 436865 [details] [diff] [review]
the fix

This trivial patch fixes the issue for me.
Comment 54 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2010-04-03 08:43:49 PDT
Looks fine, except I'd like to see the change be at least #ifdef _MSC_VER, since |const| can have advantages as to what section of the binary things go in (although actually probably not in this case).  That probably means either #ifdef-ing the whole macro or temporarily defining SHOULD_BE_CONST to empty ifdef _MSC_VER and then #undef SHOULD_BE_CONST after defining the macro (I probably slightly prefer the latter).  r=dbaron with that.

(Did you confirm that what's happening is that OverflowAreaProperty() returns different pointers depending on what translation unit it's in?)
Comment 55 neil@parkwaycc.co.uk 2010-04-03 09:26:22 PDT
(In reply to comment #42)
> (In reply to comment #38)
> > This is a union, so we're expecting these values to overwrite the mType. But
> > that's an aliasing error, so the compiler optimizes the overwrite away.
> Why is that an aliasing error?  We're talking about two values in the same
> union, which compilers should handle correctly.
Well, it turned out to be irrelevant here, but my understanding was that the compiler is allowed to reorder reads with writes to different union members.
Comment 56 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2010-04-03 09:34:40 PDT
(In reply to comment #55)
> Well, it turned out to be irrelevant here, but my understanding was that the
> compiler is allowed to reorder reads with writes to different union members.

No, it's allowed to reorder reads/writes to things that are *not* allowed to alias each other.  Members of a union are allowed to alias each other.
Comment 57 Bill Gianopoulos [:WG9s] 2010-04-03 10:23:17 PDT
(In reply to comment #54)
> (Did you confirm that what's happening is that OverflowAreaProperty() returns
> different pointers depending on what translation unit it's in?)

NO, I got stuck between not being able to get enough information with a non-debug build and not being able to reproduce the failure with a debug build.
Comment 58 Bill Gianopoulos [:WG9s] 2010-04-03 10:25:39 PDT
(In reply to comment #54)
> Looks fine, except I'd like to see the change be at least #ifdef _MSC_VER,
> since |const| can have advantages as to what section of the binary things go in
> (although actually probably not in this case).  That probably means either
> #ifdef-ing the whole macro or temporarily defining SHOULD_BE_CONST to empty
> ifdef _MSC_VER and then #undef SHOULD_BE_CONST after defining the macro (I
> probably slightly prefer the latter).  r=dbaron with that.

As soon as I verify the builds I have in progress all work I will be attaching a new patch the if-defs the whole macro.  I could not get the other approach to build.  Seems it did not like nesting defines or at least not nesting defines in the case where it impacts the type definition of a variable.
Comment 59 Bill Gianopoulos [:WG9s] 2010-04-03 11:30:45 PDT
Created attachment 436872 [details] [diff] [review]
better patch

This includes the ifdefs as well as adding an XXX comment referring to this bug.
Comment 60 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2010-04-03 12:10:25 PDT
Comment on attachment 436872 [details] [diff] [review]
better patch


>+// XXX Workaround MSVC issue.  See bug 555727.

Maybe say "Workaround MSVC issue by making the static FramePropertyDescriptor non-const" so people can spot the difference between the two halves.

r=dbaron with that
Comment 61 Bill Gianopoulos [:WG9s] 2010-04-03 13:02:36 PDT
Created attachment 436877 [details] [diff] [review]
Patch for checkin

With comment change suggested in review comment.
Comment 62 Kyle Huey [:khuey] (khuey@mozilla.com) 2010-04-03 13:26:52 PDT
http://hg.mozilla.org/mozilla-central/rev/d309d665cb76
Comment 63 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2010-04-03 19:57:02 PDT
So as far as we can tell this was a Microsoft compiler bug, right?
Comment 64 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2010-04-03 19:57:14 PDT
(Thanks for helping out here!)
Comment 65 Ted Mielczarek [:ted.mielczarek] 2010-04-04 05:50:08 PDT
If it's a compiler bug, and you can produce a testcase, it'd be great if you can file it at http://connect.microsoft.com/

I actually filed one not long ago and they fixed it (for the release after VC2010).
Comment 66 Tanner M. Young [:tmyoung] 2010-04-05 09:51:19 PDT
Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.3a4pre) Gecko/20100405 Minefield/3.7a4pre

Verified Fixed.
Comment 67 Kyle Huey [:khuey] (khuey@mozilla.com) 2010-06-01 10:13:53 PDT
(In reply to comment #65)
> If it's a compiler bug, and you can produce a testcase, it'd be great if you
> can file it at http://connect.microsoft.com/
> 
> I actually filed one not long ago and they fixed it (for the release after
> VC2010).

I don't think this was a compiler bug.  See http://msdn.microsoft.com/en-us/library/bxwfs976%28VS.80%29.aspx

"/OPT:ICF can result in the same address being assigned to different functions or read only data members (const variables compiled with /Gy). So, /OPT:ICF can break a program that depends on the address of functions or read-only data members being different."

I don't have the standard in front of me but I would imagine that we're not allowed to assume much about where const data is in memory.  If that's what we were doing in this instance we could run into this problem on other linkers in the future ...
Comment 68 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2010-06-01 11:45:23 PDT
I don't think that documentation is relevant here. The property code doesn't rely on the addresses of identical read-only data members being different.
Comment 69 Jonathan Kew (:jfkthame) 2012-04-26 08:50:22 PDT
Re-opening this bug, as I'm seeing the same problem on OS X 10.7 in optimized builds (using the default Apple toolchain). The initial symptom was crashing at startup due to a null dereference at http://mxr.mozilla.org/mozilla-central/source/layout/generic/nsFrame.cpp#6622, due to |overflow| being null (i.e. the frame had no OverflowAreasProperty(), even though mOverflow.mType == NS_FRAME_OVERFLOW_LARGE).

Eventually tracked the problem down to the fact that in the optimized build, OverflowAreasProperty() and PreTransformOverflowAreasProperty() are returning the same pointer value, as confirmed by adding printfs to write out the values (since using gdb on the optimized build was proving painful).

The result is that code (e.g. in nsIFrame::FinishAndStoreOverflow) that wants to delete the PreTransformOverflowAreasProperty() is actually removing the frame's OverflowAreasProperty(), but not clearing the mOverflow.mType field, which leaves the frame in an inconsistent state.

In nsIFrame.h, we have:

  NS_DECLARE_FRAME_PROPERTY(PreTransformOverflowAreasProperty,
                            DestroyOverflowAreas)

which means we'll have a class static function

  const FramePropertyDescriptor*
  nsIFrame::PreTransformOverflowAreasProperty() {
    static const FramePropertyDescriptor descriptor = {
      DestroyOverflowAreas, nsnull
    };
    return &descriptor;
  }

in nsIFrame. And in nsFrame.cpp, we have

  NS_DECLARE_FRAME_PROPERTY(OverflowAreasProperty,
                          nsIFrame::DestroyOverflowAreas)

which gives us a file-static function

  static const FramePropertyDescriptor*
  OverflowAreasProperty() {
    static const FramePropertyDescriptor descriptor = {
      nsIFrame::DestroyOverflowAreas, nsnull
    };
    return &descriptor;
  }

Note that these two functions each have a local static const variable |descriptor|, whose address they intend to return for use as a unique identifier for the property. Apparently with optimization enabled, the linker is deciding (accurately) that these two static const objects are identical, and folding them into a single instance, so that the two functions return the same address. Boom!

Interestingly, this folding isn't happening for multiple properties all defined (with identical FramePropertyDescriptor statics) within nsIFrame; so for example PreTransformOverflowAreasProperty() and InitialOverflowProperty() continue to return distinct addresses. It only folds the file-static case from nsFrame.cpp with one of the class-statics in nsIFrame. I don't think this is safe, though; AFAICS there's nothing to guarantee that a future more-aggressively-optimizing compiler/linker can't see that they're identical and fold them together as well. Same goes for a bunch more properties.

The solution that was used for VC9 of removing "const" from the static FramePropertyDescriptor variables also resolves the problem on OS X; the multiple statics no longer get folded together, the Property() functions return unique addresses as intended, and the browser no longer crashes. I think we should just remove the "const" unconditionally, instead of relying on knowledge of exactly which toolchains will or won't perform this identical-const folding.
Comment 70 Jonathan Kew (:jfkthame) 2012-04-26 08:52:14 PDT
Created attachment 618670 [details] [diff] [review]
patch, remove NS_PROPERTY_DESCRIPTOR_CONST to avoid excessive const-folding by eager optimizers
Comment 71 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2012-04-26 09:41:19 PDT
(In reply to Jonathan Kew (:jfkthame) from comment #69)
> Re-opening this bug, as I'm seeing the same problem on OS X 10.7 in
> optimized builds (using the default Apple toolchain). The initial symptom

Does that toolchain involve gcc or clang?
Comment 72 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2012-04-26 09:47:03 PDT
(In reply to Jonathan Kew (:jfkthame) from comment #69)
> Interestingly, this folding isn't happening for multiple properties all
> defined (with identical FramePropertyDescriptor statics) within nsIFrame; so
> for example PreTransformOverflowAreasProperty() and
> InitialOverflowProperty() continue to return distinct addresses. It only
> folds the file-static case from nsFrame.cpp with one of the class-statics in
> nsIFrame. I don't think this is safe, though; AFAICS there's nothing to
> guarantee that a future more-aggressively-optimizing compiler/linker can't
> see that they're identical and fold them together as well. Same goes for a
> bunch more properties.

I think there's an assumption underlying this argument that this patch makes the code more correct.  I don't buy that.  Without reference to the language standard, I don't see why it would be valid for a compiler to make this optimization when the object is declared |const| but invalid when it's not declared |const| (but still obviously never mutated).  And |const| may well have advantages (e.g., in library loading performance or in performance of these function calls).
Comment 73 Jonathan Kew (:jfkthame) 2012-04-26 09:50:32 PDT
It's whatever our default build system picks up on a vanilla 10.7 installation.... let's see.....

Adding configure options from /Users/jkew/mozdev/mozilla-central/mozconfig-opt:
  --enable-application=browser
  --enable-debug
  --enable-optimize
  --with-ccache=/opt/local/bin/ccache
loading cache ./config.cache
checking host system type... x86_64-apple-darwin11.3.0
checking target system type... x86_64-apple-darwin11.3.0
checking build system type... x86_64-apple-darwin11.3.0
checking for mawk... (cached) gawk
checking for gcc-4.2... (cached) /usr/bin/gcc
checking for g++-4.2... (cached) /usr/bin/g++
checking for perl5... (cached) /opt/local/bin/perl5
checking for gcc... (cached) /usr/bin/gcc
checking whether the C compiler (/usr/bin/gcc  ) works... yes
checking whether the C compiler (/usr/bin/gcc  ) is a cross-compiler... no
checking whether we are using GNU C... (cached) yes
checking whether /usr/bin/gcc accepts -g... (cached) yes
checking for c++... (cached) /usr/bin/g++
checking whether the C++ compiler (/usr/bin/g++  ) works... yes
checking whether the C++ compiler (/usr/bin/g++  ) is a cross-compiler... no
checking whether we are using GNU C++... (cached) yes
checking whether /usr/bin/g++ accepts -g... (cached) yes
...etc

and to be more specific, /usr/bin/gcc reports that it is:

$ /usr/bin/gcc --version
i686-apple-darwin11-llvm-gcc-4.2 (GCC) 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2336.9.00)

However, I note that ld reports:

$ ld -v
@(#)PROGRAM:ld  PROJECT:ld64-128.2
llvm version 3.1svn, from Apple Clang 3.1 (build 318.0.58)

which might be significant, if this is a link-time optimization.
Comment 74 Jonathan Kew (:jfkthame) 2012-04-26 09:57:43 PDT
(In reply to David Baron [:dbaron] from comment #72)
> (In reply to Jonathan Kew (:jfkthame) from comment #69)
> > Interestingly, this folding isn't happening for multiple properties all
> > defined (with identical FramePropertyDescriptor statics) within nsIFrame; so
> > for example PreTransformOverflowAreasProperty() and
> > InitialOverflowProperty() continue to return distinct addresses. It only
> > folds the file-static case from nsFrame.cpp with one of the class-statics in
> > nsIFrame. I don't think this is safe, though; AFAICS there's nothing to
> > guarantee that a future more-aggressively-optimizing compiler/linker can't
> > see that they're identical and fold them together as well. Same goes for a
> > bunch more properties.
> 
> I think there's an assumption underlying this argument that this patch makes
> the code more correct.  I don't buy that.  Without reference to the language
> standard, I don't see why it would be valid for a compiler to make this
> optimization when the object is declared |const| but invalid when it's not
> declared |const| (but still obviously never mutated).  And |const| may well
> have advantages (e.g., in library loading performance or in performance of
> these function calls).

Fair point - I wondered about that, though at least for the properties declared in nsIFrame.h (and therefore widely visible across many compilation units) it would take some pretty extensive analysis for an optimizer to confirm that nobody ever modifies the value (through the pointer that the function returns, thus exposing its address).

To be completely safe in the longer term, though, maybe we should re-think what we use as the property identifier?
Comment 75 Jonathan Kew (:jfkthame) 2012-04-26 10:23:02 PDT
Created attachment 618706 [details] [diff] [review]
patch, ensure uniqueness of property descriptors

Here's an alternative that ensures the descriptors are unique by including the property name as a string. This also resolves the problem, and allows us to keep the const - but at the cost of adding extra data to the build.
Comment 76 Nathan Froyd [:froydnj] 2012-04-26 10:43:27 PDT
(In reply to Jonathan Kew (:jfkthame) from comment #75)
> Here's an alternative that ensures the descriptors are unique by including
> the property name as a string. This also resolves the problem, and allows us
> to keep the const - but at the cost of adding extra data to the build.

Please don't do this; now we have extra string data, pointer to string data, and relocation for said pointer for all properties on all platforms to work around a broken Apple toolchain.  That's not a good tradeoff.

I can't find the C++ language standardese that indicates what sort of objects for which an implementation is permitted to do this sort of merging.  CC'ing glandium and rafael, as they might know--and rafael might have some LLVM insight.
Comment 77 Rafael Ávila de Espíndola (:espindola) (not reading bugmail) 2012-04-26 11:06:51 PDT
I fixed this on llvm some time ago:

http://llvm.org/bugs/show_bug.cgi?id=8927
Comment 78 Nathan Froyd [:froydnj] 2012-04-26 11:31:14 PDT
(In reply to Jonathan Kew (:jfkthame) from comment #69)
> Note that these two functions each have a local static const variable
> |descriptor|, whose address they intend to return for use as a unique
> identifier for the property. Apparently with optimization enabled, the
> linker is deciding (accurately) that these two static const objects are
> identical, and folding them into a single instance, so that the two
> functions return the same address. Boom!

Is the folding happening at the compiler level or the linker level?  We might be able to get away with passing -fno-merge-constants for just that file if it's at the compiler level.  (There might be a similar option for the linker, but I am not familiar with the OS X toolchain.)
Comment 79 Rafael Ávila de Espíndola (:espindola) (not reading bugmail) 2012-04-26 11:34:19 PDT
both. The compiler will merge if it can. It will also fail to tell the linker it should not merge them.

Note that llvm-gcc doesn't exist in the upstream project anymore. We should probably just reject it  in configure.
Comment 80 Jonathan Kew (:jfkthame) 2012-04-26 12:34:55 PDT
(In reply to Rafael Ávila de Espíndola (:espindola) from comment #77)
> I fixed this on llvm some time ago:
> 
> http://llvm.org/bugs/show_bug.cgi?id=8927

Aha - so according to the standard as quoted there, it _is_ a bug for two static const variables in different functions to be folded to the same object, if those variables might ever have their addresses compared. Thanks for the clarification.
Comment 81 Jonathan Kew (:jfkthame) 2012-04-26 14:10:38 PDT
So this really is a compiler (and/or linker bug), it seems. Still, it's pretty sad if our codebase won't work with the default dev tools that Apple is shipping for OS X 10.7. I think we need to push a workaround of some kind, rather than expect everyone building on a recent OS X system to set up some alternative toolchain in order to get a working opt build.

Personally, I think taking the patch that just removes the "const" would be a reasonable pragmatic solution at this point; I highly doubt we'd be able to detect any perf hit as a result, and it avoids the problem with minimal effort.
Comment 82 Rafael Ávila de Espíndola (:espindola) (not reading bugmail) 2012-04-26 14:45:18 PDT
(In reply to Jonathan Kew (:jfkthame) from comment #81)
> So this really is a compiler (and/or linker bug), it seems. Still, it's
> pretty sad if our codebase won't work with the default dev tools that Apple
> is shipping for OS X 10.7. I think we need to push a workaround of some
> kind, rather than expect everyone building on a recent OS X system to set up
> some alternative toolchain in order to get a working opt build.

The default is more from our configure script than anywhere else. The default compiler for 10.7 is clang. It is our build system that goes looking for gcc and finds llvm-gcc.
Comment 83 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2012-04-26 14:50:38 PDT
I'd be happy to take a patch extending the current const workaround to an additional problematic compiler (and I think your code comments explaining the problem are also better), but I'd rather not drop the |const| across the board.
Comment 84 Jonathan Kew (:jfkthame) 2012-04-27 02:44:24 PDT
Created attachment 618958 [details] [diff] [review]
patch, un-constify property descriptors for apple gcc 4.2.1

OK, here's a version that extends the const-removal to Apple GCC 4.2.1 only, which fixes the opt build on 10.7.

I'm still somewhat concerned the issue may resurface with other compilers/versions. (See Kyle's prediction in comment 67, and the note at http://llvm.org/bugs/show_bug.cgi?id=8927#c2.)
Comment 85 Rafael Ávila de Espíndola (:espindola) (not reading bugmail) 2012-04-27 05:19:11 PDT
(In reply to Jonathan Kew (:jfkthame) from comment #84)
> Created attachment 618958 [details] [diff] [review]
> patch, un-constify property descriptors for apple gcc 4.2.1

The problem is not gcc 4.2.1, it is llvm-gcc. Make sure this also doesn't change the behavior for clang, which gets this right for some time now.

If you really want to do to this, your best bet is a configure check for the bug.

Given that llvm-gcc in known to be broken and *will not be fixed*, the best option is still to just reject it in configure.
Comment 86 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2012-04-27 06:41:36 PDT
(In reply to Rafael Ávila de Espíndola (:espindola) from comment #85)
> (In reply to Jonathan Kew (:jfkthame) from comment #84)
> > Created attachment 618958 [details] [diff] [review]
> > patch, un-constify property descriptors for apple gcc 4.2.1
> 
> The problem is not gcc 4.2.1, it is llvm-gcc. Make sure this also doesn't
> change the behavior for clang, which gets this right for some time now.

Do you know what macro can be used for testing for llvm-gcc?

(Note that a good way to see what macros a compiler defines by default, for
gcc at least, is "gcc -E -dM -x c++ /dev/null".)
Comment 87 Rafael Ávila de Espíndola (:espindola) (not reading bugmail) 2012-04-27 07:00:46 PDT
Created attachment 619018 [details]
macros defined by clang
Comment 88 Rafael Ávila de Espíndola (:espindola) (not reading bugmail) 2012-04-27 07:01:22 PDT
Created attachment 619019 [details]
macros defined by llvm-gcc
Comment 89 Rafael Ávila de Espíndola (:espindola) (not reading bugmail) 2012-04-27 07:01:49 PDT
Created attachment 619020 [details]
macros defined by gcc
Comment 90 Jonathan Kew (:jfkthame) 2012-04-27 07:03:17 PDT
There's an __llvm__, but both llvm-gcc and clang have that defined. However, we could add !__clang__ to avoid affecting that.

So, maybe something like

#if _MSC_VER || (__APPLE__ && __llvm__ && !__clang__)
....don't use const
#endif

Or do you also want to check the specific version number?
Comment 91 Rafael Ávila de Espíndola (:espindola) (not reading bugmail) 2012-04-27 07:07:06 PDT
> Do you know what macro can be used for testing for llvm-gcc?

I have attached the macro list. You might be able to use __llvm__ and __clang__, but I still think we shouldn't do it. What I think you should do in something like what we did in mozilla-central/build/autoconf/gcc-pr49911.m4. Write a configure check that looks for the bug directly. This is way more reliable than using builtin macros.

Once you know the compiler has the bug, the best options are IMHO
1 Abort the build. This is not a supported compiler anyway.
2 If the 1 is not an option, change CXXFLAGS to hide the problem
3 If 2 is not an option, define a macro and use that in the source code.
Comment 92 Ted Mielczarek [:ted.mielczarek] 2012-04-27 07:21:33 PDT
Is Lion's XCode shipping clang as well? Maybe we should just be fixing configure to pick that up before gcc on OS X.
Comment 93 Rafael Ávila de Espíndola (:espindola) (not reading bugmail) 2012-04-27 07:23:34 PDT
(In reply to Ted Mielczarek [:ted] from comment #92)
> Is Lion's XCode shipping clang as well? Maybe we should just be fixing
> configure to pick that up before gcc on OS X.

It is. I fully agree that we should select that first. I even think configure should abort the build if the uses explicitly sets CXX to llvm-gcc, that is probably a mistake by the user and not something I see a lot of value in supporting.
Comment 94 Bill Gianopoulos [:WG9s] 2012-04-27 07:26:57 PDT
I still like Rafael's idea from comment 85.  Have configure actually do a compile and run and see if the const's get merged and go by that.  That way we will avoid this issue showing up again with yet some other compiler/linker combo.
Comment 95 Jonathan Kew (:jfkthame) 2012-04-27 07:52:38 PDT
(In reply to Bill Gianopoulos [:WG9s] from comment #94)
> I still like Rafael's idea from comment 85.  Have configure actually do a
> compile and run and see if the const's get merged and go by that.  That way
> we will avoid this issue showing up again with yet some other
> compiler/linker combo.

Assuming we can write a configure test that successfully triggers the bug for all vulnerable compilers, which - given the vagaries of optimizers - may not be trivial. I can write a small program that demonstrates it on llvm-gcc-4.2.1, but if it were to depend (for example) on link-time merging across compilation units, it wouldn't be detected by this example.

And what about cross-compilation scenarios, where you can't execute the generated code?

I guess I don't understand why there's such resistance to applying a trivial workaround in our codebase and moving on.

It's not like removing the "const" even has any negative impact on the generated code, at least under llvm-gcc ... the *Property() functions are in effect single-line accessors that just return the address of a global data item, and they get inlined as a simple load-address instruction. I compared the code generated for nsFrame.cpp with and without "const" in NS_DECLARE_FRAME_PROPERTY. Here, for example, is nsIFrame::ClearOverflowRects, when "const" is present:

__ZN8nsIFrame18ClearOverflowRectsEv:
0000000000006538	pushq	%rbp
0000000000006539	movq	%rsp,%rbp
000000000000653c	pushq	%rbx
000000000000653d	subq	$0x08,%rsp
0000000000006541	movl	0x48(%rdi),%eax
0000000000006544	testl	%eax,%eax
0000000000006546	je	0x0000657f
0000000000006548	movq	%rdi,%rbx
000000000000654b	cmpl	$0x000000ff,%eax
0000000000006550	jne	0x00006571
0000000000006552	movq	0x20(%rbx),%rax
0000000000006556	movq	__ZN7nsFrame24SetVerifyStyleTreeEnableEb(%rax),%rax
000000000000655a	movl	$__ZN7nsFramenwEmP12nsIPresShell,%edi
000000000000655f	addq	(%rax),%rdi
0000000000006562	leaq	__ZZN8nsIFrame33PreTransformOverflowAreasPropertyEvE10descriptor(%rip),%rdx
0000000000006569	movq	%rbx,%rsi
000000000000656c	callq	__ZN7mozilla18FramePropertyTable6DeleteEP8nsIFramePKNS_23FramePropertyDescriptorE
0000000000006571	movl	$__ZN7nsFrame16ShowFrameBordersEb,0x48(%rbx)
0000000000006578	movl	$0x00000001,%eax
000000000000657d	jmp	0x00006581
000000000000657f	xorl	%eax,%eax
0000000000006581	addq	$0x08,%rsp
0000000000006585	popq	%rbx
0000000000006586	popq	%rbp
0000000000006587	ret

And here's the same function when "const" has been removed from the descriptor's declaration:

__ZN8nsIFrame18ClearOverflowRectsEv:
0000000000006538	pushq	%rbp
0000000000006539	movq	%rsp,%rbp
000000000000653c	pushq	%rbx
000000000000653d	subq	$0x08,%rsp
0000000000006541	movl	0x48(%rdi),%eax
0000000000006544	testl	%eax,%eax
0000000000006546	je	0x0000657f
0000000000006548	movq	%rdi,%rbx
000000000000654b	cmpl	$0x000000ff,%eax
0000000000006550	jne	0x00006571
0000000000006552	movq	0x20(%rbx),%rax
0000000000006556	movq	__ZN7nsFrame24SetVerifyStyleTreeEnableEb(%rax),%rax
000000000000655a	movl	$__ZN7nsFramenwEmP12nsIPresShell,%edi
000000000000655f	addq	(%rax),%rdi
0000000000006562	leaq	__ZZL21OverflowAreasPropertyvE10descriptor(%rip),%rdx
0000000000006569	movq	%rbx,%rsi
000000000000656c	callq	__ZN7mozilla18FramePropertyTable6DeleteEP8nsIFramePKNS_23FramePropertyDescriptorE
0000000000006571	movl	$__ZN7nsFrame16ShowFrameBordersEb,0x48(%rbx)
0000000000006578	movl	$0x00000001,%eax
000000000000657d	jmp	0x00006581
000000000000657f	xorl	%eax,%eax
0000000000006581	addq	$0x08,%rsp
0000000000006585	popq	%rbx
0000000000006586	popq	%rbp
0000000000006587	ret

The code sequence generated is identical, except that the "const" version has used the wrong symbol in its leaq instruction, because it decided that __ZZL21OverflowAreasPropertyvE10descriptor could be folded with __ZZN8nsIFrame33PreTransformOverflowAreasPropertyEvE10descriptor. 

(Comparing the assembly output reveals that ViewProperty and nsIFrame::SplitSpecialSibling exhibit the same problem, incidentally.)
Comment 96 Jonathan Kew (:jfkthame) 2012-04-27 07:55:01 PDT
(In reply to Ted Mielczarek [:ted] from comment #92)
> Is Lion's XCode shipping clang as well? Maybe we should just be fixing
> configure to pick that up before gcc on OS X.

Do we support building with clang on all versions of OS X where Apple has shipped it, or would we need to make this a version-dependent choice? IIRC, it was available (though not the default compiler) on some earlier releases, but may not have been mature enough for us to use it successfully.
Comment 97 Rafael Ávila de Espíndola (:espindola) (not reading bugmail) 2012-04-27 08:02:07 PDT
> And what about cross-compilation scenarios, where you can't execute the
> generated code?

This OS X. The tools support the different architectures, so you can check the produced .o file.

Still, why do you care so much about supporting llvm-gcc? You can just abort the build if "$CC -v 2>&1 | grep llvm-gcc" prints anything.

> I guess I don't understand why there's such resistance to applying a trivial
> workaround in our codebase and moving on.

Because it is a workaround on a compiler that is not supported. The bug is that our build system is trying to use it.
Comment 98 Rafael Ávila de Espíndola (:espindola) (not reading bugmail) 2012-04-27 08:03:20 PDT
> Do we support building with clang on all versions of OS X where Apple has
> shipped it, or would we need to make this a version-dependent choice? IIRC,
> it was available (though not the default compiler) on some earlier releases,
> but may not have been mature enough for us to use it successfully.

So far we don't support clang or llvm-gcc. Only gcc-4.2. You can make configure select that by default if you want.

The main difference is that llvm-gcc will never be supported, it is a dead project.
Comment 99 Jonathan Kew (:jfkthame) 2012-04-27 08:42:47 PDT
AFAICT, my Xcode installation (4.3.2) on OS X 10.7 doesn't include gcc-4.2, only clang or llvm-gcc.
Comment 100 Rafael Ávila de Espíndola (:espindola) (not reading bugmail) 2012-04-27 08:46:37 PDT
(In reply to Jonathan Kew (:jfkthame) from comment #99)
> AFAICT, my Xcode installation (4.3.2) on OS X 10.7 doesn't include gcc-4.2,
> only clang or llvm-gcc.

I think that is correct. The options we have are

* Telling the user: sorry, you don't have any supported compilers and aborting the build.
* Selecting a second best. llvm-gcc is not supported because it miscompiles firefox and will never be fixed. Clang is not supported yet because the produced code is not as fast as gcc 4.2, so I guess clang is the second best.

I would be OK with a configure patch that on OS X would try to first use gcc-4.2, if that fails try clang and if that fails, abort (no known combinations of OS X and Xcode gets here).
Comment 101 Jonathan Kew (:jfkthame) 2012-04-27 09:17:13 PDT
Created attachment 619072 [details] [diff] [review]
patch, un-constify property descriptors to work around apple llvm-gcc bug

One more shot at this, now targeted at apple llvm-gcc only.

If someone wants to modify the build system so that we use clang instead on Lion, that'd be fine, but I'm no configure wizard. Meanwhile, ISTM this is the simplest way to resolve the immediate issue, so that the default configuration, built on a current Mac, actually produces a working browser.
Comment 102 Rafael Ávila de Espíndola (:espindola) (not reading bugmail) 2012-04-27 09:22:14 PDT
Comment on attachment 619072 [details] [diff] [review]
patch, un-constify property descriptors to work around apple llvm-gcc bug

If all you want is a working browser, pass CC=clang CXX=clang++.

If you want to fix this, fix the real bug: configure should not be selecting llvm-gcc.
Comment 103 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2012-04-27 09:49:44 PDT
Comment on attachment 619072 [details] [diff] [review]
patch, un-constify property descriptors to work around apple llvm-gcc bug

r=dbaron

While it's a good idea to fix configure choose a better default compiler, it's also good for our code to build on more compilers, for various reasons:  it makes it easier for people to get started on Mozilla, it makes it easier for us to do comparisons between compilers, etc.

(I slightly prefer explicitly using defined() rather than depending on the undefined-is-false in the preprocessor; I also think that tends to be our local style most of the time.)
Comment 104 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2012-04-27 10:09:54 PDT
Comment on attachment 619072 [details] [diff] [review]
patch, un-constify property descriptors to work around apple llvm-gcc bug

ok, I don't have time for this fight today
Comment 105 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2012-04-27 10:14:57 PDT
I'd note that:

 * I think it's fine to reject a compiler if it's explicitly chosen and the rejection gives a useful error message

 * we shouldn't reject the compiler we choose by default on a reasonable configuration of a tier 1 platform; if Jonathan is ending up with this by default the fix for that ought to be picking a different compiler by default rather than making the gcc-llvm choice and then rejecting it.
Comment 106 Rafael Ávila de Espíndola (:espindola) (not reading bugmail) 2012-04-27 10:19:46 PDT
(In reply to David Baron [:dbaron] from comment #105)
> I'd note that:
> 
>  * I think it's fine to reject a compiler if it's explicitly chosen and the
> rejection gives a useful error message
> 
>  * we shouldn't reject the compiler we choose by default on a reasonable
> configuration of a tier 1 platform; if Jonathan is ending up with this by
> default the fix for that ought to be picking a different compiler by default
> rather than making the gcc-llvm choice and then rejecting it.

I agree with both. Patch in a sec.
Comment 107 Rafael Ávila de Espíndola (:espindola) (not reading bugmail) 2012-04-27 10:26:39 PDT
Created attachment 619086 [details] [diff] [review]
select clang over llvm-gcc

https://tbpl.mozilla.org/?tree=Try&rev=07992586a1c9

I think this covers the three cases I can think of

* really old OS X: gcc-4.2 and clang don't exist, we end up with gcc which is 4.0
* after that but before xcode 4.2, we find gcc-4.2 and stay with it
* very recent Xcode. We don't find gcc 4.2, but we find clang.

If for some reason we still end up setting CC and CXX to llvm-gcc, we reject it.
Comment 108 Ted Mielczarek [:ted.mielczarek] 2012-05-03 12:49:07 PDT
Comment on attachment 619086 [details] [diff] [review]
select clang over llvm-gcc

Review of attachment 619086 [details] [diff] [review]:
-----------------------------------------------------------------

::: build/autoconf/compiler-opts.m4
@@ +1,5 @@
>  dnl Add compiler specific options
>  
> +AC_DEFUN([MOZ_DEFAULT_COMPILER],
> +[
> +dnl Default to MSVC for win32 and gcc for darwin

This comment seems outdated re: Darwin.
Comment 109 Rafael Ávila de Espíndola (:espindola) (not reading bugmail) 2012-05-03 14:04:36 PDT
https://tbpl.mozilla.org/?tree=Mozilla-Inbound&rev=cd1e10a767fd
Comment 110 Ed Morley [:emorley] 2012-05-04 09:28:54 PDT
https://hg.mozilla.org/mozilla-central/rev/cd1e10a767fd
Comment 111 Markus Stange [:mstange] [away until June 27] 2012-05-07 06:52:54 PDT
*** Bug 732467 has been marked as a duplicate of this bug. ***
Comment 112 :Gavin Sharp [email: gavin@gavinsharp.com] 2012-07-05 10:10:39 PDT
*** Bug 739123 has been marked as a duplicate of this bug. ***

Note You need to log in before you can comment on or make changes to this bug.