Closed Bug 1232696 Opened 8 years ago Closed 7 years ago

Remove NS_DECL_AND_IMPL_ZEROING_OPERATOR_NEW as it causes segfaulting for GCC 6 builds

Categories

(Firefox Build System :: General, defect)

43 Branch
defect
Not set
normal

Tracking

(firefox54 fixed)

RESOLVED FIXED
mozilla54
Tracking Status
firefox54 --- fixed

People

(Reporter: mliska, Assigned: jseward)

References

(Blocks 1 open bug)

Details

Attachments

(9 files, 9 obsolete files)

957 bytes, text/plain
Details
381.99 KB, text/x-log
Details
29.74 KB, text/plain
Details
107.08 KB, text/plain
Details
4.13 KB, patch
Details | Diff | Splinter Review
18.52 KB, patch
bkelly
: review+
Details | Diff | Splinter Review
11.33 KB, patch
dholbert
: review+
Details | Diff | Splinter Review
9.47 KB, patch
Details | Diff | Splinter Review
4.63 KB, patch
Details | Diff | Splinter Review
User Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.86 Safari/537.36

Steps to reproduce:

Compiling Firefox using latest GCC 6 version (pre-release candidate) causes segmentation fault during start up:




Actual results:

$ gdb ./obj-classic/dist/bin/firefox

(gdb) bt
#0  0x0000000000000000 in  ()
#1  0x00007ffff17893cb in nsComponentManagerImpl::CreateInstanceByContractID(char const*, nsISupports*, nsID const&, void**) (this=<optimized out>, aContractID=0x7ffff483cd60 "@mozilla.org/network/safe-file-output-stream;1", aDelegate=0xc842e0, aIID=..., aResult=aResult@entry=0x7fffffffc4b8)
    at /home/marxin/Programming/gecko-dev/xpcom/components/nsComponentManager.cpp:1223
#2  0x00007ffff1789492 in nsComponentManagerImpl::CreateInstanceByContractID(char const*, nsISupports*, nsID const&, void**) (this=<optimized out>, aContractID=<optimized out>, aDelegate=<optimized out>, aIID=..., aResult=aResult@entry=0x7fffffffc4b8) at /home/marxin/Programming/gecko-dev/xpcom/components/nsComponentManager.cpp:1196
#3  0x00007ffff17ba02b in nsCreateInstanceByContractID::operator()(nsID const&, void**) const (aResult=0x7fffffffc4b8, aIID=..., aDelegate=<optimized out>, aContractID=<optimized out>) at /home/marxin/Programming/gecko-dev/xpcom/glue/nsComponentManagerUtils.cpp:151
#4  0x00007ffff17ba02b in nsCreateInstanceByContractID::operator()(nsID const&, void**) const (this=0x7fffffffc4f0, aIID=..., aInstancePtr=0x7fffffffc4b8)
    at /home/marxin/Programming/gecko-dev/xpcom/glue/nsComponentManagerUtils.cpp:197
#5  0x00007ffff17b1f18 in nsCOMPtr_base::assign_from_helper(nsCOMPtr_helper const&, nsID const&) (this=0x7fffffffc4e0, aHelper=..., aIID=...)
    at /home/marxin/Programming/gecko-dev/xpcom/glue/nsCOMPtr.cpp:125
#6  0x00007ffff1817a17 in NS_NewSafeLocalFileOutputStream(nsIOutputStream**, nsIFile*, int, int, int) (aHelper=..., this=0x7fffffffc4e0) at ../../dist/include/nsCOMPtr.h:559
#7  0x00007ffff1817a17 in NS_NewSafeLocalFileOutputStream(nsIOutputStream**, nsIFile*, int, int, int) (result=0x7fffffffc570, file=file@entry=0xc83c70, ioFlags=ioFlags@entry=-1, perm=perm@entry=384, behaviorFlags=behaviorFlags@entry=0) at /home/marxin/Programming/gecko-dev/netwerk/base/nsNetUtil.cpp:1012
#8  0x00007ffff17c10f3 in mozilla::Preferences::WritePrefFile(nsIFile*) (this=<optimized out>, aFile=0xc83c70)
    at /home/marxin/Programming/gecko-dev/modules/libpref/Preferences.cpp:950
#9  0x00007ffff3b74cd6 in nsAppStartup::TrackStartupCrashBegin(bool*) (this=0x8bf540, aIsSafeModeNecessary=0x7fffffffc75f)
    at /home/marxin/Programming/gecko-dev/toolkit/components/startup/nsAppStartup.cpp:911
#10 0x00007ffff3bceddf in nsXREDirProvider::DoStartup() (this=this@entry=0x7fffffffc990) at /home/marxin/Programming/gecko-dev/toolkit/xre/nsXREDirProvider.cpp:845
#11 0x00007ffff3bc841a in XREMain::XRE_mainRun() (this=this@entry=0x7fffffffc950) at /home/marxin/Programming/gecko-dev/toolkit/xre/nsAppRunner.cpp:4154
#12 0x00007ffff3bc90e4 in XREMain::XRE_main(int, char**, nsXREAppData const*) (this=this@entry=0x7fffffffc950, argc=argc@entry=1, argv=argv@entry=0x7fffffffddc8, aAppData=aAppData@entry=0x7fffffffcb50) at /home/marxin/Programming/gecko-dev/toolkit/xre/nsAppRunner.cpp:4381
#13 0x00007ffff3bc93d1 in XRE_main(int, char**, nsXREAppData const*, uint32_t) (argc=1, argv=0x7fffffffddc8, aAppData=0x7fffffffcb50, aFlags=<optimized out>)
    at /home/marxin/Programming/gecko-dev/toolkit/xre/nsAppRunner.cpp:4483
#14 0x0000000000404a75 in do_main(int, char**, nsIFile*) (argc=argc@entry=1, argv=argv@entry=0x7fffffffddc8, xreDirectory=0x465b20)
    at /home/marxin/Programming/gecko-dev/browser/app/nsBrowserApp.cpp:212
#15 0x0000000000404381 in main(int, char**) (argc=1, argv=0x7fffffffddc8) at /home/marxin/Programming/gecko-dev/browser/app/nsBrowserApp.cpp:399

valgrind log:
....

==6228== Mismatched free() / delete / delete []
==6228==    at 0x4C2A7FB: free (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==6228==    by 0x944957C: operator delete (mozalloc.h:210)
==6228==    by 0x944957C: deallocate (new_allocator.h:110)
==6228==    by 0x944957C: deallocate (alloc_traits.h:308)
==6228==    by 0x944957C: _M_destroy (basic_string.h:186)
==6228==    by 0x944957C: _M_dispose (basic_string.h:181)
==6228==    by 0x944957C: ~basic_string (basic_string.h:559)
==6228==    by 0x944957C: (anonymous namespace)::HistogramGet(char const*, char const*, unsigned int, unsigned int, unsigned int, unsigned int, bool, base::Histogram**) (Telemetry.cpp:988)
==6228==    by 0x944991C: (anonymous namespace)::CloneHistogram(nsACString_internal const&, mozilla::Telemetry::ID, base::Histogram&) (Telemetry.cpp:1057)
==6228==    by 0x9449AE1: (anonymous namespace)::GetSubsessionHistogram(base::Histogram&) (Telemetry.cpp:1110)
==6228==    by 0x9449B0F: (anonymous namespace)::HistogramAdd(base::Histogram&, int, unsigned int) [clone .part.282] (Telemetry.cpp:1124)
==6228==    by 0x944B783: HistogramAdd (Telemetry.cpp:3779)
==6228==    by 0x944B783: mozilla::Telemetry::Accumulate(mozilla::Telemetry::ID, unsigned int) (Telemetry.cpp:3777)
==6228==    by 0x9441C38: nsAppStartup::TrackStartupCrashBegin(bool*) (nsAppStartup.cpp:887)
==6228==    by 0x949BDDE: nsXREDirProvider::DoStartup() (nsXREDirProvider.cpp:845)
==6228==    by 0x9495419: XREMain::XRE_mainRun() (nsAppRunner.cpp:4154)
==6228==    by 0x94960E3: XREMain::XRE_main(int, char**, nsXREAppData const*) (nsAppRunner.cpp:4381)
==6228==    by 0x94963D0: XRE_main (nsAppRunner.cpp:4483)
==6228==    by 0x404A74: do_main(int, char**, nsIFile*) (nsBrowserApp.cpp:212)
==6228==  Address 0x22ed8480 is 0 bytes inside a block of size 27 alloc'd
==6228==    at 0x4C2A2FF: operator new(unsigned long) (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==6228==    by 0x94466C6: void std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_construct<char*>(char*, char*, std::forward_iterator_tag) [clone .isra.248] (basic_string.tcc:219)
==6228==    by 0x94495E4: _M_construct_aux<char const*> (basic_string.h:196)
==6228==    by 0x94495E4: _M_construct<char const*> (basic_string.h:215)
==6228==    by 0x94495E4: basic_string (basic_string.h:457)
==6228==    by 0x94495E4: (anonymous namespace)::HistogramGet(char const*, char const*, unsigned int, unsigned int, unsigned int, unsigned int, bool, base::Histogram**) (Telemetry.cpp:985)
==6228==    by 0x944991C: (anonymous namespace)::CloneHistogram(nsACString_internal const&, mozilla::Telemetry::ID, base::Histogram&) (Telemetry.cpp:1057)
==6228==    by 0x9449AE1: (anonymous namespace)::GetSubsessionHistogram(base::Histogram&) (Telemetry.cpp:1110)
==6228==    by 0x9449B0F: (anonymous namespace)::HistogramAdd(base::Histogram&, int, unsigned int) [clone .part.282] (Telemetry.cpp:1124)
==6228==    by 0x944B783: HistogramAdd (Telemetry.cpp:3779)
==6228==    by 0x944B783: mozilla::Telemetry::Accumulate(mozilla::Telemetry::ID, unsigned int) (Telemetry.cpp:3777)
==6228==    by 0x9441C38: nsAppStartup::TrackStartupCrashBegin(bool*) (nsAppStartup.cpp:887)
==6228==    by 0x949BDDE: nsXREDirProvider::DoStartup() (nsXREDirProvider.cpp:845)
==6228==    by 0x9495419: XREMain::XRE_mainRun() (nsAppRunner.cpp:4154)
==6228==    by 0x94960E3: XREMain::XRE_main(int, char**, nsXREAppData const*) (nsAppRunner.cpp:4381)
==6228==    by 0x94963D0: XRE_main (nsAppRunner.cpp:4483)
==6228== 
==6228== Jump to the invalid address stated on the next line
==6228==    at 0x0: ???
==6228==    by 0x70563CA: nsComponentManagerImpl::CreateInstanceByContractID(char const*, nsISupports*, nsID const&, void**) [clone .part.40] (nsComponentManager.cpp:1223)
==6228==    by 0x708702A: CallCreateInstance (nsComponentManagerUtils.cpp:151)
==6228==    by 0x708702A: nsCreateInstanceByContractID::operator()(nsID const&, void**) const (nsComponentManagerUtils.cpp:197)
==6228==    by 0x707EF17: nsCOMPtr_base::assign_from_helper(nsCOMPtr_helper const&, nsID const&) (nsCOMPtr.cpp:125)
==6228==    by 0x70E4A16: nsCOMPtr (nsCOMPtr.h:559)
==6228==    by 0x70E4A16: NS_NewSafeLocalFileOutputStream(nsIOutputStream**, nsIFile*, int, int, int) (nsNetUtil.cpp:1012)
==6228==    by 0x708E0F2: mozilla::Preferences::WritePrefFile(nsIFile*) (Preferences.cpp:950)
==6228==    by 0x9441CD5: nsAppStartup::TrackStartupCrashBegin(bool*) (nsAppStartup.cpp:911)
==6228==    by 0x949BDDE: nsXREDirProvider::DoStartup() (nsXREDirProvider.cpp:845)
==6228==    by 0x9495419: XREMain::XRE_mainRun() (nsAppRunner.cpp:4154)
==6228==    by 0x94960E3: XREMain::XRE_main(int, char**, nsXREAppData const*) (nsAppRunner.cpp:4381)
==6228==    by 0x94963D0: XRE_main (nsAppRunner.cpp:4483)
==6228==    by 0x404A74: do_main(int, char**, nsIFile*) (nsBrowserApp.cpp:212)
==6228==  Address 0x0 is not stack'd, malloc'd or (recently) free'd

Thanks,
Martin
Attached file mozconfig
Attached file valgrind log file
My mozconfig utilizes -O2 optimization level, however if a add -fsanitize=undefined, I can start the application. The sanitizer output is one of attachments.
If anyone considers that the component is not the right one, please change it to a more appropriate one.
Component: Untriaged → Build Config
Product: Firefox → Core
Well, the problem is probably caused by -flifetime-dse, where starting from GCC 6.0, the compiler produces a clobber for every constructor. Thus, every memory store that precedes that is removed.

Quite common situation of the described optimization is usage of memset for a memory that is after that passed to a constructor. I'll attach hint of places that are affected by the optimization.

Martin
I wonder which build flags do you use because I don't see such issue on Fedora 24 (except the JIT ones).
gcc -v:
gcc version 6.0.0 20160215 (experimental) (GCC) 

The more aggressive optimization (constructor clobbering with -flifetime-dse) has been added to GCC 6.
The problem occurs with -O2 and -flto.

Trevor is looking on the issue.

Martin
This is what building with GCC 6.1 looks on automation:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=20df2b5901930713f5a14504121545ce235da18f

It's not pretty.

This is the tooltool manifest change required to use GCC 6.1 on automation for anyone who would want to try it some more:
https://hg.mozilla.org/try/diff/20df2b590193/browser/config/tooltool-manifests/linux64/releng.manifest

I didn't try to disable lifetime DSE yet.
It looks equally bad with -fno-lifetime-dse -fno-delete-null-pointer-checks:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=86d7709526d4ddacaac38fd3b31cbdb0b900b2a6
It seems likely to me that the more aggressive DCE, as described at
https://gcc.gnu.org/gcc-6/porting_to.html, is interacting badly with
our NS_DECL_AND_IMPL_ZEROING_OPERATOR_NEW and NS_IMPL_ZEROING_OPERATOR_NEW
as they look very similar to what is described in that web page.
(In reply to Julian Seward [:jseward] from comment #12)
> It seems likely to me that the more aggressive DCE, as described at
> https://gcc.gnu.org/gcc-6/porting_to.html, is interacting badly with
> our NS_DECL_AND_IMPL_ZEROING_OPERATOR_NEW and NS_IMPL_ZEROING_OPERATOR_NEW
> as they look very similar to what is described in that web page.

Yep, actually I spent quite some time debugging -flto -O2 build of Firefox.
However passing -flifetime-dse=1 fixes the build (this option places clobber just in dtors).

Martin
(In reply to Martin Liška from comment #13)
> However passing -flifetime-dse=1 fixes the build (this option places clobber
> just in dtors).

How did you test it?  I ask because this seems to be in conflict with Mike's
observations in comment 11.
(In reply to Julian Seward [:jseward] from comment #14)
> (In reply to Martin Liška from comment #13)
> > However passing -flifetime-dse=1 fixes the build (this option places clobber
> > just in dtors).
> 
> How did you test it?  I ask because this seems to be in conflict with Mike's
> observations in comment 11.

Well, I've just build it with LTO and tried that the assembled binary runs and is able to render couple of web pages. I didn't try to run any Firefox tests.
Attached patch bug1232696-c16-hack.diff (obsolete) — Splinter Review
Hacky patch, that presumably defeats DSE in
NS_DECL_AND_IMPL_ZEROING_OPERATOR_NEW and NS_IMPL_ZEROING_OPERATOR_NEW.
With this in place, a plain gcc-6 -O2 build can start up and browse
a couple of pages, without any -flifetime-dse-* or 
-fno-delete-null-pointer-checks flags.

So how do we fix (uses of) the above two macros? 
https://gcc.gnu.org/gcc-6/porting_to.html implies that they are
essentially "undefined behaviour" when used in conjunction with
placement new.
(In reply to Julian Seward [:jseward] from comment #16)
> Created attachment 8747028 [details] [diff] [review]
> bug1232696-c16-hack.diff
> 
> Hacky patch, that presumably defeats DSE in
> NS_DECL_AND_IMPL_ZEROING_OPERATOR_NEW and NS_IMPL_ZEROING_OPERATOR_NEW.
> With this in place, a plain gcc-6 -O2 build can start up and browse
> a couple of pages, without any -flifetime-dse-* or 
> -fno-delete-null-pointer-checks flags.
> 
> So how do we fix (uses of) the above two macros? 
> https://gcc.gnu.org/gcc-6/porting_to.html implies that they are
> essentially "undefined behaviour" when used in conjunction with
> placement new.


iirc froydnj filed a bug to remove them a while ago so we should probably just do that? the one tricky part is the classes that use them have tons of members so making sure you got them all in the ctor is kind of annoying.
(In reply to Trevor Saunders (:tbsaunde) from comment #17)
> iirc froydnj filed a bug to remove them a while ago so we should probably
> just do that? the one tricky part is the classes that use them have tons of
> members so making sure you got them all in the ctor is kind of annoying.

Trevor, do you have a bug number for that?  Getting them all in the ctor
is not a big problem because valgrind can tell us if we're using a member
that is uninitialised.  So we're pretty safe there.
(In reply to Julian Seward [:jseward] from comment #18)
> (In reply to Trevor Saunders (:tbsaunde) from comment #17)
> > iirc froydnj filed a bug to remove them a while ago so we should probably
> > just do that? the one tricky part is the classes that use them have tons of
> > members so making sure you got them all in the ctor is kind of annoying.
> 
> Trevor, do you have a bug number for that?  Getting them all in the ctor
> is not a big problem because valgrind can tell us if we're using a member
> that is uninitialised.  So we're pretty safe there.

Sorry, I can't find it now.  Yeah, its certainly doable, its just annoying enough I hadn't gotten to it yet, sorry.
QA Contact: jseward
Assignee: nobody → jseward
QA Contact: jseward
WIP patch.  Partly removes uses of NS_DECL_AND_IMPL_ZEROING_OPERATOR_NEW
and so makes it at least possible to start the browser using a gcc 6 build
at -O2, without having to use any optimisation-disabling flags (-f*).
Attachment #8747028 - Attachment is obsolete: true
Summary: Firefox using GCC 6 crashes during start up → Remove NS_DECL_AND_IMPL_ZEROING_OPERATOR_NEW as it causes segfaulting for GCC 6 builds
We need to get rid of NS_DECL_AND_IMPL_ZEROING_OPERATOR_NEW as it interacts
badly with GCC 6's Lifetime-DSE optimization.  The implication from
https://gcc.gnu.org/gcc-6/porting_to.html, assuming that GCC is correct
about this, is that NS_DECL_AND_IMPL_ZEROING_OPERATOR_NEW's memset-ting of
the allocated memory to zero is undefined behaviour and therefore invalid C++.

Before I invest any more time in this, I'd like to get opinions on the approach
shown in comment 21.  Basically, my strategy is to find classes marked with
NS_DECL_AND_IMPL_ZEROING_OPERATOR_NEW, add explicit initialisations instead in
the constructor(s) of the class (usually there's only one), and then verify that
I didn't miss anything, by running the resulting build through mochitests whilst
Valgrind/Memcheck watches for undefined value uses that weren't there previously.
Nathan, do you have any views on this?
Flags: needinfo?(nfroyd)
Note that you don't need explicit (zero) values. That said, that kind of sucks, because it's very easy to forget one, but that's already the case for plenty of classes...

Also note that in C++11 another way to do this is to add a data member initializer, like:
struct Foo {
  Foo() {}
  int a {};
};
Although, IIRC, the latter doesn't work in MSVC2013.
(In reply to Mike Hommey [:glandium] from comment #25)
> Although, IIRC, the latter doesn't work in MSVC2013.

https://developer.mozilla.org/en-US/docs/Using_CXX_in_Mozilla_code says it's supported.
(In reply to Julian Seward [:jseward] from comment #23)
> Nathan, do you have any views on this?

The approach seems reasonable enough (it's hard to think of other ones, though maybe something like:

struct ZeroConstruct
{
  ZeroConstruct(void* p, size_t derivedSize) {
    memset(p, 0, derivedSize);
  }
};

class Foo : public ZeroConstruct
{
  Foo() : ZeroConstruct(this, sizeof(*this)) {}
  ...
};

would work?  Guess that falls apart pretty hard for non-POD classes.)

I'd very much like to understand where in the standard this optimization is derived from.
Flags: needinfo?(nfroyd)
It seems that only two uses of NS_IMPL_ZEROING_OPERATOR_NEW are actually
causing the gcc6 segfaulting, and they are easy to remove.  However, removing
all uses is quite some work given that some of the classes involved have
dozens if not hundreds of member variables.

Because of that, I'm going to split this into a new minimal-fix bug
that will get gcc6 -O2 builds working again quickly, and retain this
bug to deal with removal of the apparently-not-critical other uses.
Depends on: 1312344
Blocks: 1316555
Attached patch bug1232696-3-WIP.cset (obsolete) — Splinter Review
WIP patch.  I decided to jump in at the deep end.  I commented out the
memsets in NS_DECL_{,AND_IMPL_}ZEROING_OPERATOR_NEW and then started to
fix up constructors in the order in which valgrind first reports
uninitialised value uses.  This patch fixes up about ten classes.  The
browser still crashes before producing any window.  I estimate that it
runs about a billion insns now before getting into trouble.  There might
be around another 15 classes to fix up after this.
Attachment #8756249 - Attachment is obsolete: true
Attached patch bug1232696-9.cset (obsolete) — Splinter Review
WIP patch.  Needs further testing, but appears to work and
is close to being ready for review.
Attachment #8812719 - Attachment is obsolete: true
Come to think of it, this seems like a big footgun, and I think we should keep some annotation on those classes, and add static analysis that ensures all fields in those classes are initialized.
(In reply to Mike Hommey [:glandium] from comment #32)
> Come to think of it, this seems like a big footgun, and I think we should
> keep some annotation on those classes, and add static analysis that ensures
> all fields in those classes are initialized.

We are working on a static analysis for this for all class members.  This is a *huge* project, but I agree that we should fix it.  I'll CC you guys on the relevant bug.  If others want to be involved as well, please let me know.

(FWIW I agree that we should just delete these macros and fix our constructors instead.)
Attached patch bug1232696-13-1view.diff (obsolete) — Splinter Review
Fixes for the view/ subtree.
Attachment #8821364 - Attachment is obsolete: true
Attached patch bug1232696-13-2dom.diff (obsolete) — Splinter Review
Fixes for the dom/ subtree.
Attached patch bug1232696-13-3layout.diff (obsolete) — Splinter Review
Fixes for the layout/ subtree.
Attached patch bug1232696-13-4parser.diff (obsolete) — Splinter Review
Fixes for the parser/ subtree.
Attached patch bug1232696-13-5docshell.diff (obsolete) — Splinter Review
Fixes for the docshell/ subtree.  Also contains the hunks that
remove the definitions of the offending macros.
Attachment #8828321 - Flags: review?(tnikkel)
Attachment #8828322 - Flags: review?(bkelly)
Attachment #8828323 - Flags: review?(dholbert)
Attachment #8828325 - Flags: review?(hsivonen)
Attachment #8828326 - Flags: review?(bzbarsky)
Comment on attachment 8828326 [details] [diff] [review]
bug1232696-13-5docshell.diff

Looks to me like you need to initialize mProgressStateFlags to 0 in the nsDocLoader constructor as well, right?

r=me with that
Attachment #8828326 - Flags: review?(bzbarsky) → review+
Attachment #8828321 - Flags: review?(tnikkel) → review+
Comment on attachment 8828323 [details] [diff] [review]
bug1232696-13-3layout.diff

Review of attachment 8828323 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for sorting through all these!  This is close, but I've got a few comments that merit another round of review, so r- for the moment.

first, needs a commit message, obviously :) I assume you'll add one before landing.

::: layout/base/PresShell.cpp
@@ +780,5 @@
> +    , mDrawEventTargetFrame(nullptr)
> +#endif
> +    , mPaintCount(0)
> +    , mWeakFrames(nullptr)
> +    , mCanvasBackgroundColor(0)

nit: this is a nscolor (an integer-backed representation of a color), so assigning it to a raw integer value is semi-bogus.

Let's use NS_RGBA(0,0,0,0) instead of 0 here.  (Same thing as 0 under the hood, but just expressed as a nscolor-flavored expression.)

@@ +822,5 @@
> +  , mMouseLocation(NS_UNCONSTRAINEDSIZE, NS_UNCONSTRAINEDSIZE)
> +  , mCurrentEventFrame(nullptr)
> +  , mFirstCallbackEventRequest(nullptr)
> +  , mLastCallbackEventRequest(nullptr)
> +  , mLastAnchorScrollPositionY(0)

I think you missed a variable here -- you need to add this between mLastCallbackEventRequest and mLastAnchorScrollPositionY:
  , mLastReflowStart(0.0)

That variable has type DOMHighResTimeStamp, which is just a typedef for double:
https://dxr.mozilla.org/mozilla-central/rev/96cb95af530477edb66ae48d98c18533476e57bb/dom/base/nsDOMNavigationTiming.h#17

::: layout/base/nsDocumentViewer.cpp
@@ +524,5 @@
> +    mClosingWhilePrinting(0),
> +#if NS_PRINT_PREVIEW
> +    mPrintPreviewZoomed(0),
> +    mPrintIsPending(0),
> +    mPrintDocIsFullyLoaded(0),

This bunch of variables -- mStopped through mPrintDocIsFullyLoaded -- are really meant to be boolean flags, not unsigned values (despite the fact that they're of type "unsigned :1").

So, they should be initialized to "false" rather than 0, for better readability (and for consistency with how we already initialize "mIsSticky" to "true" rather than "1").

(Also, we should convert them to have type 'bool' - they're likely 'unsigned' simply because they predate our usage of 'bool' in Gecko. I filed bug 1332454 on that.)

::: layout/base/nsIPresShell.h
@@ +195,5 @@
>  
>  public:
> +  nsIPresShell();
> +
> +public:

Drop this "public" line -- it's redundant. (This is a "public" section already -- that's declared 3 lines above this.)

::: layout/base/nsPresContext.cpp
@@ +225,5 @@
> +    mActiveLinkColor(0),
> +    mVisitedLinkColor(0),
> +    mFocusBackgroundColor(0),
> +    mFocusTextColor(0),
> +    mBodyTextColor(0),

As noted above, let's use NS_RGBA(0, 0, 0, 0) instead of just "0" for all of these mXYZColor variables (mDefaultColor through mBodyTextColor), so that we're using the correct type [disregarding the fact that there's technically a typedef].

@@ +235,5 @@
> +    mInterruptChecksToSkip(0),
> +    mElementsRestyled(0),
> +    mFramesConstructed(0),
> +    mFramesReflowed(0),
> +    mHasPendingInterrupt(0),

Two nits:
(1) You missed one variable here -- "bool mInteractionTimeEnabled" (should go between these last 2 variables and be initialized to false).

(2) Most of the variables after this point (starting with mHasPendingInterrupt) are really declared as "unsigned mFoo : 1" which represent boolean state (and in practice will be treated as booleans and set to true & false).  So we should init them to "false" rather than to "0".  (And ultimately we should convert their type to bool, too - I've noted that in my followup bug 1332454.)

Just one exception to that -- mPrefScrollbarSide, which is declared with " : 2" instead of ": 1" and which is used as a 2-bit integer rather than as a boolean.

@@ +298,5 @@
>  
>    SetBackgroundImageDraw(true);		// always draw the background
>    SetBackgroundColorDraw(true);
>  
>    mBackgroundColor = NS_RGB(0xFF, 0xFF, 0xFF);

The body of this constructor has a *bunch* of unconditional assignments, which look quite odd in this new world of Having-An-Init-List.  They make the init list's initializations useless & misleading. (E.g. in the init list, we initialize mDoScaledTwips to 0/false, and then set it to true. We initialize mBackgroundColor to transparent black, and then set it to NS_RGB() represetning opaque white)

So, we shouldn't leave the code in this state -- as much as possible, we should clear out this constructor and move its assignments into the init list. That could happen as part of this patch or in a followup -- whichever you prefer.
Attachment #8828323 - Flags: review?(dholbert) → review-
Comment on attachment 8828321 [details] [diff] [review]
bug1232696-13-1view.diff

Review of attachment 8828321 [details] [diff] [review]:
-----------------------------------------------------------------

::: view/nsViewManager.cpp
@@ +68,2 @@
>  {
>    mRootViewManager = this;

As noted for the layout patch, you should flush the assignments in this constructor up into the init list. Specifically:
 (1) "mRootViewManager = this;" -- just update the init list to have mRootViewManager(this) instead of nullptr, and get rid of this assignment.

 (2) mHasPendingWidgetGeometryChanges/ mRecursiveRefreshPending = false (at the bottom of this constructor) -- these lines can just be removed since they're redundant (and were redundant before this patch)
Comment on attachment 8828322 [details] [diff] [review]
bug1232696-13-2dom.diff

Review of attachment 8828322 [details] [diff] [review]:
-----------------------------------------------------------------

Drive-by note on the DOM patch:

::: dom/xml/XMLDocument.cpp
@@ +238,3 @@
>  {
>    // NOTE! nsDocument::operator new() zeroes out all members, so don't
>    // bother initializing members to 0.

This comment needs to be removed (you're removing it elsewhere but missed this one).  Probably worth doing a grep for "new.*zero" after applying your patch stack to be sure there aren't any others that you missed.
Comment on attachment 8828325 [details] [diff] [review]
bug1232696-13-4parser.diff

Review of attachment 8828325 [details] [diff] [review]:
-----------------------------------------------------------------

::: parser/html/nsHtml5StreamParser.cpp
@@ +150,5 @@
>                                           nsHtml5Parser* aOwner,
>                                           eParserMode aMode)
> +  : mSniffingLength(0)
> +  , mBomState(eBomState::BOM_SNIFFING_NOT_STARTED)
> +  , mCharsetSource(0)

Please use kCharsetUninitialized here.
Attachment #8828325 - Flags: review?(hsivonen) → review+
Comment on attachment 8828322 [details] [diff] [review]
bug1232696-13-2dom.diff

Review of attachment 8828322 [details] [diff] [review]:
-----------------------------------------------------------------

I think just a couple things missed.  The nsIDocument and nsDocument classes are really hard to review, though.  I would feel better about this if we had static analysis checking for uninitialized members.

::: dom/base/nsDOMMutationObserver.h
@@ -368,5 @@
>    }
>  
>    void Disconnect(bool aRemoveFromObserver);
>  
> -  NS_DECL_AND_IMPL_ZEROING_OPERATOR_NEW

I think the nsMutationReceiverBase members need to be zero'd.  See:

https://dxr.mozilla.org/mozilla-central/source/dom/base/nsDOMMutationObserver.h#307

::: dom/base/nsDocument.cpp
@@ +1298,2 @@
>      mVisible(true),
>      mRemovedFromDocShell(false),

Seems to be missing mHasReferrerPolicyCSP here.
Attachment #8828322 - Flags: review?(bkelly) → review-
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
> The nsIDocument and nsDocument classes are really hard to review, though.

Note that you have to review all subclasses of them too, right?
(In reply to Boris Zbarsky [:bz] (still a bit busy) from comment #46)
> Note that you have to review all subclasses of them too, right?

Yep.
Blocks: 1333094
Attachment #8828321 - Attachment is obsolete: true
Attachment #8828322 - Attachment is obsolete: true
Attachment #8828325 - Attachment is obsolete: true
Attachment #8828326 - Attachment is obsolete: true
Attachment #8828323 - Attachment is obsolete: true
dholbert, bkelly, bz, hsivonen, tnikkel: thank you for the quick reviews.
I know it's fiddly and tedious stuff.

Here is a new patch set, bug1232696-14-{1view,2dom,3layout,4parser,5docshell}
that fixes all of the review comments.  Only one point of note: regarding
nsPresContext::nsPresContext(nsIDocument* aDocument, nsPresContextType aType),
dholbert writes:

> The body of this constructor has a *bunch* of unconditional
> assignments, which look quite odd in this new world of
> Having-An-Init-List. [..]
> So, we shouldn't leave the code in this state -- as much as possible,
> we should clear out this constructor and move its assignments into the
> init list. [..]

I filed a followup, bug 1333094, for this.  Simply removing the
zeroing macros without breaking anything is difficult enough and I
don't want to add to the validation difficulty by converting
assignments to initialisations in the same patch.
Attachment #8829483 - Flags: review?(dholbert)
Attachment #8829483 - Flags: review?(dholbert) → review?(bkelly)
Attachment #8829484 - Flags: review?(dholbert)
Comment on attachment 8829484 [details] [diff] [review]
bug1232696-14-3layout.cset

Review of attachment 8829484 [details] [diff] [review]:
-----------------------------------------------------------------

r=me
Attachment #8829484 - Flags: review?(dholbert) → review+
Comment on attachment 8829483 [details] [diff] [review]
bug1232696-14-2dom.cset

Review of attachment 8829483 [details] [diff] [review]:
-----------------------------------------------------------------

r=me, although I would feel better if we could at least spot check this with the unlanded patches that perform static analysis uninitialized members.
Attachment #8829483 - Flags: review?(bkelly) → review+
(In reply to Ben Kelly [:bkelly] from comment #55)
> r=me, although I would feel better if we could at least spot check this with
> the unlanded patches that perform static analysis uninitialized members.

Ben, do you have a pointer to the set of patches involved?
Ehsan made reference to this in comment 33 but I'm clear what the
specifics are.
Flags: needinfo?(bkelly)
We spoke on IRC about this.
Flags: needinfo?(bkelly)
Pushed by jseward@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/a9f258d04248
Remove NS_DECL_AND_IMPL_ZEROING_OPERATOR_NEW as it causes segfaulting for GCC 6 builds (1 of 5, fixes for view/).  r=tnikkel.
https://hg.mozilla.org/integration/mozilla-inbound/rev/572be1a80312
Remove NS_DECL_AND_IMPL_ZEROING_OPERATOR_NEW as it causes segfaulting for GCC 6 builds (2 of 5, fixes for dom/).  r=bkelly.
https://hg.mozilla.org/integration/mozilla-inbound/rev/12d3b462f7dc
Remove NS_DECL_AND_IMPL_ZEROING_OPERATOR_NEW as it causes segfaulting for GCC 6 builds (3 of 5, fixes for layout/).  r=dholbert.
https://hg.mozilla.org/integration/mozilla-inbound/rev/382a4dd6b1f8
Remove NS_DECL_AND_IMPL_ZEROING_OPERATOR_NEW as it causes segfaulting for GCC 6 builds (4 of 5, fixes for parser/).  r=hsivonen.
https://hg.mozilla.org/integration/mozilla-inbound/rev/5ad27da8fe35
Remove NS_DECL_AND_IMPL_ZEROING_OPERATOR_NEW as it causes segfaulting for GCC 6 builds (5 of 5, fixes for docshell/).  r=bzbarsky.
Comment on attachment 8829483 [details] [diff] [review]
bug1232696-14-2dom.cset

Review of attachment 8829483 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/base/nsDocument.cpp
@@ +1337,3 @@
>      mCompatMode(eCompatibility_FullStandards),
> +    mReadyState(ReadyState::READYSTATE_UNINITIALIZED),
> +    mStyleBackendType(mozilla::StyleBackendType::Gecko),

Ah... this one effectively disables stylo universally. I'll work on a patch to fix it.
Depends on: 1334938
Product: Core → Firefox Build System
You need to log in before you can comment on or make changes to this bug.