Last Comment Bug 709193 - Win PGO builds hitting 3GB virtual address space limit, failing with: "nswindowmediator.cpp(821) : fatal error C1001: An internal error has occurred in the compiler. LINK : fatal error LNK1000: Internal error during IMAGE::BuildImage"
: Win PGO builds hitting 3GB virtual address space limit, failing with: "nswind...
Status: RESOLVED FIXED
[see comment 36 & comment 62]
:
Product: Firefox
Classification: Client Software
Component: Build Config (show other bugs)
: Trunk
: x86 Windows Server 2003
: -- critical with 5 votes (vote)
: ---
Assigned To: Kyle Huey [:khuey] (Exited; not receiving bugmail, email if necessary)
:
Mentors:
Depends on: 474043 deadcode 543034 msvc2010 679352 709192 PGOSilverBullet 709657 709721 710246 710473 713169
Blocks: 609976 648407 750661
  Show dependency treegraph
 
Reported: 2011-12-09 10:48 PST by Phil Ringnalda (:philor)
Modified: 2013-09-29 12:41 PDT (History)
71 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments

Description Phil Ringnalda (:philor) 2011-12-09 10:48:52 PST
Twice I've hounded patches that seemed to be the cause out of the tree, only to have them demonstrate that either another build while they were still in, or a try build, showed that they were not the cause, and then reland.

I don't have any idea what is the cause, or how to determine it, but inbound is currently totally hosed and unable to merge as a result.

https://tbpl.mozilla.org/php/getParsedLog.php?id=7856152&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=7856634&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=7853995&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=7856443&tree=Mozilla-Inbound
Comment 1 Frank Yan (:fryn) 2011-12-09 14:17:55 PST
UX branch's Windows nightly builds are also exhibiting this error:
https://tbpl.mozilla.org/php/getParsedLog.php?id=7837809&tree=UX
https://tbpl.mozilla.org/php/getParsedLog.php?id=7854869&tree=UX
Comment 2 Marco Bonardo [::mak] 2011-12-09 14:21:39 PST
we believe the cause may be
59c363453713
Ehsan Akhgari – Bug 703444. Port SPS profiler to Windows. r=jmuizelaar

Waiting for some confirmation then someone will backout.
Comment 3 Marco Bonardo [::mak] 2011-12-09 14:23:56 PST
though the fact UX has the same error, but doesn't have that patch, makes things foggy.
Comment 4 Matt Brubeck (:mbrubeck) 2011-12-09 14:27:56 PST
There's an even earlier occurence on UX, from the Wednesday December 7 nightly:
https://tbpl.mozilla.org/php/getParsedLog.php?id=7797830&tree=UX
Comment 5 Marco Bonardo [::mak] 2011-12-09 14:50:03 PST
Khuey suggested taking a look at bug 543034. That's interesting since the symptoms are similar here:
- we have a failure across different trees
- different patches seem to be able to cause the same failure
- we have a green in the middle of a lot of reds, and that's a backout
- the push immediately after the green adds code, and it's red

So it's possible we are on a boundary, and any new code causes red, while the backout went green because it actually removed code, bringing us below the limit.
Comment 6 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2011-12-09 14:59:25 PST
Seems likely.

If the Windows builders are updated to Win64, then we can probably get 4GB of address space for the linker instead of 3GB. That might fix this, for a while.
Comment 7 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2011-12-09 15:01:48 PST
BTW you could test this theory by disabling some random component (e.g., --disable-webm) and see if it fixes the build bustage.
Comment 8 Kyle Huey [:khuey] (Exited; not receiving bugmail, email if necessary) 2011-12-09 15:18:08 PST
Our current theory is that we're running out of virtual address space in the linker (again :-/).

We're already operating the PGO builds in a configuration that gives them 3 GB of virtual address space.  If this is what's going wrong our only options are to split libxul apart or to move to doing 32 builds on 64 bit windows builders.
Comment 9 Richard Newman [:rnewman] 2011-12-09 15:40:58 PST
Watching this to keep on top of s-c merges.
Comment 10 John Ford [:jhford] CET/CEST Berlin Time 2011-12-09 15:47:24 PST
Nothing has changed on our windows build machines in the last day and mozilla-central is still green.  This looks like its a code issue with something on mozilla-inbound.
Comment 11 Bill Gianopoulos [:WG9s] 2011-12-09 16:00:07 PST
Oddly, today we have a green in the midst of reds that was NOT a backout.
Comment 12 Marco Bonardo [::mak] 2011-12-09 16:03:32 PST
(In reply to Bill Gianopoulos from comment #11)
> Oddly, today we have a green in the midst of reds that was NOT a backout.

yes PGO can cause any sort of randomness.

btw, this happens on UX branch at least from 7th, and that code is already in central.
Comment 13 Marco Bonardo [::mak] 2011-12-09 16:04:10 PST
Or better, UX is based on the same code that is green in central
Comment 14 Ed Morley [:emorley] 2011-12-09 16:23:18 PST
To hopefully confirm the theory in comment 5, I've pushed the following to try:

* inbound tip + MOZ_PGO=1 (to confirm it does fail as a clobber on try):
  https://tbpl.mozilla.org/?tree=Try&rev=e2ca2498c427

* inbound tip + MOZ_PGO=1 + disable accessibility/webm:
  https://tbpl.mozilla.org/?tree=Try&rev=3c0d1907b02e

* inbound tip + MOZ_PGO=1 + msvc2010:
  https://tbpl.mozilla.org/?tree=Try&rev=e8c57013b187

* inbound tip + MOZ_PGO=1 but with ehsan's SPS patch backed out (bug 703444):
  https://tbpl.mozilla.org/?tree=Try&rev=bde21781ca39

* m-c tip + MOZ_PGO=1 + ehsan's SPS patch applied:
  https://tbpl.mozilla.org/?tree=Try&rev=d62a63cb667e
Comment 15 Mike Hommey [:glandium] 2011-12-09 23:44:55 PST
Interesting results.
 
> * inbound tip + MOZ_PGO=1 (to confirm it does fail as a clobber on try):
>   https://tbpl.mozilla.org/?tree=Try&rev=e2ca2498c427

2 greens out of 6 builds.

> * inbound tip + MOZ_PGO=1 + disable accessibility/webm:
>   https://tbpl.mozilla.org/?tree=Try&rev=3c0d1907b02e

All green
 
> * inbound tip + MOZ_PGO=1 + msvc2010:
>   https://tbpl.mozilla.org/?tree=Try&rev=e8c57013b187

All green
 
> * inbound tip + MOZ_PGO=1 but with ehsan's SPS patch backed out (bug 703444):
>   https://tbpl.mozilla.org/?tree=Try&rev=bde21781ca39

One green out of 6 builds

> * m-c tip + MOZ_PGO=1 + ehsan's SPS patch applied:
>   https://tbpl.mozilla.org/?tree=Try&rev=d62a63cb667e

4 greens, 2 still running.

This seems to indicate two things:
- code size is a factor, but randomness makes it work sometimes.
- MSVC2010 uses less memory.
Comment 16 Mike Hommey [:glandium] 2011-12-10 00:32:32 PST
(In reply to Mike Hommey [:glandium] from comment #15)
> > * m-c tip + MOZ_PGO=1 + ehsan's SPS patch applied:
> >   https://tbpl.mozilla.org/?tree=Try&rev=d62a63cb667e
> 
> 4 greens, 2 still running.

Err, 4 reds (and now one green).
Comment 17 Mike Hommey [:glandium] 2011-12-10 03:32:03 PST
Two other strategies:

- Building intermediate libraries (in case the linker behaves better). Note this may fail to build completely for other reasons, but that should be fixable.
https://tbpl.mozilla.org/?tree=Try&rev=ed91f23b4488

- Building without SPDY (backout of all changesets listed by hg log -k SPDY)
https://tbpl.mozilla.org/?tree=Try&rev=60c146c38886
Comment 18 Marco Bonardo [::mak] 2011-12-10 03:34:58 PST
So, question for RelEng, can we do PGO builds on Win64 boxes per the suggestion on comment 6? Would not be the final fix but would give us some breath with 1 additional GB of virtual space
Comment 19 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2011-12-10 04:06:36 PST
Going from 2GB to 3GB bought us nearly two years. Hopefully 3GB to 4GB would buy us another couple of years. VS2010 helps too apparently so that might give us more. Maybe by then we'll be more focused on 64bit builds and can disable PGO on 32-bit...
Comment 20 Kyle Huey [:khuey] (Exited; not receiving bugmail, email if necessary) 2011-12-10 04:29:14 PST
(In reply to Mike Hommey [:glandium] from comment #17)
> Two other strategies:
> 
> - Building intermediate libraries (in case the linker behaves better). Note
> this may fail to build completely for other reasons, but that should be
> fixable.
> https://tbpl.mozilla.org/?tree=Try&rev=ed91f23b4488

I distinctly remember being up till 3 am one time getting rid of intermediate libraries so we could fix an address space exhaustion problem.

> - Building without SPDY (backout of all changesets listed by hg log -k SPDY)
> https://tbpl.mozilla.org/?tree=Try&rev=60c146c38886

This is basically "if we get rid of a big chunk of code do things start working again" test, right?
Comment 21 Mike Hommey [:glandium] 2011-12-10 05:05:13 PST
Another one: disabling graphite:
https://tbpl.mozilla.org/?tree=Try&rev=9d45b2ffa7b6
Comment 22 Mike Hommey [:glandium] 2011-12-10 05:07:58 PST
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #19)
> Going from 2GB to 3GB bought us nearly two years. Hopefully 3GB to 4GB would
> buy us another couple of years. VS2010 helps too apparently so that might
> give us more. Maybe by then we'll be more focused on 64bit builds and can
> disable PGO on 32-bit...

Unfortunately, this takes time to put in place. I don't think we can wait that long. Which brings us to ...

(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #20)
> > - Building without SPDY (backout of all changesets listed by hg log -k SPDY)
> > https://tbpl.mozilla.org/?tree=Try&rev=60c146c38886
> 
> This is basically "if we get rid of a big chunk of code do things start
> working again" test, right?

If we can buy a few days/weeks by disabling/removing big chunks, I think we should go for it now, until we can do something better.
Comment 23 Emanuel Hoogeveen [:ehoogeveen] 2011-12-10 05:18:35 PST
The other option would be "Disable PGO for Windows builds" until MSVC 2010 is ready, right?
Comment 24 Kyle Huey [:khuey] (Exited; not receiving bugmail, email if necessary) 2011-12-10 05:20:00 PST
(In reply to Mike Hommey [:glandium] from comment #22)
> (In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #19)
> > Going from 2GB to 3GB bought us nearly two years. Hopefully 3GB to 4GB would
> > buy us another couple of years. VS2010 helps too apparently so that might
> > give us more. Maybe by then we'll be more focused on 64bit builds and can
> > disable PGO on 32-bit...
> 
> Unfortunately, this takes time to put in place. I don't think we can wait
> that long. Which brings us to ...

Agreed.

> (In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #20)
> > > - Building without SPDY (backout of all changesets listed by hg log -k SPDY)
> > > https://tbpl.mozilla.org/?tree=Try&rev=60c146c38886
> > 
> > This is basically "if we get rid of a big chunk of code do things start
> > working again" test, right?
> 
> If we can buy a few days/weeks by disabling/removing big chunks, I think we
> should go for it now, until we can do something better.

Agreed.

(In reply to Emanuel Hoogeveen from comment #23)
> The other option would be "Disable PGO for Windows builds" until MSVC 2010
> is ready, right?

Right, the problem is that then we won't be building/testing the configuration we ship to 90+% of our users.
Comment 25 Emanuel Hoogeveen [:ehoogeveen] 2011-12-10 05:48:56 PST
Oh, I see. Yes, lack of testing would be bad (unless it was only for a few days and nowhere near the split to aurora/beta).
Comment 26 Ed Morley [:emorley] 2011-12-10 06:25:12 PST
Shortly after the bustage started, graphite landed too, (72 files changed, 16165 insertions(+), 0 deletions(-)), which would explain why backing out Ehsan's SPS patch from tip on it's own didn't work.

As such I suggest we:

1) Stop building graphite on windows for now (ie: https://hg.mozilla.org/try/rev/8a247ce99d61).
2) Back out Ehsan's windows-only SPS landing (https://hg.mozilla.org/integration/mozilla-inbound/rev/59c363453713).
3) Land the remove libreg patch that has been sitting around for a while (bug 679352).

I've pushed a try run with #1 and #2:
https://tbpl.mozilla.org/?tree=Try&rev=1cb485c4f804
...and #3 is pending an unbitrot, which is underway.

Since the above will only stop the problem momentarily, we need to:
  a) Request that no large libxul landings occur for the rest of this cycle (suboptimal, but our only other option is holding the trees closed for 10 days).
  b) Run PGO builds more often than 4 hours (bug 709192 or maybe a compromise between the two extremes), so we can spot offending landings more quickly.

This will at least allow the trees to reopen and fixes to land, even if large new components can't.

In parallel with the above:
* Have a concerted effort to look for more unused code quick wins (bug 457262 and friends).
* Get the ball rolling on doing 32 builds on 64 bit windows builders (let's file this as another bug).
* Once Firefox 9 is released (now only 10 days) look at the service pack stats that come in for the ~week thereafter, and hope they are able to convince the product team that bug 563318 is viable for Firefox 12.
Comment 27 Ed Morley [:emorley] 2011-12-10 06:40:38 PST
Oh and also:
* Experiment with splitting libxul apart (comment 8).

CC'ing Ehsan and Jonathan, since comment 26 involves backing out/preffing off their landings.
Comment 28 Benoit Girard (:BenWa) 2011-12-10 08:49:41 PST
Ehsan SPS patch isn't particularly big. Is it really worth backing out?
Comment 29 Ed Morley [:emorley] 2011-12-10 08:59:38 PST
Agreed, at this point glandium and myself are looking at backing out SPDY and preffing Graphite off (awaiting try results).
Comment 30 Ed Morley [:emorley] 2011-12-10 10:26:52 PST
Just to keep anyone following along updated:

Preffing graphite off (in addition to backing out Ehsan's patch), wasn't sufficient:
https://tbpl.mozilla.org/?tree=Try&rev=1cb485c4f804

Results of backing out SPDY will be available in <90mins (or earlier, given that the failing builds end ~70mins quicker than those that succeed), tbpl link in comment 17.
Comment 31 Mike Hommey [:glandium] 2011-12-10 10:34:12 PST
FWIW, see bug 563318 comment 20, about msvc 2010.
Comment 32 Ryan VanderMeulen [:RyanVM] 2011-12-10 11:33:01 PST
Also as an FYI, I do my personal builds on Win7 x64 using MSVC2010 (32bit). Building m-c tip, my second pass link.exe Peak Working Set got up to about 1.92GB for xul.dll.

I don't have a fully standard build config, though. Relevant differences are:
--enable-optimize="-O2 -GL -arch:SSE2"
--disable-mochitest
--disable-tests
--disable-accessibility
--disable-crashreporter
--disable-installer
--disable-safe-browsing
--disable-updater

I would assume that the --enable-optimize if anything might increase memory usage. The rest of the disabled wouldn't have much effect on libxul linking since all of the above (except safe browsing) are linked separate from it anyway, yes? Also, is there a better measurement I should be using for address space?
Comment 33 Ed Morley [:emorley] 2011-12-10 13:37:34 PST
Ok so the SPDY backout got 5/6 greens, so helps but needs some of the rest backed out too.

I'm going to (on inbound):
* Backout SPDY
* Stop building Graphite (pending green: https://tbpl.mozilla.org/?tree=Try&rev=14d52fea5abb)
* Land bug 679352 (pending green: https://tbpl.mozilla.org/?tree=Try&rev=e1b435a80f0d)
* Retigger a load of PGO on inbound

...and see what state we're in after that. Unless anyone has any objections?
Comment 35 :Felipe Gomes (needinfo me!) 2011-12-10 19:30:48 PST
To further confirm the theory, I did what the compiler was suggesting to do about disabling optimization on the function where it is crashing. I pushed that change with the inbound tip before the backouts to try: https://tbpl.mozilla.org/?tree=Try&rev=c89cf617aa44

Sure enough, it crashed somewhere else, but this time with the more descriptive error "fatal error C1002: compiler out of heap space on pass 2", the same as bug 543034
Comment 36 Ed Morley [:emorley] 2011-12-10 19:38:35 PST
Summary, since I've been asked by people who've been backed out / we'll need it for the coming days to point at from TBPL:

(Please note I'm not a build peer - I'm speaking with my inbound sheriff hat on; khuey/ted please correct me if I've misunderstood anything).

* Win PGO builds were failing whilst linking, due to running out of virtual address space. This happened a few years ago (bug 543034), and the fix then was to start using the /3GB switch, which upped the limit from 2GB to 3GB. However that's now still not enough.

* As a temporary measure (and as all the other options are medium-long term), a couple of large landings within the last week were backed out, to (a) confirm what the try runs were telling us, and (b) so that the trees could at least be reopened for everything but large libxul landings. After trying a few combinations of backouts on try, we settled on** preffing off Graphite & backing out SPDY (as there is no easy way to disable it from building).

* An already r+'d patch to remove libreg (bug 679352) was landed, to reduce the pressure further.

* This has turned Win PGO green again (https://tbpl.mozilla.org/?tree=Mozilla-Inbound&rev=f5578fdc50ef). However, this only buys us a bit of extra time, and there still cannot be any large libxul landings until this is resolved :-((

* Our options:

  (a) Bug 457262: Kill off more unused code (still only stop-gap, but every little helps until the rest can be done). Ms2ger has spent today going through finding things that can be removed and has filed a load more bugs \o/. The more people we can have doing this, the better. Also, maybe now is a good time to kill RDF (bug 559505 and friends)?

  (b) Start using 64 bit builders for 32 bit windows builds, so the linker can use the full 4GB virtual address space. Non-trivial for releng, they can probably explain the hurdles/timeframe better than I. Have filed bug 709480.

  (c) Bug 563318: Switching to MSVC2010. Seems to use less memory during linking, so avoids the failures for now (comment 15), but no idea for how long without also doing (b). This is also blocked on Firefox 9 being released, since it's the first version that provides the service pack metrics, which the product-team require before making a decision (MSVC2010 will EOL WinXP SP1 and earlier). 

  (d) Experiment with splitting libxul apart (a la comment 8).

  (e) Turning off PGO for Windows builds. I'm presuming this would be too much of a perf loss, and so not viable, but it's been raised on IRC as an alternative to backing out people's work, so including it here for completeness.

** This is obviously really frustrating for the people who've been preffed off/backed out (sorry Patrick & Jonathan :-( ) - and it will obviously be the product team's call to decide what remains backed out. An extra apology to Patrick who got missed off the CC list additions in comment 27, and so wasn't aware of this bug until the backout comment on his SPDY bug.

Anyway, now to get some sleep finally :-)
Comment 37 Jonathan Kew (:jfkthame) 2011-12-11 01:15:36 PST
(In reply to Ed Morley [:edmorley] from comment #36)
> Summary, since I've been asked by people who've been backed out / we'll need
> it for the coming days to point at from TBPL:

Many thanks to Ed and all who've been working to figure out the issue here.... how very frustrating!

One comment:
> * This has turned Win PGO green again
> (https://tbpl.mozilla.org/?tree=Mozilla-Inbound&rev=f5578fdc50ef). However,
> this only buys us a bit of extra time, and there still cannot be any large
> libxul landings until this is resolved :-((
It's not clear what would constitute a "large libxul landing", as opposed to a "medium-sized" or "small" one; but in any case, AFAIK we can't tell exactly how close we are to hitting this problem, and so we don't know how large of a landing would break the build. And it'll break just as surely after a series of tiny landings that add up over a period of days or weeks. A harmless-looking single-line change somewhere could be the straw that breaks the linker's back..... so until we have a real solution, _every_ landing that touches libxul risks hitting this problem again.
Comment 38 Ed Morley [:emorley] 2011-12-11 06:23:42 PST
(In reply to Jonathan Kew (:jfkthame) from comment #37)
> A harmless-looking single-line change somewhere could be the straw that
> breaks the linker's back..... so until we have a real solution, _every_
> landing that touches libxul risks hitting this problem again.

Agreed.

As such, the decision on #developers has been to leave inbound closed (since it has quality-of-service guarantees that we can't make anymore), set mozilla-central to approval required - and for now not allow *any* (Windows) libxul landings that touch c/c++, unless they are code removals.

For the latest version of the tree approval rules, see:
https://etherpad.mozilla.org/sLWdJm0zwB
(corrections/feedback appreciated)
Comment 39 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2011-12-11 13:24:04 PST
Bug 709480 seems to be the only option that will fix the problem without collateral damage.
Comment 40 Marco Castelluccio [:marco] 2011-12-11 15:18:56 PST
Maybe some bugs in these meta bugs could help: bug 105431, bug 651016.
About RDF removal, there is also bug 474043.
Comment 41 Andreas Gal :gal 2011-12-11 17:50:00 PST
I don't think its a good idea to rely on 709480. That sounds a pretty high risk solution.

In parallel we should see if we can shrink libxul and squeeze by until we find a real fix. Lets get work going on removing unnecessary stuff from libxul.
Comment 42 Alfred Kayser 2011-12-12 00:14:39 PST
Bug 230675 - 'decom' of nsICacheVisitor.idl: saves 10% / 150K from nkcache_s.lib
Comment 43 Marco Castelluccio [:marco] 2011-12-12 13:27:49 PST
There's also bug 405407, that is waiting for a review for months.
Comment 44 Ryan VanderMeulen [:RyanVM] 2011-12-12 16:09:50 PST
(In reply to Ryan VanderMeulen from comment #32)
> Also, is there a better measurement I should be using for address space?

Now that I've figured out that the correct measurement is virtual bytes, I've done another build with PerfMon going to track the virtual byte usage. At peak, link.exe had 2.93GB reserved. This is with current m-c tip with all the various backouts/deletions included. In other words, don't count on MSVC2010 to buy you much if anything in terms of solving this issue.
Comment 45 Kyle Huey [:khuey] (Exited; not receiving bugmail, email if necessary) 2011-12-13 05:12:57 PST
(In reply to Ryan VanderMeulen from comment #44)
> (In reply to Ryan VanderMeulen from comment #32)
> > Also, is there a better measurement I should be using for address space?
> 
> Now that I've figured out that the correct measurement is virtual bytes,
> I've done another build with PerfMon going to track the virtual byte usage.
> At peak, link.exe had 2.93GB reserved. This is with current m-c tip with all
> the various backouts/deletions included. In other words, don't count on
> MSVC2010 to buy you much if anything in terms of solving this issue.

Thanks for doing that.  It's great to know that MSVC 2010 alone won't help much.
Comment 46 Kyle Huey [:khuey] (Exited; not receiving bugmail, email if necessary) 2011-12-13 07:51:34 PST
I've split some of intl/ out of libxul, and we're going to try to reland SPDY on top of that and see where we are.
Comment 47 Ted Mielczarek [:ted.mielczarek] 2011-12-13 09:59:03 PST
Note that despite the virtual memory measurements, it's possible that Microsoft has fixed bugs in the compiler so that it won't always crash when it hits this situation. It may be that it can map/unmap files as needed, for example. That would explain successful VC 2010 builds where VC 2005 has failed.
Comment 48 Mike Hommey [:glandium] 2011-12-13 11:08:11 PST
(In reply to Ted Mielczarek [:ted, :luser] from comment #47)
> Note that despite the virtual memory measurements, it's possible that
> Microsoft has fixed bugs in the compiler so that it won't always crash when
> it hits this situation. It may be that it can map/unmap files as needed, for
> example. That would explain successful VC 2010 builds where VC 2005 has
> failed.

One way to make sure of that is to either profile VC 2010 or limit the amount of memory it can map.
Comment 49 Kyle Huey [:khuey] (Exited; not receiving bugmail, email if necessary) 2011-12-13 11:16:00 PST
It looks like most of the increase in libxul size from 10 to 11 is from Skia, which I'm told me can turn off if necessary.
Comment 50 Kyle Huey [:khuey] (Exited; not receiving bugmail, email if necessary) 2011-12-13 11:44:29 PST
So, bad news, splitting out uconv doesn't help.
Comment 51 Matheus Kerschbaum 2011-12-13 13:39:10 PST
(In reply to Ted Mielczarek [:ted, :luser] from comment #47)
> Note that despite the virtual memory measurements, it's possible that
> Microsoft has fixed bugs in the compiler so that it won't always crash when
> it hits this situation. It may be that it can map/unmap files as needed, for
> example. That would explain successful VC 2010 builds where VC 2005 has
> failed.

That's not correct, unfortunately. The patch in bug 609976 fails with the same error even with MSVC2010.
Comment 52 Ryan VanderMeulen [:RyanVM] 2011-12-13 14:00:09 PST
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #49)
> It looks like most of the increase in libxul size from 10 to 11 is from
> Skia, which I'm told me can turn off if necessary.

If someone wants to throw up a disable Skia patch, I can see what difference it makes to virtual byte usage on an otherwise vanilla m-c build.
Comment 53 Bill Gianopoulos [:WG9s] 2011-12-13 15:50:20 PST
(In reply to Ryan VanderMeulen from comment #52)
> (In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #49)
> > It looks like most of the increase in libxul size from 10 to 11 is from
> > Skia, which I'm told me can turn off if necessary.
> 
> If someone wants to throw up a disable Skia patch, I can see what difference
> it makes to virtual byte usage on an otherwise vanilla m-c build.

Is this just a matter of backing out Bug 699258?
Comment 54 :Ehsan Akhgari 2011-12-13 17:05:04 PST
I just landed a patch in bug 710473 which should enable us to reopen the tree.
Comment 55 :Ehsan Akhgari 2011-12-13 17:12:53 PST
It would be useful is somebody can do a PGO build measuring the address space of the linker with this changeset: https://hg.mozilla.org/mozilla-central/rev/221eccfa6a3f.
Comment 56 Ed Morley [:emorley] 2011-12-13 19:09:07 PST
(In reply to Ehsan Akhgari [:ehsan] from comment #55)
> It would be useful is somebody can do a PGO build measuring the address
> space of the linker with this changeset:
> https://hg.mozilla.org/mozilla-central/rev/221eccfa6a3f.

The try run in bug 710473 appears to be from mozilla-central and SPDY/Graphite were relanded on inbound only, so I've merged to inbound and will trigger PGO builds, to see if this works for sure :-)

https://tbpl.mozilla.org/?tree=Mozilla-Inbound&rev=6c5cb98336ba
Comment 57 Ryan VanderMeulen [:RyanVM] 2011-12-13 19:56:35 PST
(In reply to Ehsan Akhgari [:ehsan] from comment #55)
> It would be useful is somebody can do a PGO build measuring the address
> space of the linker with this changeset:
> https://hg.mozilla.org/mozilla-central/rev/221eccfa6a3f.

My virtual bytes peak is currently 3,139,252,224 (2.92GB). I'm using a vanilla .mozconfig this time with no --enable-optimize set and nothing disabled.
Comment 58 Phil Ringnalda (:philor) 2011-12-13 22:01:20 PST
Win PGO on the merge to inbound failed in the same old way.
Comment 59 Kyle Huey [:khuey] (Exited; not receiving bugmail, email if necessary) 2011-12-14 03:49:44 PST
I did the following measurements last night (with --disable-accessibility and --disable-angle, so only relative numbers are useful)

tip: 2.754 GB
no nsDerivedSafe: 2.748 GB
no Skia: 2.725 GB
aurora: 2.690 GB
Comment 60 Kyle Huey [:khuey] (Exited; not receiving bugmail, email if necessary) 2011-12-14 06:25:43 PST
Status update, I've disabled Skia on Windows and hsivonen has landed a code removal patch so we're currently 32 MB below tip (that appears to be half a release cycle at the rate we climb :-/).

The WebGL and media separation patches are good for another 41 MB (together all of this would get us under Aurora's memory usage).
Comment 61 Kyle Huey [:khuey] (Exited; not receiving bugmail, email if necessary) 2011-12-14 11:14:59 PST
Situation is under control now, m-c is metered to clear the backlog, we're coordinating large landings until we can stand up better infrastructure.
Comment 62 Ed Morley [:emorley] 2011-12-15 01:25:22 PST
Both mozilla-central and mozilla-inbound are now fully open, no metering/sheriff approval required. 

However, please be gentle & coordinate large (read: importing brand new chunks of third party code or massive new features) C++ libxul landings with khuey, just to be safe.

Massive thanks to ehsan & mbrubeck for sheriffing mozilla-central last night, until the backlog had cleared.
Comment 63 Kyle Huey [:khuey] (Exited; not receiving bugmail, email if necessary) 2011-12-15 15:25:54 PST
The backlog on the tree has been cleared, bugs are on file for everything else that needs to be done.

Many thanks to everyone who helped with this.
Comment 64 Alex Macfarlane Smith 2011-12-17 18:52:34 PST
This might be a silly question, but isn't it possible to use incremental linking? Or does this have other issues/the same memory issues?
Comment 65 Ted Mielczarek [:ted.mielczarek] 2011-12-19 06:15:01 PST
Incremental linking is unrelated, it's about making subsequent links faster when you make changes.
Comment 66 Serge Gautherie (:sgautherie) 2012-01-03 03:27:53 PST
Ftr, the following changeset is actually part of bug 709657:
http://hg.mozilla.org/comm-central/rev/f72675b4a48f
Revert changes from Bug 709193.

Note You need to log in before you can comment on or make changes to this bug.