Bug 750661 - Win PGO builds hitting 3GB virtual address space limit again, failing with: "nshtml5attributename.cpp(1977) : fatal error C1002: compiler is out of heap space in pass 2"
Status: RESOLVED FIXED
Product: Core
Classification: Components
Component: Build Config
Version: Trunk
Platform: x86 Windows Server 2003
Importance: -- blocker with 3 votes
Target Milestone: ---
Assigned To: :Ehsan Akhgari
Mentors:
Depends on: 474043 deadcode 543034 709192 709193 PGOSilverBullet 709657 710246 710840 748343 750717 750728 750747 750859 750867 751186 751201 751273
Blocks: 609976 648407
Reported: 2012-05-01 04:21 PDT by Ed Morley [:emorley]
Modified: 2013-04-03 00:16 PDT
CC List: 42 users
See Also:
Crash Signature:
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
Graph (18.72 KB, image/png)
2012-05-01 07:29 PDT, Ed Morley [:emorley]

Description Ed Morley [:emorley] 2012-05-01 04:21:03 PDT
Ok, so it appears that bug 709193 is back - Win PGO builds are failing again :-(

Has happened twice on inbound:

rev 0831ce6ba72f:
*  https://tbpl.mozilla.org/php/getParsedLog.php?id=11348221&tree=Mozilla-Inbound - "fatal error C1002: compiler is out of heap space in pass 2"
* a retrigger of this rev completed fine: https://tbpl.mozilla.org/php/getParsedLog.php?id=11345300&tree=Mozilla-Inbound - "linker max virtual size: 3021185024"

rev ac1504ff8740:
*  https://tbpl.mozilla.org/php/getParsedLog.php?id=11354573&tree=Mozilla-Inbound (same error)


I've created this to track the failure + short term mitigation, however there is also:
* Bug 710840 (tracking the increase over time) - where I'm about to start bisecting.
* Bug 709480 (switching to win64 builders), which is the real long-term solution here.
Comment 1 Ed Morley [:emorley] 2012-05-01 04:46:18 PDT
Filtered inbound TBPL view showing just win PGO:
https://tbpl.mozilla.org/?tree=Mozilla-Inbound&jobname=WINNT%205.2%20mozilla-inbound%20pgo-build

Failure rate is 2 out of 5 in the last 12 hours (bug 750611 caused a bit of a backlog, so still quite a few pending/running as I post this).

mozilla-central seems OK for now: khuey found that inbound was using 30 MB more on the last green build, so it appears there has been a significant rise since the last merge, which we'll bisect now.
Comment 2 Ed Morley [:emorley] 2012-05-01 07:04:30 PDT
I've collected the win pgo peak linker values for the last month for inbound in bug 710840 (attachment 619911 [details]).

The most relevant part (peak linker virtual size in bytes, newest revision first):
ac1504ff8740: 3016626176; 3021967360; + 1x failed
221db28204cf: 3021996032
0831ce6ba72f: 3021553664; + 1x failed
32e001c1351b: 2962857984
c0822f99d850: 2962464768
f8c388f622f1: 3021185024
0e2658794e06: 2982510592
609aeba1b2fe: 2962075648
f99cf2f41355: 2993434624; 2962083840; 2993422336
043266d76bb3: 2992799744
Comment 3 Ed Morley [:emorley] 2012-05-01 07:29:01 PDT
Created attachment 619919 [details]
Graph

Posting this just to keep everyone in the loop (seeing as the trees have now been closed as of 15:14 UTC+1).

mbrubeck kindly graphed the values from attachment 619911 [details].

The large jump is in the range:
https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=0e2658794e06&tochange=f8c388f622f1

However, ehsan's inspection of the changesets found mainly mobile-only or Linux-only patches, or else very small ones :-(
Comment 4 Ed Morley [:emorley] 2012-05-01 08:03:04 PDT
The inbound win nightly has just failed too (three pushes after those listed in comment 0):

d2596504ce97
https://tbpl.mozilla.org/php/getParsedLog.php?id=11358759&tree=Mozilla-Inbound
"e:\builds\moz2_slave\m-in-w32-ntly\build\xpfe\appshell\src\nswindowmediator.cpp(810) : fatal error C1001: An internal error has occurred in the compiler.
LINK : fatal error LNK1000: Internal error during IMAGE::BuildImage "
Comment 5 Ed Morley [:emorley] 2012-05-01 08:55:34 PDT
Also occurred on profiling branch, which doesn't yet have the mozilla-inbound changes:

WINNT 5.2 profiling nightly on 2012-04-30 04:02:17 PDT for push cf0acd702251
https://tbpl.mozilla.org/php/getParsedLog.php?id=11323765&tree=Profiling

and

WINNT 5.2 profiling nightly on 2012-05-01 04:02:23 PDT for push 1fe40e6e26b0
https://tbpl.mozilla.org/php/getParsedLog.php?id=11358377&tree=Profiling

Both being:
"nswindowmediator.cpp(810) : fatal error C1001: An internal error has occurred in the compiler."
Comment 6 Ed Morley [:emorley] 2012-05-01 09:55:27 PDT
To summarise what's been discussed on IRC, for those waiting on the closed tree:

* There's nothing hugely obvious (that we've been able to find) that caused an increase and could be backed out short term (it would seem we've been close to the limit for a while, but without bug 710840 there was no easy way to keep track). Also, the peak linker vsize values seem to be bimodal or even trimodal (see mbrubeck's attached graph), which makes finding ranges of increases a pain.

* Short term, our options are yet again (bug 709193 déjà vu): remove dead code, split as much as we can out of libxul, or turn off PGO for our trunk nightlies. Ehsan has filed a number of bugs for splitting things out - see this bug's dependencies. I think we exhausted much of the obvious dead code removal last time - at least some of what's left still requires a fair amount of work before it can be removed, aiui (e.g. RDF, old parser).

* Longer term we're completely reliant on bug 709480.
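The multimodality point can be made concrete with numbers already posted in this bug. Below is a minimal Python sketch that simply re-keys the repeat-build values from comments 2 and 10 (nothing here queries TBPL; `RUNS` and `spread_mib` are illustrative names) and measures how far repeated builds of the same revision disagree, next to the jump comment 3 points at.

```python
# Peak "linker max virtual size" values (bytes) for revisions that were
# built more than once, copied from comments 2 and 10 of this bug.
RUNS = {
    "f99cf2f41355": [2993434624, 2962083840, 2993422336],
    "bfa638e5df16": [2962644992, 2992783360],
    "e1f1d4f79b2d": [2992783360, 2992779264],
    "c3813fbb1c9a": [2992640000, 2962808832, 2992627712, 2992619520],
}

MIB = 1024 * 1024


def spread_mib(values):
    """Return max - min of one revision's repeat builds, in MiB."""
    return (max(values) - min(values)) / MIB


for rev, values in RUNS.items():
    print(f"{rev}: spread {spread_mib(values):.1f} MiB over {len(values)} runs")

# The jump comment 3 highlights (0e2658794e06 -> f8c388f622f1); each side
# of it was only built once, so it is a single-run comparison.
jump = (3021185024 - 2982510592) / MIB
print(f"0e2658794e06 -> f8c388f622f1: {jump:.1f} MiB")
```

Repeat builds of one revision can differ by roughly 28 to 30 MiB, while the suspected jump is about 36.9 MiB between single runs. Comparing one run per revision can therefore easily mistake a switch between modes for a regression (or hide a real one), which is the difficulty described above.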
Comment 7 Mike Hommey [:glandium] 2012-05-01 09:58:27 PDT
(In reply to Ed Morley [:edmorley] from comment #6)
> * Longer term we're completely reliant on bug 709480.

Or something else I mentioned on irc: Try doing PGO on subparts of libxul when building the static libraries (maybe gklayout + the rest would do), and then link the static libraries together as libxul. Performance impact would need to be studied, though.
Comment 8 Ed Morley [:emorley] 2012-05-01 11:21:17 PDT
To try and unblock people a bit, mozilla-central has now been set to approved required for landings that do not affect any part of the windows build. mozilla-inbound will remain closed for now.

For more info see:
https://wiki.mozilla.org/Tree_Rules
Comment 9 Ed Morley [:emorley] 2012-05-01 11:22:19 PDT
Bah, s/approved/approval/
Comment 10 Ed Morley [:emorley] 2012-05-02 03:13:03 PDT
Latest values from inbound:
(continuing on from comment 2, this time oldest first)

d2596504ce97: 3021832192, 3021963264
d60f77b10824: 2992345088 (test-only)
bfa638e5df16: 2962644992, 2992783360 (disabling graphite)
e1f1d4f79b2d: 2992783360, 2992779264 (NPOTB)
83ff77ce8d6c: 2992455680 (reenable graphite + move to libgkmedias)
c3813fbb1c9a: 2992640000, 2962808832, 2992627712, 2992619520 (bug 748343)

Now that bug 750717 has landed, the values show up in TBPL's middle stats panel (under linker max vsize) when the build is selected, so there's no need to open the logs.
Comment 11 David Bolter [:davidb] ***PTO until 29th*** 2012-05-02 12:01:13 PDT
How far back does our linker virtual mem size data go?
Comment 12 Phil Ringnalda (:philor) 2012-05-02 12:05:23 PDT
It's in the build logs, so as far as logs go, 30 days.
Comment 13 David Bolter [:davidb] ***PTO until 29th*** 2012-05-02 12:30:13 PDT
(In reply to Phil Ringnalda (:philor) from comment #12)
> It's in the build logs, so as far as logs go, 30 days.

Darn. I would be curious to see 12 months' worth :)
Comment 14 :Ehsan Akhgari 2012-05-02 21:15:49 PDT
The trees are reopened now, I'm gonna call this fixed.
Comment 15 Ed Morley [:emorley] 2012-05-03 04:16:59 PDT
Since we'll lose the logs after 30 days, posting a few more peak linker usage values after Ehsan's awesome work, for future reference:

(continuing on from comment 10; old to new)

75de3dfde0bd: 2962407424 (libjpeg ripped out of libxul)
b60dc9ae8aae: 2992517120, 2962640896 (and libpng)
81f7513ed312: 2962550784 (qcms)
e15be411dff8: 2992361472, 2962223104 (expat)
a642269f01a2: 2962214912 (rm unused cairo debugging code)
e0d9d5a0987b: 2962190336, 2992058368 (bholley's CAPS pruning)
828281d69978: 2980945920 (cairo + pixman ripped out from libxul)
75c104703999: 2980843520 (tree now reopened, normal landings...)
5900fe7cd355: 2963116032, 2980843520
807403a04a6a: 2980831232

Subtracting the highest value since the libxul diet finished (828281d69978 onwards) from the highest value in comment 2 shows we now have ~39MB more headroom.

To give a rough idea of how long this may last us (obviously extremely dependent on what lands, but better than nothing): between 2012-02-27 (bug 710840 comment 6) and 2012-05-01 there was a ~96MB increase.
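The headroom and "how long will it last" arithmetic above can be reproduced in a few lines of Python. This is a rough sketch under two stated assumptions: that "post libxul diet" means the runs from 828281d69978 onwards (that reading is the one that reproduces ~39), and that the ~96MB growth accrues linearly, which the comment itself warns depends heavily on what lands. Both figures are treated as MiB (2^20 bytes); the 41050112-byte difference is 39.1 MiB but 41.1 decimal MB, so the ~39 figure appears to be in MiB.

```python
from datetime import date

MIB = 1024 * 1024

# Highest peak in comment 2 (221db28204cf), before the libxul diet.
pre_diet_peak = 3021996032
# Peaks from 828281d69978 onwards (comment 15), i.e. after the diet finished.
post_diet_runs = [2980945920, 2980843520, 2963116032, 2980843520, 2980831232]

headroom_mib = (pre_diet_peak - max(post_diet_runs)) / MIB

# ~96MB growth between 2012-02-27 and 2012-05-01 (bug 710840 comment 6).
# ASSUMPTION: the growth is roughly linear and "MB" here means MiB.
days = (date(2012, 5, 1) - date(2012, 2, 27)).days
rate_mib_per_day = 96 / days

print(f"headroom gained: {headroom_mib:.1f} MiB")
print(f"growth: {rate_mib_per_day:.2f} MiB/day over {days} days")
print(f"rough runway: {headroom_mib / rate_mib_per_day:.0f} days")
```

With these inputs the sketch reports about 39.1 MiB of regained headroom, 1.50 MiB/day over 64 days (2012 is a leap year), and a runway on the order of 26 days; treat that last number as an order-of-magnitude hint only.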
Comment 16 Chris AtLee [:catlee] 2012-05-03 05:29:51 PDT
We keep logs for nightly builds FOREVAAAAAH, e.g.

http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2012/01/2012-01-01-03-10-15-mozilla-central/mozilla-central-win32-nightly-build42.txt.gz

linker max virtual size: 3028107264
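Since the nightly logs are kept, the value can be pulled out mechanically. A minimal Python sketch, assuming only the log line format quoted above (`linker max virtual size: <bytes>`) and a log already downloaded locally; the function names are hypothetical and this is not part of any existing Mozilla tooling.

```python
import gzip
import re

# Matches the line the build logs print, e.g.
# "linker max virtual size: 3028107264"
LINKER_VSIZE_RE = re.compile(r"linker max virtual size:\s*(\d+)")


def linker_max_vsize(log_text):
    """Return the largest 'linker max virtual size' value in a log, or None."""
    values = [int(m.group(1)) for m in LINKER_VSIZE_RE.finditer(log_text)]
    return max(values) if values else None


def linker_max_vsize_from_gz(path):
    """Same as linker_max_vsize, for a downloaded *.txt.gz build log."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as f:
        return linker_max_vsize(f.read())


if __name__ == "__main__":
    sample = "...\nlinker max virtual size: 3028107264\n..."
    print(linker_max_vsize(sample))  # 3028107264
```

Taking the maximum rather than the first match is a defensive choice in case a log contains more than one link step; for the logs quoted in this bug a single match is expected.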
Comment 17 Mike Hommey [:glandium] 2012-05-03 05:43:44 PDT
(In reply to Mike Hommey [:glandium] from comment #7)
> (In reply to Ed Morley [:edmorley] from comment #6)
> > * Longer term we're completely reliant on bug 709480.
> 
> Or something else I mentioned on irc: Try doing PGO on subparts of libxul
> when building the static libraries (maybe gklayout + the rest would do), and
> then link the static libraries together as libxul. Performance impact would
> need to be studied, though.

No cheese. lib.exe doesn't do anything really useful with /LTCG. It's still the final linkage doing all the work. The only possible way out with this technique would be to create a PGOed DLL for gklayout and convert it to a static library. It would require that 1. gklayout is compilable as a DLL and that 2. we have something to convert a DLL to a static library.
Comment 18 :Ehsan Akhgari 2012-05-08 08:55:14 PDT
(In reply to Mike Hommey [:glandium] from comment #17)
> (In reply to Mike Hommey [:glandium] from comment #7)
> > (In reply to Ed Morley [:edmorley] from comment #6)
> > > * Longer term we're completely reliant on bug 709480.
> > 
> > Or something else I mentioned on irc: Try doing PGO on subparts of libxul
> > when building the static libraries (maybe gklayout + the rest would do), and
> > then link the static libraries together as libxul. Performance impact would
> > need to be studied, though.
> 
> No cheese. lib.exe doesn't do anything really useful with /LTCG. It's still
> the final linkage doing all the work.

boo!

> The only possible way out with this
> technique would be create a PGOed dll for gklayout, and convert it to a
> static library. It would require that 1. gklayout is compilable as a dll and
> that 2. we have something to convert a dll to a static library.

I'm pretty sure that we've broken (1) since everything moved into libxul.  Also, I don't know of a way to do (2) but that doesn't mean that it's not possible.
