Bug 543034 - Windows builder failing, with nsannotationservice.cpp(457) : "fatal error C1001: An internal error has occurred in the compiler" or "fatal error C1002: compiler is out of heap space in pass 2"
Status: RESOLVED FIXED
Product: Release Engineering
Classification: Other
Component: Other
Version: other
Platform: x86 Windows XP
Importance: P1 blocker
Target Milestone: ---
Assigned To: Nick Thomas [:nthomas]
http://tinderbox.mozilla.org/showlog....
Duplicates: 542429
Depends on:
Blocks: 709193 750661
Reported: 2010-01-29 09:09 PST by Daniel Holbert [:dholbert]
Modified: 2013-08-12 21:54 PDT


Attachments
workaround? [no, doesn't help] (1.50 KB, patch)
2010-02-02 13:48 PST, Daniel Holbert [:dholbert]
gavin.sharp: review+
process info (195.06 KB, image/png)
2010-02-15 02:03 PST, Nick Thomas [:nthomas]
no flags
[opsi-package-sources] Add /3GB to boot.ini (3.80 KB, patch)
2010-02-16 01:09 PST, Nick Thomas [:nthomas]
bhearsum: review+
bhearsum: checked‑in+

Description Daniel Holbert [:dholbert] 2010-01-29 09:09:07 PST
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1264769305.1264783731.25012.gz
WINNT 5.2 mozilla-central nightly on 2010/01/29 04:48:25
s: win32-slave12

> PGOMGR : warning PG0188: No .PGC files matching 'xul!*.pgc' were found.
>    Creating library xul.lib and object xul.exp
> Generating code
> 3700 of 103836 (  3.56%) profiled functions will be compiled for speed
> NEXT ERROR e:\builds\moz2_slave\mozilla-central-win32-nightly\build\toolkit\components\places\src\nsannotationservice.cpp(457) : fatal error C1001: An internal error has occurred in the compiler.
> (compiler file 'F:\SP\vctools\compiler\utc\src\P2\main.c[0x10CBB356:0x339D0000]', line 182)
>  To work around this problem, try simplifying or changing the program near the locations listed above.
> Please choose the Technical Support command on the Visual C++ 
>  Help menu, or open the Technical Support help file for more information
> 
> LINK : fatal error LNK1000: Internal error during IMAGE::BuildImage
> 
>   Version 8.00.50727.762
> 
>   ExceptionCode            = C0000005
>   ExceptionFlags           = 00000000
>   ExceptionAddress         = 10CBB356 (10B00000) "d:\msvs8\VC\BIN\c2.dll"
>   NumberParameters         = 00000002
>   ExceptionInformation[ 0] = 00000000
>   ExceptionInformation[ 1] = 339D0000
> 
> CONTEXT:
>   Eax    = 474F0028  Esp    = 0012ED40
>   Ebx    = 000A0470  Ebp    = 00000000
>   Ecx    = 339D0000  Esi    = 339D0000
>   Edx    = 642E9998  Edi    = 642E9944
>   Eip    = 10CBB356  EFlags = 00010206
>   SegCs  = 0000001B  SegDs  = 00000023
>   SegSs  = 00000023  SegEs  = 00000023
>   SegFs  = 0000003B  SegGs  = 00000000
>   Dr0    = 00000000  Dr3    = 00000000
>   Dr1    = 00000000  Dr6    = 00000000
>   Dr2    = 00000000  Dr7    = 00000000
> make[5]: Leaving directory `/e/builds/moz2_slave/mozilla-central-win32-nightly/build/obj-firefox/toolkit/library'
> make[4]: Leaving directory `/e/builds/moz2_slave/mozilla-central-win32-nightly/build/obj-firefox'
> make[5]: *** [xul.dll] Error 232
> make[5]: *** Deleting file `xul.dll'
> make[4]: *** [libs_tier_toolkit] Error 2
> make[3]: Leaving directory `/e/builds/moz2_slave/mozilla-central-win32-nightly/build/obj-firefox'
> make[3]: *** [tier_toolkit] Error 2
> make[2]: Leaving directory `/e/builds/moz2_slave/mozilla-central-win32-nightly/build/obj-firefox'
> make[2]: *** [default] Error 2
> make[1]: Leaving directory `/e/builds/moz2_slave/mozilla-central-win32-nightly/build'
> make[1]: *** [build] Error 2
> make: *** [profiledbuild] Error 2
Comment 1 Daniel Holbert [:dholbert] 2010-01-29 16:53:57 PST
Alice kindly started a replacement nightly build, but it failed with the same problem:
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1264802234.1264811517.24068.gz
WINNT 5.2 mozilla-central nightly on 2010/01/29 13:57:14
s: win32-slave38

So, looks non-random... Maybe a checkin from yesterday broke something?
Comment 2 Daniel Holbert [:dholbert] 2010-01-29 17:26:26 PST
The file nsannotationservice.cpp hasn't changed in 13 days, and Mak doesn't think any checkins from yesterday look suspicious...

Comparing one of the broken buildlogs vs a non-broken one, the contextual lines look identical...

Perhaps we got an update to MSVC yesterday, and that broke something?
Comment 3 Daniel Holbert [:dholbert] 2010-01-29 18:07:14 PST
For code changes, mak pointed out that bug 500328's changeset looks like the only one remotely related to this:
http://hg.mozilla.org/mozilla-central/rev/dc7a04be6904

It makes some changes to the nsIVariant interface & implementation, and the line that the compiler flags with an internal error (nsannotationservice.cpp:457) has just done some work with an nsIVariant.

I may try backing out that changeset later tonight and clobbering the nightly again...
Comment 4 John O'Duinn [:joduinn] (please use "needinfo?" flag) 2010-01-29 18:47:12 PST
(In reply to comment #2)
> Perhaps we got an update to MSVC yesterday, and that broke something?

There were no updates/changes to the MSVC compilers installed on the build machines.
Comment 5 John O'Duinn [:joduinn] (please use "needinfo?" flag) 2010-01-29 18:48:59 PST
dholbert ping'd me in irc; investigating.
Comment 6 Daniel Holbert [:dholbert] 2010-01-29 19:10:43 PST
(In reply to comment #3)
> I may try backing out that changeset later tonight and clobbering the nightly
> again...

Backed out that changeset:
http://hg.mozilla.org/mozilla-central/rev/6d50455cabaa
http://hg.mozilla.org/mozilla-central/rev/b0b9d8dca9d6

Joduinn is respinning the nightly... we'll see if the backout fixes anything.
Comment 7 John O'Duinn [:joduinn] (please use "needinfo?" flag) 2010-01-29 19:13:40 PST
(In reply to comment #6)
> (In reply to comment #3)
> > I may try backing out that changeset later tonight and clobbering the nightly
> > again...
> 
> Backed out that changeset:
> http://hg.mozilla.org/mozilla-central/rev/6d50455cabaa
> http://hg.mozilla.org/mozilla-central/rev/b0b9d8dca9d6
> 
> Joduinn is respinning the nightly... we'll see if the backout fixes anything.

nightly started on win32-slave15.
Comment 8 Daniel Holbert [:dholbert] 2010-01-29 23:25:29 PST
aaaand we got a green cycle:
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1264821121.1264834641.15890.gz

So, looks like that changeset was indeed the 'guilty' one...
Comment 9 Marco Bonardo [::mak] 2010-01-30 02:55:03 PST
fwiw, these bugs should be filed against Microsoft, since 90% of the time it's a real compiler bug. That said, it's most likely that just changing some minor thing will sidestep the problem... The best thing would be for someone from build engineering to work directly on the machine, trying to gather information for a MS bug and at the same time seeing what's different from the usual opt build boxes.

It's even possible that pushing again after some other change will go green right away, since the optimization step optimizes all the code at once. Still, I'd be scared of seeing this again in future, so either update MSVC or file the bug upstream.
Comment 10 Nick Thomas [:nthomas] 2010-01-30 11:29:10 PST
All the windows builds (on any mercurial based branch) allocate the work against a pool of identical machines, so there is no difference between an opt and a nightly build in terms of compiler. The key difference is that "nightly" builds are always clobbers (so is try server), and "build" will be a mixture of depend and clobbers.
Comment 11 Ted Mielczarek [:ted.mielczarek] 2010-01-30 12:12:50 PST
Marco is right in that this appears to be a Microsoft compiler bug. That being said, even if we report it I wouldn't hold my breath for a fix anytime soon, since VC 2010 is coming out soon, and this is VC 2008, which will then be 2 releases behind.
Comment 12 Ted Mielczarek [:ted.mielczarek] 2010-01-30 12:13:00 PST
er, I meant VC 2005.
Comment 13 Jim Jeffery not reading bug-mail 1/2/11 2010-01-31 12:49:34 PST
This same error has crashed the m-c hourly build:

http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1264955976.1264968176.1880.gz
Comment 14 Nick Thomas [:nthomas] 2010-01-31 14:23:41 PST
I think we have an intermittent problem and rev dc7a04be6904 was not at fault, or not wholly at fault. The data we have is collected at
 http://spreadsheets.google.com/pub?key=tVZQKFDccCvXx63C2OmVBNg&output=html
based on the hypothesis that it's clobber builds which fail. Some comments:
* "WINNT 5.2 mozilla-central build" will mostly use an existing objdir, and sometimes clobber (either forced, 7 days since last clobber, effective clobber because the disk space was needed by another build)
* "WINNT 5.2 mozilla-central nightly" is always a clobber build
* the failures have been on three separate slaves (win32-slave[12,19,38]), so it's not a slave that's gone mad
* the original identification of dc7a04be6904 and subsequent backout at b0b9d8dca9d6 gave us three green nightlies
* it doesn't explain why the clobber at 5ad17deecfe0 succeeded, nor why the most recent build on 3048d03980e7 failed
Comment 15 Daniel Holbert [:dholbert] 2010-01-31 17:02:52 PST
The cycle after comment 13 also failed, with a slightly different message:
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1264973998.1264982902.3272.gz
> e:\builds\moz2_slave\mozilla-central-win32\build\toolkit\components\places\src\nsannotationservice.cpp(457) : fatal error C1002: compiler is out of heap space in pass 2
> LINK : fatal error LNK1257: code generation failed

Note that that's still the same line in nsAnnotationService.cpp -- line 457.

I wonder if "compiler is out of heap space" has been the problem here all along -- but depending on *when* it runs out of space, the compiler just dies with a cryptic "internal error" message? (as it has done up until this particular log)
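
One way to test that hypothesis against the collected logs is to check whether the C1001 and C1002 failures keep landing on the same file and line. The following is only a rough sketch: the struct and function names (CompilerFailure, parseFatalError, baseName, sameLocation) are made up for illustration, and the regular expression assumes the diagnostic shape seen in the logs quoted in this bug.

```cpp
#include <regex>
#include <string>

// One parsed MSVC "fatal error" line from a build log (hypothetical helper type).
struct CompilerFailure {
    std::string file;  // path named in the diagnostic
    int line;          // line number given in parentheses
    std::string code;  // e.g. "C1001" or "C1002"
};

// Parse a line shaped like
//   path\file.cpp(457) : fatal error C1001: message
// Returns false (leaving `out` untouched) for any other line, including
// linker errors such as "LINK : fatal error LNK1257: ...".
bool parseFatalError(const std::string& logLine, CompilerFailure& out) {
    static const std::regex pattern(R"((\S+)\((\d+)\) : fatal error (C\d+):)");
    std::smatch m;
    if (!std::regex_search(logLine, m, pattern)) return false;
    out.file = m[1].str();
    out.line = std::stoi(m[2].str());
    out.code = m[3].str();
    return true;
}

// Strip the directory part, since nightly and depend builds use different
// build directories for the same source file.
std::string baseName(const std::string& path) {
    const std::string::size_type pos = path.find_last_of("\\/");
    return pos == std::string::npos ? path : path.substr(pos + 1);
}

// True when two failures point at the same source file and line,
// regardless of which error code the compiler reported.
bool sameLocation(const CompilerFailure& a, const CompilerFailure& b) {
    return baseName(a.file) == baseName(b.file) && a.line == b.line;
}
```

Feeding it the C1001 line from comment 0 and the C1002 line above would report the same location for both, which fits the idea of one underlying problem with two different symptoms, though it does not prove it.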
Comment 16 Robert Strong [:rstrong] (use needinfo to contact me) 2010-01-31 18:07:29 PST
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1264982541.1264989883.14988.gz
Comment 17 Nick Thomas [:nthomas] 2010-01-31 19:59:35 PST
So we've had two failures in non-clobbering builds now and this looks like a more general problem doing PGO. Philor also mentioned seeing the same error on the 26th.

A quick google search didn't turn up a way to control the size of the compiler's heap space. The memory usage data from the VM management isn't great, but I don't think we're using more memory all of a sudden, nor taking significantly longer to complete a build. Has the build complexity increased recently? Or become more recursive?
Comment 18 Phil Ringnalda (:philor) 2010-01-31 20:16:17 PST
And the reason I knew it was the 26th was because I taunted cjones into filing bug 542429 about it.
Comment 19 Makoto Kato [:m_kato] (PTO 9/22-9/25) 2010-02-01 00:47:11 PST
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1265001315.1265008475.20904.gz
WINNT 5.2 mozilla-central build on 2010/01/31 21:15:15
Comment 20 Ted Mielczarek [:ted.mielczarek] 2010-02-01 06:03:21 PST
Nothing significant has changed in the build recently, AFAIK.
Comment 21 Daniel Holbert [:dholbert] 2010-02-01 11:45:59 PST
I just posted on bug 500328, clearing it of guilt for causing this bug, per comment 14 & beyond.
Comment 22 Aki Sasaki [:aki] 2010-02-01 11:49:37 PST
*** Bug 542429 has been marked as a duplicate of this bug. ***
Comment 23 John O'Duinn [:joduinn] (please use "needinfo?" flag) 2010-02-01 14:22:18 PST
Reassigning to buildduty. I grabbed it only because dholbert ping'd me in irc Friday evening and I was able to trigger nightlies for him.
Comment 24 John O'Duinn [:joduinn] (please use "needinfo?" flag) 2010-02-01 14:44:50 PST
(In reply to comment #8)
> aaaand we got a green cycle:
> http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1264821121.1264834641.15890.gz
> 
> So, looks like that changeset was indeed the 'guilty' one...

jruderman reports that this is still happening, intermittently, so while this
changeset might have contributed to tickling a compiler bug, it seems to not be
the only tickler.

We already have vs9 installed on the same pool-o-slaves, and could get newer if
asked. However, if we want to upgrade to this from vc2005, we'd have to be
careful about binary compat issues with the firefox releases still supported on
those same machines.
Comment 25 Ted Mielczarek [:ted.mielczarek] 2010-02-01 15:40:20 PST
There isn't anything newer than VC 2008 yet (2010 is still in beta). We would need SP1 installed, for --enable-jemalloc (bug 529169). I don't think switching would have any negative effects, lots of developers build with VC 2008.

Personally, I was holding out for VC 2010, since I've verified that that version does in fact fix a bug that impacts us (bug 520651).
Comment 26 Phil Ringnalda (:philor) 2010-02-01 21:52:27 PST
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1265079021.1265088295.14024.gz
WINNT 5.2 mozilla-central build on 2010/02/01 18:50:21
s: win32-slave24

e:\builds\moz2_slave\mozilla-central-win32\build\toolkit\components\places\src\nsannotationservice.cpp(457) : fatal error C1001: An internal error has occurred in the compiler.
LINK : fatal error LNK1000: Internal error during IMAGE::BuildImage
Comment 27 Daniel Holbert [:dholbert] 2010-02-02 12:12:12 PST
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1265131346.1265141207.4084.gz
WINNT 5.2 mozilla-central build on 2010/02/02 09:22:26
s: win32-slave16
e:\builds\moz2_slave\mozilla-central-win32\build\toolkit\components\places\src\nsannotationservice.cpp(457) : fatal error C1001: An internal error has occurred in the compiler.
Comment 28 Phil Ringnalda (:philor) 2010-02-02 13:05:31 PST
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1265133454.1265144280.6383.gz
WINNT 5.2 mozilla-central build on 2010/02/02 09:57:34
s: win32-slave20
e:\builds\moz2_slave\mozilla-central-win32\build\toolkit\components\places\src\nsannotationservice.cpp(457) : fatal error C1001: An internal error has occurred in the compiler.
Comment 29 Daniel Holbert [:dholbert] 2010-02-02 13:42:01 PST
This has made the Win32 opt builder pretty much perma-red today.  Looking into workarounds.

Bug 281158 is about a similar-looking internal compiler error elsewhere. It was suggested in that bug to wrap the affected code with 
      #pragma optimize( "", off )
We didn't end up actually doing that in that bug, but we *did* in a different bug: bug 501082 (changeset linked in initial comment there).

For lack of any better ideas, I propose we do the same thing around the affected chunk of nsAnnotationService.cpp -- specifically, the implementation of nsAnnotationService::SetItemAnnotation.  Of course we'd prefer not to mark a chunk of code as "don't optimize me", but if it gets us non-burning builds, it's worth it as an interim stopgap at least.
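
For reference, the general shape of that kind of workaround looks roughly like the sketch below. This is not the actual patch: setItemAnnotationStandIn is a made-up stand-in for the affected function, and the _MSC_VER guards are only there so the snippet also compiles with other compilers.

```cpp
#include <string>

#ifdef _MSC_VER
// MSVC only: turn off all optimizations for the functions defined below.
#pragma optimize("", off)
#endif

// Hypothetical stand-in for the function the compiler chokes on; in this bug
// the real target would be nsAnnotationService::SetItemAnnotation.
int setItemAnnotationStandIn(const std::string& name, int value) {
    return name.empty() ? -1 : value;
}

#ifdef _MSC_VER
// Restore whatever optimization settings were given on the command line.
#pragma optimize("", on)
#endif
```

The pragma has to sit outside the function body, so it affects every function defined between the "off" and "on" lines. Keeping that region as small as possible limits how much code loses optimization.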
Comment 30 Daniel Holbert [:dholbert] 2010-02-02 13:48:33 PST
Created attachment 424859
workaround? [no, doesn't help]

This patch does what I suggest in previous comment.  Requesting r=gavin, since he wrote the similar patch on bug 501082.
Comment 31 :Gavin Sharp [email: gavin@gavinsharp.com] 2010-02-02 14:03:26 PST
Comment on attachment 424859
workaround? [no, doesn't help]

worth a shot!
Comment 32 Daniel Holbert [:dholbert] 2010-02-02 14:08:48 PST
Landed "workaround?": http://hg.mozilla.org/mozilla-central/rev/0949b169357b
Comment 33 Daniel Holbert [:dholbert] 2010-02-02 15:55:08 PST
FWIW, the cycle just *before* my workaround-push ended up being green, after a string of 4-5 consecutive cycles that had this failure. :)

So, even if the cycle built from the workaround-patch ends up being green, that doesn't necessarily mean it worked... we'll have to see whether the greenness sticks.
Comment 34 Marco Bonardo [::mak] 2010-02-02 16:01:42 PST
Do we know if relanding bug 500328 has increased the failure rate or not? That could still give some useful information about what code we could try to change.
Comment 35 Daniel Holbert [:dholbert] 2010-02-02 16:07:39 PST
It's possible... however, note that on Friday, when we backed bug 500328 out, we'd really only seen this failure twice (on nightly builds) and we'd had tons of passing tinderbox opt builds since it had first landed.  So, I don't think there's any strong correlation at this point.
Comment 36 Marco Bonardo [::mak] 2010-02-02 16:12:08 PST
(In reply to comment #35)
> It's possible... however, note that on Friday, when we backed bug 500328 out,
> we'd really only seen this failure twice (on nightly builds) and we'd had tons
> of passing tinderbox opt builds since it had first landed. 

I'm not thinking of strong correlations with patches, just trying to find code correlations.
But this also makes me think: why has this suddenly started being so frequent? Is it just due to some specific code change in the last week? Could it be that we are running up against some VS2005 limit and should plan to upgrade to VS2008 asap, since otherwise this will return in another form?
Comment 37 Ted Mielczarek [:ted.mielczarek] 2010-02-02 18:15:24 PST
It's certainly possible we're bumping up against some internal compiler limit. I kind of wanted to wait till VC 2010 was released to upgrade, though (as I said in comment 25).
Comment 38 Daniel Holbert [:dholbert] 2010-02-02 19:31:25 PST
Comment on attachment 424859
workaround? [no, doesn't help]

The cycle built from comment 32's 'workaround' was green, but the next two cycles were both red with this same issue:
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1265156801.1265167524.16160.gz
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1265157211.1265167401.14815.gz

Backing workaround out, since it apparently didn't help.
Comment 39 Daniel Holbert [:dholbert] 2010-02-02 19:34:36 PST
backed out: http://hg.mozilla.org/mozilla-central/rev/56a02566af53
Comment 40 Phil Ringnalda (:philor) 2010-02-02 22:06:23 PST
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1265161691.1265175922.11593.gz
WINNT 5.2 mozilla-central build on 2010/02/02 17:48:11
s: win32-slave12
e:\builds\moz2_slave\mozilla-central-win32\build\toolkit\xre\nsapprunner.cpp(3596) : fatal error C1001: An internal error has occurred in the compiler.
Comment 41 Daniel Holbert [:dholbert] 2010-02-02 23:43:59 PST
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1265166315.1265178266.4715.gz
WINNT 5.2 mozilla-central build on 2010/02/02 19:05:15
s: win32-slave08
e:\builds\moz2_slave\mozilla-central-win32\build\toolkit\xre\nsapprunner.cpp(3596) : fatal error C1001: An internal error has occurred in the compiler.

http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1265168403.1265179952.22913.gz
WINNT 5.2 mozilla-central build on 2010/02/02 19:40:03
s: win32-slave42
e:\builds\moz2_slave\mozilla-central-win32\build\toolkit\components\places\src\nsannotationservice.cpp(457) : fatal error C1001: An internal error has occurred in the compiler.
Comment 42 Phil Ringnalda (:philor) 2010-02-04 08:09:02 PST
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1265287341.1265299160.6835.gz
WINNT 5.2 mozilla-central build on 2010/02/04 04:42:21
s: win32-slave11
e:\builds\moz2_slave\mozilla-central-win32\build\toolkit\components\places\src\nsannotationservice.cpp(457) : fatal error C1002: compiler is out of heap space in pass 2
Comment 43 Phil Ringnalda (:philor) 2010-02-04 09:31:52 PST
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1265292985.1265303911.31448.gz
WINNT 5.2 mozilla-central build on 2010/02/04 06:16:25
s: win32-slave33
e:\builds\moz2_slave\mozilla-central-win32\build\toolkit\components\places\src\nsannotationservice.cpp(457) : fatal error C1001: An internal error has occurred in the compiler.
Comment 44 Phil Ringnalda (:philor) 2010-02-04 14:44:53 PST
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1265312255.1265321624.6881.gz
WINNT 5.2 mozilla-central build on 2010/02/04 11:37:35
s: win32-slave34
e:\builds\moz2_slave\mozilla-central-win32\build\toolkit\components\places\src\nsannotationservice.cpp(457) : fatal error C1001: An internal error has occurred in the compiler.
Comment 45 Phil Ringnalda (:philor) 2010-02-04 15:23:53 PST
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1265313560.1265323849.31706.gz
WINNT 5.2 mozilla-central build on 2010/02/04 11:59:20
s: win32-slave05
e:\builds\moz2_slave\mozilla-central-win32\build\toolkit\components\places\src\nsannotationservice.cpp(457) : fatal error C1002: compiler is out of heap space in pass 2
Comment 46 Phil Ringnalda (:philor) 2010-02-04 16:03:24 PST
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1265316791.1265327765.10740.gz
WINNT 5.2 mozilla-central build on 2010/02/04 12:53:11
s: win32-slave13
e:\builds\moz2_slave\mozilla-central-win32\build\toolkit\components\places\src\nsannotationservice.cpp(457) : fatal error C1001: An internal error has occurred in the compiler.
Comment 47 Phil Ringnalda (:philor) 2010-02-04 21:42:26 PST
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1265330901.1265341661.3602.gz
WINNT 5.2 mozilla-central build on 2010/02/04 16:48:21
s: win32-slave38
e:\builds\moz2_slave\mozilla-central-win32\build\toolkit\components\places\src\nsannotationservice.cpp(457) : fatal error C1001: An internal error has occurred in the compiler.
Comment 48 Armen Zambrano [:armenzg] (EDT/UTC-4) 2010-02-05 06:44:38 PST
Has anybody tried clobbering all the mozilla-central checkouts from the clobberer page?

Does this happen on the try server with a no-op build?
Comment 49 Ted Mielczarek [:ted.mielczarek] 2010-02-05 06:54:23 PST
This has appeared on lots of nightly builds, which are clobbers, so that seems unlikely to fix it.

Last I knew, try server win32 builds were not PGO. Has that changed yet? I highly suspect this is a PGO-only issue.
Comment 50 Daniel Holbert [:dholbert] 2010-02-05 13:01:01 PST
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1265390986.1265403415.14210.gz
WINNT 5.2 mozilla-central build on 2010/02/05 09:29:46
s: win32-slave09
e:\builds\moz2_slave\mozilla-central-win32\build\toolkit\components\places\src\nsannotationservice.cpp(457) : fatal error C1002: compiler is out of heap space in pass 2
Comment 51 Aki Sasaki [:aki] 2010-02-05 13:05:37 PST
Try builds are not currently PGO.
Comment 52 Daniel Holbert [:dholbert] 2010-02-05 15:22:04 PST
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1265402027.1265411107.3247.gz
WINNT 5.2 mozilla-central build on 2010/02/05 12:33:47
s: win32-slave16
e:\builds\moz2_slave\mozilla-central-win32\build\toolkit\components\places\src\nsannotationservice.cpp(457) : fatal error C1001: An internal error has occurred in the compiler.
Comment 53 Daniel Holbert [:dholbert] 2010-02-05 15:24:12 PST
(In reply to comment #49)
> Last I knew, try server win32 builds were not PGO. Has that changed yet? I
> highly suspect this is a PGO-only issue.

Agreed that this is PGO-only.  As shown in comment 0, the line right before the failure is always something like this:
> 3700 of 103836 (  3.56%) profiled functions will be compiled for speed
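
If someone wants to check this across many logs, spotting that summary line mechanically is straightforward. Below is a minimal sketch, assuming the line always has the "N of M ( P%) profiled functions will be compiled for speed" shape shown above; the helper names parsePgoSummary and pgoPercent are invented for this example.

```cpp
#include <cstdio>
#include <string>

// Recognize the linker's PGO summary line and extract the two counts.
// Tolerates a leading "> " quote marker. Returns false for any other line.
bool parsePgoSummary(const std::string& line, long& chosen, long& total) {
    if (line.find("profiled functions will be compiled for speed") ==
        std::string::npos) {
        return false;
    }
    const std::string::size_type start = line.find_first_of("0123456789");
    if (start == std::string::npos) return false;
    long c = 0, t = 0;
    if (std::sscanf(line.c_str() + start, "%ld of %ld", &c, &t) != 2) return false;
    if (t <= 0) return false;
    chosen = c;
    total = t;
    return true;
}

// Share of profiled functions chosen for speed optimization, in percent.
double pgoPercent(long chosen, long total) {
    return 100.0 * static_cast<double>(chosen) / static_cast<double>(total);
}
```

A log scanner could remember whether the previous line matched parsePgoSummary and flag any fatal error that follows it as a PGO-stage failure.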
Comment 54 Phil Ringnalda (:philor) 2010-02-05 15:35:35 PST
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1265398835.1265410835.32524.gz
WINNT 5.2 mozilla-central build on 2010/02/05 11:40:35
s: win32-slave10
Comment 55 Phil Ringnalda (:philor) 2010-02-05 19:01:01 PST
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1265409314.1265423098.14815.gz
WINNT 5.2 mozilla-central build on 2010/02/05 14:35:14
s: win32-slave11
e:\builds\moz2_slave\mozilla-central-win32\build\toolkit\components\places\src\nsannotationservice.cpp(457) : fatal error C1001: An internal error has occurred in the compiler.
Comment 56 Ted Mielczarek [:ted.mielczarek] 2010-02-06 06:39:53 PST
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1265456057.1265466168.15613.gz
WINNT 5.2 mozilla-central build on 2010/02/06 03:34:17  
s: win32-slave40
e:\builds\moz2_slave\mozilla-central-win32\build\toolkit\components\places\src\nsannotationservice.cpp(457) : fatal error C1001: An internal error has occurred in the compiler.
LINK : fatal error LNK1000: Internal error during IMAGE::BuildImage
Comment 57 Kyle Huey [:khuey] (Exited; not receiving bugmail, email if necessary) 2010-02-06 08:50:32 PST
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1265466029.1265473450.4352.gz
s: win32-slave35
e:\builds\moz2_slave\mozilla-central-win32\build\toolkit\components\places\src\nsannotationservice.cpp(457) : fatal error C1001: An internal error has occurred in the compiler.
LINK : fatal error LNK1000: Internal error during IMAGE::BuildImage
Comment 58 Phil Ringnalda (:philor) 2010-02-06 10:33:36 PST
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1265453038.1265462915.7475.gz
WINNT 5.2 mozilla-central build on 2010/02/06 02:43:58
s: win32-slave35
e:\builds\moz2_slave\mozilla-central-win32\build\toolkit\components\places\src\nsannotationservice.cpp(457) : fatal error C1001: An internal error has occurred in the compiler.
Comment 59 Phil Ringnalda (:philor) 2010-02-06 10:56:38 PST
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1265469057.1265481477.19077.gz
WINNT 5.2 mozilla-central build on 2010/02/06 07:10:57
s: win32-slave26
e:\builds\moz2_slave\mozilla-central-win32\build\toolkit\components\places\src\nsannotationservice.cpp(457) : fatal error C1002: compiler is out of heap space in pass 2
Comment 60 Daniel Holbert [:dholbert] 2010-02-06 13:37:37 PST
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1265480759.1265491908.6385.gz
WINNT 5.2 mozilla-central build on 2010/02/06 10:25:59
s: win32-slave11
e:\builds\moz2_slave\mozilla-central-win32\build\toolkit\components\places\src\nsannotationservice.cpp(457) : fatal error C1002: compiler is out of heap space in pass 2
Comment 61 Daniel Holbert [:dholbert] 2010-02-06 16:15:32 PST
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1265492246.1265498536.15152.gz
WINNT 5.2 mozilla-central build on 2010/02/06 13:37:26
s: win32-slave35
e:\builds\moz2_slave\mozilla-central-win32\build\toolkit\components\places\src\nsannotationservice.cpp(457) : fatal error C1002: compiler is out of heap space in pass 2
Comment 62 Daniel Holbert [:dholbert] 2010-02-06 17:54:44 PST
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1265495977.1265507330.14132.gz
WINNT 5.2 mozilla-central build on 2010/02/06 14:39:37
s: win32-slave18
e:\builds\moz2_slave\mozilla-central-win32\build\toolkit\components\places\src\nsannotationservice.cpp(457) : fatal error C1002: compiler is out of heap space in pass 2
Comment 63 Phil Ringnalda (:philor) 2010-02-06 19:47:17 PST
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1265504572.1265512923.9886.gz
WINNT 5.2 mozilla-central build on 2010/02/06 17:02:52
s: win32-slave09
e:\builds\moz2_slave\mozilla-central-win32\build\toolkit\components\places\src\nsannotationservice.cpp(457) : fatal error C1002: compiler is out of heap space in pass 2
Comment 64 Phil Ringnalda (:philor) 2010-02-07 09:54:24 PST
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1265544806.1265556316.6236.gz
WINNT 5.2 mozilla-central build on 2010/02/07 04:13:26
s: win32-slave08
e:\builds\moz2_slave\mozilla-central-win32\build\toolkit\components\places\src\nsannotationservice.cpp(457) : fatal error C1001: An internal error has occurred in the compiler.
Comment 65 Phil Ringnalda (:philor) 2010-02-07 10:03:09 PST
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1265542041.1265554118.7301.gz
WINNT 5.2 mozilla-central build on 2010/02/07 03:27:21
s: win32-slave17
e:\builds\moz2_slave\mozilla-central-win32\build\toolkit\components\places\src\nsannotationservice.cpp(457) : fatal error C1001: An internal error has occurred in the compiler.
Comment 66 Phil Ringnalda (:philor) 2010-02-07 10:35:32 PST
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1265558179.1265567390.11977.gz
WINNT 5.2 mozilla-central build on 2010/02/07 07:56:19
s: win32-slave32
e:\builds\moz2_slave\mozilla-central-win32\build\toolkit\components\places\src\nsannotationservice.cpp(457) : fatal error C1002: compiler is out of heap space in pass 2
Comment 67 Phil Ringnalda (:philor) 2010-02-07 10:37:30 PST
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1265540523.1265550443.27831.gz
WINNT 5.2 mozilla-central nightly on 2010/02/07 03:02:03
s: win32-slave35
e:\builds\moz2_slave\mozilla-central-win32-nightly\build\toolkit\components\places\src\nsannotationservice.cpp(457) : fatal error C1001: An internal error has occurred in the compiler.
Comment 68 Nick Thomas [:nthomas] 2010-02-07 12:59:20 PST
Related issue:

http://tinderbox.mozilla.org/showlog.cgi?log=TraceMonkey/1265542320.1265547707.27169.gz
WINNT 5.2 tracemonkey nightly on 2010/02/07 03:32:00
s: win32-slave50

link -NOLOGO -DLL -OUT:mozjs.dll .... -LTCG:PGUPDATE ...
PGOMGR : warning PG0188: No .PGC files matching 'mozjs!*.pgc' were found.
   Creating library mozjs.lib and object mozjs.exp
Generating code
1759 of 5262 ( 33.43%) profiled functions will be compiled for speed
NEXT ERROR e:\builds\moz2_slave\tracemonkey-win32-nightly\build\js\src\jshashtable.h(297) : fatal error C1001: An internal error has occurred in the compiler.
(compiler file 'F:\SP\vctools\compiler\utc\src\P2\main.c', line 216)
 To work around this problem, try simplifying or changing the program near the locations listed above.
Please choose the Technical Support command on the Visual C++ 
 Help menu, or open the Technical Support help file for more information
LINK : fatal error LNK1257: code generation failed

LNK1257 is "failed to perform code generation" when using /GL.
Comment 69 Phil Ringnalda (:philor) 2010-02-07 17:03:50 PST
mozilla-central nightly number two for the day:

http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1265575600.1265584996.14792.gz
WINNT 5.2 mozilla-central nightly on 2010/02/07 12:46:40
s: win32-slave26
e:\builds\moz2_slave\mozilla-central-win32-nightly\build\toolkit\components\places\src\nsannotationservice.cpp(457) : fatal error C1001: An internal error has occurred in the compiler.
Comment 70 Phil Ringnalda (:philor) 2010-02-07 20:58:50 PST
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1265596494.1265603851.32325.gz
WINNT 5.2 mozilla-central build on 2010/02/07 18:34:54
s: win32-slave25
e:\builds\moz2_slave\mozilla-central-win32\build\toolkit\components\places\src\nsannotationservice.cpp(457) : fatal error C1001: An internal error has occurred in the compiler.

and nightly number three:

http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1265587011.1265595645.4782.gz
WINNT 5.2 mozilla-central nightly on 2010/02/07 15:56:51
s: win32-slave34
e:\builds\moz2_slave\mozilla-central-win32-nightly\build\toolkit\components\places\src\nsannotationservice.cpp(457) : fatal error C1001: An internal error has occurred in the compiler.
Comment 71 Ted Mielczarek [:ted.mielczarek] 2010-02-08 04:48:29 PST
(In reply to comment #68)
> Related issue:

Ouch. mozjs.dll is much smaller than xul.dll, if this was a code size issue I wouldn't expect to see it there.
Comment 72 John O'Duinn [:joduinn] (please use "needinfo?" flag) 2010-02-08 11:56:42 PST
Per irc with bhearsum, this caused the FF3.7a1 win32 builds to fail.
Comment 73 Phil Ringnalda (:philor) 2010-02-08 12:59:52 PST
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1265648812.1265661852.1294.gz
WINNT 5.2 mozilla-central build on 2010/02/08 09:06:52
s: win32-slave13
e:\builds\moz2_slave\mozilla-central-win32\build\toolkit\components\places\src\nsannotationservice.cpp(457) : fatal error C1001: An internal error has occurred in the compiler.
Comment 74 Phil Ringnalda (:philor) 2010-02-08 15:56:17 PST
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1265663168.1265671384.15320.gz
WINNT 5.2 mozilla-central build on 2010/02/08 13:06:08
s: win32-slave43
e:\builds\moz2_slave\mozilla-central-win32\build\toolkit\components\places\src\nsannotationservice.cpp(457) : fatal error C1001: An internal error has occurred in the compiler.

http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1265664574.1265671219.13465.gz
WINNT 5.2 mozilla-central build on 2010/02/08 13:29:34
s: win32-slave16
e:\builds\moz2_slave\mozilla-central-win32\build\toolkit\components\places\src\nsannotationservice.cpp(457) : fatal error C1001: An internal error has occurred in the compiler.
Comment 75 Chris AtLee [:catlee] 2010-02-08 16:54:40 PST
This (or something similar) is also happening on tracemonkey in the js library.

I tried out vs2008, and got a similar crash:
make export
make[1]: Entering directory `/e/builds/moz2_slave/tracemonkey-win32/build/obj-firefox-cl15/js/src'
make[2]: Entering directory `/e/builds/moz2_slave/tracemonkey-win32/build/obj-firefox-cl15/js/src/config'
make[3]: Entering directory `/e/builds/moz2_slave/tracemonkey-win32/build/obj-firefox-cl15/js/src/config/mkdepend'
make[3]: Nothing to be done for `export'.
make[3]: Leaving directory `/e/builds/moz2_slave/tracemonkey-win32/build/obj-firefox-cl15/js/src/config/mkdepend'
e:/builds/moz2_slave/tracemonkey-win32/build/obj-firefox-cl15/js/src/config/nsinstall.exe nsinstall.exe ../../../dist/bin
make[2]: Leaving directory `/e/builds/moz2_slave/tracemonkey-win32/build/obj-firefox-cl15/js/src/config'
make[2]: Entering directory `/e/builds/moz2_slave/tracemonkey-win32/build/obj-firefox-cl15/js/src/shell'
d:/mozilla-build/python25/python2.5.exe /e/builds/moz2_slave/tracemonkey-win32/build/js/src/build/win32/pgomerge.py \
	  js ../../../dist/bin
make[2]: Leaving directory `/e/builds/moz2_slave/tracemonkey-win32/build/obj-firefox-cl15/js/src/shell'
make[2]: Entering directory `/e/builds/moz2_slave/tracemonkey-win32/build/obj-firefox-cl15/js/src/jsapi-tests'
d:/mozilla-build/python25/python2.5.exe /e/builds/moz2_slave/tracemonkey-win32/build/js/src/build/win32/pgomerge.py \
	  jsapi-tests ../../../dist/bin
make[2]: Leaving directory `/e/builds/moz2_slave/tracemonkey-win32/build/obj-firefox-cl15/js/src/jsapi-tests'
make[2]: Entering directory `/e/builds/moz2_slave/tracemonkey-win32/build/obj-firefox-cl15/js/src/tests'
make[2]: Nothing to be done for `export'.
make[2]: Leaving directory `/e/builds/moz2_slave/tracemonkey-win32/build/obj-firefox-cl15/js/src/tests'
d:/mozilla-build/python25/python2.5.exe /e/builds/moz2_slave/tracemonkey-win32/build/js/src/build/win32/pgomerge.py \
	  mozjs ./../../dist/bin
e:/builds/moz2_slave/tracemonkey-win32/build/obj-firefox-cl15/js/src/config/nsinstall.exe -m 644 js-config.h jsautocfg.h /e/builds/moz2_slave/tracemonkey-win32/build/obj-firefox-cl15/js/src/jsautokw.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/js.msg /e/builds/moz2_slave/tracemonkey-win32/build/js/src/jsapi.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/jsarray.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/jsarena.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/jsatom.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/jsbit.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/jsbool.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/jsclist.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/jscntxt.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/jscompat.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/jsdate.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/jsdbgapi.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/jsdhash.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/jsdtoa.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/jsemit.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/jsfun.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/jsgc.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/jshash.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/jsinterp.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/jsinttypes.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/jsiter.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/jslock.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/jslong.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/jsmath.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/jsnum.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/jsobj.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/jsobjinlines.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/json.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/jsopcode.tbl 
/e/builds/moz2_slave/tracemonkey-win32/build/js/src/jsopcode.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/jsotypes.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/jsparse.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/jsprf.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/jsproto.tbl /e/builds/moz2_slave/tracemonkey-win32/build/js/src/jsprvtd.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/jspubtd.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/jsregexp.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/jsscan.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/jsscope.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/jsscript.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/jsscriptinlines.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/jsstaticcheck.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/jsstr.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/jstask.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/jstracer.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/jstypedarray.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/jstypes.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/jsutil.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/jsvector.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/jstl.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/jshashtable.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/jsversion.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/jsxdrapi.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/jsxml.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/jsbuiltins.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/nanojit/Assembler.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/nanojit/Allocator.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/nanojit/CodeAlloc.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/nanojit/Containers.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/nanojit/LIR.h 
/e/builds/moz2_slave/tracemonkey-win32/build/js/src/nanojit/avmplus.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/nanojit/Fragmento.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/nanojit/Native.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/nanojit/Nativei386.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/nanojit/RegAlloc.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/nanojit/nanojit.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/nanojit/VMPI.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/jscpucfg.h ./../../dist/include
mkdir -p nanojit
make[1]: Leaving directory `/e/builds/moz2_slave/tracemonkey-win32/build/obj-firefox-cl15/js/src'
make libs
make[1]: Entering directory `/e/builds/moz2_slave/tracemonkey-win32/build/obj-firefox-cl15/js/src'
link -NOLOGO -DLL -OUT:mozjs.dll -PDB:mozjs.pdb -SUBSYSTEM:WINDOWS  jsapi.obj jsarena.obj jsarray.obj jsatom.obj jsbool.obj jscntxt.obj jsdate.obj jsdbgapi.obj jsdhash.obj jsdtoa.obj jsemit.obj jsexn.obj jsfun.obj jsgc.obj jshash.obj jsinterp.obj jsinvoke.obj jsiter.obj jslock.obj jslog2.obj jsmath.obj jsnum.obj jsobj.obj json.obj jsopcode.obj jsparse.obj jsprf.obj jsregexp.obj jsscan.obj jsscope.obj jsscript.obj jsstr.obj jstask.obj jstypedarray.obj jsutil.obj jsxdrapi.obj jsxml.obj prmjtime.obj jstracer.obj Assembler.obj Allocator.obj CodeAlloc.obj Containers.obj Fragmento.obj LIR.obj RegAlloc.obj avmplus.obj Nativei386.obj jsbuiltins.obj VMPI.obj     -MANIFESTUAC:NO -NXCOMPAT -DYNAMICBASE -SAFESEH  -DEBUG -OPT:REF -LTCG:PGUPDATE   e:/builds/moz2_slave/tracemonkey-win32/build/obj-firefox-cl15/dist/lib/nspr4.lib e:/builds/moz2_slave/tracemonkey-win32/build/obj-firefox-cl15/dist/lib/plc4.lib e:/builds/moz2_slave/tracemonkey-win32/build/obj-firefox-cl15/dist/lib/plds4.lib  kernel32.lib user32.lib gdi32.lib winmm.lib wsock32.lib advapi32.lib   
PGOMGR : warning PG0188: No .PGC files matching 'mozjs!*.pgc' were found.
   Creating library mozjs.lib and object mozjs.exp
Generating code
1766 of 5262 ( 33.56%) profiled functions will be compiled for speed
e:\builds\moz2_slave\tracemonkey-win32\build\js\src\jshashtable.h(297) : fatal error C1001: An internal error has occurred in the compiler.
(compiler file 'f:\dd\vctools\compiler\utc\src\p2\main.c[0xE8575000:0xE8575000]', line 182)
 To work around this problem, try simplifying or changing the program near the locations listed above.
Please choose the Technical Support command on the Visual C++ 
 Help menu, or open the Technical Support help file for more information

LINK : fatal error LNK1000: Internal error during IMAGE::BuildImage

  Version 9.00.21022.08

  ExceptionCode            = C0000005
  ExceptionFlags           = 00000000
  ExceptionAddress         = E8575000
  NumberParameters         = 00000002
  ExceptionInformation[ 0] = 00000000
  ExceptionInformation[ 1] = E8575000

CONTEXT:
  Eax    = FFFFFFFC  Esp    = 0012ECC8
  Ebx    = 00000008  Ebp    = 0012ECD8
  Ecx    = 51037C01  Esi    = 0012ED29
  Edx    = 0012ED2A  Edi    = 0852043C
  Eip    = E8575000  EFlags = 00010297
  SegCs  = 0000001B  SegDs  = 00000023
  SegSs  = 00000023  SegEs  = 00000023
  SegFs  = 0000003B  SegGs  = 00000000
  Dr0    = 00000000  Dr3    = 00000000
  Dr1    = 00000000  Dr6    = 00000000
  Dr2    = 00000000  Dr7    = 00000000
make[1]: Leaving directory `/e/builds/moz2_slave/tracemonkey-win32/build/obj-firefox-cl15/js/src'
Comment 76 Phil Ringnalda (:philor) 2010-02-08 19:24:37 PST
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1265675212.1265684889.10919.gz
WINNT 5.2 mozilla-central build on 2010/02/08 16:26:52
s: win32-slave05
e:\builds\moz2_slave\mozilla-central-win32\build\toolkit\components\places\src\nsannotationservice.cpp(457) : fatal error C1001: An internal error has occurred in the compiler.
Comment 77 Phil Ringnalda (:philor) 2010-02-09 08:31:19 PST
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1265708141.1265713826.22130.gz
WINNT 5.2 mozilla-central build on 2010/02/09 01:35:41
s: win32-slave05
e:\builds\moz2_slave\mozilla-central-win32\build\toolkit\components\places\src\nsannotationservice.cpp(457) : fatal error C1001: An internal error has occurred in the compiler.
Comment 78 Ted Mielczarek [:ted.mielczarek] 2010-02-09 11:48:26 PST
I filed a Microsoft Connect bug on this:
https://connect.microsoft.com/VisualStudio/feedback/details/532306/internal-error-during-image-buildimage-during-pgo-link
Comment 79 Ted Mielczarek [:ted.mielczarek] 2010-02-09 11:48:40 PST
(on the JS build)
Comment 80 Chris AtLee [:catlee] 2010-02-09 11:52:36 PST
Ted and I are going to try reproducing on revision http://hg.mozilla.org/mozilla-central/rev/43e818c28059, which we know failed to build the first time for 3.7a1.

I'm going to try re-linking xul.dll continuously to see if it fails.

Ted is going to try re-building the whole tree continuously.
Comment 81 Phil Ringnalda (:philor) 2010-02-09 12:23:39 PST
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1265734335.1265746795.32657.gz
WINNT 5.2 mozilla-central build on 2010/02/09 08:52:15
s: win32-slave19
e:\builds\moz2_slave\mozilla-central-win32\build\toolkit\components\places\src\nsannotationservice.cpp(457) : fatal error C1001: An internal error has occurred in the compiler.
Comment 82 Phil Ringnalda (:philor) 2010-02-09 20:58:16 PST
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1265763290.1265773452.10156.gz
WINNT 5.2 mozilla-central build on 2010/02/09 16:54:50
s: win32-slave28
e:\builds\moz2_slave\mozilla-central-win32\build\toolkit\components\places\src\nsannotationservice.cpp(457) : fatal error C1002: compiler is out of heap space in pass 2
Comment 83 Phil Ringnalda (:philor) 2010-02-09 21:07:31 PST
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1265768094.1265777945.28286.gz
WINNT 5.2 mozilla-central build on 2010/02/09 18:14:54
s: win32-slave27
e:\builds\moz2_slave\mozilla-central-win32\build\toolkit\components\places\src\nsannotationservice.cpp(457) : fatal error C1002: compiler is out of heap space in pass 2
Comment 84 Chris AtLee [:catlee] 2010-02-10 06:11:55 PST
Unable to reproduce by continually re-linking xul.dll.  Going to try full rebuilds now.
Comment 85 Chris AtLee [:catlee] 2010-02-10 06:18:03 PST
(In reply to comment #84)
> Unable to reproduce by continually re-linking xul.dll.  Going to try full
> rebuilds now.

xul.dll was re-linked 12 times by doing:
rm xul.dll
make MOZ_PROFILE_USE=1

in objdir/toolkit/library
Comment 86 Ted Mielczarek [:ted.mielczarek] 2010-02-10 07:26:06 PST
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1265803136.1265815323.2540.gz
WINNT 5.2 mozilla-central build on 2010/02/10 03:58:56
Comment 87 Ted Mielczarek [:ted.mielczarek] 2010-02-10 11:34:16 PST
I've had almost non-stop PGO builds running on my local machine on the changeset from comment 80 with no failures yet. My local environment differs from tinderbox in that it's:
1) Windows 7 x64 with 4GB ram
2) Visual C++ 2008

I think we'll see how catlee's testing goes. If he can reproduce it there, then maybe I'll try installing VC2005 and narrow down the differences.
Comment 88 Ryan VanderMeulen [:RyanVM] 2010-02-10 12:11:02 PST
FWIW, I haven't run into this error yet with any of my builds on my system. I'm also Win7 x64 w/ 4GB of RAM. I'm using VC2005SP1 still, though.
Comment 89 Phil Ringnalda (:philor) 2010-02-10 12:57:29 PST
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1265825366.1265835278.9909.gz
WINNT 5.2 mozilla-central build on 2010/02/10 10:09:26
s: win32-slave31
e:\builds\moz2_slave\mozilla-central-win32\build\toolkit\components\places\src\nsannotationservice.cpp(457) : fatal error C1001: An internal error has occurred in the compiler.
Comment 90 Phil Ringnalda (:philor) 2010-02-10 17:08:59 PST
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1265838075.1265850047.15954.gz
WINNT 5.2 mozilla-central build on 2010/02/10 13:41:15
s: win32-slave02
e:\builds\moz2_slave\mozilla-central-win32\build\toolkit\components\places\src\nsannotationservice.cpp(457) : fatal error C1001: An internal error has occurred in the compiler.
Comment 91 Chris AtLee [:catlee] 2010-02-11 05:43:51 PST
Full rebuilds (by deleting the object directory) have failed to reproduce this crash on the VM after 7 attempts.

I'll try putting disk / memory pressure on the VM to see if that causes it to fail.
Comment 92 Timothy Nikkel (:tnikkel) 2010-02-12 12:47:31 PST
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1265986899.1265994905.14820.gz
This log is interesting because it fails in view\src\nsview.cpp, not nsannotationservice.cpp.
Comment 93 Chris AtLee [:catlee] 2010-02-12 13:00:30 PST
So, putting heavy memory pressure on the VM caused the machine to exhaust virtual memory and almost hang....but no crash!

After killing everything, I restarted full builds and got a crash after the 2nd attempt.  Re-linking xul.dll with MOZ_PROFILE_USE=1 also crashed a second time.

I'm currently re-linking with LINK_REPRO set so we can try and get a reproducible test case out of it.  I've also tarred up the entire directory (1.6 GB) if anybody wants to examine it.
Comment 95 Chris AtLee [:catlee] 2010-02-12 13:14:22 PST
The page file on this machine has grown to 1529 MB.  The maximum is 1536 MB.
Comment 96 Chris AtLee [:catlee] 2010-02-12 14:41:43 PST
One of the new hardware windows machines also failed with this error.  This machine has 4 GB of RAM, and a 2 GB page file, which makes me think we're hitting the 2 GB address space limit.
Comment 97 Jesse Ruderman 2010-02-12 15:46:24 PST
Does Microsoft ship 64-bit versions of the compiler? ;)
Comment 98 Ryan VanderMeulen [:RyanVM] 2010-02-12 15:49:03 PST
If you're bumping into a 2GB process limit, might the /3GB switch help?
http://msdn.microsoft.com/en-us/library/aa366778%28VS.85%29.aspx
Comment 99 Jesse Ruderman 2010-02-12 15:49:39 PST
The mozilla-central tree is now closed because we haven't had a successful Windows opt build in nearly 20 hours.
Comment 100 Ted Mielczarek [:ted.mielczarek] 2010-02-12 15:52:08 PST
The linker can use >2GB of address space if it's available. I don't know that Microsoft has a 64-bit version of the 32-bit compiler. It's possible the 3GB thing might help, since that'd give the process 3GB of VM. Switching to a 64-bit OS would give it 4GB of VM.
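
To spell out the arithmetic behind those three numbers, here is a rough sketch. The helper name and its parameters are made up for illustration, and it assumes the usual rules for a 32-bit process on Windows: 2GB of user address space by default, up to 3GB when the OS is booted with /3GB, 4GB on a 64-bit OS, and the larger figures only for executables marked large-address-aware.

```cpp
// Hypothetical helper, for illustration only: usable user-mode address
// space (in GB) for a 32-bit process on Windows.
inline unsigned UserAddressSpaceGB(bool os64Bit, bool bootedWith3GB,
                                   bool largeAddressAware) {
    // Without the large-address-aware flag the process is capped at 2GB
    // regardless of how the OS is configured.
    if (!largeAddressAware) return 2;
    // A 64-bit OS gives a large-address-aware 32-bit process the full 4GB.
    if (os64Bit) return 4;
    // On a 32-bit OS, /3GB moves the user/kernel split from 2GB/2GB to 3GB/1GB.
    return bootedWith3GB ? 3 : 2;
}
```

For the build slaves (32-bit OS, no /3GB) this gives 2; with /3GB it gives 3, provided link.exe really is large-address-aware as stated above.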
Comment 101 Kyle Huey [:khuey] (Exited; not receiving bugmail, email if necessary) 2010-02-12 16:05:20 PST
Per http://msdn.microsoft.com/en-us/library/hs24szh9%28VS.100%29.aspx (and previous versions) there is no 64 bit x86 compiler.
Comment 102 John O'Duinn [:joduinn] (please use "needinfo?" flag) 2010-02-12 16:37:43 PST
Response from MSFT 

We have managed to reproduce the problem with the link repro you have provided. Thanks! Unfortunately, we cannot fix the bug in Visual Studio 2010 due to time constraints, but we will make sure it is fixed in subsequent releases.

In the meantime, we suggest two workarounds for you. The code exposing the bug is around Line 286 in file jstracer.cpp, relating to the use of _BitScanReverse. The first workaround is to turn off optimization for the function altogether, using "#pragma optimize ("", off)". More details can be found at http://msdn.microsoft.com/en-us/library/chh3fb0k(VS.80).aspx .

The 1st workaround may have unpleasant performance impact. Another workaround is to rewrite some of the code. The source operand of this particular _BitScanReverse call is actually a constant (0x3ff). You should be able to avoid the bug by coding the result of _BitScanReverse(0x3ff) directly and get rid of the call.
Comment 103 John O'Duinn [:joduinn] (please use "needinfo?" flag) 2010-02-12 16:44:35 PST
(In reply to comment #102)
> Response from MSFT 
> 
> We have managed to reproduce the problem with the link repro you have provided.
> Thanks! Unfortunately, we cannot fix the bug in Visual Studio 2010 due to time
> constraints, but we will make sure it is fixed in subsequent releases.
> 
> In the meantime, we suggest two workarounds for you. The code exposing the bug
> is around Line 286 in file jstracer.cpp, relating to the use of
> _BitScanReverse. The first workaround is to turn off optimization for the
> function altogether, using "#pragma optimize ("", off)". More details can be
> found at http://msdn.microsoft.com/en-us/library/chh3fb0k(VS.80).aspx .

Looks like gavin already tried that without success.


> The 1st workaround may have unpleasant performance impact. Another workaround
> is to rewrite some of the code. The source operand of this particular
> _BitScanReverse call is actually a constant (0x3ff). You should be able to
> avoid the bug by coding the result of _BitScanReverse(0x3ff) directly and get
> rid of the call.

Can someone who understands _BitScanReverse please give this a try?
Comment 104 John O'Duinn [:joduinn] (please use "needinfo?" flag) 2010-02-12 16:49:39 PST
(In reply to comment #103)
> (In reply to comment #102)
> > Response from MSFT 
> > 
> > We have managed to reproduce the problem with the link repro you have provided.
> > Thanks! Unfortunately, we cannot fix the bug in Visual Studio 2010 due to time
> > constraints, but we will make sure it is fixed in subsequent releases.
> > 
> > In the meantime, we suggest two workarounds for you. The code exposing the bug
> > is around Line 286 in file jstracer.cpp, relating to the use of
> > _BitScanReverse. The first workaround is to turn off optimization for the
> > function altogether, using "#pragma optimize ("", off)". More details can be
> > found at http://msdn.microsoft.com/en-us/library/chh3fb0k(VS.80).aspx .
> 
> Looks like gavin already tried that without success.

Wait. dholbert/gavin tried this in nsAnnotationService.cpp, not jstracer.cpp. dholbert/gavin, can you also try this in jstracer.cpp?

(Thanks to "tn" in irc for catching that!)


> > The 1st workaround may have unpleasant performance impact. Another workaround
> > is to rewrite some of the code. The source operand of this particular
> > _BitScanReverse call is actually a constant (0x3ff). You should be able to
> > avoid the bug by coding the result of _BitScanReverse(0x3ff) directly and get
> > rid of the call.
> 
> Can someone who understands _BitScanReverse please give this a try?
Comment 105 Luke Wagner [:luke] 2010-02-12 18:15:05 PST
I already tried the #pragma trick for jstracer.cpp, but perhaps with inlining and optimization the #pragma got lost.  This patch

  http://hg.mozilla.org/tracemonkey/rev/feac51b74044

#pragmas the call sites.  If it works, perhaps the same would work for the nsAnnotationService.cpp bust as well.
Comment 106 Emanuel Hoogeveen [:ehoogeveen] 2010-02-13 06:09:13 PST
(In reply to comment #103)
> > The 1st workaround may have unpleasant performance impact. Another workaround
> > is to rewrite some of the code. The source operand of this particular
> > _BitScanReverse call is actually a constant (0x3ff). You should be able to
> > avoid the bug by coding the result of _BitScanReverse(0x3ff) directly and get
> > rid of the call.
> 
> Can someone who understands _BitScanReverse please give this a try?
0x3ff = 0b1111111111; 0 is the least significant bit, so the first operand will be set to 9, and the function will return 1.
_BitScanReverse itself is trivial - its result appears to be undefined for a mask of 0 (I've gotten it to return different values), but otherwise it can be defined like this: (idea taken, slightly modified, from http://graphics.stanford.edu/~seander/bithacks.html#IntegerLogLookup )

#define LT(n) n, n, n, n, n, n, n, n, n, n, n, n, n, n, n, n
static unsigned long const LogTable256[256] = 
{-1,    0,     1,     1,     2,     2,     2,     2,
  3,    3,     3,     3,     3,     3,     3,     3,
 LT(4), LT(5), LT(5), LT(6), LT(6), LT(6), LT(6),
 LT(7), LT(7), LT(7), LT(7), LT(7), LT(7), LT(7), LT(7)};
#undef LT

inline unsigned char _BitScanReverse2(unsigned long * i,
                                      unsigned long   v)
{ static unsigned long t, tt;
  *i = (tt = v >> 16) ? (t = tt >> 8) ? 24 + LogTable256[t]
                                      : 16 + LogTable256[tt]
                      : (t = v  >> 8) ?  8 + LogTable256[t]
                                      :  0 + LogTable256[v];
  return (v ? 1 : 0);
}

Of course, you need _BitScanReverse64 for 64-bit values, but that's already the case anyway. The above matches the output from Microsoft's version for all 32-bit values.
Comment 107 Emanuel Hoogeveen [:ehoogeveen] 2010-02-13 06:11:11 PST
Ah, of course the above should be _BitScanReverse, not _BitScanReverse2 - that was left over from testing.
Comment 108 Emanuel Hoogeveen [:ehoogeveen] 2010-02-13 06:15:00 PST
Oh, and the table can be unsigned char (was playing around with the first value before I realized it was undefined), and maybe the temporaries shouldn't be static for a multi-threaded environment. Sorry about the comment spam.
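
Folding the corrections from comments 107 and 108 into the snippet from comment 106 gives roughly the following. This is only a sketch, not code from the tree: the name PortableBitScanReverse is made up here to avoid clashing with the MSVC intrinsic, the table is unsigned char, the temporaries are plain locals, and the zero-mask case is handled explicitly by leaving the index untouched (an assumption, since the intrinsic's output is undefined there).

```cpp
#include <cstdint>

// kLog2Table[b] is the index of the highest set bit in byte b.
// Entry 0 is a placeholder; a zero mask is rejected before any lookup.
#define LT(n) n, n, n, n, n, n, n, n, n, n, n, n, n, n, n, n
static const unsigned char kLog2Table[256] = {
    0, 0, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3,
    LT(4), LT(5), LT(5), LT(6), LT(6), LT(6), LT(6),
    LT(7), LT(7), LT(7), LT(7), LT(7), LT(7), LT(7), LT(7)};
#undef LT

// Hypothetical stand-in for _BitScanReverse on 32-bit values.
// Returns 0 for a zero mask (index untouched), 1 otherwise.
inline unsigned char PortableBitScanReverse(std::uint32_t* index,
                                            std::uint32_t mask) {
    if (mask == 0) return 0;
    std::uint32_t hi = mask >> 16;
    std::uint32_t t;
    if (hi) {
        t = hi >> 8;  // top byte, if any bit is set there
        *index = t ? 24u + kLog2Table[t] : 16u + kLog2Table[hi];
    } else {
        t = mask >> 8;  // second-lowest byte
        *index = t ? 8u + kLog2Table[t] : kLog2Table[mask];
    }
    return 1;
}
```

With mask = 0x3ff the second-lowest byte is 3, whose highest set bit is bit 1, so the index comes out as 8 + 1 = 9, matching the value worked out in comment 106.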
Comment 109 Luke Wagner [:luke] 2010-02-13 15:23:19 PST
The #pragmas in comment 105 didn't work, so, since _BitScanReverse is the culprit, I just replaced the JS_CEILING_LOG2 with a dumb loop and that did fix the permanent red.  So, thanks Ted, John and others!
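
For anyone wondering what a "dumb loop" version of a ceiling-log2 can look like, here is a minimal sketch. It is not the patch that landed; CeilingLog2 is a hypothetical name, and returning 0 for an input of 0 is an assumption made for this example.

```cpp
#include <cstdint>

// Hypothetical helper: smallest k such that (1 << k) >= n.
// Uses no intrinsics, so there is no _BitScanReverse call to optimize.
inline unsigned CeilingLog2(std::uint32_t n) {
    unsigned bits = 0;
    // 64-bit accumulator so the shift cannot overflow when n > 2^31.
    std::uint64_t power = 1;
    while (power < n) {
        power <<= 1;
        ++bits;
    }
    return bits;
}
```

The loop runs at most 32 times, which is fine for a function that is not performance critical.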
Comment 110 Emanuel Hoogeveen [:ehoogeveen] 2010-02-13 15:31:54 PST
Is that function at all performance critical? The version of _BitScanReverse I gave above is about as fast in terms of operation count as such a function gets, and you could probably use the code directly in JS_CEILING_LOG2 with slight modifications. Regardless, I'm glad this is fixed.
Comment 111 Jim Jeffery not reading bug-mail 1/2/11 2010-02-13 15:57:04 PST
Is this then good to go on the production side so the tree can reopen?
Comment 112 Luke Wagner [:luke] 2010-02-13 16:22:58 PST
(In reply to comment #110)
> Is that function at all performance critical?

Nope (http://hg.mozilla.org/tracemonkey/rev/3c7e7c13c311), but thanks for the suggestions.

(In reply to comment #111)
> Is this then good to go on the production side so the tree can reopen?

I'm sorry I wasn't more specific, but this is for the ICE in jshashtable.h on  TraceMonkey.  The m-c issues seem to be, AFAICS, unrelated to the use of intrinsics.
Comment 113 Jim Jeffery not reading bug-mail 1/2/11 2010-02-13 16:33:50 PST
> (In reply to comment #111)
> > Is this then good to go on the production side so the tree can reopen?
> 
> I'm sorry I wasn't more specific, but this is for the ICE in jshashtable.h on 
> TraceMonkey.  The m-c issues seem to be, AFAICS, unrelated to the use of
> intrinsics.

Oh, I was under the impression you were using TraceMonkey as a test guinea pig before committing to the production side.

So, in other words, we are still stuck for a fix on the compiler problem.
Comment 114 Kyle Huey [:khuey] (Exited; not receiving bugmail, email if necessary) 2010-02-13 18:58:45 PST
The tree is still closed, blocking on this bug, for which it appears there is no fix yet.

Jesse closed the tree specifically until we got one green cycle on Windows Opt, which we did.  The tree closing rationale has morphed (per Tinderbox) into waiting until this is fixed but it doesn't look like a fix is anywhere near in hand for this.  Should the tree still be blocking on this?
Comment 115 John Daggett (:jtd) 2010-02-13 23:24:38 PST
Has this been reproduced on a machine other than one of the build machines?  If so, what are the steps to reproduce this?  What OS version/compiler version/mozconfig is needed?  I tried to reproduce this with a VM based on Win XP SP2, VS2008 and a mozconfig similar to the nightly build but couldn't reproduce the compiler bug.

One thing that often helps in these cases is to track down the specific file where the problem occurs, then generate a preprocessed file which includes all header info. Then, using pragmas, figure out the minimum range within the code that causes the problem.
Comment 116 Ted Mielczarek [:ted.mielczarek] 2010-02-14 08:54:51 PST
This seems to be very difficult to reproduce on a non-build VM. catlee ran about 10 builds in a row on a build VM and failed to reproduce it. He finally reproduced it by running a script to exhaust virtual memory and then linking.

John: it's harder than that, because it's the linker crashing while doing the final PGO link, so the current testcase is "all of xul.dll". We think we might be exhausting the 32-bit address space here, I think RelEng wanted to try taking a build VM, bumping the ram to 3GB, and booting it with the Windows 3GB address space flag.

Also, I ran multiple PGO builds on my Win7 x64/ VC2008 machine (probably at least 7 in a row) without reproducing.
Comment 117 Justin Dolske [:Dolske] 2010-02-14 12:02:14 PST
(In reply to comment #114)
> Should the tree still be blocking on this?

I think so, at least without an *active* sheriff to gate checkins. It looks like Windows builds were starting to fail most of the time, so given the long build-to-test times we want to avoid people rushing in on green. Plus, backouts get long and painful when builds don't reliably finish.
Comment 118 Jesse Ruderman 2010-02-14 14:19:12 PST
Also, with only one green build a day, noticing a perf regression could take a week.
Comment 119 Justin Dolske [:Dolske] 2010-02-14 14:25:37 PST
[Probably shouldn't have an unowned tree blocker... Assigning to Ted since I heard him talking about it last week, but feel free to reassign to whomever is on the hook for this.]
Comment 120 Nick Thomas [:nthomas] 2010-02-14 18:48:15 PST
Regarding the memory exhaustion hypothesis, I re-ran today's nightly on win32-slave39 at a time when the VM infrastructure was under minimal load (a few try builds were running, plus the usual background from old branches that build continuously). It failed with
  LINK : fatal error LNK1000: Internal error during IMAGE::BuildImage
The VM had rebooted at 04:51 at the end of the previous run, then been idle until I started the nightly at 13:23.

I was monitoring the memory usage with the Task Manager during both link phases. Towards the end of the 2nd link (performing the PGO), link.exe has a 
 * "Mem Usage" of about 1170MB, which wikipedia informs me is the working set, the actual RAM in use
 * "VM Size" of 1111MB, which is a measure of the virtual address space of the process
In the "Physical Memory" box of the Performance tab of the task manager there was still more than 450MB "Available", more than 600MB in "System Cache" (which I take to be file cache in memory). Now I had looked away at the point that link blew up, but the graph for the Page File Usage (which is really plotting the Total Commit Charge, which is really the total virtual memory in use) did not increase dramatically. It went up a bit but not enough to exhaust the memory.

At catlee's suggestion I'm going to take the known-to-fail build dir from comment #93, put it on slave39 and see if I can reproduce the link failure. If I can then I'll try using /3GB in boot.ini.

Would it help any to see if Dr Watson catches the link crash and we can submit it to MS ?
Comment 121 Nick Thomas [:nthomas] 2010-02-14 20:46:41 PST
(In reply to comment #120)
> At catlee's suggestion I'm going to take the known-to-fail build dir from
> comment #93, put it on slave39 and see if I can reproduce the link failure. If
> I can then I'll try using /3GB in boot.ini.

Blew up at the first attempt. The sequence seems to be 
* link grabs a bunch of memory (>1GB), presumably by loading a bunch of files
* thinks for 45 minutes, slight increase in memory usage
* mspdbsvr uses cpu for a few seconds
* link gets the cpu back
* build craps out

Trying /3GB now.
Comment 122 Nick Thomas [:nthomas] 2010-02-15 02:03:59 PST
Created attachment 426957 [details]
process info

Got three successful links when booted with /3GB. The attachment shows link using up almost all of the 2GB of virtual memory space for applications (courtesy of VMMap from SysInternals), and lots of headroom left over when /3GB is set. This makes it clear that it's reserving a much larger hunk of the address space than task manager reports as actually in use. We'll have to check if there are any downsides to setting /3GB, eg see 
http://blogs.technet.com/askperf/archive/2007/03/23/memory-management-demystifying-3gb.aspx for some warnings, but relinking for PGO is likely to be the most stressful thing these machines are doing.

The question then is why do some PGOs fail and others succeed? When we generate the optimization profile (by launching Firefox and running sunspider) do we sometimes exercise more or less code paths? In particular in the history code, given the frequency of nsAnnotationService.cpp in the error messages. The app is open for a little over 2 minutes.

There are some differences in what PGO says it's going to do right after starting, eg in a log where it fails
 2535 of 104349 (  2.43%) profiled functions will be compiled for speed
vs. a succeeding log (both without /3GB)
 3674 of 103933 (  3.53%) profiled functions will be compiled for speed
I only looked at these two logs so far, so this might not be typical. When it did succeed there's also
 103933 of 106728 functions (97.4%) were optimized using profile data
 1770914794 of 1844595302 instructions (96.0%) were optimized using profile data
 Finished generating code
afterward, not clear how that relates to the original 3.53%.

It's late now so I'll have to pick this up again tomorrow.
Comment 123 Arisu 2010-02-15 07:11:45 PST
Just wanted to confirm that this crash is really caused by PGO.
I went through every build log that was posted here so far, which gave the
following statistics on where and why the compiler crashed:

     5x 0x10C9CFA1 => c2.dll!PogoReadSimpleProbes()+0xF1,
        which dereferenced a pointer by pgodb80.dll!PogoDbReadSimpleProbeEx()
    25x 0x10CBB2DA => c2.dll!PogoValReadValueCountsEx()+0x7A,
        which dereferenced a pointer by pgodb80.dll!PogoDbReadValueProbeEx()
    11x 0x10CBB356 => c2.dll!PogoValReadValueCountsEx()+0xF6,
        which dereferenced a pointer by pgodb80.dll!PogoDbReadValueProbeData()
    13x Out of heap space
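
A tally like this could be reproduced from the raw logs with a short script. The sketch below assumes that the first hex value inside the brackets of the C1001 message (as in the log quoted in the description) is the address listed in the table, and that out-of-heap failures carry the C1002 code from the bug title. `classify_failure` and `tally` are hypothetical names.

```python
import re
from collections import Counter

# The C1001 message embeds two hex values, e.g.
#   (compiler file '...\P2\main.c[0x10CBB356:0x339D0000]', line 182)
# Assumption: the first one is the crash address used in the table.
ICE_ADDRESS = re.compile(r"main\.c\[0x([0-9A-Fa-f]+):0x[0-9A-Fa-f]+\]")
OUT_OF_HEAP = "fatal error C1002"

def classify_failure(log_text):
    """Return the crash address, 'out of heap space', or 'unknown' for one log."""
    match = ICE_ADDRESS.search(log_text)
    if match:
        return "0x" + match.group(1).upper()
    if OUT_OF_HEAP in log_text:
        return "out of heap space"
    return "unknown"

def tally(logs):
    """Count how often each failure class appears across the given log texts."""
    return Counter(classify_failure(text) for text in logs)

sample_logs = [
    r"fatal error C1001: An internal error has occurred in the compiler. "
    r"(compiler file 'F:\SP\vctools\compiler\utc\src\P2\main.c[0x10CBB356:0x339D0000]', line 182)",
    "fatal error C1002: compiler is out of heap space in pass 2",
]
for failure, count in tally(sample_logs).most_common():
    print(f"{count}x {failure}")
```

Mapping an address back to a symbol such as `c2.dll!PogoValReadValueCountsEx()+0xF6` still needs a debugger with symbols loaded; the script only groups the failures.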

Said pointer often points to memory that is obviously not committed at all and
sometimes even impossible to allocate such as NULL, -1 or in kernel space.
I can't tell whether this is due to unchecked allocation failures / memory
exhaustion or plain flawed program logic.

However, other times the pointer seems to be plausible, so considering that the
process's VM is very populated, I'm almost sure that the compiler often
successfully reads some totally unrelated value from memory because it
happened to be allocated.

This would ultimately call into question the meaningfulness of the decisions PGO makes.
I hope raising the usermode VM to 3GB proves to be a reliable workaround, but
if it doesn't, we probably should do non-PGO builds until MSFT can fix this.

Debugging data can be collected by setting the LINK_REPRO environment variable to an empty folder:
    http://support.microsoft.com/kb/134650

or getting a recent version of the Windows Debugging Tools:
    http://www.microsoft.com/whdc/devtools/debugging/installx86.mspx

and then forcing link.exe to be run under the NTSD debugger to catch the crash:
    [HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\
    Image File Execution Options\link.exe]
        "Debugger" = "ntsd.exe -g -G"

This essentially prepends every command line intending to execute link.exe
with the given NTSD command line, with the "-g" and "-G" switches specifying
that the program's startup and termination breakpoints should be ignored.

Everything should build without intervention until something crashes, which
then breaks into NTSD. A dump of all writable pages can be taken there:
    .dump /mFhutpwd link_crash.dmp
and then quit the debugger with
    q
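
For slaves where editing the registry by hand is impractical, the key above could be written out as a .reg file and imported. This is only a sketch: `render_debugger_reg` is a hypothetical helper, the key path and debugger command are the ones given above, and the output follows the standard "Windows Registry Editor Version 5.00" text format.

```python
# Registry key named in this comment; per-image subkeys hang off it.
IFEO_KEY = (
    r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion"
    r"\Image File Execution Options"
)

def render_debugger_reg(image_name, debugger_command):
    """Return .reg file text that sets the Debugger value for one executable."""
    # String values in .reg files need backslashes and double quotes escaped.
    escaped = debugger_command.replace("\\", "\\\\").replace('"', '\\"')
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{IFEO_KEY}\\{image_name}]",
        f'"Debugger"="{escaped}"',
        "",
    ]
    return "\r\n".join(lines)

print(render_debugger_reg("link.exe", "ntsd.exe -g -G"))
```

The Debugger value has to be removed again once a dump has been captured, otherwise every later link on that machine keeps running under NTSD.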

But either way, since the crashes caused by compiling Tracemonkey occurred at
the same offsets, these are probably equal or related and MSFT responded that
they can't fix them even in the upcoming VS2010 release:
    https://connect.microsoft.com/VisualStudio/feedback/details/532306

At least, the 3GB VM should be able to fix the out-of-heap errors, so I'm
curious how this all will turn out. Hopefully with a cured, healthy tree.
Comment 124 Justin Dolske [:Dolske] 2010-02-15 14:52:11 PST
(In reply to comment #123)

> I'm almost sure that the compiler often
> successfully reads some totally unrelated value from memory because it
> happened to be allocated.

If that's true, it's a bit... terrifying. I wonder if any of the intermittent test failures could be due to this causing bad PGO builds (and, if so, maybe we should investigate a way to monitor memory usage during the build, so we can hard-fail the build if it starts using enough to do bad things).

> But either way, since the crashes caused by compiling Tracemonkey occurred at
> the same offsets, these are probably equal or related and MSFT responded that
> they can't fix them even in the upcoming VS2010 release:
>     https://connect.microsoft.com/VisualStudio/feedback/details/532306

Shaver has MSFT contacts; if the /3GB flag doesn't fix the problem, escalating through him is an option.
Comment 125 Ted Mielczarek [:ted.mielczarek] 2010-02-15 15:44:57 PST
Sounds like Nick has a plan. This isn't anything I can fix, for sure.
Comment 126 Emanuel Hoogeveen [:ehoogeveen] 2010-02-15 21:09:20 PST
(In reply to comment #124)
> If that's true, it's a bit... terrifying. I wonder if any of the intermittent
> test failures could be due to this causing bad PGO builds (and, if so, maybe we
> should investigate a way to monitor memory usage during the build, so we can
> hard-fail the build if it starts using enough to do bad things).
Alternatively, can we sandbox the build process to keep track of exactly what memory it allocates, and return 0 (or some other predefined value) if it tries to access anything it never allocated? I'm not saying that would fix things, or be a realistic long-term solution (especially considering what it would no doubt do to compilation speed ... at least PGO should always be relative to everything else), but it would at least give us some consistency and an idea of where and when things are failing.
Comment 127 Nick Thomas [:nthomas] 2010-02-16 01:09:06 PST
Created attachment 427083 [details] [diff] [review]
[opsi-package-sources] Add /3GB to boot.ini

Confirmed today that /3GB in boot.ini also fixes PGO on an ix slave (using the broken objdir catlee created earlier).

This opsi package updates boot.ini with /3GB. I've tested it on win32-slave21 and mw32-ix-slave01 - it deploys, can be backed out, and deployed again as expected. They had exactly the same boot.ini so that was helpful.
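
For readers without access to the attachment, the kind of edit involved can be sketched as follows. This is not the opsi package itself: `add_3gb_switch` is a hypothetical helper and the sample boot.ini is an illustrative Windows Server 2003 layout, not a copy of the file on the slaves.

```python
def add_3gb_switch(boot_ini_text):
    """Append /3GB to each entry under [operating systems] that lacks it."""
    result = []
    in_os_section = False
    for line in boot_ini_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            # Only entries in this section carry boot switches.
            in_os_section = stripped.lower() == "[operating systems]"
        elif in_os_section and "=" in stripped:
            # Skip entries that already have the switch so the edit is repeatable.
            if "/3gb" not in stripped.lower().split():
                line = line.rstrip() + " /3GB"
        result.append(line)
    return "\n".join(result) + "\n"

# Illustrative boot.ini contents (assumed, not taken from a real slave).
sample = "\n".join([
    "[boot loader]",
    "timeout=30",
    r"default=multi(0)disk(0)rdisk(0)partition(1)\WINDOWS",
    "[operating systems]",
    r'multi(0)disk(0)rdisk(0)partition(1)\WINDOWS="Windows Server 2003, Standard" /fastdetect',
]) + "\n"

print(add_3gb_switch(sample))
```

Skipping entries that already carry the switch is what lets the change be deployed, backed out, and deployed again without stacking duplicates.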

In order to get some full-build testing I've set up 
* staging-master02 with win32-slave03, 21, and mw32-ix-slave01 (/3GB in use)
* triggered a bunch of builds
 * WINNT 5.2 mozilla-central (build|nightly|leak test)
 * WINNT 5.2 mozilla-1.9.2 (build|unit test|leak test)
* the first builds started at Tue Feb 16 00:56 PST.
Please check the results there, and if they are OK go ahead and deploy this to the production VMs.

(win32-slave04 is also set to pick up the opsi package next time it reboots; it's on sm01).
Comment 128 Steve Shockley 2010-02-16 04:34:38 PST
You may want to be careful with /3GB; in my experience (with mostly server workloads) it can cause hard-to-diagnose problems with the base OS.  With /3GB enabled, if you have more than 4GB of physical memory installed you can run out of PTEs, and you can run out of pool memory regardless of total memory.  The article Nick Thomas (comment 122) mentioned is good; http://blogs.technet.com/markrussinovich/archive/2009/03/26/3211216.aspx also has a good description of pool limits and how you can monitor them.
Comment 129 Ted Mielczarek [:ted.mielczarek] 2010-02-16 04:43:04 PST
Thanks for the heads up. These machines only have 2GB installed, AFAIK (although the new hardware slaves have 4GB), and their workload consists entirely of running the build, basically. Running the PGO link phase is the most demanding task any of them will do. I wouldn't rule out hitting weird problems due to this (I've learned that there's no problem so weird that we can't hit it), but I'm optimistic.

I think the better long-term fix would be to switch to a 64-bit OS on the build machines, so the linker gets a full 4GB of address space.
Comment 130 Ben Hearsum (:bhearsum) 2010-02-16 05:08:46 PST
Also note that we're rebooting after every build, which likely reduces the chance of hitting any weirdness.
Comment 131 Ben Hearsum (:bhearsum) 2010-02-16 06:26:47 PST
(In reply to comment #127)
> Created an attachment (id=427083) [details]
> [opsi-package-sources] Add /3GB to boot.ini
> 
> Confirmed today that /3GB in boot.ini also fixes PGO on an ix slave (using the
> broken objdir catlee created earlier).
> 
> This opsi package updates boot.ini with /3GB. I've tested it on win32-slave21
> and mw32-ix-slave01 - it deploys, can be backed out, and deployed again as
> expected. They had exactly the same boot.ini so that was helpful.
> 
> In order to get some full-build testing I've set up 
> * staging-master02 with win32-slave03, 21, and mw32-ix-slave01 (/3GB in use)
> * triggered a bunch of builds
>  * WINNT 5.2 mozilla-central (build|nightly|leak test)
>  * WINNT 5.2 mozilla-1.9.2 (build|unit test|leak test)
> * the first builds started at Tue Feb 16 00:56 PST.
> Please check the results there, and if they are OK go ahead and deploy this to
> the production VMs.

I just had a look and there are no issues that I can see. I'll land this and get it rolling out.
Comment 132 Ben Hearsum (:bhearsum) 2010-02-16 06:33:26 PST
Comment on attachment 427083 [details] [diff] [review]
[opsi-package-sources] Add /3GB to boot.ini

changeset:   40:402576d3617e

I've set this package to roll out across all of the build farm. They'll pick it up on the next reboot -- expect to see at least a few more failures of this type, though. Not all of the slaves will be rebooting right away.
Comment 133 Ben Hearsum (:bhearsum) 2010-02-16 06:35:11 PST
(In reply to comment #132)
> (From update of attachment 427083 [details] [diff] [review])
> changeset:   40:402576d3617e

For posterity, I accidentally landed this as Lukas, rather than Nick. Sorry about that, both of you!
Comment 134 Nick Thomas [:nthomas] 2010-02-17 01:30:27 PST
win32-slave01 through 59 all have an updated boot.ini. We'll have to update the try slaves too if we enable PGO there.
Comment 135 Ted Mielczarek [:ted.mielczarek] 2010-02-17 04:16:08 PST
Nick, Ben, thanks for tackling this and getting it fixed!
Comment 136 Ben Hearsum (:bhearsum) 2010-02-17 05:11:32 PST
(In reply to comment #134)
> win32-slave01 through 59 all have an updated boot.ini. We'll have to update the
> try slaves too if we enable PGO there.

Thanks for updating this bug, I forgot to. I actually went ahead and rebooted all the try slaves while deploying this, so we're set to go there, too.
Comment 137 Armen Zambrano [:armenzg] (EDT/UTC-4) 2011-03-21 09:05:02 PDT
In bug 565402 I was looking to see if we need this change for 64-bit build machines.

From my reading of:
http://technet.microsoft.com/en-us/library/cc786709%28WS.10%29.aspx

It seems that we don't need to do the 4GT tuning (aka /3GB switch) as done for the Win2k3 32-bit machines [1].

The only thing that I noticed that might be needed is to set the IMAGE_FILE_LARGE_ADDRESS_AWARE flag [2].

[1]:
> 4-gigabyte tuning (4GT), also known as application memory tuning, or the /3GB
> switch, is a technology (only applicable to 32 bit systems) that alters the
> amount of virtual address space available to user mode applications. Enabling
> this technology reduces the overall size of the system virtual address space
> and therefore system resource maximums.

[2]:
> 2 GB with IMAGE_FILE_LARGE_ADDRESS_AWARE cleared (default)
> 4 GB with IMAGE_FILE_LARGE_ADDRESS_AWARE set
Comment 138 Ted Mielczarek [:ted.mielczarek] 2011-03-21 11:38:15 PDT
IMAGE_FILE_LARGE_ADDRESS_AWARE is a flag that gets set on executable files. The compiler and linker already have this set.
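
If anyone wants to double-check a particular binary, the flag can be read straight from the PE header. A minimal sketch, assuming the standard PE/COFF layout (the offset of the PE signature is stored at 0x3C, and Characteristics is the last field of the 20-byte COFF header); `is_large_address_aware` is a hypothetical helper name.

```python
import struct

IMAGE_FILE_LARGE_ADDRESS_AWARE = 0x0020  # bit in the COFF Characteristics field

def is_large_address_aware(pe_bytes):
    """Return True if the PE image in pe_bytes has the large-address-aware bit set."""
    if pe_bytes[:2] != b"MZ":
        raise ValueError("not a PE file: missing MZ signature")
    # e_lfanew: file offset of the "PE\0\0" signature, stored at 0x3C.
    (pe_offset,) = struct.unpack_from("<I", pe_bytes, 0x3C)
    if pe_bytes[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("not a PE file: missing PE signature")
    # The COFF header follows the 4-byte signature; Characteristics sits at offset 18.
    (characteristics,) = struct.unpack_from("<H", pe_bytes, pe_offset + 4 + 18)
    return bool(characteristics & IMAGE_FILE_LARGE_ADDRESS_AWARE)

def check_file(path):
    """Read the start of an executable from disk and test the flag."""
    with open(path, "rb") as handle:
        return is_large_address_aware(handle.read(4096))
```

`check_file` would be pointed at whichever link.exe the build actually uses; that path is not given in this bug.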
