Closed Bug 913251 Opened 11 years ago Closed 11 years ago

Permanent Windows PGO linker failure on UX: build\layout\base\nsdisplaylist.cpp(962) : fatal error C1001: An internal error has occurred in the compiler. LINK : fatal error LNK1000: Internal error during IMAGE::BuildImage

Categories

(Firefox Build System :: General, defect)

x86
Windows Server 2003
defect
Not set
critical

Tracking

(firefox24 unaffected, firefox25 unaffected, firefox26 fixed)

RESOLVED FIXED
mozilla26
Tracking Status
firefox24 --- unaffected
firefox25 --- unaffected
firefox26 --- fixed

People

(Reporter: MattN, Assigned: Gijs)

References

(Blocks 1 open bug, )

Details

(Keywords: intermittent-failure, Whiteboard: [Australis:M9][Australis:P1][pgo])

PGO builds (including Nightlies) started failing on the UX branch due to a linker error:
e:\builds\moz2_slave\ux-w32-pgo-0000000000000000000\build\layout\base\nsdisplaylist.cpp(952) : fatal error C1001: An internal error has occurred in the compiler.
LINK : fatal error LNK1000: Internal error during IMAGE::BuildImage 

linker max vsize: 3216949248

LibXUL Memory during link graph: http://graphs.mozilla.org/graph.html#tests=[[205,63,8],[205,59,8]]&sel=1370646167970,1378422167970&displayrange=90&datatype=running

We have had a much higher vsize before so I'm not sure why we would be hitting the address space limit now.

Pushlog: https://hg.mozilla.org/projects/ux/pushloghtml?startID=384&endID=387
From the thinking outside the box direction.  I t seems to me that continuing to try to make this work si a losing proposition.  All new Windows systems I have seen for a few years are 64-bit systems coming with a 64-bit operating system.  Therefore it would seem that the direction we should be taking is to do what is necessary to make 64-bit Windows build production ready and the reference platform and do PGO builds for Windows on 64-bit only.  We should still continue to produce non-PGO builds targeting 32-bit users.  Just my opinion, but it seems to me trying to keep this working is wasting resources on the wrong thing.
(In reply to Bill Gianopoulos [:WG9s] from comment #1)
> From the thinking outside the box direction.  I t seems to me that
> continuing to try to make this work si a losing proposition.  All new
> Windows systems I have seen for a few years are 64-bit systems coming with a
> 64-bit operating system.  Therefore it would seem that the direction we
> should be taking is to do what is necessary to make 64-bit Windows build
> production ready and the reference platform and do PGO builds for Windows on
> 64-bit only.  We should still continue to produce non-PGO builds targeting
> 32-bit users.  Just my opinion, but it seems to me trying to keep this
> working is wasting resources on the wrong thing.

This isn't the right bug to discuss how/why we do PGO the way we do. That being said, we *must* continue to support 32-bit Windows PGO builds for the foreseeable future.
(In reply to Gregory Szorc [:gps] from comment #2)
> (In reply to Bill Gianopoulos [:WG9s] from comment #1)
> > From the thinking outside the box direction.  I t seems to me that
> > continuing to try to make this work si a losing proposition.  All new
> > Windows systems I have seen for a few years are 64-bit systems coming with a
> > 64-bit operating system.  Therefore it would seem that the direction we
> > should be taking is to do what is necessary to make 64-bit Windows build
> > production ready and the reference platform and do PGO builds for Windows on
> > 64-bit only.  We should still continue to produce non-PGO builds targeting
> > 32-bit users.  Just my opinion, but it seems to me trying to keep this
> > working is wasting resources on the wrong thing.
> 
> This isn't the right bug to discuss how/why we do PGO the way we do. That
> being said, we *must* continue to support 32-bit Windows PGO builds for the
> foreseeable future.

I know that was hoping to get a where should I discuss this issue, and was not saying we should not support win-32 just maybe need to stop doing PGO on win-32.
MY point was people interested in running the latest/fastest are probably on a 64-bit OS by now.
This doesn't look like a linker memory issue, it just looks like an internal compiler error.
(In reply to Matthew N. [:MattN] from comment #6)
> Pushlog: https://hg.mozilla.org/projects/ux/pushloghtml?startID=384&endID=387

There were nightlies on https://tbpl.mozilla.org/?tree=UX&rev=c7e2dcb1a8eb which are clobber + PGO, and completed successfully, so I think this is just: https://tbpl.mozilla.org/?tree=UX&rev=8fae116e85c5 which is https://hg.mozilla.org/projects/ux/pushloghtml?startID=386&endID=387 .

Bisecting a merge cset isn't really my favourite job, but I'll start doing that shortly, as I don't see anything obvious in that list that'd cause this.
(In reply to :Gijs Kruitbosch from comment #6)
> Bisecting a merge cset isn't really my favourite job, but I'll start doing
> that shortly, as I don't see anything obvious in that list that'd cause this.

For https://hg.mozilla.org/projects/ux/rev/dd216d8a6a0a and its 50-odd csets: https://tbpl.mozilla.org/?tree=Try&rev=029f442194b6

For https://hg.mozilla.org/projects/ux/rev/47baeadc005b and its 80-odd csets: https://tbpl.mozilla.org/?tree=Try&rev=276f87efdd39

For https://hg.mozilla.org/projects/ux/rev/7c90e8e1b481 and its 120-odd csets: https://tbpl.mozilla.org/?tree=Try&rev=3cd6dca18e4b (with previous m-c merge followed by another m-i merge directly to ux, just for try).

Assuming I haven't screwed up, hopefully this will narrow things down. I have to say, as I was going through the list for these merges, that bug 730805 seems suspicious to me, but that's purely gut feeling...
Line 962 of nsDisplayList.cpp is a return statement in TreatAsOpaque, which returns an (empty) nsRegion. Looking at all the files touched by the csets that were merged, the nsRegion::Init change from bug 912054 jumped out, so I also fired off a try run where I speculatively back that out: https://tbpl.mozilla.org/?tree=Try&rev=33df960b3b8e
(In reply to :Gijs Kruitbosch from comment #7)
> (In reply to :Gijs Kruitbosch from comment #6)
> > Bisecting a merge cset isn't really my favourite job, but I'll start doing
> > that shortly, as I don't see anything obvious in that list that'd cause this.
> 
> For https://hg.mozilla.org/projects/ux/rev/dd216d8a6a0a and its 50-odd
> csets: https://tbpl.mozilla.org/?tree=Try&rev=029f442194b6
> 
> For https://hg.mozilla.org/projects/ux/rev/47baeadc005b and its 80-odd
> csets: https://tbpl.mozilla.org/?tree=Try&rev=276f87efdd39
> 
> For https://hg.mozilla.org/projects/ux/rev/7c90e8e1b481 and its 120-odd
> csets: https://tbpl.mozilla.org/?tree=Try&rev=3cd6dca18e4b (with previous
> m-c merge followed by another m-i merge directly to ux, just for try).
> 
> Assuming I haven't screwed up, hopefully this will narrow things down. I
> have to say, as I was going through the list for these merges, that bug
> 730805 seems suspicious to me, but that's purely gut feeling...

All of these are green. Assuming I didn't screw up my try pushes, and/or there isn't some stupid difference between try env and our UX env, this means the failure was introduced by the last 22-odd commits in the merge:

https://hg.mozilla.org/projects/ux/rev/13dcab876681  Steve Fink — Bug 912171 - Static rooting analysis: ignore ~AutoCompartment. r=NPOTB DONTBUILD                                               
https://hg.mozilla.org/projects/ux/rev/35cfea3fe4c0  Ryan VanderMeulen — Bug 902248 - Disable test_nsINavHistoryViewer.js on OSX.
https://hg.mozilla.org/projects/ux/rev/772246d9b5e8  Dan Minor — Bug 912004 - DeviceManager pushDir should take a timeout parameter; r=wlach
https://hg.mozilla.org/projects/ux/rev/ae9cbeff3bc6  Blake Kaplan — Bug 860123 - Use the right window's selection in some tests; r=ehsan
https://hg.mozilla.org/projects/ux/rev/94b9e5c9f994  Ms2ger — Bug 860123 - Make mochitests expect the new behavior. r=ehsan
https://hg.mozilla.org/projects/ux/rev/38792f737cf9  Blake Kaplan — Bug 860123 - Part 1: Disallow extending or collapsing selections across documents; r=ehsan
https://hg.mozilla.org/projects/ux/rev/df11d75d3271  Kannan Vijayan — Bug 909764 - Fix syntax parsing error that allows let bindings of eval in strict mode. r=jwalden r=jorendorff
https://hg.mozilla.org/projects/ux/rev/d5ee2c910a92  Joel Maher — Bug 912009 - update talos.json to remove "suite" specific filters as they are now in test.py. r=jhammel
https://hg.mozilla.org/projects/ux/rev/2255560d9b6c  Nathan Froyd — Bug 912054 - don't re-initialize mBoundRect in nsRegion::Init; r=BenWa
https://hg.mozilla.org/projects/ux/rev/1eacd8262e0f  Neil Deakin — Bug 896142, Entries appearing twice in form-fill autocomplete menus, r=mhammond
https://hg.mozilla.org/projects/ux/rev/8741f1f153ca  Neil Deakin — Bug 853901 - Add platform-specific completion operations for downloads. r=paolo
https://hg.mozilla.org/projects/ux/rev/1153ba67c760  Mihnea Dobrescu-Balaur — Bug 911347 - Fix xpcshell static file cleanup on Windows. r=ted
https://hg.mozilla.org/projects/ux/rev/e75808f87d67  Mihnea Dobrescu-Balaur — Bug 911249 - Don't block XPCShell test harness on hangs caused by os.kill on Windows. r=ted
https://hg.mozilla.org/projects/ux/rev/8a61a35876ff  Ryan VanderMeulen — Bug 912006 - Update pdf.js to version 0.8.478. r=bdahl
https://hg.mozilla.org/projects/ux/rev/eab3879def7d  Luke Wagner — Bug 911842 - OdinMonkey: disable Ion optimization passes that don't help (r=jandem)
https://hg.mozilla.org/projects/ux/rev/ed2a7239fc36  Luke Wagner — Bug 911834 - OdinMonkey: skip ApplyTypeInformation pass (r=jandem)
https://hg.mozilla.org/projects/ux/rev/45e1927c9136  Luke Wagner — Bug 911834 - OdinMonkey: create phis with a type directly (r=jandem)
https://hg.mozilla.org/projects/ux/rev/26b027841efe  Ryan VanderMeulen — Bug 627487 - Touch CLOBBER.
https://hg.mozilla.org/projects/ux/rev/a484e79df631  Blake Kaplan — Bug 912044 - Initialize rv so we don't use it uninitialized. r=bz
https://hg.mozilla.org/projects/ux/rev/321a502794eb  Eitan Isaacson — Bug 795984 - Implement speech output with Web Speech API. r=yzen
https://hg.mozilla.org/projects/ux/rev/a4fc8aae3053  Raymond Lee — Bug 627487 - Bookmark JSON backup should contain new-style GUIDs. r=mano
https://hg.mozilla.org/projects/ux/rev/192825330b3e  Po-Chun Chang — Bug 908527 - Avoid wasted work in PSM_SSL_BlacklistDigiNotar(). r=cviecoo
(In reply to :Gijs Kruitbosch from comment #8)
> Line 962 of nsDisplayList.cpp is a return statement in TreatAsOpaque, which
> returns an (empty) nsRegion. Looking at all the files touched by the csets
> that were merged, the nsRegion::Init change from bug 912054 jumped out, so I
> also fired off a try run where I speculatively back that out:
> https://tbpl.mozilla.org/?tree=Try&rev=33df960b3b8e

This is green too. It's in the range implied in comment #9, so it seems these results are saying this was the culprit. Can we just back this out of UX initially to see if Try isn't lying and this fixes things, and if so, back out of m-c and reopen the bug so people can figure out why this patch is breaking things? Ryan, asking you with your sheriff hat on! :-)
Blocks: 912054
Flags: needinfo?(ryanvm)
You can backout whatever you want from UX, it's not a sheriffed tree! :)
Flags: needinfo?(ryanvm)
Sanity check PGO try push with current UX tip: https://tbpl.mozilla.org/?tree=Try&rev=52c61061f1e7

(This should go red. If so, we'll back out https://hg.mozilla.org/projects/ux/rev/2255560d9b6c .)
The try push in comment 12 also failed to link (see comment 15) so I'm going to go ahead with backing bug 912054 out once my m-c update is done. Nathan is on PTO today and BenWa was fine with this.

Assigning this to Gijs since he did most of the investigation.
Assignee: nobody → gijskruitbosch+bugs
Status: NEW → ASSIGNED
Backed out bug 912054: https://hg.mozilla.org/mozilla-central/rev/62ac395d5f0b
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 26
Component: Build Config → General
Product: Firefox → Firefox Build System
Target Milestone: Firefox 26 → mozilla26
You need to log in before you can comment on or make changes to this bug.