Bug 709480 (PGOSilverBullet)

Switch to using 64 bit builders for 32 bit windows builds, to avoid 3GB virtual address space limit

RESOLVED FIXED

Status

Release Engineering
General
P1
critical
RESOLVED FIXED
6 years ago
4 years ago

People

(Reporter: emorley, Assigned: aki)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [opsi][buildslaves][Leave open after merge])

Attachments

(8 attachments, 2 obsolete attachments)

813 bytes, patch
ted
: review+
Details | Diff | Splinter Review
11.38 KB, patch
catlee
: review+
Details | Diff | Splinter Review
11.73 KB, patch
Details | Diff | Splinter Review
3.48 KB, patch
Details | Diff | Splinter Review
12.06 KB, patch
Details | Diff | Splinter Review
7.01 KB, patch
aki
: review-
Details | Diff | Splinter Review
1.82 KB, patch
aki
: review+
Details | Diff | Splinter Review
2.93 KB, patch
catlee
: review+
Details | Diff | Splinter Review
(Reporter)

Description

6 years ago
Bug 709193 discovered that the recent Win PGO build failures were due to hitting the virtual address space limit whilst linking. This happened before (bug 543034), and the fix then was to use the /3GB switch, which upped the limit from 2GB to 3GB. However we've now reached the point where this is still not enough.

Short term, bug 709193 is backing out some recent larger landings, so the trees can at least reopen, but people will not be able to add anything of significant size to libxul, which is going to limit development somewhat.

As such, whilst the transition is likely non-trivial (sorry releng!), we need to fairly quickly start using 64 bit builders for 32 bit windows builds, so the linker can use the full 4GB virtual address space.

(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from bug 709193 comment #8)
> We're already operating the PGO builds in a configuration that gives them 3
> GB of virtual address space.  If this is what's going wrong our only options
> are to split libxul apart or to move to doing 32 builds on 64 bit windows
> builders.

(2011-12-10 16:11) khuey: I think the best plan going forward is to turn off enough stuff to get us past the compiler upgrade
(2011-12-10 16:11) khuey: and then start doing builds with msvc2010 on 64 bit machines
Severity: critical → blocker

Comment 1

6 years ago
If I understood the way the linker works correctly (from Ms documentation), it enumerates all functions and connectivity between them to come to as optimal data transfer between functions as possible. This also means that the larger the xul lib gets, memory use will grow not linearly, but exponentially; if you're going to add a lot more features to the browser (which is the plan with graphite and SPDY, to name a couple) then this solution is still only temporary as each function added adds to the complexity and to the number of interrelations the linker runs through.

I've personally run into the address space limit on release code when using more aggressive optimization flags for a while already and have been using selective throttling back of the optimizations and GL- exclusions from PGO for less critical parts to work around the issue.

I also noticed when building on 64-bit platforms that (actually quite logically) the linker uses more memory than when using the same linker on 32-bit platforms (in the order of 300MB more in my runs), so I guess the memory gain will be partially mitigated by that too.

I think the only long-term solution would be, as already remarked, to split up the xul lib into at least a few larger, but more manageable chunks.
(In reply to Mark Straver from comment #1)
> This also means that
> the larger the xul lib gets, memory use will grow not linearly, but
> exponentially

Growth may be super-linear but the last 1GB increase kept us going for nearly two years, so we're not facing true exponential growth.

> I also noticed when building on 64-bit platforms that (actually quite
> logically) the linker uses more memory than when using the same linker on
> 32-bit platforms (in the order of 300MB more in my runs), so I guess the
> memory gain will be partially mitigated by that too.

This bug is not about using the 64-bit linker. This bug is about using the 32-bit linker on Win64. That shouldn't increase the linker's memory usage.

> I think the only long-term solution would be, as already remarked, to split
> up the xul lib into at least a few larger, but more manageable chunks.

Using Win64 and VS2010 should buy us a considerable amount of time. Maybe long enough for something else to happen: a) Microsoft reducing memory usage again in another VS release b) Microsoft releasing a 64-bit PGO linker that can build 32-bit binaries c) focus moving to 64bit Firefox builds on Windows. Anyway that should be discussed in bug 709193. The need for this bug is clear.

Comment 3

6 years ago
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #2)
> Growth may be super-linear but the last 1GB increase kept us going for
> nearly two years, so we're not facing true exponential growth.

I agree it's not really full-factor exponential, but there's definitely a clearly larger factor involved that does increase with additions.

> This bug is not about using the 64-bit linker. This bug is about using the
> 32-bit linker on Win64. That shouldn't increase the linker's memory usage.

Ah fair enough - I misunderstood then. So that would give the 32-bit linker the max LAA addressable space.

> The need for this bug is clear.
Absolutely. I wasn't suggesting otherwise, just pondering how much of a permanent solution it would be. I guess we'll see how quickly 4GB fills up over time.
Created attachment 580799 [details] [diff] [review]
Possible mozconfig to test win32 compile on win64 try

A couple of comments to get us moving along (while wearing a tarproof suit)

* The Win64 slaves have VS2008 and VS2010 on them [1], so we may have a choice on which compiler we want. They have 8GB of RAM. Armen set these machines up if there are more questsions

* developers may be able to get a head start on buildbot changes (and possible compile issues) by using the try server,
 * build just win64: 'try: -b o -p win64 -u none -t none'
 * this will use browser/config/mozconfigs/win64/nightly for the mozconfig
 * the build target will need fixing
 * you'll need to hack the PATH/LIBPATH/INCLUDE etc to point to the right compiler etc. [2] shows how to do this for a win32 slave with 2010 installed but which defaults to VS2005. The install paths on the win64 slaves are different - we did custom installs to C:\tools\msvs9 and C:\tools\msvc10. The sdks for DirectX10 and platform are in c:\tools\{dx10,v7.0} respectively. 
 * I'm attaching an untested attempt to get that going, which is pushed to try 
   as https://tbpl.mozilla.org/?tree=Try&rev=3c25364292f7
 * here is the baton, I'm passing it to you :-)

[1] https://wiki.mozilla.org/ReferencePlatforms/Win64
[2] http://hg.mozilla.org/mozilla-central/rev/13590cb94eab
Note that we don't currently have VS2005 on the 64-bit machines
Can we start the process of getting it on the 64-bit machines? Should we file a bug for that?

Comment 7

6 years ago
Ehsan is/was doing some exploratory work on one of our Win64 slaves (Bug 709898). How's that panning out?
OS: Windows Server 2003 → Windows Server 2008
Priority: -- → P1
Whiteboard: [opsi][buildslaves]
I've also spent some time looking at building on win64 with our vs2005. Short answer is that it works. Longer answer is that it's going to be painful to deploy.

Waiting to Ehsan to see if we can switch to VS2010 soon, and drop VS2005.
Severity: blocker → major

Comment 9

6 years ago
(In reply to Chris AtLee [:catlee] from comment #8)
> I've also spent some time looking at building on win64 with our vs2005.
> Short answer is that it works. Longer answer is that it's going to be
> painful to deploy.

Yes, we have no deployment solution on Win64 at present. We'd need to update the ref image and re-image the Win64 pool.
Priority: P1 → P3

Comment 10

6 years ago
Just a FYI: using the proposed mozconfig on my 64-bit build environment resulted in unstable binaries that would crash on start due to memory use spiking upon load (specifically for urlclassifier3.sqlite).
Possibly because some of it refers to .NET 2, while VS10 uses .NET 4?

Setting the paths manually might not even be needed - looks like start-msvc already sets all the proper paths for the 32-bit compiler and running the tools in WOW64. It all checked out for me, anyway. just specifying the target and host (to be sure) should be enough.
If you run from start-msvcX.bat (not the -x64 variant), then everything should work just fine. You don't even have to specify --target/--host, since it all looks like x86. I've built in this configuration for a long time.

However, the build slaves are trickier, since they need to run buildbot in some environment, and currently they're running inside a start-msvcX-x64.bat environment. If we want them to also be able to produce x86 builds we need to be able to fiddle the environment appropriately.

Comment 12

6 years ago
Wouldn't it be a lot simpler to create a 32-bit buildbot process using start-msvcX.bat instead of manually trying to adjust paths in -x64.bat? After all, that already works and has a proper setup for building 32-bit binaries.

As noted, the .mozconfig proposed doesn't seem to work very well, at least not for building -release with VC10, in my experience.

Updated

6 years ago
Duplicate of this bug: 674348

Comment 14

6 years ago
I am going to add a little bit of status to know what needs to be done/checked:
* verify that VS2010 from win32 builders is the same as on win64 slaves (hotfixes, SP)
* how close are we for this to become a hot potato again?

To answer the second we can look at the following:
* linker max virtual size: 3027337216 (from a log of Jan. 24th) [1]
* linker max virtual size: 2921103360 (from a log of Feb. 27th)[2]

It seems to be a 4% change in a month.

[1]
http://stage.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-central-win32-pgo/1327456802/
[2]
https://tbpl.mozilla.org/php/getParsedLog.php?id=9664050&tree=Mozilla-Inbound&full=1
(In reply to Armen Zambrano G. [:armenzg] - Release Engineer from comment #14)
> I am going to add a little bit of status to know what needs to be
> done/checked:
> * verify that VS2010 from win32 builders is the same as on win64 slaves
> (hotfixes, SP)
> * how close are we for this to become a hot potato again?
> 
> To answer the second we can look at the following:
> * linker max virtual size: 3027337216 (from a log of Jan. 24th) [1]
> * linker max virtual size: 2921103360 (from a log of Feb. 27th)[2]
> 
> It seems to be a 4% change in a month.

We switched compilers in that interval, so that delta is meaningless.
Duplicate of this bug: 583628
https://tbpl.mozilla.org/php/getParsedLog.php?id=11345300&tree=Mozilla-Inbound - "linker max virtual size: 3021185024"

https://tbpl.mozilla.org/php/getParsedLog.php?id=11348221&tree=Mozilla-Inbound - "fatal error C1002: compiler is out of heap space in pass 2"
(Reporter)

Comment 18

6 years ago
We've just had another on inbound:

ac1504ff8740
https://tbpl.mozilla.org/php/getParsedLog.php?id=11354573&tree=Mozilla-Inbound
"e:\builds\moz2_slave\m-in-w32-pgo\build\parser\html\nshtml5attributename.cpp(1977) : fatal error C1002: compiler is out of heap space in pass 2
LINK : fatal error LNK1257: code generation failed"

This is looking bad :-(
Severity: major → critical
Priority: P3 → P1
(Reporter)

Updated

6 years ago
Blocks: 750661

Comment 19

6 years ago
Did anyone figure out if VS2010 was now possible?
Yes, see comment #8

Comment 21

6 years ago
I was actually trying to figure out if there was anything new since comment #8.
(Reporter)

Comment 22

6 years ago
> Did anyone figure out if VS2010 was now possible?

VS2010 is already being used for 32bit builds (since the fx13 cycle, bug 563318).

Or did you mean the first bullet in comment 14?

Comment 23

6 years ago
Yeah. If what we have on the win64 slaves is good enough.

Updated

6 years ago
Alias: PGOSilverBullet

Updated

6 years ago
Depends on: 751256

Updated

6 years ago
Depends on: 751611

Updated

6 years ago
Depends on: 751694

Updated

6 years ago
Depends on: 751695

Updated

6 years ago
Depends on: 753132
(Assignee)

Comment 24

6 years ago
Created attachment 626659 [details] [diff] [review]
[needs testing] switch all win32 builds to win64 slaves

This patch switches all win32 builds to win64 slaves, with the exception of aurora,beta,release,esr10.

One side effect of this is a regression of bug 642243, since we don't have binscope installed on the win64 slaves.

I'm currently running some builds in staging to test this.
Attachment #580799 - Attachment is obsolete: true
(Assignee)

Comment 25

6 years ago
win32 m-c xulrunner on win64 error:

checking for custom <stdint.h> implementation... none specified
*** Fix above errors and then restart with               "make -f client.mk build"
make[2]: Leaving directory `/e/builds/moz2_slave/m-cen-w32-xr/build'
make[1]: Leaving directory `/e/builds/moz2_slave/m-cen-w32-xr/build'
configure: error: You are targeting i386 but using the 64-bit compiler.
make[2]: *** [configure] Error 1
make[1]: *** [obj-firefox/Makefile] Error 2
make: *** [build] Error 2
The environment there is wrong, it's not picking up the 32-bit toolchain.
(Assignee)

Comment 27

6 years ago
Should http://hg.mozilla.org/mozilla-central/file/2552b0541f87/xulrunner/config/mozconfigs/win32/xulrunner point to http://hg.mozilla.org/mozilla-central/file/2552b0541f87/build/win32/mozconfig.vs2010-win64 like http://hg.mozilla.org/mozilla-central/file/2552b0541f87/browser/config/mozconfigs/win32/nightly does ?
Yes, that's what you want.
(Assignee)

Comment 29

6 years ago
Created attachment 627292 [details] [diff] [review]
xulrunner mozconfig changes, like bug 751695

I was able to build xulrunner 32 on a win64 staging slave with this patch.
Attachment #627292 - Flags: review?(ted.mielczarek)
(Assignee)

Updated

6 years ago
Depends on: 758701
(Assignee)

Comment 30

6 years ago
Created attachment 627384 [details] [diff] [review]
switch all win32 builds to win64 slaves

This, along with the xulrunner mozconfig changes and the binscope install from bug 758701, results in working builds for:

* m-c win32 dep build
* m-c win32 debug build
* m-c win32 xulrunner

I'm trying a nightly and associated l10n repacks now.

There's a question of timing here.  Also, let me know if you want me to remove the special-cased pgo platform code that landed in bug 753132.
Assignee: nobody → aki
Attachment #626659 - Attachment is obsolete: true
Status: NEW → ASSIGNED
Attachment #627384 - Flags: review?(catlee)
(Assignee)

Comment 31

6 years ago
Comment on attachment 627384 [details] [diff] [review]
switch all win32 builds to win64 slaves

New patch coming to switch w32 l10n to w64 slaves as well.
Attachment #627384 - Attachment is obsolete: true
Attachment #627384 - Flags: review?(catlee)
(Assignee)

Comment 32

6 years ago
Created attachment 627535 [details] [diff] [review]
switch all win32 builds + l10n to win64 slaves

This switches all win32 builds + l10n repacks to win64 slaves.  However, the l10n repacks die since they don't use a mozconfig.  We can enable win32-l10n-on-win64 when we either change the buildbot factory to use a mozconfig or switch over to mozharness l10n (with mozconfig).

I'm going to un-obsolete the previous patch and ask for review again.
(Assignee)

Updated

6 years ago
Attachment #627384 - Attachment is obsolete: false
Attachment #627384 - Flags: review?(catlee)
Comment on attachment 627384 [details] [diff] [review]
switch all win32 builds to win64 slaves

Review of attachment 627384 [details] [diff] [review]:
-----------------------------------------------------------------

lgtm. yes, we can remove the 'pgo_platform' support as well once this lands.
Attachment #627384 - Flags: review?(catlee) → review+
Comment on attachment 627292 [details] [diff] [review]
xulrunner mozconfig changes, like bug 751695

Review of attachment 627292 [details] [diff] [review]:
-----------------------------------------------------------------

Sorry for the wait, should have just r+ed this since it's so simple.
Attachment #627292 - Flags: review?(ted.mielczarek) → review+
(Assignee)

Comment 35

6 years ago
Comment on attachment 627292 [details] [diff] [review]
xulrunner mozconfig changes, like bug 751695

https://hg.mozilla.org/integration/mozilla-inbound/rev/be3fe9b7f924
Attachment #627292 - Flags: checked-in+
(Assignee)

Updated

6 years ago
Whiteboard: [opsi][buildslaves] → [opsi][buildslaves][Leave open after merge]
(Assignee)

Comment 36

6 years ago
Comment on attachment 627384 [details] [diff] [review]
switch all win32 builds to win64 slaves

http://hg.mozilla.org/build/buildbot-configs/rev/980c8831b800
Attachment #627384 - Flags: checked-in+
(Assignee)

Comment 37

6 years ago
Created attachment 628185 [details] [diff] [review]
backout bug 753132: configs

Waiting to land this until we decide whether I jumped the gun in landing attachment 627384 [details] [diff] [review].
(Assignee)

Comment 38

6 years ago
Created attachment 628186 [details] [diff] [review]
backout bug 753132: custom
(Assignee)

Updated

6 years ago
Depends on: 758275
(Assignee)

Comment 39

6 years ago
Comment on attachment 628185 [details] [diff] [review]
backout bug 753132: configs

Checked in (backed out fefa4a79e1e6)
http://hg.mozilla.org/build/buildbot-configs/rev/d1f13fb7b2e2
Attachment #628185 - Flags: checked-in+
(Assignee)

Comment 40

6 years ago
Comment on attachment 628186 [details] [diff] [review]
backout bug 753132: custom

Pushed:
http://hg.mozilla.org/build/buildbotcustom/rev/f76c6eaee5f5 (three changesets).
Again, backing out bug 753132.
Attachment #628186 - Flags: checked-in+
(Assignee)

Comment 41

6 years ago
Comment on attachment 627384 [details] [diff] [review]
switch all win32 builds to win64 slaves

That should be http://hg.mozilla.org/build/buildbot-configs/rev/a9472951c669 instead.  (Landed the +l10n patch, then backed it out and landed this one. Sorry for the bugspam everyone.)
(Reporter)

Comment 42

6 years ago
(In reply to Aki Sasaki [:aki] from comment #35)
> Comment on attachment 627292 [details] [diff] [review]
> xulrunner mozconfig changes, like bug 751695
> 
> https://hg.mozilla.org/integration/mozilla-inbound/rev/be3fe9b7f924

https://hg.mozilla.org/mozilla-central/rev/be3fe9b7f924
Created attachment 628477 [details] [diff] [review]
Remove w32-ix slaves, add new w64 slaves

New slaves will be w64-ix-slave[43-84], with 1 allocated to preprod, 20 to try, and 21 to build.
Attachment #628477 - Flags: review?(aki)
(Assignee)

Comment 44

6 years ago
Comment on attachment 628477 [details] [diff] [review]
Remove w32-ix slaves, add new w64 slaves

|trial test/test_slave_allocation.py| (in buildbot-configs/mozilla) gives me:

twisted.trial.unittest.FailTest: Staging and preproduction slaves should be the same

Other than that I approve of this patch, however.
Attachment #628477 - Flags: review?(aki) → review-
(In reply to Aki Sasaki [:aki] from comment #44)
> twisted.trial.unittest.FailTest: Staging and preproduction slaves should be
> the same

Sigh, why do we even have separate slave lists for them then? I'll fix it up.
Comment on attachment 628477 [details] [diff] [review]
Remove w32-ix slaves, add new w64 slaves

Staging and preproduction lists are identical now.

https://hg.mozilla.org/build/buildbot-configs/rev/7d715a038f22
Attachment #628477 - Flags: checked-in+
(In reply to Chris Cooper [:coop] from comment #46)
> https://hg.mozilla.org/build/buildbot-configs/rev/7d715a038f22

Masters were reconfiged with this change tonight, so the new w64 slaves should come online as they are re-imaged. Only a handful have been re-imaged yet (7 of 42).
Depends on: 760117
(Assignee)

Comment 48

6 years ago
I think the code portions here are all done.
bug 758275 tracks imaging w32 boxes to w64.
bug 760117 tracks the Thunderbird mozconfigs.
Status: ASSIGNED → RESOLVED
Last Resolved: 6 years ago
Resolution: --- → FIXED
(Assignee)

Comment 49

6 years ago
Created attachment 629190 [details] [diff] [review]
move try win32 to win64 slaves

The upside here is Try stays close to m-c, and we take advantage of the larger win64 build pool we're creating.
The downside is it becomes harder (again) to build aurora/beta/release/esr10 pushes.
Attachment #629190 - Flags: review?(coop)
(Assignee)

Comment 50

6 years ago
Comment on attachment 629190 [details] [diff] [review]
move try win32 to win64 slaves

r+'ed by bhearsum on irc.
http://hg.mozilla.org/build/buildbot-configs/rev/6b615be4dee9
Attachment #629190 - Flags: review?(coop) → review+
We could branch-land bug 751611 and bug 751695, they're pretty simple. Alternately, we could just document somewhere that people can put those patches in their patch queue if pushing an older branch to try.
(In reply to Aki Sasaki [:aki] from comment #48)
> I think the code portions here are all done.
> bug 760117 tracks the Thunderbird mozconfigs.

The mozconfigs are now landed. Note that we haven't tested any Thunderbird 32 builds on Win 64 though. So we'll want to take that into account for whatever switch over plan we have.

Comment 53

6 years ago
Created attachment 629425 [details] [diff] [review]
update PDBSTR_PATH for Try builders

bustage fix:

[21:20]  <catlee-away> try win32 builds are failing because of environment problems
[21:21]  <catlee-away> e.g. http://buildbot-master14.build.scl1.mozilla.com:8101/builders/WINNT%205.2%20try%20build/builds/6664/steps/make_buildsymbols/logs/stdio
[21:22]  <catlee-away>   PDBSTR_PATH=/c/Program Files/Debugging Tools for Windows/srcsrv/pdbstr.exe
[21:22]  <catlee-away> is wrong for win64 machines
[21:30]  <catlee-away> hm
[21:30]  <catlee-away> maybe that master didn't get a reconfig?
[21:30]  <bear-afk> let me log in and see what value local is
[21:32]  <bear-afk> [cltbld@buildbot-master14 master]$ grep PDBSTR config.py
[21:32]  <bear-afk>                 'PDBSTR_PATH': '/c/Program Files/Debugging Tools for Windows (x64)/srcsrv/pdbstr.exe',
[21:32]  <bear-afk>                 "PDBSTR_PATH": '/c/Program Files/Debugging Tools for Windows (x64)/srcsrv/pdbstr.exe',
[21:32]  <bear-afk>                 'PDBSTR_PATH': '/c/Program Files/Debugging Tools for Windows (x64)/srcsrv/pdbstr.exe',
[21:32]  <bear-afk>     'PDBSTR_PATH': '/c/Program Files/Debugging Tools for Windows/srcsrv/pdbstr.exe',
[21:32]  <bear-afk> looks like it needs a reconfig
[21:34]  <catlee-away> it's not getting copied to try
[21:34]  <catlee-away> not sure why
[21:34]  <bear-afk> catlee-away - any reason to not just trigger a reconfig ?
[21:41]  <catlee-away> yes - we need a patch
[21:41]  <bear-afk> if you do the patch, i'll babysit the reconfig
[21:42]  <catlee-away> http://hg.mozilla.org/build/buildbot-configs/file/34350bf0f602/mozilla/production_config.py#l154
[21:42]  <catlee-away> needs updating
Attachment #629425 - Flags: review?(catlee)

Updated

6 years ago
Status: RESOLVED → REOPENED
Resolution: FIXED → ---

Updated

6 years ago
Attachment #629425 - Flags: review?(catlee) → review+

Comment 54

6 years ago
Comment on attachment 629425 [details] [diff] [review]
update PDBSTR_PATH for Try builders

[production] moz:buildbot-configs bear$ hg out
comparing with ssh://hg.mozilla.org/build/buildbot-configs
searching for changes
changeset:   6195:a634a34a5915
parent:      6192:9a156e523839
user:        Mike Taylor <bear@mozilla.com>
date:        Fri Jun 01 22:00:10 2012 -0400
summary:     bug 709480 update PDBSTR_PATH for win64 try builders r=catlee

changeset:   6196:fb8813bc5f90
branch:      production
tag:         tip
parent:      6194:34350bf0f602
user:        Mike Taylor <bear@mozilla.com>
date:        Fri Jun 01 22:02:03 2012 -0400
summary:     bug 709480 update PDBSTR_PATH for win64 try builders r=catlee

[production] moz:buildbot-configs bear$ hg push
pushing to ssh://hg.mozilla.org/build/buildbot-configs
searching for changes
remote: adding changesets
remote: adding manifests
remote: adding file changes
remote: added 2 changesets with 3 changes to 3 files
remote: Trying to insert into pushlog.
remote: Please do not interrupt...
remote: Inserted into the pushlog db successfully.
Attachment #629425 - Flags: checked-in+

Comment 55

6 years ago
try masters reconfigured
Status: REOPENED → RESOLVED
Last Resolved: 6 years ago6 years ago
Resolution: --- → FIXED
Blocks: 761646
No longer depends on: 758275
Product: mozilla.org → Release Engineering
You need to log in before you can comment on or make changes to this bug.