Closed Bug 1093664 Opened 10 years ago Closed 9 years ago

Intermittent Windows 8 Build fail with LINK : fatal error LNK1102: out of memory

Categories

(Firefox Build System :: General, defect)

x86
Windows 8
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: cbook, Unassigned)

Details

(Keywords: intermittent-failure)

WINNT 6.1 x86-64 mozilla-central pgo-build

https://treeherder.mozilla.org/ui/logviewer.html#?job_id=582941&repo=mozilla-central

Finished generating code
LINK : fatal error LNK1102: out of memory 

Not sure if there is anything we can do here.
Well that's exciting. I'm guessing this is a bug in the 64-bit compiler?
Blocks: 1084162
Flags: needinfo?(dmajor)
Uh-oh. I was hoping it would be a one-off, but a second one makes me worry.

I don't know of any good solution to this other than bringing back MSVC_ENABLE_PGO (possibly used more liberally) for Win64. :-(

Anyone have any better ideas?
Flags: needinfo?(dmajor)
In bug 1093355, sfink is limiting the rooting analysis to -j4 to avoid using too much memory. Could we try that here, or is this a single process gobbling up all the memory on the builders?
It's a single instance of link.exe -- on my machine I've seen it take 5 or 6 GB.
So I guess one question is whether the machines are *actually* running out of memory or whether this is code for "some internal array reached some hardcoded limit".
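One way to tell the two cases apart would be to log the system's memory state right before (or right after) the failing link. The following is only a sketch, assuming Python is available on the builder: the helper name `memory_status` and the dictionary keys are hypothetical, while `GlobalMemoryStatusEx` and the `MEMORYSTATUSEX` layout are the standard Win32 API. On non-Windows hosts the helper simply returns None.

```python
import ctypes
import sys


class MEMORYSTATUSEX(ctypes.Structure):
    """Mirror of the Win32 MEMORYSTATUSEX structure."""
    _fields_ = [
        ("dwLength", ctypes.c_uint32),
        ("dwMemoryLoad", ctypes.c_uint32),
        ("ullTotalPhys", ctypes.c_uint64),
        ("ullAvailPhys", ctypes.c_uint64),
        ("ullTotalPageFile", ctypes.c_uint64),  # commit limit, not pagefile size
        ("ullAvailPageFile", ctypes.c_uint64),  # commit still available
        ("ullTotalVirtual", ctypes.c_uint64),
        ("ullAvailVirtual", ctypes.c_uint64),
        ("ullAvailExtendedVirtual", ctypes.c_uint64),
    ]


def memory_status():
    """Return physical and commit figures in MB, or None when not on Windows."""
    if sys.platform != "win32":
        return None
    status = MEMORYSTATUSEX()
    status.dwLength = ctypes.sizeof(MEMORYSTATUSEX)
    if not ctypes.windll.kernel32.GlobalMemoryStatusEx(ctypes.byref(status)):
        raise ctypes.WinError()
    mb = 1024 * 1024
    return {
        "total_phys_mb": status.ullTotalPhys // mb,
        "avail_phys_mb": status.ullAvailPhys // mb,
        "commit_limit_mb": status.ullTotalPageFile // mb,
        "commit_avail_mb": status.ullAvailPageFile // mb,
    }


if __name__ == "__main__":
    print(memory_status())
```

If the available commit is still comfortably large when LNK1102 fires, that would point toward an internal linker limit rather than genuine exhaustion.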
Hmm it's interesting that in both cases we failed at the PGINSTRUMENT phase. I woulda thought that would be the easier of the two links!
I wonder if this could be useful. I'll try some local builds to see if it brings down peak memory usage.

http://msdn.microsoft.com/en-us/library/dn655038.aspx

The /CGTHREADS option specifies the maximum number of threads cl.exe uses in parallel for the optimization and code-generation phases of compilation when link-time code generation (/LTCG) is specified. By default, cl.exe uses four threads, as if /CGTHREADS:4 were specified. If more processor cores are available, a larger number value can improve build times.
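As a sketch of how the option quoted above might be threaded through a build script: the helper name `with_cgthreads` and the example flag list are hypothetical; only /CGTHREADS:n and /LTCG:PGINSTRUMENT are real linker options, and the documented range for n is 1 to 8. Whether a lower thread count actually reduces link.exe's peak memory has not been measured in this bug.

```python
def with_cgthreads(link_flags, threads):
    """Return a copy of link_flags ending in exactly one /CGTHREADS:n option."""
    if not 1 <= threads <= 8:
        raise ValueError("/CGTHREADS accepts values from 1 to 8")
    # Drop any earlier /CGTHREADS setting so the new one is unambiguous.
    kept = [
        flag for flag in link_flags
        if not flag.upper().startswith(("/CGTHREADS:", "-CGTHREADS:"))
    ]
    return kept + ["/CGTHREADS:%d" % threads]


# Hypothetical usage: limit code generation to one thread for the instrumented link.
flags = with_cgthreads(["/LTCG:PGINSTRUMENT"], 1)
print(" ".join(flags))
```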
How much physical memory do the win64 builders have? Do they have swap disabled? Something to consider: nowadays, other things can happen while xul.dll is linking. If the linker likes to suck up all the memory it can and other things on the side suck up some memory too, overall, that could be a problem.
(In reply to David Major [:dmajor] (UTC+13) from comment #8)
> Hmm it's interesting that in both cases we failed at the PGINSTRUMENT phase.
> I woulda thought that would be the easier of the two links!

IME it always was, although I haven't looked in a long time. It was the PGUPDATE link that ate memory/cpu.

(In reply to David Major [:dmajor] (UTC+13) from comment #7)
> So I guess one question is whether the machines are *actually* running out
> of memory or whether this is code for "some internal array reached some
> hardcoded limit".

Yeah, I'm pretty sure we hit bugs like that in the x86 PGO linker before, where it simply overflowed some internal limit and died. I guess the only way to figure this out would be to try to reproduce and send MS a linkrepro?
Mark, how much physical memory and pagefile are on b-2008-ix-0012 and -0013? (And is it the same for all Windows builders?)
Flags: needinfo?(mcornmesser)
The physical memory is 4GB. The paging file is 4048MB. This should be same across all the 2008 machines.
Flags: needinfo?(mcornmesser)
Seems totally possible that we could exhaust 8GB total physical+swap.
Wow! That's surprisingly low for year 2014.
So....we're pretty much hosed here is what everybody is saying?
Though I suppose there's no reason to panic since we aren't permafailing yet.
Well, it sounds like there's no 'no cost' solution here at least. Would it be possible to give the existing machines more RAM, or are they already maxed out?
As a stopgap we should try doubling the pagefile to 8192MB.

But long-term, if we stay at 4GB RAM, we're gonna have a bad time.
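To put rough numbers on this, here is a back-of-the-envelope sketch using only figures quoted in this bug (4GB RAM, 4048MB pagefile, link.exe seen at 5 or 6 GB locally, proposed 8192MB pagefile). It assumes the commit limit is approximately RAM plus pagefile and ignores what the OS and other build processes need, so the real headroom would be smaller. The function names are made up for illustration.

```python
# Figures quoted in this bug; the model itself is a simplification.
RAM_MB = 4 * 1024            # "The physical memory is 4GB"
PAGEFILE_MB = 4048           # "The paging file is 4048MB"
PROPOSED_PAGEFILE_MB = 8192  # stopgap suggested above
LINKER_PEAK_MB = 6 * 1024    # upper end of the "5 or 6 GB" seen locally


def commit_limit_mb(ram_mb, pagefile_mb):
    """Approximate commit limit as physical RAM plus pagefile."""
    return ram_mb + pagefile_mb


def headroom_mb(ram_mb, pagefile_mb, linker_peak_mb):
    """Commit left for everything else once the linker hits its peak."""
    return commit_limit_mb(ram_mb, pagefile_mb) - linker_peak_mb


current = headroom_mb(RAM_MB, PAGEFILE_MB, LINKER_PEAK_MB)
proposed = headroom_mb(RAM_MB, PROPOSED_PAGEFILE_MB, LINKER_PEAK_MB)
print("headroom now: %d MB, with 8192MB pagefile: %d MB" % (current, proposed))
```

Under these assumptions the current setup leaves about 2000 MB for the OS and everything else running alongside the link, while the larger pagefile leaves about 6144 MB. A bigger pagefile only raises the commit limit; a link that pages heavily with 4GB of RAM would still be slow.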
Out of curiosity, what would it take for us to upgrade the physical RAM in all of our Windows build slaves? Why do I have this recollection that we actually did something similar once long ago?
Flags: needinfo?(laura)
Of course, this also might get rendered moot if we can switch to building on AWS with MSVC2013 Community Edition. Ted, is there a bug for that?
Flags: needinfo?(ted)
Yes, bug 1121513.
Flags: needinfo?(ted)
Depends on: 1122975
Jordan, is b-2008-ix-0051 on your list?
Flags: needinfo?(laura) → needinfo?(jlund)
(In reply to Ryan VanderMeulen [:RyanVM UTC-5] from comment #51)
> Jordan, is b-2008-ix-0051 on your list?

0051 is in try pool. I only looked at build pool in https://bugzilla.mozilla.org/show_bug.cgi?id=1122975#c6

try pool audit will be completed in https://bugzil.la/1125870 and, depending on how many show up with only 4gb of RAM, we will also be requesting a RAM bump for those too.

fwiw I just checked 0051 by hand and can confirm it only has 4gb
Flags: needinfo?(jlund)
Inactive; closing (see bug 1180138).
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → WORKSFORME
Product: Core → Firefox Build System