Closed Bug 893078 Opened 11 years ago Closed 7 years ago

Linux (PGO-only?) build time regression in mid-May

Categories

(Firefox Build System :: General, defect)

x86_64
Windows 7
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED INCOMPLETE

People

(Reporter: benjamin, Unassigned)

Details

http://people.mozilla.org/~catlee/sattap/e788bd53.png
http://people.mozilla.org/~catlee/sattap/251d0874.png

Windows PGO got better, and Linux got worse at about the same time. The Windows improvement was expected from bug 871712, but that bug was not expected to affect other platforms.

I didn't realize we were even using PGO on Linux. Does it actually provide any benefits?
Dual linking libxul (bug 844288) landed on June 29. So, uh, that's not it.
(In reply to Gregory Szorc [:gps] from comment #1)
> Dual linking libxul (bug 844288) landed on June 29. So, uh, that's not it.

I meant June 13.
I downloaded the build logs in May for Linux64 Nightlies and grepped for the "=== Started" line for the "compile" buildbot job and got the following data:

May 1: 1 hrs, 33 mins, 49 secs (bm65)
May 15: 1 hrs, 12 mins, 6 secs (bm57)
May 17: 1 hrs, 35 mins, 42 secs (bm63)
May 18: 2 hrs, 23 mins, 33 secs (bm63)
May 19: 2 hrs, 19 mins, 21 secs (bm61)
May 22: 2 hrs, 18 mins, 17 secs (bm63)
May 31: 2 hrs, 7 mins, 8 secs (bm57)

It's pretty clear there was a regression for the May 18 Nightly.
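To put a number on the jump, the durations listed above can be converted to seconds and compared. This is only a sketch: the table is copied by hand from this comment, and the names `COMPILE_TIMES`, `to_seconds`, and `slowdown_percent` are made up for illustration rather than taken from any buildbot tooling.

```python
import re

# Compile-step durations copied by hand from the list above (Linux64 Nightlies).
COMPILE_TIMES = {
    "May 1": "1 hrs, 33 mins, 49 secs",
    "May 15": "1 hrs, 12 mins, 6 secs",
    "May 17": "1 hrs, 35 mins, 42 secs",
    "May 18": "2 hrs, 23 mins, 33 secs",
    "May 19": "2 hrs, 19 mins, 21 secs",
    "May 22": "2 hrs, 18 mins, 17 secs",
    "May 31": "2 hrs, 7 mins, 8 secs",
}

# Matches the "H hrs, M mins, S secs" format used in the list above.
_DURATION_RE = re.compile(r"(\d+) hrs, (\d+) mins, (\d+) secs")


def to_seconds(text):
    """Convert a duration such as '1 hrs, 33 mins, 49 secs' to seconds."""
    match = _DURATION_RE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    hours, minutes, seconds = (int(group) for group in match.groups())
    return hours * 3600 + minutes * 60 + seconds


def slowdown_percent(before, after):
    """Return how much longer `after` is than `before`, in percent."""
    base = to_seconds(before)
    return 100.0 * (to_seconds(after) - base) / base


if __name__ == "__main__":
    for day, text in COMPILE_TIMES.items():
        print(f"{day:>6}: {to_seconds(text):5d} s")
    change = slowdown_percent(COMPILE_TIMES["May 17"], COMPILE_TIMES["May 18"])
    print(f"May 17 -> May 18: {change:+.1f}%")
```

With these figures, the May 18 compile step is 50% longer than the May 17 one. The builds ran on different machines (bm57 through bm65), so some of the spread may be per-machine noise.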

Regression range is ea767da526ff..6e2789a70f6b.
The gcc 4.7 switch was mentioned offhand on IRC, that is totally plausible.
FWIW, having tested GCC 4.4, 4.6, and 4.7 on my machine, ISTR that 4.6/4.7 were about 10-15% slower for a clobber build (warm disk cache) than 4.4 (don't have access to that machine atm). You can claw some of that back by tweaking debug options, but I don't think a 33% slowdown is explainable by a compiler switch alone.
The new GCC build isn't built in debug mode, is it?
(In reply to Gregory Szorc [:gps] from comment #8)
> The new GCC build isn't built in debug mode, is it?

Or without --enable-checking=release?
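One way to check this is to look at the "Configured with:" line that `gcc -v` prints and pull out the `--enable-checking=` value. The sketch below assumes that output has already been captured as text; `checking_mode` is a hypothetical helper, and `SAMPLE` is an invented illustration, not the output of the compiler used on the build machines.

```python
import re

# Invented sample of `gcc -v` output, for illustration only.
SAMPLE = """Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with: ../configure --prefix=/tools/gcc --enable-languages=c,c++ --enable-checking=release
Thread model: posix
"""


def checking_mode(gcc_v_output):
    """Return the --enable-checking=VALUE setting, or None if it is not listed."""
    for line in gcc_v_output.splitlines():
        if line.startswith("Configured with:"):
            match = re.search(r"--enable-checking=(\S+)", line)
            return match.group(1) if match else None
    return None


if __name__ == "__main__":
    print(checking_mode(SAMPLE))
```

`gcc -v` writes this information to stderr, so a real check would need to capture that stream. A result of `None` only means the flag was not passed explicitly; the default then depends on how that GCC source tree was obtained.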
Perhaps GCC 4.7 uses more memory and something (likely linking or PGO) is swapping? Another reason why we need resource monitoring everywhere...
I'm not surprised gcc 4.7 would take more time. It's generally slower than previous versions, and PGO might add some extra layers of optimizations that take time. (I suspect the profile-use pass is taking much more time than the profile-generate pass)

Unfortunately, there's not much we can do here, besides not doing PGO, but PGO makes a big performance difference.

Now, as every time dropping PGO on Linux has been mentioned, I'd like to point out that we don't know how many Linux users overall are using our builds compared to distro builds. If it's a significant proportion, it's hard to choose to drop PGO, but if it's not, then it's another story. But we don't know.
(And I don't think 4.8 got any better at compile speed)
Agreed that the compiler switch is the most likely explanation here. RESO FIXED?
catlee, does this only affect the Linux PGO times, or were the normal Linux times also affected?
Flags: needinfo?(catlee)
Pretty sure it's just PGO. It's kind of hard to make out, but the regular and debug builds have a much more gradual slope here:
http://people.mozilla.org/~catlee/sattap/3c4e1375.png
Flags: needinfo?(catlee)
(In reply to Mike Hommey [:glandium] from comment #12)
> I'm not surprised gcc 4.7 would take more time. It's generally slower than
> previous versions, and PGO might add some extra layers of optimizations that
> take time. (I suspect the profile-use pass is taking much more time than the
> profile-generate pass)
> 
> Unfortunately, there's not much we can do here, besides not doing PGO, but
> PGO makes a big performance difference.
> 
> Now, as every time dropping PGO on Linux has been mentioned, I'd like to
> point out that we don't know how many Linux users overall are using our
> builds compared to distro builds. If it's a significant proportion, it's
> hard to choose to drop PGO, but if it's not, then it's another story. But we
> don't know.

akeybl: do you have any visibility into how many users we have using Mozilla-generated Linux builds (explicitly, not users who are using distro-generated Linux builds)?

(For context, we generate debug, opt and PGO builds. PGO builds have run ~1 hour longer since the recent gcc compiler upgrade. We're investigating some machine monitoring/RAM questions in case that helps. Otherwise, it might come down to this: if enough people use Mozilla-generated builds, keep generating PGO builds and just eat the cost and infra load; if not enough people use our PGO builds, they may not be worth the cost.)
Flags: needinfo?(akeybl)
Assuming I didn't screw something up, reverting to gcc 4.5 is inconclusive:

https://tbpl.mozilla.org/?tree=Try&rev=fdb6daf0a0f6

Builds took nearly three hours (!), which seems excessive compared with the times in comment 3.
Not really actionable at this point. On the plus side, we have much better data for build times nowadays, so we catch these things much more reliably! (See bug 1339673 for an example.)
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → INCOMPLETE
Flags: needinfo?(akeybl)
Product: Core → Firefox Build System