fix profile-guided optimization (pgo) on linux

RESOLVED FIXED

Status

()

Firefox
Build Config
RESOLVED FIXED
9 years ago
5 years ago

People

(Reporter: ted, Assigned: (dormant account))

Tracking

(Blocks: 1 bug)

unspecified
x86
Linux
Points:
---
Dependency tree / graph
Bug Flags:
blocking-firefox3.5 -
blocking-firefox3.6 -
wanted-firefox3.5 ?
wanted-firefox3.6 -

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [not needed for 1.9.0][ts])

Attachments

(5 attachments, 3 obsolete attachments)

(Reporter)

Description

9 years ago
Patch in a bit.
Flags: blocking1.9?
Depends on: 361343, 418772
(Reporter)

Comment 1

9 years ago
This was actually ok before bug 361343.  There was another older bug where I made the existing PGO support work again.

No longer depends on: 361343
(Reporter)

Comment 2

9 years ago
Created attachment 304838 [details] [diff] [review]
enable pgo on fx-linux-tbox

There you go.
Assignee: nobody → ted.mielczarek
Status: NEW → ASSIGNED

Updated

9 years ago
Attachment #304838 - Flags: review+
Blocks: 418926
(Reporter)

Comment 3

9 years ago
Created attachment 304854 [details] [diff] [review]
with clobbery

So, this is going to be an unpopular change, I think we need to make it clobber as well.
Attachment #304838 - Attachment is obsolete: true
Attachment #304854 - Flags: review?
(Reporter)

Updated

9 years ago
Attachment #304854 - Flags: review? → review?(ccooper)
From offline discussions with Damon, RobSayre, and then Ted:

1) This is being proposed for nightly/clobber builds, as well as hourly/incremental builds as well as release builds.

2) There are similar changes required for Mac. Ted is filing a bug for that.

3) This double-compile process will take extra time to cycle a build; seems like linux would be 2x, win32 would be just under 2x. This will need to be communicated to everyone watching Tinderbox.

4) We are not blocked on bug#418896 or bug#418894.
marking P1 as its needed for beta4.
Priority: -- → P1

Updated

9 years ago
Attachment #304854 - Flags: review?(ccooper) → review+

Comment 6

9 years ago
I had to back this out because of the following error:

/builds/tinderbox/Fx-Trunk/Linux_2.6.18-8.el5_Depend/mozilla/memory/jemalloc/jemalloc.c: In function 'choose_arena':
/builds/tinderbox/Fx-Trunk/Linux_2.6.18-8.el5_Depend/mozilla/memory/jemalloc/jemalloc.c:6043: error: corrupted profile info: edge from 5 to 6 exceeds maximal count
gmake[4]: *** [jemalloc.o] Error 1

Comment 7

9 years ago
Looks like this might be just jemalloc causing problems. We should be able to skip PGO for that.
(Reporter)

Comment 8

9 years ago
Created attachment 305385 [details] [diff] [review]
skip PGO in jemalloc [checked in]

This should do it.
Attachment #305385 - Flags: review?(benjamin)
Comment on attachment 305385 [details] [diff] [review]
skip PGO in jemalloc [checked in]

File a followup? I imagine PGO for jemalloc would useful if simply for branch-prediction/locality optimizations.
Attachment #305385 - Flags: review?(benjamin) → review+
(Reporter)

Comment 10

9 years ago
Comment on attachment 305385 [details] [diff] [review]
skip PGO in jemalloc [checked in]

Filed bug 419470, will mention it in a comment.
Attachment #305385 - Flags: approval1.9?
Comment on attachment 305385 [details] [diff] [review]
skip PGO in jemalloc [checked in]

a1.9+=damons
Attachment #305385 - Flags: approval1.9? → approval1.9+
(Reporter)

Comment 12

9 years ago
Comment on attachment 305385 [details] [diff] [review]
skip PGO in jemalloc [checked in]

sayrer: you should be clear to try this again, but given that we're freezing tomorrow night, it might be rough to do it before wednesday.
Attachment #305385 - Attachment description: skip PGO in jemalloc → skip PGO in jemalloc [checked in]
(Reporter)

Updated

9 years ago
Attachment #304854 - Flags: approval1.9b4?

Updated

9 years ago
Flags: tracking1.9? → blocking1.9?
Do you guys need a clean window to do this, like with Windows?

Comment 14

9 years ago
(In reply to comment #13)
> Do you guys need a clean window to do this, like with Windows?

That would be best, but I don't think it's critical we do it this week. Our best avenue for beta linux users seems to be distro betas. So I'm content to focus on mac and additional Windows profile input for the rest of this week.
Flags: blocking1.9? → blocking1.9+
Priority: P1 → P2
Comment on attachment 304854 [details] [diff] [review]
with clobbery

a=beltzner for after beta 4
Attachment #304854 - Flags: approval1.9b4?
Attachment #304854 - Flags: approval1.9b4-
Attachment #304854 - Flags: approval1.9+
Duplicate of this bug: 384899
(In reply to comment #15)
> (From update of attachment 304854 [details] [diff] [review])
> a=beltzner for after beta 4

Therefore, removing as blocker on bug#418926 (3.0beta4 tracking bug)
No longer blocks: 418926
(Reporter)

Comment 18

9 years ago
Rob: do you want to do this? This has been sort of hanging out here for a while.
Assignee: ted.mielczarek → nobody
Status: ASSIGNED → NEW
Component: Build & Release → Build Config
Flags: blocking1.9+
Product: mozilla.org → Firefox
QA Contact: build → build.config
Version: other → unspecified
Assignee: nobody → ted.mielczarek

Comment 19

9 years ago
Given limited time in the release we are punting pgo on mac and linux - let's pick this up in next release
(Reporter)

Updated

9 years ago
Assignee: ted.mielczarek → nobody
Priority: P2 → --
(Reporter)

Comment 20

9 years ago
Comment on attachment 304854 [details] [diff] [review]
with clobbery

Clearing approval flag to get this off the radar.
Attachment #304854 - Flags: approval1.9+
(Reporter)

Comment 21

9 years ago
Comment on attachment 305385 [details] [diff] [review]
skip PGO in jemalloc [checked in]

Clearing approval flag to get this off the radar.
Attachment #305385 - Flags: approval1.9+
(Reporter)

Updated

9 years ago
Attachment #304854 - Flags: approval1.9+
(Reporter)

Updated

9 years ago
Attachment #305385 - Flags: approval1.9+
Whiteboard: [missed 1.9 checkin]
(Reporter)

Comment 22

9 years ago
We may take this in 3.next.
Whiteboard: [missed 1.9 checkin] → [not needed for 1.9]
Flags: blocking-firefox3.1?
(Reporter)

Comment 23

8 years ago
I wouldn't block on it, though.
Flags: blocking-firefox3.1? → blocking-firefox3.1-
Flags: wanted-firefox3.1?
So I just tried compiling with pgo on Linux.  I get:

../../../../db/morkreader/libmorkreader_s.a(nsMorkReader.o): In function `global constructors keyed to 65535_0_nsMorkReader.cpp':
/home/bzbarsky/mozilla/debug/mozilla/db/morkreader/nsMorkReader.cpp:579: undefined reference to `__gcov_init'
../../../../db/morkreader/libmorkreader_s.a(nsMorkReader.o):(.data.rel+0x24): undefined reference to `__gcov_merge_add'
../../../../db/morkreader/libmorkreader_s.a(nsMorkReader.o):(.data.rel+0x30): undefined reference to `__gcov_merge_single'
../../../../db/morkreader/libmorkreader_s.a(nsMorkReader.o):(.data.rel+0x3c): undefined reference to `__gcov_merge_single'
../../../../db/morkreader/libmorkreader_s.a(nsMorkReader.o):(.data.rel+0x48): undefined reference to `__gcov_merge_add'
../../../../db/morkreader/libmorkreader_s.a(nsMorkReader.o):(.data.rel+0x54): undefined reference to `__gcov_merge_ior'
/home/bzbarsky/mozilla/debug/obj-firefox-gpo/dist/lib/libunicharutil_s.a(nsUnicharUtils.o): In function `global constructors keyed to 65535_0_nsUnicharUtils.cpp':
/home/bzbarsky/mozilla/debug/obj-firefox-gpo/intl/unicharutil/util/internal/nsUnicharUtils.cpp:226: undefined reference to `__gcov_init'
/home/bzbarsky/mozilla/debug/obj-firefox-gpo/dist/lib/libunicharutil_s.a(nsUnicharUtils.o):(.data.rel+0x24): undefined reference to `__gcov_merge_add'
/home/bzbarsky/mozilla/debug/obj-firefox-gpo/dist/lib/libunicharutil_s.a(nsUnicharUtils.o):(.data.rel+0x30): undefined reference to `__gcov_merge_single'
collect2: ld returned 1 exit status
gmake[7]: *** [libplaces.so] Error 1
I think Murali was using gcov on Linux lately... maybe he can give some pointers?
Preliminary symptoms seems to be issues with LDFLAGS.

I need to check the .mozconfig file to comment further.

Updated

8 years ago
Summary: turn on profile-guided optimization on fx-linux-tbox → turn on profile-guided optimization on linux
Whiteboard: [not needed for 1.9] → [not needed for 1.9.0]
bz, please post your .mozconfig for Murali, as per comment #26.
Created attachment 362727 [details]
The mozconfig in question

I had to do some manual copy/paste inclusion (I assumed you didn't want me to attach the 4-5 mozconfig files involved that include each other), and I removed all the comment lines.  Other than that, this is the mozconfig involved.

Building on FC9 with:

% g++ --version
g++ (GCC) 4.3.0 20080428 (Red Hat 4.3.0-8)
% ld --version
GNU ld version 2.18.50.0.6-7.fc9 20080403

and ccache enabled.
Please add the following line to your .mozconfig file

export LDFLAGS="-lgcov -static-libgcc"

However, on a separate note, you are using '-g' flag for CFLAGS and CXXFLAGS and also using 'disable-debug'.  Is it by design.

Thanks
Murali
(Reporter)

Comment 30

8 years ago
Last time I tried this (with an older GCC, definitely not 4.3) it didn't require any special mozconfig hacking. It's possible gcc just sucks.
> export LDFLAGS="-lgcov -static-libgcc"

I can try that, but sounds like https://developer.mozilla.org/en/Building_with_Profile-Guided_Optimization needs updating.  Or the build target should perhaps handle this automatically.

AS for -g and --disable-debug, yes.  I want to be able to have debug symbols in my optimized builds, both for crash stacks and actual debugging of issues that only show up in the optimized build.
Reed, why do you think gcov is the same as PGO?  They're very different things, although I could understand if they shared some common infrastructure.  Is there something I'm missing?
(Reporter)

Comment 33

8 years ago
(In reply to comment #31)
> > export LDFLAGS="-lgcov -static-libgcc"
> 
> I can try that, but sounds like
> https://developer.mozilla.org/en/Building_with_Profile-Guided_Optimization
> needs updating.  Or the build target should perhaps handle this automatically.

The build system should handle this all automatically. It's not doing anything complicated, just -fprofile-generate / -fprofile-use:
http://mxr.mozilla.org/mozilla-central/source/configure.in#7030
(In reply to comment #32)
> Reed, why do you think gcov is the same as PGO?  They're very different things,
> although I could understand if they shared some common infrastructure.  Is
> there something I'm missing?

I only mentioned gcov because of what the errors in comment #24 said.
Adding those LDFLAGS doesn't help; I still get the error from comment 24.  The relevant command running at the time is:

c++  -fno-rtti -fno-exceptions -Wall -Wpointer-arith -Woverloaded-virtual -Wsynth -Wno-ctor-dtor-privacy -Wno-non-virtual-dtor -Wcast-align -Wno-invalid-offsetof -Wno-long-long -pedantic -g -fno-strict-aliasing -fshort-wchar -pthread -pipe  -DNDEBUG -DTRIMMED -g -fno-inline -Os -freorder-blocks -fno-reorder-functions  -fPIC -shared -Wl,-z,defs -Wl,-h,libplaces.so -o libplaces.so  nsAnnoProtocolHandler.o nsAnnotationService.o nsFaviconService.o nsNavHistory.o nsNavHistoryExpire.o nsNavHistoryQuery.o nsNavHistoryResult.o nsNavBookmarks.o nsMaybeWeakPtr.o nsMorkHistoryImporter.o nsPlacesModule.o nsNavHistoryAutoComplete.o     -lpthread -lgcov -static-libgcc         -Wl,-rpath-link,/home/bzbarsky/mozilla/debug/obj-firefox-pgo/dist/bin -Wl,-rpath-link,/usr/local/lib  ../../../../db/morkreader/libmorkreader_s.a /home/bzbarsky/mozilla/debug/obj-firefox-pgo/dist/lib/libunicharutil_s.a -L/home/bzbarsky/mozilla/debug/obj-firefox-pgo/dist/bin -lxpcom -lxpcom_core  -L/home/bzbarsky/mozilla/debug/obj-firefox-pgo/dist/bin -L/home/bzbarsky/mozilla/debug/obj-firefox-pgo/dist/lib -lplds4 -lplc4 -lnspr4 -lpthread -ldl  -Wl,--version-script -Wl,/home/bzbarsky/mozilla/debug/mozilla/build/unix/gnu-ld-scripts/components-version-script -Wl,-Bsymbolic -lasound -ldl -lm
Flags: wanted-firefox3.6?
Flags: blocking-firefox3.6?

Comment 36

8 years ago
Created attachment 372234 [details] [diff] [review]
Use -fprofile-correction and enable PGO for jemalloc

I tried Mozilla's PGO build and at one point or another, GCC would bail out complaining about corrupted profiling files, much like the issue that led to patch #305385. It turns out that GCC's profiling support was written without taking threaded applications into account, leading to inconsistencies in profiling information, which would then cause GCC to error out when reading them later.

In GCC-4.4, a new compiler flag was added to solve this issue:

"When using -fprofile-generate with a multi-threaded program, the profile counts may be slightly wrong due to race conditions. The new -fprofile-correction option directs the compiler to apply heuristics to smooth out the inconsistencies. By default the compiler will give an error message when it finds an inconsistent profile."

This patch adds a the -fprofile-correction flag to the PGO compiler flags and re-enables PGO for jemalloc. Naturally, this adds a dependency on GCC-4.4 for PGO.

(I was unable to reproduce bz's linking issue.)

Comment 37

8 years ago
Comment on attachment 372234 [details] [diff] [review]
Use -fprofile-correction and enable PGO for jemalloc

Several rebuilds later I ran into the jemalloc build error. So clearly there is more going on here than meets the eye.
Attachment #372234 - Attachment is obsolete: true
Summary: turn on profile-guided optimization on linux → turn on profile-guided optimization (pgo) on linux

Comment 38

8 years ago
TEST-UNEXPECTED-FAIL | (automation.py) | Exited with code -11 during test run
INFO | (automation.py) | Application ran for: 0:00:00.465828
make: *** [profiledbuild] Error 245

Comment 39

8 years ago
I've got the same error on x86-64 with gcc4.4.0 no matter what mozconfig options I use.

Comment 40

8 years ago
I also get the TEST-UNEXPECTED-FAIL error when building with pgo in the moz-central branch as well as the 191 branch if pgo is on. The firefox profile-generation binary is segfaulting in __gcov_init -> strlen. I've tried the profile-correction patch as well as disabling jemalloc pgo with no success.
I get this on both 4.3 and 4.4, i haven't tested <=4.2

Comment 41

8 years ago
I ran into an identical gcov issue while trying to build Wine on Ubuntu 9.04: http://bugs.winehq.org/show_bug.cgi?id=18832

I'm beginning to think there's a deeper problem exposing this for both of us, like something within gcc and gcov

This thread (about another gcov problem): http://nic.uoregon.edu/pipermail/tau-users/2009-April/000245.html  suggests that it might be a mismatch issue:

> I was able to reproduce this only by having a mismatch of GCC built 
> libraries.  When I used a GCC 3.4.6 built TAU/PDT, I got:
>
> % tau_cc.sh -fprofile-arcs -fprofile-generate foo.c
>
> ...
> : undefined reference to `__gcov_indirect_call_profiler'
>
> However, when I rebuilt TAU,PDT, and then used it, I had no problem.
>
> When I'd used all 3.4.6 built libraries, it all worked as well, only with 
> the mismatch could I reproduce it.
>
(Reporter)

Comment 42

8 years ago
It certainly sounds like a busted GCC or gcov or something, not at all a problem with our codebase.

Comment 43

8 years ago
TEST-UNEXPECTED-FAIL | (automation.py) | Exited with code -11 during test run
INFO | (automation.py) | Application ran for: 0:00:00.465828
make: *** [profiledbuild] Error 245

Seems to connected with pgo script. When I'm using simple shell script, that just start and close ff, it builds just fine.
(Reporter)

Comment 44

8 years ago
The PGO script simply sets up the environment and launches the browser. The "code -11" there is the browser crashing.

Comment 45

8 years ago
Strange, as the same config give woring non-pgo build, also as I mentioned before, shell script produces working pgo build. So what's wrong? :)

Comment 46

8 years ago
There may be multiple issues at play, but when i was getting the code -11, running the profile-generate binary manually or through any script segfaulted. Non-pgo builds work fine for me, it's only the profile-generation binary that has issues. Skipping that and doing the final build without profile data negates the purpose of PGO.

Its possible your script is running the wrong binary, not stopping the build should it fail, or you're simply running into a different issue.

I've tried gcc 4.3 and 4.4, 4.4's profile-correction, enabling and disabling optimization, bare minimum mozconfigs, central and 191src. I used to run nightly builds with pgo on, and they just broke one day. I haven't gotten one to compile since.

Comment 47

8 years ago
Here's an issue I found from googling for gcov symbol errors:

http://stackoverflow.com/questions/566472/where-is-the-gcov-symbols

---
I just spent an incredible amount of time debugging a very similar error.
Here's what I learned:

    * You have to pass -fprofile-arcs -ftest-coverage when compiling.
    * You have to pass -fprofile-arcs when linking.
    * You can still get weird linker errors when linking. They'll look like
this: libname.a(objfile.o):(.ctors+0x0): undefined reference to `global
constructors keyed to long_name_of_file_and_function'

This means that gconv's having problem with one of your compiler-generated
constructors (in my case, a copy-constructor). Check the function mentioned in
the error message, see what kinds of objects it copy-constructs, and see if any
of those classes doesn't have a copy constructor. Add one, and the error will
go away.

Edit: Whether or not you optimize can also affect this. Try turing on /
switching off optimizations if you're having problems with it.
---

This may be what's causing the problem in both Wine and Firefox.

Comment 48

8 years ago
fantastic
For my error, maybe the issue is the nsTArray use on MorkColumn (which has no copy ctor)?  It'd suck if gcov required explicit copy ctors on everything we use nsTArray on.  :(

Comment 50

8 years ago
Try fixing just one of the copy constructors and see if gcc errors in a different place - if so, then we'll indeed need one for every function :(
Why am I reading MorkAnything in 2009?

/be
Because you want to be able to migrate Firefox 2 user profiles?
MorkReader (not Mork itself) is still used for import from pre-Firefox 3.0 history.
(Personally, I don't mind if FF-after-3.5, or even FF 3.5.1 can't import history from FF 2.0.)
We should get that on file... I'm not opposed to killing migration-from-distant-past in .next, though 3.5.1 seems premature.

Comment 56

8 years ago
make -f client.mk profiledbuild not working at present
/media/sdb1/programs/mozilla/central/src/netwerk/protocol/http/src/nsHttpConnectionMgr.cpp:989: error: corrupted profile info: number of executions for edge 2-3 thought to be 30069
/media/sdb1/programs/mozilla/central/src/netwerk/protocol/http/src/nsHttpConnectionMgr.cpp:989: error: corrupted profile info: number of executions for edge 2-5 thought to be -1
nsHttpAuthManager.cpp
make[9]: *** [nsHttpConnectionMgr.o] Error 1

make -f client.mk build working

Last time I built successfully was about last Thursday or Friday (am at GMT+10) here
(Reporter)

Comment 57

8 years ago
This looks like a GCC problem. I think given the sheer number of comments here, GCC's PGO is so buggy/fickle as to be unusable.

Comment 58

8 years ago
And today
make -f client.mk profiledbuild  
is working
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2a1pre) Gecko/20090723 Minefield/3.6a1pre ID:20090723105705

It did complain and stop in the configure section about libiw-dev being missing (ubuntu here and apt-get fixed that) but I don't know if that is related to the previous failure.

Comment 59

8 years ago
Profiledbuild died again today 1245pm Monday 3 Aug 2009 Aust Eastern Std Time (GMT +10).  Don't know when the change to make it fail happened.  Sometime between now and last Thursday.

/media/sdb1/programs/mozilla/central/src/xpcom/threads/nsEventQueue.cpp: In member function ‘PRBool nsEventQueue::PutEvent(nsIRunnable*)’:
/media/sdb1/programs/mozilla/central/src/xpcom/threads/nsEventQueue.cpp:141: error: corrupted profile info: number of executions for edge 17-18 thought to be 3064
/media/sdb1/programs/mozilla/central/src/xpcom/threads/nsEventQueue.cpp:141: error: corrupted profile info: number of executions for edge 17-23 thought to be -1
/media/sdb1/programs/mozilla/central/src/xpcom/threads/nsEventQueue.cpp:141: error: corrupted profile info: number of executions for edge 23-24 thought to be 3064
/media/sdb1/programs/mozilla/central/src/xpcom/threads/nsEventQueue.cpp:141: error: corrupted profile info: number of executions for edge 23-26 thought to be -1
make[7]: *** [nsEventQueue.o] Error 1
make[7]: *** Waiting for unfinished jobs....
/media/sdb1/programs/mozilla/central/src/xpcom/threads/nsThreadPool.cpp: In member function ‘virtual nsresult nsThreadPool::Run()’:
/media/sdb1/programs/mozilla/central/src/xpcom/threads/nsThreadPool.cpp:163: warning: ‘idleSince’ may be used uninitialised in this function
make[7]: Leaving directory `/media/sdb1/programs/mozilla/obj-i686-pc-linux-gnu/ffpgo/browser/xpcom/threads'
make[6]: *** [libs] Error 2
make[6]: Leaving directory `/media/sdb1/programs/mozilla/obj-i686-pc-linux-gnu/ffpgo/browser/xpcom'
make[5]: *** [libs_tier_xpcom] Error 2
make[5]: Leaving directory `/media/sdb1/programs/mozilla/obj-i686-pc-linux-gnu/ffpgo/browser'
make[4]: *** [tier_xpcom] Error 2
make[4]: Leaving directory `/media/sdb1/programs/mozilla/obj-i686-pc-linux-gnu/ffpgo/browser'
make[3]: *** [default] Error 2
make[3]: Leaving directory `/media/sdb1/programs/mozilla/obj-i686-pc-linux-gnu/ffpgo/browser'
make[2]: *** [build] Error 2
make[2]: Leaving directory `/media/sdb1/programs/mozilla/central/src'
make[1]: *** [build] Error 2
make[1]: Leaving directory `/media/sdb1/programs/mozilla/central/src'
make: *** [profiledbuild] Error 2

ordinary build works.

Is there any chance of having an official pgo build attempted once per day, so that there will exist a body of data about how and when it fails?
Flags: wanted-firefox3.6?
Flags: wanted-firefox3.6-
Flags: blocking-firefox3.6?
Flags: blocking-firefox3.6-
(Assignee)

Comment 60

7 years ago
(In reply to comment #57)
> This looks like a GCC problem. I think given the sheer number of comments here,
> GCC's PGO is so buggy/fickle as to be unusable.

That may be, but we have enough gcc-brains in mozilla now that we have little excuse to not try to deploy it. Here is why.

I am building with gcc 4.4.3 on fedora, didn't run into any issues mentioned in this bug(may be luck).

Here are some cold startup numbers. First is the name where ordered=(application of bug 549749), static=(bug 525013), and pgo=(pgo with debloating -freorder-blocks-and-partition). Second column is cold startup time. Third column is RSS(ie how much ram is paged in via mmap, in this experiment it basically demonstrates how horribly inefficiently our code is laid out right now).

firefox.stock: 2515ms 49452

firefox.ordered 1919ms 45344

firefox.static 2321ms  49616

firefox.static.ordered 1577ms 37072

firefox.static.pgo 1619ms 38436 

The improvements in startup stem completely from improved seeking behavior. There is a 3x reduction in pagefaults between the slowest and fastest build. 
The reason why pgo is fast(as far as I can tell it does not lay out the binary chronologically) is because -freorder-blocks-and-partition breaks up functions into their useful part(ie the part that executes) and a bloated part(ie error checking that isn't needed 99% of the time.
Unfortunately I haven't yet figured out how to chronologically order symbols in a pgo build, that would result in the fastest-feasible-firefox-binary.

Conclusion: PGO wins are massive and very real. Because of -freorder-blocks-and-partition pgo can effectively debloat our functions resulting in a nimbler-smaller firefox. This combined with likely more efficient-runtime performance means we should aggressively pursue PGO.

Aside: All this made me wonder if it's possible to build on top of -freorder-blocks-and-partition to output cold functions as thumb for mobile. From my limited understanding of arm stuff, that should give us a very nice footprint win.
(Assignee)

Updated

7 years ago
Whiteboard: [not needed for 1.9.0] → [not needed for 1.9.0][ts]
I think Taras just signed up.
Assignee: nobody → tglek
FWIW, all I got from trying to build with PGO is a crash during libxul initialization when the profile is being run, i.e. after the first build, not the second one. And the crash has a very nice mozilla unrelated backtrace:
 #0  0x00007ffff40807c1 in strlen () from /lib/libc.so.6
#1  0x00007ffff6823a92 in __gcov_init () from /tmp/xulrunner/dist/bin/libxul.so
#2  0x00007ffff6824f56 in __do_global_ctors_aux () from /tmp/xulrunner/dist/bin/libxul.so
#3  0x00007ffff51888ab in _init () from /tmp/xulrunner/dist/bin/libxul.so
#4  0x00007fffffffe908 in ?? ()
#5  0x00007ffff7dee429 in ?? () from /lib64/ld-linux-x86-64.so.2
#6  0x00007ffff7dee5af in ?? () from /lib64/ld-linux-x86-64.so.2
#7  0x00007ffff7de1b2a in ?? () from /lib64/ld-linux-x86-64.so.2
#8  0x0000000000000001 in ?? ()
#9  0x00007fffffffeb8c in ?? ()
#10 0x0000000000000000 in ?? ()

I haven't tried on x86, though, only on x86-64. But that means PGO already doesn't work everywhere.

I do think, however, that what could be more interesting than PGO (and would work better) is profile guided code improvement. The idea would be to hint the compiler that chunks of code are likely or unlikely to be run, depending on data gathered from the profile, using __builtin_expect() for gcc, for example. I think that would make at the very least 50% (if not near 100%) of what PGO does, without having to build twice, and without resorting on gcc working properly for PGO.
Confirmed, with the same gcc options, pgo works on x86 but not on x86-64.
(Assignee)

Comment 64

7 years ago
(In reply to comment #63)
> Confirmed, with the same gcc options, pgo works on x86 but not on x86-64.

Which version of gcc? I can't seem to get anything other than 4.4 to build stuff.

Comment 65

7 years ago
I get basically the same crash after the first build on x86-64, with gcc 4.4.3.
> Which version of gcc? I can't seem to get anything other than 4.4 to build
> stuff.

4.4.3 from debian for both x86 and x86-64.

Updated

7 years ago
(Assignee)

Comment 67

7 years ago
Figured out the problem. profiledbuild doesn't work on GCC 4.3 because the build system occasionally drops the -fprofile-generate flags(ie while building parts of nss) which then confuses things at link time(newer compilers seem to be more lenient there).
Unfortunately even if one does pass the pgo options everywhere, the second build fails because gcc 4.3 does not have -Wcoverage-mismatch. Figuring out what compiler we should switch to now (4.5 or 4.4).
(Assignee)

Updated

7 years ago
Depends on: 559964

Comment 68

7 years ago
Taras, I am having difficulty reproducing your results. I am using a Slackware Linux user's build script to build firefox with PGO enabled:

http://www.linuxquestions.org/questions/linux-software-2/building-firefox-3-6-a-profile-guided-optimization-pgo-automated-build-script-784164/

I made some improvements to her script and contributed them back to the thread at linuxquestions.org, however, I am unable to reproduce any improvements in Sun Spider benchmarks when comparing her script's PGO-build to the system Firefox, even if both are patched to enable TraceMonkey support on 64-bit Linux.

I think my difficulty is because -freorder-blocks-and-partition is not being passed to GCC. What are you doing to pass -freorder-blocks-and-partition to GCC during the initial build?
(Assignee)

Comment 69

7 years ago
(In reply to comment #68)
> I think my difficulty is because -freorder-blocks-and-partition is not being
> passed to GCC. What are you doing to pass -freorder-blocks-and-partition to GCC
> during the initial build?
in configure.in change
MOZ_OPTIMIZE_FLAGS="-Os -freorder-blocks -fno-reorder-functions -fomit-frame-pointer $MOZ_OPTIMIZE_SIZE_TWEAK" to
MOZ_OPTIMIZE_FLAGS="-Os -freorder-blocks-and-partition -fomit-frame-pointer $MOZ_OPTIMIZE_SIZE_TWEAK"

Those pgo instructions are a bit crazy. See https://developer.mozilla.org/en/Building_with_Profile-Guided_Optimization , all you have to do is make a script that can launch firefox. Add the PROFILE_GEN_SCRIPT bit to your .mozconfig and do make -f client.mk profiledbuild

Comment 70

7 years ago
Taras, thankyou for the information. I tried building firefox using mozilla's pgo instructions, but whenever I try building it, I get the following error message mid-way through the build:

OBJDIR=/home/richard/firefox-build/vanilla/mozilla-1.9.2/../firefox-build python /home/richard/firefox-build/vanilla/mozilla-1.9.2/../firefox-build/_profile/pgo/profileserver.py
INFO | automation.py | Application pid: 28455
TEST-UNEXPECTED-FAIL | automation.py | Exited with code -11 during test run
INFO | automation.py | Application ran for: 0:00:00.109925
make: *** [profiledbuild] Error 245

It has been like this whenever I would try doing PGO-builds over the past few years. Googling the error message gives me plenty of people with the same issue, but no solutions.
(Assignee)

Comment 71

7 years ago
(In reply to comment #70)
> Taras, thankyou for the information. I tried building firefox using mozilla's
> pgo instructions, but whenever I try building it, I get the following error
> message mid-way through the build:
> 
> OBJDIR=/home/richard/firefox-build/vanilla/mozilla-1.9.2/../firefox-build
> python
> /home/richard/firefox-build/vanilla/mozilla-1.9.2/../firefox-build/_profile/pgo/profileserver.py
> INFO | automation.py | Application pid: 28455
> TEST-UNEXPECTED-FAIL | automation.py | Exited with code -11 during test run
> INFO | automation.py | Application ran for: 0:00:00.109925
> make: *** [profiledbuild] Error 245

I am guessing that this this due to the same segfault as others have reported(on 64bit). This may be due to our buildsystem occasionally omitting pgo flags. The other approach to doing pgo is to build with
export PGO="-fprofile-generate=/tmp/pgo -fprofile-arc"
export CXX="g++ $PGO"
export CC="cc $PGO"
in your .mozconfig
then run firefox, then build it a second time with PGO=-fprofile-use=/tmp/pgo

It would be cool if someone could figure out how to build firefox with pgo on 64bit.
I don't see why the flags would be sometimes ommitted when building on x86-64 and not on x86. My guess is that the problem lies in gcc. Even building with CXX="g++ --coverage" and CC="gcc --coverage" results in the same crashes.
(Assignee)

Comment 73

7 years ago
(In reply to comment #72)
> I don't see why the flags would be sometimes ommitted when building on x86-64
> and not on x86. My guess is that the problem lies in gcc. Even building with
> CXX="g++ --coverage" and CC="gcc --coverage" results in the same crashes.

They are occasionally omitted on all platforms. Thanks for the info,
(Assignee)

Comment 74

7 years ago
files the x86_64 gcc bug as http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43825
(Assignee)

Updated

7 years ago
Depends on: 560897
(Assignee)

Comment 75

7 years ago
Created attachment 440627 [details] [diff] [review]
pgo-friendly compiler flags

this plus fix in bug 560897 = working pgo on x86_64(and likely gcc 4.5)
-fprofile-correction in optim flags doesn't sound right
(Assignee)

Comment 77

7 years ago
(In reply to comment #76)
> -fprofile-correction in optim flags doesn't sound right

it's not something that would land, just enough to show that pgo works. Sticking those flags in right  places is the right thing(tm).
(Assignee)

Comment 78

7 years ago
Created attachment 442123 [details] [diff] [review]
configury
Attachment #440627 - Attachment is obsolete: true
Attachment #442123 - Flags: review?(ted.mielczarek)
(In reply to comment #78)
> Created an attachment (id=442123) [details]
> configury

The patch is absolutely amazing. Works even with my complicated .mozconfig. Thank you very much! We all have been waiting for it.
(Reporter)

Updated

7 years ago
Attachment #442123 - Flags: review?(ted.mielczarek) → review+
(Assignee)

Updated

7 years ago
Keywords: checkin-needed
Comment on attachment 442123 [details] [diff] [review]
configury

Pushed as http://hg.mozilla.org/mozilla-central/rev/7a6a20da79ae

Should this bug be closed or is actually enabling pgo build on build servers part of this bug ?
Keywords: checkin-needed
(Assignee)

Comment 81

7 years ago
(In reply to comment #80)
> (From update of attachment 442123 [details] [diff] [review])
> Pushed as http://hg.mozilla.org/mozilla-central/rev/7a6a20da79ae
> 
> Should this bug be closed or is actually enabling pgo build on build servers
> part of this bug ?

Thanks. We'll turn it on via bug 559964.
Status: NEW → RESOLVED
Last Resolved: 7 years ago
Resolution: --- → FIXED
(Reporter)

Updated

7 years ago
Summary: turn on profile-guided optimization (pgo) on linux → fix profile-guided optimization (pgo) on linux
(Assignee)

Updated

7 years ago
Blocks: 564511
Hm, it fails to compile now. Regression range is http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=ae1773250d39&tochange=5b3604a3cfbe

gcc -o now.o -c     -fomit-frame-pointer -Wall -pthread -march=core2 -O2 -pipe -fPIC  -UDEBUG  -DMOZILLA_CLIENT=1 -DNDEBUG=1 -DHAVE_VISIBILITY_HIDDEN_ATTRIBUTE=1 -DHAVE_VISIBILITY_PRAGMA=1 -DXP_UNIX=1 -D_GNU_SOURCE=1 -DHAVE_FCNTL_FILE_LOCKING=1 -DLINUX=1 -Di386=1 -DHAVE_LCHOWN=1 -DHAVE_STRERROR=1 -D_REENTRANT=1  -DFORCE_PR_LOG -D_PR_PTHREADS -UHAVE_CVAR_BUILT_ON_SEM   -fprofile-use -fprofile-correction -Wcoverage-mismatch -freoder-blocks-and-partition /home/virus_found/abs/firefox/src/mozilla-central-build/nsprpub/config/now.c
cc1: error: unrecognized command line option "-freoder-blocks-and-partition"
make[5]: *** [now.o] Error 1
(Assignee)

Comment 83

7 years ago
(In reply to comment #82)

> cc1: error: unrecognized command line option "-freoder-blocks-and-partition"
> make[5]: *** [now.o] Error 1

Sounds like you configured with one compiler and are building with an older one.
(In reply to comment #83)
> (In reply to comment #82)
> 
> > cc1: error: unrecognized command line option "-freoder-blocks-and-partition"
> > make[5]: *** [now.o] Error 1
> 
> Sounds like you configured with one compiler and are building with an older
> one.

I'm sorry, I don't get it. What do you mean by "configured"? I configure with autoconf-2.13.
(In reply to comment #83)
> (In reply to comment #82)
> 
> > cc1: error: unrecognized command line option "-freoder-blocks-and-partition"
> > make[5]: *** [now.o] Error 1
> 
> Sounds like you configured with one compiler and are building with an older
> one.

It just looks like there is a typo... reoder vs. reorder.
(In reply to comment #85)
> It just looks like there is a typo... reoder vs. reorder.

Yes, this turns out to be the reason. This was introduced in http://hg.mozilla.org/mozilla-central/rev/b5b016bb7c91
Created attachment 449604 [details] [diff] [review]
fixes two typos, introduced in b5b016bb7c91 commit
Comment on attachment 449604 [details] [diff] [review]
fixes two typos, introduced in b5b016bb7c91 commit

I'm not permitted to change anything in this bug, but adding comments and patches, so this needs to be checked in.
Attachment #449604 - Flags: review?
Attachment #449604 - Flags: review? → review?(ted.mielczarek)
Attachment #449604 - Flags: review?(ted.mielczarek)
(Reporter)

Comment 89

7 years ago
Comment on attachment 449604 [details] [diff] [review]
fixes two typos, introduced in b5b016bb7c91 commit

This is fallout from bug 564851. My patch there is correct, but somehow I screwed it up while landing, apparently. I checked in a fix in NSPR CVS.

Updated

7 years ago
Blocks: 608673
You need to log in before you can comment on or make changes to this bug.