Last Comment Bug 418866 - fix profile-guided optimization (pgo) on linux
: fix profile-guided optimization (pgo) on linux
Status: RESOLVED FIXED
[not needed for 1.9.0][ts]
:
Product: Firefox
Classification: Client Software
Component: Build Config (show other bugs)
: unspecified
: x86 Linux
: -- normal with 25 votes (vote)
: ---
Assigned To: (dormant account)
:
Mentors:
: 384899 (view as bug list)
Depends on: 418772 gcc4.5 560897
Blocks: 564511 608673
  Show dependency treegraph
 
Reported: 2008-02-21 11:19 PST by Ted Mielczarek [:ted.mielczarek]
Modified: 2012-05-05 13:54 PDT (History)
54 users (show)
ted: blocking‑firefox3.5-
benjamin: blocking‑firefox3.6-
reed: wanted‑firefox3.5?
benjamin: wanted‑firefox3.6-
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
enable pgo on fx-linux-tbox (1.67 KB, patch)
2008-02-21 15:03 PST, Ted Mielczarek [:ted.mielczarek]
sayrer: review+
Details | Diff | Review
with clobbery (2.25 KB, patch)
2008-02-21 15:41 PST, Ted Mielczarek [:ted.mielczarek]
coop: review+
mbeltzner: approval1.9b4-
ted: approval1.9+
Details | Diff | Review
skip PGO in jemalloc [checked in] (544 bytes, patch)
2008-02-24 12:20 PST, Ted Mielczarek [:ted.mielczarek]
benjamin: review+
ted: approval1.9+
Details | Diff | Review
The mozconfig in question (1.18 KB, text/plain)
2009-02-17 08:15 PST, Boris Zbarsky [:bz] (Out June 25-July 6)
no flags Details
Use -fprofile-correction and enable PGO for jemalloc (1.95 KB, patch)
2009-04-11 12:14 PDT, Bastiaan Jacques
no flags Details | Diff | Review
pgo-friendly compiler flags (668 bytes, patch)
2010-04-21 15:02 PDT, (dormant account)
no flags Details | Diff | Review
configury (2.11 KB, patch)
2010-04-28 10:56 PDT, (dormant account)
ted: review+
Details | Diff | Review
fixes two typos, introduced in b5b016bb7c91 commit (1.76 KB, patch)
2010-06-07 04:38 PDT, Evgeny Burmentyev [:virus_found]
no flags Details | Diff | Review

Description Ted Mielczarek [:ted.mielczarek] 2008-02-21 11:19:50 PST
Patch in a bit.
Comment 1 Ted Mielczarek [:ted.mielczarek] 2008-02-21 11:50:54 PST
This was actually ok before bug 361343.  There was another older bug where I made the existing PGO support work again.

Comment 2 Ted Mielczarek [:ted.mielczarek] 2008-02-21 15:03:14 PST
Created attachment 304838 [details] [diff] [review]
enable pgo on fx-linux-tbox

There you go.
Comment 3 Ted Mielczarek [:ted.mielczarek] 2008-02-21 15:41:03 PST
Created attachment 304854 [details] [diff] [review]
with clobbery

So, this is going to be an unpopular change, I think we need to make it clobber as well.
Comment 4 John O'Duinn [:joduinn] (please use "needinfo?" flag) 2008-02-21 15:50:41 PST
From offline discussions with Damon, RobSayre, and then Ted:

1) This is being proposed for nightly/clobber builds, as well as hourly/incremental builds as well as release builds.

2) There are similar changes required for Mac. Ted is filing a bug for that.

3) This double-compile process will take extra time to cycle a build; seems like linux would be 2x, win32 would be just under 2x. This will need to be communicated to everyone watching Tinderbox.

4) We are not blocked on bug#418896 or bug#418894.
Comment 5 John O'Duinn [:joduinn] (please use "needinfo?" flag) 2008-02-21 15:51:17 PST
marking P1 as its needed for beta4.
Comment 6 Robert Sayre 2008-02-21 20:49:15 PST
I had to back this out because of the following error:

/builds/tinderbox/Fx-Trunk/Linux_2.6.18-8.el5_Depend/mozilla/memory/jemalloc/jemalloc.c: In function 'choose_arena':
/builds/tinderbox/Fx-Trunk/Linux_2.6.18-8.el5_Depend/mozilla/memory/jemalloc/jemalloc.c:6043: error: corrupted profile info: edge from 5 to 6 exceeds maximal count
gmake[4]: *** [jemalloc.o] Error 1
Comment 7 Robert Sayre 2008-02-21 21:58:15 PST
Looks like this might be just jemalloc causing problems. We should be able to skip PGO for that.
Comment 8 Ted Mielczarek [:ted.mielczarek] 2008-02-24 12:20:17 PST
Created attachment 305385 [details] [diff] [review]
skip PGO in jemalloc [checked in]

This should do it.
Comment 9 Benjamin Smedberg [:bsmedberg] 2008-02-25 09:00:41 PST
Comment on attachment 305385 [details] [diff] [review]
skip PGO in jemalloc [checked in]

File a followup? I imagine PGO for jemalloc would useful if simply for branch-prediction/locality optimizations.
Comment 10 Ted Mielczarek [:ted.mielczarek] 2008-02-25 09:04:10 PST
Comment on attachment 305385 [details] [diff] [review]
skip PGO in jemalloc [checked in]

Filed bug 419470, will mention it in a comment.
Comment 11 Damon Sicore (:damons) 2008-02-25 11:48:52 PST
Comment on attachment 305385 [details] [diff] [review]
skip PGO in jemalloc [checked in]

a1.9+=damons
Comment 12 Ted Mielczarek [:ted.mielczarek] 2008-02-25 12:26:00 PST
Comment on attachment 305385 [details] [diff] [review]
skip PGO in jemalloc [checked in]

sayrer: you should be clear to try this again, but given that we're freezing tomorrow night, it might be rough to do it before wednesday.
Comment 13 Mike Beltzner [:beltzner, not reading bugmail] 2008-02-27 21:38:29 PST
Do you guys need a clean window to do this, like with Windows?
Comment 14 Robert Sayre 2008-02-27 21:44:26 PST
(In reply to comment #13)
> Do you guys need a clean window to do this, like with Windows?

That would be best, but I don't think it's critical we do it this week. Our best avenue for beta linux users seems to be distro betas. So I'm content to focus on mac and additional Windows profile input for the rest of this week.
Comment 15 Mike Beltzner [:beltzner, not reading bugmail] 2008-02-27 22:56:21 PST
Comment on attachment 304854 [details] [diff] [review]
with clobbery

a=beltzner for after beta 4
Comment 16 John O'Duinn [:joduinn] (please use "needinfo?" flag) 2008-02-28 13:12:32 PST
*** Bug 384899 has been marked as a duplicate of this bug. ***
Comment 17 John O'Duinn [:joduinn] (please use "needinfo?" flag) 2008-03-03 11:16:05 PST
(In reply to comment #15)
> (From update of attachment 304854 [details] [diff] [review])
> a=beltzner for after beta 4

Therefore, removing as blocker on bug#418926 (3.0beta4 tracking bug)
Comment 18 Ted Mielczarek [:ted.mielczarek] 2008-03-12 04:59:36 PDT
Rob: do you want to do this? This has been sort of hanging out here for a while.
Comment 19 Mike Schroepfer 2008-03-18 11:59:01 PDT
Given limited time in the release we are punting pgo on mac and linux - let's pick this up in next release
Comment 20 Ted Mielczarek [:ted.mielczarek] 2008-05-09 11:12:48 PDT
Comment on attachment 304854 [details] [diff] [review]
with clobbery

Clearing approval flag to get this off the radar.
Comment 21 Ted Mielczarek [:ted.mielczarek] 2008-05-09 11:13:01 PDT
Comment on attachment 305385 [details] [diff] [review]
skip PGO in jemalloc [checked in]

Clearing approval flag to get this off the radar.
Comment 22 Ted Mielczarek [:ted.mielczarek] 2008-05-09 13:13:25 PDT
We may take this in 3.next.
Comment 23 Ted Mielczarek [:ted.mielczarek] 2009-02-16 06:07:32 PST
I wouldn't block on it, though.
Comment 24 Boris Zbarsky [:bz] (Out June 25-July 6) 2009-02-16 10:07:34 PST
So I just tried compiling with pgo on Linux.  I get:

../../../../db/morkreader/libmorkreader_s.a(nsMorkReader.o): In function `global constructors keyed to 65535_0_nsMorkReader.cpp':
/home/bzbarsky/mozilla/debug/mozilla/db/morkreader/nsMorkReader.cpp:579: undefined reference to `__gcov_init'
../../../../db/morkreader/libmorkreader_s.a(nsMorkReader.o):(.data.rel+0x24): undefined reference to `__gcov_merge_add'
../../../../db/morkreader/libmorkreader_s.a(nsMorkReader.o):(.data.rel+0x30): undefined reference to `__gcov_merge_single'
../../../../db/morkreader/libmorkreader_s.a(nsMorkReader.o):(.data.rel+0x3c): undefined reference to `__gcov_merge_single'
../../../../db/morkreader/libmorkreader_s.a(nsMorkReader.o):(.data.rel+0x48): undefined reference to `__gcov_merge_add'
../../../../db/morkreader/libmorkreader_s.a(nsMorkReader.o):(.data.rel+0x54): undefined reference to `__gcov_merge_ior'
/home/bzbarsky/mozilla/debug/obj-firefox-gpo/dist/lib/libunicharutil_s.a(nsUnicharUtils.o): In function `global constructors keyed to 65535_0_nsUnicharUtils.cpp':
/home/bzbarsky/mozilla/debug/obj-firefox-gpo/intl/unicharutil/util/internal/nsUnicharUtils.cpp:226: undefined reference to `__gcov_init'
/home/bzbarsky/mozilla/debug/obj-firefox-gpo/dist/lib/libunicharutil_s.a(nsUnicharUtils.o):(.data.rel+0x24): undefined reference to `__gcov_merge_add'
/home/bzbarsky/mozilla/debug/obj-firefox-gpo/dist/lib/libunicharutil_s.a(nsUnicharUtils.o):(.data.rel+0x30): undefined reference to `__gcov_merge_single'
collect2: ld returned 1 exit status
gmake[7]: *** [libplaces.so] Error 1
Comment 25 Reed Loden [:reed] (use needinfo?) 2009-02-16 11:14:00 PST
I think Murali was using gcov on Linux lately... maybe he can give some pointers?
Comment 26 Murali Nandigama [:murali] 2009-02-16 12:03:21 PST
Preliminary symptoms seems to be issues with LDFLAGS.

I need to check the .mozconfig file to comment further.
Comment 27 Reed Loden [:reed] (use needinfo?) 2009-02-17 01:41:33 PST
bz, please post your .mozconfig for Murali, as per comment #26.
Comment 28 Boris Zbarsky [:bz] (Out June 25-July 6) 2009-02-17 08:15:25 PST
Created attachment 362727 [details]
The mozconfig in question

I had to do some manual copy/paste inclusion (I assumed you didn't want me to attach the 4-5 mozconfig files involved that include each other), and I removed all the comment lines.  Other than that, this is the mozconfig involved.

Building on FC9 with:

% g++ --version
g++ (GCC) 4.3.0 20080428 (Red Hat 4.3.0-8)
% ld --version
GNU ld version 2.18.50.0.6-7.fc9 20080403

and ccache enabled.
Comment 29 Murali Nandigama [:murali] 2009-02-17 11:45:41 PST
Please add the following line to your .mozconfig file

export LDFLAGS="-lgcov -static-libgcc"

However, on a separate note, you are using '-g' flag for CFLAGS and CXXFLAGS and also using 'disable-debug'.  Is it by design.

Thanks
Murali
Comment 30 Ted Mielczarek [:ted.mielczarek] 2009-02-17 11:51:45 PST
Last time I tried this (with an older GCC, definitely not 4.3) it didn't require any special mozconfig hacking. It's possible gcc just sucks.
Comment 31 Boris Zbarsky [:bz] (Out June 25-July 6) 2009-02-17 12:02:45 PST
> export LDFLAGS="-lgcov -static-libgcc"

I can try that, but sounds like https://developer.mozilla.org/en/Building_with_Profile-Guided_Optimization needs updating.  Or the build target should perhaps handle this automatically.

AS for -g and --disable-debug, yes.  I want to be able to have debug symbols in my optimized builds, both for crash stacks and actual debugging of issues that only show up in the optimized build.
Comment 32 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2009-02-17 12:10:03 PST
Reed, why do you think gcov is the same as PGO?  They're very different things, although I could understand if they shared some common infrastructure.  Is there something I'm missing?
Comment 33 Ted Mielczarek [:ted.mielczarek] 2009-02-17 12:36:28 PST
(In reply to comment #31)
> > export LDFLAGS="-lgcov -static-libgcc"
> 
> I can try that, but sounds like
> https://developer.mozilla.org/en/Building_with_Profile-Guided_Optimization
> needs updating.  Or the build target should perhaps handle this automatically.

The build system should handle this all automatically. It's not doing anything complicated, just -fprofile-generate / -fprofile-use:
http://mxr.mozilla.org/mozilla-central/source/configure.in#7030
Comment 34 Reed Loden [:reed] (use needinfo?) 2009-02-17 13:59:11 PST
(In reply to comment #32)
> Reed, why do you think gcov is the same as PGO?  They're very different things,
> although I could understand if they shared some common infrastructure.  Is
> there something I'm missing?

I only mentioned gcov because of what the errors in comment #24 said.
Comment 35 Boris Zbarsky [:bz] (Out June 25-July 6) 2009-02-17 14:18:56 PST
Adding those LDFLAGS doesn't help; I still get the error from comment 24.  The relevant command running at the time is:

c++  -fno-rtti -fno-exceptions -Wall -Wpointer-arith -Woverloaded-virtual -Wsynth -Wno-ctor-dtor-privacy -Wno-non-virtual-dtor -Wcast-align -Wno-invalid-offsetof -Wno-long-long -pedantic -g -fno-strict-aliasing -fshort-wchar -pthread -pipe  -DNDEBUG -DTRIMMED -g -fno-inline -Os -freorder-blocks -fno-reorder-functions  -fPIC -shared -Wl,-z,defs -Wl,-h,libplaces.so -o libplaces.so  nsAnnoProtocolHandler.o nsAnnotationService.o nsFaviconService.o nsNavHistory.o nsNavHistoryExpire.o nsNavHistoryQuery.o nsNavHistoryResult.o nsNavBookmarks.o nsMaybeWeakPtr.o nsMorkHistoryImporter.o nsPlacesModule.o nsNavHistoryAutoComplete.o     -lpthread -lgcov -static-libgcc         -Wl,-rpath-link,/home/bzbarsky/mozilla/debug/obj-firefox-pgo/dist/bin -Wl,-rpath-link,/usr/local/lib  ../../../../db/morkreader/libmorkreader_s.a /home/bzbarsky/mozilla/debug/obj-firefox-pgo/dist/lib/libunicharutil_s.a -L/home/bzbarsky/mozilla/debug/obj-firefox-pgo/dist/bin -lxpcom -lxpcom_core  -L/home/bzbarsky/mozilla/debug/obj-firefox-pgo/dist/bin -L/home/bzbarsky/mozilla/debug/obj-firefox-pgo/dist/lib -lplds4 -lplc4 -lnspr4 -lpthread -ldl  -Wl,--version-script -Wl,/home/bzbarsky/mozilla/debug/mozilla/build/unix/gnu-ld-scripts/components-version-script -Wl,-Bsymbolic -lasound -ldl -lm
Comment 36 Bastiaan Jacques 2009-04-11 12:14:51 PDT
Created attachment 372234 [details] [diff] [review]
Use -fprofile-correction and enable PGO for jemalloc

I tried Mozilla's PGO build and at one point or another, GCC would bail out complaining about corrupted profiling files, much like the issue that led to patch #305385. It turns out that GCC's profiling support was written without taking threaded applications into account, leading to inconsistencies in profiling information, which would then cause GCC to error out when reading them later.

In GCC-4.4, a new compiler flag was added to solve this issue:

"When using -fprofile-generate with a multi-threaded program, the profile counts may be slightly wrong due to race conditions. The new -fprofile-correction option directs the compiler to apply heuristics to smooth out the inconsistencies. By default the compiler will give an error message when it finds an inconsistent profile."

This patch adds a the -fprofile-correction flag to the PGO compiler flags and re-enables PGO for jemalloc. Naturally, this adds a dependency on GCC-4.4 for PGO.

(I was unable to reproduce bz's linking issue.)
Comment 37 Bastiaan Jacques 2009-04-12 04:13:16 PDT
Comment on attachment 372234 [details] [diff] [review]
Use -fprofile-correction and enable PGO for jemalloc

Several rebuilds later I ran into the jemalloc build error. So clearly there is more going on here than meets the eye.
Comment 38 drdarkraven 2009-05-23 07:18:11 PDT
TEST-UNEXPECTED-FAIL | (automation.py) | Exited with code -11 during test run
INFO | (automation.py) | Application ran for: 0:00:00.465828
make: *** [profiledbuild] Error 245
Comment 39 koralik 2009-06-07 04:32:40 PDT
I've got the same error on x86-64 with gcc4.4.0 no matter what mozconfig options I use.
Comment 40 Nephyrin Zey 2009-06-07 15:41:21 PDT
I also get the TEST-UNEXPECTED-FAIL error when building with pgo in the moz-central branch as well as the 191 branch if pgo is on. The firefox profile-generation binary is segfaulting in __gcov_init -> strlen. I've tried the profile-correction patch as well as disabling jemalloc pgo with no success.
I get this on both 4.3 and 4.4, i haven't tested <=4.2
Comment 41 Scott Ritchie 2009-06-07 19:47:30 PDT
I ran into an identical gcov issue while trying to build Wine on Ubuntu 9.04: http://bugs.winehq.org/show_bug.cgi?id=18832

I'm beginning to think there's a deeper problem exposing this for both of us, like something within gcc and gcov

This thread (about another gcov problem): http://nic.uoregon.edu/pipermail/tau-users/2009-April/000245.html  suggests that it might be a mismatch issue:

> I was able to reproduce this only by having a mismatch of GCC built 
> libraries.  When I used a GCC 3.4.6 built TAU/PDT, I got:
>
> % tau_cc.sh -fprofile-arcs -fprofile-generate foo.c
>
> ...
> : undefined reference to `__gcov_indirect_call_profiler'
>
> However, when I rebuilt TAU,PDT, and then used it, I had no problem.
>
> When I'd used all 3.4.6 built libraries, it all worked as well, only with 
> the mismatch could I reproduce it.
>
Comment 42 Ted Mielczarek [:ted.mielczarek] 2009-06-08 03:41:46 PDT
It certainly sounds like a busted GCC or gcov or something, not at all a problem with our codebase.
Comment 43 koralik 2009-06-09 08:51:00 PDT
TEST-UNEXPECTED-FAIL | (automation.py) | Exited with code -11 during test run
INFO | (automation.py) | Application ran for: 0:00:00.465828
make: *** [profiledbuild] Error 245

Seems to connected with pgo script. When I'm using simple shell script, that just start and close ff, it builds just fine.
Comment 44 Ted Mielczarek [:ted.mielczarek] 2009-06-09 09:56:48 PDT
The PGO script simply sets up the environment and launches the browser. The "code -11" there is the browser crashing.
Comment 45 koralik 2009-06-09 11:05:11 PDT
Strange, as the same config give woring non-pgo build, also as I mentioned before, shell script produces working pgo build. So what's wrong? :)
Comment 46 Nephyrin Zey 2009-06-09 11:25:00 PDT
There may be multiple issues at play, but when i was getting the code -11, running the profile-generate binary manually or through any script segfaulted. Non-pgo builds work fine for me, it's only the profile-generation binary that has issues. Skipping that and doing the final build without profile data negates the purpose of PGO.

Its possible your script is running the wrong binary, not stopping the build should it fail, or you're simply running into a different issue.

I've tried gcc 4.3 and 4.4, 4.4's profile-correction, enabling and disabling optimization, bare minimum mozconfigs, central and 191src. I used to run nightly builds with pgo on, and they just broke one day. I haven't gotten one to compile since.
Comment 47 Scott Ritchie 2009-06-09 13:16:41 PDT
Here's an issue I found from googling for gcov symbol errors:

http://stackoverflow.com/questions/566472/where-is-the-gcov-symbols

---
I just spent an incredible amount of time debugging a very similar error.
Here's what I learned:

    * You have to pass -fprofile-arcs -ftest-coverage when compiling.
    * You have to pass -fprofile-arcs when linking.
    * You can still get weird linker errors when linking. They'll look like
this: libname.a(objfile.o):(.ctors+0x0): undefined reference to `global
constructors keyed to long_name_of_file_and_function'

This means that gconv's having problem with one of your compiler-generated
constructors (in my case, a copy-constructor). Check the function mentioned in
the error message, see what kinds of objects it copy-constructs, and see if any
of those classes doesn't have a copy constructor. Add one, and the error will
go away.

Edit: Whether or not you optimize can also affect this. Try turing on /
switching off optimizations if you're having problems with it.
---

This may be what's causing the problem in both Wine and Firefox.
Comment 48 Robert Sayre 2009-06-09 13:53:33 PDT
fantastic
Comment 49 Boris Zbarsky [:bz] (Out June 25-July 6) 2009-06-09 14:47:13 PDT
For my error, maybe the issue is the nsTArray use on MorkColumn (which has no copy ctor)?  It'd suck if gcov required explicit copy ctors on everything we use nsTArray on.  :(
Comment 50 Scott Ritchie 2009-06-22 14:19:12 PDT
Try fixing just one of the copy constructors and see if gcc errors in a different place - if so, then we'll indeed need one for every function :(
Comment 51 Brendan Eich [:brendan] 2009-06-22 14:29:07 PDT
Why am I reading MorkAnything in 2009?

/be
Comment 52 Boris Zbarsky [:bz] (Out June 25-July 6) 2009-06-22 14:35:54 PDT
Because you want to be able to migrate Firefox 2 user profiles?
Comment 53 :Gavin Sharp [email: gavin@gavinsharp.com] 2009-06-22 14:38:31 PDT
MorkReader (not Mork itself) is still used for import from pre-Firefox 3.0 history.
Comment 54 Mike Shaver (:shaver -- probably not reading bugmail closely) 2009-06-22 15:22:45 PDT
(Personally, I don't mind if FF-after-3.5, or even FF 3.5.1 can't import history from FF 2.0.)
Comment 55 Mike Connor [:mconnor] 2009-06-23 18:40:18 PDT
We should get that on file... I'm not opposed to killing migration-from-distant-past in .next, though 3.5.1 seems premature.
Comment 56 Ian 2009-07-19 19:41:07 PDT
make -f client.mk profiledbuild not working at present
/media/sdb1/programs/mozilla/central/src/netwerk/protocol/http/src/nsHttpConnectionMgr.cpp:989: error: corrupted profile info: number of executions for edge 2-3 thought to be 30069
/media/sdb1/programs/mozilla/central/src/netwerk/protocol/http/src/nsHttpConnectionMgr.cpp:989: error: corrupted profile info: number of executions for edge 2-5 thought to be -1
nsHttpAuthManager.cpp
make[9]: *** [nsHttpConnectionMgr.o] Error 1

make -f client.mk build working

Last time I built successfully was about last Thursday or Friday (am at GMT+10) here
Comment 57 Ted Mielczarek [:ted.mielczarek] 2009-07-21 17:32:57 PDT
This looks like a GCC problem. I think given the sheer number of comments here, GCC's PGO is so buggy/fickle as to be unusable.
Comment 58 Ian 2009-07-22 19:34:54 PDT
And today
make -f client.mk profiledbuild  
is working
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2a1pre) Gecko/20090723 Minefield/3.6a1pre ID:20090723105705

It did complain and stop in the configure section about libiw-dev being missing (ubuntu here and apt-get fixed that) but I don't know if that is related to the previous failure.
Comment 59 Ian 2009-08-02 19:48:05 PDT
Profiledbuild died again today 1245pm Monday 3 Aug 2009 Aust Eastern Std Time (GMT +10).  Don't know when the change to make it fail happened.  Sometime between now and last Thursday.

/media/sdb1/programs/mozilla/central/src/xpcom/threads/nsEventQueue.cpp: In member function ‘PRBool nsEventQueue::PutEvent(nsIRunnable*)’:
/media/sdb1/programs/mozilla/central/src/xpcom/threads/nsEventQueue.cpp:141: error: corrupted profile info: number of executions for edge 17-18 thought to be 3064
/media/sdb1/programs/mozilla/central/src/xpcom/threads/nsEventQueue.cpp:141: error: corrupted profile info: number of executions for edge 17-23 thought to be -1
/media/sdb1/programs/mozilla/central/src/xpcom/threads/nsEventQueue.cpp:141: error: corrupted profile info: number of executions for edge 23-24 thought to be 3064
/media/sdb1/programs/mozilla/central/src/xpcom/threads/nsEventQueue.cpp:141: error: corrupted profile info: number of executions for edge 23-26 thought to be -1
make[7]: *** [nsEventQueue.o] Error 1
make[7]: *** Waiting for unfinished jobs....
/media/sdb1/programs/mozilla/central/src/xpcom/threads/nsThreadPool.cpp: In member function ‘virtual nsresult nsThreadPool::Run()’:
/media/sdb1/programs/mozilla/central/src/xpcom/threads/nsThreadPool.cpp:163: warning: ‘idleSince’ may be used uninitialised in this function
make[7]: Leaving directory `/media/sdb1/programs/mozilla/obj-i686-pc-linux-gnu/ffpgo/browser/xpcom/threads'
make[6]: *** [libs] Error 2
make[6]: Leaving directory `/media/sdb1/programs/mozilla/obj-i686-pc-linux-gnu/ffpgo/browser/xpcom'
make[5]: *** [libs_tier_xpcom] Error 2
make[5]: Leaving directory `/media/sdb1/programs/mozilla/obj-i686-pc-linux-gnu/ffpgo/browser'
make[4]: *** [tier_xpcom] Error 2
make[4]: Leaving directory `/media/sdb1/programs/mozilla/obj-i686-pc-linux-gnu/ffpgo/browser'
make[3]: *** [default] Error 2
make[3]: Leaving directory `/media/sdb1/programs/mozilla/obj-i686-pc-linux-gnu/ffpgo/browser'
make[2]: *** [build] Error 2
make[2]: Leaving directory `/media/sdb1/programs/mozilla/central/src'
make[1]: *** [build] Error 2
make[1]: Leaving directory `/media/sdb1/programs/mozilla/central/src'
make: *** [profiledbuild] Error 2

ordinary build works.

Is there any chance of having an official pgo build attempted once per day, so that there will exist a body of data about how and when it fails?
Comment 60 (dormant account) 2010-04-12 11:01:14 PDT
(In reply to comment #57)
> This looks like a GCC problem. I think given the sheer number of comments here,
> GCC's PGO is so buggy/fickle as to be unusable.

That may be, but we have enough gcc-brains in mozilla now that we have little excuse to not try to deploy it. Here is why.

I am building with gcc 4.4.3 on fedora, didn't run into any issues mentioned in this bug(may be luck).

Here are some cold startup numbers. First is the name where ordered=(application of bug 549749), static=(bug 525013), and pgo=(pgo with debloating -freorder-blocks-and-partition). Second column is cold startup time. Third column is RSS(ie how much ram is paged in via mmap, in this experiment it basically demonstrates how horribly inefficiently our code is laid out right now).

firefox.stock: 2515ms 49452

firefox.ordered 1919ms 45344

firefox.static 2321ms  49616

firefox.static.ordered 1577ms 37072

firefox.static.pgo 1619ms 38436 

The improvements in startup stem completely from improved seeking behavior. There is a 3x reduction in pagefaults between the slowest and fastest build. 
The reason why pgo is fast(as far as I can tell it does not lay out the binary chronologically) is because -freorder-blocks-and-partition breaks up functions into their useful part(ie the part that executes) and a bloated part(ie error checking that isn't needed 99% of the time.
Unfortunately I haven't yet figured out how to chronologically order symbols in a pgo build, that would result in the fastest-feasible-firefox-binary.

Conclusion: PGO wins are massive and very real. Because of -freorder-blocks-and-partition pgo can effectively debloat our functions resulting in a nimbler-smaller firefox. This combined with likely more efficient-runtime performance means we should aggressively pursue PGO.

Aside: All this made me wonder if it's possible to build on top of -freorder-blocks-and-partition to output cold functions as thumb for mobile. From my limited understanding of arm stuff, that should give us a very nice footprint win.
Comment 61 Christopher Blizzard (:blizzard) 2010-04-12 11:06:34 PDT
I think Taras just signed up.
Comment 62 Mike Hommey [:glandium] 2010-04-12 23:50:28 PDT
FWIW, all I got from trying to build with PGO is a crash during libxul initialization when the profile is being run, i.e. after the first build, not the second one. And the crash has a very nice mozilla unrelated backtrace:
 #0  0x00007ffff40807c1 in strlen () from /lib/libc.so.6
#1  0x00007ffff6823a92 in __gcov_init () from /tmp/xulrunner/dist/bin/libxul.so
#2  0x00007ffff6824f56 in __do_global_ctors_aux () from /tmp/xulrunner/dist/bin/libxul.so
#3  0x00007ffff51888ab in _init () from /tmp/xulrunner/dist/bin/libxul.so
#4  0x00007fffffffe908 in ?? ()
#5  0x00007ffff7dee429 in ?? () from /lib64/ld-linux-x86-64.so.2
#6  0x00007ffff7dee5af in ?? () from /lib64/ld-linux-x86-64.so.2
#7  0x00007ffff7de1b2a in ?? () from /lib64/ld-linux-x86-64.so.2
#8  0x0000000000000001 in ?? ()
#9  0x00007fffffffeb8c in ?? ()
#10 0x0000000000000000 in ?? ()

I haven't tried on x86, though, only on x86-64. But that means PGO already doesn't work everywhere.

I do think, however, that what could be more interesting than PGO (and would work better) is profile guided code improvement. The idea would be to hint the compiler that chunks of code are likely or unlikely to be run, depending on data gathered from the profile, using __builtin_expect() for gcc, for example. I think that would make at the very least 50% (if not near 100%) of what PGO does, without having to build twice, and without resorting on gcc working properly for PGO.
Comment 63 Mike Hommey [:glandium] 2010-04-14 08:05:02 PDT
Confirmed, with the same gcc options, pgo works on x86 but not on x86-64.
Comment 64 (dormant account) 2010-04-14 18:14:50 PDT
(In reply to comment #63)
> Confirmed, with the same gcc options, pgo works on x86 but not on x86-64.

Which version of gcc? I can't seem to get anything other than 4.4 to build stuff.
Comment 65 Joseph Felps 2010-04-14 21:27:49 PDT
I get basically the same crash after the first build on x86-64, with gcc 4.4.3.
Comment 66 Mike Hommey [:glandium] 2010-04-14 22:51:34 PDT
> Which version of gcc? I can't seem to get anything other than 4.4 to build
> stuff.

4.4.3 from debian for both x86 and x86-64.
Comment 67 (dormant account) 2010-04-16 12:05:08 PDT
Figured out the problem. profiledbuild doesn't work on GCC 4.3 because the build system occasionally drops the -fprofile-generate flags(ie while building parts of nss) which then confuses things at link time(newer compilers seem to be more lenient there).
Unfortunately even if one does pass the pgo options everywhere, the second build fails because gcc 4.3 does not have -Wcoverage-mismatch. Figuring out what compiler we should switch to now (4.5 or 4.4).
Comment 68 Richard 2010-04-18 08:17:18 PDT
Taras, I am having difficulty reproducing your results. I am using a Slackware Linux user's build script to build firefox with PGO enabled:

http://www.linuxquestions.org/questions/linux-software-2/building-firefox-3-6-a-profile-guided-optimization-pgo-automated-build-script-784164/

I made some improvements to her script and contributed them back to the thread at linuxquestions.org, however, I am unable to reproduce any improvements in Sun Spider benchmarks when comparing her script's PGO-build to the system Firefox, even if both are patched to enable TraceMonkey support on 64-bit Linux.

I think my difficulty is because -freorder-blocks-and-partition is not being passed to GCC. What are you doing to pass -freorder-blocks-and-partition to GCC during the initial build?
Comment 69 (dormant account) 2010-04-18 12:18:02 PDT
(In reply to comment #68)
> I think my difficulty is because -freorder-blocks-and-partition is not being
> passed to GCC. What are you doing to pass -freorder-blocks-and-partition to GCC
> during the initial build?
in configure.in change
MOZ_OPTIMIZE_FLAGS="-Os -freorder-blocks -fno-reorder-functions -fomit-frame-pointer $MOZ_OPTIMIZE_SIZE_TWEAK" to
MOZ_OPTIMIZE_FLAGS="-Os -freorder-blocks-and-partition -fomit-frame-pointer $MOZ_OPTIMIZE_SIZE_TWEAK"

Those pgo instructions are a bit crazy. See https://developer.mozilla.org/en/Building_with_Profile-Guided_Optimization , all you have to do is make a script that can launch firefox. Add the PROFILE_GEN_SCRIPT bit to your .mozconfig and do make -f client.mk profiledbuild
Comment 70 Richard 2010-04-18 14:50:10 PDT
Taras, thankyou for the information. I tried building firefox using mozilla's pgo instructions, but whenever I try building it, I get the following error message mid-way through the build:

OBJDIR=/home/richard/firefox-build/vanilla/mozilla-1.9.2/../firefox-build python /home/richard/firefox-build/vanilla/mozilla-1.9.2/../firefox-build/_profile/pgo/profileserver.py
INFO | automation.py | Application pid: 28455
TEST-UNEXPECTED-FAIL | automation.py | Exited with code -11 during test run
INFO | automation.py | Application ran for: 0:00:00.109925
make: *** [profiledbuild] Error 245

It has been like this whenever I would try doing PGO-builds over the past few years. Googling the error message gives me plenty of people with the same issue, but no solutions.
Comment 71 (dormant account) 2010-04-18 15:36:10 PDT
(In reply to comment #70)
> Taras, thankyou for the information. I tried building firefox using mozilla's
> pgo instructions, but whenever I try building it, I get the following error
> message mid-way through the build:
> 
> OBJDIR=/home/richard/firefox-build/vanilla/mozilla-1.9.2/../firefox-build
> python
> /home/richard/firefox-build/vanilla/mozilla-1.9.2/../firefox-build/_profile/pgo/profileserver.py
> INFO | automation.py | Application pid: 28455
> TEST-UNEXPECTED-FAIL | automation.py | Exited with code -11 during test run
> INFO | automation.py | Application ran for: 0:00:00.109925
> make: *** [profiledbuild] Error 245

I am guessing that this this due to the same segfault as others have reported(on 64bit). This may be due to our buildsystem occasionally omitting pgo flags. The other approach to doing pgo is to build with
export PGO="-fprofile-generate=/tmp/pgo -fprofile-arc"
export CXX="g++ $PGO"
export CC="cc $PGO"
in your .mozconfig
then run firefox, then build it a second time with PGO=-fprofile-use=/tmp/pgo

It would be cool if someone could figure out how to build firefox with pgo on 64bit.
Comment 72 Mike Hommey [:glandium] 2010-04-18 23:43:07 PDT
I don't see why the flags would be sometimes ommitted when building on x86-64 and not on x86. My guess is that the problem lies in gcc. Even building with CXX="g++ --coverage" and CC="gcc --coverage" results in the same crashes.
Comment 73 (dormant account) 2010-04-19 08:54:57 PDT
(In reply to comment #72)
> I don't see why the flags would be sometimes ommitted when building on x86-64
> and not on x86. My guess is that the problem lies in gcc. Even building with
> CXX="g++ --coverage" and CC="gcc --coverage" results in the same crashes.

They are occasionally omitted on all platforms. Thanks for the info,
Comment 74 (dormant account) 2010-04-20 17:13:57 PDT
files the x86_64 gcc bug as http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43825
Comment 75 (dormant account) 2010-04-21 15:02:57 PDT
Created attachment 440627 [details] [diff] [review]
pgo-friendly compiler flags

this plus fix in bug 560897 = working pgo on x86_64(and likely gcc 4.5)
Comment 76 Mike Hommey [:glandium] 2010-04-21 15:13:28 PDT
-fprofile-correction in optim flags doesn't sound right
Comment 77 (dormant account) 2010-04-21 15:14:47 PDT
(In reply to comment #76)
> -fprofile-correction in optim flags doesn't sound right

it's not something that would land, just enough to show that pgo works. Sticking those flags in right  places is the right thing(tm).
Comment 78 (dormant account) 2010-04-28 10:56:46 PDT
Created attachment 442123 [details] [diff] [review]
configury
Comment 79 Evgeny Burmentyev [:virus_found] 2010-04-30 16:18:48 PDT
(In reply to comment #78)
> Created an attachment (id=442123) [details]
> configury

The patch is absolutely amazing. Works even with my complicated .mozconfig. Thank you very much! We all have been waiting for it.
Comment 80 Mike Hommey [:glandium] 2010-05-06 01:22:14 PDT
Comment on attachment 442123 [details] [diff] [review]
configury

Pushed as http://hg.mozilla.org/mozilla-central/rev/7a6a20da79ae

Should this bug be closed or is actually enabling pgo build on build servers part of this bug ?
Comment 81 (dormant account) 2010-05-06 09:12:58 PDT
(In reply to comment #80)
> (From update of attachment 442123 [details] [diff] [review])
> Pushed as http://hg.mozilla.org/mozilla-central/rev/7a6a20da79ae
> 
> Should this bug be closed or is actually enabling pgo build on build servers
> part of this bug ?

Thanks. We'll turn it on via bug 559964.
Comment 82 Evgeny Burmentyev [:virus_found] 2010-06-06 13:49:24 PDT
Hm, it fails to compile now. Regression range is http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=ae1773250d39&tochange=5b3604a3cfbe

gcc -o now.o -c     -fomit-frame-pointer -Wall -pthread -march=core2 -O2 -pipe -fPIC  -UDEBUG  -DMOZILLA_CLIENT=1 -DNDEBUG=1 -DHAVE_VISIBILITY_HIDDEN_ATTRIBUTE=1 -DHAVE_VISIBILITY_PRAGMA=1 -DXP_UNIX=1 -D_GNU_SOURCE=1 -DHAVE_FCNTL_FILE_LOCKING=1 -DLINUX=1 -Di386=1 -DHAVE_LCHOWN=1 -DHAVE_STRERROR=1 -D_REENTRANT=1  -DFORCE_PR_LOG -D_PR_PTHREADS -UHAVE_CVAR_BUILT_ON_SEM   -fprofile-use -fprofile-correction -Wcoverage-mismatch -freoder-blocks-and-partition /home/virus_found/abs/firefox/src/mozilla-central-build/nsprpub/config/now.c
cc1: error: unrecognized command line option "-freoder-blocks-and-partition"
make[5]: *** [now.o] Error 1
Comment 83 (dormant account) 2010-06-06 16:25:54 PDT
(In reply to comment #82)

> cc1: error: unrecognized command line option "-freoder-blocks-and-partition"
> make[5]: *** [now.o] Error 1

Sounds like you configured with one compiler and are building with an older one.
Comment 84 Evgeny Burmentyev [:virus_found] 2010-06-07 00:03:31 PDT
(In reply to comment #83)
> (In reply to comment #82)
> 
> > cc1: error: unrecognized command line option "-freoder-blocks-and-partition"
> > make[5]: *** [now.o] Error 1
> 
> Sounds like you configured with one compiler and are building with an older
> one.

I'm sorry, I don't get it. What do you mean by "configured"? I configure with autoconf-2.13.
Comment 85 Mike Hommey [:glandium] 2010-06-07 00:28:48 PDT
(In reply to comment #83)
> (In reply to comment #82)
> 
> > cc1: error: unrecognized command line option "-freoder-blocks-and-partition"
> > make[5]: *** [now.o] Error 1
> 
> Sounds like you configured with one compiler and are building with an older
> one.

It just looks like there is a typo... reoder vs. reorder.
Comment 86 Evgeny Burmentyev [:virus_found] 2010-06-07 04:35:04 PDT
(In reply to comment #85)
> It just looks like there is a typo... reoder vs. reorder.

Yes, this turns out to be the reason. This was introduced in http://hg.mozilla.org/mozilla-central/rev/b5b016bb7c91
Comment 87 Evgeny Burmentyev [:virus_found] 2010-06-07 04:38:04 PDT
Created attachment 449604 [details] [diff] [review]
fixes two typos, introduced in b5b016bb7c91 commit
Comment 88 Evgeny Burmentyev [:virus_found] 2010-06-07 04:41:57 PDT
Comment on attachment 449604 [details] [diff] [review]
fixes two typos, introduced in b5b016bb7c91 commit

I'm not permitted to change anything in this bug, but adding comments and patches, so this needs to be checked in.
Comment 89 Ted Mielczarek [:ted.mielczarek] 2010-06-07 05:07:05 PDT
Comment on attachment 449604 [details] [diff] [review]
fixes two typos, introduced in b5b016bb7c91 commit

This is fallout from bug 564851. My patch there is correct, but somehow I screwed it up while landing, apparently. I checked in a fix in NSPR CVS.

Note You need to log in before you can comment on or make changes to this bug.