Closed Bug 419893 Opened 16 years ago Closed 15 years ago

sort out PGO-triggered bugs

Categories

(Firefox Build System :: General, defect)

x86
Windows XP
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: ted, Unassigned)

References

Details

(Keywords: perf, Whiteboard: [tsnap])

Attachments

(3 files)

Some of our components (sqlite, storage, places) have bugs that are exacerbated by PGO. Some of my Win32 PGO builds have problems with accessing bookmarks, or consistent shutdown crashers in sqlite. This usually indicates that there are bugs in the code somewhere, and PGO is just forcing us to hit them reliably.

We've seen this when profiling with just a startup/shutdown cycle of the browser, so it might be possible to trace what methods we're calling, and do some analysis from that. For the sqlite stuff, it would be good to trace all our calls to sqlite, and see if we can give the sqlite devs a reduced testcase that they can reproduce with.
Depends on: 420069
Depends on: 420092
|make check| passes on my PGO build, so that's good.
Also mochitests pass, so I think we're clean on regressions of our test suite now that dbaron has checked in a fix for bug 420069.
Comment on attachment 343587 [details]
xpcshell errors for sqlite pgo

With Ted's direction I have been looking into the PGO bug.  I have reproduced the sqlite crash.  I also get errors with xpcshell.
The following steps will reproduce the crash in Firefox with sqlite PGO enable:

Step 1:
  modify /path/to/src/db/sqlite/src/Makefile.in
  change NO_PROFILE_GUIDED_OPTIMIZE = 1 to #NO_PROFILE_GUIDED_OPTIMIZE = 1

  the comment allows sqlite to be optimized

Step 2:
  modify .mozconfig 
  add mk_add_options PROFILE_GEN_SCRIPT="sh script/to/run.sh"

Step 3:
  create profile script
  My script is
  ************************************************************************
  xport NO_EM_RESTART=1 #this means 'do not restart for any reason' during automation we do not want any restarting

  mkdir $OBJDIR/_profileprofile 
  cd $OBJDIR/_tests/testing/mochitest
  python runtests.py --test-path=browser/ --autorun --close-when-done
  *************************************************************************

Step 4:
  then build with make -f client.mk profiledbuild

Step 5:
  from the $OBJDIR run xpcshell tests
  make check
I also built an sqlite application and compiled it with PGO and I still get the crashes.  I see it as an sqlite issue.

FYI, I can compile other applications with PGO and not crash.  I can also compile sqlite and have it not crash if I only profile inserting. So I don't think its me.
This zip file includes the source files for my test app.
Files needed but not include:
  sqlite3.h
  sqlite3.c
***found in path/to/src/db/sqlite/src/
Files included:
  test.cpp
  timer.h - used for determining if pgo did improve performance
  timer.cpp
  pgo.bat - a script that creates the pgo files, profiles with inserting, creates an optimized build and then runs two scenarios.

To create a crash remove the REM in the pgo.bat file.
Shawn, can you help Chris get this filed upstream?
You'll want to file a ticket here:
http://www.sqlite.org/cvstrac/captcha?nxp=/cvstrac/tktnew

Then post the link of the filed ticket back here.
We (the SQLite developers) have done PGO builds of sqlite3.c on both windows (visual studio 2005 team edition) and on linux (gcc 4.1.0).  The windows build was done using Chris's script.  Both builds worked great.  We are unable to recreate the problem.

Can you let us know what version of sqlite3.c you are using?  Also, what version of MSVC++?
I've reproduced this crash using Visual C++ 2005 SP1. sdwilsh can tell you what version of sqlite3.c we're using.
"Visual C++ 2005 Pro SP1" I should say.
Right now we are running SQLite 3.6.6.2, but I'll be pushing us up to 3.6.7 in bug 471685 today.
We've tried 3.6.6.2 using both VC++ and GCC.  Still works every time for us.  We are continuing to try to recreate your problem.  Any additional hints would be appreciated.
We are still unable to get any failures using PGO with either visual studio 2005 team editor or gcc 4.0.1, on either SQLite CVS-HEAD or on version 3.6.6.2 (from the SQLite website).  We have observed that on windows, PGO gives about a 5%-10% slowdown.  PGO gives about a 10% speed-up on linux with GCC and -O3, though.

Can someone please tell me how to get the specific sqlite3.c and sqlite3.h files that are being used?  Has the sqlite3.c source file been altered in any way from the version 3.6.6.2 release on the SQLite website?

Can anybody reproduce this problem on a version of VC++ other than the version used for official FF builds?  We have historically had a lot of problems (read: compiler bugs) with the VC++ optimizer, to the point that we usually recommend that people turn off all optimizations in VC++.  It would be interesting to know if this problem is specific to a particular compiler.

Has this problem be reproduced on either SQLite 3.6.7 from the website, or on CVS-HEAD?  Contact me directly if you want to try but need me to email you a "sqlite3.c" file for CVS-HEAD.
(In reply to comment #16)
> Can someone please tell me how to get the specific sqlite3.c and sqlite3.h
> files that are being used?  Has the sqlite3.c source file been altered in any
> way from the version 3.6.6.2 release on the SQLite website?
I'm a real jerk about not accepting changes to the sqlite files, so we only ship what we get from the web site.
I ran my script on Windows XP SP2 with VS 2005 SP1.  Also the script without modifications will not produce an error.  The REM in the script has to be removed.  Profiling on DELETE caused the optimized build to crash.  Profiling on a SELECT caused the optimized build to crash also.
I can't reproduce this with the 3.6.6.2 amalgamation file from the current mozilla source. I suspect we've updated a few times since ChrisB tested this, so perhaps whatever we were hitting has been fixed since then.
Depends on: 475178
it's sad that PGO is turned off for the code that drives the location bar, and coloring visited links, etc.

anyone here capable of figuring why these failures are happening? is there a PGO team we can send stacks to?
Keywords: perf
I don't think they're bugs in the compiler, I think they're probably bugs in the underlying code that just get exposed by PGO. If you'd like to remove all those NO_PROFILE_GUIDED_OPTIMIZE = 1 bits from places/sqlite makefiles on the places branch, and see how it goes, that'd be some useful data. I don't know if we'd hit the crashes on Talos, or if you'd have to download and actually use the builds to hit them.

See also bug 413019 where we had the same problems in pixman/cairo, and it looks like things may have changed since then.
to give a try on the places branch
Attachment #392741 - Flags: review?(ted.mielczarek)
Comment on attachment 392741 [details] [diff] [review]
re-enable pgo for sqlite, storage and places

r=me for places branch landing.
Attachment #392741 - Flags: review?(ted.mielczarek) → review+
checked in 8/6: http://hg.mozilla.org/projects/places/rev/f8a355412abc

no test failures (attributable to this, anyway). no conclusive performance numbers either.
Whiteboard: [tsnap]
We're not actually running unittests on the PGOed builds currently, so it wouldn't affect that. You should run the Windows builds through some manual testing to see if any problems pop up.
(In reply to comment #25)
> We're not actually running unittests on the PGOed builds currently, so it
> wouldn't affect that. You should run the Windows builds through some manual
> testing to see if any problems pop up.

why? how can we trust a pgo build if it doesn't pass unit tests?
bug 486783 enables PGO unit testing. blocking on that.
Depends on: 486783
have opt unit tests on m-c per https://bugzilla.mozilla.org/show_bug.cgi?id=486783#c16

fixed on m-c: http://hg.mozilla.org/mozilla-central/rev/11450d26ff4d
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Product: Core → Firefox Build System
You need to log in before you can comment on or make changes to this bug.