Closed Bug 418865 Opened 17 years ago Closed 17 years ago

turn on profile-guided optimization on fx-win32-tbox

Categories

(Release Engineering :: General, defect, P1)

x86
Windows XP
defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: ted, Assigned: ted)

References

Details

Attachments

(2 files, 1 obsolete file)

Patch shortly.
Flags: blocking1.9?
Attached patch enable pgo on fx-win32-tbox (obsolete) — Splinter Review
Assignee: nobody → ted.mielczarek
Status: NEW → ASSIGNED
Flags: blocking1.9? → blocking1.9+
Priority: -- → P1
From offline discussions with Damon, RobSayre, and then Ted:

1) This is being proposed for nightly/clobber builds, as well as hourly/incremental builds as well as release builds.

2) There are similar changes required for Mac. Ted is filing a bug for that.

3) This double-compile process will take extra time to cycle a build; seems like linux would be 2x, win32 would be just under 2x. This will need to be communicated to everyone watching Tinderbox.

4) We are not blocked on bug#418896 or bug#418894.

Flags: blocking1.9+ → blocking1.9?
Priority: P1 → --
setting to P1 as this is needed for beta4.
Attachment #304839 - Flags: review?(ccooper)
Comment on attachment 304839 [details] [diff] [review]
enable pgo on fx-win32-tbox

Adding rhelmer to see if we can get someone to look at it this weekend ...
Attachment #304839 - Flags: review?(rhelmer)
Comment on attachment 304839 [details] [diff] [review]
enable pgo on fx-win32-tbox

r+a=beltzner

The actual meat of this was reviewed by bsmedberg in bug 361343, this is a simple two line patch to turn on the generation of the profile script and then flip the switch.
Attachment #304839 - Flags: review?(rhelmer)
Attachment #304839 - Flags: review?(ccooper)
Attachment #304839 - Flags: review+
Attachment #304839 - Flags: approval1.9+
Flags: blocking1.9? → blocking1.9+
I turned this on to get immediate feedback.

Checking in CLOBBER;
/cvsroot/mozilla/tools/tinderbox-configs/firefox/win32/CLOBBER,v  <--  CLOBBER
new revision: 1.69; previous revision: 1.68
done
Checking in mozconfig;
/cvsroot/mozilla/tools/tinderbox-configs/firefox/win32/mozconfig,v  <--  mozconfig
new revision: 1.18; previous revision: 1.17
done
Checking in tinder-config.pl;
/cvsroot/mozilla/tools/tinderbox-configs/firefox/win32/tinder-config.pl,v  <--  tinder-config.pl
new revision: 1.28; previous revision: 1.27
done
I had to back this out because of these errors:
<http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1203809580.1203817613.29625.gz&fulltext=1#err7>
So I see this:
e:\builds\tinderbox\fx-trunk\winnt_5.2_depend\mozilla\obj-fx-trunk\dist\include\string\nstsubstring.h(213) : error C2220: warning treated as error - no 'executable' file generated
e:\builds\tinderbox\fx-trunk\winnt_5.2_depend\mozilla\obj-fx-trunk\dist\include\string\nstsubstring.h(213) : warning C4952: 'nsACString_internal::Last' : no profile data found in program database 'xul.pgd'

And then:
79421 of 82043 functions (96.8%) were optimized using profile data
448151847 of 459075423 instructions (97.6%) were optimized using profile data
LINK : fatal error LNK1257: code generation failed

All the previous errors are a normal part of compiling the CRT, unfortunately (bug 416125 is filed on that).

I have no idea what those errors are from, and Google isn't being terribly helpful.  This VM didn't run out of memory or something like that, did it?

(In reply to comment #8)

> I have no idea what those errors are from, and Google isn't being terribly
> helpful.  This VM didn't run out of memory or something like that, did it?
> 

That's probably a good guess.  How much ram does the box have?
The RAM allocated to the VM was bumped to 2GB early in February (there was some swapping after WPO was turned on, other win32 VM's have 1GB). The VM management software says that memory usage hasn't gone over 65% in the last day, with time resolution of 5 minutes. There's also up to 1536MB swap space available.

Let us know if you need further info, or someone to monitor the box in realtime.


We may be able to toss -VERBOSE into LDFLAGS, not sure if it will give us any useful info though:
http://msdn2.microsoft.com/en-us/library/wdsk6as6(VS.80).aspx
I think we should try this again with -VERBOSE and see what happens. Because of the impending freeze, we should either
a) Try this on wednesday after the freeze or
b) Get a clone of fx-win32-tbox to play with.

I don't know what 
1) our VM resources look like currently (do we have room to make a clone?), and 2) what Build's resource allocation looks like, since we'd need a few hours of someone's time to tweak the clone so it didn't interfere with the original.

Justin: can you speak to 1), and joduinn, can you speak to 2)?
From comment#10, it sounds like the VM should be good enough. 

From comment#7, comment#8, do we know if this has ever worked? Before we go down the path of building more VMs and tweaking build infrastructure, I'm trying to figure out if this is some weird discrepancy between the build machines and developer machines that needs to be debugged? Or if we are trying something here for the first time?
(In reply to comment #13)
> From comment#10, it sounds like the VM should be good enough. 
> 
> From comment#7, comment#8, do we know if this has ever worked?

Yes, we've done it.
We have capacity on the machine hosting fx-win32-tbox without hitting our limits to preserve performance. mrz also has a tool that can hot-clone a VM, so we can clone without downtime (perhaps just a slight slowdown). So this is doable, but we'd Rob/Coop to sign up for the config, or I can do it tomorrow morning.
* push nightlies to experimental/pgo 
* do not push updates
* do not push symbols

I think that this would be safe to use on a clone of fx-win32-tbox, as long as it has a unique hostname.
Attachment #305597 - Flags: review?(nrthomas)
Depends on: 419530
Comment on attachment 305597 [details] [diff] [review]
[checked in] config for fx-win32-tbox clone

r+, but should also do
  $BuildTree  = 'MozillaExperimental';
  $BuildNameExtra = 'PGO';
Attachment #305597 - Flags: review?(nrthomas) → review+
Comment on attachment 305597 [details] [diff] [review]
[checked in] config for fx-win32-tbox clone

Checked into test_pgo branch with Nick's changes:

Checking in tinder-config.pl;
/cvsroot/mozilla/tools/tinderbox-configs/firefox/win32/tinder-config.pl,v  <--  tinder-config.pl
new revision: 1.29.2.1; previous revision: 1.29
done
Attachment #305597 - Attachment description: config for fx-win32-tbox clone → [checked in] config for fx-win32-tbox clone
Once this VM is cloned (bug#419530), it should show up on MozillaExperimental. 

Ted will use this VM to debug  https://bugzilla.mozilla.org/show_bug.cgi?id=418865#c7 and https://bugzilla.mozilla.org/show_bug.cgi?id=418865#c8.
Just wanted to update, I tried another build on my machine trying to minimize the differences between the nightly mozconfig and mine, and my build was still successful, so there's definitely some difference between my setup and the tinderbox, but I don't know what it is.
Looks like tete (who does custom builds, including PGO sometimes) hit this too:
http://tete009.seesaa.net/article/73848584.html#A73848584

I need a better Japanese translation to figure out if he actually has a workaround though. :)
We may need to disable PGO in sqlite even if we get this working, it sounds like it can cause dataloss:
http://forums.mozillazine.org/viewtopic.php?p=3271149#3271149
kohei translated the pertinent parts of tete's post for me, and apparently he was able to build successfully by adding -wd4624 and -wd4952 to his CFLAGS/CXXFLAGS. I think we should try that on the next pass.
This includes the warning disabling flags mentioned above.
Attachment #304839 - Attachment is obsolete: true
Attachment #306007 - Flags: review?(benjamin)
Attachment #306007 - Flags: review?(benjamin) → review+
Comment on attachment 306007 [details] [diff] [review]
enable pgo, take 2

Looking to re-land this today.
Attachment #306007 - Flags: approval1.9b4?
sayrer has a patch to turn off PGO in a few places where it exposes bugs (sqlite/storage/places), want that first.
Blocks: 419893
Comment on attachment 306007 [details] [diff] [review]
enable pgo, take 2

a1.9b4=beltzner, suggest waiting for all trees to cycle green before landing just to make sure we get a baseline reading.

(sayrer, ted to co-ordinate landing)
Attachment #306007 - Flags: approval1.9b4? → approval1.9b4+
Depends on: 419905
Checking in tools/tinderbox-configs/firefox/win32/CLOBBER;
/cvsroot/mozilla/tools/tinderbox-configs/firefox/win32/CLOBBER,v  <--  CLOBBER
new revision: 1.71; previous revision: 1.70
done
Checking in tools/tinderbox-configs/firefox/win32/mozconfig;
/cvsroot/mozilla/tools/tinderbox-configs/firefox/win32/mozconfig,v  <--  mozconfig
new revision: 1.21; previous revision: 1.20
done
Checking in tools/tinderbox-configs/firefox/win32/tinder-config.pl;
/cvsroot/mozilla/tools/tinderbox-configs/firefox/win32/tinder-config.pl,v  <--  tinder-config.pl
new revision: 1.30; previous revision: 1.29
done
Status: ASSIGNED → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
Discovered  this today on fx-win32-tbox, while investigating trunk tree closure. Is this a PGO problem or unrelated?

14:35:00 		* L/ C

[nthomas@mozilla.com - 2008/02/28 14:40:22]
No pgc files in
objdir/dist/bin, so the second
build is pointless. Kicked
tinderbox.

[nthomas@mozilla.com - 2008/02/28 14:30:40]
Hung at:
OBJDIR=obj-fx-trunk python
obj-fx-trunk/_profile/pgo/profileserver.py

firefox.exe in process list
but not on taskbar. Killed
firefox.exe:

Application pid: 7328
FAIL Exited with code 1 during
test run
fx-win32-tbox.build.mozilla.org -
- [28/Feb/2008 11:33:33] "GET
/index.html HTTP/1.1" 200 -
fx-win32-tbox.build.mozilla.org -
- [28/Feb/2008 11:33:33] "GET
/quit.js HTTP/1.1" 200 -
fx-win32-tbox.build.mozilla.org -
- [28/Feb/2008 11:33:34] code
404, message File not found
fx-win32-tbox.build.mozilla.org -
- [28/Feb/2008 11:33:34] "GET
/favicon.ico HTTP/1.1" 404 -

It continued to the second
build, which seems wonky.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
(In reply to comment #29)
> Discovered  this today on fx-win32-tbox, while investigating trunk tree
> closure. Is this a PGO problem or unrelated?

It is a bug in the script that it didn't check the error code. See bug 420187.
Status: REOPENED → RESOLVED
Closed: 17 years ago17 years ago
Resolution: --- → FIXED
So the build scripts should check for "FAIL Exited with code 1 during test run" ? I'm happy to file a bug in ReleaseEngineering for this, but I believe it means we'll see more frequent tree closures until bug#420187 is fixed...


Depends on: 421067
bug 421067 is not related. I've seen it on mac.
No longer depends on: 421067
Product: mozilla.org → Release Engineering
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: