Closed Bug 710168 Opened 13 years ago Closed 12 years ago

Do PGO builds only when needed (or skip them when not needed)

Categories

(Release Engineering :: General, defect, P3)

Tracking

(Not tracked)

RESOLVED DUPLICATE of bug 691675

People

(Reporter: mak, Unassigned)

Details

(Whiteboard: [pgo])

Since bug 709192 has been correctly resolved, I'm moving my suggestion here.

The idea is that PGO builds are only needed when there are changes to compiled files or configure/makefiles. In our system there are a lot of pushes that only include changes to scripts, pages, images or java files; these never need a PGO pass.
So, rather than acting on a timer, buildbot could examine the files changed by the push (possibly using hg log with --template '{files}'), pass them through regexes and evaluate whether PGO is needed.

It is usually not needed for changes to the following (the list can easily be improved by looking at pushes):
\.js$
\.jsm$
\.sjs$
\.java$
\.xul$
\.xml$
\.xhtml$
\.htm.?$
\.woff$
\.png$
\.jpg$
\.gif$
\.ico$
\.bmp$
\.py$
\.dtd$
\.properties$
\.txt$
\.ini$
\.list$
\.rdf$
\.css$
.*\/tests?\/.*\/Makefile\.in$
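As a rough illustration of the check described above, here is a minimal Python sketch. It assumes the list of changed paths has already been obtained (for example from hg log output); the names SKIP_PGO_PATTERNS and needs_pgo are hypothetical and not part of buildbot. The patterns are copied verbatim from the list above, and an empty file list is treated conservatively as needing PGO.

```python
import re

# Patterns copied from the list above. A file matching any of them is
# assumed not to need a PGO pass; anything else is assumed to need one.
SKIP_PGO_PATTERNS = [
    r"\.js$", r"\.jsm$", r"\.sjs$", r"\.java$", r"\.xul$", r"\.xml$",
    r"\.xhtml$", r"\.htm.?$", r"\.woff$", r"\.png$", r"\.jpg$", r"\.gif$",
    r"\.ico$", r"\.bmp$", r"\.py$", r"\.dtd$", r"\.properties$", r"\.txt$",
    r"\.ini$", r"\.list$", r"\.rdf$", r"\.css$",
    r".*\/tests?\/.*\/Makefile\.in$",
]

# Combine everything into one alternation so each path is scanned once.
_SKIP_RE = re.compile("|".join("(?:%s)" % p for p in SKIP_PGO_PATTERNS))


def needs_pgo(changed_files):
    """Return True if at least one changed file is outside the skip list.

    Hypothetical helper: an empty list returns True, on the assumption
    that an unknown set of changes should not skip PGO.
    """
    if not changed_files:
        return True
    return any(_SKIP_RE.search(path) is None for path in changed_files)
```

For example, needs_pgo(["browser/base/content/browser.js"]) is False, while adding "xpcom/base/nsCOMPtr.cpp" to the same push makes it True. Note that, as written, the last pattern only matches a Makefile.in that sits at least one directory below a test or tests folder.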
(In reply to Marco Bonardo [:mak] from comment #0)
> \.py$

Things like cl.py can affect it.

> .*\/tests?\/.*\/Makefile\.in$

Assuming you meant two lines here (so Makefile.in is a separate do-not-build)

I would agree on the "usually" but would not agree that it is not important, since if you do change DEFINES, or flags etc. in a Makefile, you can easily change what affects the profile.
(In reply to Justin Wood (:Callek) from comment #1)
> (In reply to Marco Bonardo [:mak] from comment #0)
> > \.py$
> 
> Things like cl.py can affect it.

Well, the regexp can be tweaked; it could even not skip on py, since there aren't many.

> > .*\/tests?\/.*\/Makefile\.in$
> 
> Assuming you meant two lines here (so Makefile.in is a separate do-not-build)

I meant that changes to Makefiles should be considered skippable only if they are in test folders. That's because adding a browser-chrome test, for example, should not require PGO; but more generally changes to makefiles are important, as you said.
Don't forget, while our memory of PGO build failures is strong, that we have a whole other category of PGO-related failures, where the fact that PGO builds run faster leads to timing-related test failures that aren't seen when running tests on non-PGO and debug builds.
How does that change from the current situation where we do a timed PGO? The only solution to your point would be to always PGO everything, but that has already been discarded for load reasons.
I'm just trying to remove some unpredictability from the process: rather than making it random and missing actual changes to the code, make it based on the changes. Surely it won't be the perfect fix, but that's to be expected given the unpredictability of the PGO process itself.
Is there consensus here on what you want us to do?
Severity: blocker → major
Priority: -- → P3
Whiteboard: [pgo][triagefollowup]
(In reply to Chris Cooper [:coop] from comment #5)
> Is there consensus here on what you want us to do?

Well, releng feedback would be greatly appreciated; we don't manage the machines and load every day.
Before doing a physical experiment, we could collect some stats from the last few days and see how the number of PGOs would change. I suspect we'd end up doing more, looking at today's pushes.
Okay, I was slow on the uptake, didn't realize this was a way for us to maybe sneak more PGO builds out of releng than they want to give us :)

But, if we could have relatively more PGO as long as we don't increase the total load, how about not trying to guess which filetypes would be affected? We'd guess wrong: just the other day we merged PGO-only test failures over from Fx-Team, because it doesn't do PGO at all and had never seen that fast a build before. Instead, we could look at the directories, and not do any desktop builds at all on pushes that only touch /mobile/, /widget/android/ and /other-licenses/android/. There are far more wasted runs to be saved that way than there are by guessing what things will and won't be PGOed.
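The directory-based check suggested above could look roughly like the following Python sketch. The three prefixes come from this comment; the names ANDROID_ONLY_PREFIXES and needs_desktop_build are hypothetical, and paths are assumed to be relative to the repository root.

```python
# Directories named in the comment above; a push touching only these
# is assumed not to need any desktop build.
ANDROID_ONLY_PREFIXES = (
    "mobile/",
    "widget/android/",
    "other-licenses/android/",
)


def needs_desktop_build(changed_files):
    """Return True unless every changed file is under an Android-only dir.

    Hypothetical helper: an empty list returns True, on the assumption
    that an unknown set of changes should still be built.
    """
    if not changed_files:
        return True
    # str.startswith accepts a tuple of prefixes; a leading "/" is
    # stripped so "/mobile/x" and "mobile/x" are treated the same.
    return any(
        not path.lstrip("/").startswith(ANDROID_ONLY_PREFIXES)
        for path in changed_files
    )
```

A push touching only mobile/ and widget/android/ files would return False here, while one that also touches another widget backend would return True.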
(In reply to Marco Bonardo [:mak] from comment #6) 
> Well, releng feedback would be greatly appreciated; we don't manage the
> machines and load every day.
> Before doing a physical experiment, we could collect some stats from the last
> few days and see how the number of PGOs would change. I suspect we'd end up
> doing more, looking at today's pushes.

My 2 cents: I would actually prefer to run both PGO and non-PGO on every check-in. The rest of releng may have other ideas.

I think we have the buildslave capacity to do that now. Our wait times for build/try rarely fall below 95% starting within 15 minutes of check-in.

Testslave capacity is another matter entirely.

Recall that this entire process was started because we didn't think we were hitting any PGO-specific errors. Running PGO and non-PGO for everything seems more sane to me than trying to determine which check-ins need PGO and which don't. Devs would get non-PGO results "quickly" and PGO would still be the long pole.
Whiteboard: [pgo][triagefollowup] → [pgo]
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → DUPLICATE
Product: mozilla.org → Release Engineering