Closed Bug 1252556 Opened 8 years ago Closed 5 years ago

Failure to profile application under profileserver.py went undetected

Categories

(Firefox Build System :: General, defect)

defect
Not set
normal

Tracking

(firefox66 fixed)

RESOLVED FIXED
Tracking Status
firefox66 --- fixed

People

(Reporter: gps, Assigned: glandium)

Attachments

(1 file)

As part of working on VS 2015 support, I produced a build that crashed on startup.

To my surprise, neither profileserver.py nor any other part of the build complained that firefox.exe crashed on startup! No .pgc files were produced, and the second linking pass continued as if everything were OK. link.exe did complain about missing .pgc files, but not as a fatal error.

As a double whammy, the second link.exe pass took longer to complete than a typical PGO second link! I reckon that's because it performed whole-program optimization over all of the code instead of targeting only the parts the PGO profile data says to target.

I think the build system should detect incomplete PGO profiling runs and fail fast.
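One way to "detect incomplete PGO profiling runs" is to check, before the second link, that the profiling run actually left .pgc files behind. The sketch below is hypothetical: `check_pgc_files`, `ProfilingError`, the recursive search under the objdir and the `min_files` threshold are illustrative assumptions, not how the build system actually does it.

```python
import glob
import os


class ProfilingError(Exception):
    """Raised when a PGO profiling run looks incomplete (hypothetical)."""


def check_pgc_files(objdir, min_files=1):
    """Fail fast if the profiling run left fewer than `min_files` .pgc files.

    Assumption: MSVC writes the .pgc files somewhere under `objdir`.
    The location and threshold are illustrative only.
    """
    # "**" with recursive=True matches zero or more directory levels.
    pattern = os.path.join(objdir, "**", "*.pgc")
    pgc_files = glob.glob(pattern, recursive=True)
    if len(pgc_files) < min_files:
        raise ProfilingError(
            "expected at least %d .pgc file(s) under %s, found %d; "
            "the profiling run probably crashed"
            % (min_files, objdir, len(pgc_files))
        )
    return sorted(pgc_files)
```

Raising here, before link.exe runs again, would turn the "missing .pgc files" warning into a hard failure instead of a slow, unprofiled whole-program link.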
This doesn't block VS2015 landing.
No longer blocks: vs2015
Note the patch can't land until bug 1413570 is fixed.
Assignee: nobody → mh+mozilla
Depends on: 1413570
Comment on attachment 8924362 [details]
Bug 1252556 - Make profileserver.py fail when running Firefox fails.

https://reviewboard.mozilla.org/r/195614/#review200782
Attachment #8924362 - Flags: review+
As a followup, we should figure out how to get useful stacks out if Firefox crashes during the profiling run. We should get a minidump; the problem has historically been that we don't have symbols. We could probably just forcibly run `make syms-recurse` from this script and then use mozcrash to look for minidumps, since the build is going to fail at that point anyway.
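The suggestion above relies on mozcrash to find and symbolicate minidumps. As a simpler stand-in that only covers the "look for minidumps" half, here is a stdlib-only sketch. It assumes the crash reporter leaves `.dmp` files in a `minidumps` directory inside the profile used for the run; `find_minidumps` and `fail_on_crash` are hypothetical names, and no symbolication is attempted.

```python
import glob
import os


def find_minidumps(profile_dir):
    """Return the .dmp files left in <profile_dir>/minidumps, sorted.

    Assumption: the crash reporter writes minidumps to that directory.
    Returns an empty list if the directory does not exist.
    """
    dump_dir = os.path.join(profile_dir, "minidumps")
    return sorted(glob.glob(os.path.join(dump_dir, "*.dmp")))


def fail_on_crash(profile_dir):
    """Raise, naming the dumps, so the build log points at them."""
    dumps = find_minidumps(profile_dir)
    if dumps:
        raise RuntimeError(
            "Firefox crashed during the profiling run; minidumps: %s"
            % ", ".join(dumps)
        )
```

In a real followup, the list of dumps would be handed to mozcrash together with the symbols produced by `make syms-recurse` to get the actual stacks.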
Comment on attachment 8924362 [details]
Bug 1252556 - Make profileserver.py fail when running Firefox fails.

https://reviewboard.mozilla.org/r/195614/#review201130
Attachment #8924362 - Flags: review+
Attachment #8924362 - Flags: review?(core-build-config-reviews)
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #6)
> As a followup, we should figure out how to get useful stacks out if Firefox
> crashes during the profiling run. We should get a minidump, the problem has
> historically been that we don't have symbols. We could probably just
> forcibly run `make syms-recurse` from this script and then use mozcrash to
> look for minidumps, since the build is going to fail at that point anyway.

Well, the patch as attached reveals that Linux builds intermittently crash at shutdown during the profiling run, at a relatively high rate. So the patch can't land until that's fixed. I spent a lot of time yesterday trying to get something out of loaners, but that was a long, frustrating exercise (see bug 1413823 for some of it), culminating in 2FA fun (bug 1413855). I also tried on EC2, using the same docker image, and couldn't reproduce it there...

As long as loaners can't let us get a core dump, that followup would be the only way to figure out where exactly we're crashing on shutdown. But I've already spent too much energy on this, and I'm not going to spend more on it in the short term, sorry.
Assignee: mh+mozilla → nobody
Product: Core → Firefox Build System
This was raised indirectly in http://hubicka.blogspot.com/2018/12/firefox-64-built-with-gcc-and-clang.html:

> Issue I run into this time is bit subtle. It is not visible during testing but it can be seen as following message in the build log:
>
>     MOZ_CRASH(Shutdown too long, probably frozen, causing a crash.) at /aux/hubicka/firefox-2018/release/toolkit/components/terminator/nsTerminator.cpp:219

which may be at least one part of the problem. Unfortunately, loaners are still not in a good enough state to debug this. That said, disabling the shutdown hang crash on profile-generate builds could go a long way.
So here's something funny... I've done 40 PGO builds on try with the patch from this bug applied, and only one failed, because the X11 server was not available; so, on something unrelated to the shutdown hang.

Something has obviously changed since last year, as the failure rate was much higher back then. I wonder if it's not "just" that I/O on automation got better in the meantime.

It feels like we should land the existing patch, and deal with any problem that arises in followups.
Assignee: nobody → mh+mozilla
Pushed by mh@glandium.org:
https://hg.mozilla.org/integration/mozilla-inbound/rev/e4d5d46cf428
Make profileserver.py fail when running Firefox fails. r=chmanchester
https://hg.mozilla.org/mozilla-central/rev/e4d5d46cf428
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
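The landed change (e4d5d46cf428) is not reproduced in this thread. As a rough illustration of what "make profileserver.py fail when running Firefox fails" amounts to, here is a hypothetical sketch assuming the profiling step launches the instrumented browser through `subprocess`; `run_profiling` and its parameters are made up for the example.

```python
import subprocess


def run_profiling(cmd, timeout=None):
    """Run the instrumented browser and propagate any failure.

    `cmd` is whatever argument list launches the profile-generate build
    (an assumption; the real script builds its own command line).
    """
    proc = subprocess.run(cmd, timeout=timeout)
    if proc.returncode != 0:
        # Non-zero exit (negative on POSIX when killed by a signal) means
        # the profiling run did not complete; stop here rather than let
        # the second link proceed without profile data.
        raise RuntimeError(
            "Profiling run failed: %r exited with status %d"
            % (cmd, proc.returncode)
        )
    return proc.returncode
```

Letting the exception escape makes the script exit non-zero, which is what the build needs to stop before the second link.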