Closed Bug 522115 Opened 10 years ago Closed 10 years ago

Get crash stacks on Leak & Bloat Builders

Categories

(Mozilla Messaging :: Release Engineering, defect)

defect
Not set

Tracking

(Not tracked)

RESOLVED FIXED
Thunderbird 3.0rc1

People

(Reporter: standard8, Assigned: gozer)

References

Details

Attachments

(4 files, 2 obsolete files)

We currently have a fairly frequent crasher on Linux trunk. We've got no output as to what it is so we should enable crash stacks on the builds.

Interestingly Firefox doesn't have these yet (bug 493237), but looking at what has been done for unit tests it should actually be quite easy to do.
Ok, I was going to attach a patch on this, but looking at the log again:

http://tinderbox.mozilla.org/showlog.cgi?tree=Thunderbird&errorparser=unix&logfile=1255444707.1255445607.31427.gz&buildtime=1255444707&buildname=Linux%20comm-central%20bloat&fulltext=1

I think it is a simpler matter that what I originally thought. Will think about it again soon.
Attached patch runtest.py fix (obsolete) — Splinter Review
Ok, I think I worked this out. This patch borrows the methods used in automation.py [1] to detect if a crash has occurred (by looking for minidump files) and process accordingly.

As the minidump files are stored in the profile then we already remove the profile at the start of each run, so they will get removed then.

[1] http://mxr.mozilla.org/comm-central/source/mozilla/build/automation.py.in#489
Attachment #406192 - Flags: review?(gozer)
Attached patch buildbot changes (obsolete) — Splinter Review
This is the second part of the changes. The changes here are:

- Enable crash reporter/breakpad on the builds. This lets breakpad do the work of catching the crashes and forming the dumps (rather than the OS specific behaviour). Crashes won't attempt to be sent (i.e. no crash reporter window) because MOZ_CRASHREPORTER_NO_REPORT=1 is set in the environment.

- Switch the bloat error parser to unittest. In the previous patch I've used some TEST-UNEXPECTED-FAIL as that brings us more into line with FF's leak & bloat test output (and checkForCrashes will output a TEST-UNEXPECTED-FAIL if it detects a crash).

- Adds a make buildsymbols steps for leak & bloat tests. This provides the symbols for the stack reporter to look up.

- Sets MINIDUMP_STACKWALK to the minidump_stack executable location. This is already available on the installations as it is downloaded into the <builder>/tools directory.

- If runtest.py fails, I've changed it so that buildbot won't continue to try to run the bloat log output/comparison (haltOnFailure=True). This will make it so that if we crash one cycle and pass the second time, then the old logs will be valid from the cycle previous to the crash, and we won't fail the second time round.

- I've also set that if runtest.py fails then we'll warn (warnOnFailure=True). This seems more appropriate as the test has failed, not the build. I'm also going to update the runtest.py patch in a moment so that it will tinderboxprint on a crash (to make it even more obvious) - As we're doing runtest.py from a ShellCommand I can't think of an easy way of tinderbox printing from there.
Assignee: nobody → bugzilla
Status: NEW → ASSIGNED
Attachment #406196 - Flags: review?(gozer)
Minor change to this patch as mentioned in my previous comment - add a tinderbox print statement to the output if we crash.

I've tested this using "mkdir <profileloc>/minidumps && touch <profileloc>/minidumps/test.dmp" and it seems to work fine, obviously I didn't get a crash output, but that should work (as that part is automationutils.py) once we get everything else in place.

I also forgot to mention that (obviously) I haven't tested the buildbot changes yet, and the two patches should be able to be applied independently of each other (although getting runtest.py in first may be preferable).
Attachment #406192 - Attachment is obsolete: true
Attachment #406198 - Flags: review?(gozer)
Attachment #406192 - Flags: review?(gozer)
Comment on attachment 406196 [details] [diff] [review]
buildbot changes

patch applied in staging, let's see what breaks. Overall, looks good with only a few fixups/typos.
(In reply to comment #5)
> (From update of attachment 406196 [details] [diff] [review])
> patch applied in staging, let's see what breaks. Overall, looks good with only
> a few fixups/typos.

The staging build worked fine. gozer said on irc that the build symbols steps takes 20 minutes or more. I'm guessing this is because we're running it over a debug build.

As I don't really want to flip these to release at the moment, my proposal is that we take the patches as they are, but when we have an intermittent crasher that we want to track down, we can enable buildsymbols on that platform/branch and then turn it off again once we've got a decent stack.

The rest of the patches will still provide saner crash handling than we already have which would be a bonus anyway.
Attachment #406198 - Flags: review?(gozer) → review+
Comment on attachment 406198 [details] [diff] [review]
[checked in] runtest.py fix v2

Checked in: http://hg.mozilla.org/comm-central/rev/752b3539d12b
Attachment #406198 - Attachment description: runtest.py fix v2 → [checked in] runtest.py fix v2
Over to gozer for sorting out the buildbot side.
Assignee: bugzilla → gozer
New patch that fixes a few typos in the original one.
Attachment #406196 - Attachment is obsolete: true
Attachment #406196 - Flags: review?(gozer)
Comment on attachment 407429 [details] [diff] [review]
[checked in] buildbot changes v2

changeset:   1636:139aeb353846
tag:         tip
user:        Philippe M. Chiasson <gozer@mozillamessaging.com>
date:        Tue Oct 20 19:51:12 2009 -0400
summary:     Bug 522115. Get crash stacks on Leak & Bloat Builders. p=Standard8 r=gozer
Attachment #407429 - Attachment description: buildbot changes v2 → [checked in] buildbot changes v2
production buildbot master restarted
I've just forced a few additional builds and we had a build crash. It proved that the haltOnFailure change worked.

I believe that the builds haven't picked up the mozconfig change as there was still the old stack dump and no report from runtest.py about the crash - therefore I've clobbered the builds and forced new ones and we'll see how it goes.

We also need one additional buildbot patch, I'll attach that in a moment.
Ok, although haltOnFailure worked, the whole build still got marked as busted even though I wanted it to be orange.

I've taken a look at it appears the ShellCommand step defaults to flunkOnFailure = True which means it marks the whole build as busted/failing. Therefore we should set it to False (just like AliveTest does in buildbotcustom/steps/test.py) and then we'll hopefully turn orange (test failed) rather than red (build step failed) if we crash/fail to complete runtest.py.
Attachment #407492 - Flags: review?(gozer)
Ok, so now we've clobbered, Tinderbox reports "bloatTest CRASH" :-) and we find the minidump and try to process it.

However, runtest.py is actually giving checkForCrashes the wrong path - automation.SYMBOLS_PATH is "../dist/crashreporter-symbols", hence assuming the working directory is <objdir>/mozilla/build (or something like that).

We can easily fix this in our script by merging the path to the automation script with the path supplied by automation.SYMBOLS_PATH.

I've tested the script gives the correct location of (<objdir>/mozilla/dist/crashreporter-symbols) by using a print and an exit. I've also checked that the symbols are going into that directory by taking a quick look on one of the windows boxes.
Attachment #407494 - Flags: review?(gozer)
Attachment #407492 - Flags: review?(gozer) → review+
Attachment #407494 - Flags: review?(gozer) → review+
Attachment #407492 - Attachment description: runtest.py step should flunkOnFailure → [checked in] runtest.py step should flunkOnFailure
Attachment #407492 - Attachment is obsolete: true
Attachment #407492 - Attachment is obsolete: false
Comment on attachment 407494 [details] [diff] [review]
[checked in] Fix symbols path for checkForCrashes

Checked in: http://hg.mozilla.org/comm-central/rev/83e8ff7342e8
Attachment #407494 - Attachment description: Fix symbols path for checkForCrashes → [checked in] Fix symbols path for checkForCrashes
We've now had our first orange crash cycle. Previous cycles have shown the other changes in this bug have worked. We also have symbols for the stacks. The current crash stacks are pointing to be bug 521549. I'm going to be updating it in a while.
Target Milestone: --- → Thunderbird 3.0rc1
Blocks: 523685
This is now fixed :-)

I've filed bug 523685 as the spin-off for determining what we do with the long build_symbols step.
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
No longer blocks: 520707
You need to log in before you can comment on or make changes to this bug.