Closed Bug 730548 Opened 13 years ago Closed 13 years ago

Intermittent nightly build failures triggering l10n, where it looks like the only failed step or non-zero exit code is in cleanup old symbols

Categories

(Release Engineering :: General, defect)

defect
Not set
major

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: philor, Assigned: rail)

Details

(Whiteboard: [scheduler])

https://tbpl.mozilla.org/php/getParsedLog.php?id=9557969&tree=Firefox3.6
https://tbpl.mozilla.org/php/getParsedLog.php?id=9598128&tree=Mozilla-Aurora
and I think it was five of them, Linux32 and Linux64, on mozilla-central a few days back. Best I can tell, rm_old_symbols is still flunkOnFailure=False, so I don't know why they are red, but I know it's expensive since the fear and uncertainty make us retrigger (and retrigger and retrigger) them.
(In reply to Phil Ringnalda (:philor) from comment #1)
> https://tbpl.mozilla.org/php/getParsedLog.php?id=9637683&tree=Firefox
This was a failure in the trigger step, where it tries to set the l10n jobs running after this nightly. The buildbot master's web page for the step says
  no scheduler: Firefox mozilla-central macosx64 l10n nightly

(In reply to Phil Ringnalda (:philor) from comment #2)
> https://tbpl.mozilla.org/php/getParsedLog.php?id=9638613&tree=Mozilla-Aurora
Similar here,
  no scheduler: Firefox mozilla-aurora macosx64 l10n nightly
and the two logs in comment #0. All four of those failures are from buildbot-master07.

philor also noticed that we have mobile desktop jobs scheduled on m-c, on the afternoon of the 24th and 3am on the 25th, despite the builders getting turned off in bug 720774, which makes me wonder if the change there didn't work as expected or the schedulers are in a funky state.
Summary: Intermittent nightly build failures where the only failed step or non-zero exit code is in cleanup old symbols → Intermittent nightly build failures triggering l10n, where it looks like the only failed step or non-zero exit code is in cleanup old symbols
https://tbpl.mozilla.org/php/getParsedLog.php?id=9643514&tree=Firefox no scheduler: Firefox mozilla-central win32 l10n nightly (results: 2, elapsed: 0 secs)
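For anyone reading the status text above: "results: 2" matches FAILURE in Buildbot's result codes, which fits a trigger step that looks up its target scheduler by name and fails when that name is not registered on the master. The following is only a minimal sketch of that behaviour in plain Python, not the Buildbot implementation; the `trigger` function and the `registered` set are hypothetical names introduced for illustration.

```python
# Result codes following Buildbot's status constants (FAILURE is 2).
SUCCESS, WARNINGS, FAILURE = 0, 1, 2


def trigger(scheduler_names, registered):
    """Hypothetical model of a trigger step's scheduler lookup.

    scheduler_names: names the step wants to trigger.
    registered: names of schedulers the master currently knows about.
    Returns a (result_code, status_text) tuple.
    """
    # Any requested name the master does not know makes the step fail.
    unknown = [name for name in scheduler_names if name not in registered]
    if unknown:
        return FAILURE, "no scheduler: " + ", ".join(unknown)
    return SUCCESS, "triggered " + ", ".join(scheduler_names)


# Assumed state for illustration: the master only knows the aurora win32 scheduler.
registered = {"Firefox mozilla-aurora win32 l10n nightly"}
result, text = trigger(["Firefox mozilla-central win32 l10n nightly"], registered)
print(result, text)
```

Under this model the nightly build itself can succeed at every compile and upload step and still go red, because the only failing step is the one that hands off to l10n.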
FWIW, using the manhole on the build scheduler, where master.allSchedulers() has the following:
  scheduler: Firefox mozilla-central macosx64 l10n nightly
  builders: ['Firefox mozilla-central macosx64 l10n nightly']
  scheduler: Firefox mozilla-aurora macosx64 l10n nightly
  builders: ['Firefox mozilla-aurora macosx64 l10n nightly']
so the schedulers exist there, seems plausible.

There is something funky going on with the masters though. On buildbot-master07 there are 4 non-release l10n schedulers, like this:
  mozilla-central-android-l10n
  mozilla-aurora-android-l10n
  Firefox mozilla-1.9.2 macosx l10n nightly
  Firefox mozilla-aurora win32 l10n nightly
Kinda surprised these are here at all, but if they are then the notable exceptions are the ones in comment #3. Meanwhile on buildbot-master08 there are 13 schedulers, 5 each for m-c and m-a (including the ones we want), plus three for 1.9.2 (ie one per platform).

Both of these masters have the same code checked out. I did a reconfig on buildbot-master07 but that hasn't helped at all. Besides, that master has another issue - we're getting lots of exceptions/hour like bug 728104 comment #5. It's possible that's affecting 07 but not 08 because of different releases being configured on each.

Rail, got any insight on this?
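The comparison between masters can be sketched as a set difference over scheduler names. This is a rough illustration only: the `missing_schedulers` helper is hypothetical, the `expected` list is an assumption assembled from the "no scheduler" failures quoted in this bug, and the `bm07` list is the four names reported in the comment above. In a manhole session the names would presumably come from something like `[s.name for s in master.allSchedulers()]`, assuming each scheduler object exposes a `name` attribute.

```python
def missing_schedulers(expected, present):
    """Return the expected scheduler names that a master does not have, sorted."""
    return sorted(set(expected) - set(present))


# Assumption: the l10n schedulers the nightly trigger steps asked for,
# taken from the failure messages quoted in this bug.
expected = [
    "Firefox mozilla-central macosx64 l10n nightly",
    "Firefox mozilla-aurora macosx64 l10n nightly",
    "Firefox mozilla-central win32 l10n nightly",
]

# The four non-release l10n schedulers reported on buildbot-master07.
bm07 = [
    "mozilla-central-android-l10n",
    "mozilla-aurora-android-l10n",
    "Firefox mozilla-1.9.2 macosx l10n nightly",
    "Firefox mozilla-aurora win32 l10n nightly",
]

for name in missing_schedulers(expected, bm07):
    print("missing on bm07:", name)
```

Running the same check against the list from a healthy master gives a quick way to confirm whether a reconfig or restart actually brought the schedulers back.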
I think that this bug is related to bug 728104, at least the master is the same.
Component: Release Engineering → Release Engineering: Automation
QA Contact: release → catlee
Whiteboard: [scheduler]
Hmm... It looks like it happens only on bm07. If there is no objection I'm going to gracefully restart it tomorrow morning.
Assignee: nobody → rail
I restarted bm07 this EST morning.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Product: mozilla.org → Release Engineering
Component: General Automation → General