Multiple Failures on Thunderbird tinderboxes

RESOLVED FIXED

Status

Mozilla Messaging
Server Operations
--
blocker
RESOLVED FIXED
10 years ago
10 years ago

People

(Reporter: standard8, Assigned: gozer)

Details

(Reporter)

Description

10 years ago
I would raise multiple bugs, but these all seem to have happened at the same time. The list below is what we've got problems with, both on trunk and 1.9.1:

Linux Check
Linux Bloat
Windows Bloat
Mac Bloat (also covered by bug 472864)

On just trunk, mac check is potentially a problem - though it may go green again.

Also Windows check is taking a long time to report back from its trunk build.

Nightly builds are covered by bug 472970 which is a moco permissions problem.

At the moment I'm holding the tree closed as we have no bloat coverage and reduced check coverage.
(Assignee)

Comment 1

10 years ago
All right, status update from the front:

Linux comm-central check : OK
Linux comm-1.9.1 check   : OK
Win2k comm-central check : OK
Win2k comm-central check : OK
(Assignee)

Comment 2

10 years ago
Mac OS X 10.4 comm-central check : OK
Mac OS X 10.4 comm-1.9.1   check : OK
(Assignee)

Comment 3

10 years ago
Win32 comm-central bloat : OK
Mac OS X 10.4 bloat      : OK
Linux comm-central bloat : OK
(Assignee)

Comment 4

10 years ago
Linux on Thunderbird3.0 : GREEN
(Assignee)

Comment 5

10 years ago
Windows on Thunderbird3.0 : GREEN
(Assignee)

Comment 6

10 years ago
Please close once

MacOSX 10.4 comm-central bloat build

turns green; it should do so within an hour.
(Reporter)

Comment 7

10 years ago
Current status:

- Most builds ok
- "Linux comm-central mozilla-central bloat build" busted. A clobber may fix but there's no irc bot around to do this.
- "MacOSX 10.4 comm-central bloat build" busted. However I think this is just a one-off "it took too long to compile" problem.

Given the mac 1.9.1 bloat issue, and the fact that mac & windows check have long recompiles to do on 1.9.1, and the fact that all these boxes are currently stuck building on trunk (due to non TB checkins), I've completely closed comm-central for a couple of hours to try and let these boxes get over to the 1.9.1 branch.
(Assignee)

Comment 8

10 years ago
The IRC bot for comm-central mozilla-central builds is supposed to be : thunderbuild-trunk

Hrm, the problem is most likely that lots of builds have been accumulating in the build queues. On the good side, even though it might indicate 20 builds pending, it really means only 1-2 builds, as buildbot will/should merge these into single builds, skipping over the queued versions in the middle.
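
To illustrate the merging described above, here is a toy model in Python. It is not buildbot's actual code: the `BuildRequest` class, its `revision` field and the `merge_pending` helper are all hypothetical names invented for this sketch, and it assumes each builder only needs to build the newest queued change.

```python
from dataclasses import dataclass


@dataclass
class BuildRequest:
    builder: str
    revision: int  # hypothetical, monotonically increasing change number


def merge_pending(queue):
    """Collapse pending requests so each builder keeps only its newest change."""
    latest = {}
    for req in queue:
        current = latest.get(req.builder)
        if current is None or req.revision > current.revision:
            latest[req.builder] = req
    return list(latest.values())


# 20 queued requests for one builder (illustrative data only)
pending = [BuildRequest("Linux comm-central bloat build", rev) for rev in range(1, 21)]
merged = merge_pending(pending)
print(len(pending), "queued ->", len(merged), "build(s) to run")
```

Under these assumptions the 20 pending entries collapse into a single build of the newest change, which is why a long pending list need not mean a long wait.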
(Assignee)

Comment 9

10 years ago
Clobbered "Linux comm-central mozilla-central bloat build"
(Reporter)

Comment 10

10 years ago
(In reply to comment #8)
> The IRC bot for comm-central mozilla-central builds is supposed to be :
> thunderbuild-trunk
> 
That's not on irc; additionally, the buildbot config is possibly old:
http://hg.mozilla.org/build/buildbot-configs/file/a4fc7865f94d/thunderbird/config.py#l494

> Hrm, the problem is most likely because there has been lots of builds
> accumulating in the build queues, but on the good side, even though it might
> indicate 20 builds pending, it really mean only 1-2 builds, as buildbot
> will/should merge these into single builds, skipping over the queued versions
> in the middle.

Yeah we can cope now I think

Note that we've been seeing lots of drop offs (ping timeouts) of the irc bots
today since you fixed the main issues. Linux bloat & build have also been
losing their connections in the middle of builds quite frequently as well.
(Reporter)

Comment 11

10 years ago
I'm also uncertain about "Win2k3 comm-central check": it's been building for approx 5 hours 40 mins now, which is a little excessive even for a full rebuild.
(Reporter)

Comment 12

10 years ago
Status update:

- "Win2k3 comm-central check" has failed to check in after lots of hours compiling. It hasn't dropped off the buildbot radar.

- "Linux comm-central bloat build" is frequently dropping connections; this seems to coincide with irc bots having a ping timeout and then reconnecting.

- "Linux comm-central build" also drops connection occasionally, but I think that's on the same VM so not surprising.

- "Linux comm-central mozilla-central bloat build" was still busted after the clobber. Currently Linux & Mac are busted as a result of bug 386676, I'm going to do a trunk build anyway to see if the original bustage was real or not.

There's nothing here that is a real show stopper at the moment; we can cope with the missing Windows check by keeping an eye on the SeaMonkey boxes.
(Reporter)

Comment 13

10 years ago
(In reply to comment #12)
> - "Linux comm-central mozilla-central bloat build" was still busted after the
> clobber. Currently Linux & Mac are busted as a result of bug 386676, I'm going
> to do a trunk build anyway to see if the original bustage was real or not.

Local build worked fine. Let's wait till the current bustage is resolved to see what is happening on that box.
(Assignee)

Comment 14

10 years ago
"Win2k3 comm-central check" is probably the result of confusion between the buildbot client and server; I've seen it happen before. The client needs restarting, most likely.

Linux comm-central build is inside the MoCo network
Linux comm-central bloat build is inside the MoMo network

That's a bit odd that they are experiencing connection issues, as well as with the IRC server. I know MoCo recently performed core router upgrades, so it might have something to do with it.
(Assignee)

Comment 15

10 years ago
Just confirmed that the buildbot client isn't running on the Win32 check box; it should be. The master has simply managed not to notice, and got confused as to what its status is.

Unfortunately, I can't restart it from my current network location; it'll have to wait for later this evening.
(Reporter)

Comment 16

10 years ago
(In reply to comment #14)
> That's a bit odd that they are experiencing connection issues, as well as with
> the IRC server. I know MoCo recently performed core router upgrades, so it
> might have something to do with it.

Something I have noticed: as America woke up and checkins increased, the responsiveness of build.mozillamessaging.com has gone down, and I think we're getting more timeouts (the timeouts part is more of an instinct).

Mac Check & Mac Bloat are also starting to look like they may have dropped out again (almost 3 hour build times; as they had both just completed a build, I think that's suspect).
(Reporter)

Comment 17

10 years ago
Further to my previous comment, it appears MoCo has network issues (knocked out a switch and a few machines). This probably explains the extra problems.
(Assignee)

Comment 18

10 years ago
That might indicate problems with load/network on the buildbot master, but I am not seeing anything probative from looking at the historical charts.

Looks like momo-xserve-01, our Apple X-Serve, was rebooted; it is currently reporting 6 hours, 51 minutes of uptime. Strange. MoCo?

And as I suspected, buildbot (and the rdp sessions) on the win32 check box were gone/dead. Not sure what's up there, can't remember how to find uptime on win32.
(Assignee)

Comment 19

10 years ago
Win32 unittest builder restarted (comm-1.9.1 and mozilla-central)
(Assignee)

Comment 20

10 years ago
OS X unittest builder restarted (comm-1.9.1 and mozilla-central)
OS X bloat    builder restarted (comm-1.9.1 and mozilla-central)
(Reporter)

Comment 21

10 years ago
(In reply to comment #18)
> Looks like the momo-xserve-01, our Apple X-Serve, was rebooted, currently
> reporting 6 hours, 51 minutes of uptime, strange. MoCo ?

They had a power outage which took out a main switch. Not sure why our xserve was rebooted.

General Update:

- Most builders seem steady and reporting the correct state of the tree.
- Linux * bloat build regularly busted due to connection timeout issues. I'm trying to give the 1.9.1 build a clobber, but the next build it just drops connection which messes it up again. I think this is the same reason for the trunk build being messed up (which I can't clobber).
- irc bots are still dropping off irc.

I think the current state is reasonable and we can live with it until next week if there's no obvious fixes.
(Reporter)

Comment 22

10 years ago
In addition to the current status in comment 21, "Win2k3 comm-central check" seems to have died again (8 hour build at the moment).

Not a significant problem at the moment as we have stable Linux/Mac coverage as well as SeaMonkey's boxes.
(Assignee)

Comment 23

10 years ago
restarted "win2k * check", looks like the VM had crashed/rebooted, and buildbot doesn't start on boot.

restarted "linux * bloat" buildbot clients, just in case, but there is definitely something going on there.
(Reporter)

Comment 24

10 years ago
(In reply to comment #23)
> restarted "win2k * check", looks like the VM had crashed/rebooted, and buildbot
> doesn't start on boot.

Looks fine at the moment :-)
 
> restarted "linux * bloat" buildbot clients, just in case, but there is
> definitely something going on there.

I get the impression that this is more related to the irc bots timing out and coming back on - when I've done clobber builds it's typically around the time of the irc bot dropout that the build will fail.

This implies to me we've got a problem at the master or some connectivity issues somewhere.
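
One way to test that impression would be to compare timestamps from the irc logs and the buildbot waterfall. The sketch below is only an illustration: the `failures_near_dropouts` helper, the 5-minute window and every timestamp in it are hypothetical placeholders, not data from these boxes.

```python
from datetime import datetime, timedelta


def failures_near_dropouts(dropouts, failures, window=timedelta(minutes=5)):
    """Return the build failures that happened within `window` of any bot dropout."""
    return [f for f in failures if any(abs(f - d) <= window for d in dropouts)]


# Placeholder timestamps, invented purely for illustration
dropouts = [datetime(2000, 1, 1, 10, 0), datetime(2000, 1, 1, 13, 30)]
failures = [
    datetime(2000, 1, 1, 10, 3),
    datetime(2000, 1, 1, 11, 45),
    datetime(2000, 1, 1, 13, 28),
]

near = failures_near_dropouts(dropouts, failures)
print(f"{len(near)} of {len(failures)} failures fall within 5 minutes of a bot dropout")
```

If most real failures cluster around dropouts, that would point at the master or the network rather than at the individual build slaves.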
(Reporter)

Comment 25

10 years ago
(In reply to comment #24)
> I get the impression that this is more related to the irc bots timing out and
> coming back on - when I've done clobber builds it's typically around the time of
> the irc bot dropout that the build will fail.
> 
> This implies to me we've got a problem at the master or some connectivity
> issues somewhere.

I've just been looking at other bugs and found bug 470462, "Setup VMWare reservations for buildbot master VMs". I'm not sure what one is, or whether the buildbot master runs in a VM, but it might help!
(Assignee)

Comment 26

10 years ago
I suspect part of the problem was because of the master having been reconfigured a lot of times and not restarted.

I've restarted it cold this evening, and I am looking at it still.

On the funny side of things, the win2k * check box is apparently blue screening
during the builds...

http://imagebin.ca/view/ezMqDY5.html
(Reporter)

Comment 27

10 years ago
(In reply to comment #26)
> I suspect part of the problem was because of the master having been
> reconfigured a lot of times and not restarted.
> 
> I've restarted it cold this evening, and I am looking at it still.

irc bots seem stable now.

Mac bloat timed out (both on trunk & 1.9.1) and messed up builds. I've queued up clobbers, so they should go green again.

> On the funny side of things, the win2k * check box is apparently blue screening
> during the builds...

I think it happened again today - it's failed to report in again.
(Assignee)

Comment 28

10 years ago
That Windows box needs to be replaced with a freshly imaged one. Will do on Monday.
(Assignee)

Comment 29

10 years ago
Win32 * check box is up, running, building and checking.

For some reason, the test run in mozilla/toolkit/crashreporter/test returns a non-zero status, even though no failing tests are reported:

make[4]: Leaving directory `[objdir]/mozilla/toolkit/crashreporter/test'
make[3]: Leaving directory `[objdir]/mozilla/toolkit/crashreporter'
make[2]: Leaving directory `[objdir]/mozilla/toolkit'
make[1]: Leaving directory `[objdir]/mozilla'
make[4]: *** [check] Error 1
make[3]: *** [check] Error 2
make[2]: *** [check] Error 2
make[1]: *** [check] Error 2
make: *** [check] Error 2
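
When reading a log like the one above, it can help to pick out the innermost failing make level: that is where the non-zero status originates (here `make[4]` with `Error 1`), while the outer `Error 2` lines are just each parent make reporting that its sub-make failed. A minimal Python sketch, assuming the log uses GNU make's usual `make[N]: *** [target] Error K` format; `deepest_make_error` is a hypothetical helper name:

```python
import re

# Matches lines such as "make[4]: *** [check] Error 1" or "make: *** [check] Error 2"
ERROR_RE = re.compile(r"^make(?:\[(\d+)\])?: \*\*\* \[(.+?)\] Error (\d+)$")


def deepest_make_error(log_text):
    """Return (depth, target, exit_code) for the deepest failing make level, or None."""
    deepest = None
    for line in log_text.splitlines():
        match = ERROR_RE.match(line.strip())
        if not match:
            continue
        depth = int(match.group(1) or 0)  # top-level make has no [N] suffix
        entry = (depth, match.group(2), int(match.group(3)))
        if deepest is None or depth > deepest[0]:
            deepest = entry
    return deepest


sample = """\
make[4]: *** [check] Error 1
make[3]: *** [check] Error 2
make[2]: *** [check] Error 2
make[1]: *** [check] Error 2
make: *** [check] Error 2
"""
print(deepest_make_error(sample))  # (4, 'check', 1)
```

The command that returned the non-zero status should appear in the log shortly before the deepest error line.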

[full buildbot log is here: <http://build.mozillamessaging.com/buildbot/production/builders/MacOSX 10.4 comm-central bloat build/builds/3068/steps/compile/logs/stdio>]
(Assignee)

Comment 30

10 years ago
Wrong link in comment #29, should have been <http://build.mozillamessaging.com/buildbot/unittest/builders/Win2k3 comm-1.9.1 check/builds/161/steps/check/logs/stdio>
(Assignee)

Comment 31

10 years ago
Further work on general stability will be happening in bug 474600.
Status: NEW → RESOLVED
Last Resolved: 10 years ago
Resolution: --- → FIXED