Status

task
RESOLVED FIXED
11 years ago
4 years ago

People

(Reporter: armenzg, Assigned: reed)


Like in bug 467922, we are having random drops.

Can we please remake the tree?
I'll save the entire tree so cls can help debug.
Assignee: server-ops → reed
Component: Server Operations: Tinderbox Maintenance → Server Operations
[cls@vortex MozillaStaging]$ head -1 build.dat 
1196846220|1196846220|Linux staging-prometheus-vm Depend Fx-Nightly|unix|building|1196846220.1196846243.24628.gz|
[cls@vortex MozillaStaging]$ perl -e '$t=localtime(1196846220); print "$t\n";'
Wed Dec  5 01:17:00 2007
[cls@vortex MozillaStaging]$
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → DUPLICATE
Duplicate of bug: 225735
Never mind. I'm checking the wrong end of the file.  Too many balls in the air.
Status: RESOLVED → REOPENED
Resolution: DUPLICATE → ---
Wrong test; right conclusion.  

If you look at the admin page, none of the builds are marked as "Current".  That's the first clue.  Adding 'print STDERR' to load_buildlog in tbglobals.pl confirms it. Plus below:

[cls@vortex MozillaStaging]$ tail -20 build.dat | awk -F\| '{ print $2 "--" $3 }' | perl -e 'while(<STDIN>) { chomp; ($t,$n) = split(/--/,$_); $p=localtime($t);  print "$p -- $n\n";}'
Thu Dec  4 10:55:28 2008 -- Linux mozilla-central nightly
Thu Dec  4 11:33:00 2008 -- WINNT 5.2 mozilla-central nightly
Thu Dec  4 10:55:28 2008 -- Linux mozilla-central nightly
Thu Dec  4 11:33:00 2008 -- WINNT 5.2 mozilla-central nightly
Thu Dec  4 10:55:28 2008 -- Linux mozilla-central nightly
Thu Dec  4 11:33:00 2008 -- WINNT 5.2 mozilla-central nightly
Thu Dec  4 10:55:28 2008 -- Linux mozilla-central nightly
Thu Dec  4 11:33:00 2008 -- WINNT 5.2 mozilla-central nightly
Thu Dec  4 10:55:28 2008 -- Linux mozilla-central nightly
Thu Dec  4 11:33:00 2008 -- WINNT 5.2 mozilla-central nightly
Thu Dec  4 10:55:28 2008 -- Linux mozilla-central nightly
Thu Dec  4 11:33:00 2008 -- WINNT 5.2 mozilla-central nightly
Thu Dec  4 10:55:28 2008 -- Linux mozilla-central nightly
Thu Dec  4 11:33:00 2008 -- WINNT 5.2 mozilla-central nightly
Thu Dec  4 10:55:28 2008 -- Linux mozilla-central nightly
Thu Dec  4 11:33:00 2008 -- WINNT 5.2 mozilla-central nightly
Thu Dec  4 10:55:28 2008 -- Linux mozilla-central nightly
Thu Dec  4 11:33:00 2008 -- WINNT 5.2 mozilla-central nightly
Thu Dec  4 10:55:28 2008 -- Linux mozilla-central nightly
Thu Dec  4 11:33:00 2008 -- WINNT 5.2 mozilla-central nightly
[cls@vortex MozillaStaging]$
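
For anyone without awk and perl to hand, the same decoding can be sketched in Python. This is only an illustration and assumes what the pipeline above implies: build.dat records are pipe-delimited, the second field is an epoch timestamp, and the third is the build name. The function names are made up for this example and are not part of tinderbox.

```python
from datetime import datetime, timezone


def parse_build_line(line):
    """Split one pipe-delimited build.dat record into (timestamp, build name)."""
    fields = line.rstrip("\n").split("|")
    # Mirrors the awk call above: $2 is the epoch timestamp, $3 the build name.
    return int(fields[1]), fields[2]


def describe(line):
    """Render a record the way the perl one-liner does, but in UTC."""
    ts, name = parse_build_line(line)
    # UTC keeps the output reproducible; perl's localtime() uses the
    # server's zone, so the wall-clock hour shown there will differ.
    when = datetime.fromtimestamp(ts, tz=timezone.utc)
    return f"{when:%a %b %d %H:%M:%S %Y} UTC -- {name}"


sample = ("1196846220|1196846220|Linux staging-prometheus-vm Depend Fx-Nightly"
          "|unix|building|1196846220.1196846243.24628.gz|")
print(describe(sample))
```

Feeding it the last 20 lines of build.dat instead of the sample line would reproduce the listing above, apart from the time zone.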
cls ran tinderbox's clean.pl script against a copy of MozillaStaging, and that resolved the problem with the broken waterfall. reed then ran that against tinderbox.m.o's tree and it loaded quickly with recent builds shown. Armen could confirm if the builds reflect reality or not, but it looks plausible to me.

clean.pl removes old builds from build.dat as well as old build logs, while the existing tidy mechanism was a cron job deleting build logs modified more than 60 days ago. Switching cron to clean.pl will cap the waterfall display to 60 days rather than <a long time ago>, but I think it's worth it for the responsiveness of the page load. Some 570k builds got removed from MozillaStaging, and I bet Firefox and Firefox3.0 trees would load much faster after tidying. Will post to the newsgroups to make sure no-one cares about older history.
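
To make the effect of the 60 day cap concrete, here is a rough Python sketch of pruning build.dat by age. It is not clean.pl's actual code. It assumes the second pipe-delimited field is the epoch timestamp (as in the awk output earlier in this bug), and `prune_build_lines` is a hypothetical name.

```python
import time

RETENTION_DAYS = 60  # same window as the existing log-deleting cron job


def prune_build_lines(lines, now=None, retention_days=RETENTION_DAYS):
    """Return only the records whose timestamp falls inside the window."""
    now = time.time() if now is None else now
    cutoff = now - retention_days * 86400
    kept = []
    for line in lines:
        fields = line.split("|")
        try:
            ts = int(fields[1])
        except (IndexError, ValueError):
            # Assumption for this sketch: unparseable records are dropped.
            continue
        if ts >= cutoff:
            kept.append(line)
    return kept
```

Backing up build.dat before rewriting it with the kept lines would make it possible to recover older history if someone turns out to need it.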

That still leaves how the tree got horked in the first place, which comes back to the issue that tinderbox has with old mail (bug 225735). Possibly there are (or were) incorrect clocks on some build slaves, or the timestamp is being calculated incorrectly when sending tindermail. We'd have to diff build.dat against an older copy of itself to trace that. Can I ask you to do that, reed? If we use clean.pl then at least broken trees are reset to a working state once a day.
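
One possible way to do that comparison, sketched in Python. It assumes build.dat is append-only between the two snapshots and that the second pipe-delimited field is the epoch timestamp; neither is confirmed here, and `late_arrivals` is a hypothetical helper, not a tinderbox function.

```python
def late_arrivals(old_lines, new_lines):
    """Find records added since the old snapshot that claim an older time.

    A record counts as suspicious if it is absent from the old snapshot
    but its timestamp predates the newest record the old snapshot held,
    which is what old mail or a wrong slave clock would produce.
    """
    def timestamp(line):
        return int(line.split("|")[1])

    if not old_lines:
        return []
    seen = set(old_lines)
    newest_old = max(timestamp(line) for line in old_lines)
    return [line for line in new_lines
            if line not in seen and timestamp(line) < newest_old]
```

The build names on whatever this returns would point at the slaves or mail paths worth checking first.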
Just in case I don't read that newsgroup: I fairly often do care about older history (along the lines of "oh, crap, how long have we been having that error in uploading symbols and what might have happened the day it started?"), though I also quite often fail to find it because the tree was renamed, or the machine was renamed, or both, or something else that's not obvious.
Post is http://groups.google.com/group/mozilla.dev.tree-management/browse_thread/thread/61bf4585c4fafc40#

(In reply to comment #6)
So you'd care about the 60 day limit for logs then ? I'm suggesting limiting the waterfall to the same value.
Apparently my memory of what I've done is totally untrustworthy. Someone might care about the loss of starred builds, I guess, but what I thought I've done seems to have never been possible.
From looking at:
http://tinderbox.mozilla.org/showbuilds.cgi?tree=MozillaStaging&maxdate=1237930839&legend=0&norules=1

I can see these columns:
Linux mozilla-1.9.1 l10n %
WINNT 5.2 mozilla-central l10n %
linux_l10n_nightly %
macosx_l10n_nightly %
win32_l10n_nightly %

Now I can see more columns than I could when I filed the bug.

Thanks for putting this into better shape, but do we know why we are missing the following builders?
        1.9.1   moz-central
Linux   YES     NO
Win32   NO      YES
Mac     NO      NO

BTW, I have noticed that the tinderbox pages do not "show the last 12 hours" but rather a certain number of rows, and since there are as many as 70 L10n builds it feels quite fast.
There's been no feedback on the newsgroups. I think we should go ahead with modifying the cron job, with backups of */build.dat if you feel so inclined.
There are no builds on MozillaStaging right now. It should at least show a 1.9.1 l10n build which started at Mon Mar 30 11:31:39 2009 (from our staging system) and finished at 11:35:45. There are no errors about sending mail in the log.
The staging environment has been reporting to MozillaTest instead of MozillaStaging. I do not know if this has been changed since last time I used it.
Looks like MozillaStaging in the TinderboxMailNotifier setups for both
 staging-master:/builds/buildbot/moz2-master/master-main.cfg
 staging-1.9-master:/builds/buildbot/staging-trunk-master/master.cfg
(In reply to comment #10)
> There's been no feedback on the newsgroups. I think we should go ahead with
> modifying the cron job, with backups of */build.dat if you feel so inclined.

Done.
Status: REOPENED → RESOLVED
Closed: 11 years ago → 10 years ago
Resolution: --- → FIXED
Product: mozilla.org → mozilla.org Graveyard