Closed
Bug 652812
Opened 14 years ago
Closed 14 years ago
builds and tests disappear from tbpl when complete
Categories
(Release Engineering :: General, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: mak, Assigned: bear)
References
Details
(Whiteboard: [buildduty])
Once builds or tests are done, they disappear from the Places tbpl. Bear said he has a clue about why this happens and will look into it shortly.
Assignee | ||
Comment 1•14 years ago
|
||
The two jobs that I thought might be impacting this are now clear - has it helped?
Reporter | ||
Comment 2•14 years ago
|
||
I see all the results now, so it did! Thank you.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Reporter | ||
Comment 3•14 years ago
|
||
Sorry, this is happening again today :(
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Assignee | ||
Comment 4•14 years ago
|
||
Yep, and we have slow Places builds in the queue. My hunch is that tbpl is not getting all of the information (or is getting it out of order) and is hiding these until the jobs start.
Reporter | ||
Comment 5•14 years ago
|
||
(In reply to comment #4)
> is hiding these until the
> jobs start
Not sure if I understand this correctly, but the builds are visible while they run, and disappear only when they are completed.
The old Tinderbox page shows everything correctly.
Assignee | ||
Comment 6•14 years ago
|
||
(In reply to comment #5)
> (In reply to comment #4)
> > is hiding these until the
> > jobs start
>
> Not sure if I understand this correctly, but the builds are visible while they
> run, and disappear only when they are completed.
> The old Tinderbox page shows everything correctly.
Sorry, that's me typing in one bug while having two IRC conversations and watching another.
While I got the details all flummoxed, my hunch is basically that tbpl has the Places config wrong, so the jobs fall off once they change state.
Comment 7•14 years ago
|
||
Let's CC some TBPL people.
Comment 8•14 years ago
|
||
I see some of the boxes report the long revision id.
We are regex-ing for the short (12 char long) in tbpl: http://hg.mozilla.org/users/mstange_themasta.com/tinderboxpushlog/file/11e960a9afcc/js/TinderboxJSONUser.js#l79
So either we truncate the scrape output on the slave, or we update tbpl for that case and do the truncation there.
Note we use the short rev internally.
Comment 9•14 years ago
|
||
I would be very surprised if there was any difference between Places and other branches in the length of the revision string. Where does tbpl get the data for finished jobs these days?
Comment 11•14 years ago
|
||
The scrape (TinderboxPrint) comes from the tinderbox json, like before. The pending/running jobs come from the builds-pending/running.js from the build server; that's why they are displayed and the finished ones aren't.
Here is an example scrape from Linux QT places build:
L
s: linux-ix-slave14
rev:5f97ac170bfaea43eb5cef7e05975c4edc5957d0
check
9930/0
Zdiff:+9568 (+9589/-21)
Z:31.8MB
Tbpl is choking on the long rev id there.
Comment 12•14 years ago
|
||
I doubt that's it if tbpl handles all branches the same. Here's what we have for 'Linux QT mozilla-central build':
L
s: linux-ix-slave18
rev:e0a879ad7a4df47a5251e3223a5631bfca17ed76
check
9930/0
Zdiff:-19264 (+19563/-38827)
Z:31.9MB
Reporter | ||
Comment 13•14 years ago
|
||
The builds just reappeared, as they did last time... Is it possible that it takes about 1 hour before completed builds are reported to tbpl?
Comment 14•14 years ago
|
||
(In reply to comment #12)
> I doubt that's it if tbpl handles all branches the same. Here's what we have
> for 'Linux QT mozilla-central build':
>
> L
> s: linux-ix-slave18
> rev:e0a879ad7a4df47a5251e3223a5631bfca17ed76
> check
> 9930/0
> Zdiff:-19264 (+19563/-38827)
> Z:31.9MB
Oh yes, my bad. Of course the regex matches, but only the first 12 chars, so the truncation begins there.
Maybe there is a delay somewhere, as Marco suggests: the Places tinderbox page says (2011-04-28 17:47 PDT) while it really is 23:32, if I did the timezone dance correctly.
(Note that this is the date of the HTML tinderbox page; I have no idea when the json was updated.)
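A minimal sketch of why the long revision id is harmless, assuming a pattern of the shape `rev:` followed by 12 hex digits. The real tbpl regex lives in TinderboxJSONUser.js and is not reproduced here; `SHORT_REV` and `short_rev` are hypothetical names used only for illustration.

```python
import re

# Assumed pattern: "rev:" followed by hex digits, capturing only the first 12.
# This is an illustration, not the regex tbpl actually uses.
SHORT_REV = re.compile(r"rev:([0-9a-f]{12})")


def short_rev(scrape_lines):
    """Return the 12-character revision found in the scrape lines, or None."""
    for line in scrape_lines:
        match = SHORT_REV.search(line)
        if match:
            return match.group(1)
    return None


# Scrape excerpts copied from comment 11 (Places) and comment 12 (mozilla-central).
places_scrape = ["s: linux-ix-slave14", "rev:5f97ac170bfaea43eb5cef7e05975c4edc5957d0"]
central_scrape = ["s: linux-ix-slave18", "rev:e0a879ad7a4df47a5251e3223a5631bfca17ed76"]

print(short_rev(places_scrape))
print(short_rev(central_scrape))
```

Because the capture group stops after 12 characters, a 40-character id on either branch yields a usable short revision, which is consistent with the long id not being the cause.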
Reporter | ||
Comment 15•14 years ago
|
||
Yes, the pattern seems to be that one: they appear while running, disappear when complete, and after a long time (hours) they appear again.
Reporter | ||
Comment 16•14 years ago
|
||
Today things are working correctly (this morning there was still a delay of about 15 minutes, while now it's working with almost no delay). I really have no idea what's up :(
Comment 17•14 years ago
|
||
I just watched an episode of what's very likely to be the same thing, on mozilla-central and mobile, and I'd bet that it's actually "Tinderbox is either just generally hosed, or it is having trouble getting mail."
The "I see them running, then they disappear, then eventually they show up as finished" symptom is what will happen if buildbot tells us correctly when they start and when they finish, by including them and then not including them in builds-running.js, but Tinderbox doesn't discover that they've finished until long after they actually did. http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1304198150.1304207827.17628.gz is a Linux debug build which Tinderbox claims started at 14:15 and finished at 17:09, but buildbot claims (via self-serve) actually finished at 14:53, so as a result for a couple of hours tbpl said that build never happened at all, even though it was showing a few of the tests that had run on it.
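The symptom can be sketched as follows, using the times quoted for that Linux debug build. The function `visible_in_tbpl` is a hypothetical model, not tbpl code, and it assumes buildbot's start time matches the 14:15 start that Tinderbox reports.

```python
def minutes(hhmm):
    """Convert 'HH:MM' to minutes since midnight."""
    hours, mins = hhmm.split(":")
    return int(hours) * 60 + int(mins)


def visible_in_tbpl(now, started, buildbot_finished, tinderbox_finished):
    """Model: a job shows up if it is listed as running or as finished."""
    # Running jobs come from builds-running.js, which buildbot keeps current.
    listed_as_running = started <= now < buildbot_finished
    # Finished jobs appear only once Tinderbox has noticed that they finished.
    listed_as_finished = now >= tinderbox_finished
    return listed_as_running or listed_as_finished


started = minutes("14:15")             # start time, per Tinderbox (assumed same for buildbot)
buildbot_finished = minutes("14:53")   # finish time, per buildbot self-serve
tinderbox_finished = minutes("17:09")  # finish time, per Tinderbox

# Length of the window in which the job is in neither list.
invisible_for = tinderbox_finished - buildbot_finished

for clock in ("14:30", "15:30", "17:10"):
    print(clock, visible_in_tbpl(minutes(clock), started, buildbot_finished, tinderbox_finished))
print("minutes invisible:", invisible_for)
```

Under this model the job is visible while running, vanishes at 14:53, and returns at 17:09, a gap of 136 minutes.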
Hard to tell much of anything from the outside, but I'm suspicious of mail problems. I saw one test job show up as finished in tbpl, so Tinderbox had read the job-finished email, and then a bit later it disappeared. Guessing from the yellow stuff on the Tinderbox waterfall and the start times, I think the start-job email arrived (or was processed) after the finished-job email arrived (or was processed), so now that job is perpetually running.
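A small sketch of that out-of-order hypothesis, assuming Tinderbox simply lets the most recently processed mail decide a job's state. That behaviour is a guess made from the outside, and both functions below are hypothetical.

```python
def final_state(events):
    """Last-processed-wins: each mail overwrites the job's state."""
    state = "unknown"
    for kind in events:
        state = {"start": "running", "finish": "finished"}[kind]
    return state


def final_state_guarded(events):
    """Variant in which a late 'start' mail cannot undo a 'finish'."""
    state = "unknown"
    for kind in events:
        if kind == "finish":
            state = "finished"
        elif kind == "start" and state != "finished":
            state = "running"
    return state


in_order = ["start", "finish"]
out_of_order = ["finish", "start"]  # start mail processed after the finish mail

print(final_state(in_order))
print(final_state(out_of_order))
print(final_state_guarded(out_of_order))
```

With last-processed-wins, the swapped order leaves the job stuck as running, which matches the "perpetually running" observation. The guarded variant shows one way such a late mail could be ignored.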
Reporter | ||
Comment 18•14 years ago
|
||
I wonder if there is some sort of prioritization that makes this less visible on central, but more visible on Places.
Fwiw, this is happening today as well.
Comment 19•14 years ago
|
||
die tinderbox, die! (bug 630538)
Updated•14 years ago
|
Whiteboard: [buildduty]
Comment 20•14 years ago
|
||
Isn't this a dupe of bug 653969?
Reporter | ||
Comment 21•14 years ago
|
||
(In reply to comment #20)
> Isn't this a dupe of bug 653969?
That bug was filed by Phil after this one, in case they were not the same issue. If you can confirm they are the same, I have no problem merging them.
Comment 22•14 years ago
|
||
Among the many many possibilities here is that Marco usually merges to Places at 3 or 4 am, so "builds disappear" when they finish and buildbot removes them from the running list tbpl uses to display that things are running and sends mail to Tinderbox, and then "builds reappear" when Tinderbox gets done processing hundreds of mails from l10n-nightlies and gets around to the Places mails and puts them in the json file tbpl uses to display finished things. If that's the case, and if someone actually does bug 653969 comment 4, and if it does then alert every day from 04:00 to 06:00, then this would be a sort of approximate dupe of that.
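A rough sketch of that backlog theory, assuming Tinderbox processes mail strictly first-in, first-out. The queue size (300 l10n mails) and the 24 seconds per mail are made-up illustrative numbers, not measurements, and `seconds_until_processed` is a hypothetical helper.

```python
def seconds_until_processed(queue, target, seconds_per_mail):
    """FIFO model: time until the first mail from `target` has been processed."""
    elapsed = 0
    for source in queue:
        elapsed += seconds_per_mail
        if source == target:
            return elapsed
    return None  # no mail from that source in the queue


# Illustrative numbers only: 300 l10n-nightly mails queued ahead of one Places mail.
early_morning = ["l10n-nightly"] * 300 + ["places"]
quiet_period = ["places"]

print(seconds_until_processed(early_morning, "places", 24) / 60, "minutes")
print(seconds_until_processed(quiet_period, "places", 24), "seconds")
```

Under these assumed numbers the Places result waits about two hours behind the nightly mail and only seconds in a quiet period, which is the shape of delay reported in this bug.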
Reporter | ||
Comment 23•14 years ago
|
||
This could be compatible with what I see: I pushed something a few minutes ago and I'm seeing the first results without a big delay.
So yes, it's possible that at certain times the mail traffic is much higher than during usual US working hours.
Reporter | ||
Comment 24•14 years ago
|
||
Fwiw, today's situation is that, after 4 hours, I'm still waiting for tbpl to show results. I'm sure they'll appear at some point, but because of that I had to delay a merge and lost time checking tinderbox results multiple times.
I don't see any ETA on bug 653969, so I'd like to know if one exists or if there is a temporary solution we can set up in the meantime.
Comment 25•14 years ago
|
||
It seems there is a disconnect between builds-running.js and the processing of the log by tinderbox.
There are two different things that tbpl needs to know:
* what is the status of the job? (green/orange/purple/red)
** plus few extra data (tinderbox print messages)
* parsing of the log
Because the second takes too long for tinderbox to process, tbpl can't tell what the status is (the first item) or where the logs are. In other words, the first item and the second item are tied together.
The logs are right now in two different places, and we could avoid the delay on tinderbox if tbpl could point to the logs on ftp and also get the status of the job, rather than wait for the email to tinderbox to be parsed.
philor if we had a way to tell tbpl here are the logs [1] and this is the metadata like the status without going through tinderbox's processing would we be able to move forward in here?
[1] http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-central-linux/1305556408/mozilla-central-linux-build2046.txt.gz
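A minimal sketch of the kind of record tbpl could be handed directly, built from the example URL in [1]. It assumes the numeric directory is a Unix timestamp and that the status is supplied separately; `describe_log`, the field names, and the `"green"` value are hypothetical and not part of any existing tbpl or buildbot interface.

```python
import re
from datetime import datetime, timezone

# Assumed layout: .../tinderbox-builds/<builder>/<unix timestamp>/<log>.txt.gz
LOG_URL = re.compile(
    r"/tinderbox-builds/(?P<builder>[^/]+)/(?P<timestamp>\d+)/(?P<log>[^/]+\.txt\.gz)$"
)


def describe_log(url, status):
    """Pair a log URL with job metadata, without waiting on Tinderbox."""
    match = LOG_URL.search(url)
    if match is None:
        raise ValueError("unrecognized log URL: " + url)
    stamp = int(match.group("timestamp"))
    return {
        "builder": match.group("builder"),
        "time_utc": datetime.fromtimestamp(stamp, tz=timezone.utc),
        "log_url": url,
        "status": status,  # e.g. green/orange/purple/red, supplied by the caller
    }


record = describe_log(
    "http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/"
    "mozilla-central-linux/1305556408/mozilla-central-linux-build2046.txt.gz",
    "green",  # hypothetical status for illustration
)
print(record["builder"], record["time_utc"].isoformat(), record["status"])
```

The point of the sketch is only that the status and the log location can travel together as one small record, so displaying a finished job would not have to wait for log parsing.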
Blocks: 630538
Summary: Places builds and tests disappear from tbpl when complete → builds and tests disappear from tbpl when complete
Comment 26•14 years ago
|
||
(In reply to comment #25)
> philor if we had a way to tell tbpl here are the logs [1] and this is the
> metadata like the status without going through tinderbox's processing would
> we be able to move forward in here?
We're in the process of switching over to that. Bug 656902 stores the ftp.mozilla.org log URLs in a database, and bug 625887 has the patches to do log parsing (and tinderbox print message extraction) on the tbpl server.
Comment 27•14 years ago
|
||
That is excellent to know. I had not been following the progress of it.
What is the ETA? End of May/June? Next quarter?
I am trying to figure this out to take it into consideration when talking with IT about bug 653969.
Comment 28•14 years ago
|
||
The mail processing issues have been fixed.
IT has placed nagios checks to catch this.
The longer term solution is being worked on by mstange and others.
Closing.
mak, let us know if you notice this happening again.
Status: REOPENED → RESOLVED
Closed: 14 years ago → 14 years ago
Resolution: --- → FIXED
Reporter | ||
Comment 29•14 years ago
|
||
OK, thank you very much for taking care of this, I really appreciate that.
Reporter | ||
Comment 30•14 years ago
|
||
Hm, I'm noticing the same issue right now: a push done on Wed May 18 14:26:14 2011 PDT is still waiting for results :(
Should this be reopened, or is it something that has already been caught?
Assignee | ||
Comment 31•14 years ago
|
||
Can you give more detail on the item? The revision number or, even better, the tbpl link for it?
Reporter | ||
Comment 32•14 years ago
|
||
It was http://hg.mozilla.org/projects/places/rev/77083dd59380 and the results reappeared a few minutes after I posted comment 30.
Assignee | ||
Comment 33•14 years ago
|
||
(In reply to comment #32)
> it was http://hg.mozilla.org/projects/places/rev/77083dd59380 the results
> reappeared few minutes after I posted comment 30
ok, so it appeared - cool. thanks for following up with the revision info
Comment 34•14 years ago
|
||
Does anybody know if the nagios check went off? Not sure if the delay was long enough.
mak, for how long do you think this happened?
Comment 35•14 years ago
|
||
No, I did not see the check in #build.
Reporter | ||
Comment 36•14 years ago
|
||
From when I pushed to when the results appeared was about a couple of hours. Allowing half an hour to make a build, it probably went on for about an hour and a half.
Updated•12 years ago
|
Product: mozilla.org → Release Engineering