Closed Bug 843383 (tinderbox-death) Opened 11 years ago Closed 10 years ago

[Tracking bug] decommission tinderbox server

Categories

(Release Engineering :: General, defect)

x86
macOS
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: joduinn, Unassigned)

Details

(Keywords: dev-doc-needed, spring-cleaning)

Attachments

(1 file)

From irc yesterday, turns out there was no bug for decommissioning tinderbox server, even though it was being eagerly talked about, so filing this to track. 

If you know of something which relies on tinderbox server, please link it to this bug. Once the known dependencies on tinderbox have been removed, the plan of record is to power off this tinderbox server and instead use the supported tbpl.m.o.
(keeping in mozilla.org:RelEng for now; once we are all agreed we no longer use tinderbox.m.o, we'll move this bug to ServerOps for action.)
Alias: tinderbox-death
Bugzilla and Camino are the two reasons nobody else ever filed this.
Well, that and the fact that the people eagerly talking about it were just eagerly talking about "killing tinderbox use by any project I have to deal with" rather than actually talking about "killing tinderbox."
Bugzilla still uses Tinderbox, so you'll need to track that as well.
What's so bad about Tinderbox? It seems that "real" developers far prefer tbpl, but for a tester / QA person like me the latter is totally unfriendly. Can't you have both the pushlog _and_ the waterfall?
(In reply to Tony Mechelynck [:tonymec] from comment #5)
> What's so bad about Tinderbox? It seems that "real" developers far prefer
> tbpl, but for a tester / QA person like me the latter is totally unfriendly.
> Can't you have both the pushlog _and_ the waterfall?

Could you give some examples of what you mean by unfriendly? :-)
Argh, I really hate it when you keep trying to sneakily kill services off we use and making me uneasy about the future support of anything we depend on. And still there's nothing in this bug that says WHY you want to kill tinderbox, especially since its existence shouldn't matter to Fx devs at all as you have your own thing.

First question that comes to mind, is tbpl fully compatible with the same (email) input that Tinderbox Server uses? If not, then there's all Bugzilla tests and Selenium QA tests we have that need to be rewritten to support tbpl. And that's not something we easily have resources for.

Other thing is of course getting the same output as from Tinderbox. Basically for me that means easy list of current status of tests per branch and also access to the full stderr output of a test run to see what went wrong. I used to also see what changes went into which build but that has been somewhat broken for some time now.

There's also status integration to browser status bar and also to IRC (using a supybot plugin) that we have from Tinderbox. Are such things done for TBPL already?
For Bugzilla, see the discussion in bug 527038.
Depends on: 527038
(In reply to Ed Morley [:edmorley UTC+0] from comment #6)
> (In reply to Tony Mechelynck [:tonymec] from comment #5)
> > What's so bad about Tinderbox? It seems that "real" developers far prefer
> > tbpl, but for a tester / QA person like me the latter is totally unfriendly.
> > Can't you have both the pushlog _and_ the waterfall?
> 
> Could you give some examples of what you mean by unfriendly? :-)

By unfriendly to me, I mean I can't make head or tail of it, even with the "legend" popup frame, or only after hard labour. The waterfall page is larger than my browser screen, yes, but that's no great hardship with the Ctrl+- zoom, and I can immediately find the build category I'm looking for (for instance, all SeaMonkey trunk L64 hourlies), the column header tells me what the last build's result was for this category (from green = success to black with flames = busted) and then by scrolling around I can find out, if I want, when that build started and ended. Or I can find the "latest green build" (if recent enough) by just scrolling down the waterfall. Or, by scrolling horizontally, I can see immediately where the trouble spots lie for a given product branch (i.e. on a given Tinderbox page), even if the latest builds were from different changesets depending on platform etc.

Also, the Tinderstatus extension adapts to my statusbar and lets me follow the current state of the tinderboxen I'm interested in — AFAIK there's nothing similar for tbpl, and nothing either, for instance, for Firefox Nightly or Firefox Aurora, whose waterfalls have already been killed.

So my question remains: I've heard that developers prefer tbpl. As a tester I prefer the Tinderbox waterfall. Why can't we have both?
Flags: needinfo?(emorley)
(In reply to Tony Mechelynck [:tonymec] from comment #9)
> By unfriendly to me, I mean I can't make head or tail of it, even with the
> "legend" popup frame, or only after hard labour. 

I most definitely agree that the documentation for TBPL could be improved (although that's unlikely to happen with the current version, now that we are discussing TBPLv2 - but we'll prioritise documentation for v2) - however much of what you said is already possible with TBPL (and IMO easier to use than tinderbox, once you get accustomed to it).

> Also, the Tinderstatus extension adapts to my statusbar and lets me follow
> the current state of the tinderboxen I'm interested in — AFAIK there's
> nothing similar for tbpl, and nothing either, for instance, for Firefox
> Nightly or Firefox Aurora, whose waterfalls have already been killed.

Agreed the browser addon and IRC bot are missing - and are both things that I would find very useful as a sheriff - so will do my best to make sure the backend for TBPLv2 is capable of supporting those kind of downstream consumers.

> So my question remains: I've heard that developers prefer tbpl. As a tester
> I prefer the Tinderbox waterfall. 

I'm slightly puzzled as to the developer vs tester differentiation - I don't think being a tester makes any difference as to whether one prefers TBPL vs tinderbox (unless there are use-cases I'm missing) - I think it's purely a case of:
"does the improved UX/features of TBPL (once you learn how to use it) outweigh losing the IRC bot and browser addon"

> Why can't we have both?

I think we're conflating two issues here.

1) Preferred UI for the frontend (the pushlog view of TBPL vs the waterfall view of tinderbox.m.o)

2) The data source for job results/logs/... (buildbot for TBPL vs tinderbox)

#1 is a matter of preference (or possibly not being as keen on an alternative UI, just due to lack of docs/familiarity - but I'm happy to concede on this point) - and if there was a great demand for it, support for the waterfall view could always be added to TBPL (though I suspect waterfall view fans are in the minority, so may be a case of 'patches-welcome'). In addition, TBPLv2 will be adding support for a number of more specialist use-cases (eg: tracking down bad machines, or all-tree infra issues), which may reduce people's need for a waterfall/timeline view.

#2: Tinderbox is old, no longer supported by our main automation & is full of security holes. The tinderbox server was unable to keep up with the email load (I'll gloss over the '?!?!' of using emails for results) - and I'm sure releng has many other reasons why we switched. (I'm not in release engineering, so that's about the extent of my knowledge as to why).
Flags: needinfo?(emorley)
I should also add that the replacement for TBPL (aka TBPLv2, but that won't be the final name), will support data input from sources other than buildbot. (You'll just use a script of your choice to post to the new web service). The wiki page needs a bit of updating, but you can find the preliminary plans at:
https://wiki.mozilla.org/Auto-tools/Projects/TBPL2
(In reply to Teemu Mannermaa (:wicked) from comment #7)
> Argh, I really hate it when you keep trying to sneakily kill services off

A public bug & a bug already filed for bugzilla isn't exactly sneaky.

> still there's nothing in this bug that says WHY you want to kill tinderbox,
> especially since its existence shouldn't matter to Fx devs at all as you
> have your own thing.

Partial answer at comment 10, but 301 releng/IT for the full reasoning.

> First question that comes to mind, is tbpl fully compatible with the same
> (email) input that Tinderbox Server uses?

No, buildbot is used as the data source.

> Other thing is of course getting the same output as from Tinderbox.
> Basically for me that means easy list of current status of tests per branch
> and also access to the full stderr output of a test run to see what went
> wrong. I used to also see what changes went into which build but that has
> been somewhat broken for some time now.

TBPL supports all of that and some more.

> There's also status integration to browser status bar and also to IRC (using
> a supybot plugin) that we have from Tinderbox. Are such things done for TBPL
> already?

Unfortunately not yet, see comment 10.
(In reply to Ed Morley [:edmorley UTC+0] from comment #10)
[…]
> Agreed the browser addon and IRC bot are missing - and are both things that
> I would find very useful as a sheriff - so will do my best to make sure the
> backend for TBPLv2 is capable of supporting those kind of downstream
> consumers.

A spot of light at the end of the tunnel :-) . I check new extensions on AMO about daily so I ought to see the new addon if (and when) it's hosted there. And of course I'll get it as an update if the Tinderstatus extension is modified to support some version of TBPL. And, yes, the IRC bot, e.g. to stalk certain words when firebot sees them in bug summaries: it complements Product::Component watching but neither of them replaces the other.

> 
> > So my question remains: I've heard that developers prefer tbpl. As a tester
> > I prefer the Tinderbox waterfall. 
> 
> I'm slightly puzzled as to the developer vs tester differentiation - I don't
> think being a tester makes any difference as to whether one prefers TBPL vs
> tinderbox (unless there are use-cases I'm missing) - I think it's purely a
> case of:

Could be a misunderstanding on my part. I've heard (well, seen on IRC) at least one developer say "I hate Tinderbox". I never saw any of them say "I hate TBPL". But it could be that I overgeneralized.

> "does the improved UX/features of TBPL (once you learn how to use it)
> outweigh losing the IRC bot and browser addon"
> 
> > Why can't we have both?
> 
> I think we're conflating two issues here.
> 
> 1) Preferred UI for the frontend (the pushlog view of TBPL vs the waterfall
> view of tinderbox.m.o)
> 
> 2) The data source for job results/logs/... (buildbot for TBPL vs tinderbox)
> 
> #1 is a matter of preference (or possibly not being as keen on an
> alternative UI, just due to lack of docs/familiarity - but I'm happy to
> concede on this point) - and if there was a great demand for it, support for
> the waterfall view could always be added to TBPL (though I suspect waterfall
> view fans are in the minority, so may be a case of 'patches-welcome').

Well, well, well… I don't think I understand TBPL well enough to even start writing a flowchart (or whatever is used in lieu of a flowchart nowadays), not to mention doing the actual coding, but if I could, I gladly would.

> In
> addition, TBPLv2 will be adding support for a number of more specialist
> use-cases (eg: tracking down bad machines, or all-tree infra issues), which
> may reduce people's need for a waterfall/timeline view.

Some of these sound useful even for me, and I'm sure they'll interest RelEng & BuildConfig people. :-)

> 
> #2: Tinderbox is old, no longer supported by our main automation & is full
> of security holes. The tinderbox server was unable to keep up with the email
> load (I'll gloss over the '?!?!' of using emails for results) - and I'm sure
> releng has many other reasons why we switched. (I'm not in release
> engineering, so that's about the extent of my knowledge as to why).

Ah, so that's the reason. OK, I can understand "old, full of security holes and unable to keep up with the load". Let's hope the patches-welcome a few paragraphs above will progress to ASSIGNED, r?, r+ and FIXED. ;-)
Depends on: 846140
Blocks: 823923
OS: Mac OS X → All
Hardware: x86 → All
Based on https://bugzilla.mozilla.org/show_bug.cgi?id=698910#c22, l10n no longer has a dependency on tinderbox.m.o.
OS: All → Mac OS X
Hardware: All → x86
1) SeaMonkey and Camino no longer depend on tinderbox.m.o, and the dep bugs are closed.

2) Per email from bsmith (25mar) and email from kai (28mar), NSS no longer has a dependency on tinderbox.m.o. Bug#648676 remains open while loose ends of the transition from cvs->hg are sorted out, but none of that blocks decommission of tinderbox server.
:wicked, (and :joes)


(In reply to Ed Morley (Away 29th-1st, UK public holiday) [:edmorley UTC+0] from comment #12)
> (In reply to Teemu Mannermaa (:wicked) from comment #7)
> > Argh, I really hate it when you keep trying to sneakily kill services off
> 
> A public bug & a bug already filed for bugzilla isn't exactly sneaky.

:wicked: Sorry you were not notified before. That was an oversight, but not sneaky. To be honest, this project started in 2007, we finally completed the Firefox/Fennec transitions off tinderbox.m.o ~Sept 2012, and then started the multi-month projects to transition NSS and l10n. I didn't know bugzilla was using tinderbox server until LpSolit linked dep bug#527038 (filed over 3 years ago, on 2009-11-08).

In the "good news" category, at least we found out about your usage of this before we powered it off!

At this point, it looks like bug#527038 is now the only remaining blocker to powering off tinderbox.m.o. Let's explore some options in bug#527038, where the right folks are cc'd.


> > still there's nothing in this bug that says WHY you want to kill tinderbox,
> > especially since its existence shouldn't matter to Fx devs at all as you
> > have your own thing.
> 
> Partial answer at comment 10, but 301 releng/IT for the full reasoning.

Short summary: 
* tinderbox server has been unowned, and unmaintained, since before I joined Mozilla in 2007. This is still true today. Any repair/maintenance work done on it was ad-hoc, during-outages, by whomever-could-be-found-to-throw-themselves-on-it.
* the design of tinderbox doesn't easily handle multiple concurrent jobs of the same job-type, something that tbpl does handle nicely. This was a bottleneck to improving capacity for our entire build infrastructure, whereas the "new" tbpl-fed-directly-from-buildbot setup scaled.
* MoCo production has not used tinderbox.m.o since sept 2012. 
* As of today, the unmaintained tinderbox.m.o continues to be a security problem, requiring people to react to active exploits. Given the nature of these, if you want further details, please contact joes (cc'd) offline. He'll be happy to elaborate.
* As part of my negotiations with joes last month, OpSec agreed to leave tinderbox.m.o running only because I found volunteers to hot-fix an exploit-in-progress, and I promised to move everyone else off asap. Given the security issues involved, we need to find people who are able to fix them quickly, or we need to move off the server in an expedient manner.

Hope that helps set context. Happy to talk if you would like further info.
No longer depends on: 843389
Calendar, which also still uses tinderbox should be fine with this turning off. Nightly builds have already transitioned to Thunderbird build machinery, which is using TBPL.
The documentation at:

https://developer.mozilla.org/en-US/docs/Tinderbox

should be updated when this is turned off.
Keywords: dev-doc-needed
(In reply to Philipp Kewisch [:Fallen] from comment #17)
> Calendar, which also still uses tinderbox should be fine with this turning
> off. Nightly builds have already transitioned to Thunderbird build
> machinery, which is using TBPL.

:Fallen,

Yes, I'd mentally figured it was a no-op for Calendar, given the recent transition to Thunderbird systems which, as you correctly note, use TBPL. However, getting an explicit confirm from you is good, so thanks for that!
(In reply to John O'Duinn [:joduinn] from comment #16)
> In the "good news" category, at least we found out about your usage of this
> before we powered it off!

John: this URL:
http://tinderbox.mozilla.org/showbuilds.cgi
shows all of the trees reporting to Tinderbox, including the 8 Bugzilla trees. You might want to review that list to see if anyone else is using Tinderbox who hasn't been informed about your desire to switch it off :-)

Gerv
(In reply to Gervase Markham [:gerv] from comment #20)
> (In reply to John O'Duinn [:joduinn] from comment #16)
> > In the "good news" category, at least we found out about your usage of this
> > before we powered it off!
> 
> John: this URL:
> http://tinderbox.mozilla.org/showbuilds.cgi
> shows all of the trees reporting to Tinderbox, including the 8 Bugzilla
> trees. You might want to review that list to see if anyone else is using
> Tinderbox who hasn't been informed about your desire to switch it off :-)
> 
> Gerv

Done.
Depends on: 902553
Product: mozilla.org → Release Engineering
Found in triage.

Everyone else is off tinderbox.m.o; once bug#527038 is resolved, we can power-off.
Depends on: 983275
No longer depends on: 648676
No longer depends on: 843395
The service is offline. fox2mike deleted the vhost at 5pm today.

There are a couple of cleanup items for early next week, but I can't help myself.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
From #releng:
...
[13:50]	fox2mike	laura: config commented out, restarting apache
[13:51]	laura	WOO
[13:51]	bhearsum	yay!
[13:51]	bhearsum	i think it's dead now?
[13:51]	bhearsum	pokes it with a stick
[13:51]	laura	joduinn: bhearsum: catlee: mcote|afk dkl:
[13:51]	laura	I think so!
[13:51]	laura	updates code
[13:51]	bhearsum	YAY!!!
[13:52]	dkl	cool!
[13:52]	joduinn	waits for Guinness to settle
[13:52]	joduinn	watched kettle, and all that...
[13:53]	laura	lol
[13:53]	catlee	https://tinderbox.mozilla.org/ needs updating
[13:54]	bhearsum	laura: http://canwekilltinderboxyet.com/ is 404 \=\
[13:54]	laura	bhearsum: deploying
[13:54]	laura	bhearsum: at “stackato.stager: Completed staging application 'canwekilltinderboxyet'"
[13:55]	laura	man stackato is crawling
[13:55]	laura	bhearsum: done
[13:55]	laura	http://canwekilltinderboxyet.com/
[13:55]	catlee	nice
[13:56]	bhearsum	:-D
[13:56]	rail	ooh
[13:56]	joduinn	lol
[13:59]	dustin	man I wish I had hair like htat
[14:00]	bhearsum	bouncy and curly?
[14:00]	fox2mike	laura: mcote|afk I think it's done
[14:00]	fox2mike	I'm going to leave DNS and load balancer configs AS IS until next week
[14:00]	fox2mike	and if we don't need it, I'll wipe all that out
[14:02]	bhearsum	what a great way to end the week - happy friday all
[14:07]	rail	waves
[14:07]	rail	have a great weekend
[14:10]	laura	fox2mike: you are awesome
...
[14:11]	laura	dustin: ditto on the hair
[14:11]	laura	happy friday!
[14:11]	fox2mike	laura: <3
[14:11]	laura	<3
[14:12]	callek_mobile	Joduinn. Hihi! How goes it?
[14:12]	fox2mike	zomg is there a joduinn here?
...
[14:15]	catlee	heads out too
[14:18]	laura	fox2mike: he wanted to be in at the death
[14:18]	callek_mobile	He deserved to send it off at least
[14:19]	laura	agreed
[14:22]	joduinn	quietly puts down the ceremonial Guinness and waves to fox2mike
...

...and from http://canwekilltinderboxyet.com/, I think we can declare this done.
Blocks: 1026036
Blocks: 1093383