Closed Bug 428617 Opened 12 years ago Closed 8 years ago

transfer management of build-graphs.mozilla.org to Camino team

Categories

(mozilla.org Graveyard :: Server Operations: Projects, task, P5, trivial)

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: joduinn, Assigned: zandr)

Details

(Whiteboard: [still used by Camino][lowp][q4])

The machines build-graphs.m.o, bl-bldxp01, bl-bldlnx01, bl-bldlnx02, bl-bldlnx03, bm-xserve08 were set up as an interim measure while Talos and graph server were being developed.

At this point, Talos and graphs.m.o have been up and officially supported for several months. In bug#413695, we mothballed bl-bldxp01, bl-bldlnx01, bl-bldlnx02, bl-bldlnx03 and killed the perf-related processes on bm-xserve08.

We did not mothball build-graphs.m.o yet because, apparently, other products in the Mozilla community are now using build-graphs.m.o, and would be impacted by us turning off build-graphs.

This bug is to track:
- find those other products using build-graphs.m.o
- migrate those products from build-graphs.m.o to the officially supported graphs.m.o instead. 
- mothball build-graphs.m.o.
build-graphs.m.o isn't just for perf things... check http://build-graphs.mozilla.org/graph/query.cgi for a list of all the tests it handles data for... Just looking at the Firefox tinderbox, I see a minimum of six different machines that use build-graphs for data. If you really want to do this, it's going to take a ton of work that includes tinderbox client changes to support sending data to the new graphs server.
Yep, I agree there is perf and non-perf data on build-graphs.

Any perf data that is also on graphs.m.o should be aged off build-graphs to avoid confusion, imho. 

Any non-perf data should be identified - my concern here is that build-graphs.m.o was set up on a temp machine on a temp basis, and is not being supported by anyone aiui. If someone is relying on build-graphs.m.o, we need to figure out how to get them onto a supported setup; whether we beef up build-graphs.m.o or migrate them to graphs is an open question.
(In reply to comment #2)
> Any non-perf data should be identified - my concern here is that
> build-graphs.m.o was setup on a temp machine on a temp basis, and is not being
> supported by anyone aiui. If someone is relying on build-graphs.m.o, we need to
> figure out how to get them onto a supported setup; whether we beef up
> build-graphs.m.o or migrate them to graphs is an open question.

I don't believe the information you have received concerning build-graphs.m.o is correct. Speaking from what knowledge I have of the webtool, build-graphs.m.o runs on dm-webtools01, which is an IT-supported production VM. dm-webtools01 is not a temporary machine, and I was the one to move build-graphs.m.o to dm-webtools01 over the summer on a permanent, not temporary, basis, as it was originally on (the now deprecated) axolotl. While it is true that nobody is actively developing on build-graphs.m.o for future expansion, it is still a supported webtool in "maintenance" mode, meaning it continues to get security holes patched as they are found, just like any of our other various Perl-based webtools. It also receives changes as needed if any type of OS upgrade causes it to have problems. It's working just fine for its needs currently, so I'm not sure why there is some sudden need to "mothball" it. I agree perf-related stuff should move to the new graphs server, as it has better features that allow for more fine-grained investigative work, but the current results being reported to build-graphs.m.o are not as critical as our perf numbers are and don't require all the fanciness of the new graphs server in order to track some increase or decrease in a test result. I'm not sure why build-graphs.m.o would need to be "beefed up" at all, as I am unaware of any problems it is currently having.
From some internal talks, it seems the only thing on this VM that's important to keep is whatever constitutes build-graphs, and that requires less than the 150GB disk that's currently allocated to it.

Moving this to Server-Ops:Projects - seems like a good intern project.
Component: Release Engineering: Future → Server Operations: Projects
QA Contact: release → justin
1) "still supported in maintenance mode"... by whom? To the best of my knowledge, this is not being supported by IT or Release Groups. This was intended as a temporary solution until Talos and graph server were up in production. If we hit problems with the VM, or any processes running in the VM, who is on the hook to fix them? What support level is being provided?

2) build-graphs.m.o is a 150GB VM. By comparison, our build slave VMs range in size from 45GB-60GB. So this one "build-graphs" VM takes up the same space as 3-4 of our normal VMs. Given the space crunch we have on our ESX hosts, and our buildup of moz2 infrastructure, this space matters. 

3) If community folks prefer to keep using this build-graphs.m.o VM, instead of transitioning to graphs.m.o, then we need to:
- find someone willing to own maintaining the build-graphs.m.o code and the VM
- we should migrate this VM to a community VM host
- to avoid confusion, I'd prefer to remove any data from this build-graphs.m.o VM which is obsoleted by data on graphs.mozilla.org.

Personally, I think a smooth transition over to using the supported graphs.m.o seems better than trying to figure out answers to these open questions about build-graphs.m.o, but that's just my $0.02.

Does all that make sense?
build-graphs was intended to go away when the new graph server was ready for production.  Some people may still want some of the old data around if the new server only had data starting at a certain point.  I suspect getting some of the people who actually know how these things work CCed on this bug might be useful.
(In reply to comment #5)
> 2) build-graphs.m.o is a 150GB VM.

For reference, the only reason it had that much disk space is because LXR and MXR used to be on the same box with it, and they needed the space.  Those two webtools are no longer on that VM.  build-graphs itself uses about 10 GB.

What mrz was proposing (if build-graphs is kept around at all) is to migrate build-graphs to a new smaller VM and then dispose of this one.
(In reply to comment #6)
> build-graphs was intended to go away when the new graph server was ready for
> production.

*and* implements all of the old server's functionality that is still actually being used.  I'm told it doesn't yet.
As far as I know, all community and non-Firefox tinderboxen report to build-graphs, so that "box" either needs to remain operational or there needs to be a migration plan for getting community build-graphs data and tinderboxen moved on to graphs.

(It would also be nice if graphs appeared to be usable and functional before any proposed migration happened; although I can get a graph if I follow someone's link to a pre-made one, I have never been able to make a graph of my own.  Either it just is "building graph" interminably or the graphs UI fails to finish loading due to constant script timeouts.  If there's a secret handshake needed to get graphs working, can we document it somewhere findable?)
(In reply to comment #9)
> As far as I know, all community and non-Firefox tinderboxen report to
> build-graphs

Actually, all tinderboxen (including Firefox) report to build-graphs currently. Only Talos reports to the new graphs server.
Build-Graphs has been around for a while (since at least 2001, judging by the data in it; see machines like beast, gabrielle or pacifica (non-vm) for a nice journey back in time). I don't think it previously used that hostname, though; rather, machines reported to axolotl (and probably something else before that server came online) directly. It's not actively being developed, beyond what reed said in Comment #3. All of the tinderboxen (no distinction between 'community' and not) prior to the talos tests use it, for any statistics that they want to graph, such as fxdbug's leak numbers or any Ts/Tp box (such as bm-xserve05 on mozilla1.8 or seamonkey's sea-win32-tbox, just to cite specific examples).

The migration of all the Firefox performance tests to Talos from the perf test boxes certainly reduced the number of Firefox-specific boxes that use it, which explains a bit of the confusion here. Its migration to dm-webtools01 probably was temporary, in the sense that talos/new graphs was being developed and IT wanted/needed to shut down axolotl; the new graphs server was likely intended to replace build-graphs, but that hasn't happened yet. AFAIK, no boxes have actually migrated from one server to the other; the new talos stuff simply used the new server from the start.

So, I see there's a few problems being mentioned here.
(1) The VM that hosts it (dm-webtools01) is too large for just this app. Comment #4 and Comment #7 address this, by suggesting it should be moved to a smaller VM to free up space. Though since this is not a community-specific tool, it should not move to a community vm host. Getting this done sounds like it'd probably solve the most pressing reason this bug was filed and probably what this bug should focus on.

(2) Tinderboxen (both Community and Firefox) should move to using graphs instead of build-graphs, once what was said in Comment #8 is true.
Have the performance issues with the new graphs server been solved that caused the db load and other problems a few weeks (months?) back? Before adding a bunch more machines, seems like knowing the new server can handle the load would be a good idea. 

(3) Is there documentation for the tinderbox maintainers on how to use the new graphs server to report their test results? Does tinderbox client support the new graphs server or is it buildbot only? Is there a plan for the remaining Firefox/Mozilla1.8 boxes to migrate to using graphs instead of build-graphs? Is there a way to migrate data between the two tools, to preload history for a machine once it has migrated? Otherwise we'd need to wait until the migrated machines have had some time to build history before the data they're reporting is actually useful to those looking at it.
(In reply to comment #4)
> From some internal talks seems the only thing on this VM that's important to
> keep is whatever constitutes build-graphs and that requires less that the 150GB
> disk that's currently allocated to it.
> 
> Moving this to Server-Ops:Projects - seems like a good intern project.
I've created bug#431380 in ServerOps:Projects to track reducing the size of this VM. Moving this bug back to ReleaseEngineering:Future, as we'll need to work through the various "who is using what" scenarios before we can mothball this VM.
Component: Server Operations: Projects → Release Engineering: Future
QA Contact: justin → release
The huge slowness of the new, supposed-to-be-better build graphs system and the comparatively awesome fastness of build-graphs.m.o is actually one primary reason why I wouldn't even think of getting rid of this old system for now.
ignore this flag - testing new triage query.
Flags: blocking1.8.1.next?
Ignore a flag we use to track 1.8.1 releases? ;)
(In reply to comment #14)
> ignore this flag - testing new triage query.
All done, so I've cleared flag to previous unset value. Now our triage queries will detect if someone marks one of our unassigned Future bugs as a blocker, so we don't miss it by accident.


(In reply to comment #15)
> Ignore a flag we use to track 1.8.1 releases? ;)
Yeah, sorry for noise, but I figure the old 1.8.1 flag was best for this experiment, as relatively few people are checking that anymore.
Flags: blocking1.8.1.next?
(In reply to comment #13)
> The huge slowness of the new, supposed-to-be-better build graphs system and the
> comparatively awesome fastness of build-graphs.m.o is actually one primary
> reason why I wouldn't even think of getting rid of this old system for now.

(Just found this bug, and noticed this comment is over a year old.)

Have you tried using the new graph server, to see if that is performant enough for you to switch from build-graphs??
Someone should check links at least - the memory graphs page from the Firefox tinderbox still points to the old graph server.

Also the Firefox 3.5 & 3.6 pages don't have links for memory graphs even though there are graphs for them.
Mass move of bugs from Release Engineering:Future -> Release Engineering. See
http://coop.deadsquid.com/2010/02/kiss-the-future-goodbye/ for more details.
Component: Release Engineering: Future → Release Engineering
Priority: -- → P3
Once we EOL FF3.0 (in bug#554226), MoCo will not be using build-graphs.m.o for anything. 

If community are still using build-graphs, and IT are willing to support it, then we should WONTFIX this bug. If community are not using this, or cannot get it supported, then we should mothball build-graphs.
Things have changed in the last two years: the new graphs server is slightly faster, and we have tooling to post there, but no SeaMonkey machines post to build-graphs any more. If anyone else needs it, fine, but we don't.
We're still using build-graphs for Camino.
Assignee: nobody → joduinn
Priority: P3 → P5
Whiteboard: [still used by Camino]
We used to post to build-graphs for a while, but that broke a while back and never got fixed. Once we look at posting graph data once more, I'd rather we look at the new, shiny stuff, so mothball away from my POV.
Kairo, gozer: thanks for the confirm that you're not using build-graphs.


(In reply to comment #22)
> We're still using build-graphs for Camino.

Smokey, is it possible for you to change Camino from build-graphs.m.o over to using graphs.m.o? Hopefully it's not too much work, and it seems like a better long-term strategy than figuring out how to transition support of build-graphs.
Since we use Tinderbox, I'm sure it's possible, but no idea how easy (someone from the releng team would know). However, given the workload of our team as it is, I'd say... "patches accepted."
Camino should follow along with what seamonkey did in bug 492406.
(In reply to comment #25)
> Since we use Tinderbox, I'm sure it's possible, but no idea how easy (someone
> from the releng team would know). However, given the workload of our team as it
> is, I'd say... "patches accepted."

(In reply to comment #26)
> Camino should follow along with what seamonkey did in bug 492406.

Sam: Let us know if you have any questions when following those instructions for moving to the "new" graphserver.

Note: If you prefer to continue using build-graphs, instead of moving to the "newer" supported graphserver, we should sort out some questions. We'll need to coordinate with you about moving build-graphs to community hosted space, who can maintain it (you?), and what changes you'll need to make to point to the relocated build-graphs.
(In reply to comment #27)
> (In reply to comment #25)
> > Since we use Tinderbox, I'm sure it's possible, but no idea how easy (someone
> > from the releng team would know). However, given the workload of our team as it
> > is, I'd say... "patches accepted."
> 
> (In reply to comment #26)
> > Camino should follow along with what seamonkey did in bug 492406.
> 
> Sam: Let us know if you have any questions when following those instructions
> moving to "new" graphserver. 
> 
> Note: If you prefer to continue using build-graphs, instead of moving to the
> "newer" supported graphserver, we should sort out some questions. We'll need to
> coordinate with you about moving build-graphs to community hosted space, who
> can maintain it (you?), and what changes you'll need to make to point to the
> relocated build-graphs.


Sam: ping?
(In reply to comment #28)
> Sam: ping?

I'll re-iterate comment 25. Patches accepted. Keep in mind, in bug 492406, SeaMonkey was already using Buildbot. We're not.
(In reply to comment #29)
> (In reply to comment #28)
> > Sam: ping?
> 
> I'll re-iterate comment 25. Patches accepted. Keep in mind, in bug 492406,
> SeaMonkey was already using Buildbot. We're not.

As all MoCo projects have moved off build-graphs, IT will be moving this machine over to community space. Pushing bug over to IT to coordinate with Camino project.
Assignee: joduinn → server-ops
Component: Release Engineering → Server Operations
QA Contact: release → mrz
Long term goal is to decommission this host.  Short term is to turn it into a community resource, ideally managed by someone related to Camino.  Any takers?
What does "managed" mean in terms of this machine? Either Smokey or I can do this (though I don't want to volunteer him without asking first). I'm happy to take it on depending on what we're talking about and I imagine a backup is good just as we do for the build machines.
Ideally, yeah, you'd manage the OS & up.  IT would manage/support the hardware.
I'm fine with that.

Please give Smokey access as well so that we have a backup since I'm on the road a lot.
Assignee: server-ops → mrz
Summary: mothball build-graphs.mozilla.org → transfer management of build-graphs.mozilla.org to Camino team
Low priority, requires a lot of network moves.  Unless anyone else feels this is urgently needed, I want to sit on it until q4 (when we have more time).
Severity: normal → trivial
Whiteboard: [still used by Camino] → [still used by Camino][lowp][q4]
Component: Server Operations → Server Operations: Projects
Assignee: mrz → zandr
Is there any action here?  It's been a while.  Graphs moved (I think twice) since the last post here.  If there's still something Camino needs, speak up and let's get it addressed, otherwise, RESOLVED/WORKSFORME?
(In reply to Dustin J. Mitchell [:dustin] from comment #36)
> If there's still something camino needs, speak up
> and let's get it addressed, otherwise, RESOLVED/WORKSFORME?

30 days, nothing heard.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → WORKSFORME
Product: mozilla.org → mozilla.org Graveyard