Bug 734123 (Closed)
Opened 12 years ago · Closed 12 years ago
set up puppet dashboard on puppetagain servers
Categories: Infrastructure & Operations :: RelOps: General (task)
Status: RESOLVED FIXED
People: Reporter: dustin; Assigned: dustin
We should get a puppet dashboard set up on the puppetagain servers. It would be great if the dashboard could appear on a single server, rather than having to hunt down the master for a particular host. This is probably best experimented with on relabs-puppet and relabs07/08 if anyone wants to try it.
Updated•12 years ago
Blocks: PuppetAgain
Updated•12 years ago
Assignee: server-ops-releng → dustin
Comment 1•12 years ago
So, dashboard requires a backend MySQL database, and seems to use quite a bit of CPU power - so it should be a separate VM from any of the puppet masters. We could, potentially, run the frontend on the releng cluster, with a new VM just for workers. I want to talk to Jabba about this before I dive in: is it worth the trouble? When does MySQL performance begin to suffer?
Comment 2•12 years ago
Based on the horsepower needed, this might be a good use of the hg mirror hardware sitting in scl1 currently (which, I think, may no longer be in use for hg?). Dev services guys, can we reclaim that hardware?
Comment 3•12 years ago
And, sheeri, what do you think about running this on an existing DB cluster, given your experience with the infra puppet dashboard?
Comment 4•12 years ago
If we put it on an existing DB cluster, it has to be one that's OK with replication being several hours behind on a weekly basis, because defragmenting puppet dashboard takes a while. We found we needed it defrag'd weekly, and we still had space issues (jabba did a lot of work to not put too much data in the database, but there was still a ton of information and a lot of pruning work...). That's also a lot of disk I/O, which might end up not playing nicely with the other VMs on the machine.
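The weekly defrag described here would typically be an OPTIMIZE TABLE run from cron. A hypothetical crontab fragment (the database and table names are illustrative guesses, not taken from this bug):

```shell
# Hypothetical crontab entry: rebuild the biggest dashboard tables every
# Sunday at 03:00. OPTIMIZE TABLE copies the whole table, so it replicates
# as one long-running statement and lets any slave fall hours behind.
0 3 * * 0  mysql puppet_dashboard -e "OPTIMIZE TABLE reports, resource_statuses"
```

This is a config fragment, not a runnable script; the point is that the rebuild is both slow and replicated, which is why the cluster has to tolerate the lag.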
Comment 5•12 years ago
Is there such a cluster? Should we set up a dedicated cluster for dashboard backends? Which machines do you mean by VMs - DB servers, or the puppet workers?
Comment 6•12 years ago
We have a puppetdashboard DB cluster, but it's having a ton of problems at the moment. Jabba was investigating other puppet dashboard solutions, and we have a bug to add more disk to the existing ones. The disk I/O comment about VMs applies if you put the db on a VM.
Comment 7•12 years ago
OK, thanks. It seems like we should put these two eggs in the same basket - they certainly shouldn't be in baskets with any other eggs. So I'll wait to see how the sysadmins puppet stuff plays out. Puppetagain load is pretty low right now, but will likely grow within 6mo or so to be similar to what sysadmins puppet is doing today.
Comment 8•12 years ago
So, rough architecture plan is this:

workers = releng-puppet-dashN.private.scl3
UI = releng cluster (scl3)
report acceptance = releng cluster (scl3)
db = puppetdashboard{1,2}.db.phx1

flows:

masters -> report acceptance tcp/3000
report acceptance -> db tcp/3306 (cross-DC flow)
UI -> db tcp/3306 (cross-DC flow)
workers -> db tcp/3306

I'd very much like to use the existing releng cluster for the web stuff, since we can embed it in secure.pub.b.m.o. I'm aware it will be slow using a phx1 backend. If necessary, we can work around this (most likely requiring an additional SSL cert).
Comment 9•12 years ago
On Amy's advice to not do cross-DC flows, we're going to try to set up a separate DB cluster in scl3, using the old hg-mirror hardware. So, I'll get some bugs filed for that, and close out the IP allocation and VM bugs.
Updated•12 years ago
No longer blocks: PuppetAgain
Comment 10•12 years ago
OK, bug 771121 tracks decomming the HG mirrors, which will free up that hardware by Sep 10. That hardware is:

HP DL360G7 E5645 Base US Svr
RAM: 6GB
CPU: Intel® Xeon® E5645 (2.40GHz/6-core/12MB/80W, DDR3-1333, HT Turbo 1/1/1/1/2/3)
Disk: Smart Array P410i with six physicaldrive 1I:1:1 (port 1I:box 1:bay *, SAS, 146 GB, OK) in a single logicaldrive 1 (683.5 GB, RAID 5, OK)

Sheeri, what do you think of using these as MySQL servers under load similar to puppetdashboard*.db.phx1? Should we change the disk config? Is it worth stalling long enough to get more RAM?
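As a sanity check on the logical drive size above: RAID 5 across six drives leaves five drives' worth of usable capacity, and the controller is counting each drive as about 136.7 GB after formatting (the "146 GB" is the nominal size):

```shell
# Usable capacity of a 6-drive RAID 5 array: (n - 1) data drives' worth.
# Each nominal 146 GB SAS drive contributes ~136.7 GB of formatted space.
awk 'BEGIN { printf "%.1f GB usable\n", (6 - 1) * 136.7 }'
```

which matches the 683.5 GB the Smart Array reports for logicaldrive 1.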
Comment 11•12 years ago
Docs:
Comment 12•12 years ago
puppetdash1/2.db.scl3 are kickstarting now. I need to update DHCP in inventory. I'll need to check up on whether these will be bonding, and how to encode the mgmt nic.
Comment 13•12 years ago
If this is running on the releng cluster, then we need to disable diffs entirely so we don't leak secrets - bug 791102.
Depends on: 791102
Comment 14•12 years ago
OK, this is pretty much working: https://secure.pub.build.mozilla.org/puppetdash/

'course, you'll need to use the new secure vhost; in /etc/hosts:

63.245.215.57 secure.pub.build.mozilla.org

I verified that disabling the report vhost doesn't cause production failures:

Sep 14 10:34:17 releng-puppet1 puppet-master[21846]: Unable to submit report to http://puppetdash.pvt.build.mozilla.org/reports/upload [403] Forbidden

so I'll include this as a workaround to any issues of db/webhead slowness affecting production.

There are some lingering problems with full URL paths - puppet dashboard *mostly* works at a sub-URI, but not quite. They seem harmless enough so far (they require you to auth twice, and then you'll get some broken image links), and I'll work to fix these upstream.

What remains:
- monitoring for DB servers
- monitoring for workers
- review and update docs
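The /etc/hosts override can be applied with a one-liner; a minimal sketch, writing to a scratch copy rather than the real /etc/hosts (substitute /etc/hosts, as root, to apply it for real):

```shell
# Append the vhost override to a scratch hosts file, then confirm it took.
HOSTS=$(mktemp)
echo "63.245.215.57 secure.pub.build.mozilla.org" >> "$HOSTS"
grep "secure.pub.build.mozilla.org" "$HOSTS"
```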
Comment 15•12 years ago
...and submitting some patches upstream:
https://github.com/puppetlabs/puppet-dashboard/pull/121
https://github.com/puppetlabs/puppet-dashboard/pull/122
Comment 16•12 years ago
OK, this is done. We may need to revisit as we see how this scales - it may need more workers, for example, or the db servers may need more tuning.

There were some issues with the webheads yesterday and today, but I strongly suspect those were due to a bogus master/master configuration of the databases. I think this app is not master/master capable, so I switched the DBs to a master/slave configuration, with both DBs in the ro pool and only the master in the rw pool.

The installed webapp still has some absolute paths. If we're patient, we'll wait until a new version is released with my patches applied; otherwise, we *could* patch this locally with puppet or build some custom RPMs.
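The master/slave switch would look roughly like this, assuming classic MySQL replication between the two hosts from comment 12 (illustrative only, not runnable here; the repl user and the omitted credentials/log coordinates are placeholders):

```shell
# Hypothetical sketch: make puppetdash2 a slave of puppetdash1. The load
# balancer pools (both hosts ro, only the master rw) are configured
# separately.
mysql -h puppetdash2.db.scl3 <<'SQL'
STOP SLAVE;
CHANGE MASTER TO
  MASTER_HOST='puppetdash1.db.scl3',
  MASTER_USER='repl';  -- password and binlog position omitted
START SLAVE;
SQL
```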
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Updated•11 years ago
Component: Server Operations: RelEng → RelOps
Product: mozilla.org → Infrastructure & Operations