Closed Bug 734123 Opened 12 years ago Closed 12 years ago

set up puppet dashboard on puppetagain servers

Categories

(Infrastructure & Operations :: RelOps: General, task)

Type: task
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: dustin, Assigned: dustin)


Details

We should get a puppet dashboard set up on the puppetagain servers.

It would be great if the dashboard could appear on a single server, rather than having to hunt down the master for a particular host.

This is probably best experimented with on relabs-puppet and relabs07/08 if anyone wants to try it.
Assignee: server-ops-releng → dustin
So, dashboard requires a backend MySQL database and seems to use quite a bit of CPU power - it should be a separate VM from any of the puppet masters.

We could, potentially, run the frontend on the releng cluster, with a new VM just for workers.

I want to talk to Jabba about this before I dive in: is it worth the trouble?  When does MySQL performance begin to suffer?
Based on the horsepower needed, this might be a good use of the hg mirror hardware currently sitting in scl1 (which, I think, may no longer be in use for hg?).  Dev services guys, can we reclaim that hardware?
And, sheeri, what do you think about running this on an existing DB cluster, given your experience with the infra puppet dashboard?
If we put it on an existing DB cluster, it has to be one that's OK with replication falling several hours behind on a weekly basis, because defragmenting the puppet dashboard database takes a while.

We found we needed it to be defrag'd weekly, and we still had space issues (Jabba did a lot of work to avoid putting too much data in the database, but there was still a ton of information and a lot of pruning work...)
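
For a sense of scale, the weekly defrag boils down to something like the following from cron - the database name and credentials path here are assumptions, not our actual setup:

  # /etc/cron.weekly/puppet-dashboard-optimize (illustrative path)
  # OPTIMIZE rewrites each table to reclaim space; the slave's replication
  # lags behind until this finishes.
  mysqlcheck --defaults-extra-file=/root/.my.cnf --optimize dashboard_production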

That's also a lot of disk I/O, which might end up not playing nicely with the other VMs on the machine.
Is there such a cluster?  Should we set up a dedicated cluster for dashboard backends?

Which machines do you mean by VMs - DB servers, or the puppet workers?
We have a puppetdashboard DB cluster, but it's having a ton of problems at the moment. Jabba was investigating other puppet dashboard solutions, and we have a bug to add more disk to the existing ones. 

The disk I/O comment was intended for the case where you put the db on a VM.
OK, thanks.  It seems like we should put these two eggs in the same basket - they certainly shouldn't be in baskets with any other eggs.  So I'll wait to see how the sysadmins puppet stuff plays out.

Puppetagain load is pretty low right now, but will likely grow within 6 months or so to be similar to what sysadmins puppet is doing today.
Depends on: 786651
So, rough architecture plan is this:

workers = releng-puppet-dashN.private.scl3
UI = releng cluster (scl3)
report acceptance = releng cluster (scl3)
db = puppetdashboard{1,2}.db.phx1

flows:
 masters -> report acceptance tcp/3000
 report acceptance -> db tcp/3306 (cross-DC flow)
 UI -> db tcp/3306 (cross-DC flow)
 workers -> db tcp/3306

I'd very much like to use the existing releng cluster for the web stuff, since we can embed it in secure.pub.b.m.o.  I'm aware it will be slow using a phx1 backend.  If necessary, we can work around this (most likely requiring an additional SSL cert).
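
Concretely, the masters -> report acceptance flow amounts to something like this on each master (a sketch assuming the stock "http" report processor; the URL is whatever ends up fronting report acceptance):

  [master]
      reports   = store,http
      reporturl = http://puppetdash.pvt.build.mozilla.org/reports/upload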
On Amy's advice to not do cross-DC flows, we're going to try to set up a separate DB cluster in scl3, using the old hg-mirror hardware.  So, I'll get some bugs filed for that, and close out the IP allocation and VM bugs.
No longer blocks: PuppetAgain
OK, bug 771121 tracks decomming the HG mirrors, which will free up that hardware by Sep 10.

That hardware is
 HP DL360G7 E5645 Base US Svr
 RAM: 6GB
 CPU: Intel® Xeon® E5645 (2.40GHz/6-core/12MB/80W, DDR3-1333, HT Turbo 1/1/1/1/2/3)
 Disk: Smart Array P410i
    with six
      physicaldrive 1I:1:1 (port 1I:box 1:bay *, SAS, 146 GB, OK)
    in a single
      logicaldrive 1 (683.5 GB, RAID 5, OK)

Sheeri, what do you think of using these as MySQL servers under load similar to puppetdashboard*.db.phx1?  Should we change the disk config?  Is it worth stalling long enough to get more RAM?
Depends on: 771121
Depends on: 788605
Depends on: 788630
puppetdash1/2.db.scl3 are kickstarting now.  I need to update DHCP in inventory.  I'll also need to check whether these will be using bonding, and how to encode the mgmt NIC.
Depends on: 791023
If this is running on the releng cluster, then we need to disable diffs entirely so we don't leak secrets - bug 791102.
Depends on: 791102
OK, this is pretty much working.
  https://secure.pub.build.mozilla.org/puppetdash/

'course, you'll need to use the new secure vhost; in /etc/hosts:

63.245.215.57 secure.pub.build.mozilla.org


I verified that disabling the report vhost doesn't cause production failures:

Sep 14 10:34:17 releng-puppet1 puppet-master[21846]: Unable to submit report to http://puppetdash.pvt.build.mozilla.org/reports/upload [403] Forbidden

so I'll include this as a workaround for any issues where db/webhead slowness affects production (sketched below).
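
Roughly, the workaround is just (the conf filename is illustrative):

  # take report acceptance offline; masters log the 403 above but runs succeed
  mv /etc/httpd/conf.d/puppetdash-reports.conf{,.disabled}
  service httpd reload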


There are some lingering problems with full URL paths - puppet dashboard *mostly* works at a sub-URI, but not quite.  They seem harmless enough so far (you have to auth twice, and some image links are broken), and I'll work to fix this upstream.


What remains:

 - monitoring for DB servers
 - monitoring for workers
 - review and update docs
OK, this is done.  We may need to revisit as we see how this scales.  It may need more workers, for example, or the db servers may need more tuning.

There were some issues with the webheads yesterday and today, but I strongly suspect those were due to a bogus master/master configuration of the databases.  I think this app is not master/master capable, so I switched the DBs to a master/slave configuration, with both DBs in the ro pool and only the master in the rw pool.
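
For reference, the slave side of that switch looks roughly like this (host, user, and log coordinates are placeholders, not the real values):

  -- run on the slave
  STOP SLAVE;
  CHANGE MASTER TO
    MASTER_HOST='puppetdash1.db.scl3.mozilla.com',
    MASTER_USER='repl',
    MASTER_PASSWORD='...',
    MASTER_LOG_FILE='mysql-bin.000123',
    MASTER_LOG_POS=4;
  START SLAVE;
  SET GLOBAL read_only = ON;  -- slave serves the ro pool only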

The installed webapp still has some absolute paths.  If we're patient, we'll wait until a new version is released with my patches applied; otherwise, we *could* patch this locally with puppet or build some custom RPMs.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Component: Server Operations: RelEng → RelOps
Product: mozilla.org → Infrastructure & Operations