Closed Bug 666757 Opened 14 years ago Closed 14 years ago
fix naming problems on seamicro.phx1 nodes
Categories: mozilla.org Graveyard :: Server Operations, task
Tracking: (Not tracked)
Status: RESOLVED FIXED
People: (Reporter: nmaul, Assigned: nmaul)
Description•14 years ago
Nodes 10-19 and 21-29 are known to be wrong. That is, DNS and the server disagree... the hostname is one number higher than DNS. Apart from being confusing, this is probably also causing an off-by-one error with puppet.
1- remove the <node30 nodes from the engagement cluster (drain them on http/https)
2- remove node1-node29 from RHN (or at least the affected ones)
3- fix the hostnames for node0-node29 (they currently show as node1-node30)
4- re-register them with RHN
5- re-puppetize them just to make sure we have the name/certs right
6- un-drain them
pro tip: puppetizing these takes forever. If you do them in a serial loop make sure and screen your session.
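Roughly the sort of serial loop meant here, meant to be run inside a screen session (a sketch only: the domain suffix, the node range, and the plain "puppet agent --test" run standing in for whatever bootstrap command is actually used are all assumptions, not recorded in this bug):

    #!/usr/bin/env python
    # Sketch of the serial re-puppetize loop; run it inside screen, since the
    # whole pass can take a long time. Domain suffix, node range, and the
    # puppet command are assumptions, not taken from this bug.
    import subprocess

    DOMAIN = "seamicro.phx1.mozilla.com"   # assumed FQDN suffix
    NODES = ["node%d.%s" % (n, DOMAIN) for n in range(0, 30)]

    for node in NODES:
        print("puppetizing %s ..." % node)
        # One node at a time; a failure on one host should not stop the loop.
        rc = subprocess.call(["ssh", "root@" + node,
                              "puppet agent --test --waitforcert 60"])
        if rc != 0:
            print("WARNING: puppet run on %s exited %d" % (node, rc))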
Updated•14 years ago (Assignee)
Assignee: server-ops → nmaul
Comment 1•14 years ago (Assignee)
1. Done
2. Done
3. Done. Rebooted the nodes to put it into effect (some daemons don't like changing hostnames, and this seemed like the easiest way to get everything fixed). This caused chaos with the SeaMicro, though; there seems to be a race condition when several nodes reboot simultaneously, so it's best to space the reboots out by 15+ seconds. That worked fine (a staggered-reboot sketch follows this list).
4. In progress.
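What the staggered reboot from step 3 amounts to, as a sketch (hostnames and the exact 20-second spacing are assumptions; the comment above only says 15+ seconds worked):

    #!/usr/bin/env python
    # Staggered-reboot sketch: reboot the nodes one at a time with a pause in
    # between, to avoid the apparent race when several SeaMicro nodes reboot
    # at once. Hostnames and the 20s spacing are assumed.
    import subprocess
    import time

    DOMAIN = "seamicro.phx1.mozilla.com"   # assumed FQDN suffix

    for n in range(0, 30):
        node = "node%d.%s" % (n, DOMAIN)
        print("rebooting %s" % node)
        subprocess.call(["ssh", "root@" + node, "shutdown -r now"])
        time.sleep(20)   # give the chassis time to settle before the next one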
Status: NEW → ASSIGNED
Comment 2•14 years ago (Assignee)
4. Done.
5. In progress (a screen session with 2 windows on dp-nagios01, doing 2 nodes at a time).
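A sketch of that "2 at a time" arrangement, assuming one copy of a script like this is started in each screen window with a different shard number (the host naming and the puppet command are placeholders, as in the earlier sketch):

    #!/usr/bin/env python
    # Shard the node list across two screen windows: run this with argument 0
    # in one window and 1 in the other, and each copy works every other node.
    import subprocess
    import sys

    DOMAIN = "seamicro.phx1.mozilla.com"   # assumed FQDN suffix
    NODES = ["node%d.%s" % (n, DOMAIN) for n in range(0, 30)]

    shard = int(sys.argv[1])               # 0 or 1
    for node in NODES[shard::2]:           # disjoint halves of the list
        print("[shard %d] puppetizing %s" % (shard, node))
        subprocess.call(["ssh", "root@" + node, "puppet agent --test"])

i.e. something like "python repuppetize.py 0" in the first window and "python repuppetize.py 1" in the second (the script name is made up).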
Comment 3•14 years ago (Assignee)
5. Done.
6. Not starting yet... I want to double-check at least the edge cases, to make sure they look like they're going to work properly and picked up the right puppet configs.
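A minimal sketch of that kind of check, comparing what DNS says against what each node reports as its own hostname, since the original symptom was an off-by-one between the two (domain suffix and node range are assumptions):

    #!/usr/bin/env python
    # Sanity-check sketch: for each expected node name, confirm DNS resolves
    # it and that the box itself reports the same FQDN.
    import socket
    import subprocess

    DOMAIN = "seamicro.phx1.mozilla.com"   # assumed FQDN suffix

    for n in range(0, 30):
        dns_name = "node%d.%s" % (n, DOMAIN)
        try:
            socket.gethostbyname(dns_name)            # does the record exist?
        except socket.error as e:
            print("%s: no DNS record (%s)" % (dns_name, e))
            continue
        # Ask the box what it thinks it is called.
        out = subprocess.check_output(["ssh", "root@" + dns_name, "hostname -f"])
        reported = out.strip().decode()
        status = "OK" if reported == dns_name else "MISMATCH"
        print("%s: node reports %r -> %s" % (dns_name, reported, status))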
Comment 4•14 years ago
(In reply to comment #0)
> pro tip: puppetizing these takes forever. If you do them in a serial loop
> make sure and screen your session.
I just puppetized node81 and 82 yesterday... and had no problems (it took the "usual" amount of time). Could it be that these nodes have storage-related issues, or something else that's making them slow?
Comment 5•14 years ago (Assignee)
It actually wasn't too bad... I think it just seems bad if you do a bunch in serial. Of course, they're just little Atom CPUs, so they should be a bit slower than a good blade would be.
Anyway, found another problem with the engagement cluster. For some reason,
engagement1-20 line up with node10-29.seamicro
engagement21 is a blade
engagement22-30 line up with node210-218.seamicro
engagement31-41 line up with node229-239.seamicro
There was a gap: node219-228 are assigned to the Engagement cluster in puppet, but did not have an 'engagementXX' CNAME. I wedged those in, which shifted engagement31-41 down a bunch. I updated commanderconfig.py on ip-admin02.phx, and fixed root's known_hosts file to have all the proper keys for each host.
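For reference, the kind of quick audit used to eyeball the mapping, as a sketch (the domain suffix and the 1-41 range are assumptions based on this comment):

    #!/usr/bin/env python
    # Audit sketch for the engagement naming: resolve each engagementNN CNAME
    # and print its canonical node name, so the full mapping (including the
    # wedged-in node219-228 block) can be checked in one pass.
    import socket

    DOMAIN = "phx1.mozilla.com"            # assumed suffix

    for i in range(1, 42):
        name = "engagement%d.%s" % (i, DOMAIN)
        try:
            canonical, _, _ = socket.gethostbyname_ex(name)
            print("%-35s -> %s" % (name, canonical))
        except socket.error as e:
            print("%-35s -> lookup failed (%s)" % (name, e))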
6. Done as well.
Marking this as resolved... don't see any more strangeness with engagement naming.
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Updated•10 years ago
Product: mozilla.org → mozilla.org Graveyard