Closed Bug 597912 Opened 14 years ago Closed 13 years ago

build hosts/puppet should not use NFS

Categories: Infrastructure & Operations :: RelOps: General, task, P4
Platform: All / Other
Type: task
Tracking: Not tracked
Status: RESOLVED WORKSFORME
People: Reporter: mrz; Assignee: unassigned
Whiteboard: [puppet]

NFS doesn't scale the way releng needs it to and causes bootup problems if NFS is unavailable.

IT doesn't use NFS for puppet - releng should move away from this method.
Assignee: server-ops → nobody
Component: Server Operations → Release Engineering
QA Contact: mrz → release
Group: infra
Priority: -- → P4
Whiteboard: [puppet]
Also, it makes doing NFS across datacenters and such a pain in the wrong places (I saw another bug requesting this). We should just follow the IT model of checking things into svn (we can help create a sekrit releng svn tree): you push stuff in there, and the puppetmasters check it out from there.
Possible to get an ETA on this?
Severity: minor → critical
I don't know; I'm supposed to get right back on top of old bugs once I'm done with the Santa Clara stuff. Off to John for prioritization.
Assignee: nobody → joduinn
I was led to believe that we're using local storage for puppet files on all of the masters, with the exception that staging and production share a mount.  I was wrong (which explains why I had so much trouble with rsyncing!).

CURRENT CONFIG
--------------
All four masters mount /N from 10.2.71.136:/export/buildlogs/puppet-files (bm-sun-xf01, in MPT).  Files are served from:
 staging: /N/staging
 mpt: /N/production
 mv: /N/production
 scl: /builds/production
Surprisingly, I don't see any symlinks trying to confuse the issue, which is a first!

So this needs to be fixed.  We should get rid of NFS and the /N mountpoint entirely on these systems, in fact.

As we move more things to use modules, we'll have much more of our puppet stuff in hg, so we'll have that much less need to sling files between puppet servers - just packages, really.  And packages are a lot easier to keep track of, since they have long unique names.

fox2mike, does it make sense to put (fairly large) package files in svn?  If so, can we set that up?  I'll open a new bug for it on request.


(In reply to comment #0)
> NFS doesn't scale the way releng needs it to and causes bootup problems if NFS
> is unavailable.
> 
> IT doesn't use NFS for puppet - releng should move away from this method.

Totally agree - having NFS as single point of failure here is not good. Well spotted.


(In reply to comment #3)
> I don't know; I'm supposed to get right back on top of old bugs once I'm done
> with santa clara stuff. Off to John for prioritization.

Sorry for the delayed response; I missed this and found it in triage.


If fox2mike says it's OK for Dustin to store these packages in an svn/hg repo, let's set up that private repo, and then RelEng can rework things so that puppet does not need an NFS mount.
fox2mike, can you set that up for us?  It'd be best to have a generic repository, perhaps named 'build-secrets'.  Then we can use subdirectories of it for puppet, opsi, and whatever else we find a need for.
Assignee: joduinn → shyam
Can you file an IT bug stating what needs to be done and have it block this one? I'm not really sure what you want me to set up at this stage.

(In reply to comment #4)

> fox2mike, does it make sense to put (fairly large) package files in svn?  If
> so, can we set that up?  I'll open a new bug for it on request.

If by that you mean RPMs, no. We have other methods to distribute RPMs, like our own repositories. We should look into that.

Which is why I'd like another bug vs having this assigned to me. It'll be easier to discuss/lock down and decide what you guys need.
Assignee: shyam → nobody
I don't think it matters if MPT and staging use the NFS mount for their files. SCL and MV certainly shouldn't though.

From mv-p-p:
/dev/sdb1 on /N type ext3 (rw,noatime)


It looks like scl-p-p is using the NFS mount, though, and I don't know why.
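A quick way to settle "NFS or local disk?" on each master is to read the kernel's mount table rather than relying on memory. The sketch below is a hypothetical helper, not something used in this bug: it assumes the Linux /proc/mounts format (device, mount point, fstype, options, ...), and the function names are made up for illustration.

```python
# Hypothetical helper: report whether a mount point (default /N) is backed by NFS.
# Assumes Linux /proc/mounts format: "<device> <mount point> <fstype> <options> 0 0".


def mount_type(mounts_text, mount_point="/N"):
    """Return (device, fstype) for mount_point, or None if it is not mounted."""
    found = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        device, point, fstype = fields[0], fields[1], fields[2]
        if point == mount_point:
            # Later entries shadow earlier ones when mounts are stacked,
            # so keep the last match.
            found = (device, fstype)
    return found


def is_nfs(mounts_text, mount_point="/N"):
    """True if mount_point is mounted with an nfs* filesystem type (nfs, nfs4, ...)."""
    found = mount_type(mounts_text, mount_point)
    return found is not None and found[1].startswith("nfs")


if __name__ == "__main__":
    try:
        with open("/proc/mounts") as f:
            text = f.read()
    except OSError:
        text = ""  # not Linux, or /proc unavailable
    info = mount_type(text)
    if info is None:
        print("/N is not mounted")
    else:
        kind = "NFS" if is_nfs(text) else "local"
        print(f"/N is {kind}: device={info[0]} type={info[1]}")
```

Run on each of the four masters, a local ext3 mount like the one on mv-p-p would be reported as local, while the bm-sun-xf01 export would show up with an nfs filesystem type.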
Shouldn't everything in RelEng be consistent?
(In reply to comment #10)
> Shouldn't everything in RelEng be consistent?

YES, yes, 1,000x yes.

All four puppet masters should be configured identically, and I don't see any reason to use NFS on any of them, since staging and production will not use the same set of puppet files anyway.
Fine by me. I think it's more important that the filesystem paths are the same, and less so that the physical storage media are.
(In reply to comment #4)
> all four masters mount /N from 10.2.71.136:/export/buildlogs/puppet-files
> (bm-sun-xf01, in MPT).  Files are served from:
>  staging: /N/staging
>  mpt: /N/production
>  mv: /N/production
>  scl: /builds/production
> surprisingly, I don't see any symlinks trying to confuse the issue - that's a
> first!

This is incorrect right now - mv-production-puppet has /N mounted at /dev/sdb1 locally.  Maybe the NFS mount was accidentally put in place and is now removed?
(In reply to comment #13)
> (In reply to comment #4)
> > all four masters mount /N from 10.2.71.136:/export/buildlogs/puppet-files
> > (bm-sun-xf01, in MPT).  Files are served from:
> >  staging: /N/staging
> >  mpt: /N/production
> >  mv: /N/production
> >  scl: /builds/production
> > surprisingly, I don't see any symlinks trying to confuse the issue - that's a
> > first!
> 
> This is incorrect right now - mv-production-puppet has /N mounted at /dev/sdb1
> locally.  Maybe the NFS mount was accidentally put in place and is now removed?

Huh, weird. I have no idea. In any case, let's drop it wherever it still happens to be.
Assignee: nobody → server-ops-releng
Component: Release Engineering → Server Operations: RelEng
QA Contact: release → zandr
I don't think we're actually using NFS at this point - I think there was a mounting error.  Which is fun, since these boxes are all lovingly handcrafted.  I'm working on fixing that in bug 659005.

The new puppet infrastructure will need to solve this problem, but the background in this bug won't be necessary for that, so I'm resolving this as WORKSFORME.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → WORKSFORME
Component: Server Operations: RelEng → RelOps
Product: mozilla.org → Infrastructure & Operations