Test and implement AUS syncing between SJC and PHX

RESOLVED FIXED

Status

Product: mozilla.org Graveyard
Component: Server Operations
Status: RESOLVED FIXED
Reported: 7 years ago
Modified: 3 years ago

People

(Reporter: cshields, Assigned: aravind)

(Reporter)

Description

7 years ago
Re: bug 596391. Currently Nick's scripts sync the data between dm-ausstage01 and dp-ausstage01.  We need to decide on a method of getting this data onto NFS rather than local disk.

ETA is Mon/Tue (12/13 or 12/14) during the all-hands, when Aravind will be around.  Nick is out on paternity leave from 12/15.
(Reporter)

Updated

7 years ago
Assignee: cshields → aravind
(Assignee)

Comment 1

7 years ago
Went through bug 596391 and talked to bhearsum; here is what I plan on doing.

This is mostly for nthomas to comment on.

1) comment out cron job under the ffxbld account on dm-ausstage01
rsync -a --delete /opt/aus2/incoming/2/ dp-ausstage01:/opt/aus2/incoming/2/

2) sync /vol/aus2/ from netapp-b (in mpt) to the same volume on the phx netapp.

3) wipe out /opt/aus2 on dp-ausstage01 and mount the synced phx netapp volume there.

4) set up ssh keys on dp-ausstage01 (for cltbld and ffxbld).

Is there anything else you can think of that needs to be done?

Once all that's done, I'd uncomment the cron job in step 1.

Comment 2

I had written this before comment #1 was made, but let's post it anyway for background. There are three ways data will be pushed onto that location:

1) cron job every 10 minutes updating /opt/aus2/incoming/2/Firefox/ (nightly snippets)
2) cron job every day checking the content of /opt/aus2/incoming/3/ (release snippets); starts at midnight but can take up to 100 mins [it makes a tarball in MPT first and that's gated on NFS speed there. And that's with local disk; the impact of the netapp move is TBD]
3) pushing new snippets from releases - on demand from release drivers

Until we're serving out of PHX we can ignore 1) and 2) to make NFS changes. Just let RelEng know that we will get cron mail about failures, or we can disable them until you're done.

To get status on 3) you'd have to ask RelEng. 4.0b8 is due to start any time now. You should be able to dive into the 'Build' links at https://wiki.mozilla.org/Releases to figure out who's on the hook for any releases in progress.

Comment 3

I'm a little short of time before heading out on paternity leave, so verification on our side might be fun.

(In reply to comment #1)
> 1) comment out cron job under the ffxbld account on dm-ausstage01
> rsync -a --delete /opt/aus2/incoming/2/ dp-ausstage01:/opt/aus2/incoming/2/

I was going to deploy the other changes very shortly but could do that later in the day.
 
> 2) sync /vol/aus2/ from netapp-b (in mpt) the same volume on the phx netapp.
> 
> 3) wipe out /opt/aus2 on dp-ausstage01 and mount the synced phx netapp volume
> there.

TBH, I'd just mount the PHX netapp on dp-ausstage01 and rsync from there, then swap them over. It'll be a bunch faster, and if I deploy my changes after you're done I can take care of making sure they're in sync.

> 4) setup ssh keys on dp-ausstage01 (for cltbld and ffxbld).

Don't need to do this, ffxbld is already set up and cltbld needs a stake through the heart.

Comment 4

(In reply to comment #3)
> TBH, I'd just mount the PHX netapp on dp-ausstage01 and rsync from there, 

Sorry, this is unclear. I mean populate the PHX netapp from the snippets already in dp-ausstage01:/opt/aus2/incoming.
(Assignee)

Comment 5

7 years ago
As I am digging into this more, I noticed that there might be two cron jobs updating this nfs share in phx.  We already have a cron job syncing all of aus2/incoming from mpt to the nfs volume in phx.

I commented out this sync job and mounted this volume on dp-ausstage01 for you guys to verify (/opt/aus2_nfs).  I plan on leaving this sync commented out.

Since this is already being synced up (from our cron job), I don't think we need to sync anything else here.  Once you guys verify this, I will move that mount point from /opt/aus2_nfs to /opt/aus2.  We might need to adjust uid numbers for the ffxbld user, but other than that it should be good.

Comment 6

I'm doing a dummy rsync between /opt/aus2 and /opt/aus2_nfs, and it's taking significantly longer than the equivalent operations on the local disk. The scripts in bug 596391 are based on the assertion 'nfs is fast in PHX', which doesn't seem to be the case.
(Assignee)

Comment 7

7 years ago
(In reply to comment #6)
> I'm doing a dummy rsync between /opt/aus2 and /opt/aus2_nfs, and it's taking
> significantly longer than the equivalent operations on the local disk. The
> scripts in bug 596391 are based on the assertion 'nfs is fast in PHX', which
> doesn't seem to be the case.

It's *faster* than nfs in mpt.  It will still be slower than local disk.

Comment 8

OK, that might have been mostly my bad.

rsync can't find any differences with
  rsync -ni -av0 --no-o --no-g --delete /opt/{aus2/,aus2_nfs/aus2/}incoming/
(i.e. an itemized dry run that ignores owner and group)

So go ahead and swap them over.
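
For anyone repeating this check, here is a minimal sketch of the same dry run wrapped in a function that reports whether two trees match. The function name `trees_in_sync` is hypothetical and not part of the scripts in this bug. It drops -v and -0 from the command above so that rsync prints only itemized changes, which lets empty output stand for "no differences".

```shell
#!/bin/sh
# trees_in_sync SRC DST (hypothetical helper, not from this bug's scripts)
# Returns 0 if an rsync dry run finds nothing to copy, update or delete,
# 1 if it finds differences, 2 if rsync itself fails.
# Pass both paths with a trailing slash to compare directory contents.
trees_in_sync() {
    src=$1
    dst=$2
    # -n: dry run, -i: itemize changes, -a: archive mode,
    # --no-o/--no-g: ignore owner and group,
    # --delete: also report files that exist only in DST.
    changes=$(rsync -ni -a --no-o --no-g --delete "$src" "$dst") || return 2
    [ -z "$changes" ]
}

# Example with the paths from this comment:
#   trees_in_sync /opt/aus2/incoming/ /opt/aus2_nfs/aus2/incoming/ && echo "in sync"
```

Without -v, rsync prints one line per changed item and nothing else, so any output at all means the trees differ.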

Comment 9

Only cron job 1) from comment #2 is currently in place.
(Assignee)

Comment 10

7 years ago
Okay, /opt/aus2 is now an nfs mount.

Comment 11

Could you organise it so /opt/aus2 contains the current contents of /opt/aus2/aus2 ?
(Assignee)

Comment 12

7 years ago
I am copying the original directory into /opt/aus2/aus2_orig.

I also fixed ffxbld uid/gid to match those on dm-ausstage01.  Please let me know if there is anything else for me to do here.

Comment 13

Hmm, I probably wasn't very clear. Here is what I would like to see:

1) The NFS mount at dp-ausstage01:/opt/ instead of /opt/aus2, so that we don't have dp-ausstage01:/opt/aus2/aus2/ on one host and dm-ausstage01:/opt/aus2/ on the other
2) The uid and gid for ffxbld working properly. Right now on dp-ausstage01 I see
$ whoami; pwd; ls -la
ffxbld
/opt/aus2/aus2
total 32
drwxr-xr-x 8 root root 4096 Jul  3  2009 .
drwxr-xr-x 7 root root 4096 Dec 14 18:00 ..
drwxr-xr-x 4 root root 4096 Jan 19  2010 app
drwxr-xr-x 3  593 2004 4096 Aug 27  2005 build
drwxr-xr-x 3  593 2004 4096 Aug 27  2005 build.old
drwxr-xr-x 3  593 2004 4096 Aug 27  2005 build-with-nightly-l10n
lrwxrwxrwx 1 root root   23 Mar 23  2010 config.php -> app/inc/config-dist.php
drwxr-xr-x 6  593 2004 4096 Sep 15 14:56 incoming
drwxr-xr-x 4  593 2004 4096 Apr  4  2007 snippets
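
In the listing above, several directories show a numeric owner (593) and group (2004). That is how ls reports ids that have no matching user or group entry on the host. A minimal sketch for finding every such entry under a directory follows; the function name `list_unmapped` is hypothetical.

```shell
#!/bin/sh
# list_unmapped DIR (hypothetical helper)
# Print every entry under DIR whose owning uid or gid has no entry
# in the local user or group database.
list_unmapped() {
    find "$1" \( -nouser -o -nogroup \) -print
}

# Example, using the path from the listing above:
#   list_unmapped /opt/aus2/aus2
```

Empty output would mean every owner and group id under the directory maps to a local account.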

Comment 14

Please also install cvs on dp-ausstage01.
(Assignee)

Comment 15

7 years ago
cvs is installed.

The volume is now mounted on /opt.

uid 593 is the cltbld user, which I didn't recreate because you said you want it dead.  I have now added it to the system, and the owners look similar to what they are on dm-ausstage01.  I added the keys under cltbld on dm-ausstage01 to cltbld on dp-ausstage01.

Comment 16

Thanks for adding cvs and fixing the mount, but we're working against each other here. I had dp-ausstage01 and /opt/aus2/ set up just as I wanted them before we swapped to NFS. In particular:
*) only a ffxbld account on dp-ausstage01, using a new key that hasn't leaked all over
*) unneeded crap from MPT deleted from /opt/aus2
*) remaining files in /opt/aus2 owned by ffxbld:ffxbld

At this point, could you please either
*) grant ffxbld sudo on dp-ausstage01
OR do all of the following:
*) remove cltbld user
*) move /opt/aus2 to /opt/aus2_mpt
*) move /opt/aus2_orig to /opt/aus2
*) 'chown -R ffxbld:ffxbld /opt/aus2'

I need to get this finished up ASAP.
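
The second option above can be sketched as a short command sequence. The helper below is hypothetical (the name `swap_plan` is an assumption). It only prints the commands so they can be reviewed before anything is run as root, and it assumes paths without spaces or shell metacharacters. `userdel` is the usual Linux tool for removing an account, but whether it is the right one for this host is also an assumption.

```shell
#!/bin/sh
# swap_plan BASE OWNER (hypothetical helper)
# Print, without running, the commands for the second option:
# BASE is the parent directory (/opt in this bug), OWNER is user:group.
swap_plan() {
    base=$1
    owner=$2
    printf 'userdel cltbld\n'
    printf 'mv %s/aus2 %s/aus2_mpt\n' "$base" "$base"
    printf 'mv %s/aus2_orig %s/aus2\n' "$base" "$base"
    printf 'chown -R %s %s/aus2\n' "$owner" "$base"
}

# Review the plan first, then pipe it to a root shell once it looks right:
#   swap_plan /opt ffxbld:ffxbld
#   swap_plan /opt ffxbld:ffxbld | sudo sh
```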
(Assignee)

Comment 17

7 years ago
ffxbld now has full sudo, let me know once you are done with it.
(Assignee)

Comment 18

7 years ago
nthomas: you have sudo now, but /opt/aus2 is now the nfs mount, so before you clear stuff from it, please consider whether we need to.  If it's just extra data, the clean-up can wait until you are not in a rush to get done.

But, I will leave that to you.

Comment 19

Too late! I removed aus2-dev/ & aus-prep/ from /opt, and belatedly realized aus-prep/ wasn't something synced from MPT. Sorry about that. Your sync from MPT is still there at /opt/aus2_mpt.

I'll do some testing before we yank the sudo.

Comment 20

You can remove the sudo access again, I'm all done with tweaks.
(Assignee)

Comment 21

7 years ago
sudo access is gone.  Please re-open the bug if there is anything else remaining here.
Status: NEW → RESOLVED
Last Resolved: 7 years ago
Resolution: --- → FIXED
Product: mozilla.org → mozilla.org Graveyard