When: next TCW, roughly 15 minutes in duration
System(s) affected: builds / treeherder / partner builds
Notifs: usual TCW comms
Point: cknowles, selenamarie-or-delegate
Plan: To help with the evacuation of product delivery, we are going to unmount the existing pvtbuilds NFS mount (living on soon-to-be-off-warranty hardware), and replace it with a same-named, smaller, empty pvtbuilds mount on supported hardware. This will buy time for legacy code to be migrated, past our warranty deadline.
"Unmount from all boxes, switch the volume on the filer, remount" covers the window; rollback is the same in the other direction. The original volume will be kept, offline but recoverable, for ~1 week before being deleted.
Reviewed 11/18 and scheduled for 11/21/2015 TCW
Change Request: ? → approved
Work completed on schedule - :selenamarie confirmed that things look good post the remount - closing out.
Assignee: server-ops-webops → cknowles
Status: NEW → RESOLVED
Last Resolved: 3 years ago
Resolution: --- → FIXED
Just wondering if the remount was mounted with the same permissions. It seems we have quite a few errors like this:

Return code: 1
Failed to log stats. Exception = [Errno 185090050] _ssl.c:340: error:0B084002:x509 certificate routines:X509_load_cert_crl_file:system lib
Return code: 1
rsync error: error in file IO (code 11) at main.c(587) [Receiver=3.0.9]
rsync: connection unexpectedly closed (9 bytes received so far) [sender]
rsync error: error in rsync protocol data stream (code 12) at io.c(600) [sender=3.0.6]
Return code: 12
Unable to rsync /builds/slave/b2g_b2g-in_nexus-4_dep-0000000/build/upload to pvtbuilds.pvt.build.mozilla.org:/pvt/mozilla.org/b2gotoro/tinderbox-builds/b2g-inbound-nexus-4/20151121143005!
Failed to upload /builds/slave/b2g_b2g-in_nexus-4_dep-0000000/build/upload to firstname.lastname@example.org:/pvt/mozilla.org/b2gotoro/tinderbox-builds/b2g-inbound-nexus-4/20151121143005!
http://ftp.mozilla.org/pub/mozilla.org/b2g/tinderbox-builds/b2g-inbound-nexus-4/1448145005/b2g_b2g-inbound_nexus-4_dep-bm73-build1-build139.txt.gz

or

Cron <b2gbld@upload-cron> nice -n 19 find /mnt/pvt_builds/pvt/mozilla.org/b2gotoro/tinderbox-builds -mindepth 2 -maxdepth 2 -not -wholename '*/mozilla-b2g30_v1_4-hamachi*' -not -wholename '*/*-flame*' -type d -mtime +20 -print0 | xargs -0 rm -rf
Cron Daemon <email@example.com> 7:00 PM (3 hours ago) to release
rm: cannot remove `/mnt/pvt_builds/pvt/mozilla.org/b2gotoro/tinderbox-builds/b2g-inbound-nexus-5-l-eng/20151031143003/logs/localconfig.json': Permission denied
rm: cannot remove `/mnt/pvt_builds/pvt/mozilla.org/b2gotoro/tinderbox-builds/b2g-inbound-nexus-5-l-eng/20151031143003/logs/log_critical.log': Permission denied
rm: cannot remove `/mnt/pvt_builds/pvt/mozilla.org/b2gotoro/tinderbox-builds/b2g-inbound-nexus-5-l-eng/20151031143003/logs/log_error.log': Permission denied
rm: cannot remove `/mnt/pvt_builds/pvt/mozilla.org/b2gotoro/tinderbox-builds/b2g-inbound-nexus-5-l-eng/20151031143003/logs/log_fatal.log': Permission denied
rm: cannot remove `/mnt/pvt_builds/pvt/mozilla.org/b2gotoro/tinderbox-builds/b2g-inbound-nexus-5-l-eng/20151031143003/logs/log_info.log': Permission denied
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
The vol is exported from the filer with the same filer perms, and mounted with the same client perms. However, it looks like the data copied over did not retain the perms of the original:

A temporary / read-only copy of the old volume:
[firstname.lastname@example.org ~]# ls -l /tmp/qq/pvt/mozilla.org/b2gotoro/tinderbox-builds/b2g-inbound-nexus-5-l-eng/20151031143003/logs/localconfig.json
-rw-rw-r-- 1 b2gbld b2gbld 5928 Oct 31 23:11 /tmp/qq/pvt/mozilla.org/b2gotoro/tinderbox-builds/b2g-inbound-nexus-5-l-eng/20151031143003/logs/localconfig.json

The new prod volume:
[email@example.com ~]# ls -l /mnt/pvt_builds/pvt/mozilla.org/b2gotoro/tinderbox-builds/b2g-inbound-nexus-5-l-eng/20151031143003/logs/localconfig.json
-rw-rw-r-- 1 root root 5928 Oct 31 23:11 /mnt/pvt_builds/pvt/mozilla.org/b2gotoro/tinderbox-builds/b2g-inbound-nexus-5-l-eng/20151031143003/logs/localconfig.json

I can't change these (well, I COULD but I don't know what I'm doing there). Basically, you probably have some mass chowns needed.
How was the data copied that lost the perms in the first place? If this is causing errors, can we match the file permissions from the old volume (is that the desired end state here)? Either by using rsync (if it's some known exact subset), or by using a script that looks at the files in the new mount point and matches the perms from the old one?
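The script option raised above could look roughly like the following. This is only a sketch, not something that was run for this bug: the function names `find_mismatches` and `apply_fixes` are made up for illustration, and it assumes the read-only copy of the old volume and the new volume are both mounted on the same host with matching relative paths (as in the `/tmp/qq` versus `/mnt/pvt_builds` listing earlier in this thread). It compares owner, group, and mode bits, and defaults to a dry run. The two root directories themselves are not compared, only the entries beneath them.

```python
import os
import stat


def find_mismatches(old_root, new_root):
    """Yield (relative_path, old_uid, old_gid, old_mode) for every entry under
    new_root whose owner, group, or mode bits differ from the same relative
    path under old_root. Entries with no counterpart in old_root are skipped."""
    for dirpath, dirnames, filenames in os.walk(new_root):
        for name in dirnames + filenames:
            new_path = os.path.join(dirpath, name)
            rel = os.path.relpath(new_path, new_root)
            try:
                old = os.lstat(os.path.join(old_root, rel))
            except (FileNotFoundError, NotADirectoryError):
                continue  # created after the copy; nothing to compare against
            new = os.lstat(new_path)
            old_mode = stat.S_IMODE(old.st_mode)
            new_mode = stat.S_IMODE(new.st_mode)
            if (old.st_uid, old.st_gid, old_mode) != (new.st_uid, new.st_gid, new_mode):
                yield rel, old.st_uid, old.st_gid, old_mode


def apply_fixes(new_root, mismatches, dry_run=True):
    """Apply (or, by default, only print) the recorded owner, group, and mode."""
    for rel, uid, gid, mode in mismatches:
        path = os.path.join(new_root, rel)
        if dry_run:
            print(f"would set {path} to {uid}:{gid} mode {mode:o}")
            continue
        os.lchown(path, uid, gid)     # changing the owner requires root
        if not os.path.islink(path):  # os.chmod would follow a symlink
            os.chmod(path, mode)      # done after chown, which may clear setuid/setgid


# Hypothetical invocation, using the two paths quoted earlier in this thread:
# apply_fixes("/mnt/pvt_builds", find_mismatches("/tmp/qq", "/mnt/pvt_builds"))
```

Numeric uid/gid values are copied as-is, so this only makes sense when both trees are inspected from a host that resolves them the same way. For a known subset of directories, a plain recursive chown is simpler.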
Hi, this now impacts the mozilla-central, mozilla-inbound and b2g-inbound trees, at least for the device builds, for example https://treeherder.mozilla.org/logviewer.html#?job_id=3415635&repo=b2g-inbound

23:49:28 INFO - rsync: mkdir "/pvt/mozilla.org/b2gotoro/tinderbox-builds/b2g-inbound-flame-kk-eng/20151122215327" failed: Permission denied (13)

Raising this as a blocker, since this is a perma-failure on the affected trees.
Severity: normal → blocker
closed affected trees due to mass perma failures of the affected buildbot device builds
I'd like to suggest we run these commands to get the tree re-opened:

chown -R b2gbld:b2gbld /mnt/pvt_builds/pvt/mozilla.org/b2gotoro/tinderbox-builds
chown -R b2gbld:b2gbld /mnt/pvt_builds/pvt/mozilla.org/b2gotoro/nightly

There may be other issues but that should cover the bulk of the immediate problem. It's simply setting the group on the top level of the given directories, then fixing root:root ownership of everything within that.
That's on pvtbuilds2.dmz.scl3.
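Before or after a recursive chown like the one suggested above, a read-only audit can show how much of a tree has the wrong owner. The sketch below is illustrative only: `entries_not_owned_by` is a hypothetical helper, not part of any existing tooling, and it takes numeric ids, so resolving the `b2gbld` user and group with the standard `pwd` and `grp` modules is left to the caller and assumes those names exist on the host.

```python
import os


def entries_not_owned_by(root, uid, gid):
    """Return the paths under root (root itself included) whose owner or group
    differs from the given numeric uid/gid. Symlinks are inspected, not followed."""
    paths = [root]
    for dirpath, dirnames, filenames in os.walk(root):
        paths.extend(os.path.join(dirpath, name) for name in dirnames + filenames)

    offenders = []
    for path in paths:
        st = os.lstat(path)
        if (st.st_uid, st.st_gid) != (uid, gid):
            offenders.append(path)
    return offenders


# Hypothetical usage on the upload host (assumes a b2gbld user and group exist):
# import grp, pwd
# uid = pwd.getpwnam("b2gbld").pw_uid
# gid = grp.getgrnam("b2gbld").gr_gid
# bad = entries_not_owned_by("/mnt/pvt_builds/pvt/mozilla.org/b2gotoro/nightly", uid, gid)
# print(len(bad), "entries not owned by b2gbld:b2gbld")
```

An empty result after the chown would suggest nothing under that directory is still owned by root:root. It says nothing about sibling directories that were not part of the chown.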
============================= old permissions before change ===============================
[firstname.lastname@example.org ~]# ls -ld /mnt/pvt_builds/pvt/mozilla.org/b2gotoro/tinderbox-builds
drwxr-s--- 38 b2gbld root 4096 Nov 22 00:48 /mnt/pvt_builds/pvt/mozilla.org/b2gotoro/tinderbox-builds
[email@example.com ~]# ls -ld /mnt/pvt_builds/pvt/mozilla.org/b2gotoro/nightly
drwxr-s--- 20 b2gbld root 4096 Nov 20 19:28 /mnt/pvt_builds/pvt/mozilla.org/b2gotoro/nightly
[firstname.lastname@example.org ~]# chown -R b2gbld:b2gbld /mnt/pvt_builds/pvt/mozilla.org/b2gotoro/tinderbox-builds
[email@example.com ~]# chown -R b2gbld:b2gbld /mnt/pvt_builds/pvt/mozilla.org/b2gotoro/nightly
==================== Permissions after change ========================
[firstname.lastname@example.org ~]# ls -ld /mnt/pvt_builds/pvt/mozilla.org/b2gotoro/tinderbox-builds
drwxr-s--- 38 b2gbld b2gbld 4096 Nov 22 00:48 /mnt/pvt_builds/pvt/mozilla.org/b2gotoro/tinderbox-builds
[email@example.com ~]# ls -ld /mnt/pvt_builds/pvt/mozilla.org/b2gotoro/nightly
drwxr-s--- 20 b2gbld b2gbld 4096 Nov 20 19:28 /mnt/pvt_builds/pvt/mozilla.org/b2gotoro/nightly
[firstname.lastname@example.org ~]#
reopened the trees for now and retriggered builds
Alright - Let me know if you need anything from the storage side.
[15:41:10] <gcox> Tomcat|Sheriffduty: Heya, bug 1223956 was marked a blocker overnight. Is it still blocking, did the chowns fix it, or are we still waiting to learn more?
[15:41:41] <Tomcat|Sheriffduty> gcox: oh its ok now, the fix fixed this
[15:42:13] <Tomcat|Sheriffduty> and trees are now open again
Severity: blocker → normal
Status: REOPENED → RESOLVED
Resolution: --- → FIXED
34 automation job failures were associated with this bug yesterday.

Repository breakdown:
* b2g-inbound: 22
* mozilla-inbound: 6
* mozilla-central: 6

Platform breakdown:
* b2g-device-image: 34

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1223956&startday=2015-11-23&endday=2015-11-23&tree=all
36 automation job failures were associated with this bug in the last 7 days.

Repository breakdown:
* b2g-inbound: 25
* mozilla-central: 6
* mozilla-inbound: 5

Platform breakdown:
* b2g-device-image: 36

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1223956&startday=2015-11-23&endday=2015-11-29&tree=all
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard