1010327 - Permission to delete from /mnt/socorro/symbols_upload/ on stage

Reporter

Description

•

11 years ago

When the SymbolsUnpackApp crontabber app runs on stage, it reads from /mnt/socorro/symbols_upload/ and, for every .zip (or .tar, .tgz, etc.) it finds, unpacks it into /mnt/socorro/symbols and then DELETES the archive file. It appears it doesn't have permission to do so here. The reason for the deletion is so it knows what has been unpacked already and doesn't need to unpack it again. See attached URL.
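For illustration, the unpack-then-delete behaviour described above can be sketched roughly as follows. This is a minimal sketch, not the actual SymbolsUnpackApp code: the function name `unpack_and_delete` and its arguments are hypothetical, and it leans on the standard library's `shutil.unpack_archive` to recognise archive types by file extension.

```python
import os
import shutil


def unpack_and_delete(upload_dir, dest_dir):
    """Unpack every archive under upload_dir into dest_dir, then delete it.

    Hypothetical helper for illustration only; not Socorro's implementation.
    Returns the list of archive paths that were unpacked and removed.
    """
    processed = []
    for root, _dirs, files in os.walk(upload_dir):
        for name in sorted(files):
            path = os.path.join(root, name)
            try:
                # Handles .zip, .tar, .tgz and friends based on the extension.
                shutil.unpack_archive(path, dest_dir)
            except shutil.ReadError:
                # Not a recognised archive; leave it alone.
                continue
            # Deleting the archive is how "already unpacked" is recorded.
            # It needs write permission on the containing directory,
            # which is the step that fails in this bug.
            os.remove(path)
            processed.append(path)
    return processed
```

Note that `os.remove` depends on the permissions of the directory holding the archive rather than on the archive itself, which is why the ownership and mode of symbols_upload/ matter here.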

Daniel Maher [:phrawzty]

Assignee

Comment 1

•

11 years ago

https://errormill.mozilla.org/webtools/socorro-stage/group/168826/

Peter Bengtsson [:peterbe]

Reporter

Comment 2

•

11 years ago

Ha! Just what I've always suspected. People miss the URL attribute on the bug data. I'll go back to including the URL in the bug description instead.

Daniel Maher [:phrawzty]

Assignee

Comment 3

•

11 years ago

The problem is that the symbols_upload directory needs to be writable by both the apache and socorro users. I've set the ownership of that directory to "apache.socorro" and ensured mode 2775, which would normally enforce the group ownership (setgid), thus allowing both users to r/w as expected. Unfortunately, the setgid bit appears to be ignored when the directory is used as an NFS mountpoint. I added 'suid' to the list of mount options (in Puppet), but that didn't appear to have any effect. WIP.
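As a quick reference for what mode 2775 means, here is a small Python sketch that decodes a directory mode into its ls-style string and reports whether the setgid and group-write bits are set. The helper name `describe_dir_mode` is made up for this example; it only interprets the mode number and does not touch the filesystem.

```python
import stat


def describe_dir_mode(mode):
    """Return (ls-style string, setgid set?, group-writable?) for a directory mode."""
    full = stat.S_IFDIR | mode  # mark it as a directory so filemode() prints 'd'
    return (
        stat.filemode(full),
        bool(mode & stat.S_ISGID),
        bool(mode & stat.S_IWGRP),
    )


print(describe_dir_mode(0o2775))  # the mode used in this bug
print(describe_dir_mode(0o0755))  # a typical default, for comparison
```

The leading 2 is the setgid bit: on a local filesystem it makes new entries in the directory take the directory's group instead of the creating user's primary group.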

Status: NEW → ASSIGNED

Daniel Maher [:phrawzty]

Assignee

Comment 4

•

11 years ago

:gcox, the tl;dr is that I'd like both the "apache" and "socorro" users to be able to r/w, without resorting to something horrible like adding socorro to the apache group (or vice versa). In a non-NFS-mount situation the solution would be to simply set the ownership as apache.socorro and to activate setgid to enforce the ownership and perms throughout. Is there a way to do this once the mountpoint is active? Put another way, is it possible for socorroadm.stage.private.phx1:/mnt/socorro/symbols_upload/ (mountpoint for 10.8.75.14:/symbols_upload/stage) to be owned by apache.socorro with perms 2775?

Peter Bengtsson [:peterbe]

Reporter

Comment 5

•

11 years ago

Still a problem. The new file I uploaded still errored https://errormill.mozilla.org/webtools/socorro-prod/group/168852/

Greg Cox [:gcox]

Comment 6

•

11 years ago

As a general rule, the set[ug]id is squashed off on the filer exports. I've enabled it back for 10.8.75.48 on stage, and tested that it did what I'd expect:

[root@socorroadm.stage.private.phx1 ~]# mkdir /mnt/socorro/symbols_upload/foo
[root@socorroadm.stage.private.phx1 ~]# chown apache.socorro /mnt/socorro/symbols_upload/foo
[root@socorroadm.stage.private.phx1 ~]# chmod 2775 /mnt/socorro/symbols_upload/foo
[root@socorroadm.stage.private.phx1 ~]# sudo -u apache touch /mnt/socorro/symbols_upload/foo/apa
[root@socorroadm.stage.private.phx1 ~]# sudo -u socorro touch /mnt/socorro/symbols_upload/foo/soc
[root@socorroadm.stage.private.phx1 ~]# ls -al /mnt/socorro/symbols_upload/foo
total 8
drwxrwsr-x 2 apache socorro 4096 May 15 11:34 .
drwxrwsr-x 4 socorro apache 4096 May 15 11:33 ..
-rw-r--r-- 1 apache socorro 0 May 15 11:34 apa
-rw-r--r-- 1 socorro socorro 0 May 15 11:34 soc

* If you want me to turn that up for the webapp nodes in stage, just say so.
* I assume you'll want it in the prod version of this volume, for parallelism. There are more nodes there (socorroadm.private and sp-admin01, as well as the webapp nodes), so, again, just ping me with what you'd like not-squashed.

Greg Cox [:gcox]

Comment 7

•

11 years ago

Per IRC, added the suid permissions on the filer for webheads and admin nodes, stage and prod.

Daniel Maher [:phrawzty]

Assignee

Comment 8

•

11 years ago

Great, thanks :gcox ! The webapp nodes in stage don't need it since nothing runs as user socorro on those nodes (at least, not for now :P ).

I have also now realised that extended attrs are not possible either, which means that I cannot guarantee permission inheritance within the tree via setfacl as previously planned (doh).

That said, there is still a solution: since the parent directory now honours the group ID (if not the permissions), if the interacting processes (the webapp on the webheads, and crontabber on admin) set g+rw, we should end up with the desired behaviour. To illustrate:

* On the webhead, Apache writes out files to symbols_upload/, and since the directory is setgid, they are owned by "apache.socorro".
* On the webhead, Apache sets g+rw (recursively) on the files it has written.
* Since symbols_upload/ is a mount, the ownership and permissions are the same on the admin node.
* On the admin node, Crontabber (user socorro) can read and write those files - and ultimately delete them.

This will require a small amount of additional functionality in the code.
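The "set g+rw (recursively)" step could look something like the following Python sketch, roughly equivalent to `chmod -R g+rw`. The function name `add_group_rw` is hypothetical, and this is only an assumption about how such a helper might be written, not the code that was eventually added to Socorro.

```python
import os
import stat


def add_group_rw(top):
    """Add group read/write to top and everything below it (like chmod -R g+rw)."""
    extra = stat.S_IRGRP | stat.S_IWGRP
    paths = [top]
    for root, dirs, files in os.walk(top):
        paths.extend(os.path.join(root, name) for name in dirs + files)
    for path in paths:
        # Keep the existing bits (including setgid) and only add g+rw.
        current = stat.S_IMODE(os.stat(path).st_mode)
        os.chmod(path, current | extra)
```

Only the file owner (or root) can change a file's mode, so in the scheme above this would have to run as the user that wrote the files, i.e. apache on the webheads.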

Daniel Maher [:phrawzty]

Assignee

Comment 9

•

11 years ago

Oops - my bad, the webapp nodes in stage *do* need it. Ignore that sentence. The rest of comment 8 is accurate afaik. ;)

Daniel Maher [:phrawzty]

Assignee

Comment 10

•

11 years ago

I will be doing further NFS ACL tests (see bug 1011497), but if that doesn't work, then we'll just have to ensure that the code pulls a "chmod -R g+rw" every time symbols_upload/ gets touched (jaaaaaank).

Daniel Maher [:phrawzty]

Assignee

Comment 11

•

11 years ago

21:21:45 <gcox> So, banged my head on the facl thing. If possible, it'd be good if we can get the OS upgraded on the stage box. I don't think that's the solution, but that a few things are behind would at least eliminate question marks. I'll keep on it.. but I think the NFS side may be a filer problem. It's just a REALLY weird problem.

I'd love to have you amplify this statement. :) In particular, what does "OS upgrade" mean exactly (just a yum update, or a dist upgrade, or something else?), and what problem do you expect said upgrade to solve ? (/me resists urge to needinfo ;) )

Daniel Maher [:phrawzty]

Assignee

Comment 12

•

11 years ago

Fwiw, nfs4_setfacl returns "Failed setxattr operation: Input/output error" every time I try to use it in symbols_upload, so I guess something is, indeed, amiss.

Greg Cox [:gcox]

Comment 13

•

11 years ago

> I'd love to have you amplify this statement. :) In particular, what does
> "OS upgrade" mean exactly (just a yum update, or a dist upgrade, or
> something else?), and what problem do you expect said upgrade to solve ?

Yum update was all I was thinking. There are bits of info that suggest rhel6.5 is better than earlier releases for doing nfsv4. If I need it I'll call for it. This was more "if you were thinking about doing one anyway, go for it."

> Fwiw, nfs4_setfacl returns "Failed setxattr operation: Input/output error"
> every time I try to use it in symbols_upload, so I guess something is,
> indeed, amiss.

Yeah. This was all guinea-pigging and not-ready-to-launch. I have seen this behave in not-your-case places (other clients, other exports on the filer) but have yet to fully bisect to figure out where the issue is.

Peter Bengtsson [:peterbe]

Reporter

Comment 14

•

11 years ago

phrawzty,

How about this for a curve-ball...

When I built it I made the webapp write the zips for a directory and nothing else. Then a cronjob unpacks those zip files and puts them on ted's symbol server.

How about we re-architect it significantly so that the webapp also deals with unpacking the zip file on-the-fly straight into the final destination.

It's just a suggestion. It'd mean the upload might be more fragile and slightly slower but at least it could potentially simplify the whole NFS story significantly.

What do you think?

Daniel Maher [:phrawzty]

Assignee

Comment 15

•

11 years ago

(In reply to Peter Bengtsson [:peterbe] from comment #14)
> phrawzty,
>
> How about this for a curve-ball...
>
> When I built it I made the webapp write the zips for a directory and nothing
> else. Then a cronjob unpacks those zip files and puts them on ted's symbol
> server.
>
> How about we re-architect it significantly so that the webapp also deals
> with unpacking the zip file on-the-fly straight into the final destination.
>
> It's just a suggestion. It'd mean the upload might be more fragile and
> slightly slower but at least it could potentially simplify the whole NFS
> story significantly.
>
> What do you think?

The webheads only have "symbols_upload" mounted on them. If the final destination is "symbols", then we'd need to have that mount set up on the webheads as well. Furthermore, if that mount is already on the webheads, and it's the final destination, then there would be no need for "symbols_upload".

I agree that this is a simpler (and therefore superior) approach, and I would be happy to see it implemented; however, it does not actually solve the underlying permissions problem. That issue still needs to be resolved either way.

Daniel Maher [:phrawzty]

Assignee

Updated

•

11 years ago

Depends on: 1015187

Laura Thomson :laura

Updated

•

11 years ago

Component: WebOps: Socorro → Infra

Product: Infrastructure & Operations → Socorro

QA Contact: nmaul

Daniel Maher [:phrawzty]

Assignee

Comment 16

•

11 years ago

Turns out that you can specify a default umask for a WSGI Daemon[1]. Combined with the setgid bit, this has the net effect of forcing newly created directories to inherit the permissions and ownership of the root dir - which, combined with setting newly uploaded files g+w in Django itself, would seem to meet all of our conditions for success. I will add this setting to Stage in Puppet now.

---

[dmaher@socorro2.stage.webapp.phx1 symbols_upload]$ pwd
/data/socorro/webapp-django/media/symbols_upload

[dmaher@socorro2.stage.webapp.phx1 symbols_upload]$ stat .
  File: `.'
  Size: 4096 Blocks: 8 IO Block: 65536 directory
Device: 14h/20d Inode: 75235392 Links: 3
Access: (2775/drwxrwsr-x) Uid: ( 48/ apache) Gid: (10000/ socorro)
Access: 2014-06-05 03:08:31.116646000 -0700
Modify: 2014-06-05 03:08:26.242544000 -0700
Change: 2014-06-05 03:08:26.242544000 -0700

[dmaher@socorro2.stage.webapp.phx1 symbols_upload]$ stat 2014
  File: `2014'
  Size: 4096 Blocks: 8 IO Block: 65536 directory
Device: 14h/20d Inode: 75235424 Links: 3
Access: (2775/drwxrwsr-x) Uid: ( 48/ apache) Gid: (10000/ socorro)
Access: 2014-06-05 03:08:34.844732000 -0700
Modify: 2014-06-05 03:08:26.242558000 -0700
Change: 2014-06-05 03:08:26.242558000 -0700

[dmaher@socorro2.stage.webapp.phx1 symbols_upload]$ stat 2014/06/05/cf9dbabf97487b2123085e94943fb3b0.zip
  File: `2014/06/05/cf9dbabf97487b2123085e94943fb3b0.zip'
  Size: 183 Blocks: 8 IO Block: 65536 regular file
Device: 14h/20d Inode: 75235427 Links: 1
Access: (0664/-rw-rw-r--) Uid: ( 48/ apache) Gid: (10000/ socorro)
Access: 2014-06-05 03:08:26.244552000 -0700
Modify: 2014-06-05 03:08:26.247550000 -0700
Change: 2014-06-05 03:08:26.247555000 -0700

---

[1] https://code.google.com/p/modwsgi/wiki/ConfigurationDirectives#WSGIDaemonProcess
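To show why the process umask matters here, the sketch below creates a directory and a file under two different umasks and reports the resulting permission bits. The comment above does not say which umask value was configured; 0002 is an assumption chosen because it leaves group write intact, and the helper name `create_with_umask` is made up for the example.

```python
import os
import tempfile


def create_with_umask(umask, base):
    """Create a directory and a file under base using the given umask.

    Returns their permission bits as (dir_mode, file_mode).
    """
    previous = os.umask(umask)
    try:
        new_dir = os.path.join(base, "2014")
        os.mkdir(new_dir)  # requests 0o777; the umask clears bits from that
        new_file = os.path.join(new_dir, "upload.zip")
        # Request 0o666; again the umask decides what survives.
        fd = os.open(new_file, os.O_WRONLY | os.O_CREAT, 0o666)
        os.close(fd)
    finally:
        os.umask(previous)  # always restore the caller's umask
    return (os.stat(new_dir).st_mode & 0o777, os.stat(new_file).st_mode & 0o777)


with tempfile.TemporaryDirectory() as base:
    print([oct(m) for m in create_with_umask(0o002, base)])  # assumed "fixed" umask
with tempfile.TemporaryDirectory() as base:
    print([oct(m) for m in create_with_umask(0o022, base)])  # common default
```

With a umask of 0022 the group write bit is stripped from everything the daemon creates, which would leave the socorro user unable to delete the archives even when the group ownership is right.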

Daniel Maher [:phrawzty]

Assignee

Comment 17

•

11 years ago

Note bug 1020901 which, while it is a separate issue, does ultimately impact this bug.

Daniel Maher [:phrawzty]

Assignee

Comment 18

•

11 years ago

Crontabber can successfully remove the archive as expected. Looks like we've got a viable fix (woo!).

2014-06-05 07:18:28,924 DEBUG - MainThread - about to run <class 'socorro.cron.jobs.symbolsunpack.SymbolsUnpackCronApp'>
2014-06-05 07:18:28,979 DEBUG - MainThread - successfully ran <class 'socorro.cron.jobs.symbolsunpack.SymbolsUnpackCronApp'> on 2014-06-05 14:18:28.950316+00:00

Will roll out to Prod once Puppet comes back from today's maintenance window.

Daniel Maher [:phrawzty]

Assignee

Comment 19

•

10 years ago

Woops, forgot to close this out. :)

Status: ASSIGNED → RESOLVED

Closed: 10 years ago

Resolution: --- → FIXED

Categories

(Socorro :: Infra, task)

Tracking

(Not tracked)

People

(Reporter: peterbe, Assigned: dmaher)


Security

(public)
