Bug 1010327: Permission to delete from /mnt/socorro/symbols_upload/ on stage
Status: RESOLVED FIXED
Opened: 11 years ago
Closed: 10 years ago
Categories: Socorro :: Infra, task
Tracking: Not tracked
People: Reporter: peterbe, Assigned: dmaher
When the SymbolsUnpackCronApp crontabber app runs on stage, it reads from /mnt/socorro/symbols_upload/. It unpacks every .zip (or .tar, .tgz, etc.) it finds into /mnt/socorro/symbols, and then DELETES the archive file. Deleting the archive is how it knows what has already been unpacked, so that it doesn't unpack it again.
It appears it doesn't have permission to do the delete here. See attached URL.
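The unpack-then-delete cycle described above can be sketched roughly as follows. This is not the actual SymbolsUnpackCronApp code; the function name `unpack_and_delete` and the suffix list are hypothetical. The sketch shows why the delete is the step that fails. Removing a file needs write and execute permission on the directory that contains it, not on the file itself.

```python
import os
import tarfile
import zipfile

# Hypothetical list of archive types, following the bug description.
ARCHIVE_SUFFIXES = (".zip", ".tar", ".tar.gz", ".tgz")


def unpack_and_delete(upload_dir, symbols_dir):
    """Unpack each archive found under upload_dir into symbols_dir, then
    delete it so that the next run does not unpack it again.

    Returns the archive paths (relative to upload_dir) that were processed.
    """
    processed = []
    for root, _dirs, files in os.walk(upload_dir):
        for name in sorted(files):
            if not name.endswith(ARCHIVE_SUFFIXES):
                continue
            path = os.path.join(root, name)
            if name.endswith(".zip"):
                with zipfile.ZipFile(path) as zf:
                    zf.extractall(symbols_dir)
            else:
                # tarfile.open() detects gzip compression automatically.
                with tarfile.open(path) as tf:
                    tf.extractall(symbols_dir)
            # os.remove() needs write+execute permission on `root`; without
            # it this raises PermissionError and the archive stays behind.
            os.remove(path)
            processed.append(os.path.relpath(path, upload_dir))
    return processed
```

Archives from untrusted uploaders deserve extra path checks before `extractall()`, especially tar files.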
Comment 1 (Assignee) • 11 years ago
Comment 2 (Reporter) • 11 years ago
Ha! Just what I've always suspected. People miss the URL attribute on the bug data. I'll go back to including the URL in the bug description instead.
Comment 3 (Assignee) • 11 years ago
The problem is that symbols_upload needs to be writable by both the apache and socorro users. I've set the ownership of that directory to "apache.socorro" and ensured mode 2775. The setgid bit (the leading 2) would normally make new files inherit the directory's group, thus allowing both users to r/w as expected.
Unfortunately, the setgid bit appears to be ignored when the directory is used as an NFS mountpoint. I added 'suid' to the list of mount options (in Puppet), but that didn't appear to have any effect.
WIP.
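One way to check whether the setgid bit is actually honoured on a given mount is to create a scratch file and compare its group with the directory's. This is a minimal sketch, assuming Python on the client; `setgid_is_honoured` is a hypothetical helper. The result is only meaningful when run as a user whose primary group differs from the directory's group, for example as apache against an apache.socorro directory. Otherwise the file gets that group regardless of setgid.

```python
import os
import tempfile


def setgid_is_honoured(directory):
    """Return True if a new file created in `directory` inherits the
    directory's group, which is what the setgid bit should guarantee."""
    dir_gid = os.stat(directory).st_gid
    fd, probe = tempfile.mkstemp(dir=directory, prefix=".setgid-probe-")
    try:
        os.close(fd)
        return os.stat(probe).st_gid == dir_gid
    finally:
        # Remove the scratch file whatever the outcome.
        os.remove(probe)
```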
Status: NEW → ASSIGNED
Comment 4 (Assignee) • 11 years ago
:gcox, the tl;dr is that I'd like both the "apache" and "socorro" users to be able to r/w, without resorting to something horrible like adding socorro to the apache group (or vice versa).
In a non-NFS-mount situation, the solution would be to simply set the ownership to apache.socorro and activate setgid to enforce the ownership and perms throughout. Is there a way to do this once the mountpoint is active? Put another way, is it possible for socorroadm.stage.private.phx1:/mnt/socorro/symbols_upload/ (mountpoint for 10.8.75.14:/symbols_upload/stage) to be owned by apache.socorro with perms 2775?
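Checking whether a mountpoint ended up as apache.socorro with mode 2775 amounts to comparing its stat() result with those values. This is a minimal sketch; `describe_dir` and `has_expected_perms` are hypothetical helpers. Owner or group names fall back to the numeric ID when the client cannot resolve them, which is plausible with NFS ID mapping.

```python
import grp
import os
import pwd
import stat


def describe_dir(path):
    """Return owner, group, octal mode and `ls -l` style mode for path."""
    st = os.stat(path)
    try:
        owner = pwd.getpwuid(st.st_uid).pw_name
    except KeyError:
        owner = str(st.st_uid)
    try:
        group = grp.getgrgid(st.st_gid).gr_name
    except KeyError:
        group = str(st.st_gid)
    return {
        "owner": owner,
        "group": group,
        "mode": format(stat.S_IMODE(st.st_mode), "04o"),  # e.g. "2775"
        "filemode": stat.filemode(st.st_mode),  # e.g. "drwxrwsr-x"
        "setgid": bool(st.st_mode & stat.S_ISGID),
    }


def has_expected_perms(path, owner="apache", group="socorro", mode="2775"):
    """True if path has exactly the given owner, group and octal mode."""
    info = describe_dir(path)
    return (info["owner"], info["group"], info["mode"]) == (owner, group, mode)
```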
Comment 5 (Reporter) • 11 years ago
Still a problem. The new file I uploaded still errored:
https://errormill.mozilla.org/webtools/socorro-prod/group/168852/
Comment 6 • 11 years ago
As a general rule, the set[ug]id bits are squashed on the filer exports.
I've re-enabled them for 10.8.75.48 on stage and tested that it does what I'd expect:
[root@socorroadm.stage.private.phx1 ~]# mkdir /mnt/socorro/symbols_upload/foo
[root@socorroadm.stage.private.phx1 ~]# chown apache.socorro /mnt/socorro/symbols_upload/foo
[root@socorroadm.stage.private.phx1 ~]# chmod 2775 /mnt/socorro/symbols_upload/foo
[root@socorroadm.stage.private.phx1 ~]# sudo -u apache touch /mnt/socorro/symbols_upload/foo/apa
[root@socorroadm.stage.private.phx1 ~]# sudo -u socorro touch /mnt/socorro/symbols_upload/foo/soc
[root@socorroadm.stage.private.phx1 ~]# ls -al /mnt/socorro/symbols_upload/foo
total 8
drwxrwsr-x 2 apache socorro 4096 May 15 11:34 .
drwxrwsr-x 4 socorro apache 4096 May 15 11:33 ..
-rw-r--r-- 1 apache socorro 0 May 15 11:34 apa
-rw-r--r-- 1 socorro socorro 0 May 15 11:34 soc
* If you want me to enable that for the webapp nodes in stage, just say so.
* I assume you'll want it on the prod version of this volume too, for parity. There are more nodes there (socorroadm.private and sp-admin01, as well as the webapp nodes), so, again, just ping me with what you'd like left unsquashed.
Comment 7 • 11 years ago
Per IRC, added the suid permissions on the filer for webheads and admin nodes, stage and prod.
Comment 8 (Assignee) • 11 years ago
Great, thanks :gcox! The webapp nodes in stage don't need it since nothing runs as user socorro on those nodes (at least, not for now :P ).
I have also now realised that extended attrs are not possible either, which means that I cannot guarantee permission inheritance within the tree via setfacl as previously planned (doh). That said, there is still a solution: since the parent directory now honours the group ID (if not the permissions), if the interacting processes (the webapp on the webheads, and crontabber on admin) set g+rw, we should end up with the desired behaviour.
To illustrate:
* On the webhead, Apache writes out files to symbols_upload/, and since the directory is setgid, they are owned by "apache.socorro".
* On the webhead, Apache sets g+rw (recursively) on the files it has written.
* Since symbols_upload/ is an NFS mount shared by both nodes, the ownership and permissions are the same on the admin node.
* On the admin node, Crontabber (user socorro) can read and write those files. Because the directories are group-writable too, it can ultimately delete them.
This will require a small amount of additional functionality in the code.
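The "set g+rw recursively" step could look roughly like this in the webapp. This is a sketch only: `add_group_rw` is a hypothetical helper, not existing Socorro code. It also adds g+x on directories, since the socorro user needs write and execute on a directory to delete entries from it. Only the owner (or root) may chmod a file, which is why this has to run as apache on the webhead.

```python
import os
import stat

FILE_BITS = stat.S_IRGRP | stat.S_IWGRP  # g+rw
DIR_BITS = stat.S_IRWXG  # g+rwx, so the group can list, create and delete


def _add_bits(path, bits):
    """OR `bits` into path's permissions, skipping symlinks and skipping
    the chmod() call when the bits are already set."""
    st = os.lstat(path)
    if stat.S_ISLNK(st.st_mode):
        return
    mode = stat.S_IMODE(st.st_mode)  # keeps setuid/setgid/sticky bits
    if mode & bits != bits:
        os.chmod(path, mode | bits)


def add_group_rw(root):
    """Rough equivalent of `chmod -R g+rw root`, plus g+x on directories."""
    _add_bits(root, DIR_BITS)
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames:
            _add_bits(os.path.join(dirpath, name), DIR_BITS)
        for name in filenames:
            _add_bits(os.path.join(dirpath, name), FILE_BITS)
```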
Comment 9 (Assignee) • 11 years ago
Oops - my bad, the webapp nodes in stage *do* need it. Ignore that sentence. The rest of comment 8 is accurate afaik. ;)
Comment 10 (Assignee) • 11 years ago
I will be doing further NFS ACL tests (see bug 1011497), but if that doesn't work, then we'll just have to ensure that the code pulls a "chmod -R g+rw" every time symbols_upload/ gets touched (jaaaaaank).
Comment 11 (Assignee) • 11 years ago
21:21:45 <gcox> So, banged my head on the facl thing. If possible, it'd be good if we can get the OS upgraded on the stage box. I don't think that's the solution, but that a few things are behind would at least eliminate question marks. I'll keep on it.. but I think the NFS side may be a filer problem. It's just a REALLY weird problem.
I'd love to have you amplify this statement. :) In particular, what does "OS upgrade" mean exactly (just a yum update, or a dist upgrade, or something else?), and what problem do you expect said upgrade to solve?
(/me resists urge to needinfo ;) )
Comment 12 (Assignee) • 11 years ago
Fwiw, nfs4_setfacl returns "Failed setxattr operation: Input/output error" every time I try to use it in symbols_upload, so I guess something is, indeed, amiss.
Comment 13 • 11 years ago
> I'd love to have you amplify this statement. :) In particular, what does
> "OS upgrade" mean exactly (just a yum update, or a dist upgrade, or
> something else?), and what problem do you expect said upgrade to solve?
Yum update was all I was thinking. There are bits of info that suggest RHEL 6.5 is better than earlier releases for NFSv4. If I need it, I'll call for it. This was more "if you were thinking about doing one anyway, go for it."
> Fwiw, nfs4_setfacl returns "Failed setxattr operation: Input/output error"
> every time I try to use it in symbols_upload, so I guess something is,
> indeed, amiss.
Yeah. This was all guinea-pigging and not ready to launch. I have seen the same behaviour in places unrelated to your case (other clients, other exports on the filer), but have yet to fully bisect it to figure out where the issue is.
Comment 14 (Reporter) • 11 years ago
phrawzty,
How about this for a curve-ball...
When I built it, I made the webapp write the zips to a directory and nothing else. Then a cronjob unpacks those zip files and puts them on ted's symbol server.
How about we re-architect it significantly so that the webapp also unpacks the zip file on the fly, straight into the final destination?
It's just a suggestion. It'd mean the upload might be more fragile and slightly slower but at least it could potentially simplify the whole NFS story significantly.
What do you think?
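The proposed alternative is for the webapp to unpack the upload straight into the final directory. It might look something like this. This is a hypothetical sketch: `unpack_upload` and its arguments are illustrative, not Socorro's API. It rejects the whole archive if any member would land outside the destination, before extracting anything.

```python
import os
import zipfile


def unpack_upload(fileobj, destination):
    """Extract an uploaded zip (a path or a seekable file object) directly
    into `destination` and return the names of the extracted files."""
    dest = os.path.realpath(destination)
    with zipfile.ZipFile(fileobj) as zf:
        members = zf.infolist()
        # Validate every member first so a bad archive writes nothing.
        for member in members:
            target = os.path.realpath(os.path.join(dest, member.filename))
            if os.path.commonpath([dest, target]) != dest:
                raise ValueError("member escapes destination: %r" % member.filename)
        zf.extractall(dest, members)
    return [m.filename for m in members if not m.is_dir()]
```

Validating before extracting keeps a rejected upload from leaving partial output in the symbols tree. The trade-off peterbe mentions still applies: the HTTP request now waits for the unpack to finish.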
Comment 15 (Assignee) • 11 years ago
(In reply to Peter Bengtsson [:peterbe] from comment #14)
> phrawzty,
>
> How about this for a curve-ball...
>
> When I built it I made the webapp write the zips for a directory and nothing
> else. Then a cronjob unpacks those zip files and puts them on ted's symbol
> server.
>
> How about we re-architecture it significantly so that the webapp also deals
> with unpacking the zip file on-the-fly straight into the final destination.
>
> It's just a suggestion. It'd mean the upload might be more fragile and
> slightly slower but at least it could potentially simplify the whole NFS
> story significantly.
>
> What do you think?
The webheads only have "symbols_upload" mounted on them. If the final destination is "symbols", then we'd need to have that mount set up on the webheads as well. Furthermore, if that mount is already on the webheads and it's the final destination, then there would be no need for "symbols_upload". I agree that this is a simpler (and therefore superior) approach, and I would be happy to see it implemented. However, it does not actually solve the underlying permissions problem, which still needs to be resolved either way.
Updated • 11 years ago
Component: WebOps: Socorro → Infra
Product: Infrastructure & Operations → Socorro
QA Contact: nmaul
Comment 16 (Assignee) • 11 years ago
Turns out that you can specify a default umask for a WSGI daemon process [1]. Combined with the setgid bit, this has the net effect of giving newly created directories the same group ownership and permissions (2775) as the root dir. Combined with setting newly uploaded files g+w in Django itself, this would seem to meet all of our conditions for success.
I will add this setting to Stage in Puppet now.
---
[dmaher@socorro2.stage.webapp.phx1 symbols_upload]$ pwd
/data/socorro/webapp-django/media/symbols_upload
[dmaher@socorro2.stage.webapp.phx1 symbols_upload]$ stat .
File: `.'
Size: 4096 Blocks: 8 IO Block: 65536 directory
Device: 14h/20d Inode: 75235392 Links: 3
Access: (2775/drwxrwsr-x) Uid: ( 48/ apache) Gid: (10000/ socorro)
Access: 2014-06-05 03:08:31.116646000 -0700
Modify: 2014-06-05 03:08:26.242544000 -0700
Change: 2014-06-05 03:08:26.242544000 -0700
[dmaher@socorro2.stage.webapp.phx1 symbols_upload]$ stat 2014
File: `2014'
Size: 4096 Blocks: 8 IO Block: 65536 directory
Device: 14h/20d Inode: 75235424 Links: 3
Access: (2775/drwxrwsr-x) Uid: ( 48/ apache) Gid: (10000/ socorro)
Access: 2014-06-05 03:08:34.844732000 -0700
Modify: 2014-06-05 03:08:26.242558000 -0700
Change: 2014-06-05 03:08:26.242558000 -0700
[dmaher@socorro2.stage.webapp.phx1 symbols_upload]$ stat 2014/06/05/cf9dbabf97487b2123085e94943fb3b0.zip
File: `2014/06/05/cf9dbabf97487b2123085e94943fb3b0.zip'
Size: 183 Blocks: 8 IO Block: 65536 regular file
Device: 14h/20d Inode: 75235427 Links: 1
Access: (0664/-rw-rw-r--) Uid: ( 48/ apache) Gid: (10000/ socorro)
Access: 2014-06-05 03:08:26.244552000 -0700
Modify: 2014-06-05 03:08:26.247550000 -0700
Change: 2014-06-05 03:08:26.247555000 -0700
---
[1] https://code.google.com/p/modwsgi/wiki/ConfigurationDirectives#WSGIDaemonProcess
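The umask effect can be emulated in a few lines of Python. In mod_wsgi the value comes from the `umask=` option of `WSGIDaemonProcess` (see [1]); here `os.umask()` stands in for it. The 2014/06/05 layout and the `upload.zip` name are only illustrative, and `create_upload` is a hypothetical helper. With a umask of 0002, new directories come out as 775 and new files as 664. Under a setgid parent on Linux, new subdirectories also inherit the setgid bit and so show as 2775, matching the stat output above.

```python
import os
import stat


def create_upload(root, umask=0o002):
    """Create root/2014/06/05/upload.zip under `umask` and return the
    permission bits of the leaf directory and of the file."""
    old = os.umask(umask)  # os.umask() sets the new mask, returns the old one
    try:
        leaf = os.path.join(root, "2014", "06", "05")
        os.makedirs(leaf, exist_ok=True)  # requested mode 0o777, minus umask
        path = os.path.join(leaf, "upload.zip")
        with open(path, "wb") as fh:  # requested mode 0o666, minus umask
            fh.write(b"")
    finally:
        os.umask(old)
    leaf_mode = stat.S_IMODE(os.stat(leaf).st_mode)
    file_mode = stat.S_IMODE(os.stat(path).st_mode)
    return leaf_mode, file_mode
```

Because the umask is process-wide, changing it per request in a threaded server would be racy. Setting it once for the whole daemon process, as done here through mod_wsgi, avoids that.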
Comment 17 (Assignee) • 11 years ago
Note bug 1020901 which, while it is a separate issue, does ultimately impact this bug.
Comment 18 (Assignee) • 11 years ago
Crontabber can successfully remove the archive as expected. Looks like we've got a viable fix (woo!).
2014-06-05 07:18:28,924 DEBUG - MainThread - about to run <class 'socorro.cron.jobs.symbolsunpack.SymbolsUnpackCronApp'>
2014-06-05 07:18:28,979 DEBUG - MainThread - successfully ran <class 'socorro.cron.jobs.symbolsunpack.SymbolsUnpackCronApp'> on 2014-06-05 14:18:28.950316+00:00
Will roll out to Prod once Puppet comes back from today's maintenance window.
Comment 19 (Assignee) • 10 years ago
Woops, forgot to close this out. :)
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED