There may be an issue with maintaining the correct ownership/permissions on the symbols upload directory. It may be related to a production push. A work-around has been put in place in Puppet. This bug is to investigate whether there is actually a problem, and if so, what the root cause (and solution) are. (See bug 1024241)
As evidenced by errormill, the problem occurred again on 23 June 2014. The fact that it spiked once and then disappeared further supports the theory that *something* is irregularly altering the permissions of the directory, but the Puppet work-around fixed the issue (as expected). Given that the timing is highly irregular (it has only occurred once since the initial fix on 12 June 2014), this isn't a blocker; however, it *is* a bug that needs to be investigated and solved. Off the top of my head I would propose implementing some sort of inotify-based monitoring of the directory in order to track the timing and behaviour of processes that interact with this directory (the caveat being that I don't know how NFS plays into this plan).
https://errormill.mozilla.org/webtools/socorro-prod/group/168852/
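Since the NFS question is open, one fallback is to poll the directory from cron and log a timestamp whenever its mode or owner differs from what is expected. This is a polling check, not inotify, and it only narrows down when the change happened, not which process made it. The sketch below is a rough illustration: `check_perms` is a hypothetical name, the expected mode and owner are passed in by the caller, and GNU `stat` (the `-c` option) is assumed.

```shell
#!/bin/sh
# Hypothetical helper (not part of Socorro or Puppet): report when a
# directory's mode or owner differs from the expected values.
# Assumes GNU stat, which supports -c FORMAT.
check_perms() {
    dir=$1
    want_mode=$2
    want_owner=$3

    # %a = octal mode, %U = owner name
    actual=$(stat -c '%a %U' "$dir") || return 2

    if [ "$actual" != "$want_mode $want_owner" ]; then
        # Timestamp in UTC so the entry can be lined up with push times.
        echo "$(date -u +%Y-%m-%dT%H:%M:%SZ) drift on $dir: expected '$want_mode $want_owner', got '$actual'"
        return 1
    fi
    return 0
}
```

Run every minute with output appended to a log, a call such as `check_perms /mnt/socorro/symbols_upload 2775 apache` would give a timeline to compare against the push schedule.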
Assignee: nobody → dmaher
08:59:56 < phrawzty> did we do a prod push on 23 June ?
09:00:12 < peterbe> phrawzty: yes we did.
09:00:30 < peterbe> later afternoon pacific time

This is the likely culprit, imho. The prod push process should be picked through in order to see if there's a point at which the symbols_upload directory is affected in some way.
It definitely sounds like the prod push is the problem. What permissions need to be set and against what after a push completes? We can work it into the push script.
(In reply to Chris Lonnen :lonnen from comment #3)
> It definitely sounds like the prod push is the problem. What permissions
> need to be set and against what after a push completes? We can work it into
> the push script.

[email@example.com socorro]$ stat symbols_upload/ | grep Uid
Access: (2775/drwxrwsr-x)  Uid: (   48/  apache)   Gid: (10000/ socorro)
^ Note the setgid.
I'm not sure where to set this flag in the process. Early in the release we could set it on the tarball that is rsync'd from the admin node, or later in the release `socorro1.webapp.phx1 ~]$ cat /data/bin/update-www.sh` calls $EXTRAS_SCRIPT, and we could do it there.
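Wherever it ends up living, the step itself could be a small function that sets ownership and mode and then prints the result for the push log. A minimal sketch, assuming GNU `stat`; `restore_upload_perms` is a hypothetical name and not something in the existing push scripts, and the owner, group and mode are supplied by the caller.

```shell
#!/bin/sh
# Hypothetical post-push step; not part of update-www.sh or $EXTRAS_SCRIPT.
# Usage: restore_upload_perms DIR OWNER GROUP MODE
restore_upload_perms() {
    dir=$1
    owner=$2
    group=$3
    mode=$4

    chown "$owner:$group" "$dir" || return 1
    # Set the mode after the ownership change so the final mode is the
    # one that was requested.
    chmod "$mode" "$dir" || return 1

    # Print mode, owner and group so the result shows up in the push log.
    stat -c '%a %U %G' "$dir"
}
```

With the values from the stat output in comment #4, the call would be `restore_upload_perms symbols_upload apache socorro 2775`, which assumes both the user and the group exist on the host running it.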
Went to the update script in /data/crashstats/www/crash-stats.mozilla.org/ and tried to make the appropriate changes, only to have them overwritten every push. Could not find the code in puppet, puppet-ls knew nothing, and rbryce grepped through the whole tree for me to confirm it wasn't in another module. Eventually found out that /data/crashstats/src/crash-stats.mozilla.org/ is copied to /data/crashstats/www/crash-stats.mozilla.org/ every push.

Added the following three lines, to run after every push:

# change upload permissions to work with django/apache
echo "chown apache /mnt/socorro/symbols_upload" | sudo issue-multi-command -i /root/.ssh/id_dsa crashstats
echo "chmod 2664 /mnt/socorro/symbols_upload" | sudo issue-multi-command -i /root/.ssh/id_dsa crashstats

After the next push:

$ stat /mnt/socorro/symbols_upload
  File: `/mnt/socorro/symbols_upload'
  Size: 4096  Blocks: 8  IO Block: 65536  directory
Device: 15h/21d  Inode: 75104320  Links: 3
Access: (2664/drw-rwSr--)  Uid: (   48/  apache)   Gid: (    0/    root)

"socorro" isn't a known group on the box, so I only set the user and left the group alone.

$ cat /etc/group | grep ^socorro
# => nothin'

I also set perms to 2664, which agrees with puppet but not with comment #4.
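The two modes differ in more than the leading digit: 2664 has no execute bits, which is why stat shows a capital S (setgid set, group execute not set) instead of the lowercase s seen with 2775. On a directory the execute bit controls search access, so whether the difference matters depends on which user the upload job runs as, which this bug doesn't record. A small sketch to compare the two on a scratch directory, assuming GNU `stat` and `mktemp`; `show_mode` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: apply a numeric mode to a scratch directory and
# print it in octal and symbolic form. Assumes GNU stat and mktemp.
show_mode() {
    d=$(mktemp -d) || return 1
    chmod "$1" "$d"
    stat -c '%a %A' "$d"
    rmdir "$d"
}

show_mode 2775   # mode from the stat output in comment #4
show_mode 2664   # mode applied after each push
```

The symbolic forms printed should match the two stat outputs quoted in this bug (drwxrwsr-x and drw-rwSr--).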
Status: NEW → RESOLVED
Last Resolved: 4 years ago
Resolution: --- → FIXED
I believe these settings work for the purposes of the cron. The symbols upload job recovered after making this change. I checked to confirm puppet hadn't overwritten it with anything and indeed, stat returns the same as comment #6