Closed
Bug 799662
Opened 12 years ago
Closed 11 years ago
mdn: stage & prod: add cron `manage.py cron build_sitemaps`
Categories
(Infrastructure & Operations Graveyard :: WebOps: Other, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: groovecoder, Assigned: cturra)
Need to add a cron job to build sitemaps for MDN once per day:
manage.py cron build_sitemaps
Note: need to run as a user with permission to write files to developer.mozilla.org/kuma/media/
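Editorial sketch of the write-permission requirement in the note above: a small pre-flight check that the user running the cron can create files in the media directory. The function names `can_write_dir` and `probe_write` are hypothetical, and the sketch assumes Python 3 rather than the python2.6 used on the servers in this bug.

```python
import os
import tempfile


def can_write_dir(path):
    """Return True if the current user appears able to create files in `path`."""
    # Creating an entry in a directory needs both write and search (execute) permission.
    return os.path.isdir(path) and os.access(path, os.W_OK | os.X_OK)


def probe_write(path):
    """Try to create and remove a temporary file in `path`.

    Returns None on success, or the OSError that was raised
    (for example Errno 13, Permission denied).
    """
    try:
        with tempfile.NamedTemporaryFile(dir=path):
            pass
    except OSError as exc:
        return exc
    return None
```

Running `probe_write` as the cron user against the media directory before scheduling the job would surface a permission problem without waiting for the nightly run.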
Assignee
Comment 1•12 years ago
:groovecoder - i have added this cron to dev and stage. it is set to run 2 minutes after midnight each evening. once it has run successfully this evening, we can push this to prod.
Assignee: server-ops-webops → cturra
Status: NEW → ASSIGNED
Comment 2•12 years ago
Noting for the record: In order to do this without very strange side-effects, we also changed the deploy process for stage and prod slightly. Kumascript is now restarted by Chief directly, explicitly. Previously, it was restarted during a particular step in the "update-www.sh" script (specifically sync-extras.sh) on the nodes, that just updates the content on the web nodes via rsync to the admin node. That was set up before Chief, and was a trick to make sure kumascript would get restarted if/when a deploy happened. With Chief, we can move this step right into Chief, and simplify the overall process. Without doing this, the side-effect would have been that when this cron that simply makes a file in /media/ runs, it would have also caused a restart of kumascript. Not horrible, but definitely unexpected and non-ideal.
Reporter
Comment 3•12 years ago
Thanks :jakem. :cturra - did this run last night? I don't see the media/sitemap* files on stage.
Assignee
Comment 4•12 years ago
:groovecoder - unfortunately there were unrelated puppet errors on the admin node, so these cron updates never made it onto the server :( i have resolved this now and the crons are in place as expected. so we don't need to wait until tomorrow for this to run, i have changed the execution time to 11:02, so you should be able to check this shortly thereafter.
Reporter
Comment 5•12 years ago
OSError: [Errno 13] Permission denied: '/data/developer-stage/www/developer.allizom.org/kuma/media/sitemaps'
Assignee
Comment 6•12 years ago
:groovecoder - sorry for the delay here. the root cause of these not showing up as expected was some missing ssh keys on the developer admin node in the web cluster. to make the urls a little more clear, i have set up an apache alias to make the sitemaps available at these locations for each environment (dev/stage for now):
https://developer-dev.allizom.org/sitemaps/
https://developer.allizom.org/sitemaps/
Reporter
Comment 7•12 years ago
Okay this is weird. We use https://github.com/mozilla/kuma/blob/master/configs/htaccess-without-mindtouch#L17 to publish a robots.txt, sitemap index file, and the individual sitemaps. But https://developer-dev.allizom.org/robots.txt doesn't work, and I don't see a webroot/.htaccess symlink to it on dev, stage, or prod.
Assignee
Comment 8•12 years ago
:groovecoder - how do you expect these files to be published and served on the web nodes? is this supposed to be done when running the `build_sitemaps` cron, or are there others required? looking at the .htaccess file in kuma/media/ i don't see any directives in there that would match your htaccess-without-mindtouch file.
Reporter
Comment 9•12 years ago
Right. https://github.com/mozilla/kuma/blob/master/configs/htaccess-without-mindtouch#L17 contains the directives to publish/enable the sitemap files. There's *supposed* to be a webroot/.htaccess -> htaccess-without-mindtouch symlink. But it seems like that's missing. The missing symlink would also explain why:
https://developer-dev.allizom.org/contests/
https://developer-dev.allizom.org/es4
are broken, as well as missing CORS headers (bug 720068). I'm not sure why that symlink is missing but we need to restore it on dev and stage and then a whole bunch of stuff should work. Then we'll want to do the same on prod.
Assignee
Comment 10•12 years ago
i have added this symlink per your request, but we appear to be getting the same results (http 404). as an fyi -- in other projects, such as bedrock (www.mozilla.org), we manage all the rewrites/etc in the apache configs directly. not saying what you're trying to do is incorrect or won't work however.
[cturra@developer1.dev.webapp.scl3 webroot]$ grep -i "documentroot" /etc/httpd/mozilla/domains/developer-dev.allizom.org.conf
DocumentRoot "/data/www/developer-dev.allizom.org/kuma/webroot"
[cturra@developer1.dev.webapp.scl3 webroot]$ pwd
/data/www/developer-dev.allizom.org/kuma/webroot
[cturra@developer1.dev.webapp.scl3 webroot]$ ls -la .htaccess
lrwxrwxrwx 1 root root 37 Oct 17 14:24 .htaccess -> ../configs/htaccess-without-mindtouch
Comment 11•12 years ago
Probably just need a directory block with an "AllowOverride all" directive in it in the Apache config. Like this:
<Directory /data/www/developer-dev.allizom.org/kuma/webroot>
    AllowOverride all
</Directory>
In general I prefer doing rewrites/redirects in a .htaccess file, because I prefer that webdevs be able to manage them without IT/webops involvement. This also goes for cache-control headers, and maybe a few other things. Apache .htaccess files are tailor-made for allowing the users (webdevs) to nudge the webserver in the right direction without having to touch the main config.
Assignee
Comment 12•12 years ago
(In reply to Jake Maul [:jakem] from comment #11)
> Probably just need a directory block with an "AllowOverride all" directive
> in it in the Apache config. Like this:
>
> <Directory /data/www/developer-dev.allizom.org/kuma/webroot>
> AllowOverride all
> </Directory>
it's already present...
<Directory /data/www/developer-dev.allizom.org/kuma/webroot>
    Options +FollowSymLinks
    AllowOverride All
</Directory>
> In general I prefer doing rewrites/redirects in a .htaccess file, because I
> prefer that webdevs be able to manage them without IT/webops involvement.
> This also goes for cache-control headers, and maybe a few other things.
> Apache .htaccess files are tailor-made for allowing the users (webdevs) to
> nudge the webserver in the right direction without having to touch the main
> config.
i agree completely!
Comment 13•12 years ago
I know what's wrong here... we probably have the same problem in other sites and just haven't noticed it, and/or are working around it without realizing.
WSGIScriptAlias / /data/www/developer-dev.allizom.org/kuma/wsgi/kuma.wsgi
This conflicts with the DocumentRoot. We are essentially remapping the DocumentRoot to go to a mod_wsgi application. That means anything *in* the DocumentRoot is inaccessible. /media/ and such still work, because they're different URL paths... they don't overlap the exact same path like WSGIScriptAlias and DocumentRoot currently do. Apache never reads the .htaccess file because it's instead following the alias to the mod_wsgi app.
The solution (or at least *a* solution) is to put the wsgi app somewhere else. Judging by the contents of that .htaccess file, it looks like it used to be at /mwsgi. We can move it there again easily. However, in so doing we risk breaking anything that relies on the root to be the django app. There will need to be a RewriteRule that sends anything not a static file over to the mwsgi app. The .htaccess file has such a RewriteRule in it already. At a glance it seems to be okay, but I haven't tested exhaustively.
Once we change this over, likely some things will be broken until either the Apache config or the .htaccess file are updated to do the right thing. Let's do this tomorrow (on -dev), when more people will be around to notice and help out. The new line would look like this:
WSGIScriptAlias /mwsgi /data/www/developer-dev.allizom.org/kuma/wsgi/kuma.wsgi
I think we will have a problem with the "RewriteBase /" line in the .htaccess file... we can experiment after the change is made. I'm somewhat uncomfortable with the CORS logic/pattern here too... another thing we can play with once the change is made.
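Editorial sketch of the conflict described in this comment. It is a deliberately simplified toy model, not Apache's real request-mapping algorithm: aliases are checked in order by path prefix, and only requests that match no alias fall through to the DocumentRoot, where the .htaccess file would be read. The `resolve` function, the labels, and the `/media` alias entry are all assumptions made for illustration.

```python
def resolve(url_path, aliases, document_root):
    """Toy model: the first alias whose prefix matches wins; otherwise the
    request falls through to the DocumentRoot (where .htaccess lives)."""
    for prefix, target in aliases:
        # Match the prefix itself or anything below it ("/" matches every path).
        if url_path == prefix or url_path.startswith(prefix.rstrip("/") + "/"):
            return target
    return document_root


DOCROOT = "docroot (.htaccess is read)"
WSGI = "mod_wsgi app"

# Before the change: the WSGI app is mounted at "/", so it shadows the DocumentRoot.
before = [("/media", "static media"), ("/", WSGI)]
# After the change: the app is mounted at /mwsgi, leaving "/" to the DocumentRoot.
after = [("/media", "static media"), ("/mwsgi", WSGI)]
```

Under this model, `resolve("/robots.txt", before, DOCROOT)` goes to the WSGI app while `resolve("/robots.txt", after, DOCROOT)` reaches the DocumentRoot. The model leaves out the RewriteRule in the .htaccess file that forwards non-static requests to the app once .htaccess is actually being read.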
Reporter
Comment 14•12 years ago
Thanks Jake. I'm around today. Let's do this whenever Raymond is available to test -dev.
Reporter
Comment 15•12 years ago
Also still getting:
IOError: [Errno 13] Permission denied: '/data/developer-stage/www/developer.allizom.org/kuma/media/sitemap.xml'
Comment 16•12 years ago
The change in comment 13 is in place. Let us know of any breakage and/or incorrect functionality. It seems the sitemap.xml file works now, but I suspect other things will be broken.
Comment 17•12 years ago
(In reply to Jake Maul [:jakem] from comment #16)
> The change in comment 13 is in place. Let us know of any breakage and/or
> incorrect functionality.
>
> It seems the sitemap.xml file works now, but I suspect other things will be
> broken.
I'm still looking at the sitemap.xml files on dev. I'll update the bug if I find anything out of the ordinary.
Reporter
Comment 18•12 years ago
I think it's doing well on -dev. Let's move on to -stage and then maybe prod today?
Assignee
Comment 19•12 years ago
:groovecoder - looks like :jakem applied these changes to stage at the same time as he did dev. can you please test and confirm both of these environments function as expected?
Reporter
Comment 20•12 years ago
https://developer.allizom.org/sitemap.xml is an empty file but I didn't get any errors from the cron job on stage? I got a permission denied error when I tried to run it myself - presumably because my account on the stage server can't write files.
Assignee
Comment 21•12 years ago
:groovecoder - we were getting the following error from the cron because user `apache` was trying to do a deploy, but doesn't have the access to do this...
[localhost] err: rsync: failed to set times on "/data/developer-stage/www/developer.allizom.org/kuma": Operation not permitted (1)
[localhost] err: rsync: mkstemp "/data/developer-stage/www/developer.allizom.org/kuma/media/.humans.txt.6Yl497" failed: Permission denied (13)
[localhost] err: rsync: mkstemp "/data/developer-stage/www/developer.allizom.org/kuma/media/.sitemap.xml.iGOiZk" failed: Permission denied (13)
[localhost] err: rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1039) [sender=3.0.6]
as a result i have changed the way we do the `build_sitemaps` cron on the web cluster admin node for stage to store all this in a netapp mount. each web head in the cluster has access to the same netapp volume, so will immediately receive these updated files. this will be the same route we will want to go in prod.
Reporter
Comment 22•12 years ago
Stage sitemaps look good now. Ready for prod when :retornam can help us test.
Reporter
Comment 23•12 years ago
I'm still getting cron error emails with this:
From: root@developeradm.private.scl3.mozilla.com
Subject: Cron <apache@developeradm> cd /data/developer-stage/src/developer.allizom.org/kuma; python2.6 manage.py cron build_sitemaps
IOError: [Errno 13] Permission denied: '/data/developer-stage/src/developer.allizom.org/kuma/media/sitemaps/en-US/sitemap.xml'
Do we need to make a change to the cron job itself?
Assignee
Comment 24•12 years ago
:groovecoder - looks like there were still some directories and files owned by root:root in media/sitemaps/. i have updated these ownerships to be correct now. i will schedule this sitemaps update to prod for monday (11/05).
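Editorial sketch of how leftover root-owned entries like these could be located before the next cron run. The helper name `entries_not_owned_by` is hypothetical, and the sketch assumes Python 3 on a POSIX system.

```python
import os


def entries_not_owned_by(root, uid):
    """Return paths at or below `root` whose owner is not the numeric user id `uid`."""
    mismatched = []
    if os.lstat(root).st_uid != uid:
        mismatched.append(root)
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # lstat reports a symlink's own owner rather than its target's.
            if os.lstat(path).st_uid != uid:
                mismatched.append(path)
    return mismatched
```

Passing the uid of the cron user (for instance `pwd.getpwnam("apache").pw_uid`, assuming that account exists on the host) and the media/sitemaps directory would list anything still owned by another user.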
Reporter
Comment 25•12 years ago
sorry to overload this bug, but got some permission errors on another stage cron job:
Cron <apache@developeradm> cd /data/developer-stage/src/developer.allizom.org/kuma; python2.6 manage.py cron humans_txt
IOError: [Errno 13] Permission denied: '/data/developer-stage/src/developer.allizom.org/kuma/media/humans.txt'
Assignee
Comment 26•12 years ago
:groovecoder - while i was changing the way we do the `build_sitemaps` i also applied this same change to the way `humans.txt` is updated. as it turns out, i hadn't set the permissions correctly on that file -- this has been corrected now.
$ whoami
apache
$ cd /data/developer-stage/src/developer.allizom.org/kuma; python2.6 manage.py cron humans_txt
$ echo
bash-4.1$ echo $?
0
Assignee
Comment 27•12 years ago
:groovecoder - this has all now been pushed to prod. https://developer.mozilla.org/sitemap.xml
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Reporter
Comment 28•11 years ago
Looks like the cron job might have an error? the sitemap.xml files aren't updating?
Assignee
Comment 29•11 years ago
i had configured the prod cron to update the stage sitemaps :( oops! this has been fixed now.
https://developer.mozilla.org/sitemap.xml
$ svn diff
Index: modules/webapp/files/developer/admin/etc-cron.d/developer
===================================================================
--- modules/webapp/files/developer/admin/etc-cron.d/developer (revision 55880)
+++ modules/webapp/files/developer/admin/etc-cron.d/developer (working copy)
@@ -7,7 +7,7 @@
 11 6,18 * * * apache cd /data/developer/src/developer.mozilla.org/kuma; python2.6 manage.py update_product_details
 # bug 799662
-0 5 * * * apache cd /data/developer-stage/src/developer.allizom.org/kuma; python2.6 manage.py cron build_sitemaps
+0 5 * * * apache cd /data/developer/src/developer.mozilla.org/kuma; python2.6 manage.py cron build_sitemaps
 0 0 * * * apache cd /data/developer/src/developer.mozilla.org/kuma; python2.6 manage.py cron humans_txt
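Editorial sketch of a check that could catch this kind of mix-up: scan an /etc/cron.d style file and flag entries whose `cd` target lies outside the directory tree expected for that environment. The function name `entries_outside` is hypothetical, and the parsing assumes the seven-field layout seen in the diff above (five schedule fields, a user, then a command starting with `cd`). Python 3 is assumed.

```python
def entries_outside(cron_text, expected_root):
    """Return (line_number, line) pairs for cron entries whose `cd` target
    is not under `expected_root`."""
    flagged = []
    root = expected_root.rstrip("/") + "/"
    for number, line in enumerate(cron_text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        # Five schedule fields, the user, then the rest of the line as the command.
        fields = stripped.split(None, 6)
        if len(fields) < 7 or not fields[6].startswith("cd "):
            continue
        target = fields[6][3:].split(";", 1)[0].strip()
        # Compare with trailing slashes so /data/developer-stage is not
        # mistaken for a path under /data/developer.
        if not (target + "/").startswith(root):
            flagged.append((number, stripped))
    return flagged
```

Called with the pre-fix file contents and `"/data/developer"` as the expected root, this would flag only the `build_sitemaps` entry that pointed at the developer-stage tree.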
Status: REOPENED → RESOLVED
Closed: 12 years ago → 11 years ago
Resolution: --- → FIXED
Updated•11 years ago
Component: Server Operations: Web Operations → WebOps: Other
Product: mozilla.org → Infrastructure & Operations
Updated•5 years ago
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard