Closed Bug 1005976 Opened 11 years ago Closed 10 years ago

MDN file attachments serving bad Last-Modified header

Categories

(Infrastructure & Operations Graveyard :: WebOps: Community Platform, task)

x86
macOS
task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: jbennett, Assigned: nmaul)

References

Details

(Whiteboard: [kanban:https://kanbanize.com/ctrl_board/4/196] [unplanned - troubleshooting])

File attachments uploaded to MDN end up on mdn.mozillademos.org, which seems to be serving them back with a Last-Modified header of "Thu, 01 Jan 1970 00:00:00 GMT" regardless of the actual date on which the file was uploaded. Per bug 1005412, this is actively causing problems for some users. Best guess is that somehow ctime/mtime/etc. information is either not getting set when a file is saved, or is not getting from the filesystem to the web server, but we're at a loss as to how to investigate further.
Blocks: 1005412
Whiteboard: [unplanned - troubleshooting]
I don't think this is anything on our end. Here's the file info for the curl link in bug 1005412: [root@developer1.webapp.scl3 attachments]# find . -type f -iname "test-64.png" ./2014/05/02/7751/bcca866f665e7c05c29034a690f70768/test-64.png [root@developer1.webapp.scl3 attachments]# ls -al ./2014/05/02/7751/bcca866f665e7c05c29034a690f70768/test-64.png -rwxr-xr-x 1 apache apache 8635 May 2 18:01 ./2014/05/02/7751/bcca866f665e7c05c29034a690f70768/test-64.png [root@developer1.webapp.scl3 attachments]# stat ./2014/05/02/7751/bcca866f665e7c05c29034a690f70768/test-64.png File: `./2014/05/02/7751/bcca866f665e7c05c29034a690f70768/test-64.png' Size: 8635 Blocks: 24 IO Block: 65536 regular file Device: 13h/19d Inode: 70583894 Links: 1 Access: (0755/-rwxr-xr-x) Uid: ( 48/ apache) Gid: ( 48/ apache) Access: 2014-05-02 18:01:18.735334000 -0700 Modify: 2014-05-02 18:01:18.737344000 -0700 Change: 2014-05-02 18:01:18.737344000 -0700 Datestamp looks good, as does the directory structure where it's stored. Unless I've got the wrong file altogether... but that was the only thing I found with the name "test-64.png", and the attachment number (7751) matches, so... For clarification, mdn.mozillademos.org does *not* go through a CDN. It's hosted locally out of SCL3, and in fact goes to the same nodes that run developer.mozilla.org. So that's one less thing you can cut out of the possibilities. :) Is there some way to verify how Kuma serves up these files? As you can tell above, the file name does not match up directly to the URL, so Kuma (or something) must be interpreting the request and determining what filesystem path matches to the request. It looks to me like: https://github.com/mozilla/kuma/blob/master/urls.py#L60 handles the request, and https://github.com/mozilla/kuma/blob/master/apps/wiki/views.py#L2380 is where the Last-Modified is set. My best guess is that is somehow misfiring... perhaps attachment.current_revision.created is set incorrectly? I'm going to mark this bug as invalid because I don't see anything on the infra side that could cause this, and you've already got a bug on the dev side. If you do though, please re-open and let us know where to look. Thanks!
Assignee: server-ops-webops → nmaul
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → INVALID
The data on the MDN/kuma side is correct: [jbennett@developer2.webapp.scl3 kuma]$ python manage.py shell Python 2.6.6 (r266:84292, Oct 12 2012, 14:23:48) Type "copyright", "credits" or "license" for more information. In [1]: from wiki.models import Attachment In [2]: a = Attachment.objects.get(id=7751) In [3]: a.current_revision.created Out[3]: datetime.datetime(2014, 5, 2, 18, 1, 18) In [4]: a.get_file_url() Out[4]: 'https://mdn.mozillademos.org/files/7751/test-64.png' The filesystem structure is a bit overcomplicated in order to avoid conflicts between files with the same name; it uses the MD5 of the upload timestamp to work around that. The view in kuma which serves files is here: https://github.com/mozilla/kuma/blob/70b658ba3b9d6567309f91ae51bb01305276ccd6/apps/wiki/views.py#L2370 If the request is on the host we serve files from, it serves the file, setting Last-Modified to 'created' of the current revision. If not, it issues a redirect to the correct host. So: Metadata in the kuma DB is correct. Metadata on the filesystem is correct. If kuma were serving the file, it would be setting the correct Last-Modified header. Which indicates this isn't a bug in kuma, or in how kuma puts files onto the disk.
Status: RESOLVED → REOPENED
Resolution: INVALID → ---
For what it's worth, I was able to reproduce a *correct* `Last-Modified` behaviour in Kuma, locally, so this definitely appears to be something between Kuma and the end user (read: me). ``` $ curl -kLIXGET 'https://developer-local.allizom.org/files/1/campipe.gif' HTTP/1.1 301 MOVED PERMANENTLY Date: Tue, 06 May 2014 16:51:39 GMT Server: WSGIServer/0.1 Python/2.6.9 Vary: Cookie,Accept-Encoding X-Frame-Options: DENY Content-Type: text/html; charset=utf-8 Location: https://mdn-local.mozillademos.org/files/1/campipe.gif Transfer-Encoding: chunked HTTP/1.1 200 OK Date: Tue, 06 May 2014 16:51:40 GMT Server: WSGIServer/0.1 Python/2.6.9 Last-Modified: 2014-05-06 09:46:09 Content-Length: 54194 Content-Type: image/gif X-Frame-Options: ALLOW-FROM: developer-local.allizom.org Vary: Cookie ```
The relevant config for mdn.mozillademos.org is as follows: ProxyPassMatch ^/(files/\d+/.+)$ http://developer-local:81/$1 "developer-local" is in /etc/hosts as 127.0.0.1, and is a ServerAlias on the main developer.mozilla.org VirtualHost block. On the nodes, you can bypass this ProxyPassMatch and go directly to Kuma like so: # curl -sIXGET -H 'X-Forwarded-Host: mdn.mozillademos.org' http://developer-local:81/files/7751/test-64.png HTTP/1.1 200 OK Date: Tue, 06 May 2014 20:03:54 GMT Server: Apache X-Frame-Options: ALLOW-FROM: developer.mozilla.org Vary: Cookie X-Backend-Server: developer1.webapp.scl3.mozilla.com Last-Modified: Thu, 01 Jan 1970 00:00:00 GMT Content-Length: 8635 Content-Type: image/png You can verify that's through Kuma by removing the 'X-Forwarded-Host' option... it will through a 500 ISE, and the error in the logs is: [Tue May 06 12:59:09 2014] [error] [client 127.0.0.1] SuspiciousOperation: Invalid HTTP_HOST header (you may need to set ALLOWED_HOSTS): developer-local This leaves the only thing between Kuma and curl to be Apache and mod_wsgi. You can see their configs on the nodes here: /etc/httpd/mozilla/domains/developer.mozilla.org.conf. I don't see anything in there that messes with dates. The one thing I see different between production and comment 3 is Apache/mod_wsgi vs the local WSGIServer. Can we go further in Kuma on the command line and actually display the view? I don't know how to do that. Also, does this only happen with some file attachments, or all? Also also, is this a recent development or do we think it's been happening a long time but only recently noticed?
Flags: needinfo?(jbennett)
I pulled five randomly-chosen files from the production DB: https://mdn.mozillademos.org/files/3989/Persona_1stlogin.png https://mdn.mozillademos.org/files/7589/transform-tooltip-de.png https://mdn.mozillademos.org/files/6755/inspecteur-modifier-html.png https://mdn.mozillademos.org/files/185/BrowserDirectoryStructure.gif https://mdn.mozillademos.org/files/2663/String%20hierarchy.jpg All of them say "Last-Modified: Thu, 01 Jan 1970 00:00:00 GMT". Which points to this being a problem on every file. I also just copy-pasted lines from the file-serving view into a Python interpreter on prod, and checked the result that the view would set for Last-Modified... which was the correct timestamp. Don't know for certain when this started; the relevant code in the view appeared in this commit: https://github.com/mozilla/kuma/commit/581ca78a029bcd43d722cc5fa22cd69bde288b4e
Flags: needinfo?(jbennett)
I've been thinking this over and poking around the web for similar occurrences, and have come up with very limited ideas: 1) http://permalink.gmane.org/gmane.comp.web.zope.general/37756 ... similar problem, python code, returning bad Last-Modified (and also bad Content-Length, so maybe different). Turns out to be in their case something terminating the connection too soon, causing no data (or bad data) to be returned. That might explain why this happens in prod, but not in local dev envs (which are a different stack). 2) Some sort of library bug? Like, a version of something we run in prod that is different in your local dev environments. It also occurs to me, can we try to replicate this on dev or stage? I naively tried to modify some of the links to do the right thing, but got a lot of 404's. Not sure what URLs might be usable for testing. It's possible this is a bug in Apache (or more likely, mod_wsgi), but AFAIK we haven't run into anything like this on other properties... though it's possible nobody has really looked- it seems most things don't care too much about Last-Modified. Only Etags really rely on it, and that's largely been superseded by Cache-Control these days. The only thing I can really think to do on the infra at this point is to completely change how we host this site... replace apache+mod_wsgi with something else... perhaps nginx+uwsgi, or apache+gunicorn. This is fairly major change though. I don't understand the effects of the problem well enough to say if this is worthwhile.
Whiteboard: [unplanned - troubleshooting] → [kanban:https://kanbanize.com/ctrl_board/4/196] [unplanned - troubleshooting]
I believe this is resolved now, by virtue of bug 1005412. If I've missed something, please re-open and specify. Thank you!
Status: REOPENED → RESOLVED
Closed: 11 years ago10 years ago
Resolution: --- → FIXED
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard
You need to log in before you can comment on or make changes to this bug.