Closed
Bug 1005976
Opened 11 years ago
Closed 10 years ago
MDN file attachments serving bad Last-Modified header
Categories
(Infrastructure & Operations Graveyard :: WebOps: Community Platform, task)
Infrastructure & Operations Graveyard
WebOps: Community Platform
x86
macOS
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: jbennett, Assigned: nmaul)
References
Details
(Whiteboard: [kanban:https://kanbanize.com/ctrl_board/4/196] [unplanned - troubleshooting])
File attachments uploaded to MDN end up on mdn.mozillademos.org, which seems to be serving them back with a Last-Modified header of "Thu, 01 Jan 1970 00:00:00 GMT" regardless of the actual date on which the file was uploaded.
Per bug 1005412, this is actively causing problems for some users.
Best guess is that somehow ctime/mtime/etc. information is either not getting set when a file is saved, or is not getting from the filesystem to the web server, but we're at a loss as to how to investigate further.
Updated•11 years ago
|
Whiteboard: [unplanned - troubleshooting]
Assignee | ||
Comment 1•11 years ago
|
||
I don't think this is anything on our end. Here's the file info for the curl link in bug 1005412:
[root@developer1.webapp.scl3 attachments]# find . -type f -iname "test-64.png"
./2014/05/02/7751/bcca866f665e7c05c29034a690f70768/test-64.png
[root@developer1.webapp.scl3 attachments]# ls -al ./2014/05/02/7751/bcca866f665e7c05c29034a690f70768/test-64.png
-rwxr-xr-x 1 apache apache 8635 May 2 18:01 ./2014/05/02/7751/bcca866f665e7c05c29034a690f70768/test-64.png
[root@developer1.webapp.scl3 attachments]# stat ./2014/05/02/7751/bcca866f665e7c05c29034a690f70768/test-64.png
File: `./2014/05/02/7751/bcca866f665e7c05c29034a690f70768/test-64.png'
Size: 8635 Blocks: 24 IO Block: 65536 regular file
Device: 13h/19d Inode: 70583894 Links: 1
Access: (0755/-rwxr-xr-x) Uid: ( 48/ apache) Gid: ( 48/ apache)
Access: 2014-05-02 18:01:18.735334000 -0700
Modify: 2014-05-02 18:01:18.737344000 -0700
Change: 2014-05-02 18:01:18.737344000 -0700
Datestamp looks good, as does the directory structure where it's stored. Unless I've got the wrong file altogether... but that was the only thing I found with the name "test-64.png", and the attachment number (7751) matches, so...
For clarification, mdn.mozillademos.org does *not* go through a CDN. It's hosted locally out of SCL3, and in fact goes to the same nodes that run developer.mozilla.org. So that's one less thing you can cut out of the possibilities. :)
Is there some way to verify how Kuma serves up these files? As you can tell above, the file name does not match up directly to the URL, so Kuma (or something) must be interpreting the request and determining what filesystem path matches to the request.
It looks to me like:
https://github.com/mozilla/kuma/blob/master/urls.py#L60 handles the request, and
https://github.com/mozilla/kuma/blob/master/apps/wiki/views.py#L2380 is where the Last-Modified is set.
My best guess is that is somehow misfiring... perhaps attachment.current_revision.created is set incorrectly?
I'm going to mark this bug as invalid because I don't see anything on the infra side that could cause this, and you've already got a bug on the dev side. If you do though, please re-open and let us know where to look. Thanks!
Assignee: server-ops-webops → nmaul
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → INVALID
Reporter | ||
Comment 2•11 years ago
|
||
The data on the MDN/kuma side is correct:
[jbennett@developer2.webapp.scl3 kuma]$ python manage.py shell
Python 2.6.6 (r266:84292, Oct 12 2012, 14:23:48)
Type "copyright", "credits" or "license" for more information.
In [1]: from wiki.models import Attachment
In [2]: a = Attachment.objects.get(id=7751)
In [3]: a.current_revision.created
Out[3]: datetime.datetime(2014, 5, 2, 18, 1, 18)
In [4]: a.get_file_url()
Out[4]: 'https://mdn.mozillademos.org/files/7751/test-64.png'
The filesystem structure is a bit overcomplicated in order to avoid conflicts between files with the same name; it uses the MD5 of the upload timestamp to work around that.
The view in kuma which serves files is here: https://github.com/mozilla/kuma/blob/70b658ba3b9d6567309f91ae51bb01305276ccd6/apps/wiki/views.py#L2370
If the request is on the host we serve files from, it serves the file, setting Last-Modified to 'created' of the current revision. If not, it issues a redirect to the correct host.
So:
Metadata in the kuma DB is correct.
Metadata on the filesystem is correct.
If kuma were serving the file, it would be setting the correct Last-Modified header.
Which indicates this isn't a bug in kuma, or in how kuma puts files onto the disk.
Status: RESOLVED → REOPENED
Resolution: INVALID → ---
Comment 3•11 years ago
|
||
For what it's worth, I was able to reproduce a *correct* `Last-Modified` behaviour in Kuma, locally, so this definitely appears to be something between Kuma and the end user (read: me).
```
$ curl -kLIXGET 'https://developer-local.allizom.org/files/1/campipe.gif'
HTTP/1.1 301 MOVED PERMANENTLY
Date: Tue, 06 May 2014 16:51:39 GMT
Server: WSGIServer/0.1 Python/2.6.9
Vary: Cookie,Accept-Encoding
X-Frame-Options: DENY
Content-Type: text/html; charset=utf-8
Location: https://mdn-local.mozillademos.org/files/1/campipe.gif
Transfer-Encoding: chunked
HTTP/1.1 200 OK
Date: Tue, 06 May 2014 16:51:40 GMT
Server: WSGIServer/0.1 Python/2.6.9
Last-Modified: 2014-05-06 09:46:09
Content-Length: 54194
Content-Type: image/gif
X-Frame-Options: ALLOW-FROM: developer-local.allizom.org
Vary: Cookie
```
Assignee | ||
Comment 4•11 years ago
|
||
The relevant config for mdn.mozillademos.org is as follows:
ProxyPassMatch ^/(files/\d+/.+)$ http://developer-local:81/$1
"developer-local" is in /etc/hosts as 127.0.0.1, and is a ServerAlias on the main developer.mozilla.org VirtualHost block.
On the nodes, you can bypass this ProxyPassMatch and go directly to Kuma like so:
# curl -sIXGET -H 'X-Forwarded-Host: mdn.mozillademos.org' http://developer-local:81/files/7751/test-64.png
HTTP/1.1 200 OK
Date: Tue, 06 May 2014 20:03:54 GMT
Server: Apache
X-Frame-Options: ALLOW-FROM: developer.mozilla.org
Vary: Cookie
X-Backend-Server: developer1.webapp.scl3.mozilla.com
Last-Modified: Thu, 01 Jan 1970 00:00:00 GMT
Content-Length: 8635
Content-Type: image/png
You can verify that's through Kuma by removing the 'X-Forwarded-Host' option... it will through a 500 ISE, and the error in the logs is:
[Tue May 06 12:59:09 2014] [error] [client 127.0.0.1] SuspiciousOperation: Invalid HTTP_HOST header (you may need to set ALLOWED_HOSTS): developer-local
This leaves the only thing between Kuma and curl to be Apache and mod_wsgi. You can see their configs on the nodes here: /etc/httpd/mozilla/domains/developer.mozilla.org.conf. I don't see anything in there that messes with dates.
The one thing I see different between production and comment 3 is Apache/mod_wsgi vs the local WSGIServer.
Can we go further in Kuma on the command line and actually display the view? I don't know how to do that.
Also, does this only happen with some file attachments, or all?
Also also, is this a recent development or do we think it's been happening a long time but only recently noticed?
Assignee | ||
Updated•11 years ago
|
Flags: needinfo?(jbennett)
Reporter | ||
Comment 5•11 years ago
|
||
I pulled five randomly-chosen files from the production DB:
https://mdn.mozillademos.org/files/3989/Persona_1stlogin.png
https://mdn.mozillademos.org/files/7589/transform-tooltip-de.png
https://mdn.mozillademos.org/files/6755/inspecteur-modifier-html.png
https://mdn.mozillademos.org/files/185/BrowserDirectoryStructure.gif
https://mdn.mozillademos.org/files/2663/String%20hierarchy.jpg
All of them say "Last-Modified: Thu, 01 Jan 1970 00:00:00 GMT". Which points to this being a problem on every file. I also just copy-pasted lines from the file-serving view into a Python interpreter on prod, and checked the result that the view would set for Last-Modified... which was the correct timestamp.
Don't know for certain when this started; the relevant code in the view appeared in this commit:
https://github.com/mozilla/kuma/commit/581ca78a029bcd43d722cc5fa22cd69bde288b4e
Flags: needinfo?(jbennett)
Assignee | ||
Comment 6•11 years ago
|
||
I've been thinking this over and poking around the web for similar occurrences, and have come up with very limited ideas:
1) http://permalink.gmane.org/gmane.comp.web.zope.general/37756 ... similar problem, python code, returning bad Last-Modified (and also bad Content-Length, so maybe different). Turns out to be in their case something terminating the connection too soon, causing no data (or bad data) to be returned. That might explain why this happens in prod, but not in local dev envs (which are a different stack).
2) Some sort of library bug? Like, a version of something we run in prod that is different in your local dev environments.
It also occurs to me, can we try to replicate this on dev or stage? I naively tried to modify some of the links to do the right thing, but got a lot of 404's. Not sure what URLs might be usable for testing.
It's possible this is a bug in Apache (or more likely, mod_wsgi), but AFAIK we haven't run into anything like this on other properties... though it's possible nobody has really looked- it seems most things don't care too much about Last-Modified. Only Etags really rely on it, and that's largely been superseded by Cache-Control these days.
The only thing I can really think to do on the infra at this point is to completely change how we host this site... replace apache+mod_wsgi with something else... perhaps nginx+uwsgi, or apache+gunicorn. This is fairly major change though. I don't understand the effects of the problem well enough to say if this is worthwhile.
Updated•10 years ago
|
Whiteboard: [unplanned - troubleshooting] → [kanban:https://kanbanize.com/ctrl_board/4/196] [unplanned - troubleshooting]
Assignee | ||
Comment 7•10 years ago
|
||
I believe this is resolved now, by virtue of bug 1005412. If I've missed something, please re-open and specify. Thank you!
Status: REOPENED → RESOLVED
Closed: 11 years ago → 10 years ago
Resolution: --- → FIXED
Updated•6 years ago
|
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard
You need to log in
before you can comment on or make changes to this bug.
Description
•