Closed Bug 853885 Opened 11 years ago Closed 11 years ago

HTTP Caching issues on sumo production

Categories

(Infrastructure & Operations Graveyard :: WebOps: Other, task)

All
Other
task
Not set
critical

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: rrosario, Assigned: nmaul)

Details

Users started reporting caching issues on prod yesterday. Basically, they reply to a question in the support forum and get redirected to a cached version of the page without their reply. I created a test question and can confirm the issue:
https://support.mozilla.org/en-US/questions/954201

I am seeing `X-Cache-Info: cached` in the response headers after replying. If I refresh the page, I see `X-Cache-Info: caching`.

This is puzzling. We've never cached our HTTPS traffic on SUMO. Did something change recently?
Summary: Caching issues on sumo production → HTTP Caching issues on sumo production
Zeus sets that header, and it obeys whatever Cache-Control headers are sent by the servers (up to a cap... if you set a 1-hour max-age, it will only cache for 10 minutes).

In addition, if no cache headers are sent at all, it will cache for its default timeout, which is 10 minutes. For this reason alone it's a good idea to send something all the time.
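To make the two cases above concrete, here is a rough model in Python. It is only a sketch of the behaviour described in this comment, not Zeus configuration or code; the function name `effective_ttl` is made up for illustration, and headers without a max-age directive are deliberately left unmodelled.

```python
import re

# Cap and default described in the comment above: both are 10 minutes.
ZEUS_TTL_CAP_SECONDS = 600


def effective_ttl(cache_control):
    """Seconds a response would stay cached, per the two cases above.

    Returns None for headers this sketch does not model
    (anything without a max-age directive).
    """
    if cache_control is None:
        # No cache headers at all: the default timeout applies.
        return ZEUS_TTL_CAP_SECONDS
    match = re.search(r"max-age=(\d+)", cache_control)
    if match is None:
        return None
    # The server's max-age is obeyed, but only up to the cap.
    return min(int(match.group(1)), ZEUS_TTL_CAP_SECONDS)


print(effective_ttl(None))            # 600
print(effective_ttl("max-age=3600"))  # 600
print(effective_ttl("max-age=60"))    # 60
```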


Double-check the cache headers direct from the servers... something like this, if you can:

curl -v -H 'Host: support.mozilla.org' http://support1.webapp.phx1.mozilla.com:81/page-to-check

When I do this with the page you linked, I get no cache headers at all. That explains why Zeus will cache for a short time.
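One way to make sure something is always sent is a small WSGI wrapper that adds a Cache-Control header whenever the app did not set one. This is a hedged sketch, not what SUMO runs: `ensure_cache_control` is a hypothetical name, and the `no-cache` default is a placeholder to be replaced with whatever policy the app actually wants.

```python
def ensure_cache_control(app, default="no-cache"):
    """Wrap a WSGI app so every response carries a Cache-Control header."""

    def wrapped(environ, start_response):
        def patched_start(status, headers, exc_info=None):
            # Header names are case-insensitive, so compare in lowercase.
            has_header = any(name.lower() == "cache-control" for name, _ in headers)
            if not has_header:
                headers = list(headers) + [("Cache-Control", default)]
            return start_response(status, headers, exc_info)

        return app(environ, patched_start)

    return wrapped
```

Responses that already set Cache-Control pass through unchanged, so views that pick their own policy keep it.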


It's worth noting that this Zeus config isn't new... it's been like this for a long time. Perhaps something in the app (or in Apache's config) has changed recently?
Assignee: server-ops-webops → nmaul
Nothing related to this has changed in the app recently. Yesterday we started having Apache restart issues because mod_wsgi was upgraded by Puppet. So I am guessing there may have been related changes to the Apache config.
I don't see anything in the WebOps-managed apache config... nothing substantial since at least Feb 7, when the PyOpenSSL work was done.

The work yesterday resulted in exactly 1 line being changed, and it was just re-enabling the mod_wsgi module.
There seems to have been some environmental change. If I look at the WSGI request info on stage I see:
...
META:{'DOCUMENT_ROOT': '/data/www/support.allizom.org/kitsune/webroot',
 'GATEWAY_INTERFACE': 'CGI/1.1',
 'HTTPS': 'on',
 'HTTP_ACCEPT': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8', ...

In prod that is missing:
...
META:{'DOCUMENT_ROOT': '/data/www/support.mozilla.org/kitsune/webroot',
 'GATEWAY_INTERFACE': 'CGI/1.1',
 'HTTP_ACCEPT': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',...

I am guessing that changed in the past few days. Why is it different on stage vs prod (missing HTTPS=on)?
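The comparison above can also be done mechanically. A minimal sketch, using only the keys visible in the two truncated dumps (the real META dicts have many more entries); `missing_keys` is a hypothetical helper name:

```python
def missing_keys(reference, candidate):
    """Return the keys present in `reference` but absent from `candidate`."""
    return sorted(set(reference) - set(candidate))


# Only the entries shown in the dumps above; everything else is elided.
stage_meta = {
    "DOCUMENT_ROOT": "/data/www/support.allizom.org/kitsune/webroot",
    "GATEWAY_INTERFACE": "CGI/1.1",
    "HTTPS": "on",
    "HTTP_ACCEPT": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
}
prod_meta = {
    "DOCUMENT_ROOT": "/data/www/support.mozilla.org/kitsune/webroot",
    "GATEWAY_INTERFACE": "CGI/1.1",
    "HTTP_ACCEPT": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
}

print(missing_keys(stage_meta, prod_meta))  # ['HTTPS']
```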

We were depending on that for setting the cache control headers. I fixed our code to use django's `request.is_secure()` which I *thought* was working for us. AFAICT, it is checking `os.environ.get("HTTPS") == "on"` in our case:
https://github.com/django/django/blob/master/django/http/request.py#L117

Can you confirm that the HTTPS environment variable is being set in prod for the wsgi processes that handle https?
I just confirmed that the Apache environment variable is set. From the Apache config:

  SetEnv HTTPS on


I just reviewed the `svn log` for this Apache config and it has /not/ been updated recently.
OK, I've fixed the issue in Bug 853904. Instead of checking `request.META['HTTPS'] != 'off'`, we are now checking `os.environ.get("HTTPS") == "on"` to determine whether a request is over HTTPS.

The only explanation I have is that something changed recently to make our `request.META['HTTPS'] != 'off'` check fail. It still works on stage though.
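For reference, a check that does not depend on the HTTPS key being passed through at all could read `wsgi.url_scheme`, which PEP 3333 requires every WSGI server to set (and which Django exposes through `request.META`, since META is the WSGI environ). This is a sketch, not the fix that landed in Bug 853904; `request_is_https` is a hypothetical name, and the fallback values for the HTTPS key are an assumption.

```python
def request_is_https(environ):
    """Decide whether a WSGI request arrived over HTTPS."""
    scheme = environ.get("wsgi.url_scheme")
    if scheme is not None:
        # Standard WSGI key; trust it when the server provides it.
        return scheme == "https"
    # Fallback for environments that only expose the CGI-style HTTPS key.
    # Accepting "on" and "1" here is an assumption of this sketch.
    return environ.get("HTTPS", "").lower() in ("on", "1")


print(request_is_https({"wsgi.url_scheme": "https"}))  # True
print(request_is_https({"wsgi.url_scheme": "http"}))   # False
print(request_is_https({"HTTPS": "on"}))               # True
print(request_is_https({}))                            # False
```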

I think we can close this as WFM?
It really looks like the mod_wsgi upgrade did it:
http://code.google.com/p/modwsgi/wiki/ChangesInVersion0304#Features_Changed

"Note that you can still set HTTPS in Apache configuration using the SetEnv or SetEnvIf directive, or via a rewrite rule. In that case, that will override what wsgi.url_scheme is set to and once wsgi.url_scheme is set appropriately, the HTTPS variable will be removed from the set of variables passed through to the WSGI environment. "


Just curious: why aren't we on the same version on stage and dev?
dev/stage/prod are all the same version of mod_wsgi:

[support1.webapp.phx1.mozilla.com] out: mod_wsgi-3.4-1.el6.rfx.x86_64
[support2.webapp.phx1.mozilla.com] out: mod_wsgi-3.4-1.el6.rfx.x86_64
[support3.webapp.phx1.mozilla.com] out: mod_wsgi-3.4-1.el6.rfx.x86_64
[support4.webapp.phx1.mozilla.com] out: mod_wsgi-3.4-1.el6.rfx.x86_64
[support5.webapp.phx1.mozilla.com] out: mod_wsgi-3.4-1.el6.rfx.x86_64
[support1.stage.webapp.phx1.mozilla.com] out: mod_wsgi-3.4-1.el6.rfx.x86_64
[support1.dev.webapp.phx1.mozilla.com] out: mod_wsgi-3.4-1.el6.rfx.x86_64

This seems almost certain to just be a settings drift/mismatch between dev/stage/prod, where some of them had "SetEnv HTTPS on" and others didn't.


Good catch on this difference, and good find on the modwsgi Changelog info!


Marking this as R/F instead of WFM, since we did wind up making a change on our side (and you did in the code too).
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Awesome! Thanks jakem and cturra!
Status: RESOLVED → VERIFIED
Component: Server Operations: Web Operations → WebOps: Other
Product: mozilla.org → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard