Closed
Bug 1468414
Opened 7 years ago
Closed 7 years ago
MDN CDN serving stale data
Categories
(developer.mozilla.org Graveyard :: General, enhancement, P1)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: dil, Assigned: rjohnson)
References
Details
(Keywords: in-triage, Whiteboard: [specification][type:bug])
What did you do?
================
1. Created new MDN account.
2. Made doc change.
3. Saw change on page.
4. Logged out.
5. Reloaded doc and history pages many times.
6. Tried to log in again many times.
What happened?
==============
- Old doc and history page content was loaded.
- Login attempts almost always failed, with rare successes.
What should have happened?
==========================
- New content should have been loaded.
- Login attempts should always have worked.
Is there anything else we should know?
======================================
- This was for a minor change here: https://developer.mozilla.org/en-US/docs/Web/API/DOMException
- This seems to be a CDN cache and/or DNS issue.
- Chromium would (almost always) load old pages.
- Firefox and curl would load the newer page.
- Chromium page loads using one IP would have old data, yet fetch() in the console would use a different IP and get correct results.
- After some exploration with curl, I could emulate the browser failure like this:
curl -v --cookie "dwf_sg_task_completion=False" "https://developer.mozilla.org/en-US/docs/Web/API/DOMException" --resolve "developer.mozilla.org:443:52.85.233.175" -H "Accept-Language: en-US,en;q=0.9" -H "Accept-Encoding: gzip, deflate, br" --output - | gzip -d - | less
- The stale response contains the old "NETWORK_ER</code><code>R" string, whereas the updated page contains "NETWORK_ERR".
- The IPs that the browser and fetch() use change over time.
- Using no headers seems to always(?) return the correct data.
- The headers above are a subset of what the browser sends; they seem to always(?) cause old data to be returned.
- In case it's a Vary header issue or similar, here's a bit of curl I/O:
Connected to developer.mozilla.org (52.85.131.130) port 443 (#0)
> GET /en-US/docs/Web/API/DOMException HTTP/2
> Host: developer.mozilla.org
> User-Agent: curl/7.60.0
> Accept: */*
> Cookie: dwf_sg_task_completion=False
> Accept-Language: en-US,en;q=0.9
> Accept-Encoding: gzip, deflate, br
>
< HTTP/2 200
< content-type: text/html; charset=utf-8
< access-control-allow-origin: *
< cache-control: s-maxage=86400, public, max-age=0
< content-language: en-US
< date: Tue, 12 Jun 2018 11:55:47 GMT
< server: meinheld/0.6.1
< set-cookie: dwf_sg_task_completion=False; expires=Thu, 12-Jul-2018 11:55:47 GMT; Max-Age=2592000; Path=/; secure
< strict-transport-security: max-age=63072000
< x-content-type-options: nosniff
< x-frame-options: DENY
< x-kuma-revision: 1321868
< x-xss-protection: 1; mode=block
< content-encoding: gzip
< vary: Accept-Encoding,Cookie
< age: 38536
< x-cache: Hit from cloudfront
< via: 1.1 a9ced60f02a91a154a8631077a254a91.cloudfront.net (CloudFront)
< x-amz-cf-id: tAkFoXqICmpIWxO11_QfrrJhShh5ts-dfQpyWF7PqvnE0853Rmpw3w==
<
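To put numbers on the response above, here is a minimal Python sketch that reads the `< name: value` lines from `curl -v` output and compares `age` against `s-maxage`. The function names are made up for illustration, and the sample is trimmed to the headers that matter. Repeated headers such as set-cookie would overwrite each other in this simple dict.

```python
import re

def parse_curl_headers(verbose_output: str) -> dict:
    """Collect response headers from `curl -v` lines that start with '< '."""
    headers = {}
    for line in verbose_output.splitlines():
        # Skip the status line ("< HTTP/2 200") and the bare "<" terminator.
        if not line.startswith("< ") or ":" not in line:
            continue
        name, _, value = line[2:].partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers

def remaining_shared_freshness(headers: dict):
    """Seconds left before a shared cache treats the copy as stale.

    Returns None when the response carries no s-maxage directive.
    """
    match = re.search(r"s-maxage=(\d+)", headers.get("cache-control", ""))
    if match is None:
        return None
    age = int(headers.get("age", "0"))
    return int(match.group(1)) - age

# Trimmed copy of the response headers shown above.
sample = """\
< HTTP/2 200
< cache-control: s-maxage=86400, public, max-age=0
< age: 38536
< x-cache: Hit from cloudfront
<
"""

headers = parse_curl_headers(sample)
remaining = remaining_shared_freshness(headers)
print(remaining)  # 47864
```

So the cached copy was about 10.7 hours old and, going by these headers alone, could be served from the CDN for roughly another 13.3 hours.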
Reporter
Comment 1•7 years ago
Here are some Chromium network panel "Copy as cURL" commands that compare fetch() (new content) with a browser page load (old content). They show that, given what the CDN holds right now, the request headers alone make the difference. DNS picked 52.85.131.130 for both of these tests; that address could be forced with the --resolve option. I'm told there is a CDN cache experiment going on, which may explain this for regular edited pages. Getting stale data depending on headers could be annoying, but the login failure issue is more serious and probably related.
Loads new content (based on fetch()):
curl 'https://developer.mozilla.org/en-US/docs/Web/API/DOMException' -H 'dnt: 1' -H 'accept-encoding: gzip, deflate, br' -H 'accept-language: en-US,en;q=0.9' -H 'user-agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.181 Safari/537.36' -H 'accept: */*' -H 'referer: https://developer.mozilla.org/en-US/docs/Web/API/DOMException' -H 'authority: developer.mozilla.org' --compressed
Loads old content (based on page load):
curl 'https://developer.mozilla.org/en-US/docs/Web/API/DOMException' -H 'authority: developer.mozilla.org' -H 'cache-control: max-age=0' -H 'upgrade-insecure-requests: 1' -H 'user-agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.181 Safari/537.36' -H 'accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8' -H 'referer: https://developer.mozilla.org/en-US/docs/Web/API/DOMException' -H 'accept-encoding: gzip, deflate, br' -H 'accept-language: en-US,en;q=0.9' -H 'cookie: dwf_sg_task_completion=False' -H 'dnt: 1' --compressed
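One way to read the difference between these two requests is through the `vary: Accept-Encoding,Cookie` response header from the description. The Python sketch below is only an illustration of what Vary alone implies; `variant_key` is a hypothetical helper, and the report does not show which headers the CDN distribution actually uses for its cache key, so this is an assumption rather than a confirmed cause.

```python
def variant_key(vary: str, request_headers: dict) -> tuple:
    """Build a cache-variant key from the request headers named in Vary."""
    lowered = {k.lower(): v for k, v in request_headers.items()}
    names = sorted(n.strip().lower() for n in vary.split(",") if n.strip())
    # A header that is absent from the request contributes None to the key.
    return tuple((name, lowered.get(name)) for name in names)

VARY = "Accept-Encoding,Cookie"  # taken from the response in the description

# Headers from the fetch()-based command (no cookie was sent).
fetch_request = {
    "accept-encoding": "gzip, deflate, br",
    "accept": "*/*",
}

# Headers from the page-load command (the cookie is present).
page_load_request = {
    "accept-encoding": "gzip, deflate, br",
    "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,"
              "image/webp,image/apng,*/*;q=0.8",
    "cookie": "dwf_sg_task_completion=False",
}

same_variant = variant_key(VARY, fetch_request) == variant_key(VARY, page_load_request)
print(same_variant)  # False
```

Under that assumption, the two requests would be stored as separate variants, so one could be refreshed while the other still holds the older page.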
Reporter
Comment 2•7 years ago
Login issue appears fixed via:
https://github.com/mdn/infra/pull/5
I'm unsure if the other caching behavior is desired or a problem. Feel free to close this bug as you all see fit.
Comment 3•7 years ago
We are experimenting with a longer cache time for pages (24 hours, up from 5 minutes) for a few days. It appears that this longer cache time exposed a bug in caching for the login URLs, where the same request was replayed to GitHub, and GitHub correctly rejected it. We've adjusted the caching behavior, and login / logout should now work.
The purpose of the experiment is to measure end-user performance gains from a longer cache time (bug 1431259). The theory is that a longer cache time will mean that more users get (slightly stale) content from an edge CDN, rather than very fresh content that is served from one region, and that we'll see an aggregate improvement in page load time. If the improvement is significant, we can justify spending time on cache invalidation to support longer cache times. If the improvement isn't large, then we will look elsewhere to speed up the site. In either case, we will probably return to shorter cache timeouts in a few days.
The key to determining whether the CDN is serving the request from cache is this header:
x-cache: Hit from cloudfront
Other headers, such as the via: header, give more detail about which CDN server handled the request. This is probably the source of your variation between browsers. Different CDN servers might have different copies of the page, some from before your change, some from after. CDNs are a bit of a black box.
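As a rough illustration of that check, the Python sketch below inspects the two headers mentioned here. `describe_cdn_response` is a hypothetical helper, and it only treats x-cache values that begin with "Hit" as cache hits, which is an assumption about how the value should be read.

```python
def describe_cdn_response(headers: dict) -> dict:
    """Report whether a response was a cache hit and which hosts relayed it."""
    lowered = {k.lower(): v for k, v in headers.items()}
    x_cache = lowered.get("x-cache", "")
    via = lowered.get("via", "")
    # Each Via entry looks like "<protocol> <received-by> (<comment>)".
    edge_hosts = []
    for entry in via.split(","):
        parts = entry.split()
        if len(parts) >= 2:
            edge_hosts.append(parts[1])
    return {
        "served_from_cache": x_cache.lower().startswith("hit"),
        "edge_hosts": edge_hosts,
    }

# Header values copied from the curl output in the description.
sample = {
    "x-cache": "Hit from cloudfront",
    "via": "1.1 a9ced60f02a91a154a8631077a254a91.cloudfront.net (CloudFront)",
}

result = describe_cdn_response(sample)
print(result)
```

Running the same check against responses from several resolved IPs would show whether the stale and fresh copies come from different edge hosts.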
We debated whether we should announce this experiment with a banner. I argued against it, because I was unsure if any users would notice the change (logged-in users should see their own edits). I was also curious what the effect would be of some users getting older versions of pages for a while. I'm sorry you were impacted by this change, and thank you for digging in and analyzing the problem.
I believe rjohnson fixed the login / logout issue with https://github.com/mdn/infra/pull/5. I'll leave this open until the experiment is done.
Comment 4•7 years ago
Commits pushed to master at https://github.com/mozilla/kuma
https://github.com/mozilla/kuma/commit/c37aa79409661133beb2e683067b03392e55f817
bug 1468414: update cdn tests to reflect new cdn config
https://github.com/mozilla/kuma/commit/f341ef7ea4b02d25547bb22953814624a4eb91d8
Merge pull request #4857 from escattone/users-signin-headless-tests-1468414
bug 1468414: update cdn tests to reflect new cdn config
Assignee
Comment 6•7 years ago
The initial experiment of using a cache timeout of 24 hours was stopped early last week, and the login issue was fixed by https://github.com/mdn/infra/pull/5. Resolving, since there are no further issues to address.
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Updated•5 years ago
Product: developer.mozilla.org → developer.mozilla.org Graveyard