Closed Bug 559337 Opened 11 years ago Closed 9 years ago

MDC should use static pages to deliver normal page views (pre-cache)

Categories

(developer.mozilla.org Graveyard :: General, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: BenB, Unassigned)

Details

Split-off from Bug 550582 comment 7.

MDC is basically just a website delivering static pages. It's a Wiki, but that's merely the way we chose to do the content management. The vast majority of hits and usage is just looking at static pages, and that's also the part which is critical for our work (looking up documentation) and which blocks our work as developers when it's slow or down.

Therefore, I propose:
- Whenever somebody edits a page, generate a page (*exactly* as you would deliver it to the browser) and save it to the hard drive of the server.
- When somebody requests a page, merely viewing the latest version of the page, you deliver the static, cached page from the hard disk. There is zero processing (Apache delivering static files). You can implement that (View page -> static files) with simple URL rewriting, at the webserver level, before the request even touches the Wiki application.
- When somebody does Edit, View History, or View older page version, you (don't do any URL rewriting and) continue to use the Wiki app as now.
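
To make the split above concrete, here is a minimal sketch of the routing decision in Python. It is only an illustration, not how MDC or its webserver is configured: the query parameter names (`action`, `revision`), the cache directory, and the function names are all assumptions made up for this example. In practice the same decision would live in the webserver's URL rewriting rules.

```python
from pathlib import PurePosixPath
from urllib.parse import parse_qs, urlsplit

# Hypothetical: query parameters that mean "not a plain view of the latest version".
DYNAMIC_PARAMS = {"action", "revision"}
# Hypothetical directory where pages are written whenever an edit is saved.
CACHE_ROOT = PurePosixPath("/var/cache/mdc-static")


def route_request(url: str) -> str:
    """Return 'static' for plain page views and 'wiki' for everything else."""
    parts = urlsplit(url)
    params = parse_qs(parts.query, keep_blank_values=True)
    if DYNAMIC_PARAMS & params.keys():
        return "wiki"  # Edit, View History, older versions: use the Wiki app
    if ".." in parts.path.split("/"):
        return "wiki"  # never map suspicious paths onto the file system
    return "static"


def cache_file_for(url: str) -> PurePosixPath:
    """Map a page URL to the file that was generated when the page was saved."""
    path = urlsplit(url).path.strip("/") or "index"
    return CACHE_ROOT / (path + ".html")
```

Anything the sketch is unsure about falls through to the Wiki app, so a wrong guess costs speed rather than correctness.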

It should be possible to implement that
- without any changes to the Wiki app code
- within a few hours
- without any extra processing cost, given that every saved edit goes back
  to view page anyway.
- with reasonable storage use

Advantages:
- *much* faster. Currently, a simple View page often takes 20s, at best
  3-4s. This is annoying during everyday development.
  When delivering static pages, you should go down to 200ms.
- *much* less server load, as there's no processing at all for the majority
  of hits.
- more reliable: When the Wiki software is down (as right now), the static
  pages can still be delivered. Currently, developers on #maildev complain
  that they can't work.
Summary: MDC should use static pages to deliver normal page views → MDC should use static pages to deliver normal page views (pre-cache)
The problem with this is that many pages use transclusion -- that is, they embed other pages -- or they use templates. Detecting when these transclusions or templates are edited would make this a good bit harder.

The pages aren't as static as you think they are. :)
> they embed other pages -- or they use templates

That's not a problem, because you cache the pages exactly as they would be delivered to the browser. You essentially do wget/curl to create the cache.

If you worry about cache being outdated due to templates being changed:
1) I think that's a minor problem, because I'd expect templates and embedded pages not to change that often. You could, for example, refresh all cached pages every night.
2) If my assumption is wrong and embedded pages change often and always need to be up to date, you'd need to build a trigger into the wiki software that notifies you whenever an embedded page changes, with a list of all pages that include it, so that you can regenerate them. That would be a single call, so that's doable as well.

I claim that the current situation (bug 550582) is so bad that we can live with 1) until 2) is implemented. Slightly outdated pages are better than no response, or having to wait 20s for every click.
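
Option 2) boils down to a reverse index from each embedded page or template to the pages that include it. A minimal sketch in Python, assuming the wiki software can report which pages a given page includes; the class and method names are hypothetical and not part of any wiki API:

```python
from collections import defaultdict, deque


class IncludeIndex:
    """Reverse index: embedded page or template -> pages that include it."""

    def __init__(self):
        self._included_by = defaultdict(set)

    def record(self, page, includes):
        """Note that `page` embeds each name in `includes`.

        Simplification: stale entries are not removed when a page stops
        including something; a real index would have to handle that.
        """
        for included in includes:
            self._included_by[included].add(page)

    def pages_to_regenerate(self, edited):
        """Return the edited page plus every page that embeds it,
        directly or through other embedded pages."""
        seen = {edited}
        queue = deque([edited])
        while queue:
            current = queue.popleft()
            for parent in self._included_by.get(current, ()):
                if parent not in seen:
                    seen.add(parent)
                    queue.append(parent)
        return seen
```

The breadth-first walk covers nested embedding (a template used by a page that is itself embedded elsewhere), and the `seen` set keeps it from looping if two pages include each other.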
How about we try to actually fix the site instead of applying hacks to work around the problem? That's what we're busy trying to do right now. I'd rather put the time into that than this workaround.
Even with the site software "working", it's too slow. It's not a hack at all.
I agree with Ben -- we have never seen MDC as fast as it would be with proper use of a cache (though invalidation when templates are changed should not be difficult -- the system can track backlinks, or just invalidate completely when templates are changed), and I doubt we will even once the current outstanding issues with the performance of the back end are resolved.

But we don't need to pre-cache for this to work.  Having it actually generate the page on first access would be fine, especially since the load on the backing system would be _dramatically_ reduced by the amount of traffic serviced by the cache.  We just need it to be set up to cache better (including the skin resources!) and tell the cache when to invalidate.  This is a well-understood problem, and we can copy the pattern from mediawiki if we need to.
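
The generate-on-first-access variant can be sketched as a read-through cache with explicit invalidation. This is an in-memory Python illustration under stated assumptions, not the way Squid or mediawiki implement it; `render` stands in for the call to the backing Wiki application, and all names are hypothetical:

```python
from typing import Callable, Dict


class PageCache:
    """Render a page on first access, then serve the stored copy."""

    def __init__(self, render: Callable[[str], str]):
        self._render = render  # hypothetical hook into the Wiki app
        self._pages: Dict[str, str] = {}

    def get(self, path: str) -> str:
        """Return the cached page, rendering it only on a cache miss."""
        if path not in self._pages:
            self._pages[path] = self._render(path)
        return self._pages[path]

    def invalidate(self, path: str) -> None:
        """Drop one page, e.g. after it has been edited."""
        self._pages.pop(path, None)

    def invalidate_all(self) -> None:
        """Drop everything, e.g. after a template has been changed."""
        self._pages.clear()
```

Only the first request after an invalidation reaches the back end; every later request for the same page is answered from the cache.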
For example Squid "Reverse Proxy"
<http://wiki.squid-cache.org/SquidFaq/ReverseProxy>. Shouldn't be hard, and can be a separate box, even. (In fact, you could have another admin set it up and not divert sheppy's time from fixing the current problem. It also implies that you need no changes to the software.)

I would still go with pre-caching, but if you prefer a normal cache, fine. Just be sure that *everything* is really cached, as shaver said.
As for HTTPS, you could do that on the proxy or on an SSL accelerator box before that, not on the MindSource box, so that wouldn't have to be a problem.
I did this "pre-caching" myself a while ago, out of pure need, because MDC was unusable.
http://mdn.beonex.com/
It was intended as a short-term workaround, in case you need to read the docs but can't reach MDC, but it turned out to be useful.
Limitations:
- Problems with XUL:Foo vs. XUL/Foo and nsIFoo (just need a URL rewrite rule)
- "dir.1" URLs
- Only English
- No edit, history etc.
- Not updated

All of these limitations could be easily fixed, if this was to be implemented on mozilla.org servers with cooperation of the master server.

I would expect this to take 1-2 days, with the suggestions above. (The above mirror took me only about 2 hours of work, including server setup, wget, URL adjustment and URL rewriting etc., and I am not even a sys admin.)
Component: Deki Infrastructure → Other
Saw the word transclusion in here and just couldn't resist...

Pretty sure the caching we use on Kuma covers all of the use cases mentioned in comment 0. Luke: Please reopen if I am wrong.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
We don't generate static .html files, but we have much improved caching:

* All static assets are served from a CDN
* We have a proper cache system in place
* We cache the post-template rendered HTML for wiki pages to avoid template renders each time

Altogether I think we've cut at least 70% off the response and render times.
Product: developer.mozilla.org → developer.mozilla.org Graveyard