Closed Bug 766252 Opened 13 years ago Closed 13 years ago

Kumascript: Generate content for complex pages using an offline queue

Categories

(developer.mozilla.org Graveyard :: Editing, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: lorchard, Assigned: lorchard)

References

Details

(Whiteboard: u=admin c=wiki s=2012-07-03 p=3 t=2012-07-09)

There are complex pages on MDN that consist of dozens of includes from other pages. In particular, the XUL docs contain many examples of this:

https://developer-new.mozilla.org/en-US/docs/XUL/textbox

If the kumascript-rendered content for this page is not cached, or if the cache has gone stale, regenerating the page can take 30 seconds or more. And even worse, additional requests for the same page can come in and kick off the same process over and over in parallel.

Speeding up this process is a long-haul effort, and not likely to be a quick win. So, I think the best solution for now is something like this:

* On request to view a page processed by kumascript, look for cached content.
* If Cache-Control: max-age=0 or no-cache was sent, and the user is logged in, delete cached content (if any).
* If cached content was found, serve the cached content immediately.
* If cached content was NOT found:
  * Serve up a page saying something like "Rebuild of this page in progress, please wait. {spinner}" with a 5 second auto-refresh
  * Meanwhile, check to see if there is already a regeneration task for this page queued. If not, queue it. (This check could even look for a mutex flag in cache, if the queue processor doesn't have an easy way to check)

A queued regeneration task will do the usual things necessary for processing page content through kumascript, including pushing the results into cache. Then, after the task completes, the view logic outlined above will naturally refresh into serving the newly-cached content.

Now here's the bad news: We don't have an offline queue for MDN/Kuma yet. Mozilla webdev state-of-art seems to use a Celery queue. I think we might even have most of the parts in place to set Celery up. We just need to do it, ideally on a Vagrant VM first and then through IT bugs to get it going on real hosts.

I think this might be a big enough problem to make it a blocker for July launch.
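The view logic proposed in this comment can be sketched roughly as follows. This is only an illustration, not Kuma code: the dict-based `cache`, the `task_queue` deque, the key prefixes, and the function names are all hypothetical stand-ins for a real cache backend and a real offline queue such as Celery. Sending the 5 second auto-refresh as a `Refresh` response header is also an assumption; a meta refresh tag in the page would work as well.

```python
from collections import deque

# Hypothetical in-memory stand-ins for the real cache and offline queue.
cache = {}            # cache key -> cached value
task_queue = deque()  # page paths waiting to be regenerated

INTERSTITIAL = "Rebuild of this page in progress, please wait."
REFRESH_SECONDS = 5


def view_page(path, cache_control="", logged_in=False):
    """Return (body, headers) for a page request, following the proposed steps."""
    content_key = "content:" + path
    mutex_key = "rebuilding:" + path

    # A logged-in user sending max-age=0 or no-cache discards cached content.
    directives = {d.strip().lower() for d in cache_control.split(",")}
    if logged_in and directives & {"max-age=0", "no-cache"}:
        cache.pop(content_key, None)

    # Cached content is served immediately.
    if content_key in cache:
        return cache[content_key], {}

    # Queue a regeneration task only if one is not already pending.
    # The mutex flag lives in the cache, as suggested in the comment.
    if mutex_key not in cache:
        cache[mutex_key] = True
        task_queue.append(path)

    return INTERSTITIAL, {"Refresh": str(REFRESH_SECONDS)}


def run_next_task(render):
    """Process one queued task: render, cache the result, release the mutex."""
    path = task_queue.popleft()
    cache["content:" + path] = render(path)
    cache.pop("rebuilding:" + path, None)
```

With this shape, repeated requests for an uncached page enqueue only one task, and the first request after `run_next_task` completes gets the rendered content. A real deployment would need an atomic "add if absent" operation for the mutex flag, which a plain dict check does not provide across processes.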
Depends on: 766256
Notes to self:

* Constance config for # secs in "rebuilding plz wait" page refresh? Or, make it self-increment for back-off? (eg. 5 sec, 10 sec, 15 sec, etc)
* Constance boolean to enable/disable regeneration queue? Handy to turn it off sometimes in dev / local.
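For the self-incrementing back-off idea, a minimal sketch might look like the following. The function name, the `step` default, and the 60 second cap are assumptions for illustration; in practice the values would presumably come from Constance settings.

```python
def refresh_delay(attempt, step=5, maximum=60):
    """Return seconds to wait before auto-refresh number `attempt` (1-based).

    Grows linearly (5, 10, 15, ...) and is capped at `maximum` so the
    interstitial never waits arbitrarily long between refreshes.
    """
    return min(step * max(attempt, 1), maximum)
```

The interstitial page would need to carry the attempt count between refreshes, for example in a query string parameter, so that each reload can ask for the next delay.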
Not sure if I fully understand this, but could we just make page regeneration more invisible to the user?

For example, if a user visits a page while it is regenerating, an old version is shown instead. When regeneration completes, an unobtrusive banner (like this: http://superdit.com/wp-content/uploads/2011/04/twitter_com.png) appears to let them know that a new version of the page is available.

Something similar could be done when a person is actually editing the page like this. The "Save" and "Preview" buttons could be replaced with a button called "Generate preview and keep editing". After hitting that button, a preview is generated in the background while the user continues to edit. When the preview becomes available, the banner appears again to let them know. The editor still would not get immediate previews, but they also would not be blocked while the page regenerates.

I might be completely off track here. Let me know if I am.
Quick correction: The "Save" button would not need to disappear, just the "Preview" button. And, of course, this would only be necessary for pages that have a large number of includes.
(In reply to John Karahalis [:openjck] from comment #2)
> Not sure if I fully understand this, but could we just make page
> regeneration more invisible to the user?
>
> For example, if a user visits a page while it is regenerating, an old
> version is shown instead.

That's the problem: In a lot of cases, there *is* no old version to show. The cache has gone stale or been evicted from cache memory due to infrequent access; or the cache has been invalidated on purpose by an edit or a reload / shift-reload.

> When regeneration completes, an unobtrusive banner
> (like this: http://superdit.com/wp-content/uploads/2011/04/twitter_com.png)
> appears to let them know that a new version of the page is available.

That would be the purpose of the "Page being regenerated" interstitial and its auto-refresh. Since it's likely there's no content to show, the interstitial would indicate that a rebuild is happening. Only, instead of requiring a manual refresh, it would just happen when the page is ready.

> Something similar could be done when a person is actually editing the page
> like this. The "Save" and "Preview" buttons could be replaced with a button
> called "Generate preview and keep editing". After hitting that button, a
> preview is generated in the background while the user continues to edit.
> When the preview becomes available, the banner appears again to let them
> know. The editor still would not get immediate previews, but they also would
> not be blocked while the page regenerates.
>
> I might be completely off track here. Let me know if I am.

The problem is that, without an offline queue, there is no "in the background". It's all responses to on-demand web requests, whether they're made visible to the user or not. And, like I said in the "even worse" part - there's nothing to stop multiple people from requesting the same page and triggering parallel rebuilds to drag down the servers.
(In reply to John Karahalis [:openjck] from comment #3)
> Quick correction: The "Save" button would not need to disappear, just the
> "Preview" button. And, of course, this would only be necessary for pages
> that have a large number of includes.

At present, there's really no way to detect whether a page has a large number of includes - or any other condition that causes a slow load (eg. querying bugzilla, etc).
Whiteboard: u=admin c=wiki s=2012-07-03 p=
Depends on: 766627
All the above said: if there *is* stale cache available, try to display it along with the auto-refresh while waiting for rebuild.

Another concern along those lines: We don't ever want the "please wait page rebuilding" content to get picked up by the googlebot and indexed. I wonder if there's a way to prevent that, short of forcing an on-demand page build for search engine crawlers? (That sounds like trouble)
(In reply to Les Orchard [:lorchard] from comment #6)
> Another concern along those lines: We don't ever want the "please wait page
> rebuilding" content to get picked up by the googlebot and indexed. I wonder
> if there's a way to prevent that, short of forcing an on-demand page build
> for search engine crawlers? (That sounds like trouble)

Here's a lead: Maybe we can serve up a 503 Service Temporarily Unavailable and a Retry-After: 600 header, when a page is being rebuilt. Just to bots, never to humans. That appears to tell googlebot to come back later.

http://googlewebmastercentral.blogspot.com/2006/08/all-about-googlebot.html
http://support.google.com/webmasters/bin/answer.py?hl=en&answer=40132&from=83040&rd=1

Another question after that would be if a 503 makes the googlebot run from the whole site, or just that one page.
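A rough sketch of the "503 to bots, interstitial to humans" idea is below. The `BOT_MARKERS` tuple and the function name are hypothetical, and matching on User-Agent substrings is only one possible way to detect crawlers; the list is not exhaustive.

```python
# Hypothetical, non-exhaustive list of crawler User-Agent substrings.
BOT_MARKERS = ("googlebot", "bingbot")

INTERSTITIAL = "Rebuild of this page in progress, please wait."


def rebuilding_response(user_agent):
    """Return (status, headers, body) to send while a page is being rebuilt."""
    if any(marker in user_agent.lower() for marker in BOT_MARKERS):
        # Ask crawlers to come back in 10 minutes instead of indexing
        # the interstitial.
        return 503, {"Retry-After": "600"}, ""
    # Humans get the interstitial with its 5 second auto-refresh.
    return 200, {"Refresh": "5"}, INTERSTITIAL
```

This does not answer the open question of whether a 503 on one page affects how the crawler treats the rest of the site.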
Whiteboard: u=admin c=wiki s=2012-07-03 p= → u=admin c=wiki s=2012-07-03 p=3
Another mental note: Kick off a page rebuild after document save. I think Kitsune-heritage code already kind of does this with schedule_rebuild_kb(), but that in particular doesn't quite work for this.
Assignee: nobody → lorchard
Blocks: 765649
Commits pushed to master at https://github.com/mozilla/kuma

https://github.com/mozilla/kuma/commit/fd34084ca0be3503b765185957eff1491de80902
bug 766252: Deferred and singleton page rendering

* Documents can be rendered on demand or offloaded into the Celery queue
* Documents with `defer_rendering == True` are rendered with Celery queue tasks. Otherwise, rendering is attempted on demand during the request/response cycle.
* If a Document takes too long to render, `defer_rendering` is automatically set to True. This time limit is controlled by KUMA_DOCUMENT_FORCE_DEFERRED_TIMEOUT in Constance settings
* Allow only one rendering per Document at any given time. Additional attempts to render a Document while one is already in progress result in a warning message and/or stale content while the rendering finishes.
* Rendered content is now kept in the DB, rather than temporary cache
* Lots of additional Document fields to track rendering state, store rendered content, and control deferred rendering.
* Reload no longer triggers a render of the page. However, editing and saving a Document does.
* Shift-Reload by a logged-in user schedules a fresh render, assuming there's not one already in progress.
* Display warning messages about the render status of a page. But, only for logged-in users to keep search engine crawlers from indexing them.
* Display detailed Kumascript errors only to logged in users, also to avoid search indexing.
* Document rendering via kumascript refactored out of views.py and into models.py and kumascript.py
* Disable schedule_rebuild_kb and _maybe_schedule_rebuild in favor of new Document rendering system.
* Tweaks to Document admin to allow mass enable/disable of deferred rendering, link to public URL of Documents.
* Reenable celery queue in settings_local, tweaks to failing tests
* manage.py command for manually triggering or queuing a page render

https://github.com/mozilla/kuma/commit/7219dc97c97544418397292e5a98cf5e203dbd13
Merge pull request #318 from lmorchard/page-rebuild-queue-task-766252
bug 766252: Deferred and singleton page rendering
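The two central behaviours in the commit message, singleton rendering and automatic deferral of slow Documents, can be illustrated with the sketch below. It is not the code from the commit: the `Document` dataclass, the `rendered_html` and `render_in_progress` field names, the `AlreadyRendering` exception, and the 3 second timeout value are all assumptions. Only `defer_rendering` and the setting name KUMA_DOCUMENT_FORCE_DEFERRED_TIMEOUT come from the commit message.

```python
import time
from dataclasses import dataclass
from typing import Optional

# Assumed value; the commit says the real limit comes from the Constance
# setting KUMA_DOCUMENT_FORCE_DEFERRED_TIMEOUT.
FORCE_DEFERRED_TIMEOUT = 3.0


class AlreadyRendering(Exception):
    """Raised when a render is requested while one is already in progress."""


@dataclass
class Document:
    slug: str
    defer_rendering: bool = False
    rendered_html: Optional[str] = None  # hypothetical field name
    render_in_progress: bool = False     # hypothetical field name


def render(doc, renderer, clock=time.monotonic):
    """Render `doc` once, and flag it for deferred rendering if it was slow."""
    # Singleton rendering: refuse to start a second concurrent render.
    if doc.render_in_progress:
        raise AlreadyRendering(doc.slug)

    doc.render_in_progress = True
    started = clock()
    try:
        doc.rendered_html = renderer(doc)
    finally:
        # Always clear the flag, even if the renderer raised.
        doc.render_in_progress = False

    # Slow Documents are sent to the queue on future renders.
    if clock() - started > FORCE_DEFERRED_TIMEOUT:
        doc.defer_rendering = True
    return doc.rendered_html
```

A caller would check `doc.defer_rendering` first and either call `render` inline or hand the Document to a queue task. An in-memory flag like this only guards a single process; since the commit keeps rendering state in Document fields in the DB, the real check presumably works across web and queue workers.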
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Whiteboard: u=admin c=wiki s=2012-07-03 p=3 → u=admin c=wiki s=2012-07-03 p=3 t=2012-07-09
Version: Kuma → unspecified
Component: Docs Platform → Editing
Product: developer.mozilla.org → developer.mozilla.org Graveyard