What problem would this feature solve?
======================================
There are a few ways that a page can get out of date after it is edited:

- It uses a macro, and the macro output has changed
- It uses a macro, and the macro has been removed
- A related page has changed, such as the addition of a translation, and the cached metadata is out of date
- Content sanitization has been updated

It is a hard problem to determine "when this changes, the page render is invalid". See bug 772704. It is easier to re-render pages periodically in a background task, with the goal of re-rendering all the pages on the site.

Who has this problem?
=====================
All visitors to MDN.

How do you know that the users identified above have this problem?
==================================================================
MDN users complain once or twice a month that a page is out of date because it has not been re-rendered with the current macros. We are also serving content that would be filtered by the current content bleaching, but is not, because the pages have not been edited in years (mozilla/mdn#4237).

How are the users identified above solving this problem now?
============================================================
Staff constantly use the force-refresh function to re-render pages, and train frequent users in its usage.

Do you have any suggestions for solving the problem? Please explain in detail.
==============================================================================
We have a mechanism for re-rendering pages in the background, using the periodic task "render_stale_documents". However, it requires that a staff member mark a page as needing periodic refreshing. Currently, 318 of 54,874 documents are re-rendered by this process. It could be expanded into a general mechanism for periodically re-rendering all documents.

Is there anything else we should know?
======================================
This hasn't been attempted in the past because it took a long time to re-render all the documents. There may be changes needed to the rendering and storage architecture to make this an effective option.
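Generalizing the "render_stale_documents" approach would mean selecting documents whose last render is older than some cutoff, rather than only those a staff member has flagged. A minimal sketch of that selection logic, assuming a hypothetical `documents_due_for_rerender` helper (in Kuma itself this would be an ORM query against the Document model, not a Python loop):

```python
from datetime import datetime, timedelta

def documents_due_for_rerender(documents, now, max_age=timedelta(days=7)):
    """Return IDs of documents whose last render is older than max_age.

    `documents` is a list of (doc_id, last_rendered_at) pairs, where
    last_rendered_at may be None for never-rendered documents. A real
    implementation would filter on last_rendered_at < now - max_age
    in the database instead of iterating in Python.
    """
    cutoff = now - max_age
    return [doc_id for doc_id, rendered in documents
            if rendered is None or rendered < cutoff]
```

A weekly re-render would then just enqueue a render task for each returned ID, leaving recently rendered pages alone.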
10 months ago
Just to record a couple of thoughts:

* This will be very helpful for browser-compat
* Keep a log of pages which generate errors during rebuilding, so we can easily check them
* As part of this work, it would be useful to add a way to manually trigger a rebuild of a specific subtree of the site, so we can rebuild pages after an error or an important change
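The error-log idea above can be sketched simply: re-render each page, but catch failures and record them instead of aborting the whole run. The function and parameter names here are hypothetical, not Kuma's actual rendering API:

```python
def rebuild_pages(pages, render):
    """Re-render each page, recording failures instead of stopping.

    `render` is a callable that raises an exception on failure.
    Returns a list of (page, error message) pairs for later review,
    which is the "log of pages which generate errors" suggested above.
    """
    failures = []
    for page in pages:
        try:
            render(page)
        except Exception as exc:  # log the failure, continue with the rest
            failures.append((page, str(exc)))
    return failures
```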
10 months ago
As a test of the new AWS infrastructure, I queued a re-render of all the pages on the site. It took over 6 hours to render the first 100 or so pages, and then about 2 hours to render the remaining 57,000 pages. The aggregate stats from Datadog and New Relic don't help much with answering "why".

My suspicion is that there are some pages, like https://developer.mozilla.org/en-US/docs/MDN/Doc_status/CSS, that take a lot of work to render, while the vast majority are easily rendered. My suggestion for proceeding is to see if the existing data on Document models (render_scheduled_at, render_started_at, last_rendered_at) can be used to find these outliers, so they can be excluded or handled differently during a full re-render.

More data is needed, but this isn't a blocker for the AWS move, so I'm putting it back on the shelf. However, it looks like a daily or weekly re-render could be achievable with the AWS infrastructure. This test was done when there was no external traffic, so further tests are needed.
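Finding those outliers from the Document timestamps could look something like the sketch below. It treats the fields mentioned above as plain tuples; in Kuma the data lives on the Document model and would be queried via the ORM, and the five-minute threshold is an assumption, not a measured cutoff:

```python
from datetime import datetime, timedelta

def slow_renders(timing_rows, threshold=timedelta(minutes=5)):
    """Flag documents whose last render took longer than `threshold`.

    `timing_rows` is a list of (doc_id, render_started_at,
    last_rendered_at) tuples, mirroring the Document fields above.
    Rows with a missing timestamp (render never completed) are skipped.
    Returns (doc_id, duration) pairs for the outliers.
    """
    slow = []
    for doc_id, started, finished in timing_rows:
        if started is None or finished is None:
            continue
        if finished - started > threshold:
            slow.append((doc_id, finished - started))
    return slow
```

Documents flagged this way could be excluded from the bulk queue or routed to a separate, lower-priority queue during a full re-render.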
Marking this a P2 so we'll cover it at the next quarterly bug triage.
Priority: -- → P2
My script to re-render a set of documents has worked without incident for a few recent re-renders. The next step is to refactor it into a management command: https://gist.github.com/jwhitlock/43e34e07bef8c3f1863e91f076778ca6
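One piece such a management command would likely need is chunking: splitting the document IDs into fixed-size batches so each re-render task stays small and a stuck batch doesn't block the whole run. A hedged sketch of that logic (the helper name and chunk size are assumptions, not taken from the gist):

```python
def chunk_ids(doc_ids, chunk_size=100):
    """Split a list of document IDs into fixed-size chunks.

    Each chunk would be passed to one background re-render task, so
    progress can be tracked per batch and a failed batch can be retried
    without restarting the full site re-render.
    """
    return [doc_ids[i:i + chunk_size]
            for i in range(0, len(doc_ids), chunk_size)]
```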