Closed Bug 1365987 (opened 7 years ago, closed 4 years ago)

Wiki pages should be periodically re-rendered

Categories

(developer.mozilla.org Graveyard :: Wiki pages, enhancement, P2)

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: jwhitlock, Unassigned)

References

Details

(Keywords: in-triage, Whiteboard: [specification][type:feature][points=6+])

What problem would this feature solve?
======================================
There are a few ways that a page can get out of date after it is edited:

- It uses a macro, and the macro output has changed
- It uses a macro, and the macro has been removed
- A related page has changed, such as the addition of a translation, and the cached metadata is out of date
- Content sanitization has been updated

It is a hard problem to determine "when this changes, the page render is invalid". See bug 772704.

It is easier to re-render pages in a background task periodically, with a goal of re-rendering all the pages on the site.

Who has this problem?
=====================
All visitors to MDN

How do you know that the users identified above have this problem?
==================================================================
MDN users complain once or twice a month that a page is out of date, because it has not been re-rendered with the current macros.

We are serving content that the current content bleaching would filter out, but it remains unfiltered because the pages have not been edited in years (mozilla/mdn#4237).

How are the users identified above solving this problem now?
============================================================
Staff constantly uses the force-refresh function to re-render pages, and trains frequent users to use it.

Do you have any suggestions for solving the problem? Please explain in detail.
==============================================================================
We have the mechanisms for re-rendering pages in the background, using a periodic task "render_stale_documents". However, this requires a staff member to mark a page as needing periodic refreshing. Currently, 318 of 54,874 documents get re-rendered by this process. It could be expanded to a general mechanism for periodically re-rendering all documents.
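
One way a general mechanism could pick its work is to take, on each run, the documents whose last render is oldest. The sketch below is only an illustration in plain Python, not Kuma code: the Doc class, pick_stale_batch, and the max_age and batch_size parameters are all hypothetical names, and only last_rendered_at corresponds to a field mentioned in this bug.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Doc:
    """Hypothetical stand-in for a wiki document record."""
    slug: str
    last_rendered_at: Optional[datetime]  # None means never rendered


def pick_stale_batch(
    docs: List[Doc], now: datetime, max_age: timedelta, batch_size: int
) -> List[Doc]:
    """Return up to batch_size docs rendered before now - max_age, oldest first."""
    cutoff = now - max_age
    stale = [
        d for d in docs
        if d.last_rendered_at is None or d.last_rendered_at < cutoff
    ]
    # Never-rendered documents sort first, then the oldest renders.
    stale.sort(key=lambda d: d.last_rendered_at or datetime.min)
    return stale[:batch_size]
```

Run from a periodic task with a fixed batch size, a selection like this would eventually cycle through every document without anyone marking pages by hand.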

Is there anything else we should know?
======================================
This hasn't been attempted in the past, because it took a long time to re-render all the documents.  There may be changes needed to the rendering and storage architecture to make this an effective option.
See Also: → 772704
Just to record a couple thoughts:

* This will be very helpful for browser-compat

* Keep a log of pages which generate errors during rebuilding, so we can easily check them

* As a component of this work, adding a way to manually trigger a rebuild of a specific subtree of the site would be useful, to let us rebuild pages in the case of error or important changes
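
The error log and subtree rebuild ideas above could be combined roughly as follows. This is a sketch under assumptions: in_subtree and rebuild_subtree are hypothetical names, the render callable stands in for whatever actually re-renders a page, and subtrees are assumed to be identified by slug prefix.

```python
from typing import Callable, Dict, Iterable


def in_subtree(slug: str, root: str) -> bool:
    """True if slug is the root page itself or a descendant of it."""
    root = root.rstrip("/")
    return slug == root or slug.startswith(root + "/")


def rebuild_subtree(
    slugs: Iterable[str], root: str, render: Callable[[str], None]
) -> Dict[str, str]:
    """Render every page under root and return {slug: error message} for failures."""
    errors: Dict[str, str] = {}
    for slug in slugs:
        if not in_subtree(slug, root):
            continue
        try:
            render(slug)  # hypothetical render call, assumed to raise on error
        except Exception as exc:
            # Record the failure and keep going so one bad page does not stop the run.
            errors[slug] = str(exc)
    return errors
```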
See Also: → 1370500
As a test of the new AWS infrastructure, I queued a re-render of all the pages on the site. It took over 6 hours to render the first 100 or so pages, and then about 2 hours to render the remaining 57,000 pages.  The aggregate stats from Datadog and New Relic don't help much with answering "why". My suspicion is that there are some pages, like https://developer.mozilla.org/en-US/docs/MDN/Doc_status/CSS, that take a lot of work to render, and the vast majority are easily rendered.

My suggestion for proceeding is to see if the existing data on Document models (render_scheduled_at, render_started_at, last_rendered_at) can be used to find these outliers, so they can be excluded or handled differently during a full re-render.
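
A rough sketch of that outlier search, in plain Python rather than against the real Document model. It assumes that last_rendered_at records when a render finished, so the difference from render_started_at approximates the render duration. The RenderTimes class, the slow_renders function, and the threshold parameter are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional, Tuple


@dataclass
class RenderTimes:
    """Hypothetical record holding the timestamp fields named above."""
    slug: str
    render_started_at: Optional[datetime]
    last_rendered_at: Optional[datetime]


def slow_renders(
    rows: List[RenderTimes], threshold: timedelta
) -> List[Tuple[str, timedelta]]:
    """Return (slug, duration) for renders at or above threshold, slowest first."""
    slow = []
    for row in rows:
        if row.render_started_at is None or row.last_rendered_at is None:
            continue  # never rendered, or timestamps missing
        duration = row.last_rendered_at - row.render_started_at
        if duration < timedelta(0):
            continue  # a newer render is in progress, so the pair is not comparable
        if duration >= threshold:
            slow.append((row.slug, duration))
    slow.sort(key=lambda item: item[1], reverse=True)
    return slow
```

Documents returned by a check like this could be excluded from the bulk queue and rendered separately.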

More data is needed, but this isn't a blocker for the AWS move, so putting it back on the shelf. However, it looks like a daily or weekly re-render could be achievable with the AWS infrastructure.  This test was done when there was no external traffic, so further tests are needed.
Marking this a P2 so we'll cover it at next quarterly bug triage.
Priority: -- → P2
My script to re-render a set of documents has worked without incident for a few recent re-renders. The next step is to refactor it into a management command:

https://gist.github.com/jwhitlock/43e34e07bef8c3f1863e91f076778ca6
Whiteboard: [specification][type:feature] → [specification][type:feature][points=6+]
Assignee: nobody → jwhitlock
Status: NEW → ASSIGNED
Priority: P2 → P1
Blocks: 1482383
Blocks: 1438889

As part of bug 1404669, I used the script to re-render all redirects in staging and all documents (wiki plus redirects) in production, which removes the blockers for the other bugs.

It took somewhere between 12 and 24 hours to re-render 133,545 documents. This is somewhere between 1.5 and 2 documents per second. Redirect documents rendered at about 2 per second on staging, and previous re-renders of {{Compat}} documents took about 1 per second.
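
As a quick check of the arithmetic, using only the document count and durations quoted above: the 24-hour bound works out to about 1.5 documents per second, a rate of 2 per second corresponds to about 18.5 hours, and the 12-hour bound would be about 3.1 per second. The helper names are just for illustration.

```python
def docs_per_second(doc_count: int, hours: float) -> float:
    """Average throughput for rendering doc_count documents in the given hours."""
    return doc_count / (hours * 3600)


def hours_at_rate(doc_count: int, per_second: float) -> float:
    """Hours needed to render doc_count documents at a steady rate."""
    return doc_count / per_second / 3600


print(round(docs_per_second(133_545, 24), 2))  # 1.55
print(round(docs_per_second(133_545, 12), 2))  # 3.09
print(round(hours_at_rate(133_545, 2), 1))     # 18.5
```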

811 indexed documents had errors. A bulk re-render (100 at a time) cleared half of them. A slow re-render (1 at a time) cleared another batch, leaving 233 that may have real errors. Some are blocked by removing file data from the Kumascript call (bug 1520574), others appear to be macro errors.
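
The two-pass retry described above (100 at a time, then 1 at a time) can be sketched like this. It is not the script from the gist: rerender_in_passes and chunked are hypothetical names, and render_batch is an assumed callable that takes a list of document ids and returns the ids that still have errors.

```python
from typing import Callable, Iterator, List, Sequence


def chunked(items: Sequence[int], size: int) -> Iterator[List[int]]:
    """Yield consecutive slices of items with at most size elements each."""
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def rerender_in_passes(
    doc_ids: Sequence[int],
    render_batch: Callable[[List[int]], List[int]],
    batch_sizes: Sequence[int] = (100, 1),
) -> List[int]:
    """Retry failing documents with each batch size in turn.

    Returns the ids that still fail after the last pass, which are the
    candidates for real errors.
    """
    remaining = list(doc_ids)
    for size in batch_sizes:
        failed: List[int] = []
        for batch in chunked(remaining, size):
            failed.extend(render_batch(batch))
        remaining = failed
        if not remaining:
            break
    return remaining
```

Whatever is left after the one-at-a-time pass is less likely to be a load or timing failure and more likely to need a macro or content fix.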

I'm still working on getting my re-rendering code into the Kuma code base, to enable periodic re-rendering.

Priority: P1 → P2

De-prioritized for other work. When we get back to this, it is possible that offline rendering will be the focus, so I'm closing https://github.com/mozilla/kuma/pull/5128 for now.

I've shown Ryan how to use my re-render script (comment #4) to regenerate documents as needed. I'm leaving the MDN team, so I'm not going to continue refining this code to get it into the Kuma code base.

Assignee: jwhitlock → nobody
Status: ASSIGNED → NEW
MDN Web Docs' bug reporting has now moved to GitHub. From now on, please file content bugs at https://github.com/mdn/sprints/issues/ and platform bugs at https://github.com/mdn/kuma/issues/.
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → WONTFIX
Product: developer.mozilla.org → developer.mozilla.org Graveyard