Closed Bug 874311 Opened 11 years ago Closed 11 years ago

Add an option to page.subpages() to let it return tags for each page

Categories

(developer.mozilla.org Graveyard :: General, defect)

Platform: All
OS: Other
Type: defect
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: sheppy, Unassigned)

References

Details

(Whiteboard: [specification][type:change])

What feature should be changed? Please provide the URL of the feature if possible.
==================================================================================
Currently, page.subpages() doesn't include the tags list in the objects in the returned array; lorchard says this is for performance reasons.

However, sometimes you wind up turning right around and calling wiki.getPage() on every returned page just to get that information anyway. So let's make it possible to get the tag list from page.subpages().

One way to do this without paying the performance hit when it isn't needed: add an options parameter to page.subpages() that accepts a "return tags" flag. That way you only get the tags when you ask for them, and avoid the cost the rest of the time.

What problems would this solve?
===============================
Currently, templates that need to get the tags for all the subpages of a page have to do two calls that fetch page information; I imagine there's a significant performance hit there. I bet performance would be improved in these cases by combining this all into one function.

Who would use this?
===================
Template developers working on automated generation of tables of contents, indexes, and the like.

What would users see?
=====================
No direct user impact.

What would users do? What would happen as a result?
===================================================
See above.

Is there anything else we should know?
======================================
I've started building templates that I hope to use broadly across MDN to automatically build landing pages and the like; I think their performance is taking a noticeable hit from this issue (adding the wiki.getPage() calls doubled the code's run time, leading to a lot of timeouts loading pages). I suspect this change would improve things markedly.
Alternatively, could we get a similar amount of performance help by adding a page.getTags() routine that returns only the tags for a given page?
Summary: Page.subpages() should be able to return page tags → Add an option to page.subpages() to let it return tags for each page
(In reply to Eric Shepherd [:sheppy] from comment #0)
> Currently, templates that need to get the tags for all the subpages of a
> page have to do two calls that fetch page information; I imagine there's a
> significant performance hit there. I bet performance would be improved in
> these cases by combining this all into one function.

It's worth trying this with an option, but it's not a guaranteed performance gain.

Adding tags to the subpages() result will add an order of magnitude more DB joins, driven recursively against a single DB server. It might not end up being a big hit, but it smells like bad scaling, especially for large subtrees of the site.

Another thing to keep in mind is that getPage() provides a lot of info - eg. basic metadata, extracted summary, tags, translations, sections, etc. Thinking ahead, I'd like to avoid creeping toward populating subpages() with all the things.

Long term, it might be better to look into:

* run batches of getPage() requests in parallel, spread across many Django & DB servers (a rough client-side sketch follows after this list)

* ensure that the results are heavily cached, possibly generated at document save time, like kumascript rendering
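
For the first bullet, here's a minimal client-side sketch of batching the fetches in parallel. It assumes a per-document JSON endpoint reached by appending "$json" to the document URL, and a response carrying "title" and "tags" fields; both of those details are assumptions for illustration, not confirmed API.

    # Rough sketch: fetch JSON metadata for many documents concurrently rather
    # than serially. The "$json" URL suffix and the "title"/"tags" response
    # fields are assumptions; adjust to whatever the real endpoint returns.
    from concurrent.futures import ThreadPoolExecutor

    import requests

    BASE = "https://developer.mozilla.org"

    def fetch_doc_json(path):
        """Fetch JSON metadata for one document path, e.g. '/en-US/docs/Web'."""
        resp = requests.get(BASE + path + "$json", timeout=10)
        resp.raise_for_status()
        return resp.json()

    def fetch_docs_parallel(paths, max_workers=8):
        """Fetch metadata for many documents at once; results match input order."""
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            return list(pool.map(fetch_doc_json, paths))

    if __name__ == "__main__":
        for doc in fetch_docs_parallel(["/en-US/docs/Web", "/en-US/docs/MDN"]):
            print(doc.get("title"), doc.get("tags"))

Parallelism only hides the latency on the client side; the second bullet, caching the results server-side (ideally at save time), is what actually removes the repeated work.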

(In reply to Eric Shepherd [:sheppy] from comment #1)
> Alternatively, could we get a similar amount of performance help by adding
> a page.getTags() routine that returns only the tags for a given page?

IMO, this isn't really a big difference vs. page.getPage(), and the data is more cacheable as part of getPage() anyway.
Tags are a fairly special case, in that we're going to use them a lot from macros, much more often than most page information. Between slug, title, and tags, you probably have 90% of the use cases covered from a macro perspective.

We need to figure out the optimum way to allow macros to quickly pull that information as needed. Here are the most common use cases for macros to be working with tags:

1. As a one-off operation (that is, not operating on a large set of pages), a macro needs to get all the tags on a specific page so that it can figure out things about that page. This is probably the most widespread use case, but obviously not the most impactful in terms of performance; we already support this well.

2. A macro needs to get the tags for every subpage in a tree, up to some depth (usually only the immediate children, but I presume not always). This will be used for building menus and landing pages. We can currently fake this by calling page.subpages() and then wiki.getPage() on each result, but it's extremely slow; we need to figure out a faster way to do this.

3. A macro needs to find all pages matching a specified set of tags, optionally filtered by locale. We do not have any way to search by tags yet.

This bug is about #2.

I'm open to pretty much any ideas for how to improve performance here; this is a capability we'll need to have, but I don't really care how we get it. If we can cache the results, great. If we can figure out how to speed up the operation itself, awesome. Some combination? Fabulous!

It sounds like lorchard's feeling is that our best bet is to make the results of page.getPage() cacheable, which sounds good to me. How hard is it to accomplish that?

We're already doing this operation in real-world usage on MDN, and are going to be increasingly reliant on it going forward, so figuring out a solution for this is fairly critical. We're going to be using tags a lot to implement the content side of the zone system as well as to help ensure that our hierarchy works properly when people add/modify content without a good understanding of the hierarchy structure itself.
(In reply to Eric Shepherd [:sheppy] from comment #3)

> 3. A macro needs to find all pages matching a specified set of tags,
> optionally filtered by locale. We do not have any way to search by tags yet.

FWIW, we do have a way to do this. There's a JSON feed for finding docs by tag, eg:

    https://developer.mozilla.org/en-US/docs/feeds/json/tag/Web

Since we're probably not using this anywhere yet, it's likely to need improvement. Go ahead and open a bug to collect any needful things for that feed.
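
As a point of reference for use case 3, here's a minimal sketch of consuming that feed from a script and emulating an "all of these tags" query by intersecting several single-tag feeds client-side. The assumption that each feed entry is a JSON object with a "link" field is mine; check the actual feed output before relying on it.

    # Rough sketch: emulate a multi-tag (AND) query by intersecting single-tag
    # feeds. The feed URL pattern comes from the comment above; the "link"
    # field on each entry is an assumption and should be verified.
    import requests

    FEED = "https://developer.mozilla.org/en-US/docs/feeds/json/tag/{tag}"

    def docs_for_tag(tag):
        """Return the set of document links carrying a given tag."""
        entries = requests.get(FEED.format(tag=tag), timeout=10).json()
        return {entry["link"] for entry in entries}

    def docs_for_all_tags(*tags):
        """Documents tagged with every one of the given tags (set intersection)."""
        tag_sets = [docs_for_tag(tag) for tag in tags]
        return set.intersection(*tag_sets) if tag_sets else set()

    if __name__ == "__main__":
        for link in sorted(docs_for_all_tags("Web", "Reference")):
            print(link)

This is only a stopgap for scripts outside the wiki; it doesn't help macros directly, and it makes one request per tag.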

> 2. A macro needs to get the tags for every subpage in a tree, up to some
> depth (usually only the immediate children, but I presume not always). This
> will be used for building menus and landing pages. We can currently fake
> this by calling page.subpages() and then wiki.getPage() on each result, but
> it's extremely slow; we need to figure out a faster way to do this.
> ...
> It sounds like lorchard's feeling is that our best bet is to make the
> results of page.getPage() cacheable, which sounds good to me. How hard is it
> to accomplish that?

So, here's a thumbnail sketch after letting it simmer in my head for a day:

* (optional) Refactor _get_document_for_json in views.py to a Document.to_json() method in models.py - https://github.com/mozilla/kuma/blob/master/apps/wiki/views.py#L283

* Add a blob/text field to the Document model, eg. json_meta

* When a Document is changed & saved, build the JSON representation with document.to_json() & stash that in json_meta.

* When a Document is fetched, check json_meta. If it's empty, build the $json representation & update the doc.

* (optional) Write a script that sweeps through all documents and fills in json_meta. Run this after deployment of all the above.

Doing the above front-loads the work of building JSON metadata for docs (including tags, localizations, etc.), and ties it to when content is changed. At retrieval time, all the hard work should already have been done.

Then, we can offer an option for page.subpages() that expands Document metadata by simply including the contents of json_meta. That gets us tags & all the rest. This should also make page.getPage() run much faster. (A rough sketch of the model side follows at the end of this comment.)

I'll probably end up doing this, but hopefully the details above give a decent starting point if someone else wants to scoop this up.
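
Here's a minimal Django-flavored sketch of the model-side steps above. The json_meta field and to_json() method names follow the bullets; the other fields and the JSON layout are placeholders, not the actual Kuma schema.

    # Rough sketch of save-time JSON metadata caching, following the steps
    # above. "json_meta" and "to_json" come from this comment; the remaining
    # fields and the JSON layout are placeholders, not the real Document model.
    import json

    from django.db import models

    class Document(models.Model):
        locale = models.CharField(max_length=7, default="en-US")
        slug = models.CharField(max_length=255)
        title = models.CharField(max_length=255)
        # Cached JSON representation, rebuilt whenever the document is saved.
        json_meta = models.TextField(blank=True, default="")

        def to_json(self):
            """Build the JSON metadata blob; tags, summary, translations, etc.
            would be added here."""
            return json.dumps({
                "locale": self.locale,
                "slug": self.slug,
                "title": self.title,
            })

        def save(self, *args, **kwargs):
            # Front-load the work: regenerate the cached JSON on every change.
            self.json_meta = self.to_json()
            super(Document, self).save(*args, **kwargs)

        def get_json_meta(self):
            """Return the cached metadata, rebuilding it lazily if empty."""
            if not self.json_meta:
                self.save()  # save() regenerates json_meta
            return json.loads(self.json_meta)

The sweep script in the last bullet then just needs to load each Document and call save() (or fill json_meta directly) so existing pages get cached metadata without waiting for an edit.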
Depends on: 875349
Comment 4 was mainly me thinking out loud, so I filed bug 875349 to better represent the JSON metadata generation change as a blocker task. After that's done, the work to support an option in page.subpages() will still need to be done as a follow-up (i.e. this bug).
(In reply to Les Orchard [:lorchard] from comment #4)
> (In reply to Eric Shepherd [:sheppy] from comment #3)
> 
> > 3. A macro needs to find all pages matching a specified set of tags,
> > optionally filtered by locale. We do not have any way to search by tags yet.
> 
> FWIW, we do have a way to do this. There's a JSON feed for finding docs by
> tag, eg:
> 
>     https://developer.mozilla.org/en-US/docs/feeds/json/tag/Web

That only works for a single tag. What if I need pages that are marked as both Web and Reference? That's something we'll need soon: for example, the ability to build a list of reference pages relevant to add-on developers, even if they're scattered across the site. The add-on docs and the docs about Mozilla internals are both relevant to add-on developers but won't live in the same subtree, so we can't rely on subtrees alone to do this properly.

> Then, we can offer an option for page.subpages that expands Document
> metadata by simply including the contents of json_meta. That gets us tags &
> all the rest. This also should make page.getPage() run much faster.

I'm in love with this plan!
(In reply to Eric Shepherd [:sheppy] from comment #6)
> > FWIW, we do have a way to do this. There's a JSON feed for finding docs by
> > tag, eg:
> > 
> >     https://developer.mozilla.org/en-US/docs/feeds/json/tag/Web
> 
> That only works for a single tag. What if I need pages that are marked as
> both Web and Reference?

Like I said: File another bug :) Need to collect requirements like this, separate from this bug.
> Like I said: File another bug :) Need to collect requirements like this,
> separate from this bug.

Done: filed bug 875368.
Now that we have JSON metadata caching in place, hopefully this can happen pretty easily.
This should be working now, by way of a new template method page.subpagesExpand:

https://developer.mozilla.org/en-US/docs/Template:DekiScript:Page
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Product: developer.mozilla.org → developer.mozilla.org Graveyard