If you think a bug might affect users in the 57 release, please set the correct tracking and status flags for Release Management.

Non-taggit-powered Related Articles

VERIFIED FIXED in 2.3

Status

support.mozilla.org
Knowledge Base Software
P2
normal
VERIFIED FIXED
7 years ago
7 years ago

People

(Reporter: jsocol, Assigned: jsocol)

Tracking

unspecified

Firefox Tracking Flags

(Not tracked)

Details

(Assignee)

Description

7 years ago
Since taggit isn't going to cut it for generating related object lists, we'll need to do this via Sphinx. (Unless someone has a simpler suggestion?)

There are a few filters we should put in place:

* Restrict category to the category of the current document.
* Restrict locale.
* Restrict to documents with approved revisions.

As for what generates "related" the two things I'm thinking are: tags or just query the title.

The downside to tags is that they're an intersection, so if article X is the only article with tag Y, it won't have any related articles. We could change that in the current API, where we have to pass all the filters to the SearchClient but if we rework the way the SearchClients work that might not be true.

The downside to querying the title is it's harder to predict what the related articles will be.

New ideas welcome.
(Assignee)

Updated

7 years ago
Assignee: nobody → james
(Assignee)

Comment 1

7 years ago
I've got the following query doing selects very quickly:

SELECT
  t1.object_id,
  t2.object_id,
  count(*) AS common_tags
FROM
  taggit_taggeditem t1 JOIN
  taggit_taggeditem t2 JOIN
  wiki_document d
WHERE 
  t1.tag_id = t2.tag_id AND
  t1.object_id <> t2.object_id AND
  t1.content_type_id = 31 AND
  t1.object_id = 375 AND
  t2.content_type_id = 31 AND
  d.id = t2.object_id AND
  d.locale = 'en-US' AND
  d.category = 1 AND
  d.current_revision_id IS NOT NULL
GROUP BY
  t1.object_id,
  t2.object_id
ORDER BY
  common_tags DESC
LIMIT 5;

This ran almost instantly, bug I only have two tagged documents at the moment. Here's the output of EXPLAIN:

> +----+-------------+-------+-------------+----------------------------------------------------------------------------------+-------------------------------------------------------+---------+------------------------+------+----------------------------------------------------------------------------------------------------------------------+
> | id | select_type | table | type        | possible_keys                                                                    | key                                                   | key_len | ref                    | rows | Extra                                                                                                                |
> +----+-------------+-------+-------------+----------------------------------------------------------------------------------+-------------------------------------------------------+---------+------------------------+------+----------------------------------------------------------------------------------------------------------------------+
> |  1 | SIMPLE      | t1    | index_merge | taggit_taggeditem_3747b463,taggit_taggeditem_e4470c6e,taggit_taggeditem_c32a93c2 | taggit_taggeditem_c32a93c2,taggit_taggeditem_e4470c6e | 4,4     | NULL                   |    1 | Using intersect(taggit_taggeditem_c32a93c2,taggit_taggeditem_e4470c6e); Using where; Using temporary; Using filesort |
> |  1 | SIMPLE      | t2    | ref         | taggit_taggeditem_3747b463,taggit_taggeditem_e4470c6e,taggit_taggeditem_c32a93c2 | taggit_taggeditem_e4470c6e                            | 4       | const                  |    6 | Using where                                                                                                          |
> |  1 | SIMPLE      | d     | eq_ref      | PRIMARY,wiki_document_928541cb,wiki_document_a253e251                            | PRIMARY                                               | 4       | sumo_data.t2.object_id |    1 | Using where                                                                                                          |
> +----+-------------+-------+-------------+----------------------------------------------------------------------------------+-------------------------------------------------------+---------+------------------------+------+----------------------------------------------------------------------------------------------------------------------+

Here is a similar query for Questions, where I've included some arbitrary conditions for fairness. This took 0.77 seconds to run, on my VM.

SELECT
  t1.object_id,
  t2.object_id,
  count(*) AS common_tags
FROM
  taggit_taggeditem t1 JOIN
  taggit_taggeditem t2 JOIN
  questions_question q
WHERE
  t1.tag_id = t2.tag_id AND
  t1.object_id <> t2.object_id AND
  t1.content_type_id = 19 AND
  t1.object_id = 757142 AND
  t2.content_type_id = 19 AND
  q.id = t2.object_id AND
  q.solution_id IS NOT NULL AND
  q.is_locked = 0
GROUP BY
  t1.object_id,
  t2.object_id
ORDER BY
  common_tags DESC
LIMIT 5;

Here's the EXPLAIN for that one:

> +----+-------------+-------+--------+----------------------------------------------------------------------------------+----------------------------+---------+------------------------+------+----------------------------------------------+
> | id | select_type | table | type   | possible_keys                                                                    | key                        | key_len | ref                    | rows | Extra                                        |
> +----+-------------+-------+--------+----------------------------------------------------------------------------------+----------------------------+---------+------------------------+------+----------------------------------------------+
> |  1 | SIMPLE      | t1    | ref    | taggit_taggeditem_3747b463,taggit_taggeditem_e4470c6e,taggit_taggeditem_c32a93c2 | taggit_taggeditem_c32a93c2 | 4       | const                  |    5 | Using where; Using temporary; Using filesort |
> |  1 | SIMPLE      | t2    | ref    | taggit_taggeditem_3747b463,taggit_taggeditem_e4470c6e,taggit_taggeditem_c32a93c2 | taggit_taggeditem_3747b463 | 4       | sumo_data.t1.tag_id    | 1261 | Using where                                  |
> |  1 | SIMPLE      | q     | eq_ref | PRIMARY,solution_id_refs_id_95fb9a4d                                             | PRIMARY                    | 4       | sumo_data.t2.object_id |    1 | Using where                                  |
> +----+-------------+-------+--------+----------------------------------------------------------------------------------+----------------------------+---------+------------------------+------+----------------------------------------------+

I also tried with subqueries for the content type and it didn't really change much (I assume it's a negligible part of the query, which explain helps).

I think this show a couple of things:
1) It is possible to do tag-based related item queries in SQL significantly faster than taggit.
2) Doing so with reasonable filters requires passing in parameters.
3) The time it takes depends greatly on the number of tagged objects--it will be faster for the KB than Questions.
4) Because of 3, we should probably cache the results.
5) Indexes FTW.

Any thoughts on doing it this way?
Summary: Sphinx-powered Related Articles → Non-taggit-powered Related Articles
(Assignee)

Comment 2

7 years ago
Kudos to Laura for an excellent suggestion:

1) Add a missing index in the first query (wiki_document.category). It'll speed it up in general.
2) Run this on a cron, without the ORDER BY or `t1.object_id = #` requirement, and store the results in a table.
3) The views will just pull it out of the table. ORDER BY an index integer should be much faster an ORDER BY a COUNT(*).

I'm building this out using ManyToMany(through=) so it's testable.
(Assignee)

Comment 3

7 years ago
https://github.com/jsocol/kitsune/commit/9c7e416d

Filed bug 609771 to turn on the cron.
Status: NEW → RESOLVED
Last Resolved: 7 years ago
Resolution: --- → FIXED
Verified related articles are populating based on tags
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.