Closed Bug 914266 Opened 11 years ago Closed 11 years ago

Add is_archived column to questions.

Categories

(support.mozilla.org :: Questions, task, P2)

Tracking

(Not tracked)

RESOLVED FIXED
2013Q3

People

(Reporter: rrosario, Assigned: mythmon)

References

Details

(Whiteboard: u=sumo-team c=questions p=3 s=2013.18)

Attachments

(1 file)

We currently are abusing the is_locked column for several different use cases:
* Duplicate questions
* Off topic questions or spam (sometimes they are deleted instead of locked)
* Questions older than 180 days

This last case is different than the other two. It is more like an archiving. We want the question to count for all of the metrics but we don't want them appearing in search results. The SUMO team does not want the first two cases to be counted in any of our metrics. Therefore, we need an extra state.

My proposed solution is to add a new column to questions for archiving them. This will only be applied by our cron job that expires old questions.

What needs to happen?

* Schema change. We need to add the new column to questions_question. This might take a while to run and, therefore, require DBA supervision.
* Data migration. For all questions older than 180 days, we want to set is_archived=True to all the questions that have is_locked=True and reset is_locked to False.
* Search mapping change. We want to add archived to the mapping.
* Search querying changes. We need to tweak our default search to not show archived questions (in addition to not showing locked questions).
* UI/View changes.
** We want archived questions to behave just like locked ones. That is, you can't reply to them or vote on them and whatever else we block for locked questions.
** We want to show a note about the question being archived just like we do for locked questions.
* Update/add tests.

Did I miss anything? This is quite a bit. Is it worth it? Are there other solutions?
Blocks: 911967
I'll work on this (and estimate it) after I'm done tracking down the undefined errors in prod.
Assignee: nobody → mcooper
Here is my first thoughts on the scale of this:

* Schema change: This sounds easy, but last time we touched this schema, it kind of blew up. Plus we probably want a DBA to run this in a way to avoid downtime.

* Data migration: My only concern here is we might set some threads that were locked by a moderator to unlocked and archived. The questions would remain in a state in which new posts can't be made, but the UI would say it was archived instead of locked, which might cause confusion. I'm not sure how much traffic old questions get. (Unless we have a way to tell why a thread was closed, was it a moderator or the 180 day timer. I assume not, otherwise we wouldn't have this bug).

* Search mapping change: This is pretty easy as far as ES changes go, but it is still a mapping change which means it has to be a two step process, and the deploy will take a while and the deploy takes a bunch of care.

* Search querying changes: This is probably the easiest thing on the list.

* UI Stuff: This would mainly be hunting down all the places we use the locked status in templates and extending it to the archived status. We also probably need an icon for archived things. This would need (at least) two sizes: a small one for use in the question list, and a larger one for the note at the bottom of the question detail page.

Tests: Yay tests. This seems like a simple thing to test., so that shouldn't be too hard.

----

I would say this at least 3pts. If it proves to be bigger than that, a good target for a spin off is to move the ES parts to another bug.
(In reply to Mike Cooper [:mythmon] from comment #2)
> 
> I would say this at least 3pts. If it proves to be bigger than that, a good
> target for a spin off is to move the ES parts to another bug.

If we spin parts of this off to an ES-work-specific bug, would the changes for that bug have to land with the changes in this one? Or could that land separately?
RE: Data Migration: Do we keep track of who locks a thread? I think the answer is no. 

How does the graphing of the KPI work? Are we storing the historical data or are we calculating the graph as we go? If so, this change could create a break in time in how things are measured...but it's not the biggest deal (the amount of threads that get locked manually is not that big).

Regarding the UI, if we decide that we want a slightly different message for these threads (i.e. "This question may be obsolete" or something like that) I don't see that being a problem either: The thread was manually locked...but it's also old so the archived status should apply. me.thinks.
Locked threads are currently mainly double posts. There is essentially no need to reply to them any further. We are currently marking them as such before we lock them with a reply in the thread, so the confusion should be minimal.
(In reply to Mike Cooper [:mythmon] from comment #2)
> * Schema change: This sounds easy, but last time we touched this schema, it
> kind of blew up. Plus we probably want a DBA to run this in a way to avoid
> downtime.

+1

> * Data migration: My only concern here is we might set some threads that
> were locked by a moderator to unlocked and archived. The questions would
> remain in a state in which new posts can't be made, but the UI would say it
> was archived instead of locked, which might cause confusion. I'm not sure
> how much traffic old questions get. (Unless we have a way to tell why a
> thread was closed, was it a moderator or the 180 day timer. I assume not,
> otherwise we wouldn't have this bug).

All questions older than 180 days will just lose the locked bit. We can't tell if they were locked manually or just meant to be archived. But going forward, anything newer than that will have a locked vs archived state.
Will: I think that we could land changes for the db side of this without changing the Search stuff. The default search seems to filter only by age, and not by is_locked (as far as I can tell), so changing all questions to have is_archived=True shouldn't affect search.

That being said, I would prefer to do this in one fell swoop.
I said this was probably 3 points, but I didn't write it in the whiteboard field. Fixing that. Now that Will's funtimes with South are done, I'm going to start on this.
Status: NEW → ASSIGNED
Whiteboard: u=sumo-team c=questions p= s=2013.18 → u=sumo-team c=questions p=3 s=2013.18
The questions_question table is only 184Mb, with 223,739. Adding a column should take less than a minute, although it depends on traffic and such. 

Are the rest of the queries in the code written such that if we change a slave to have the new field, replication won't break? We can test this out if you want,we can take a slave out of the load balancer, add the column, and see if replication breaks. If it does not, we can do it with the rest of the slaves, one at a time. With the master, we can failover and do the master while it's out of the load balancer, then switch back.

It's not a bad practice to do the failover/switch back, and actually this month we have some maintenance we'd like to do, so we'd already planned to have a failover/switchback anyway. We can move it to when it's convenient for you, with your changes, too.
Sheeri: I'm not 100% sure, but I think we looked into the question about extra columns and replication last time we did this. I am fairly sure the answer is that it will all be fine, because Django always inserts into and selects from explicitly named columns. I'm also +1 on doing this in concert with the maintenance you were planning. Does sometime next week work for you? I expect that at least the DB part of this will be done and reviewed by Tuesday.
Yeah, I tend to think we'll be fine too. And next week works great!
Attached image shot_25044.png
For the UI work, this needs an icon for the question list page. It would fill theempty space in this screenshot, and should be 16px by 16px.
Needsinfo-ing Kadir so he can poke the correct people.
Flags: needinfo?(a.topal)
Our go-to person is on PTO for another week unfortunately, so let's move forward with the place holder and we can replace it once he is back.
Flags: needinfo?(a.topal)
We talked about this on IRC: Mike is going to use a place holder, since we don't want to block on the icon. The threads will not be indexed by search engines and will not be included in SUMO search results. Very few people should ever see them.
(In reply to Kadir Topal [:atopal] from comment #15)
> The threads will not be indexed by search engines.

This is a change to our current setup. For example, look at this search:
https://www.google.com/search?q=When+I+open+a+new+tap+then+select+a+web+site

The top result is this archived question:
https://support.mozilla.org/en-US/questions/931126


I don't think we should change this behavior without doing some due diligence first. Maybe file a bug if we want that to be the behavior.
Flags: needinfo?(a.topal)
I stand corrected, we are only doing that with questions that have no reply.
Flags: needinfo?(a.topal)
PR: https://github.com/mozilla/kitsune/pull/1623

This is ready to land, but I haven't yet because the deployment needs to happen carefully. It has a DB migration we want DBA assistance with, and an ES schema change that requires a 2 step deploy. I am going to wait on landing this until I hear from Sheeri (or an appropriate analogue) that we are ready to go.

Sheeri (or analogue): The migration in question is known as "0002_auto__add_field_questions_is_archived". It adds a column to the table questions_question. In addition, we have to inform South (the migration tool) that the migration has been run so it doesn't try and duplicate effort.

MariaDB [kitsune]> select * from south_migrationhistory where app_name='questions';
+----+-----------+-------------------------------------------+---------------------+
| id | app_name  | migration                                 | applied             |
+----+-----------+-------------------------------------------+---------------------+
| 14 | questions | 0001_initial                              | 2013-09-09 13:26:05 |
| 53 | questions | 0002_auto__add_field_question_is_archived | 2013-09-16 13:01:25 |
+----+-----------+-------------------------------------------+---------------------+
2 rows in set (0.00 sec)

The row with id=53 will need to be inserted. The id and the applied fields don't have to be the same, but the app_name and migration fields do. As far as creating the column goes, I think this SQL would do the trick, but feel free to tweak it.

    ALTER TABLE `questions_question` ADD COLUMN `is_archived` bool NULL;

Once those database changes are in place, we can push the new code and start on the ES half of the deploy.

Thanks for the help!
This landed in these two commits
 - https://github.com/mozilla/kitsune/commit/c9f858193dd750657b4197a6b47da235a7a27b69
 - https://github.com/mozilla/kitsune/commit/ff75f8e11e1df76fa3498004e2018de67107739b

I chatted with sheeri and cyborgshadow in irc. We decided to do stage via the normal migration system, to see how it 
goes. So I am deploying this to stage. Signs currently point to doing the migration via South on prod too, as the migration runs pretty quick.

I'm working on deploying this and doing the ES two-stop on stage. Then I'll go to prod.
This is prod now! We slowed down the site a bit because the ES part of the migration hammered the DB. It turns out that part of the migration was probably ill conceived, since I then turned around and reindexed the site. Oh well. Next time we will know better.
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
This came up in the contributor forum:

https://support.mozilla.org/forums/contributors/709632 archived threads?
See Also: → 680924
See Also: → 602990
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: