[database] hack created cruft Action data in the live db, can we clean that up?

RESOLVED FIXED in 2.1

Status

Webtools
Elmo
RESOLVED FIXED
6 years ago
6 years ago

People

(Reporter: Pike, Assigned: Pike)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(3 attachments)

(Assignee)

Description

6 years ago
Created attachment 573034 [details]
bad bad code

My local (unreviewed) code to migrate from one release to the next is a piece of shite. Notably, the output looks like:

migrating 46 40
action count 736
migrating 37 14
action count 918
migrating 21 13
action count 994
migrating 14 10
action count 1042

... meaning that, once I understood why the code said what it said, that I'm probably having thousands of Action entries in the db that have absolutely no right to exist. The fix in the attached file for reference is likely 

    somap = {}
    for app_code, app in app4code.iteritems():

vs

    for app_code, app in app4code.iteritems():
        somap = {}

I don't deserve any help in cleaning this up, but I'd take it anyway. How god-freakin aweful, and how shameful that I haven't looked at what this code does everytime I ran it so far. I did complain to myself about it taking long, of course. Bad Axel, corner, stupid hat.

It'd be really good to have the data cleaned up before I try to do the merging of all that bad data in data1.1, too. Like, duplicate data is one thing, but up to 3 silly dupes, bad.
I'm still not grasping what's going on due to lack of goals and context. 

However, clearly whatever code we're talking about it needs to be folded in nicely and in a structured way so we can document it and unit test it. 

I don't know how to actually execute this but I'd love to see it as a management command where the meat of the management command is distinctly separated from the handling of argument options. Then we can write tests that solidify what's going on and we can assert that dupes aren't created. 

Honestly don't know where to begin in terms of helping out here.
For my own help mainly: here's the snippet you sent me to try to explain something about a potentially better way of doing something 
http://pastebin.mozilla.org/1378550
(Assignee)

Comment 3

6 years ago
Created attachment 587290 [details] [diff] [review]
add a command to go through the duplicate actions, and delete all but one

This command goes through all Actions, on a 100-ish chunk base, and deletes all obsolete/duplicate actions.

I tested this locally against a db dump.

A word on the chunking:
The cloning does copy over the 'when', so one can be sure that all duplicate actions have the same time. So, in order to not have duplicate actions in different chunks, I chunk by when. To have about the same amount of actions in each chunk, I first run a query to find a good timestamp, and then take all actions within that timeframe.
Assignee: nobody → l10n
Status: NEW → ASSIGNED
Attachment #587290 - Flags: review?(peterbe)
Comment on attachment 587290 [details] [diff] [review]
add a command to go through the duplicate actions, and delete all but one

Review of attachment 587290 [details] [diff] [review]:
-----------------------------------------------------------------

r- due to the broken 'quiet' option. Other things are just pyflake nits. Feel free to land once you fix the quiet option.

::: apps/shipping/management/commands/cleanup_actions.py
@@ +37,5 @@
> +'''Update all local clones to the revisions that are shipped with a milestone.
> +'''
> +
> +from collections import defaultdict
> +import datetime

not used

@@ +39,5 @@
> +
> +from collections import defaultdict
> +import datetime
> +from optparse import make_option
> +import os.path

not used

@@ +41,5 @@
> +import datetime
> +from optparse import make_option
> +import os.path
> +
> +from django.core.management.base import BaseCommand, CommandError

CommandError not used

@@ +42,5 @@
> +from optparse import make_option
> +import os.path
> +
> +from django.core.management.base import BaseCommand, CommandError
> +from django.db.models import Max

not used

@@ +46,5 @@
> +from django.db.models import Max
> +from shipping.models import Action
> +
> +class Command(BaseCommand):
> +    option_list = BaseCommand.option_list + (

The 'quiet' option disappears when the `option_list` is redefined. I think it should be this::

    option_list = BaseCommand.option_list + (
        make_option('-q', '--quiet', dest = 'quiet', action = 'store_true',
                    help = 'Run quietly'),
        make_option('-n', '--dry-run', dest = 'dry', action = 'store_true',
                    help = 'Run quietly'),
        )
Attachment #587290 - Flags: review?(peterbe) → review-
(Assignee)

Comment 5

6 years ago
https://github.com/mozilla/elmo/commit/f3495a4bb6aae6da0139ded2f21cfd9289bbe44e, FIXED.

I'll check if it's useful to run remotely, if not, it might want to wait 'til this is on production so that we can run it on bm-l10n-dashboard01
Status: ASSIGNED → RESOLVED
Last Resolved: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → 2.1
(Assignee)

Comment 6

6 years ago
Takes a good while locally, even without trying to actually remove anything, aka --dry-run.

Might be a good exercise to do this on a fresh port of the db on -dev-, from bm-l10n-dashboard01, too.
(Assignee)

Comment 7

6 years ago
Gotta reopen. there's actually a race condition in the code, but that's OK-easy to fix. Basically, if there's more than a 100 actions on the cut time, things run in loops. And on the db state on dev, that triggers. That didn't happen with an earlier clone for me.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
(Assignee)

Comment 8

6 years ago
Created attachment 589025 [details] [diff] [review]
make next_cut be earlier, and slice more inclusive

Bustage fix, this patch does two things:

The slice that we're walking over takes both start and end time. That makes us iterate over some actions more than once, but that's OK. It does make sure we're not stalling, though.
The next_cut is moved out a tad further to the past, by excluding the current cutoff.

Both together make the chunk of actions a good deal larger than 100, I've seen as much as 300 in a dry-run.

Gonna request review once I've done a smoke-test on dev on this.
(Assignee)

Comment 9

6 years ago
Comment on attachment 589025 [details] [diff] [review]
make next_cut be earlier, and slice more inclusive

This worked on -dev-, cutting us down from 20k to 13k actions, too.
Attachment #589025 - Flags: review?(peterbe)
Comment on attachment 589025 [details] [diff] [review]
make next_cut be earlier, and slice more inclusive

Code looks sane. Admittedly, I haven't really grokked what's going on.
Attachment #589025 - Flags: review?(peterbe) → review+
(Assignee)

Comment 11

6 years ago
https://github.com/mozilla/elmo/commit/13cd73be134535993d6380570e36b55b195acc0d, marking FIXED again.
Status: REOPENED → RESOLVED
Last Resolved: 6 years ago6 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.