Closed Bug 768488 Opened 8 years ago Closed 6 years ago

[shipping] better outreach data

Categories

(Webtools Graveyard :: Elmo, defect, P2)

defect

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: Pike, Assigned: Pike)

Details

Attachments

(1 file, 2 obsolete files)

Now that we know how locales fall back, it'd be good to take that into the outreach data.

As I'm hacking on the other side of the story, aka, the outreach node app to send the actual mails, I'll be taking this one.
Attached patch rewrite the outreach data (obsolete) — Splinter Review
So, here's the intent:

This goes in parallel with a rewrite of my non-elmo outreach app, which is now an exhibit-based node app, no more couchdb.

To aid this, I've migrated the elmo outreach side to an exhibit, too. This allows to be more responsive in what you're selecting for outreach, in particular you can now just drill down to "let's mail all folks that are 1 release behind", and they deserve a more consistent messaging than the ones that are 5 releases behind.

I've also rewrote the actual view logic massively, to use my new toy, class-based views with methods that are view-independent. I didn't make use of that to write tests, because the test data needed is really massive.

I'm not really happy with the table view right now, that is, it'll show a whole ton of data for all apps, even if you just want to see one. That's hard to fix without rewriting the exhibit view, which I might do at some point. But I think I want us to migrate to exhibit3 before we do that, as that's working differently in APIs and refactoring.

PS: We're not loosing test coverage, the previous logic was going without any automated tests, too.
Attachment #638688 - Flags: review?(peterbe)
Attached patch rewrite the outreach data (obsolete) — Splinter Review
Sorry, wrong patch, forgot to git add before git commit --amend.
Attachment #638688 - Attachment is obsolete: true
Attachment #638688 - Flags: review?(peterbe)
Attachment #638743 - Flags: review?(peterbe)
Comment on attachment 638743 [details] [diff] [review]
rewrite the outreach data

Review of attachment 638743 [details] [diff] [review]:
-----------------------------------------------------------------

r- because of the cryptic use of text/plain for JSON output. It might be the right thing to do (e.g. bug in Exhibit or something to do with security) but then I'd like to see an explanation in the code.

::: apps/shipping/views/outreach.py
@@ +43,5 @@
> +             for k in sorted(context['signoffs'].iterkeys())])
> +        return HttpResponse(json.dumps({
> +            "items": items,
> +            "properties": self.getProperties()
> +            }, indent=2), content_type='text/plain')

Why oh why? We've gone through this before and now I can't remember what your reasoning was. 
Is it because of a bug in Exhibit? 

Also, why the `indent=2`? If you added that just for your human viewing pleasure you're much better off installing the JSONView add-on. 

If there is a good reason to send the wrong mimetype, then I'd expect a comment to explain so.

@@ +61,5 @@
> +        avt4av = dict((avt.appversion, avt) for avt in avts)
> +        avt4tree = dict((avt.tree_id, avt) for avt in avts)
> +        branch4av = dict((avt.appversion_id,
> +                          avt.tree.code.split('_', 1)[1]) for avt in avts)
> +        tree4av = dict((avt.appversion_id, avt.tree_id) for avt in avts)

You're looping over the same iterator 4 times. I strongly recommend a regular for loop and within set attributes on avt4av, avt4tree, branch4av, tree4av.

@@ +85,5 @@
> +                if fb not in fallbacks:
> +                    break
> +                fb = fallbacks[fb]
> +        locItems = self.getLocaleItems(locs)
> +        signOffItems = {}

Any particular reason for one-off camelCase variables?

@@ +111,5 @@
> +                    }
> +        loc4tree = dict((t, locs) for t, locs in loc4tree.iteritems()
> +                        if locs)
> +        badlocs = set()
> +        badlocs.update(*loc4tree.values())

Just
`badlocs = set(loc4tree.values())`
?

@@ +166,4 @@
>  
> +    def getAppVersionTreeThroughs(self):
> +        """get the AVTs for aurora and beta"""
> +        forests = list(Forest.objects

You don't need to expand the QuerySet here. The `__in` operator will recognize a QuerySet and take care of it.
Attachment #638743 - Flags: review?(peterbe) → review-
Priority: -- → P2
(In reply to Peter Bengtsson [:peterbe] from comment #3)
> Comment on attachment 638743 [details] [diff] [review]
> rewrite the outreach data
> 
> Review of attachment 638743 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> r- because of the cryptic use of text/plain for JSON output. It might be the
> right thing to do (e.g. bug in Exhibit or something to do with security) but
> then I'd like to see an explanation in the code.
> 
> ::: apps/shipping/views/outreach.py
> @@ +43,5 @@
> > +             for k in sorted(context['signoffs'].iterkeys())])
> > +        return HttpResponse(json.dumps({
> > +            "items": items,
> > +            "properties": self.getProperties()
> > +            }, indent=2), content_type='text/plain')
> 
> Why oh why? We've gone through this before and now I can't remember what
> your reasoning was. 
> Is it because of a bug in Exhibit? 
> 
> Also, why the `indent=2`? If you added that just for your human viewing
> pleasure you're much better off installing the JSONView add-on. 
> 
> If there is a good reason to send the wrong mimetype, then I'd expect a
> comment to explain so.

I use plain/text, because that's the most powerful way to handle JSON. Addons don't help me diff before and after data or anything like that. If I can't use plain text tools for the data, I'm doomed.

> @@ +61,5 @@
> > +        avt4av = dict((avt.appversion, avt) for avt in avts)
> > +        avt4tree = dict((avt.tree_id, avt) for avt in avts)
> > +        branch4av = dict((avt.appversion_id,
> > +                          avt.tree.code.split('_', 1)[1]) for avt in avts)
> > +        tree4av = dict((avt.appversion_id, avt.tree_id) for avt in avts)
> 
> You're looping over the same iterator 4 times. I strongly recommend a
> regular for loop and within set attributes on avt4av, avt4tree, branch4av,
> tree4av.

The question is if the iterator is more expensive than the locality of the dict creation code. Not sure if either clearly wins, tbh.

Readability reasons?

> @@ +85,5 @@
> > +                if fb not in fallbacks:
> > +                    break
> > +                fb = fallbacks[fb]
> > +        locItems = self.getLocaleItems(locs)
> > +        signOffItems = {}
> 
> Any particular reason for one-off camelCase variables?

Not really.

> @@ +111,5 @@
> > +                    }
> > +        loc4tree = dict((t, locs) for t, locs in loc4tree.iteritems()
> > +                        if locs)
> > +        badlocs = set()
> > +        badlocs.update(*loc4tree.values())
> 
> Just
> `badlocs = set(loc4tree.values())`
> ?

The code creates the union of all values(), so that's not the same thing.

> @@ +166,4 @@
> >  
> > +    def getAppVersionTreeThroughs(self):
> > +        """get the AVTs for aurora and beta"""
> > +        forests = list(Forest.objects
> 
> You don't need to expand the QuerySet here. The `__in` operator will
> recognize a QuerySet and take care of it.

I can probably just fold that into one query.
(In reply to Axel Hecht [:Pike] from comment #4)
> (In reply to Peter Bengtsson [:peterbe] from comment #3)
> > Comment on attachment 638743 [details] [diff] [review]
> > rewrite the outreach data
> > 
> > Review of attachment 638743 [details] [diff] [review]:
> > -----------------------------------------------------------------
> > 
> > r- because of the cryptic use of text/plain for JSON output. It might be the
> > right thing to do (e.g. bug in Exhibit or something to do with security) but
> > then I'd like to see an explanation in the code.
> > 
> > ::: apps/shipping/views/outreach.py
> > @@ +43,5 @@
> > > +             for k in sorted(context['signoffs'].iterkeys())])
> > > +        return HttpResponse(json.dumps({
> > > +            "items": items,
> > > +            "properties": self.getProperties()
> > > +            }, indent=2), content_type='text/plain')
> > 
> > Why oh why? We've gone through this before and now I can't remember what
> > your reasoning was. 
> > Is it because of a bug in Exhibit? 
> > 
> > Also, why the `indent=2`? If you added that just for your human viewing
> > pleasure you're much better off installing the JSONView add-on. 
> > 
> > If there is a good reason to send the wrong mimetype, then I'd expect a
> > comment to explain so.
> 
> I use plain/text, because that's the most powerful way to handle JSON.
> Addons don't help me diff before and after data or anything like that. If I
> can't use plain text tools for the data, I'm doomed.
> 

Elmo is the only app that I work on that doesn't use a general decorator (or equivalent) that takes care of the json outputting. Usually it looks like this:

@json_view
def something(request):
    return {'key': 'value'}

Eg.
https://github.com/mozilla/pto/blob/develop/pto/apps/autocomplete/views.py#L12

If you insist on using your debugging trick in the code, I'd love to see a comment that justifies it. 

Perhaps what we should do is a decorator like `json_view` (in the example above) that takes a `settings.DEBUG_JSON_OUTPUT_CONTENT_TYPE` or something. Then we get the best of both worlds. 


> > @@ +61,5 @@
> > > +        avt4av = dict((avt.appversion, avt) for avt in avts)
> > > +        avt4tree = dict((avt.tree_id, avt) for avt in avts)
> > > +        branch4av = dict((avt.appversion_id,
> > > +                          avt.tree.code.split('_', 1)[1]) for avt in avts)
> > > +        tree4av = dict((avt.appversion_id, avt.tree_id) for avt in avts)
> > 
> > You're looping over the same iterator 4 times. I strongly recommend a
> > regular for loop and within set attributes on avt4av, avt4tree, branch4av,
> > tree4av.
> 
> The question is if the iterator is more expensive than the locality of the
> dict creation code. Not sure if either clearly wins, tbh.
> 
> Readability reasons?
> 

From Alex Martelli's famous Anti-Patterns in Python presentation: "A for loop is not a sin"
:)

If you worry that `for item in avt:` would expand the whole list before it iterates you can do: `for item in iter(avt):`


> 
> > @@ +111,5 @@
> > > +                    }
> > > +        loc4tree = dict((t, locs) for t, locs in loc4tree.iteritems()
> > > +                        if locs)
> > > +        badlocs = set()
> > > +        badlocs.update(*loc4tree.values())
> > 
> > Just
> > `badlocs = set(loc4tree.values())`
> > ?
> 
> The code creates the union of all values(), so that's not the same thing.
> 
I think I get it. loc4tree.values() is a list of lists.
Attached patch updated patchSplinter Review
Addressed a few comments, but I kept

        avt4av = dict((avt.appversion, avt) for avt in avts)
        avt4tree = dict((avt.tree_id, avt) for avt in avts)
        branch4av = dict((avt.appversion_id,
                          avt.tree.code.split('_', 1)[1]) for avt in avts)
        tree4av = dict((avt.appversion_id, avt.tree_id) for avt in avts)

as it's easier to read than

        avt4av = {}
        avt4tree = {}
        branch4av = {}
        tree4av = {}
        for avt in avts:
            avt4av[avt.appversion] = avt
            avt4tree[avt.tree_id] = avt
            branch4av[avt.appversion_id] = avt.tree.code.split('_', 1)[1]
            tree4av[avt.appversion_id] = avt.tree_id

I also stick to using formated plain text for JSON, and for production. elmo's different in that it handles big data, which people have questions about. I don't inted to jump through loops to answer them, if I can look at text in production instead.
Attachment #638743 - Attachment is obsolete: true
Attachment #679640 - Flags: review?(peterbe)
FYI this is much better than plain text: http://cl.ly/Kl30
Thank http://jsonview.com/ for that
Comment on attachment 679640 [details] [diff] [review]
updated patch

Review of attachment 679640 [details] [diff] [review]:
-----------------------------------------------------------------

Some nits that are more personal. Feel free to ignore and land. 

However, the repetition of the forest names I think is a more serious nit but I'm not going to r- because perhaps there's a reason they're separate and it's just a coincidence today that they happen to be equal.

::: apps/shipping/templates/shipping/outreach.html
@@ +8,5 @@
> +
> +{% block head_matter %}
> +<link href="{% url shipping.views.outreach.data %}" type="application/json" rel="exhibit/data"/>
> +{% exhibit %}
> +<script>

Any reason why this isn't in an external file? 
The only advantage I can see with inline javascript is if you need to conveniently pass along some variables from django. 

Also, it's a Mozilla security recommendation to minimize inline Javascript.

::: apps/shipping/views/outreach.py
@@ +20,5 @@
>  
>  
> +def dashboard(request):
> +    forests = (Forest.objects
> +               .filter(name__in=('releases/l10n/mozilla-aurora',

See comment below about moving these constants into a class attribute.

@@ +25,5 @@
> +                                 'releases/l10n/mozilla-beta')))
> +    apps = (AppVersionTreeThrough.objects.current()
> +            .filter(tree__l10n__in=forests)
> +            .order_by('appversion__app__code')
> +            .values_list('appversion__app__code', 'appversion__app__name')

What do you need `appversion__app__name` for?
I looked through outreach.html and I didn't see any use of the `name` value. Only the code is used in the for-loops on `apps`.

@@ +37,5 @@
>  
> +    def get(self, request):
> +        context = self.getContext()
> +        items = ([context['locales'][l]
> +                  for l in sorted(context['locales'].iterkeys())] +

You don't need `.iterkeys()`. Sorted is smart enough to do the exact same thing.

@@ +46,5 @@
> +            "properties": self.getProperties()
> +            }, indent=2), content_type='text/plain')
> +
> +    @staticmethod
> +    def getProperties():

THis is overly verbose. It's only ever used in one place. 
I suggest you move these three lines that set up the `props` dict straight into the `get()` method above.

@@ +56,5 @@
> +    def getContext(self):
> +        """get the data, doesn't need any params
> +        Returns a {'locales':[locale_items], 'signoffs':[so_items]}
> +        """
> +        avts = self.getAppVersionTreeThroughs()

This new way of looping over them as a list is slower and it's based on the assumption that `avts` is a list. If it was a QuerySet you'd be running 4 SQL selects when 1 is all you need. And what if that assumption breaks?

I know it's personal taste but I though it was easier to read when it was 1 for loop with a bunch of setattrs going on. 

I know this isn't particularly important but if was an important loop I would press on.

@@ +64,5 @@
> +                          avt.tree.code.split('_', 1)[1]) for avt in avts)
> +        tree4av = dict((avt.appversion_id, avt.tree_id) for avt in avts)
> +        locs = set()
> +        loc4tree = defaultdict(set)
> +        for t, l in (Run.objects

is that a `l` as in One. Or a `l` as in Lego? :)

`l` is a bad variable name. 

Can we use `tree` and `loc` instead?

@@ +117,5 @@
> +        for loc in goodlocs:
> +            locale_items.pop(loc, None)
> +        # get current compare-locales state for locales/appvers
> +        # that don't have sign-off activity
> +        or_ = lambda l, r: l | r

Use `from operator import or_`

Feels less hacky. And `operator` is written in C so it'll save us a couple hundreds of nanoseconds. :)

@@ +154,5 @@
> +            'signoffs': signoff_items
> +        }
> +
> +    def getLocaleItems(self, locs):
> +        return dict((l.code, {

What's wrong with a plain old for loop? 
They're easier on the eye, less indentation trouble and easier to debug because you can put in print statements and stuff.

@@ +166,4 @@
>  
> +    def getAppVersionTreeThroughs(self):
> +        """get the AVTs for aurora and beta"""
> +        forests = ('releases/l10n/mozilla-aurora',

Move this out into a class attribute for the whole data view. 
These precious specific  forest names is repeated in two places in the same class.

@@ +168,5 @@
> +        """get the AVTs for aurora and beta"""
> +        forests = ('releases/l10n/mozilla-aurora',
> +                   'releases/l10n/mozilla-beta')
> +        avts = (AppVersionTreeThrough.objects.current()
> +                .filter(tree__l10n__name__in=forests)

...and thus that would become `self.forest_names` instead
Attachment #679640 - Flags: review?(peterbe) → review+
Let's close this out, I've stopped maintaining that branch a while ago.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → WONTFIX
Product: Webtools → Webtools Graveyard
You need to log in before you can comment on or make changes to this bug.