Closed Bug 1455314 Opened 6 years ago Closed 6 years ago

Deprecate heavy_users view

Categories

(Data Platform and Tools :: General, enhancement, P1)

enhancement
Points:
1

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: frank, Assigned: frank)

This isn't getting any use, and we've moved to defining active_users with total_uri_count.
Saptarshi, Peter - you are both stakeholders on this dataset, can you confirm that you're not using it for any analyses?
Flags: needinfo?(sguha)
Flags: needinfo?(pdolanjski)
Sadly, I've not been using it. I had hoped that others would be using this for their queries; unfortunately, it never took off.
So sorry for all the effort you put into this.
Flags: needinfo?(sguha)
I agree with :joy. I think this was a cart-before-the-horse thing, where there was an assumption that segmenting into heavy usage would enable folks to focus on that top segment. I think we can deprecate it.
Flags: needinfo?(pdolanjski)
Thanks :pdol and :joy, that's what I needed. Data projects are moving quickly at Mozilla and it's understandable that we've decided to use different methods of understanding users.

Brendan, you have the only query on heavy_users. I'm letting you know that the table is going to be deprecated in the next week. Do you have any issues with that?
There was an effort to get a data set for signatures based off of crash pings, and we were hoping to combine the heavy_users data set with that. But since we still aren't using crash pings for release health, I guess we'll care about this again sometime later.
(In reply to David Durst [:ddurst] from comment #5)
> There was an effort to get a data set for signatures based off of crash
> pings, and we were hoping to combine the heavy_users data set with that. But
> since we still aren't using crash pings for release health, I guess we'll
> care about this again sometime later.

David, can you give a bit more info? What were you planning to use the heavy_users info for here?
Flags: needinfo?(ddurst)
Sure, though we didn't get as far as proposing regular use of it; it was more of a prototype that used it (assuming it would still be useful, but ymmv and things have surely changed).

We did a test of symbolication and signature generation from crash pings to generate top crashers lists. We then looked at those lists in three groups: 1) top crashers within 2 weeks of profile creation, 2) top crashers within 6 weeks of profile creation, 3) top crashers for heavy users (no window).

The idea was that we could eventually be using these signatures for release health monitoring and determining if there was a difference for new and not-new and heavy users. There may well be better ways to do that now, and we haven't moved forward with crash pings in this way (though we are collecting them and still could).
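
For illustration only, the three-way grouping described above could be sketched roughly as follows. This is not the prototype's code: the record fields (`signature`, `profile_age_days`, `client_id`) and the `heavy_client_ids` set are hypothetical names, and it assumes the 6-week group includes the 2-week one.

```python
from collections import Counter


def top_crashers_by_group(crashes, heavy_client_ids, n=10):
    """Count crash signatures per group and return the top n for each.

    `crashes` is an iterable of dicts with hypothetical keys
    "signature", "profile_age_days" and "client_id".
    """
    counts = {
        "within_2_weeks": Counter(),
        "within_6_weeks": Counter(),
        "heavy_users": Counter(),
    }
    for crash in crashes:
        signature = crash["signature"]
        age_days = crash["profile_age_days"]
        if age_days <= 14:
            counts["within_2_weeks"][signature] += 1
        if age_days <= 42:  # assumed to include the 2-week group
            counts["within_6_weeks"][signature] += 1
        if crash["client_id"] in heavy_client_ids:  # no time window
            counts["heavy_users"][signature] += 1
    return {group: c.most_common(n) for group, c in counts.items()}


# Made-up sample data, purely to show the shape of the input.
sample_crashes = [
    {"signature": "OOM | small", "profile_age_days": 3, "client_id": "a"},
    {"signature": "OOM | small", "profile_age_days": 30, "client_id": "b"},
    {"signature": "shutdownhang", "profile_age_days": 400, "client_id": "c"},
    {"signature": "shutdownhang", "profile_age_days": 500, "client_id": "c"},
    {"signature": "OOM | small", "profile_age_days": 900, "client_id": "c"},
]
top_lists = top_crashers_by_group(sample_crashes, heavy_client_ids={"c"})
```

Comparing the resulting lists side by side is what would show whether new, not-new, and heavy users see different top crashers.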
Flags: needinfo?(ddurst)
Assignee: nobody → fbertsch
Blake, can you remove `heavy_users` from the hive metastore(s)?

I'll be removing this job from telemetry-airflow.
Flags: needinfo?(bimsland)
(In reply to David Durst [:ddurst] from comment #7)
> The idea was that we could eventually be using these signatures for release
> health monitoring and determining if there was a difference for new and
> not-new and heavy users. There may well be better ways to do that now, and
> we haven't moved forward with crash pings in this way (though we are
> collecting them and still could).

Sounds like a good use case, but we could probably serve this equally well using clients_daily and either `total_uri_count` or `active_ticks` (where the former is how we now define active users [0]). When you get to that point we can come up with a plan.

[0] https://docs.telemetry.mozilla.org/cookbooks/active_dau.html
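
A rough sketch of what that could look like, assuming one day of clients_daily-style rows held as plain dicts. The `client_id` key, the row layout, and the default threshold are illustrative assumptions, not the documented definition; check [0] for the actual cutoff.

```python
from collections import defaultdict


def active_clients(rows, min_uri_count=5):
    """Return the client_ids whose summed total_uri_count meets a threshold.

    `rows` is an iterable of dicts with "client_id" and "total_uri_count";
    `min_uri_count` is an assumed cutoff, passed in rather than hard-coded.
    """
    totals = defaultdict(int)
    for row in rows:
        # Treat a missing or null count as zero.
        totals[row["client_id"]] += row.get("total_uri_count") or 0
    return {cid for cid, total in totals.items() if total >= min_uri_count}


# Made-up rows for a single submission date.
sample_rows = [
    {"client_id": "a", "total_uri_count": 12},
    {"client_id": "b", "total_uri_count": 2},
    {"client_id": "c", "total_uri_count": None},
    {"client_id": "b", "total_uri_count": 4},
]
active = active_clients(sample_rows)
```

The same shape works with `active_ticks` by swapping the key and picking a threshold suited to that measure.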
(In reply to Frank Bertsch [:frank] from comment #9)
> (In reply to David Durst [:ddurst] from comment #7)
> > The idea was that we could eventually be using these signatures for release
> > health monitoring and determining if there was a difference for new and
> > not-new and heavy users. There may well be better ways to do that now, and
> > we haven't moved forward with crash pings in this way (though we are
> > collecting them and still could).
> 
> Sounds like a good use case, but we could probably serve this equally well
> using clients_daily and either `total_uri_count` or `active_ticks` (where
> the former is how we now define active users [0]). When you get to that
> point we can come up with a plan.
> 
> [0] https://docs.telemetry.mozilla.org/cookbooks/active_dau.html

Well, I'm on another team now, so it's really whether someone involved with stability is interested in such a thing.
> Well, I'm on another team now, so it's really whether someone involved with
> stability is interested in such a thing.

I've included in the docs ways to recreate the dataset from existing data, so that should cover these use-cases.

https://docs.telemetry.mozilla.org/datasets/obsolete/heavy_users/reference.html#replacement
Depends on: 1465510
Flags: needinfo?(bimsland)
This has been deprecated.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Component: Datasets: General → General