Closed Bug 1260715 Opened 9 years ago Closed 9 years ago

Review and schedule CSV summary export for the fennec-dashboard for Fennec 46

Product: Cloud Services Graveyard

Component: Metrics: Pipeline

Type: defect

Priority: P1

Severity: normal

Tracking: Not tracked

Status: RESOLVED FIXED

People: Reporter: gfritzsche, Assigned: gfritzsche

References: Blocks 2 open bugs

Whiteboard: [measurement:client]

Georg Fritzsche [:gfritzsche]

Assignee

Description

9 years ago

In bug 1251189, we built an IPython notebook that generates the weekly & monthly CSV data needed for the fennec-dashboard.
However, because of bug 1257589 we don't have useful data from Fennec 45 yet.

We will have to:
* wait until we are clear on which ping version we have in Fennec 46
* potentially update the CSV export for those changes
* wait for validation of the 46 "core" ping data
* if we are good, start scheduling the job for 46+

Georg Fritzsche [:gfritzsche]

Assignee

Comment 1

9 years ago

To find "new records" / "new clients" we use a check like "profile creation date == submission date".
This seems pretty fragile in the face of missing pings, temporary loss of network, ...
We could change that into something like "profile creation date is in the current week/month range" (depending on the job type) to make it more stable.
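
The range-based check proposed here could be sketched as follows. This is a minimal illustration, not the notebook's code: `is_new_client` and its parameters are hypothetical names, the dates are made up, and it assumes the profile creation date has already been converted to a `datetime.date`.

```python
from datetime import date

def is_new_client(profile_creation_date, period_start, period_end):
    """Return True if the profile was created inside the reporting period.

    Unlike `profile_creation_date == submission_date`, this still counts a
    client whose first ping arrived late, as long as the profile was created
    within the week or month being summarized (both ends inclusive).
    """
    return period_start <= profile_creation_date <= period_end

# Hypothetical weekly job covering Monday to Sunday.
week_start, week_end = date(2016, 5, 23), date(2016, 5, 29)
created = date(2016, 5, 24)
first_submission = date(2016, 5, 26)  # ping delayed by two days

strict_match = created == first_submission                  # misses this client
range_match = is_new_client(created, week_start, week_end)  # counts this client
```

The same function would work for the monthly job by passing the first and last day of the month as the period bounds.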

Georg Fritzsche [:gfritzsche]

Assignee

Comment 2

9 years ago

From a first backfill viewed in the diagnostics viewer [1], the retention data seems off.
I see d1 < d7 < d30, expected is d1 > d7 > d30.

It looks like the retention checks in the notebook [2] are the wrong way around:
> # Is the user still engaged after 1 day (d1)?
> if days_after_creation == 1:
>     safe_increment(acc, 'd1')
>
> # And after 7 days (d7)?
> if days_after_creation <= 7:
>     safe_increment(acc, 'd7')
...

1: https://metrics.services.mozilla.com/diagnostic-data-viewer/?dataset=fennec-v4-weekly#
2: https://github.com/mozilla-services/data-pipeline/blob/master/reports/fennec_dashboard/summarize_csv.ipynb
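
To see why a `<=` comparison produces d1 < d7 < d30, here is a small self-contained sketch. It is a simplified illustration, not the notebook code and not necessarily the fix that was eventually merged: `count_retained` and the sample data are hypothetical, and it assumes each client is summarized by the largest `days_after_creation` value seen in its pings.

```python
THRESHOLDS = {"d1": 1, "d7": 7, "d30": 30}

def count_retained(last_active_day_per_client, at_least=True):
    """Count clients per retention bucket.

    at_least=False mimics the inverted check (day <= N), which grows with N.
    at_least=True counts clients still active N or more days after creation,
    which can only shrink as N grows.
    """
    counts = {name: 0 for name in THRESHOLDS}
    for last_day in last_active_day_per_client:
        for name, n in THRESHOLDS.items():
            retained = last_day >= n if at_least else last_day <= n
            if retained:
                counts[name] += 1
    return counts

# Hypothetical sample: last day (after profile creation) each client was seen.
clients = [0, 1, 3, 7, 12, 30, 45]
inverted = count_retained(clients, at_least=False)
expected = count_retained(clients, at_least=True)
print(inverted)
print(expected)
```

With the inverted comparison every larger bucket also contains all clients of the smaller ones, so the counts increase from d1 to d30 instead of decreasing.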

Alessio Placitelli [:Dexter]

Comment 3

9 years ago

(In reply to Georg Fritzsche [:gfritzsche] from comment #2)
> From a first backfill viewed in the diagnostics viewer [1], the retention
> data seems off.
> I see d1 < d7 < d30, expected is d1 > d7 > d30.

Good catch, my bad. I'll fix this right now and link to the PR.

Alessio Placitelli [:Dexter]

Comment 4

9 years ago

https://github.com/mozilla-services/data-pipeline/pull/198

Georg Fritzsche [:gfritzsche]

Assignee

Comment 5

9 years ago

The retention change was merged.

Other things:
* we need to update this to work from schema version 2
* we need to think about how to avoid having to bump the schema version manually all the time
* we should print out how many date chunks a backfill is working over (so we have an idea of job progress)
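
The progress output mentioned in the last item could look roughly like this. It is a sketch under assumptions, not the notebook's implementation: `weekly_chunks`, `run_backfill` and `process_chunk` are hypothetical names, and it assumes a backfill is split into chunks of up to seven days.

```python
from datetime import date, timedelta

def weekly_chunks(start, end):
    """Yield (chunk_start, chunk_end) pairs of up to 7 days covering start..end inclusive."""
    chunk_start = start
    while chunk_start <= end:
        chunk_end = min(chunk_start + timedelta(days=6), end)
        yield chunk_start, chunk_end
        chunk_start = chunk_end + timedelta(days=1)

def run_backfill(start, end, process_chunk):
    """Process every chunk in order and report progress before each one."""
    chunks = list(weekly_chunks(start, end))
    print("Backfilling %d date chunk(s)" % len(chunks))
    for index, (chunk_start, chunk_end) in enumerate(chunks, start=1):
        print("[%d/%d] %s to %s" % (index, len(chunks), chunk_start, chunk_end))
        process_chunk(chunk_start, chunk_end)
    return len(chunks)

# Hypothetical usage: record the chunks instead of running a real job.
processed = []
total = run_backfill(date(2016, 5, 2), date(2016, 5, 29),
                     lambda s, e: processed.append((s, e)))
```

Materializing the chunk list up front is what makes the total known before the first chunk starts, so each log line can show its position in the overall job.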

Georg Fritzsche [:gfritzsche]

Assignee

Comment 6

9 years ago

We also need to remove the count() calls that trigger redundant transformations etc.

Georg Fritzsche [:gfritzsche]

Assignee

Comment 7

9 years ago

Also, we need to filter for os=="Android" until we know how/if iOS is supposed to be integrated.

Mark Finkle (:mfinkle) (use needinfo?)

Comment 8

9 years ago

(In reply to Georg Fritzsche [:gfritzsche] from comment #5)
> * we need to think about how to avoid having to bump the schema version
> manually all the time

I had the same issue with the ETL script used to move core ping data into Re:dash. Mark Reid suggested using source_version='*' in get_pings. It works! A single get_pings call can grab the data for pings of all versions.

Georg Fritzsche [:gfritzsche]

Assignee

Updated

9 years ago

Priority: P3 → P2

Georg Fritzsche [:gfritzsche]

Assignee

Comment 9

9 years ago

PR for the changes: https://github.com/mozilla-services/data-pipeline/pull/206

Georg Fritzsche [:gfritzsche]

Assignee

Comment 10

9 years ago

I backfilled the weekly dataset from the 46 release date on: https://metrics.services.mozilla.com/diagnostic-data-viewer/?dataset=fennec-v4-weekly#

Georg Fritzsche [:gfritzsche]

Assignee

Updated

9 years ago

Assignee: nobody → gfritzsche

Georg Fritzsche [:gfritzsche]

Assignee

Comment 11

9 years ago

We backfilled the weekly summary. Looking at it in the diagnostic data viewer, it seems good:
https://metrics.services.mozilla.com/diagnostic-data-viewer/?dataset=fennec-v4-weekly#

We have to wait for a full month of data for the first monthly backfill and more weekly sanity checks.
I'll pick this up again in early June and wrap it up then.

Assignee: gfritzsche → nobody

Georg Fritzsche [:gfritzsche]

Assignee

Updated

9 years ago

Assignee: nobody → gfritzsche

Priority: P2 → P1

Georg Fritzsche [:gfritzsche]

Assignee

Updated

9 years ago

Points: 2 → 3

Georg Fritzsche [:gfritzsche]

Assignee

Comment 12

9 years ago

This was backfilled:
* weekly data: until May 29
* monthly data: for the whole of May

I scheduled a job for the weekly update (Mondays at 11 AM UTC).

Georg Fritzsche [:gfritzsche]

Assignee

Comment 13

9 years ago

PR with a refactoring for speed-up and some logging improvements: https://github.com/mozilla-services/data-pipeline/pull/214

Georg Fritzsche [:gfritzsche]

Assignee

Comment 14

9 years ago

We had a first meeting in London with adavis and bbermes. Overall the numbers don't look completely off.
* The release channel ADI is within the bounds of the Adjust data and growing toward that install base.
* Sadly, Adjust does not give us a breakdown by app version, so we can't cross-check directly (although adavis had ideas on cross-checking the raw Adjust report against "core" ping submission numbers).
* The retention numbers are lower than what we see in Adjust (but not by very large margins); we don't know why yet.
* Using "fixed retention" seems good, as it seems to be the "industry standard".

Actions from there:
* We need a follow-up meeting and need to decide whether the dashboard data can go live.
* Hide the FHR retention data from the dashboard, because the metric changed (fixed vs. rolling retention).
* Poke mreid for the PR review.

The backfill for last week already happened per the scheduled job, so the scheduling seems to work now.
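
The difference between fixed and rolling retention mentioned in this comment can be illustrated with a small sketch. The definitions used here are the common ones (fixed: active exactly N days after profile creation; rolling: active N or more days after creation); whether the dashboard uses exactly these definitions is an assumption, and `retention` and the sample data are hypothetical.

```python
def retention(active_days_by_client, n, rolling=False):
    """Fraction of clients retained at day n.

    active_days_by_client maps a client id to the set of days (counted from
    profile creation) on which that client sent a ping.
    """
    if not active_days_by_client:
        return 0.0
    retained = 0
    for days in active_days_by_client.values():
        if rolling:
            hit = any(day >= n for day in days)  # seen on day n or later
        else:
            hit = n in days                      # seen exactly on day n
        if hit:
            retained += 1
    return retained / float(len(active_days_by_client))

# Hypothetical activity data for four clients.
clients = {
    "a": {0, 1, 7},
    "b": {0, 2, 9},
    "c": {0, 1},
    "d": {0},
}
fixed_d1 = retention(clients, 1)
rolling_d1 = retention(clients, 1, rolling=True)
fixed_d7 = retention(clients, 7)
rolling_d7 = retention(clients, 7, rolling=True)
```

For the same data, rolling retention is never lower than fixed retention, so numbers computed under the two definitions are not directly comparable.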

Georg Fritzsche [:gfritzsche]

Assignee

Comment 15

9 years ago

Notes from the London meeting (moco only due to ADI numbers): https://docs.google.com/document/d/1cYGaQ3s2Vhk489oi-bLPtwtmLkNZqsRA_QhzWk6bvCY/edit

Georg Fritzsche [:gfritzsche]

Assignee

Comment 16

9 years ago

The monthly data for June is now also available and the weekly scheduled jobs have been running fine:
https://metrics.services.mozilla.com/fennec-dashboard/

Georg Fritzsche [:gfritzsche]

Assignee

Updated

9 years ago

Blocks: 1284932

Georg Fritzsche [:gfritzsche]

Assignee

Comment 17

9 years ago

The scheduling happened and should be working properly.
I'm breaking out the retention data investigation into bug 1284932.

Georg Fritzsche [:gfritzsche]

Assignee

Updated

9 years ago

Status: NEW → RESOLVED

Closed: 9 years ago

Resolution: --- → FIXED

Updated

7 years ago

Product: Cloud Services → Cloud Services Graveyard
