Closed Bug 1240522 Opened 9 years ago Closed 7 years ago

Generate low-latency "ADI per channel" data

Tracking

(Not tracked)

Status:

RESOLVED WONTFIX

People

(Reporter: gfritzsche, Unassigned)

References

Details

User Story

From mail thread:

> My goal is relatively simple. I would like to have a clear picture of the usage of
> Firefox on the beta and release channel per version.
> 
> For example, I would like to have:
> * 42.0 - XXX users
> * 43.0 - YY users
> * 41.0.1 - ZZ users
> * 40.0 - 2 users
> 
> same for beta
> (42.0b9, 43.0b4, etc).
> This will help us when we start new builds for the partial generation
> (the binary diff between two versions).
> 
> I need an update of the number every hour (or at a longer interval if that is expensive)
> and if the data could be fresh (~ 1 day), this would be perfect.

...
> I don't need any graph. I am just interested in the current numbers.

Attachments

(1 file)

Example data 9 years ago Georg Fritzsche [:gfritzsche] 1.25 KB, text/plain

Georg Fritzsche [:gfritzsche]

Reporter

Description

9 years ago

Release management wants to see ADI/uptake per channel in a low-latency dashboard.

Georg Fritzsche [:gfritzsche]

Reporter

Updated

9 years ago

User Story: (updated)

Georg Fritzsche [:gfritzsche]

Reporter

Comment 1

9 years ago

Attached file Example data

Example of required data (without the desired beta version information above).

Mike Trinkala [:trink]

Comment 2

9 years ago

We can do this analysis in real time with a HyperLogLog per partition if ~99% accuracy is acceptable, and we can easily refresh the output once a minute.
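Comment 2 proposes counting unique clients with a HyperLogLog per partition. Below is a minimal Python sketch of that technique; it is not the Heka/Lua implementation referenced in comment 6. The precision (p = 14, i.e. 16,384 registers), the SHA-1-derived 64-bit hash, and the class and method names are illustrative assumptions. With m registers, HyperLogLog's standard error is roughly 1.04/sqrt(m), about 0.8% here, which is in line with the ~99% accuracy mentioned above. Sketches from different partitions merge by taking the register-wise maximum.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal HyperLogLog distinct-count sketch (illustrative, not production code)."""

    def __init__(self, p: int = 14) -> None:
        self.p = p
        self.m = 1 << p                      # number of registers
        self.registers = bytearray(self.m)   # max observed rank per register
        self.alpha = 0.7213 / (1 + 1.079 / self.m)  # bias constant for m >= 128

    def add(self, item: str) -> None:
        # 64-bit hash from the first 8 bytes of SHA-1 (assumed hash choice).
        h = int.from_bytes(hashlib.sha1(item.encode("utf-8")).digest()[:8], "big")
        width = 64 - self.p
        idx = h >> width                       # top p bits select the register
        rest = h & ((1 << width) - 1)          # remaining bits
        rank = width - rest.bit_length() + 1   # 1-based position of leftmost 1-bit
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def estimate(self) -> float:
        z = sum(2.0 ** -r for r in self.registers)
        raw = self.alpha * self.m * self.m / z
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction: linear counting over empty registers.
            return self.m * math.log(self.m / zeros)
        return raw

    def merge(self, other: "HyperLogLog") -> "HyperLogLog":
        # Union of two sketches is the register-wise maximum.
        if self.p != other.p:
            raise ValueError("precision mismatch")
        merged = HyperLogLog(self.p)
        merged.registers = bytearray(max(a, b) for a, b in zip(self.registers, other.registers))
        return merged


# Two partitions that have seen overlapping client IDs.
part_a, part_b = HyperLogLog(), HyperLogLog()
for i in range(10_000):
    part_a.add(f"client-{i}")
for i in range(5_000, 15_000):
    part_b.add(f"client-{i}")
print(round(part_a.estimate()), round(part_a.merge(part_b).estimate()))
```

Keeping one such sketch per (channel, version) key would produce numbers in the "42.0 - XXX users" shape from the user story; that keying is an assumption, not something taken from the linked filter.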

Georg Fritzsche [:gfritzsche]

Reporter

Updated

9 years ago

Assignee: nobody → gfritzsche

Georg Fritzsche [:gfritzsche]

Reporter

Comment 3

9 years ago

Some questions came up here that affect the implementation:
What accuracy is required for the data? (see comment 2)
What ADI definition should we apply here exactly? Count of unique clients seen on a channel in the last 24h (i.e. a rolling 24h window)? Or per calendar day?
There are subtleties here in how to account for clients: (1) "we received Telemetry data from the client now" vs. (2) "that data was generated three days ago" ... but I think we can safely go with (1) for this use-case.
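A rolling 24h window keyed on receipt time (option (1) above) can be approximated with hourly buckets of client IDs that are merged when read and expired once they fall out of the window. This is a sketch under stated assumptions: it uses exact Python sets for clarity (a HyperLogLog per bucket would merge the same way), and the class name and hour granularity are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta


class RollingUniqueClients:
    """Unique clients per channel over a rolling window, bucketed by hour of receipt."""

    def __init__(self, window_hours: int = 24) -> None:
        self.window_hours = window_hours
        # bucket start (datetime truncated to the hour) -> channel -> set of client ids
        self.buckets = {}

    @staticmethod
    def _hour(ts: datetime) -> datetime:
        return ts.replace(minute=0, second=0, microsecond=0)

    def record(self, received_at: datetime, channel: str, client_id: str) -> None:
        bucket = self.buckets.setdefault(self._hour(received_at), defaultdict(set))
        bucket[channel].add(client_id)

    def counts(self, now: datetime) -> dict:
        # Keep the current hour plus the preceding window_hours - 1 hours.
        cutoff = self._hour(now) - timedelta(hours=self.window_hours - 1)
        for hour in [h for h in self.buckets if h < cutoff]:
            del self.buckets[hour]
        merged = defaultdict(set)
        for per_channel in self.buckets.values():
            for channel, ids in per_channel.items():
                merged[channel] |= ids
        return {channel: len(ids) for channel, ids in merged.items()}


tracker = RollingUniqueClients()
start = datetime(2016, 2, 16, 10, 30)
tracker.record(start, "release", "client-a")
tracker.record(start + timedelta(hours=2), "release", "client-a")  # same client, counted once
tracker.record(start + timedelta(hours=2), "beta", "client-b")
print(tracker.counts(start + timedelta(hours=3)))  # {'release': 1, 'beta': 1}
```

Because buckets are whole hours, the window is "the current hour plus the previous 23", which only approximates a true 24h rolling window.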

Flags: needinfo?(sledru)

Sylvestre Ledru [:Sylvestre]

Comment 4

9 years ago

> What accuracy is required for the data? (see comment 2)

I would be happy with a 90% accuracy :)

> What ADI definition should we apply here exactly?

What version of Firefox users have on their system?

> Count of unique clients seen on a channel in the last 24h (i.e. rolling 24h
> window)?

This one

> Or per calendar day?
>
> There are subtleties here for how to account for clients:
> (1) "we received Telemetry data from client now" vs.
> (2) "that data was generated three days ago"
> ... but i think we can safely go with (1) for this use-case.

OK, I trust you :)

Flags: needinfo?(sledru)

Georg Fritzsche [:gfritzsche]

Reporter

Comment 5

9 years ago

(In reply to Mike Trinkala [:trink] from comment #2)
> We can do this analysis real time with a hyperloglog per partition if ~99%
> accuracy is acceptable and we can easily refresh the output once a minute.

Trink, it sounds like this is the ideal way forward then? Do we have examples for that? Any other pointers?

Flags: needinfo?(mtrinkala)

Mike Trinkala [:trink]

Comment 6

9 years ago

This can be stripped down (since you don't need the graphs or the daily, weekly, and monthly rollups), and it uses HyperLogLog for ADI: https://github.com/mozilla-services/data-pipeline/blob/b17a11805ae3666f5938a62d815204fc81c595f9/heka/sandbox/filters/firefox_active_instances.lua

Flags: needinfo?(mtrinkala)

Sylvestre Ledru [:Sylvestre]

Updated

9 years ago

Blocks: 1146863

bhearsum@mozilla.com (:bhearsum)

Comment 7

9 years ago

This looks very useful for bug 1246675 as well, which needs real-time ADI.

Blocks: 1246675

Georg Fritzsche [:gfritzsche]

Reporter

Comment 8

9 years ago

I am actively working on this, but got stuck with the data-pipeline project not building locally on OS X. This part is now sorted: https://github.com/mozilla-services/data-pipeline/pull/187 ... so I can finally move on to prototyping this locally.

Georg Fritzsche [:gfritzsche]

Reporter

Comment 9

9 years ago

(In reply to Ben Hearsum (:bhearsum) from comment #7)
> This looks very useful for bug 1246675 as well, which needs real-time ADI.

What maximum latency does this require? 1h, 5min, 1min, ...?

Flags: needinfo?(bhearsum)

bhearsum@mozilla.com (:bhearsum)

Comment 10

9 years ago

(In reply to Georg Fritzsche [:gfritzsche] from comment #9)
> (In reply to Ben Hearsum (:bhearsum) from comment #7)
> > This looks very useful for bug 1246675 as well, which needs real-time ADI.
>
> What maximum latency does this require? 1h, 5min, 1min, ...?

15min, if possible (obviously the lower the better though).

This is based on a current rough uptake rate of ~800,000 installs/hour on the release channel (~200,000/15min), which gives us about a 1% margin of error when trying to hit 20,000,000 installs.
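The 1% figure follows from one refresh interval of lag. Installs that arrive while the next update is still pending are not yet counted, so the count can overshoot the target by up to one interval's worth of installs:

```latex
\frac{800{,}000\ \text{installs/h}}{4\ \text{intervals/h}} = 200{,}000\ \text{installs per 15 min},
\qquad
\frac{200{,}000}{20{,}000{,}000} = 1\%
```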

Flags: needinfo?(bhearsum)

Georg Fritzsche [:gfritzsche]

Reporter

Comment 11

9 years ago

(In reply to Ben Hearsum (:bhearsum) from comment #10)
> (In reply to Georg Fritzsche [:gfritzsche] from comment #9)
> > (In reply to Ben Hearsum (:bhearsum) from comment #7)
> > > This looks very useful for bug 1246675 as well, which needs real-time ADI.
> >
> > What maximum latency does this require? 1h, 5min, 1min, ...?
>
> 15min, if possible (obviously the lower the better though).
>
> This is based on a current rough uptake rate of ~800,000 installs/hour on
> the release channel (~200,000/15min), which gives us about a 1% margin of
> error when trying to hit 20,000,000 installs.

Ok, for release-throttling you are hit by another factor:
After a fresh install or update, we currently don't send out a ping immediately.
Bug 1120370 & bug 1120372 are about sending out pings immediately in these cases.
Until we have those, we have an additional error margin from the reporting latency (for which we could run an analysis job to find the average/95th percentile/...).
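The analysis job mentioned above could summarize reporting latency as the gap between when a ping's data was generated and when the pipeline received it. This is a minimal sketch, assuming the (generated, received) timestamp pairs have already been extracted; the function name and input shape are hypothetical, and the 95th percentile uses the nearest-rank method.

```python
from datetime import datetime, timedelta
from statistics import mean


def latency_summary(pings):
    """Return (mean, 95th percentile) reporting latency in hours.

    `pings` is an iterable of (generated_at, received_at) datetime pairs.
    The percentile uses the nearest-rank method.
    """
    hours = sorted((received - generated).total_seconds() / 3600 for generated, received in pings)
    if not hours:
        raise ValueError("no pings")
    rank = (95 * len(hours) + 99) // 100  # ceil(0.95 * n) using integer arithmetic
    return mean(hours), hours[rank - 1]


base = datetime(2016, 2, 16)
sample = [(base, base + timedelta(hours=h)) for h in range(1, 21)]
print(latency_summary(sample))  # (10.5, 19.0)
```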

Georg Fritzsche [:gfritzsche]

Reporter

Updated

9 years ago

Summary: Implement low-latency "ADI per channel" dashboard → Generate low-latency "ADI per channel" data

bhearsum@mozilla.com (:bhearsum)

Comment 12

9 years ago

(In reply to Georg Fritzsche [:gfritzsche] from comment #11)
> (In reply to Ben Hearsum (:bhearsum) from comment #10)
> > (In reply to Georg Fritzsche [:gfritzsche] from comment #9)
> > > (In reply to Ben Hearsum (:bhearsum) from comment #7)
> > > > This looks very useful for bug 1246675 as well, which needs real-time ADI.
> > >
> > > What maximum latency does this require? 1h, 5min, 1min, ...?
> >
> > 15min, if possible (obviously the lower the better though).
> >
> > This is based on a current rough uptake rate of ~800,000 installs/hour on
> > the release channel (~200,000/15min), which gives us about a 1% margin of
> > error when trying to hit 20,000,000 installs.
>
> Ok, for release-throttling you are hit by another factor:
> After a fresh install or update, we currently don't send out a ping
> immediately.
> Bug 1120370 & bug 1120372 are about sending out pings immediately in these
> cases.
> Until we have those, we have an additional error margin from the reporting
> latency (for which we could run an analysis job to find the average/95th
> percentile/...).

Hm, I'm surprised to hear this. When I was doing some initial poking I was under the impression that Telemetry's UPDATE_STATE_CODE_COMPLETE_STARTUP value is sent by default from all users on the release channel, and the dashboards seem to show that. As I understand it, that value wouldn't cover new installs, but it would be sent after users restart after applying an update. I could be wrong about that, though.

Benjamin Smedberg

Comment 13

9 years ago

That histogram is recorded immediately, but it's typically not going to be sent to Mozilla in the telemetry ping until either:
* the next local midnight
* the user shuts down and restarts the browser

There is inherently a pretty large latency associated with this, so it's not something that can drive realtime dashboards. That is why bug 1120370 and bug 1120372 exist, so that we can do this in realtime.

FWIW, most of this data already exists in the daily rollups at https://analysis-output.telemetry.mozilla.org/stability-rollups/2016/20160216-active-daily.csv.gz except I removed the buildid facet because it made the dataset larger and wasn't necessary for my purposes. Building a periscope version of this would be pretty straightforward.
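Under the simplified model in this comment (the value leaves the client at the next local midnight or at the next restart, whichever comes first), the delay for a single recorded value can be sketched as below. The function is hypothetical and ignores real-world factors such as upload failures, offline clients, and clock skew.

```python
from datetime import datetime, timedelta
from typing import Optional


def earliest_upload(recorded_at: datetime, restart_at: Optional[datetime] = None) -> datetime:
    """Earliest upload time for a value recorded at `recorded_at` (local time),
    under the simplified model: next local midnight or next restart, whichever is first."""
    next_midnight = datetime.combine(recorded_at.date() + timedelta(days=1), datetime.min.time())
    if restart_at is not None and recorded_at <= restart_at < next_midnight:
        return restart_at
    return next_midnight


recorded = datetime(2016, 2, 16, 9, 0)
print(earliest_upload(recorded) - recorded)                                # 15:00:00
print(earliest_upload(recorded, datetime(2016, 2, 16, 18, 0)) - recorded)  # 9:00:00
```

Even in this optimistic model, a value recorded just after midnight by a client that is never restarted waits almost a full day, which is why this data cannot drive a 15-minute dashboard.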

bhearsum@mozilla.com (:bhearsum)

Comment 14

9 years ago

(In reply to Benjamin Smedberg [:bsmedberg] from comment #13)
> That histogram is recorded immediately, but it's typically not going to be
> sent to Mozilla in the telemetry ping until either:
>
> * the next local midnight
> * the user shuts down and restarts the browser
>
> There is inherently a pretty large latency associated with this, so it's not
> something that can drive realtime dashboards. Which is why bug 1120370 and
> bug 1120372 exist, so that we can do this in realtime.

Would UPDATE_STATE_CODE_COMPLETE_STAGE be more appropriate for this (which I assume is sent immediately after staging the update), or should we just wait for the new pings?

Georg Fritzsche [:gfritzsche]

Reporter

Comment 15

9 years ago

(In reply to Benjamin Smedberg [:bsmedberg] from comment #13)
> FWIW, most this data already exists in the daily rollups at
> https://analysis-output.telemetry.mozilla.org/stability-rollups/2016/20160216-active-daily.csv.gz
> except I removed the buildid facet because it
> made the dataset larger and wasn't necessary for my purposes. Building a
> periscope version of this would be pretty straightforward.

But this is a rollup that is only generated daily, right? Not a rolling window and suitable for low-latency needs?

Benjamin Smedberg

Comment 16

9 years ago

Daily, correct. There is no data that would let us drive a low-latency dashboard like this.

Georg Fritzsche [:gfritzsche]

Reporter

Comment 17

9 years ago

(In reply to Ben Hearsum (:bhearsum) from comment #14)
> (In reply to Benjamin Smedberg [:bsmedberg] from comment #13)
> > That histogram is recorded immediately, but it's typically not going to be
> > sent to Mozilla in the telemetry ping until either:
> >
> > * the next local midnight
> > * the user shuts down and restarts the browser
> >
> > There is inherently a pretty large latency associated with this, so it's not
> > something that can drive realtime dashboards. Which is why bug 1120370 and
> > bug 1120372 exist, so that we can do this in realtime.
>
> Would UPDATE_STATE_CODE_COMPLETE_STAGE be more appropriate for this (which I
> assume is sent immediately after staging the update), or should we just wait
> for the new pings?

This doesn't tell us which version a client updated to, and we don't receive that without lag either.
We should do the new ping types; let's talk timelines by mail or on the bugs about those pings.

Georg Fritzsche [:gfritzsche]

Reporter

Comment 18

9 years ago

PR for the pipeline additions: https://github.com/mozilla-services/data-pipeline/pull/190

bhearsum@mozilla.com (:bhearsum)

Comment 19

9 years ago

(In reply to Georg Fritzsche [:gfritzsche] from comment #18)
> PR for the pipeline additions:
> https://github.com/mozilla-services/data-pipeline/pull/190

I'm a bit confused: comment #16 says that we don't have data that would make this possible... but this PR seems to say otherwise. What am I missing?

Georg Fritzsche [:gfritzsche]

Reporter

Comment 20

9 years ago

(In reply to Ben Hearsum (:bhearsum) from comment #19)
> (In reply to Georg Fritzsche [:gfritzsche] from comment #18)
> > PR for the pipeline additions:
> > https://github.com/mozilla-services/data-pipeline/pull/190
>
> I'm a bit confused comment #16 says that we don't have data that would make
> this possible...but this PR seems to say otherwise. What am I missing?

We do have the data, but the client does not send it soon enough to cover your needs.
The work here is a first step toward your requirements too. We would also need bug 1120370 and bug 1120372 to land client-side; then the code from this bug can take those pings into account too to get lower latency.

Georg Fritzsche [:gfritzsche]

Reporter

Updated

9 years ago

Depends on: 1250897

Georg Fritzsche [:gfritzsche]

Reporter

Comment 21

9 years ago

As it turns out, this does not provide the versions in the form "45.0b5" etc., so beta builds can't be told apart.
We don't have this information in the Telemetry data yet.
Other use-cases do a buildid-to-version-number lookup, but I don't think this works well with Heka filters: we would have to regularly update that mapping from some location with some mechanism.

Per bug 1250897, Benjamin doesn't mind adding that data to the Telemetry data, and it's relatively easy to do.
This seems the preferred path forward here, as it is much easier to handle in the filter.

I will have to push a follow-up to the PR once we have the client design for this.
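Once a display version is present in the ping, the filter could group on it directly and fall back to the plain version otherwise. This sketch assumes a Telemetry-style JSON dict with `application.channel` and `environment.build.version`; the `displayVersion` field name is an assumption about what bug 1250897 would add, and the `version_key` helper is hypothetical.

```python
def version_key(ping: dict) -> tuple:
    """(channel, version) grouping key for a Telemetry-style ping dict.

    Prefers environment.build.displayVersion (assumed field, e.g. "45.0b5") and
    falls back to the plain version, which is the same for every beta of a release.
    """
    application = ping.get("application", {})
    build = ping.get("environment", {}).get("build", {})
    channel = application.get("channel", "unknown")
    version = build.get("displayVersion") or build.get("version") or application.get("version", "unknown")
    return channel, version


old_ping = {"application": {"channel": "beta", "version": "45.0"},
            "environment": {"build": {"version": "45.0"}}}
new_ping = {"application": {"channel": "beta", "version": "45.0"},
            "environment": {"build": {"version": "45.0", "displayVersion": "45.0b5"}}}
print(version_key(old_ping), version_key(new_ping))  # ('beta', '45.0') ('beta', '45.0b5')
```

Reading the field from the ping itself avoids the periodically refreshed buildid lookup table that the comment above considers awkward inside a Heka filter.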

Georg Fritzsche [:gfritzsche]

Reporter

Updated

9 years ago

Priority: P1 → P2

bhearsum@mozilla.com (:bhearsum)

Comment 22

9 years ago

(In reply to Georg Fritzsche [:gfritzsche] from comment #21)
> As it turns out, this does not provide the versions in the form "45.0b5"
> etc., so beta builds can't be told apart.
> We don't have this information in the Telemetry data yet.
> Other use-cases do a buildid to version number lookup, but i don't think
> this works well with Heka filters: we would have to regularly update that
> data from some location with some mechanism.
>
> Per bug 1250897 Benjamin doesn't mind adding that data to the Telemetry data
> and it's relatively easy to do.
> This seems the preferred path forward here, as this is much easier to handle
> in the filter.
>
> I will have to push a follow-up to the PR once we have the client design for
> this.

Any update here, Georg? For my use case in bug 1246675, I don't need "beta" channel data.

Flags: needinfo?(gfritzsche)

Georg Fritzsche [:gfritzsche]

Reporter

Comment 23

9 years ago

Sorry for the missing updates here. This was first delayed while waiting on bug 1250897. More recently, concerns were raised off-bug about addressing other (related) use-cases from one single setup; I am currently waiting on an update on that before we can pick this up again.

Flags: needinfo?(gfritzsche)

Georg Fritzsche [:gfritzsche]

Reporter

Updated

9 years ago

Assignee: gfritzsche → nobody

bhearsum@mozilla.com (:bhearsum)

Comment 24

9 years ago

(In reply to Georg Fritzsche [:gfritzsche] from comment #23)
> Sorry for the missing updates here. This first lagged from waiting on bug
> 1250897.
> More recently there were concerns off-bug about addressing other (related)
> use-cases from one single setup; currently i am waiting on an update on that
> before we can pick this up again.

Does the unassigning mean this has been deprioritized, or is that just a symptom of waiting on the aforementioned update?

Flags: needinfo?(gfritzsche)

Georg Fritzsche [:gfritzsche]

Reporter

Comment 25

9 years ago

Usually we only assign ourselves to bugs we are actively working on. I'm waiting for an update, so I'm updating the bug state to match that.

Flags: needinfo?(gfritzsche)

Nick Thomas [:nthomas] (UTC+12)

Comment 26

8 years ago

Any news on that update?

bhearsum@mozilla.com (:bhearsum)

Comment 27

8 years ago

(In reply to Nick Thomas [:nthomas] from comment #26)
> Any news on that update ?

Flags: needinfo?(gfritzsche)

Georg Fritzsche [:gfritzsche]

Reporter

Comment 28

8 years ago

Katie, did we schedule this?

Flags: needinfo?(gfritzsche) → needinfo?(kparlante)

Priority: P2 → P3

Katie Parlante

Comment 29

8 years ago

FWIW, we're putting this back on the front-burner for Q4, though I can't guarantee that it will be completed this quarter.

Flags: needinfo?(kparlante)

Georg Fritzsche [:gfritzsche]

Reporter

Updated

8 years ago

Whiteboard: [measurement:client]

bhearsum@mozilla.com (:bhearsum)

Comment 30

8 years ago

(In reply to Katie Parlante from comment #29)
> FWIW, we're putting this back on the front-burner for Q4, though I can't
> guarantee that it will be completed this quarter.

Thanks for the update Katie, it helps with planning!

Mauro Doglio [:mdoglio]

Comment 31

8 years ago

Is this plan still valid? Should we set bug 1120370 and bug 1120372 as blockers?

Thomas Huelbert

Updated

7 years ago

Component: Metrics: Pipeline → General

Product: Cloud Services → Data Platform and Tools

Mark Reid [:mreid]

Comment 32

7 years ago

I think this is no longer relevant, and will be tackled as part of the Mission Control project.

Status: NEW → RESOLVED

Closed: 7 years ago

Resolution: --- → WONTFIX
