Closed
Bug 1017269
Opened 11 years ago
Closed 9 years ago
upload latency is at least 2 days
Categories
(Toolkit :: Telemetry, defect)
RESOLVED
WORKSFORME
People
(Reporter: mmc, Unassigned)
Attachments
(1 file: 20.27 KB, application/force-download)
I notice that after watching the telemetry dashboard for pinning for about 2 weeks that submissions by build date take several days (4-5) to stabilize even for Nightly users. According to bug 630880 it seems that Nightly users should upload every 2 hours. Is this really working as intended, or do Nightly users really take that long to update on average?
Reporter | ||
Comment 1•11 years ago
Comment 2•11 years ago
(In reply to [:mmc] Monica Chew (please use needinfo) from comment #0)
> I notice that after watching the telemetry dashboard for pinning for about 2
> weeks that submissions by build date take several days (4-5) to stabilize
> even for Nightly users. According to bug 630880 it seems that Nightly users
> should upload every 2 hours. Is this really working as intended, or do
> Nightly users really take that long to update on average?
Yes, they take that long to stabilize. Telemetry only does rollups within a 24-hour window, so that can add a maximum of a day of lag.
John's team might be able to provide you with an adoption curve based on FHR data.
The only thing we can do in telemetry is tweak time to first signal.
Comment 3•11 years ago
Saptarshi has a nice summary report on Nightly daily build adoption. TL;DR is that it takes longer than you might expect. I've cc'd him here.
Comment 4•11 years ago
Indeed, it takes time for users to update. For all builds in March 2014, the median time to update was 3 days. The percentiles are below:
Percentile   Days to Update
0.00          0.0000
0.05          0.7500
0.10          1.0000
0.15          1.0400
0.20          1.2500
0.25          1.5000
0.30          1.7300
0.35          2.0000
0.40          2.1900
0.45          2.5000
0.50          3.0000
0.55          3.5000
0.60          4.0000
0.65          5.0000
0.70          6.0000
0.75          7.8125
0.80         10.0000
0.85         14.0000
0.90         18.5000
0.95         27.0000
Attached is a PDF of growth rates of 30 builds in March. Each line
corresponds to the growth rate (fitted) of a build. The red line is
the mean. The graph plots the proportion of total profiles ever on
that build vs number of days since release.
As you can see from the red line, 50% growth is reached in 3 days and
80% in ~ 10 days.
This is how long it takes for users to update to a build. Not every
user updates to every build. After that you have to wait for users to
use their browser. Telemetry sends once a day at most.
hope it helps
Saptarshi
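For anyone wanting to overlay this on a dashboard, here is a minimal Python sketch that turns the percentile table from this comment into a rough adoption curve. The function name `fraction_updated` is hypothetical, and the linear interpolation between rows is an assumption, not how the fitted curves in the attached PDF were produced.

```python
from bisect import bisect_right

# (percentile, days to update) pairs copied from the table in this comment.
PERCENTILES = [
    (0.00, 0.0), (0.05, 0.75), (0.10, 1.0), (0.15, 1.04), (0.20, 1.25),
    (0.25, 1.5), (0.30, 1.73), (0.35, 2.0), (0.40, 2.19), (0.45, 2.5),
    (0.50, 3.0), (0.55, 3.5), (0.60, 4.0), (0.65, 5.0), (0.70, 6.0),
    (0.75, 7.8125), (0.80, 10.0), (0.85, 14.0), (0.90, 18.5), (0.95, 27.0),
]


def fraction_updated(days):
    """Estimate the fraction of updating profiles that updated within `days`.

    Assumption: values between table rows are linearly interpolated.
    The table stops at the 95th percentile, so the result is capped at 0.95.
    """
    if days <= 0:
        return 0.0
    day_values = [d for _, d in PERCENTILES]
    # Index of the first row whose "days" value is strictly greater than `days`.
    i = bisect_right(day_values, days)
    if i == len(PERCENTILES):
        return PERCENTILES[-1][0]
    p0, d0 = PERCENTILES[i - 1]
    p1, d1 = PERCENTILES[i]
    return p0 + (p1 - p0) * (days - d0) / (d1 - d0)
```

Calling `fraction_updated(3.0)` recovers the median row of the table, and any value past 27 days returns the 0.95 cap rather than extrapolating beyond the data.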
Reporter | ||
Comment 5•11 years ago
Thanks, Saptarshi! That's really interesting/horrifying :) Where did this data come from? I'd like to plot it against my dashboard so we understand what percent of users are contributing by build date so far. I have access to peach.
Reporter | ||
Comment 6•11 years ago
So I guess there are 3 components to latency:
- Update time
- Telemetry rollup time, which I think is set to 2 hours for Nightly users if bug 630880 is still in effect
- Telemetry aggregator time, which seems to be about a day.
The aggregator delay means that even for users who update immediately, or for submissions by calendar date instead of build date, dashboards are consistently 2 days behind. Taras, can this be improved?
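As a rough sketch of the breakdown in this comment, the three components can be treated as a simple sum. The function name `dashboard_lag` is hypothetical, the default values are only the estimates quoted in this thread (2 hours for rollup, about a day for the aggregator), and treating the components as strictly additive is an assumption.

```python
from datetime import timedelta


def dashboard_lag(update_delay,
                  rollup_interval=timedelta(hours=2),
                  aggregator_delay=timedelta(days=1)):
    """Estimate the delay before a build's data shows up on the dashboard.

    Assumption: the three components from this comment simply add up.
    The defaults are the estimates quoted in the thread, not measured values.
    """
    return update_delay + rollup_interval + aggregator_delay


# A user who updates immediately, with the 2-hour rollup estimate.
print(dashboard_lag(timedelta(0)))

# A user at the median update time (3 days), with a once-a-day rollup.
print(dashboard_lag(timedelta(days=3), rollup_interval=timedelta(days=1)))
```

With a once-a-day rollup instead of 2 hours, even an immediate updater ends up about 2 days behind under this model, which matches the lag described in this comment.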
Flags: needinfo?(taras.mozilla)
Comment 7•11 years ago
Monica,
1) Update time
2) Telemetry rollup time, which I think is set to 2 hours for Nightly users if bug 630880 is still in effect
3) Telemetry aggregator time, which seems to be about a day.
1) is outside of my control.
2) I have no idea what this has to do with telemetry.
3)
a) We were waiting for a use case to make our dashboards closer to real time. I think a 30-60 min time from submission to JSON aggregation is reasonable and achievable. To do this across the board we need to wait for a production version of our upcoming task scheduling system (http://docs.taskcluster.net/) so we can port telemetry to it. This is at least 2 months out.
b) If rollup delay is critical for you, we can special-case your path through the code and give you the 30-60 min latency. This is hard to do in the general case, but it's easy to put in specific hacks.
We can also tweak telemetry client-side to not wait for idle-daily if the build date is within 1 day of the current date. This will give you a bigger (but biased) early signal.
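The client-side tweak could look roughly like the following Python sketch. This is not the actual Telemetry client code (which lives in Firefox and is not written in Python); the function name `should_wait_for_idle_daily` and the `fresh_window` parameter are hypothetical, and "within 1 day" is interpreted here as an absolute difference so that small clock skew does not delay a fresh build.

```python
from datetime import date, timedelta


def should_wait_for_idle_daily(build_date, today, fresh_window=timedelta(days=1)):
    """Decide whether a client should defer submission to idle-daily.

    Hypothetical rule: if the build date is within `fresh_window` of the
    current date, submit right away to get an early signal for new builds;
    otherwise keep the usual idle-daily behaviour.
    """
    build_age = abs(today - build_date)
    return build_age > fresh_window


today = date(2014, 6, 10)
print(should_wait_for_idle_daily(date(2014, 6, 10), today))  # build from today
print(should_wait_for_idle_daily(date(2014, 6, 7), today))   # build 3 days old
```

The early signal is biased because only users who update within a day contribute to it, and those users are not necessarily representative of the whole Nightly population.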
Before we commit to doing anything here, you'd have to describe a solid use case to justify switching gears on this.
Flags: needinfo?(taras.mozilla)
Reporter | ||
Comment 8•11 years ago
The use case is using telemetry to respond to outages. A 2-day delay basically means that by the time the telemetry dashboard knows about mistakes, users will have already escalated through Bugzilla. By then the only thing dashboards can do is verify that there was a problem.
Comment 9•11 years ago
(In reply to [:mmc] Monica Chew (please use needinfo) from comment #8)
> The use case is using telemetry to respond to outages.
Do you mean telemetry outages in general? Or using telemetry to infer outages in other services?
For the former, we already have monitoring in place for submission rates, etc.
Reporter | ||
Comment 10•11 years ago
I mean using telemetry to monitor for outages in other services.
Summary: upload latency seems long → upload latency is at least 2 days
Comment 11•11 years ago
@Saptarshi: Would you be willing to redo that growth data in comment 4 for nightly and/or aurora builds from around May 25 to Jun 25?
Specifically we're looking to see if there was an impact from bug 1003159 landing on June 6 on nightly, and a week later on aurora. Theoretically that should have increased the uptake.
Flags: needinfo?(sguha)
Comment 12•9 years ago
Nothing happened in this bug in a while.
We now have much improved latency for Telemetry with "main" pings (which upload immediately, except at shutdown etc.).
Bug 1120370 & bug 1120372 will improve Telemetry latency after updates & new installs.
Let's take other future latency improvements to a new bug driven by the current needs.
Status: NEW → RESOLVED
Closed: 9 years ago
Flags: needinfo?(sguha)
Resolution: --- → WORKSFORME