After watching the telemetry dashboard for pinning for about 2 weeks, I notice that submissions by build date take several days (4-5) to stabilize, even for Nightly users. According to bug 630880, Nightly users should upload every 2 hours. Is this really working as intended, or do Nightly users really take that long to update on average?
(In reply to [:mmc] Monica Chew (please use needinfo) from comment #0)
> I notice that after watching the telemetry dashboard for pinning for about 2
> weeks that submissions by build date take several days (4-5) to stabilize
> even for Nightly users. According to bug 630880 it seems that Nightly users
> should upload every 2 hours. Is this really working as intended, or do
> Nightly users really take that long to update on average?

Yes, they take that long to stabilize. Telemetry only does rollups within a 24-hour window, so that can add a maximum of a day of lag. John's team might be able to provide you with an adoption curve based on FHR data. The only thing we can do in telemetry is tweak time to first signal.
Saptarshi has a nice summary report on Nightly daily build adoption. TL;DR is that it takes longer than you might expect. I've cc'd him here.
Created attachment 8431296
growth.pdf

Indeed, it takes time for users to update. For all builds in March, 2014 the median days to update was 3 days. The percentiles are below:

Percentile  Days to Update
0.00         0.0000
0.05         0.7500
0.10         1.0000
0.15         1.0400
0.20         1.2500
0.25         1.5000
0.30         1.7300
0.35         2.0000
0.40         2.1900
0.45         2.5000
0.50         3.0000
0.55         3.5000
0.60         4.0000
0.65         5.0000
0.70         6.0000
0.75         7.8125
0.80        10.0000
0.85        14.0000
0.90        18.5000
0.95        27.0000

Attached is a PDF of growth rates of 30 builds in March. Each line corresponds to the growth rate (fitted) of a build. The red line is the mean. The graph plots the proportion of total profiles ever on that build vs number of days since release. As you can see from the red line, 50% growth is reached in 3 days and 80% in ~ 10 days.

This is how long it takes for users to update to a build. Not every user updates to every build. After that you have to wait for users to use their browser. Telemetry sends once a day at most.

hope it helps
Saptarshi
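For lining the percentile table up against a dashboard, here is a minimal sketch that linearly interpolates the table to estimate what share of a build's eventual updaters have updated N days after release. The function name `fraction_updated` and the choice of linear interpolation are assumptions made for illustration, not part of the original analysis. The table stops at the 95th percentile, so the estimate is capped at 0.95.

```python
from bisect import bisect_right

# Percentiles and days-to-update copied from the table in this comment
# (all Nightly builds in March 2014).
PERCENTILES = [i / 20 for i in range(20)]  # 0.00, 0.05, ..., 0.95
DAYS = [0.0, 0.75, 1.0, 1.04, 1.25, 1.5, 1.73, 2.0, 2.19, 2.5,
        3.0, 3.5, 4.0, 5.0, 6.0, 7.8125, 10.0, 14.0, 18.5, 27.0]


def fraction_updated(days_since_release):
    """Estimate the share of eventual updaters who have updated by this day.

    Hypothetical helper: linear interpolation between table rows is an
    assumption, and values beyond the table are clamped to its ends.
    """
    if days_since_release <= DAYS[0]:
        return PERCENTILES[0]
    if days_since_release >= DAYS[-1]:
        return PERCENTILES[-1]  # the table has no data past the 95th percentile
    hi = bisect_right(DAYS, days_since_release)
    lo = hi - 1
    weight = (days_since_release - DAYS[lo]) / (DAYS[hi] - DAYS[lo])
    return PERCENTILES[lo] + weight * (PERCENTILES[hi] - PERCENTILES[lo])


if __name__ == "__main__":
    for day in (1, 3, 5, 10):
        print(day, round(fraction_updated(day), 3))
```

Because not every user updates to every build, this describes only the users who do eventually move to a given build, not the whole Nightly population.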
Thanks, Saptarshi! That's really interesting/horrifying :) Where did this data come from? I'd like to plot it against my dashboard so we understand what percent of users are contributing by build date so far. I have access to peach.
So I guess there are 3 components to latency:
- Update time
- Telemetry rollup time, which I think is set to 2 hours for Nightly users if bug 630880 is still in effect
- Telemetry aggregator time, which seems to be about a day.

The aggregator delay means that even for users who update immediately, or for submissions by calendar date instead of build date, dashboards are consistently 2 days behind. Taras, can this be improved?
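A back-of-the-envelope sketch of how these three components could add up, assuming they are simply additive (in practice they may overlap). The class `LatencyEstimate` and its field names are hypothetical. The numbers are the ones quoted in this bug: a 3-day median update time from comment 4, a 2-hour rollup interval if bug 630880 is still in effect, and roughly one day of aggregator delay.

```python
from dataclasses import dataclass


@dataclass
class LatencyEstimate:
    """Hypothetical additive model of dashboard latency, in days."""
    update_days: float      # time for a user to update to the build
    rollup_hours: float     # client-side rollup/upload interval
    aggregator_days: float  # server-side aggregation delay

    def total_days(self):
        # Assumption: the components do not overlap, so they simply sum.
        return self.update_days + self.rollup_hours / 24.0 + self.aggregator_days


# Figures quoted in this bug; treat them as rough estimates, not measurements.
median_user = LatencyEstimate(update_days=3.0, rollup_hours=2.0, aggregator_days=1.0)
immediate_updater = LatencyEstimate(update_days=0.0, rollup_hours=2.0, aggregator_days=1.0)

if __name__ == "__main__":
    print("median user:", round(median_user.total_days(), 2), "days")
    print("immediate updater:", round(immediate_updater.total_days(), 2), "days")
```

Under this model the update time dominates for the median user, while the aggregator delay is the main term for users who update immediately.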
Monica,

1) Update time
2) Telemetry rollup time, which I think is set 2 hours for Nightly users if bug 630880 is still in effect
3) Telemetry aggregator time, which seems to be about a day.

1) is outside of my control.
2) I have no idea what this has to do with telemetry.
3) a) We were waiting for a use case to make our dashboards closer to real time. I think a 30-60 min time from submission to JSON aggregation is reasonable and achievable. To do this across the board we need to wait for a production version of our upcoming task scheduling system (http://docs.taskcluster.net/) so we can port telemetry to it. This is at least 2 months out.
b) If rollup delay is critical for you, we can special-case your path through the code and give you the 30-60 min latency. This is hard to do in the general case, but it's easy to put in specific hacks.

We can also tweak telemetry client-side to not wait for idle-daily if the build date is within 1 day of the current date. This will give you a bigger (but biased) early signal.

Before we commit to doing anything here, you'd have to describe a solid use case to justify switching gears on this.
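The client-side tweak mentioned in this comment (not waiting for idle-daily when the build date is within 1 day of the current date) could be sketched roughly as follows. This illustrates the rule only and is not the actual Telemetry client code; the function name `should_skip_idle_daily`, the `window` parameter, and the handling of build dates in the future are all assumptions.

```python
from datetime import date, timedelta


def should_skip_idle_daily(build_date, today, window=timedelta(days=1)):
    """Return True if the build is recent enough to submit right away.

    Hypothetical rule: submit without waiting for idle-daily when the build
    is at most `window` old. A build dated in the future (e.g. clock skew)
    is treated as not recent; that choice is an assumption.
    """
    age = today - build_date
    return timedelta(0) <= age <= window


if __name__ == "__main__":
    print(should_skip_idle_daily(date(2014, 6, 1), date(2014, 6, 2)))  # fresh build
    print(should_skip_idle_daily(date(2014, 6, 1), date(2014, 6, 4)))  # older build
```

As the comment notes, the resulting early signal would be biased toward users who update quickly.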
The use case is using telemetry to respond to outages. A 2-day delay basically means that by the time the telemetry dashboard knows about mistakes, users will have already escalated through Bugzilla. By then the only thing dashboards can do is verify that there was a problem.
(In reply to [:mmc] Monica Chew (please use needinfo) from comment #8)
> The usecase is using telemetry to respond to outages.

Do you mean telemetry outages in general? Or using telemetry to infer outages in other services? For the former, we already have monitoring in place for submission rates, etc.
I mean using telemetry to monitor for outages in other services.
@Saptarshi: Would you be willing to redo that growth data in comment 4 for nightly and/or aurora builds from around May 25 to Jun 25? Specifically we're looking to see if there was an impact from bug 1003159 landing on June 6 on nightly, and a week later on aurora. Theoretically that should have increased the uptake.
Nothing has happened in this bug in a while. We now have much improved latency for Telemetry with "main" pings (with immediate uploading except for shutdown etc.). Bug 1120370 & bug 1120372 will improve Telemetry latency after updates & new installs. Let's take other future latency improvements to a new bug driven by the current needs.