Closed Bug 731662 Opened 12 years ago Closed 12 years ago

Telemetry dashboard submission counts are incorrect

Categories

(Mozilla Metrics :: Frontend Reports, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED
Unreviewed

People

(Reporter: xstevens, Assigned: mroldan)

Details

(Whiteboard: [Telemetry])

Attachments

(1 file)

The telemetry histogram dashboard is currently displaying low numbers. The GC_MS histogram for appBuildID=20120215222917 on Firefox 11 for Windows shows 18 as the submission count. This number should be greater than 107461.
Assignee: nobody → maria.roldan
The Telemetry dashboard is fairly critical to us for doing comparisons of performance across releases. What's the timeline of this investigation? For instance, we're currently investigating the possibility of a regression in FF11 that causes slow page load times. Should we file a bug to do a manual comparison or will this be fixed early next week? Thanks in advance.
We're on top of this, and definitely expect it to be fixed early next week, but if you tell us exactly what you need, we can direct our efforts toward those needs and try to get as much information as possible along the way.
Alex,

For any analysis looking at data across releases, the evolution dashboard should be much better than this one, which shows an "across all time" aggregate view of a single histogram.

The trend chart at the bottom was originally supposed to be a way to see the volume of submitted data over time.  When we changed the aggregation mechanism, this chart wasn't properly changed to sum the counts.


I'd like to understand how this bottom chart is to be used so we can change it properly.

The options I can see are to either change it to count the number of submissions per build number, regardless of submission date, or to change it to count the number of submissions per day. In the latter case the chart could be confusing, since it might aggregate data across build numbers unless the user had filtered for a single build.
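A minimal sketch of the two options, assuming (hypothetically) that the aggregation job produces rows of (build ID, submission date, count). The record type, function names, second build ID, and sample numbers are illustrative only, not the dashboard's actual schema. Both functions sum the per-row counts rather than counting rows, which is the step the chart was missing after the aggregation change.

```python
from collections import Counter
from typing import Iterable, NamedTuple, Optional


class AggregateRow(NamedTuple):
    """Hypothetical pre-aggregated row: submissions of one build received on one day."""
    build_id: str         # appBuildID, e.g. "20120215222917"
    submission_date: str  # ISO date the pings were received
    count: int            # number of submissions folded into this row


def submissions_per_build(rows: Iterable[AggregateRow]) -> Counter:
    """Option 1: total submissions per build ID, summed across all submission dates."""
    totals: Counter = Counter()
    for row in rows:
        totals[row.build_id] += row.count  # sum the counts; do not just count rows
    return totals


def submissions_per_day(rows: Iterable[AggregateRow],
                        build_id: Optional[str] = None) -> Counter:
    """Option 2: submissions per day. Without a build_id filter, builds are mixed together."""
    totals: Counter = Counter()
    for row in rows:
        if build_id is None or row.build_id == build_id:
            totals[row.submission_date] += row.count
    return totals


if __name__ == "__main__":
    rows = [
        AggregateRow("20120215222917", "2012-02-20", 40),
        AggregateRow("20120215222917", "2012-02-21", 25),
        AggregateRow("20120210000000", "2012-02-20", 7),  # hypothetical second build
    ]
    print(submissions_per_build(rows))  # Counter({'20120215222917': 65, '20120210000000': 7})
    print(submissions_per_day(rows))    # Counter({'2012-02-20': 47, '2012-02-21': 25})
    print(submissions_per_day(rows, "20120215222917"))
```

The unfiltered per-day total for 2012-02-20 (47) blends two builds, which is the confusion described for the second option; the per-build view avoids it by design.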
Group: metrics-private
"The options I can see are to either change it to count the number of submissions per build number, regardless of submission date" <- this is the correct option
I've changed the query that feeds the "Submissions by Platform Build ID" chart.
Please see if it looks better now.
I may be reading the dashboard incorrectly, but for GC_MS I now see 
Sampled to 5000 points (21% of 24239).
This is much less than the 107461 that Xavier states should be visible in comment 0.

Maria - Can you provide information about the data process flow from when a Telemetry ping comes in to when it is visible on the Telemetry dashboard? How can we put automated safeguards in place to provide some assurance that the data being displayed on the dashboard is good/complete?
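One possible automated safeguard, sketched under assumptions: a periodic job that reconciles per-build submission totals counted directly from the raw store (HBase) against the totals the dashboard displays, and flags builds that disagree beyond a tolerance. The function name, the 1% default tolerance, the build IDs, and the idea of passing both totals as dictionaries are hypothetical; how each count would actually be obtained is not specified here.

```python
from typing import Dict, List


def find_incomplete_builds(raw_counts: Dict[str, int],
                           dashboard_counts: Dict[str, int],
                           tolerance: float = 0.01) -> List[str]:
    """Flag build IDs whose dashboard total differs from the raw total by more
    than `tolerance` (a fraction of the raw total). A build absent from one
    side is treated as having a count of 0 there."""
    flagged: List[str] = []
    for build_id in sorted(set(raw_counts) | set(dashboard_counts)):
        raw = raw_counts.get(build_id, 0)
        shown = dashboard_counts.get(build_id, 0)
        if raw == 0:
            # A build with no raw submissions should show nothing on the dashboard.
            if shown != 0:
                flagged.append(build_id)
            continue
        # Relative difference between what was received and what is displayed.
        if abs(raw - shown) / raw > tolerance:
            flagged.append(build_id)
    return flagged


if __name__ == "__main__":
    raw = {"build-a": 1000, "build-b": 500}                    # hypothetical raw totals
    dash = {"build-a": 18, "build-b": 498, "build-c": 3}       # hypothetical dashboard totals
    print(find_incomplete_builds(raw, dash))  # ['build-a', 'build-c']
```

Alerting on the returned list (rather than on a single global total) would point directly at the build IDs whose numbers look wrong, as in the GC_MS case reported above.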
My previous comment about that was somewhat incorrect: it shouldn't be 107461. I'm working on correcting some of the internal counting right now.
xstevens - As I understood when we spoke yesterday, this issue should now be fixed. At this point we're just waiting on the jobs to run to pull in the submissions from the last 6 weeks, correct?
That's correct. I started the runs yesterday. I am waiting until all of them have finished before updating the front-end index, because I don't want two different sets of numbers in there.
Have the runs completed? Has the front-end index been updated?
Whiteboard: [Telemetry]
Runs have been completed going as far back as Feb 1.  Xavier asked Taras about whether we needed to go even further back.  Taras felt that would probably be good enough but asked Xavier to get a sign-off from you.

If we need to go further back, that can be done first thing next week.

The telemetryHistograms dashboard will already be up to date.  We need to regenerate the Evolution database before the telemetryEvolution dashboard will be correct.

Pedro, what is the soonest that can get done?
(In reply to Daniel Einspanjer :dre [:deinspanjer] from comment #12)
> Runs have been completed going as far back as Feb 1.  Xavier asked Taras
> about whether we needed to go even further back.  Taras felt that would
> probably be good enough but asked Xavier to get a sign-off from you.

It would be useful when comparing the trends release to release to go back as far as possible. However, many probes only first landed in 2012 so I don't know that it will definitely be useful to go back further. How far back do we need to go to recover the rest of the missing data? Do we have any usable data from before Feb. 1? (i.e. Is there a gap on the dashboard or is there no data from before Feb. 1?)
(In reply to Daniel Einspanjer :dre [:deinspanjer] from comment #12)
> The telemetryHistograms dashboard will already be up to date.  We need to
> regenerate the Evolution database before the telemetryEvolution dashboard
> will be correct.
> 
> Pedro, what is the soonest that can get done?

The Telemetry Evolution dashboard is up to date. The Evolution database has been reprocessed during the weekend.
Lawrence,

Just to clarify, because there still seems to be some misunderstanding: we didn't lose any original data, and all of our recorded data is "usable". We have all the data ever recorded in HBase since the beginning of telemetry.

That said, regarding our calculated data (the data you see in dashboards): when I was talking to Taras, he said that going back to February 1 (6 weeks) was enough for his purposes. Unless someone else using this system needs the data to go back further, we were talking about dropping the calculated data before February 1. Can you let me know if it's okay to do this? If not, how far back would you want to go before we start dropping some of the calculated data?
Xavier, poor choice of words on my part. I do understand that we have all of the raw data - i.e. nothing has been lost. (I have communicated this to others as well.) I was referring to the fact that the calculated data does not include all of the raw data.

I do not have a specific need to go back further in time at this point. Given that Firefox 10 was released on Jan. 31, I suggest Feb. 1 is a fine date to use for now. If there is any plan to delete the raw data, I would like to keep it, in case we do find a reason to query further back and want to pull in older data. In that case I expect that we would want to go back 6 more weeks, to Firefox 9. I also expect that the further back in time we go, the less time the jobs will take to run, as there is less raw data available.
I spoke with Daniel, Xavier, and co this week. They are going to run jobs to pull in the data from before Feb. 1.
So we are back to January 1 now. Do we want to continue going back further or is this enough?
We want to keep going back. I expect that the data volume will continue to decrease as you move further back in time.
Telemetry evolution dashboard is up to date.
All new data between Jan 1st and Jan 31st has been incorporated into the evolution database.
Post datacenter migration update: I am now running the backfill jobs for November 2011.
Going to close this up. Backfilled from 11/1/2011 to current.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Xavier, we were going to backfill right back to the inception of Telemetry. Is there any more data that we could backfill?
There is data back to October 13th-ish if I remember correctly. I can start a process to backfill all of that data later today.