Closed Bug 687117 Opened 9 years ago Closed 8 years ago

Telemetry front-end should be faster

Categories

(Mozilla Metrics :: Data/Backend Reports, defect, critical)

defect
Not set
critical

Tracking

(Not tracked)

RESOLVED FIXED
Unreviewed

People

(Reporter: justin.lebar+bug, Assigned: pedro.alves)

Details

The telemetry front-end takes up to 30s to respond to some queries.  It should be much faster than this.

I know the system has to aggregate lots of data to produce a report, but since the aggregated data isn't particularly large, the app should be able to cache aggressively.

My recommendation would be that, in order of importance,

  - We cache all histograms with the default filters
  - We cache each histogram viewed by a user, with the filters used
  - When a user views a histogram using filterset X, we compute all histograms with filterset X and cache them.

I think one day is a reasonable cache expiration time.
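The three-tier caching scheme proposed above, with a one-day expiry, can be sketched as a TTL cache keyed by (histogram, filterset). This is a minimal illustration under stated assumptions, not the dashboard's actual implementation. `HistogramCache`, the `compute` callback, and the frozenset representation of filters are all hypothetical. The per-filterset warming also runs synchronously here, where a real service would more likely do it in the background.

```python
import time
from typing import Callable, Dict, FrozenSet, Iterable, Optional, Tuple

ONE_DAY_SECONDS = 24 * 60 * 60

# Hypothetical representation: a filterset is a frozenset of (filter, value)
# pairs, so it is hashable and order-independent. The empty set = default filters.
FilterSet = FrozenSet[Tuple[str, str]]
DEFAULT_FILTERS: FilterSet = frozenset()


class HistogramCache:
    """Hypothetical TTL cache keyed by (histogram name, filterset)."""

    def __init__(
        self,
        compute: Callable[[str, FilterSet], object],
        all_histograms: Iterable[str],
        ttl: float = ONE_DAY_SECONDS,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._compute = compute  # the expensive aggregation query
        self._all = list(all_histograms)
        self._ttl = ttl
        self._clock = clock
        # (name, filters) -> (time stored, value)
        self._entries: Dict[Tuple[str, FilterSet], Tuple[float, object]] = {}

    def _lookup(self, key: Tuple[str, FilterSet]) -> Optional[Tuple[float, object]]:
        """Return the entry if present and younger than the TTL, else None."""
        entry = self._entries.get(key)
        if entry is not None and self._clock() - entry[0] < self._ttl:
            return entry
        return None

    def _store(self, key: Tuple[str, FilterSet], value: object) -> None:
        self._entries[key] = (self._clock(), value)

    def warm_filterset(self, filters: FilterSet) -> None:
        """Compute and cache every histogram for one filterset, skipping fresh entries."""
        for name in self._all:
            key = (name, filters)
            if self._lookup(key) is None:
                self._store(key, self._compute(name, filters))

    def warm_defaults(self) -> None:
        """Recommendation 1: precompute all histograms with the default filters."""
        self.warm_filterset(DEFAULT_FILTERS)

    def get(self, name: str, filters: FilterSet = DEFAULT_FILTERS) -> object:
        key = (name, filters)
        entry = self._lookup(key)
        if entry is not None:
            return entry[1]  # fast path: served from cache
        # Recommendation 2: cache the histogram the user asked for...
        value = self._compute(name, filters)
        self._store(key, value)
        # ...and recommendation 3: precompute the rest of that filterset.
        self.warm_filterset(filters)
        return value
```

Calling `warm_defaults()` from a daily scheduled job would keep the default view on the fast path. If many distinct filtersets get requested, bounding `_entries` (for example with LRU eviction) would be worth adding.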
We do have a cache in place, and even pre-computation is possible, but can you tell me which measures take that long?
The front-end isn't loading right now, so I can't time anything currently.  Last night, loading histograms I hadn't looked at before took the longest.  But even loading a histogram I'd looked at before took 5s or so, which seems much too slow for the fast-path.
I'll check the cache settings. The server was under heavy load, but that should be fixed by now, and the cache should respond on a subsecond basis. I'll tweak the TTL too.
Should I file a separate bug for precomputation, or would you like to track that here?

Cached results are showing *much* faster now.
Assignee: nobody → pedro.alves
FYI, from bug 689142:
> 
> Taras, do we have a bug about crazy slowness of the telemetry web interface?
> It's just unusable, in 20 minutes I've been able to see 2 histograms, then
> everything hung and I have been unable to see anything other even waiting 10
> minutes. Maybe it is that bad just for remoties?
Group: metrics-private
Will look. There have been some issues with the cluster; it definitely shouldn't take that long, a few seconds tops.
There's something really wrong here. I'm on it
Severity: normal → blocker
Ok - after 2 days of fighting with this I traced the issue to ES. Anurag - can you help me here? Can you (restart|upgrade) the cluster?


Xavier, do we have the new cluster in place?


-pedro
Assignee: pedro.alves → aphadke
We fixed some issues with the ES cluster and now the dashboard is loading faster.

Nevertheless, we will continue working on other aspects that can improve the performance of the dashboard.
Severity: blocker → major
The dashboard isn't loading at all for me at the moment.

Is it possible for you to set up an automated process which periodically pings the server?  It seems like I can't access the dashboard every other time I try.
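The automated process requested above could look like the following minimal sketch, which uses only the Python standard library. `DASHBOARD_URL`, the 300-second interval, and the helper names `http_probe` and `monitor` are hypothetical. The probe is injectable so that the loop can be exercised without network access.

```python
import time
import urllib.error
import urllib.request
from typing import Callable, List, Optional

# Hypothetical placeholder; the real dashboard URL is not given in this bug.
DASHBOARD_URL = "https://telemetry.example.invalid/"


def http_probe(url: str = DASHBOARD_URL, timeout: float = 60.0) -> int:
    """Fetch the URL and return its HTTP status code.

    urlopen raises HTTPError for 4xx/5xx responses, so that code is returned
    too; network failures and timeouts propagate as OSError subclasses.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


def monitor(
    probe: Callable[[], int] = http_probe,
    interval_s: float = 300.0,
    rounds: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
    report: Callable[[str], None] = print,
) -> List[bool]:
    """Run the probe every interval_s seconds and report each failed check.

    rounds=None loops forever; otherwise returns whether each check passed.
    """
    results: List[bool] = []
    n = 0
    while rounds is None or n < rounds:
        start = time.monotonic()
        try:
            status = probe()
            ok = 200 <= status < 300
            detail = f"HTTP {status}"
        except OSError as err:  # URLError, socket timeouts, connection resets
            ok = False
            detail = f"error: {err}"
        elapsed = time.monotonic() - start
        results.append(ok)
        if not ok:
            report(f"dashboard check failed after {elapsed:.1f}s ({detail})")
        n += 1
        if rounds is None or n < rounds:
            sleep(interval_s)
    return results
```

Run with its defaults, `monitor()` probes `http_probe` every five minutes indefinitely. In practice `report` would be pointed at something more visible than `print`, such as an alerting channel.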
After about 5 minutes, the page timed out and gave me a 500 error.
Damn, spoke too early then. It's back up; will keep an eye on it.
Severity: major → critical
@pedro: do you need anything from me out here?
No, back to me, thanks
Assignee: aphadke → pedro.alves
Still slow today, took more than 5 minutes to reach a single telemetry histogram.
Working 24/7 on this. I'm building aggregations to make it faster. The reason for this (besides the usual slowness, which I'm dealing with) is a physical issue with the machine.
I'll close this one down - it's been sitting here for a while and v2 should have fixed most of the issues.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED