Closed Bug 1475242 Opened 7 years ago Closed 10 months ago

Prototype hardware clusters with perf metrics

Categories

(Data Platform and Tools :: General, enhancement, P3)

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: frank, Assigned: frank)

:Gijs asked about using the metrics from the Hardware Report to build an algorithm to decide: "Is this user on a slow performing machine?" (or some variation of that question). This could be used by Firefox to optimize experience real-time.

We brainstormed a bit and decided that a good way to start would be to first find the hardware clusters. The hardware report is 1-dimensional, but we need to understand what the hardware metrics are as a group. This bug is to prototype those clusters using e.g. K-nearest neighbors.

The output would be a notebook with the following:
- Cleaning of hardware data from longitudinal or clients_daily
- Clustering users into hardware groups
- Reporting info on perf metrics for those groups

Some possibilities for hardware metrics:
- CPU Speed
- CPU Cores
- Logical Cores/Hyperthreading
- RAM
- Cache Size
- GPU
- OS Type

The last isn't a hardware metric but can impact performance.
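For illustration only, the clustering and per-group reporting steps might be sketched as below. This is not the prototype notebook: the records, field order, and function names are hypothetical, and the values are made up. It uses k-means (Lloyd's algorithm) rather than K-nearest neighbors, because K-nearest neighbors is a supervised classifier and does not produce clusters on its own.

```python
import random
from statistics import mean, median, pstdev

# Hypothetical client records: (cpu_mhz, cores, ram_mb, first_paint_ms).
# The values are invented for illustration, not taken from telemetry.
CLIENTS = [
    (1600, 2, 2048, 4200),
    (1800, 2, 4096, 3900),
    (2000, 2, 4096, 3500),
    (3200, 8, 16384, 1500),
    (3400, 8, 16384, 1100),
    (3600, 8, 16384, 1000),
]


def standardize(rows):
    """Scale each column to zero mean and unit variance."""
    cols = list(zip(*rows))
    stats = [(mean(c), pstdev(c) or 1.0) for c in cols]  # guard zero variance
    return [tuple((v - m) / s for v, (m, s) in zip(row, stats)) for row in rows]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=50, seed=0):
    """Plain Lloyd's algorithm; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster went empty
                centers[c] = tuple(mean(dim) for dim in zip(*members))
    return labels


def perf_by_cluster(clients, k=2):
    """Cluster on hardware columns only, then report median perf per cluster."""
    hardware = [row[:3] for row in clients]
    labels = kmeans(standardize(hardware), k)
    groups = {}
    for label, row in zip(labels, clients):
        groups.setdefault(label, []).append(row[3])
    return {label: median(values) for label, values in groups.items()}


if __name__ == "__main__":
    print(perf_by_cluster(CLIENTS))
```

Standardizing first keeps RAM (in MB) from dominating the distance over core count. The perf metric is deliberately left out of the clustering so that it can be compared across the resulting hardware groups.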
Gijs, which perf metrics are you most interested in, and which hardware metrics do you see as most important?
Flags: needinfo?(gijskruitbosch+bugs)
(In reply to Frank Bertsch [:frank] from comment #1)
> Gijs, which perf metrics are you most interested in, and which hardware
> metrics do you see as most important?

I think the hardware metrics listed in comment #0 are a great start. Not sure if cache size (I assume you mean CPU L1/2/3 cache? Or were you thinking of something else?) will end up being significant, but we might as well try it.

For the perf tests, these histograms seem like a good start:
- GC_MS
- SIMPLE_MEASURES_FIRSTPAINT
- SIMPLE_MEASURES_SESSIONRESTORED
- FX_NEW_WINDOW_MS

And we should probably have one covering event loop delays / jank. Olli, which of INPUT_EVENT_RESPONSE_MS or EVENTLOOP_UI_ACTIVITY_EXP_MS (or similar) would you think would be most representative of general performance / jank on a given machine?
Flags: needinfo?(gijskruitbosch+bugs) → needinfo?(bugs)
What would GC_MS tell us? The pages a user visits tend to affect it a lot, as does how many tabs are open, etc. And GC_MS doesn't tell us anything about jank. I don't see EVENTLOOP_UI_ACTIVITY_EXP_MS being used. INPUT_EVENT_RESPONSE_MS and LOAD_INPUT_EVENT_RESPONSE_MS might be good ones (assuming I even vaguely understand what this bug is about ;) ).
Flags: needinfo?(bugs)
(In reply to :Gijs (he/him) from comment #2)
> (In reply to Frank Bertsch [:frank] from comment #1)
> > Gijs, which perf metrics are you most interested in, and which hardware
> > metrics do you see as most important?
>
> I think the hardware metrics listed in comment #0 are a great start. Not
> sure if cache size (I assume you mean CPU L1/2/3 cache? Or were you thinking
> of something else?) will end up being significant, but we might as well try
> it.

CPU cache sizes will be important to me and the GC. The GC does a lot of pointer following, so cache and memory performance is probably the dominant factor.

> For the perf tests, these histograms seem like a good start
>
> - GC_MS
> - SIMPLE_MEASURES_FIRSTPAINT
> - SIMPLE_MEASURES_SESSIONRESTORED
> - FX_NEW_WINDOW_MS
>
> And we should probably have one covering event loop delays / jank. Olli,
> which of INPUT_EVENT_RESPONSE_MS or EVENTLOOP_UI_ACTIVITY_EXP_MS (or
> similar) would you think would be most representative of general performance
> / jank on a given machine?

I've been thinking of adding a new telemetry probe for GC mark rate (the number of objects the GC's mark phase can mark per second). Now might be the right time to do that. Unlike GC_MS, mark rate shouldn't be affected by the size of the GC heap.

CC'ing other GC people.
(In reply to Paul Bone [:pbone] from comment #4)
> (In reply to :Gijs (he/him) from comment #2)
> > (In reply to Frank Bertsch [:frank] from comment #1)
> > > Gijs, which perf metrics are you most interested in, and which hardware
> > > metrics do you see as most important?
> >
> > I think the hardware metrics listed in comment #0 are a great start. Not
> > sure if cache size (I assume you mean CPU L1/2/3 cache? Or were you thinking
> > of something else?) will end up being significant, but we might as well try
> > it.
>
> CPU Cache sizes will be important to me and the GC. GC does a lot of
> pointer following, cache and memory performance is probably the dominant
> factor.

Sure. I wasn't trying to say it wasn't a useful metric in general, I just expect it to trend roughly with the other characteristics. I can imagine 2 machines with the same CPU core count and clock speed but vastly different GPU / amount of RAM, whereas the CPU cache size for a given CPU model is fixed. As a result it seems less likely to actually matter for the clustering, even if small variations for a given clock speed + core count combination might exist because of similar-but-not-quite-the-same CPU models.

(In reply to Paul Bone [:pbone] from comment #4)
> I've been thinking of adding a new telemetry probe for GC mark rate. (the
> number of objects the GC's mark phase can mark per second) Now might be the
> right time to do that. Unlike GC_MS mark rate shouldn't be affected by the
> size of the GC heap.
>
> CC'ing other GC people.

This seems valuable in and of itself, but I think we could potentially add it later? If we need to wait for the telemetry to be created and then data to come in, it'll take at least another 2 weeks (substantially longer for data from the release population). Ideally I'd like to iterate quickly, so I'd prefer not to wait. Based on what Frank has said I think it would be reasonably straightforward to integrate additional performance metrics later.

It does sound from your and Olli's comments like GC_MS wouldn't be that helpful. Are there other core/platform/JS metrics that we could/should already use to assess machine performance?
Flags: needinfo?(pbone)
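One way to check the expectation above, that cache size mostly trends with the other hardware characteristics, would be to look at pairwise correlations before clustering. The sketch below is only an illustration under assumptions: the function names, feature names, and the 0.9 threshold are hypothetical, and Pearson correlation only captures linear relationships.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / sqrt(vx * vy)


def redundant_features(columns, threshold=0.9):
    """Return (name_a, name_b, r) for feature pairs with |r| >= threshold.

    `columns` maps a hypothetical feature name to its per-client values.
    """
    names = sorted(columns)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(columns[a], columns[b])
            if abs(r) >= threshold:
                pairs.append((a, b, round(r, 3)))
    return pairs
```

A feature that shows up in many high-correlation pairs is unlikely to change the cluster assignments much, although it may still matter for a specific perf metric such as GC time.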
I'll note here that these probes need to be present on release (check using the probe dictionary; GC_MS is not [0]). Extending these to release is probably a good idea if they are useful, but that is out of scope for this bug.

[0] https://telemetry.mozilla.org/probe-dictionary/?search=GC_MS&detailView=histogram%2FGC_MS
Unfortunately INPUT_EVENT_RESPONSE_MS and LOAD_INPUT_EVENT_RESPONSE_MS are both prerelease. If these are useful perf metrics we should have them on release. We can proceed without a full list of perf metrics, and add more later.
Quick update: Things are moving along nicely and most of the data is available, but GPU differences will be hard to compare, so they may be left out of the prototype. The reason is that we see a large number of different models, and we would be better off looking at the specific hardware information for each (clock, FLOPS, shading units(?)) and using that to determine quality. The only metric we have available for GPUs is RAM [0].

[0] https://firefox-source-docs.mozilla.org/toolkit/components/telemetry/docs/telemetry/data/environment.html
Assignee: nobody → fbertsch
Priority: -- → P1
(In reply to :Gijs (he/him) from comment #5)
> (In reply to Paul Bone [:pbone] from comment #4)
> > (In reply to :Gijs (he/him) from comment #2)
> > > (In reply to Frank Bertsch [:frank] from comment #1)
> > > > Gijs, which perf metrics are you most interested in, and which hardware
> > > > metrics do you see as most important?
> > >
> > > I think the hardware metrics listed in comment #0 are a great start. Not
> > > sure if cache size (I assume you mean CPU L1/2/3 cache? Or were you thinking
> > > of something else?) will end up being significant, but we might as well try
> > > it.
> >
> > CPU Cache sizes will be important to me and the GC. GC does a lot of
> > pointer following, cache and memory performance is probably the dominant
> > factor.
>
> Sure. I wasn't trying to say it wasn't a useful metric in general, I just
> expect it to trend roughly with the other characteristics. Whereas I can
> imagine 2 machines with the same CPU core count and clock speed, but vastly
> different GPU / amount of RAM, the CPU cache size for a given CPU model is
> fixed, and as a result it seems less likely to actually matter for the
> clustering, even if small variations for a given clock speed + core count
> combination might exist because of similar-but-not-quite-the-same CPU models.

Right, sorry I misunderstood.

> (In reply to Paul Bone [:pbone] from comment #4)
> > I've been thinking of adding a new telemetry probe for GC mark rate. (the
> > number of objects the GC's mark phase can mark per second) Now might be the
> > right time to do that. Unlike GC_MS mark rate shouldn't be affected by the
> > size of the GC heap.
> >
> > CC'ing other GC people.
>
> This seems valuable in and of itself, but I think we could potentially add
> it later? If we need to wait for the telemetry to be created and then data
> to come in, it'll take at least another 2 weeks (substantially longer for
> data from the release population). Ideally I'd like to iterate quickly, so
> I'd prefer not to wait. Based on what Frank has said I think it would be
> reasonably straightforward to integrate additional performance metrics later.
>
> It does sound from your and Olli's comments like GC_MS wouldn't be that
> helpful. Are there other core/platform/JS metrics that we could/should
> already use to assess machine performance?

I agree, don't wait for the new probe.

It's still not perfect, but GC_MINOR_US is not affected by incremental GC and is less likely to be affected by heap size if you divide it by GC_NURSERY_BYTES to get time/size. It's still not great because this is the total nursery size, and time will be proportional to the working set, which we'd be assuming correlates with the total size. Hopefully it's good enough in the short term.

I've filed Bug 1475896 but am not making it a dependency of this prototype bug.
Flags: needinfo?(pbone)
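The time/size normalization suggested above could be sketched roughly as follows. This is a minimal illustration, not how the probes are actually aggregated: it assumes each minor GC yields a paired (GC_MINOR_US, GC_NURSERY_BYTES) sample, whereas real telemetry reports these as separate histograms, and the function name is hypothetical.

```python
MIB = 1024 * 1024


def minor_gc_us_per_mib(samples):
    """Microseconds of minor GC time per MiB of nursery.

    `samples` is an iterable of (gc_minor_us, gc_nursery_bytes) pairs.
    Pairing the two probes per collection is an assumption made for
    this sketch. Returns None when there is no usable nursery size.
    """
    total_us = 0
    total_bytes = 0
    for gc_minor_us, gc_nursery_bytes in samples:
        if gc_nursery_bytes > 0:  # skip samples that cannot be normalized
            total_us += gc_minor_us
            total_bytes += gc_nursery_bytes
    if total_bytes == 0:
        return None
    return total_us / (total_bytes / MIB)
```

Summing before dividing weights each collection by its nursery size, so a handful of tiny collections does not skew the ratio. As noted in the comment, this still uses total nursery size as a stand-in for the working set.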
(In reply to Frank Bertsch [:frank] from comment #8)
> Quick Update: Things are moving along nicely and most of the data is
> available, but GPU differences will be hard to compare, so they may be left
> out of the prototype. The reason is because we have a large number of
> different models we see, and we would be better off looking at the specific
> hardware information for each (clock, FLOPS, shading units(?)) and use that
> to determine quality. The only metric we have available for GPUs is RAM [0].
>
> [0] https://firefox-source-docs.mozilla.org/toolkit/components/telemetry/docs/telemetry/data/environment.html

This makes sense, thanks for clarifying. We may want to look into better metrics here.

How is the prototype going otherwise - do you have something you can share, and/or is there anything else you need from me or someone else that I can help chase?
Flags: needinfo?(fbertsch)
Right now bug 1468351 is taking higher priority - with the added urgency of this prototype using longitudinal, so I'm adding that as a blocker.

Re: GPU metrics, we could consider compiling a database ourselves by scraping manufacturer websites or Wikipedia. It seems like that hardware information would be extremely relevant to some measures of performance, but the relative importance is outside my wheelhouse.
Depends on: 1468351
Flags: needinfo?(fbertsch)
Blocks: 1480167
Priority: P1 → P3

Is this bug helpful to the work that will be commencing around operational metrics?

Flags: needinfo?(rmiller)
Flags: needinfo?(esmyth)

I think this is still interesting and could be generally useful, but no, it wouldn't help any of the near-term work on operational metrics.

Flags: needinfo?(esmyth)

Yep, ditto what :esmyth said.

Flags: needinfo?(rmiller)

Hello,

The Mozilla Data Engineering organization is currently going through our extensive backlog, consisting of hundreds of issues stretching back for nearly 10 years. We've done a pass through all of the open Bugzilla bugs and have identified and tagged the ones that we think are relevant enough to still need attention. The rest, including the bug with which this comment is associated, we are closing as "WONTFIX" in a single bulk operation.

If you feel we have closed this (or any) issue in error, please feel free to take the following actions:

  • Reopen the bug.
  • Edit the bug to add the string [dataplatform] (including the brackets) to the Whiteboard field. (Note that you must edit the Whiteboard, not the similarly named QA Whiteboard.)

Doing this will ensure that we see the bug in our weekly triage process, where we will decide how to proceed.

Thank you.

Status: NEW → RESOLVED
Closed: 10 months ago
Resolution: --- → WONTFIX