Closed Bug 1781095 Opened 3 years ago Closed 3 years ago

Increase in `wr.scenebuild_time` Invalid Overflow errors coinciding in Fenix 102 with Fenix 103.1.0

Categories

(Core :: Graphics: WebRender, defect, P3)

Unspecified
Android
defect

Tracking

RESOLVED INCOMPLETE
Tracking Status
firefox-esr91 --- unaffected
firefox-esr102 --- unaffected
firefox103 --- affected
firefox104 --- unaffected
firefox105 --- unaffected

People

(Reporter: travis_, Unassigned)

Details

Chris Peterson noticed an increase in wr.scenebuild_time Invalid Overflow errors that lines up with the Fenix 103.1.0 release.

However, the wr.scenebuild_time errors only seem to affect Fenix 102.1.1 (and not 102.2.* or 103):

https://mozilla.cloud.looker.com/explore/fenix/metrics?qid=E9pjzIqW3xlw3jWpDYTTAE&origin_space=746&toggle=dat,fil,vis

Summary: Increase in `wr.screenbuild_time` Invalid Overflow errors coinciding with Fenix 103.1.0 → Increase in `wr.scenebuild_time` Invalid Overflow errors coinciding with Fenix 103.1.0

Jamie Nicol on the Graphics team says he doesn't see any Android graphics changes in 102's changelog, so this is presumably a Glean issue.

The wr.scenebuild_time metric's type is a Timing Distribution (microsecond, range 1μs <= x <= ~6.94 days). So wr.scenebuild_time is exceeding ~6.94 days?
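To make the "~6.94 days" figure concrete, here is a minimal sketch of the range check described above. It is not Glean's implementation. The cap of 600,000 seconds is an assumption chosen because it works out to roughly 6.94 days, and the names `ASSUMED_MAX_SAMPLE_US` and `classify_sample` are hypothetical.

```python
MICROS_PER_SECOND = 1_000_000
SECONDS_PER_DAY = 86_400

# Assumption: the upper bound is 600,000 seconds, which matches the
# "~6.94 days" quoted for a microsecond-unit timing distribution.
ASSUMED_MAX_SAMPLE_US = 600_000 * MICROS_PER_SECOND


def cap_in_days() -> float:
    """Express the assumed cap in days."""
    return ASSUMED_MAX_SAMPLE_US / MICROS_PER_SECOND / SECONDS_PER_DAY


def classify_sample(sample_us: int) -> str:
    """Label a sample (in microseconds) as in range or overflowing.

    Hypothetical helper: it only mirrors the range check discussed in
    this bug, not how the SDK records the error.
    """
    if sample_us > ASSUMED_MAX_SAMPLE_US:
        return "invalid_overflow"
    return "ok"


print(round(cap_in_days(), 2))                     # cap in days
print(classify_sample(16_000))                     # a 16 ms scene build
print(classify_sample(ASSUMED_MAX_SAMPLE_US + 1))  # just past the cap
```

Under this assumption, a single sample has to claim a scene build of nearly a week before it is counted as an overflow, which is why a bad timestamp or a misbehaving client looks more plausible than a real slowdown.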

Summary: Increase in `wr.scenebuild_time` Invalid Overflow errors coinciding with Fenix 103.1.0 → Increase in `wr.scenebuild_time` Invalid Overflow errors coinciding in Fenix 102 with Fenix 103.1.0

Some background info on Fenix's wr.scenebuild_time

  • Has been recorded in Fenix since Fx71 (bug 1584109) via GeckoView Streaming Telemetry
  • Since the Telemetry histogram is prerelease-only, release-channel data wasn't available even in Glean
  • "Migrated" to call Glean directly in Fx102 via bug 1767257 (expands collection to release channel)
  • After the migration it was still listed as using GeckoView Streaming Telemetry and may, in fact, be double-counting as a result (I'm going to look into this more closely in bug 1781109)
    • Since the samples are individual, I don't expect this to be directly causing the invalid_overflows

If you look into the Fenix errors, you'll notice that this huge increase is caused by only 2 or 3 clients (see the table or the "Fenix Errors Affected Clients").
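One way to check that kind of concentration is to total the errors per client and see what share the noisiest few account for. The sketch below assumes the error counts have been exported as `(client_id, error_count)` pairs. The function name `top_client_share` is hypothetical and the rows are made up for illustration, not real telemetry.

```python
from collections import Counter


def top_client_share(error_rows, top_n=3):
    """Return the fraction of all errors reported by the top_n noisiest clients."""
    per_client = Counter()
    for client_id, error_count in error_rows:
        per_client[client_id] += error_count
    total = sum(per_client.values())
    if total == 0:
        return 0.0
    top = sum(count for _, count in per_client.most_common(top_n))
    return top / total


# Made-up rows for illustration only (not real telemetry).
rows = [
    ("client-a", 4000),
    ("client-b", 3500),
    ("client-c", 2400),
    ("client-d", 60),
    ("client-e", 40),
]
share = top_client_share(rows)
print(f"{share:.0%} of errors come from the top 3 clients")
```

A share close to 100% for two or three clients points at those clients rather than at a population-wide regression.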

These clients are either misbehaving in some odd way or actively sending wrong data. I'm not sure there is much point in spending a lot of time on it, though.

Since these wr.scenebuild_time overflow errors are not a Glean bug, I'll move this bug to the "Core::Graphics: WebRender" component so a graphics engineer can determine whether this is a graphics regression in the WebRender code in 103, some buggy hardware, and/or whether we need to handle the condition that causes these "seven-day-long" scene build times.

Blocks: gfx-triage
Component: Glean: SDK → Graphics: WebRender
OS: Unspecified → Android
Product: Data Platform and Tools → Core
No longer blocks: gfx-triage
Severity: -- → S4
Priority: -- → P3

Likely that this is non-harmful and not indicative of an issue.

(In reply to Kelsey Gilbert [:jgilbert] (previously Jeff) from comment #6)

Likely that this is non-harmful and not indicative of an issue.

Looks like the wr.scenebuild_time telemetry returned to normal on July 27 last week:

https://mozilla.cloud.looker.com/explore/fenix/metrics?qid=QIyNkvSspyvebLFOMa18u9&origin_space=746&toggle=dat,fil,vis

As such, I'll close this bug. I don't think any further investigation is needed.

Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → INCOMPLETE