Closed Bug 1730004 Opened 3 years ago Closed 2 years ago

Be able to limit Rust/WebRender asserts to different release channels

Categories

(Core :: Graphics: WebRender, enhancement)

enhancement

Tracking

()

RESOLVED FIXED
97 Branch
Tracking Status
firefox97 --- fixed

People

(Reporter: lsalzman, Assigned: emilio)

References

Details

Attachments

(1 file, 1 obsolete file)

It would be nice if we could have greater control over what release channels we are asserting on in WebRender. On the C++ side we have plenty of flexibility there with, but in Rust the only toggle we have is debug or opt.

I would like to be able to assert everywhere except release in some cases, and currently there is no nice way to do this inside WebRender itself. Likewise, it would be nice to have nightly-only asserts as well.

Flags: needinfo?(emilio)

And use it to replace the nightly assert in properties.mako.rs.

Assignee: nobody → emilio
Status: NEW → ASSIGNED

(In reply to Lee Salzman [:lsalzman] from comment #0)

I would like to be able to assert everywhere except release in some cases, and currently there is no nice way to do this inside WebRender itself. Likewise, it would be nice to have nightly-only asserts as well.

Everywhere-except release is not something you can easily do in C++ either (diagnostic assert will not assert in late beta, and we want that since we want late beta to be as similar to release as possible).

But I posted a patch for diagnostic / nightly asserts, which hopefully work for you? :)

Flags: needinfo?(emilio)

I would like to be able to be able to collect assertions for beta. I just want to weed out the release population. There are ways to do it on the C++ side.... At least being able to catch early beta would be acceptable I think.

(In reply to Lee Salzman [:lsalzman] from comment #3)

There are ways to do it on the C++ side....

What are they? But yeah the diagnostic_* macros I added should catch dev edition / early beta.

Flags: needinfo?(lsalzman)

Those should suffice.

Flags: needinfo?(lsalzman)

I think this makes a bit more sense (Lee was asking me about why these
asserts wouldn't be enabled on early-beta, and I think they should...).

But let me know if you strongly disagree for some reason.

I could make it "debug or is_early_beta_or_earlier or MOZ_DEV_EDITION"
too, perhaps, let me know if you'd rather do that... I don't feel
strongly about diagnostic asserts in late dev-edition builds.

(In reply to Mike Hommey [:glandium] from comment #7)

Bobby, cf. https://phabricator.services.mozilla.com/D125566#4083322

I see value in all the assertion tiers we have:

  • NS_ASSERTION — Useful for invariants we'd generally like to hold, but don't (yet) hold on all tests. The fact that they turn tests orange (but can be annotated to get a green run) allows us to ratchet up compliance.
  • MOZ_ASSERT — The default (non-critical invariants).
  • MOZ_RELEASE_ASSERT — Useful for critical (particularly sec-sensitive) invariants, where we think the user is better served by crashing than continuing.
  • MOZ_DIAGNOSTIC_ASSERT — Useful for non-critical invariants which hold in CI but which we suspect might fail in the wild. These would be good to know about, but also aren't worth risking product stability to enforce.

I think the use-case of diagnostic assertions is important. They were helpful to me when working on media stability, and there appear to be more than a thousand uses in the tree now, so I presume others have found them valuable as well.

The current implementation (MOZ_RELEASE_ASSERT on nightly-or-early-beta, MOZ_ASSERT on late-beta-or-release) is a bit of a kludge but works ok in practice. If we implemented diagnostic reports, we could get the same benefit without compromising stability on prerelease channels (though we'd need some operational capacity to monitor them).

When we had four channels, I just implemented them by splitting down the middle (nightly and aurora had diagnostic asserts enabled, beta and release did not). When we got rid of aurora, there was some disagreement about how this should work in a three-channel world, and the eventual solution was to just divide beta into two subgroups (early beta and late beta). That seems to have been working fine from my perspective.

Flags: needinfo?(bholley)

Yeah, I think part of the issue is that that's not the current implementation of DIAGNOSTIC_ASSERT. The current implementation is dev-edition+nightly IIUC. But if Nightly+early-beta is acceptable I think that's slightly better.

I guess Mike's question was also about the nightly_assert! macro I'm adding (which is equivalent to #ifdef NIGHTLY_BUILD + RELEASE/DIAGNOSTIC assert in C+)+. I think that's also useful fwiw.

(In reply to Emilio Cobos Álvarez (:emilio) from comment #9)

Yeah, I think part of the issue is that that's not the current implementation of DIAGNOSTIC_ASSERT. The current implementation is dev-edition+nightly IIUC. But if Nightly+early-beta is acceptable I think that's slightly better.

I guess Mike's question was also about the nightly_assert! macro I'm adding (which is equivalent to #ifdef NIGHTLY_BUILD + RELEASE/DIAGNOSTIC assert in C+)+. I think that's also useful fwiw.

Oh I see. So seems like there are three possibilities:
(1) Nightly + DevEdition (what diagnostic asserts do today)
(2) Nightly + Early Beta (wider audience)
(3) Nightly-only (narrower audience)

(Looks like at present, DevEdition is about twice the size of Nightly, and Beta is about 10x the size of DevEdition.)

In general, I'm a bit skeptical that we have that many cases where we really need the full beta population to surface a crash. I could see a few specific reasons — e.g. graphics-driver-related things on older hardware — but I think it's probably something we'd want to reach for explicitly.

(1) and (3) seem similar enough in scale that is it's probably worth having just one or the other. Is there a good reason why we'd want nightly asserts instead of diagnostic asserts?

301 to Lee for that.

Flags: needinfo?(lsalzman)

I would still prefer diagnostic work on nightly+early beta. We can often have asserts that don't reliably trigger on a population as small as nightly-only, but start to show up in beta or release. And I would like to be able to have some of those asserts trigger, without having to have the assertion actually causing crashes on release, so nightly+early beta seems like the most useful diagnostic mode to me.

Flags: needinfo?(lsalzman)

I spent some time perusing the set of diagnostic crashes over the last few nightly cycles (e.g. [1]), and I think I'm persuaded that it would be valuable to run all diagnostic asserts on early beta — largely because there are enough instances that crash only a handful of times that it seems reasonably likely that we'd unearth new rare crashes by expanding the population. The downside is that there are also some low-volume diagnostic asserts (that crash, say, 50 times per cycle) which we sometimes let burn for a while, but which could become noticeable stability issues on beta. However, there's a straightforward fix for that: we can just land a patch to convert them to NIGHTLY_ASSERT, which would revert their characteristics to the status quo.

So I think my proposal here would be:
(1) Switch the fatal configuration for MOZ_DIAGNOSTIC_ASSERT from nightly_or_dev_edition to nightly_or_early_beta
(2) Introduce a MOZ_NIGHTLY_ASSERT, which is fatal only on nightly.
(3) As needed, convert any high-volume diagnostic asserts to nightly asserts (or debug-only asserts).

Ryan, Gabriele - Does this seem reasonable from a relman and stability perspective?

[1] https://crash-stats.mozilla.org/search/?moz_crash_reason=~MOZ_DIAGNOSTIC_ASSERT&version=92.0a1&date=%3E%3D2021-04-05T00%3A31%3A00.000Z&date=%3C2021-10-05T00%3A31%3A00.000Z&_facets=signature&_sort=-date&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#facet-signature

Flags: needinfo?(ryanvm)
Flags: needinfo?(gsvelto)

One question would be that of the performance impact. (specifically, performance regression tracking with early beta and later beta having different performance profiles because of the difference in asserts)

(In reply to Mike Hommey [:glandium] from comment #14)

One question would be that of the performance impact. (specifically, performance regression tracking with early beta and later beta having different performance profiles because of the difference in asserts)

I'm not worried too about it. I think Beta performance matters the least of the three channels. Nightly matters because that's what we profile and where we measure progress, and Release matters because that's what the public judges us on. So I think any minor overhead that's acceptable for Nightly is also probably acceptable for early beta.

Opening diagnostic asserts up to early beta seems reasonable to me. I worry about adding another MOZ_NIGHTLY_ASSERT for a couple reasons:

  1. Fragmentation - how will developers know which is the right one to use?
  2. Non-uniformity of different populations - what conditions are we interested in catching a very non-representative Nightly test audience (compared to release) that we care less about everywhere else?

Ideally if we have a high-volume diagnostic assert being hit in the wild, we'll either prioritize fixing whatever invariant is being broken on such a regular basis or determine that it's not as invariant as we thought and deal with it accordingly.

Flags: needinfo?(ryanvm)

(In reply to Mike Hommey [:glandium] from comment #14)

One question would be that of the performance impact. (specifically, performance regression tracking with early beta and later beta having different performance profiles because of the difference in asserts)

FWIW, this is already an issue we deal with every cycle. Our perf sheriffs end up filing a bug for every post-b6 early_beta flip we do to track all the perf test changes that result from it.

I'm a bit worried about the triage aspect of it. Nightly crash triage is currently understaffed and if we plan on potentially increasing the crash-iness of beta we might also want to have eyes on it before it happens. Apart from that I think it's a good idea. Nightly population is sometimes too small to catch certain assertions tripping and we had stuff that slipped all the way to release before we noticed it.

Flags: needinfo?(gsvelto)

(In reply to Ryan VanderMeulen [:RyanVM] from comment #16)

Ideally if we have a high-volume diagnostic assert being hit in the wild, we'll either prioritize fixing whatever invariant is being broken on such a regular basis or determine that it's not as invariant as we thought and deal with it accordingly.

I think there is a middle ground — specifically when we're not hyper-concerned with the current failure volume, but would like to monitor it going forward. This can be the case when we're expecting the failures to go away in the future ("we know this code probably has bugs right now, but would like to know if the failures persist when we're done refactoring it"), or when we want to ensure the failure volume doesn't go up ("a handful of failures per nightly cycle is not a major concern, but a spike to 1000 failures would indicate a new failure mode that we should investigate").

(In reply to Gabriele Svelto [:gsvelto] from comment #18)

I'm a bit worried about the triage aspect of it. Nightly crash triage is currently understaffed and if we plan on potentially increasing the crash-iness of beta we might also want to have eyes on it before it happens.

The idea is that the crash triage lead for beta could quickly land a patch (or ask the sheriffs to land a patch) to downgrade any high-volume beta crashes from MOZ_DIAGNOSTIC_ASSERT to MOZ_NIGHTLY_ASSERT. This wouldn't require any investigation, and would thus be a roughly-O(1) operation at most once-per-cycle, and thus hopefully not strain resources.

So is going with either the first or both patches as they are ok? Or do you want me to change them somehow? If so, how? Thanks!

Flags: needinfo?(mh+mozilla)

Is this bug going to move at all? It would be nice if we could start being able to use these asserts...

Flags: needinfo?(mh+mozilla)

Mike, review re-ping, or anything that you want changed?

Flags: needinfo?(mh+mozilla)
Depends on: 1748969
Attachment #9240525 - Attachment description: Bug 1730004 - Add a crate with diagnostic / nightly asserts to xpcom/rust. r=nika → Bug 1730004 - Add a crate with diagnostic / nightly asserts.
Attachment #9241152 - Attachment is obsolete: true
Pushed by mh@glandium.org:
https://hg.mozilla.org/integration/autoland/rev/c2e100fe6bb8
Add a crate with diagnostic / nightly asserts. r=nika,emilio
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → 97 Branch
Flags: needinfo?(mh+mozilla)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: