Crash Reports On-Demand
Categories
(Toolkit :: Crash Reporting, enhancement)
Tracking
|  | Tracking | Status |
|---|---|---|
| firefox119 | --- | affected |
People
(Reporter: gcp, Assigned: gerard-majax)
References
(Blocks 2 open bugs)
Details
Attachments
(13 files, 3 obsolete files)
Signature-list-based UX popup for non-visible processes
From bholley:
The final step here is to build a channel back to the client enabling us to specify (via RemoteSettings) specific signatures for which we do or don’t want full crash reports. This would then enable two key things:
- We could substantially reduce the frequency with which we currently prompt users to submit crash reports.
- We could start narrowly prompting users in situations where we currently don’t due to UX concerns, i.e. utility process crashes.
The key observation here is that a stack (or even just a MOZ_CRASH_REASON) is often sufficient to diagnose a crash, and even when the extra information in a full report is useful, you usually don’t need very many reports. So if we can decouple our capabilities for aggregate statistics and discovery from the ones we use for advanced diagnostics, we can largely eliminate the UX tension around the latter.
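A minimal sketch of the client-side matching this describes, assuming a hypothetical RemoteSettings record shape and pending-crash fields (this is not the actual schema or the Firefox implementation): the client fetches a remotely curated list of requested stack hashes and only offers to submit pending crashes that match.

```python
# Illustrative only: the record shape ("hashes") and the pending-crash fields
# used here are hypothetical, not the real RemoteSettings schema.

def crashes_to_offer(remote_records, pending_crashes):
    """Select unsubmitted crashes whose stack hash matches one that
    developers have requested full reports for via the remote collection."""
    # Union of all requested stack hashes across the remote records.
    wanted = set()
    for record in remote_records:
        wanted.update(record.get("hashes", []))

    # Only offer to submit crashes that were never sent and that match.
    return [c for c in pending_crashes
            if not c["submitted"] and c["stack_hash"] in wanted]


if __name__ == "__main__":
    records = [{"hashes": ["a1b2c3", "d4e5f6"]}]
    pending = [
        {"id": "crash-1", "stack_hash": "a1b2c3", "submitted": False},
        {"id": "crash-2", "stack_hash": "ffffff", "submitted": False},
    ]
    print([c["id"] for c in crashes_to_offer(records, pending)])  # ['crash-1']
```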
Comment 13 (Assignee, 9 months ago)
Marking leave-open since we'll land desktop and Android separately.
Comment 15 (9 months ago)
Backed out for causing xpcshell and browser-chrome (bc) failures.
Comment 16 (Assignee, 9 months ago)
Oh, maybe we don't clean up correctly and this breaks other tests.
Comment 19 (Assignee, 9 months ago)
OK, so we had a bug around pendingIDs in CrashSubmit, potentially missing correct async handling. The Android failure was just that those tests were skipped on Android, and when I added mine I forgot to properly skip the failing test.
Comment 24 (bugherder, 9 months ago)
https://hg.mozilla.org/mozilla-central/rev/3e2dd273e950
https://hg.mozilla.org/mozilla-central/rev/8503113625c6
https://hg.mozilla.org/mozilla-central/rev/70b252be4e4a
https://hg.mozilla.org/mozilla-central/rev/d8248c5e6705
https://hg.mozilla.org/mozilla-central/rev/fc69e3b55559
https://hg.mozilla.org/mozilla-central/rev/e277ae28b877
Comment 25 (6 months ago)
When rewriting https://github.com/mozilla/remote-settings-ondemand-crashes to use data from BigQuery, I saw that we're only partitioning top crashers by process type and release channel. Should we also partition on OS?
Comment 26 (6 months ago)
(In reply to Alex Franchuk [:afranchuk] from comment #25)
> When rewriting https://github.com/mozilla/remote-settings-ondemand-crashes to use data from BigQuery, I saw that we're only partitioning top crashers by process type and release channel. Should we also partition on OS?
When I wrote that comment I forgot that I had already written my query in such a way that, while we do initially partition on just process type and release channel for selecting top crashers, before applying per-configuration limits we also group by platform (where platform is OS / OS version / architecture). So our top crashers are per process type and release channel, but the reports we request are then partitioned in such a way that we guarantee diverse platforms (if there are any). That being said, I wonder:
1. whether that might be too fine-grained: we currently limit to selecting 100 crash hashes per configuration, which, multiplied by the number of unique configurations (OS, OS version, architecture, process type, release channel), may be a lot of selected hashes for crashes which are not platform-specific (I just checked: in the past week there were 1085 distinct configurations), and
2. whether we want to get top crashers per-OS as well, so that some lower-volume OSes don't have their top crashers pushed down (I'm thinking probably we do).
(2) is a straightforward change. I think we still want to incorporate some platform diversification when selecting hashes, but maybe to address (1) we should have another limit on the overall hashes selected per top crasher (and we can just shuffle the selection to improve diversity). It's a subtle balance to strike: while we want to select enough hashes that the presumably small percentage of people who follow up and send crash reports will represent enough diversity to serve our needs, we also don't want to be so liberal in our selection that we end up selecting a huge number of hashes. I suppose one implicit feature we have here is that every day a separate set of hashes is selected for the top crashers, so in that way we could select fewer hashes and rely on daily reselection to potentially poke more people into submitting.
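A rough sketch of the selection logic being discussed (a per-configuration limit plus an overall per-top-crasher cap with shuffling), assuming hypothetical limits and field names rather than the actual BigQuery query in remote-settings-ondemand-crashes:

```python
# Illustration only: constants and field names are assumptions, not the values
# or schema used by the real remote-settings-ondemand-crashes query.
import random
from collections import defaultdict

PER_CONFIG_LIMIT = 100   # hashes kept per platform configuration (assumed)
PER_SIGNATURE_CAP = 300  # hypothetical overall cap per top crasher

def select_hashes(top_crasher_crashes):
    """top_crasher_crashes: dicts with 'signature', 'platform', 'crash_hash',
    already restricted to one process type / release channel's top crashers."""
    by_signature = defaultdict(lambda: defaultdict(list))
    for c in top_crasher_crashes:
        by_signature[c["signature"]][c["platform"]].append(c["crash_hash"])

    selected = {}
    for signature, platforms in by_signature.items():
        # Keep at most PER_CONFIG_LIMIT hashes per platform configuration,
        # which guarantees some platform diversity...
        pool = []
        for hashes in platforms.values():
            pool.extend(hashes[:PER_CONFIG_LIMIT])
        # ...then shuffle and cap so a single signature never requests an
        # unbounded number of reports across many configurations.
        random.shuffle(pool)
        selected[signature] = pool[:PER_SIGNATURE_CAP]
    return selected
```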
Comment 27 (Reporter, 6 months ago)
> we currently limit to selecting 100 crash hashes per configuration, which multiplied by the number of unique configurations (OS, OS version, architecture, process type, release channel) may be a lot of selected hashes for crashes which are not platform-specific
In the end we only need 1 report, but multiple reports from different configurations can help shine light on what's wrong (e.g. more visibility into program state because of different optimizations). Maybe 100 per configuration is more than we need; I suspect we'll get more than 1% engagement.
> whether we want to get top crashers per-OS as well, so that some lower-volume OSes don't have their top crashers pushed down (I'm thinking probably we do).
Yes, although if we lose them it wouldn't be the end of the world either - in the end the user volume affected would not be as large.
Comment 29 (6 months ago)
https://github.com/mozilla/remote-settings-ondemand-crashes/pull/7 incorporates the aforementioned changes (and will also close bug 1937869).
Comment 30 (Assignee, 6 months ago)
Alex, can you share feedback from a native speaker's point of view on the current wording? With https://phabricator.services.mozilla.com/D251017 we could improve it if required.
Comment 31 (Reporter, 6 months ago)
Did we test the current server<>client setup that would be used on Nightly? I'm slightly wary of accidentally spamming all Nightly users if we have a bug on either side. If not, perhaps we should do that first before enabling for all.
If you enable crashPull on a client, we should see the expected number of crashes being updated, and see, or not see, a banner, etc.
Comment 32 (6 months ago)
I added a comment fixing plurality to https://phabricator.services.mozilla.com/D225780, and owlish had a similar comment for desktop which wasn't incorporated before merging.
Besides those grammatical fixes, the only other thing I'd suggest is changing the wording from
... unsent crash report that matches crashes being investigated ...
to
... unsent crash report related to crashes being investigated ...
I think that reads more naturally, personally.
Comment 33 (Assignee, 6 months ago)
(In reply to Alex Franchuk [:afranchuk] from comment #32)
> I added a comment fixing plurality to https://phabricator.services.mozilla.com/D225780, and owlish had a similar comment for desktop which wasn't incorporated before merging.
There was so much bogus "NOT DONE" at some point that it looks like I missed this comment.
> Besides those grammatical fixes, the only other thing I'd suggest is changing the wording from
> ... unsent crash report that matches crashes being investigated ...
> to
> ... unsent crash report related to crashes being investigated ...
> I think that reads more naturally, personally.
Comment 34 (6 months ago)
(In reply to Alex Franchuk [:afranchuk] from comment #26)
> (2) is a straightforward change. I think we still want to incorporate some platform diversification when selecting hashes, but maybe to address (1) we should have another limit on the overall hashes selected per top crasher (and we can just shuffle the selection to improve diversity). It's a subtle balance to strike: while we want to select enough hashes that the presumably small percentage of people who follow up and send crash reports will represent enough diversity to serve our needs, we also don't want to be so liberal in our selection that we end up selecting a huge number of hashes. I suppose one implicit feature we have here is that every day a separate set of hashes is selected for the top crashers, so in that way we could select fewer hashes and rely on daily reselection to potentially poke more people into submitting.
Chiming in late on this. This sounds like a good approach to me, but I don't think we should worry about casting a net that's too wide. The types of crashes we'll be trying to get are rarely very high-volume to start with, so I don't think we'll ever find ourselves fetching too many crashes in one go, even if we try for every possible platform combination.
Comment 35 (6 months ago)
(In reply to Gabriele Svelto [:gsvelto] from comment #34)
> Chiming in late on this. This sounds like a good approach to me, but I don't think we should worry about casting a net that's too wide. The types of crashes we'll be trying to get are rarely very high-volume to start with, so I don't think we'll ever find ourselves fetching too many crashes in one go, even if we try for every possible platform combination.
I agree, but just in case I'd rather start out more conservative and grow the selection limits once we get a good idea of the response rate. I don't want to have tech articles talking about how users are annoyed by constant prompts to send crash reports (which would look bad in a number of ways). Not to mention it might make a user more inclined to click "never send these" instead of "always send these". Of course, such an incident could be rapidly fixed through Remote Settings anyway, so maybe it's not a big concern. We should probably keep in mind that the Nightly/Beta population may have a different response rate than Release, too.
Comment 41 (Assignee, 1 month ago)
It's OK to enable this on Nightly now.
Comment 44 (Assignee, 1 month ago)
Everything has landed. The remainder of enabling on all channels is tracked in bug 1950866.
Comment 45 (1 month ago)
I've created a simple query for checking the crash submission rate for background processes: https://sql.telemetry.mozilla.org/queries/111595.
However, it isn't particularly useful for Nightly because we do get a decent number of reports there. If you look at Beta/Release, those have hardly any. I would be very interested to see what happens to those graphs when we enable this on all channels.
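Roughly, the rate in question is submitted reports divided by total crashes for background processes, per channel. A purely illustrative sketch with made-up field names and process types (not the actual telemetry schema or the Redash query linked above):

```python
# Illustration only: field names, the set of "background" process types, and
# the sample data are made up, not the telemetry schema behind the query.
from collections import defaultdict

BACKGROUND_PROCESS_TYPES = {"utility", "gpu", "rdd", "socket"}  # assumed set

def submission_rate_by_channel(crash_events):
    """crash_events: dicts with 'channel', 'process_type', 'submitted'."""
    counts = defaultdict(lambda: [0, 0])  # channel -> [submitted, total]
    for ev in crash_events:
        if ev["process_type"] not in BACKGROUND_PROCESS_TYPES:
            continue
        counts[ev["channel"]][1] += 1
        if ev["submitted"]:
            counts[ev["channel"]][0] += 1
    return {ch: sub / tot for ch, (sub, tot) in counts.items() if tot}

if __name__ == "__main__":
    sample = [
        {"channel": "nightly", "process_type": "utility", "submitted": True},
        {"channel": "nightly", "process_type": "utility", "submitted": False},
        {"channel": "release", "process_type": "gpu", "submitted": False},
    ]
    print(submission_rate_by_channel(sample))  # {'nightly': 0.5, 'release': 0.0}
```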