Open Bug 1278729 Opened 9 years ago Updated 3 years ago

Provide an alternative to crashing Nightly users for rare errors we want to know about

Categories

(Core :: General, enhancement)

People

(Reporter: jwatt, Unassigned)

Details

(Keywords: feature)

We currently have things like MOZ_DIAGNOSTIC_ASSERT and gfxDevCrash that will crash our Nightly, Beta and potentially Aurora users even in situations where we would recover in release builds. If we're trying to grow our Nightly user base, this isn't a great way to get notified about exceptional states that shouldn't occur. I guess the idea is that we leverage the crash reporter infrastructure, and that we're much more likely to get reports for Nightly crashes than we are to have users take the time to file a bug report for broken behavior (assuming they even notice it). Having a stack trace also makes it much easier to identify root causes. Maybe we can leverage the crash reporter infrastructure without actually crashing, though. When these situations occur in our official builds, could we have MOZ_DIAGNOSTIC_ASSERT et al. trigger a dialog window similar to the crash reporter one that looks something like:

    Critical bug detected
    |Report this bug| |Ignore|
    |x| always automatically report critical bugs

I'm thinking we could take and send off a "crash" report even though we allow the browser to recover, and hopefully annoy our users less.
Summary: Provide an alternative to crashing nightly users for rare errors we want to know about → Provide an alternative to crashing Nightly users for rare errors we want to know about
Is this something we could send as a telemetry report instead? Inventing a whole new reporting process is a lot of work.
Would we need a whole new reporting process? I thought we could just make MOZ_DIAGNOSTIC_ASSERT and gfxDevCrash call whatever code we have to send off a crash report, with some modifications to that code no doubt. I expected the UI part to be the more involved piece. BTW according to the comment that documents gfxDevCrash, it sends a telemetry report in beta and release builds, then continues. For some reason we decided we wanted to send crash reports in other builds though. Maybe because telemetry is opt-in, but we want to maximize our chance of getting reports on these issues?
Summary: Provide an alternative to crashing Nightly users for rare errors we want to know about → Provide an alternative to crashing Nightly/Aurora users for rare errors we want to know about
Crash reports contain a lot more information than telemetry; I can see the stack, graphics devices and drivers, and all sorts of other information. From that point of view, it's useful to get the crash rather than telemetry. On the other hand, we don't want to crash, if we can help it, just because we want this data. So, gfxDevCrash is a compromise. In most of those cases, we're hoping the user doesn't crash further down the line, but they could get into a situation that's as bad as crashing, just a more prolonged and painful existence. If we can get what crash reports get, including the stack, a utility like this would be very useful. I believe this is what bug 1225851 is about, but I'll let Jonathan, who started this bug, decide if they are dupes.
Flags: needinfo?(jwatt)
Keywords: feature
If that's sufficient for us to stop making Nightly/Aurora crash for MOZ_DIAGNOSTIC_ASSERT and gfxDevCrash, then it could be a duplicate. The one thing that would make me doubt it is that Telemetry is opt-in. I know that this is an old page: https://wiki.mozilla.org/Program_Management/Programs/Telemetry/Improve_Opt-in but the opening sentence "It is thought that the current Telemetry opt-in rates do not provide enough data to be useful" isn't encouraging.
Flags: needinfo?(jwatt)
I would want to keep gfxDevCrash doing what it's doing, but introduce another thing that does what this bug suggests, and replace gfxDevCrash in places where that's the right thing to do. Some telemetry is opt-out, some is opt-in, though Nightly is either all opt-out or always on; I'm not sure which. Anyway, one way to deal with the "opted out" scenario is to have this new function crash on Nightly/Aurora if the user has opted out of telemetry.
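The fallback suggested in the comment above can be sketched as a small decision function. This is only an illustration: `Channel`, `DiagnosticAction` and `DecideDiagnosticAction` are hypothetical names, not existing Gecko APIs, and the behavior on Beta/Release when telemetry is off (continue without reporting) is an assumption the comment does not spell out.

```cpp
// Hypothetical sketch; none of these names exist in Gecko.
enum class Channel { Nightly, Aurora, Beta, Release };

enum class DiagnosticAction {
  ReportAndContinue,  // send a non-fatal report, keep running
  Crash,              // fall back to today's crashing behavior
  ContinueSilently    // assumption: nothing to send, do not crash
};

// Decide what the proposed "report without crashing" function should do
// when it is hit.
inline DiagnosticAction DecideDiagnosticAction(Channel aChannel,
                                               bool aTelemetryEnabled) {
  if (aTelemetryEnabled) {
    // We can learn about the problem without crashing the user.
    return DiagnosticAction::ReportAndContinue;
  }
  if (aChannel == Channel::Nightly || aChannel == Channel::Aurora) {
    // Telemetry is off, so a crash report is the only signal left.
    return DiagnosticAction::Crash;
  }
  return DiagnosticAction::ContinueSilently;
}
```

With this shape, a Nightly user who leaves telemetry on is never crashed by the new function, while one who has opted out gets the current behavior.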
Note that crash reports are opt-in for everything but full browser crashes. Those are opt-out, but since we present the crash reporter dialog to the user it's pretty easy to opt out. If we're going to gather the type of information we gather in crash reports, I'm not sure we'd be able to make this opt-out anyway, due to the privacy issues.
One option we could consider is saving these "pseudo crashes" and, either at shutdown or at the next restart, whenever that happens, asking users if they want to send us all this extra information that would help us with quality. Or actually doing it on the next actual crash: have them opt into sending us all the "not related to this crash, but nasty stuff that happened before that we'd like to know about". We'd still need the ability to collect all this information; this would just change the "when do we phone it home" part.
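A minimal sketch of the deferred "phone home" idea above, assuming the captured data can be held in memory until the user is asked. `PseudoCrashReport` and `PendingReportQueue` are hypothetical names for illustration only; a real implementation would need to persist reports to disk and hook into the existing crash reporter, neither of which is shown here.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record of one "pseudo crash": where it happened and the
// stack captured at that point.
struct PseudoCrashReport {
  std::string mSite;
  std::string mStack;
};

// Collects pseudo crashes as they happen and releases them only once the
// user has been asked (at shutdown, next startup, or the next real crash).
class PendingReportQueue {
 public:
  void Record(PseudoCrashReport aReport) {
    mPending.push_back(std::move(aReport));
  }

  std::size_t PendingCount() const { return mPending.size(); }

  // Returns the reports to submit if the user consented; otherwise the
  // pending reports are discarded and an empty list is returned.
  std::vector<PseudoCrashReport> Flush(bool aUserConsented) {
    std::vector<PseudoCrashReport> toSend;
    if (aUserConsented) {
      toSend.swap(mPending);
    } else {
      mPending.clear();
    }
    return toSend;
  }

 private:
  std::vector<PseudoCrashReport> mPending;
};
```

Collection and submission are decoupled here, which matches the point that only the "when" changes: the data still has to be gathered at the moment the bad state is detected.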
Since the last activity here, bug 1225851 was implemented. Jonathan, has that helped? Or is there more to be done here?
Severity: normal → enhancement
Flags: needinfo?(jwatt)
We still use MOZ_DIAGNOSTIC_ASSERT and gfxDevCrash a lot, and both can crash our Nightly users. This bug is about not doing that, so I'd say there's still work to do. (Perhaps popping up a "Firefox detected an invalid state that could cause Firefox to crash. Send crash report?" dialog, or one of the other suggestions above.)
Flags: needinfo?(jwatt)
Summary: Provide an alternative to crashing Nightly/Aurora users for rare errors we want to know about → Provide an alternative to crashing Nightly users for rare errors we want to know about
One idea that came up in a recent conversation was to have a way to crash, but only once per version of Firefox: a MOZ_LIMITED_CRASH() or some such. When a new Firefox is installed, only the very first call to MOZ_LIMITED_CRASH() will crash, as it does today, crash report and all. After that, it would become a no-op (or perhaps telemetry) until a new version of Firefox is installed (or a fresh profile is used). A variation on the theme would be "one crash per particular call to MOZ_LIMITED_CRASH" or "once per certain amount of time", but either way, it wouldn't crash every time it is hit, the way it does today. We would still get crash data, perhaps with telemetry to tell us about the "decided not to crash" misses, and take it from there. These crashes would be unique enough for us to adjust their importance when looking at the overall number of crash reports.
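The once-per-version rule could look roughly like the sketch below. MOZ_LIMITED_CRASH does not exist; `LimitedCrashPolicy` and `LimitedCrashAction` are hypothetical names, and the sketch assumes the last version that crashed is stored somewhere in the profile and passed in as a string (empty if this profile has never crashed this way), so that a fresh profile starts over.

```cpp
#include <string>
#include <utility>

// Hypothetical sketch of the decision behind a MOZ_LIMITED_CRASH()-style
// macro; nothing here is an existing Gecko API.
enum class LimitedCrashAction { Crash, RecordOnly };

class LimitedCrashPolicy {
 public:
  // aLastCrashedVersion: version string persisted in the profile the last
  // time this policy chose to crash; empty if it never has.
  explicit LimitedCrashPolicy(std::string aLastCrashedVersion)
      : mLastCrashedVersion(std::move(aLastCrashedVersion)) {}

  // Crash only on the first hit for a given version; later hits are
  // counted so they could be reported through telemetry instead.
  LimitedCrashAction Decide(const std::string& aCurrentVersion) {
    if (mLastCrashedVersion == aCurrentVersion) {
      ++mSuppressedCount;
      return LimitedCrashAction::RecordOnly;
    }
    // The caller is expected to persist this before actually crashing.
    mLastCrashedVersion = aCurrentVersion;
    return LimitedCrashAction::Crash;
  }

  unsigned SuppressedCount() const { return mSuppressedCount; }
  const std::string& LastCrashedVersion() const { return mLastCrashedVersion; }

 private:
  std::string mLastCrashedVersion;
  unsigned mSuppressedCount = 0;
};
```

The "one crash per call site" and "once per time window" variations would change only the key being compared (a call-site identifier or a timestamp instead of the version string).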
Severity: normal → S3