Open Bug 1143958 Opened 9 years ago Updated 2 years ago

Improve the shutdownhang detector

Categories

(Core :: General, defect)

People

(Reporter: away, Unassigned)

References

(Blocks 2 open bugs)

Details

I'm not happy with the crashes coming from the shutdown watchdog. The crashes are annoying to users and generally unactionable to developers. We haven't found any great issues, only goosechases like bug 1131330, and I think the investigations have been a waste of engineering effort.

According to bug 1038342 comment 0, the motivation was that Firefox sometimes deadlocks and prevents a restart.
* Does the profile unlocker work from bug 286355 help at all?

Other possibilities:
* Could we increase the timeout?
* Could we kill the hung process without going through the crash reporter? This may or may not be bug 1056131. I have no preference on exit(0) versus a special MOZ_CRASH that tells Breakpad to get out of the way. Either way I think this is my favorite option.

Any thoughts or suggestions? (Or if the answer is that we should implement bug 1056131, feel free to resolve this as duplicate)
Flags: needinfo?(vdjeric)
Flags: needinfo?(dteller)
Flags: needinfo?(benjamin)
(In reply to David Major [:dmajor] (UTC+13) from comment #0)
> I'm not happy with the crashes coming from the shutdown watchdog. The
> crashes are annoying to users and generally unactionable to developers. We
> haven't found any great issues, only goosechases like bug 1131330, and I
> think the investigations have been a waste of engineering effort.

I tend to believe that the solution to annoying crashes is changing the crash experience (see https://groups.google.com/forum/#!topicsearchin/mozilla.dev.platform/crash$20experience/mozilla.dev.platform/JE-E17Sp9rE ) rather than hiding the issue. Shutdown hangs, while not quite as bad as runtime hangs, still prevent users from shutting down their computer, use memory, and drain the battery. More generally, they represent bad bugs in our platform, which may in some cases cause data loss (although these are covered more by AsyncShutdown than by the Shutdown Terminator).

Vlad, do you know if the profile unlocker is sufficient to handle all cases of not being able to restart Firefox?

> 
> According to bug 1038342 comment 0, the motivation was that Firefox
> sometimes deadlocks and prevents a restart.
>
> * Does the profile unlocker work from bug 286355 help at all?
> 
> Other possibilities:
> * Could we increase the timeout?

I believe that, if anything, this makes things worse.

The current timeout is 63 seconds (adjusted to not count time spent sleeping) per shutdown phase, which is already quite a lot. Incidentally, telemetry indicates that successful shutdowns don't take more than 2 seconds per shutdown phase.
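For readers unfamiliar with how a per-phase timeout can ignore sleep time, here is a minimal sketch. It assumes "sleeping" means the machine being suspended, and it is not the actual Terminator code: `PhaseWatchdog`, `RunWatchdog`, and their parameters are hypothetical names. The idea is to count completed short sleeps ("ticks") instead of comparing wall-clock timestamps, so a long suspend adds at most one tick, and to reset the count whenever shutdown enters a new phase.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <thread>

// Hypothetical sketch; class and function names are illustrative only.
class PhaseWatchdog {
 public:
  explicit PhaseWatchdog(uint32_t maxTicksPerPhase)
      : mMaxTicks(maxTicksPerPhase) {
    assert(maxTicksPerPhase > 0);
  }

  // Shutdown thread: a new phase started, so give it a fresh budget.
  void AdvancePhase() { mTicks.store(0); }

  // Watchdog thread: record one completed sleep interval.
  // Returns true once the current phase has exhausted its budget.
  bool Tick() { return mTicks.fetch_add(1) + 1 >= mMaxTicks; }

  uint32_t TicksInPhase() const { return mTicks.load(); }

 private:
  const uint32_t mMaxTicks;
  std::atomic<uint32_t> mTicks{0};
};

// Sleeps in short intervals and counts each completed interval as one tick.
// Because elapsed wall-clock time is never measured, an interval that is
// stretched by a system suspend still counts as a single tick.
template <typename OnExpired>
void RunWatchdog(PhaseWatchdog& dog, const std::atomic<bool>& shutdownDone,
                 std::chrono::milliseconds tickLength, OnExpired onExpired) {
  while (!shutdownDone.load()) {
    std::this_thread::sleep_for(tickLength);
    if (shutdownDone.load()) {
      return;  // Shutdown finished while we slept; nothing to report.
    }
    if (dog.Tick()) {
      onExpired();  // The current phase used up its whole budget.
      return;
    }
  }
}
```

With a one-second tick and a budget of 63, this would approximate the timeout described above: `RunWatchdog` runs on its own thread while the shutdown thread calls `AdvancePhase()` at each phase boundary.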

> * Could we kill the hung process without going through the crash reporter?
> This may or may not be bug 1056131. I have no preference on exit(0) versus a
> special MOZ_CRASH that tells Breakpad to get out of the way. Either way I
> think this is my favorite option.

I'd prefer the more generic approach (see the dev-platform thread linked above), but this would work. If this works under Windows (pending experiments – bsmedberg has suspicions that this might not always be the case), the `_exit(0)` approach could be done quickly. The problem is that by doing `_exit(0)`, we lose any hope of actually fixing the underlying bugs.
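To make the two options concrete, here is a rough sketch of what the watchdog's final step could look like under each one. It is an illustration, not a proposed patch: `HangAction`, `ChooseHangAction`, and `TerminateHungShutdown` are hypothetical names, `std::_Exit` is used as the standard C++ counterpart of `_exit`, and `std::abort` merely stands in for `MOZ_CRASH`.

```cpp
#include <cstdlib>

// Hypothetical names throughout; this is not Gecko code.
enum class HangAction { CrashWithReport, QuietExit };

// Policy helper: keep crashing if hang reports are still wanted,
// otherwise leave quietly.
constexpr HangAction ChooseHangAction(bool collectHangReports) {
  return collectHangReports ? HangAction::CrashWithReport
                            : HangAction::QuietExit;
}

[[noreturn]] inline void TerminateHungShutdown(HangAction action) {
  if (action == HangAction::QuietExit) {
    // std::_Exit ends the process immediately with status 0. It does not
    // run atexit handlers or static destructors, so it cannot block on
    // whatever the hung shutdown code is holding.
    std::_Exit(0);
  }
  // Abnormal termination; stands in for MOZ_CRASH in this sketch. In a
  // process with a crash handler installed, this is the path that would
  // produce a report.
  std::abort();
}
```

The trade-off is the one described above: the quiet path exits with a success status and leaves no trace of where the hang occurred, so the underlying bug stays invisible.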

> Any thoughts or suggestions? (Or if the answer is that we should implement
> bug 1056131, feel free to resolve this as duplicate)
Flags: needinfo?(dteller)
The profile unlocker is only effective on Windows >XP.

Crashing is definitely superior to just hanging forever, even with the profile unlocker.

I believe we should really focus our efforts on fixing the shutdown hangs. We have rough counts of the major causes, and we should fix those.

I believe that even though the current crash UI isn't great, it's better than the alternative of getting no data, and so I don't think we should make any changes at the present time.
Flags: needinfo?(benjamin)
(In reply to David Rajchenbach-Teller [:Yoric] (use "needinfo") from comment #1)
> Vlad, do you know if the profile unlocker is sufficient to handle all cases
> of not being able to restart Firefox?

I know there are a few cases where the Profile Unlocker can't kill a hung Firefox, on WinXP as Benjamin said, and when the hung process was launched with -no-remote. Aaron might be able to name additional circumstances.

The Profile Unlocker and Terminator aren't quite the same thing though. As you pointed out, a hung process will waste resources until it's killed.

dmajor: it looks like the stack reported in bug 1131330 is actually useful, no? It identifies the cache service as the cause of the hang.
Flags: needinfo?(vdjeric) → needinfo?(aklotz)
> dmajor: it looks like the stack reported in bug 1131330 is actually useful, no?
> It identifies the cache service as the cause of the hang.
In the discussion in that bug, it's not clear whether that's the actual root cause or merely a symptom of something else.
The thing I don't like is that there is no clear action item when a user has such a crash. What guidance should sumo be giving out? "Keep reporting, but otherwise ignore it, and be prepared for this to continue for a long time because these bugs are hard to fix?" I don't want this to turn into cargo-cult remedies.
(In reply to Vladan Djeric (:vladan) -- please needinfo! from comment #3)
> (In reply to David Rajchenbach-Teller [:Yoric] (use "needinfo") from comment
> #1)
> > Vlad, do you know if the profile unlocker is sufficient to handle all cases
> > of not being able to restart Firefox?
> 
> I know there are a few cases where the Profile Unlocker can't kill a hung
> Firefox, on WinXP as Benjamin said, and when the hung process was launched
> with -no-remote. Aaron might be able to name additional circumstances.
> 
> The Profile Unlocker and Terminator aren't quite the same thing though. As
> you pointed out, a hung process will waste resources until it's killed.
> 
> dmajor: it looks like the stack reported in bug 1131330 is actually useful, no?
> It identifies the cache service as the cause of the hang.

Yeah, profile unlocker has a lot of holes that a hung Firefox can slip through. Bug 1112710 is another one.
Flags: needinfo?(aklotz)
See Also: → 1149287
Depends on: 286355
See Also: → 286355
Blocks: 1451925
Blocks: 1233757
Severity: normal → S3