Crash in RunWatchdog in the restart process of upgrading nightly with nsUpdateDriver.cpp in the stack

RESOLVED WORKSFORME

Status

Product: Toolkit
Component: Application Update
Severity: critical
Reported: 2 years ago
Modified: 5 months ago

People

(Reporter: Usul, Unassigned)

Tracking

Keywords: crash, nightly-community, regression
Version: 50 Branch
Hardware: Unspecified
OS: Linux
Points: ---

Firefox Tracking Flags

(firefox50 affected, firefox51 affected)

Details

(Whiteboard: see comment 8 [fce-active-legacy], crash signature)

Attachments

(1 obsolete attachment)

(Reporter)

Description

2 years ago
This bug was filed from the Socorro interface and is report bp-2bec1d5c-43ff-428d-b4aa-007ae2160721.
=============================================================
Frame 	Module 	Signature 	Source
0 	libxul.so 	RunWatchdog 	toolkit/components/terminator/nsTerminator.cpp:158
1 	libnspr4.so 	_pt_root 	nsprpub/pr/src/pthreads/ptthread.c:216
Ø 2 	libpthread-2.23.so 	libpthread-2.23.so@0x75c9 	
Ø 3 	libc-2.23.so 	libc-2.23.so@0x102eac

https://crash-stats.mozilla.com/signature/?product=Firefox&_sort=-date&signature=RunWatchdog shows 23 crashes, all on Linux. I think this might be the Linux version of Bug 1272614 since RunWatchdog shows up there as well.

Updated

2 years ago
Component: General → Application Update
(In reply to Marcia Knous [:marcia - use ni] from comment #1)
> https://crash-stats.mozilla.com/signature/?product=Firefox&_sort=-
> date&signature=RunWatchdog shows 23 crashes, all on Linux. I think this
> might be the Linux version of Bug 1272614 since RunWatchdog shows up there
> as well.

I could only find two reports in this list that seemed to be related to updates. This doesn't mean that the others aren't, but I could not find any indication that they were. In [1], the user comment says that the hang occurred while applying an update. In [2], we seem to be clearly stuck in nsUpdateProcessor::WaitForProcess[3]. Matt, this seems to be in line with your theory in bug 1272614 comment 16.

[1] https://crash-stats.mozilla.com/report/index/e6115d49-1b50-41d3-bb9e-c8f612160718#allthreads
[2] https://crash-stats.mozilla.com/report/index/b93f5262-307e-4357-b1db-cab752160716#allthreads
[3] https://hg.mozilla.org/projects/ash/annotate/63cc31d6cc1c/toolkit/xre/nsUpdateDriver.cpp#l1000
Flags: needinfo?(mhowell)
Yeah, I don't know how to tell for sure from that crash dump that WaitForProcess is actually responsible for the timeout, but I think that's enough to say it's causing trouble for at least somebody. I'll start working on a patch to make that function nonblocking. Not sure how I'm going to do that, but hey, how hard can it be, right? Right?
Assignee: nobody → mhowell
Flags: needinfo?(mhowell)
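The general idea of making a process wait nonblocking can be illustrated by polling the child against a deadline instead of blocking until it exits. This is a minimal Python sketch under stated assumptions, not the patch attached to this bug: `wait_for_updater` and its parameters are hypothetical, and the real nsUpdateProcessor::WaitForProcess in nsUpdateDriver.cpp is C++ built on Mozilla's own process APIs.

```python
import subprocess
import sys
import time


def wait_for_updater(proc, timeout_s, poll_interval_s=0.05):
    """Hypothetical helper: poll a child process until it exits or a deadline passes.

    Returns the child's exit code, or None if the deadline passed first.
    Unlike proc.wait() with no timeout, control comes back after every short
    sleep, so a caller could service other work (or give up) instead of
    blocking indefinitely on a stuck updater.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        code = proc.poll()  # non-blocking: None while the child is still running
        if code is not None:
            return code
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll_interval_s)


if __name__ == "__main__":
    # Stand-in for an updater that finishes quickly with status 0.
    quick = subprocess.Popen([sys.executable, "-c", "pass"])
    print(wait_for_updater(quick, timeout_s=10))  # prints 0

    # Stand-in for a stuck updater: it sleeps far longer than the deadline.
    slow = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(30)"])
    print(wait_for_updater(slow, timeout_s=0.2))  # prints None
    slow.kill()
    slow.wait()
```

The design choice here is to trade a single blocking wait for a bounded loop; what the caller does when `None` comes back (keep waiting later, kill the child, or skip staging) is left open, as it was in the discussion above.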
Created attachment 8774911 [details]
Bug 1288321 - Avoid blocking where possible while waiting for the updater to stage

Review commit: https://reviewboard.mozilla.org/r/67288/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/67288/
Comment on attachment 8774911 [details]
Bug 1288321 - Avoid blocking where possible while waiting for the updater to stage

I am not the most confident about this patch that I have ever been; triggering a try build before requesting review.
One possible cause is the thread taking more than 2 minutes, which I believe is the limit in this instance. One thing that might alleviate this is to perform staged updates without the ability to recover the staged directory (e.g. without making backups of existing files, etc.).

Updated

2 years ago
Attachment #8774911 - Flags: review?(spohl.mozilla.bugs)
Crash volume for signature 'RunWatchdog':
 - nightly (version 50): 54 crashes from 2016-06-06.
 - aurora  (version 49): 0 crashes from 2016-06-07.
 - beta    (version 48): 0 crashes from 2016-06-06.
 - release (version 47): 0 crashes from 2016-05-31.
 - esr     (version 45): 0 crashes from 2016-04-07.

Crash volume on the last weeks:
             Week N-1   Week N-2   Week N-3   Week N-4   Week N-5   Week N-6   Week N-7
 - nightly         24         11         11          0          0          0          0
 - aurora           0          0          0          0          0          0          0
 - beta             0          0          0          0          0          0          0
 - release          0          0          0          0          0          0          0
 - esr              0          0          0          0          0          0          0

Affected platform: Linux
status-firefox50: --- → affected
Comment on attachment 8774911 [details]
Bug 1288321 - Avoid blocking where possible while waiting for the updater to stage

Moving this patch to bug 1272614. This bug only has one report that obviously relates to app update, and that one is specifically about app update, so the patch is more relevant there (it addresses all platforms).

After some e-mail discussion between me and spohl, we think the signature this bug is about only exists because of some strangeness with Socorro; I filed bug 1290967 about that. There isn't really anything we can do on this bug specifically until we have some result there.
Attachment #8774911 - Attachment is obsolete: true
Attachment #8774911 - Flags: review?(spohl.mozilla.bugs)

Updated

2 years ago
Assignee: mhowell → nobody

Updated

2 years ago
Attachment #8774911 - Attachment is obsolete: false

Updated

2 years ago
Attachment #8774911 - Attachment is obsolete: true
I just updated Firefox 47.0.1 to Firefox 48 by checking for updates via the about dialog. After the download completed, I clicked the button to restart. After about a minute, the main window disappeared but the about dialog and a private window remained visible. It finally crashed:
https://crash-stats.mozilla.com/report/index/5342a435-bc81-4675-a5f3-8793a2160808#allthreads

Socorro seems to remove URLs, but here's the link to the pastebin with the sampled process in the hung state (not sure if that adds much to what's available via Socorro directly):
http://pastebin.com/VQ9DZ0wL
Crash volume for signature 'RunWatchdog':
 - nightly (version 51): 48 crashes from 2016-08-01.
 - aurora  (version 50): 28 crashes from 2016-08-01.
 - beta    (version 49): 0 crashes from 2016-08-02.
 - release (version 48): 0 crashes from 2016-07-25.
 - esr     (version 45): 0 crashes from 2016-05-02.

Crash volume on the last weeks (Week N is from 08-22 to 08-28):
            W. N-1  W. N-2  W. N-3
 - nightly       9      12      12
 - aurora        4      11       4
 - beta          0       0       0
 - release       0       0       0
 - esr           0       0       0

Affected platform: Linux

Crash rank on the last 7 days:
           Browser     Content   Plugin
 - nightly #41
 - aurora  #87
 - beta
 - release
 - esr
status-firefox51: --- → affected
Keywords: regression
Version: unspecified → 50 Branch
Duplicate of this bug: 1284641
Depends on: 1290967
Whiteboard: see comment 8

Updated

2 years ago
Whiteboard: see comment 8 → see comment 8 [fce-active]
Matt, it looks like there are still quite a few reports coming in with builds well after your patch in bug 1272614 landed. Do you think this would still be happening in nsUpdateProcessor after your patch landed?

Weird, there was one crash report where nsUpdateDriver.cpp code was in the stack. :(
Flags: needinfo?(mhowell)
The signature this bug is tracking is just the general shutdownhang watchdog timeout case, so it includes everything that causes shutdown hangs, but Socorro isn't recognizing these as such; normally it would rewrite the signature to start with "shutdownhang" but that isn't happening with these. I filed bug 1290967 when I realized that was happening, and it's just been marked fixed this morning, so I would expect this signature to start disappearing very soon.
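To make "the general shutdownhang watchdog timeout case" concrete: a watchdog thread counts ticks while shutdown is in progress, the main thread resets the count each time shutdown makes progress, and if the count ever reaches a limit the watchdog deliberately crashes the process so that a report gets filed. Any shutdown hang, whatever its cause, therefore produces a crash in the watchdog thread itself. The Python sketch below models only that mechanism, under stated assumptions: `ShutdownWatchdog`, `heartbeat`, and the tick and limit values are hypothetical, it records a flag instead of crashing, and it is not the code in toolkit/components/terminator/nsTerminator.cpp.

```python
import threading
import time


class ShutdownWatchdog:
    """Hypothetical model of a shutdown-hang watchdog.

    A background thread wakes every `tick_s` seconds and increments a counter.
    The main thread calls heartbeat() whenever shutdown makes progress, which
    resets the counter. If the counter reaches `max_ticks`, shutdown is judged
    to be hung; a real watchdog would crash the process at that point, while
    this model only sets `hang_detected`.
    """

    def __init__(self, tick_s, max_ticks):
        self.tick_s = tick_s
        self.max_ticks = max_ticks
        self._ticks = 0
        self._lock = threading.Lock()
        self._stop = threading.Event()
        self.hang_detected = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def heartbeat(self):
        # Called by the main thread at each shutdown phase.
        with self._lock:
            self._ticks = 0

    def stop(self):
        # Shutdown finished normally: stop watching.
        self._stop.set()
        self._thread.join()

    def _run(self):
        # Event.wait returns True once stop() is called, which ends the loop.
        while not self._stop.wait(self.tick_s):
            with self._lock:
                self._ticks += 1
                expired = self._ticks >= self.max_ticks
            if expired:
                self.hang_detected.set()  # a real watchdog would crash here
                return


if __name__ == "__main__":
    # Shutdown that keeps making progress should not be reported as a hang.
    ok = ShutdownWatchdog(tick_s=0.01, max_ticks=50)
    ok.start()
    for _ in range(10):
        time.sleep(0.02)
        ok.heartbeat()
    ok.stop()
    print("progressing shutdown flagged as hang:", ok.hang_detected.is_set())

    # Shutdown that never calls heartbeat() should trip the watchdog.
    hung = ShutdownWatchdog(tick_s=0.01, max_ticks=5)
    hung.start()
    print("stuck shutdown flagged as hang:", hung.hang_detected.wait(timeout=5))
    hung.stop()
```

Because the hang is detected in the watchdog thread, the crashing frame is the watchdog function regardless of what the main thread was stuck on, which is why a signature-rewriting step is needed to group these reports sensibly.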

Is there a recent report that includes nsUpdateDriver? I don't know a good way to search a bunch of these things for contents of call stacks.
Flags: needinfo?(mhowell) → needinfo?(robert.strong.bugs)
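On searching many reports for call-stack contents: once the thread stacks of a set of reports have been exported locally, filtering them for a source file is straightforward. The sketch below assumes a hypothetical, simplified report layout (a dict with a `uuid` and a list of `threads`, each thread a list of frame dicts with optional `function` and `file` keys); the real Socorro processed-crash format and its search interface are not shown here and may differ. The sample uuids are made up for illustration.

```python
from typing import Iterable, List


def reports_touching(reports: Iterable[dict], source_file: str) -> List[str]:
    """Return the uuids of reports with any frame, on any thread, from source_file.

    Each report is assumed (hypothetically) to look like:
        {"uuid": "...", "threads": [[{"function": "...", "file": "..."}, ...], ...]}
    Frames without symbols may lack a "file" entry, so .get() is used.
    """
    matches = []
    for report in reports:
        frames = (frame for thread in report.get("threads", []) for frame in thread)
        if any(source_file in (frame.get("file") or "") for frame in frames):
            matches.append(report["uuid"])
    return matches


if __name__ == "__main__":
    sample = [  # hypothetical reports
        {"uuid": "report-a", "threads": [[{"function": "RunWatchdog",
            "file": "toolkit/components/terminator/nsTerminator.cpp"}]]},
        {"uuid": "report-b", "threads": [
            [{"function": "RunWatchdog",
              "file": "toolkit/components/terminator/nsTerminator.cpp"}],
            [{"function": "nsUpdateProcessor::WaitForProcess",
              "file": "toolkit/xre/nsUpdateDriver.cpp"}]]},
    ]
    print(reports_touching(sample, "nsUpdateDriver.cpp"))  # prints ['report-b']
```

Searching every thread, not just the crashing one, matters here: in a shutdown hang the crashing thread is the watchdog, and the interesting frames (such as nsUpdateDriver.cpp) sit on the main thread.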
I no longer see RunWatchdog now that bug 1290967 and bug 1272614 were fixed, so resolving WFM.

Crashes in app update should include nsUpdateDriver.cpp. After going through many crash reports, I have found only 1 on Linux out of 44 with signature shutdownhang in the last week on 51.0a1, 50.0a2, and 49b, plus 1 on Mac. I'll file bugs for those after I go through more crash reports.
https://crash-stats.mozilla.com/report/index/e6b3a1e9-53e8-4109-bbaa-1bd2c2160908#allthreads
https://crash-stats.mozilla.com/report/index/6005970a-c1f1-4b5d-83d8-4e5f02160907#allthreads
Status: NEW → RESOLVED
Last Resolved: 2 years ago
Flags: needinfo?(robert.strong.bugs)
Resolution: --- → WORKSFORME
Also filed bug 1301572 which should at the very least fix the Mac crash.
Summary: Crash in RunWatchdog in the restart process of upgrading nightly → Crash in RunWatchdog in the restart process of upgrading nightly with nsUpdateDriver.cpp in the stack

Updated

5 months ago
Whiteboard: see comment 8 [fce-active] → see comment 8 [fce-active-legacy]