Closed Bug 1272614 Opened 9 years ago Closed 8 years ago

Browser crashing while being restarted after an update with nsUpdateDriver.cpp in the stack

Categories

(Toolkit :: Application Update, defect)

x86_64
macOS
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla51
Tracking Status
firefox47 --- wontfix
firefox48 --- wontfix
firefox49 --- fixed
firefox-esr45 --- affected
firefox50 --- fixed
firefox51 --- fixed

People

(Reporter: clement.lefevre, Assigned: molly)

References

Details

(Keywords: crash, nightly-community, Whiteboard: [mozfr-community])

Crash Data

Attachments

(2 files, 1 obsolete file)

This can especially be seen with nightlies due to frequent updates, but happens with stable releases too. After doing an update, Firefox wants to restart. Users allow it to do so, and then the browser being very slow and long to restart (my HDD being in a bad shape can explain this slowness, but that slowness shouldn't finally result in a crash), it finally crashes poping-up the window offering to do a crash report and to restart Firefox.
I could add more crash signatures, but those ones are already identical, and probably the other ones would be too.
Crash Signature: d19d3da8-6b91-491d-86b6-938962160512, 032206aa-3c0a-4b29-b274-fbde72160513
Keywords: crash
Whiteboard: [mozfr-community][nightly-community]
Crash Signature: d19d3da8-6b91-491d-86b6-938962160512, 032206aa-3c0a-4b29-b274-fbde72160513 → d19d3da8-6b91-491d-86b6-938962160512 032206aa-3c0a-4b29-b274-fbde72160513
Crash Signature: d19d3da8-6b91-491d-86b6-938962160512 032206aa-3c0a-4b29-b274-fbde72160513 → shutdownhang | libsystem_kernel.dylib@0x15716
Crash Signature: shutdownhang | libsystem_kernel.dylib@0x15716 → [@ shutdownhang | libsystem_kernel.dylib@0x15716]
Crashing Thread (21) Frame Module Signature Source 0 XUL mozilla::(anonymous namespace)::RunWatchdog(void*) toolkit/components/terminator/nsTerminator.cpp:158 1 libnss3.dylib _pt_root nsprpub/pr/src/pthreads/ptthread.c:216 2 libsystem_pthread.dylib _pthread_body 3 libsystem_pthread.dylib _pthread_start 4 libsystem_pthread.dylib thread_start 5 libnss3.dylib libnss3.dylib@0x2392af
This seems to be a crash affecting 10.9 only.
Hi Clément, I have tested your issue on latest FF release (46.0.1) and latest Nightly build and could not reproduce it. I have updated Firefox 46.0 to Firefox 46.0.1 and it didn't crash. Did the same thing with Nightly and I haven't encountered any issues. Nevertheless, Socorro reports shows multiple crashes with the same signature, although there may be different steps or testcases to reproduce this. Is this still reproducible on your end ? If yes, can you please retest this using latest FF release and latest Nightly build (https://nightly.mozilla.org/) and report back the results ? When doing this, please use a new clean Firefox profile, maybe even safe mode, to eliminate custom settings as a possible cause (https://goo.gl/PNe90E). Thanks, Paul.
Flags: needinfo?(clement.lefevre)
(In reply to Paul Pasca[:PoollyMcklayn] from comment #4) > Hi Clément, > > I have tested your issue on latest FF release (46.0.1) and latest Nightly > build and could not reproduce it. I have updated Firefox 46.0 to Firefox > 46.0.1 and it didn't crash. Did the same thing with Nightly and I haven't > encountered any issues. Nevertheless, Socorro reports shows multiple crashes > with the same signature, although there may be different steps or testcases > to reproduce this. > > Is this still reproducible on your end ? If yes, can you please retest this > using latest FF release and latest Nightly build > (https://nightly.mozilla.org/) and report back the results ? When doing > this, please use a new clean Firefox profile, maybe even safe mode, to > eliminate custom settings as a possible cause (https://goo.gl/PNe90E). > > Thanks, > Paul. Okay, so after testing yesterday and today, I still can reproduce the bug with empty new profile and latest builds. However, I also tried before an "advice" I saw (sadly I don't remember where even if I think it was in a bug). I actually checked by doing the test on two different ways: - First I did as always, run a manual update from the 'About' window > when the message is here telling to restart, click it > waiting for the restart > crash window - Then I tested another way: run a manual update from the 'About' window > even after the message in the 'About' window have appeared, wait until the little green arrow appear on the sandwich button, the one telling an update have been automatically done and that you need to restart > only there, restarting. In this situation it did not crash. However, I didn't tested this a lot of times, so I might just have been lucky. But I think I've seen this advice in another bug related to those crashes… and it seems it have helped. I don't know if this information can be useful though. Feel free to ask for more informations if I'm able to provide them.
Flags: needinfo?(clement.lefevre)
Whiteboard: [mozfr-community][nightly-community] → [mozfr-community]
User Agent Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:47.0) Gecko/20100101 Firefox/47.0 I was unable to reproduce this issue while updating through the release, aurora or nightly channels. However there are a lot of crashes in the last 3 days with this signature.
Component: Untriaged → Application Update
Product: Firefox → Toolkit
Extremely unlikely that this is app update (restart and startup code is outside of app update) and most likely a startup bug.
Component: Application Update → Startup and Profile System
rstrong, are you sure? This looks like nsUpdateProcessor taking forever (see thread 20 at https://crash-stats.mozilla.com/report/index/d19d3da8-6b91-491d-86b6-938962160512#allthreads)
Flags: needinfo?(robert.strong.bugs)
I wasn't sure and thanks!
Component: Startup and Profile System → Application Update
Flags: needinfo?(robert.strong.bugs)
Robert: Any ideas who might be able to investigate this further? It looks like there are a fair number of crashes in the last week - 1470 and several that affect 48.0b.
Flags: needinfo?(robert.strong.bugs)
Matt, could you take a look at this?
Flags: needinfo?(robert.strong.bugs) → needinfo?(mhowell)
I'm still not convinced that app update is directly involved in this; I can't find any other reports with this signature that have anything update-related on any of the thread stacks. But I don't know a way to search threads other than the crashing thread for every report, I just picked out a handful and checked them manually, so I could be wrong. I also don't have a 10.9 machine available, and it doesn't seem to reproduce on my machine with 10.11, so I'm not sure I can do much with this anyway. Lacking a better option, I'm redirecting to somebody that I know has more Mac expertise.
Flags: needinfo?(mhowell) → needinfo?(spohl.mozilla.bugs)
Since this is crashing after hanging I'm wondering if we can get some log output here. Could you set app.update.log to true in about:config before checking for an update, open the browser console (Tools > Web Developer > Browser Console) and copy everything in the console before the crash occurs? Since I haven't been able to reproduce myself yet I don't know if this is possible, or if the hang/crash is preventing copying from the console. In the meantime I will keep trying to reproduce myself and see if I can find out why we might be taking forever in nsUpdateProcessor.
Flags: needinfo?(spohl.mozilla.bugs) → needinfo?(clement.lefevre)
Matt, it looks to me like we might be failing to dispatch events[1] to the main thread during shutdown (I think this comment here[2] is meant as a ToDo), which means that we will fail to shut down the UpdateProcessor's watcher thread[3], which causes us to hang here[4], which eventually prompts the watchdog thread to crash the process[5]. Does this seem plausible to you? [1] https://dxr.mozilla.org/mozilla-central/source/toolkit/xre/nsUpdateDriver.cpp#1274 [2] https://hg.mozilla.org/mozilla-central/annotate/4c05938a64a7fde3ac2d7f4493aee1c5f2ad8a0a/xpcom/threads/nsThreadManager.cpp#l298 [3] https://dxr.mozilla.org/mozilla-central/source/toolkit/xre/nsUpdateDriver.cpp#1288 [4] https://hg.mozilla.org/mozilla-central/annotate/043082cb7bd8/xpcom/threads/nsThread.cpp#l807 [5] https://hg.mozilla.org/mozilla-central/annotate/043082cb7bd8/toolkit/components/terminator/nsTerminator.cpp#l158
Flags: needinfo?(mhowell)
Possible dupe: bug 1268528.
I might need some more explanation of how the shutdown procedure and the thread event queues work, but now that I've looked at this a little more deeply, I can't convince myself that the theory above makes sense. If dispatching to the main thread fails, then nsUpdateProcessor::WaitForProcess in the watcher thread will just ignore it and return; it doesn't block on that. That should leave the thread idle and with an empty queue (there are no other dispatches to it), so when nsThreadManager::Shutdown shuts down all the threads, there should be nothing for it to hang on. If there's a shutdown hang happening in the nsUpdateProcessor, my money's on the ::WaitForProcess call. It blocks until the updater process has exited, and that does seem kind of scary. I don't see how it can be waiting for the updater to exit when the about dialog thinks it's already finished (otherwise it wouldn't show the restart button), but that's the best I have right now.
Flags: needinfo?(mhowell)
Crash volume for signature 'shutdownhang | libsystem_kernel.dylib@0x15716': - nightly (50): 5 - aurora (49): 14 - beta (48): 177 - release (47): 5648 - esr (45): 539 Affected platform: Mac OS X
I came across Bug 1288321 yesterday - might be the Linux signature for this crash.
See Also: → 1288321
I've taken bug 1288321; I'll try the same fix for this one once I have it.
Assignee: nobody → mhowell
(In reply to Stephen A Pohl [:spohl] from comment #13) > Since this is crashing after hanging I'm wondering if we can get some log > output here. Could you set app.update.log to true in about:config before > checking for an update, open the browser console (Tools > Web Developer > > Browser Console) and copy everything in the console before the crash occurs? > Since I haven't been able to reproduce myself yet I don't know if this is > possible, or if the hang/crash is preventing copying from the console. > > In the meantime I will keep trying to reproduce myself and see if I can find > out why we might be taking forever in nsUpdateProcessor. I actually don't understand how could I gather such logs with Web console, as the crash is not really happening during the update installation, but rather after, during the restart process caused by the update (I don't know if it's more during the shutdown that the start exactly though). Will what you're asking to be able to act at that moment? Because I don't think so. It would be better to have a way to get those logs in a terminal or in a log file if possible I guess. But if this c
Flags: needinfo?(clement.lefevre) → needinfo?(spohl.mozilla.bugs)
(Sorry for the previous comment, it have been truncated due to accidental tab press…) But if this can help, I've a hard feeling that this may be due to the health of my HDD (SMART informations giving old and pre-fail for everything) causing some I/O operations to be slower than expected. It seems this is causing timeouts in the restart process because of this slowness, if this can be of any help… In the meantime, I started to notice crashes on the TorBrowser closing that can be related, but it looks like the TorBrowser doesn't have an about:crashes to take a look at.
It seems like you've answered my question and it is not possible to collect browser console output in this case. Per comment 19, Matt is looking into this. Thanks!
Flags: needinfo?(spohl.mozilla.bugs)
Comment on attachment 8776667 [details] Bug 1272614 - Avoid blocking where possible while waiting for the updater to stage; This patch is identical to bug 1288321 attachment 8774911 [details], so see the try push from there [1]. [1] https://treeherder.mozilla.org/#/jobs?repo=try&revision=b7f195c934ff
Comment on attachment 8776667 [details] Bug 1272614 - Avoid blocking where possible while waiting for the updater to stage; https://reviewboard.mozilla.org/r/68382/#review66190 r- because I'd like to hear your thoughts re: my comments and see the next patch. ::: toolkit/xre/nsUpdateDriver.h:46 (Diff revision 1) > +#endif > + > #if defined(XP_WIN) > #include <windows.h> > typedef HANDLE ProcessType; > -#elif defined(XP_MACOSX) > +#elif defined(USE_EXECV) || defined(XP_MACOSX) Let's use |#elif defined(XP_UNIX)| here. ::: toolkit/xre/nsUpdateDriver.cpp:49 (Diff revision 1) > # define getpid() GetCurrentProcessId() > #elif defined(XP_UNIX) > # include <unistd.h> > #endif > > -using namespace mozilla; > +#ifdef USE_EXECV s/#ifdef USE_EXECV/#ifdef XP_UNIX/, no? If so, we can move this into the |defined(XP_UNIX)| block a few lines above. ::: toolkit/xre/nsUpdateDriver.cpp:926 (Diff revision 1) > #endif > > LOG(("spawning updater process [%s]\n", updaterPath.get())); > > #if defined(USE_EXECV) > // Don't use execv when staging updates. Let's remove this comment. ::: toolkit/xre/nsUpdateDriver.cpp:928 (Diff revision 1) > LOG(("spawning updater process [%s]\n", updaterPath.get())); > > #if defined(USE_EXECV) > // Don't use execv when staging updates. > if (restart) { > execv(updaterPath.get(), argv); |execv| can fail and returns -1 when it does. It might be prudent to safeguard against this throughout this file by checking the return value and call |exit(-1)| when it does. Or how about |exit( execv(updaterPath.get(), argv));|? ::: toolkit/xre/nsUpdateDriver.cpp:975 (Diff revision 1) > > /** > - * Wait for a process until it terminates. This call is blocking. > + * Wait briefly to see if a process terminates, then return true if it has. > */ > -static void > -WaitForProcess(ProcessType pt) > +static bool > +ProcessHasTerminated(ProcessType pt) nit: Since we're basically touching every line in this function, let's s/pt/pid/ as parameter name here. ::: toolkit/xre/nsUpdateDriver.cpp:978 (Diff revision 1) > */ > -static void > -WaitForProcess(ProcessType pt) > +static bool > +ProcessHasTerminated(ProcessType pt) > { > #if defined(XP_WIN) > - WaitForSingleObject(pt, INFINITE); > + if(WaitForSingleObject(pt, 1000)) { nit: space between |if| and opening parenthesis. ::: toolkit/xre/nsUpdateDriver.cpp:983 (Diff revision 1) > - WaitForSingleObject(pt, INFINITE); > + if(WaitForSingleObject(pt, 1000)) { > + return false; > + } > CloseHandle(pt); > -#elif defined(XP_MACOSX) > - waitpid(pt, 0, 0); > + return true; > +#elif defined(USE_EXECV) || defined(XP_MACOSX) s/#elif defined(USE_EXECV) || defined(XP_MACOSX)/#elif defined(XP_UNIX)/ ::: toolkit/xre/nsUpdateDriver.cpp:988 (Diff revision 1) > - waitpid(pt, 0, 0); > +#elif defined(USE_EXECV) || defined(XP_MACOSX) > + int exitStatus; > + bool exited = waitpid(pt, &exitStatus, WNOHANG) > 0; > + if (!exited) { > + sleep(1); > + exited = waitpid(pt, 0, WNOHANG) > 0; Can we rewrite this to: if (!exited) { sleep(1); } else if (WIFEXITED(exitStatus) && (WEXITSTATUS(exitStatus) != 0)) { LOG(("Error while running the updater process, check update.log")); } ::: toolkit/xre/nsUpdateDriver.cpp:994 (Diff revision 1) > + } > + if (WIFEXITED(exitStatus) && (WEXITSTATUS(exitStatus) != 0)) { > + LOG(("Error while running the updater process, check update.log")); > + } > + return exited; > #else nit: s/#else/#endif/ and remove the |#endif| at the bottom of this function. ::: toolkit/xre/nsUpdateDriver.cpp:1272 (Diff revision 1) > > void > nsUpdateProcessor::WaitForProcess() > { > MOZ_ASSERT(!NS_IsMainThread(), "main thread"); > - ::WaitForProcess(mUpdaterPID); > + if(ProcessHasTerminated(mUpdaterPID)) { nit: space between |if| and opening parenthesis.
Attachment #8776667 - Flags: review?(spohl.mozilla.bugs) → review-
https://reviewboard.mozilla.org/r/68382/#review66190 > Let's use |#elif defined(XP_UNIX)| here. I didn't do that because I didn't want to depend on "platforms where we use execv" always meaning "Unix and also Mac" as it happens to be right now; otherwise there's no point in defining USE_EXECV at all. > s/#ifdef USE_EXECV/#ifdef XP_UNIX/, no? If so, we can move this into the |defined(XP_UNIX)| block a few lines above. Whether we're going to use waitpid and its associated macros is determined by USE_EXECV, not by XP_UNIX. > Let's remove this comment. Yeah, not the most useful comment in the world. > |execv| can fail and returns -1 when it does. It might be prudent to safeguard against this throughout this file by checking the return value and call |exit(-1)| when it does. Or how about |exit( execv(updaterPath.get(), argv));|? Yeah, I agree. Wrapping the execv in an exit call feels dirty (my Windows feathers are getting rather ruffled), but probably makes the most sense. > nit: Since we're basically touching every line in this function, let's s/pt/pid/ as parameter name here. Well, it's still actually only a pid_t on 1 out of 3 branches; on the others calling it a pid feels misleading. > nit: space between |if| and opening parenthesis. Yep. > s/#elif defined(USE_EXECV) || defined(XP_MACOSX)/#elif defined(XP_UNIX)/ Discussed above. > Can we rewrite this to: > > if (!exited) { > sleep(1); > } else if (WIFEXITED(exitStatus) && (WEXITSTATUS(exitStatus) != 0)) { > LOG(("Error while running the updater process, check update.log")); > } Sure. > nit: s/#else/#endif/ and remove the |#endif| at the bottom of this function. It won't build like that; if the call to PR_WaitProcess is always there, pt won't be the right type for it. > nit: space between |if| and opening parenthesis. Doh.
https://reviewboard.mozilla.org/r/68382/#review66190 > Whether we're going to use waitpid and its associated macros is determined by USE_EXECV, not by XP_UNIX. Discussed via IRC, |waitpid| is used on OSX as well so we can move this into the |defined(XP_UNIX)| block a few lines above. > Well, it's still actually only a pid_t on 1 out of 3 branches; on the others calling it a pid feels misleading. Makes sense, thanks! > It won't build like that; if the call to PR_WaitProcess is always there, pt won't be the right type for it. Thanks for the explanation, makes sense!
Comment on attachment 8776667 [details] Bug 1272614 - Avoid blocking where possible while waiting for the updater to stage; Review request updated; see interdiff: https://reviewboard.mozilla.org/r/68382/diff/1-2/
Attachment #8776667 - Flags: review- → review?(spohl.mozilla.bugs)
Attachment #8776667 - Flags: review?(spohl.mozilla.bugs)
Comment on attachment 8776667 [details] Bug 1272614 - Avoid blocking where possible while waiting for the updater to stage; https://reviewboard.mozilla.org/r/68382/#review66422 ::: toolkit/xre/nsUpdateDriver.cpp:573 (Diff revisions 1 - 2) > char workingDirPath[MAXPATHLEN]; > rv = GetCurrentWorkingDir(workingDirPath, sizeof(workingDirPath)); > if (NS_FAILED(rv)) > return; > > // Construct the PID argument for this process. If we are using execv, then nit: let's improve this comment, since it's no longer immediately clear where we're using execv. Something like: Construct the PID argument for this process. We're using execv on all Unix platforms with the exception of OSX. On those platforms, we pass "0" which is then ignored by the updater. ::: toolkit/xre/nsUpdateDriver.cpp:939 (Diff revisions 1 - 2) > + // since it is known to cause problems on the Mac. Windows has execv, but it > + // is a faked implementation that doesn't really replace the current process. > + // Instead it spawns a new process, so we gain nothing from using execv on > + // Windows. > if (restart) { > - execv(updaterPath.get(), argv); > + exit(execv(updaterPath.get(), argv)); Let's wrap all |execv| calls in |exit| in this file.
Comment on attachment 8776667 [details] Bug 1272614 - Avoid blocking where possible while waiting for the updater to stage; Review request updated; see interdiff: https://reviewboard.mozilla.org/r/68382/diff/2-3/
Attachment #8776667 - Flags: review?(spohl.mozilla.bugs)
Comment on attachment 8776667 [details] Bug 1272614 - Avoid blocking where possible while waiting for the updater to stage; https://reviewboard.mozilla.org/r/68382/#review66472 Thank you!
Attachment #8776667 - Flags: review?(spohl.mozilla.bugs) → review+
Pushed by mhowell@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/d5fb267d0946 Avoid blocking where possible while waiting for the updater to stage; r=spohl
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla51
Hi :mhowell, Do you want to uplift this for 49/50 if this patch is not too risky?
(In reply to Gerry Chang [:gchang] from comment #36) > Hi :mhowell, > Do you want to uplift this for 49/50 if this patch is not too risky? I think we can do that; the risk isn't too bad. I'll make the request.
Flags: needinfo?(mhowell)
Comment on attachment 8776667 [details] Bug 1272614 - Avoid blocking where possible while waiting for the updater to stage; Approval Request Comment [Feature/regressing bug #]: Bug 307181 introduced this code, but this crash would have gradually appeared over time as the application gets larger and updates take longer to apply. [User impact if declined]: Shutdown hang crashes [Describe test coverage new/current, TreeHerder]: Covered by existing tests https://treeherder.mozilla.org/#/jobs?repo=autoland&revision=d5fb267d0946 [Risks and why]: The change affects detecting when the browser should be restarted to apply an update; the only risk is that indicator failing to appear. But the patch has been running on nightly for 10 days with no problems reported. [String/UUID change made/needed]: None
Attachment #8776667 - Flags: approval-mozilla-beta?
Attachment #8776667 - Flags: approval-mozilla-aurora?
Comment on attachment 8776667 [details] Bug 1272614 - Avoid blocking where possible while waiting for the updater to stage; Crash fix, still early in the Aurora cycle, let's uplift to Fx50.
Attachment #8776667 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Comment on attachment 8776667 [details] Bug 1272614 - Avoid blocking where possible while waiting for the updater to stage; I'd like this for beta as well. Matt, how can we judge its success or failure in beta 5 ?
Attachment #8776667 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
(In reply to Liz Henry (:lizzard) (needinfo? me) from comment #41) > I'd like this for beta as well. Matt, how can we judge its success or > failure in beta 5 ? Good question. I'd say this patch is succeeding if we see fewer shutdownhang crashes with nsUpdateProcessor on any of the thread stacks. I don't have a way to find out how many of those are happening before though; it might be below the noise floor already.
Needs rebasing for Beta uplift.
Flags: needinfo?(mhowell)
Attached patch Rebased patch for beta (obsolete) — Splinter Review
Flags: needinfo?(mhowell) → needinfo?(ryanvm)
Comment on attachment 8782186 [details] [diff] [review] Rebased patch for beta This looks like the wrong patch.
Flags: needinfo?(ryanvm)
Flags: needinfo?(mhowell)
... yep, that's because I attached the wrong patch. I actually looked at this one first. Sorry about that, and thanks for checking.
Attachment #8782186 - Attachment is obsolete: true
Flags: needinfo?(mhowell) → needinfo?(ryanvm)
Flags: needinfo?(ryanvm)
(In reply to Robert Strong [:rstrong] (use needinfo to contact me) from comment #48) > I found one crash on nightly in the last week in this code path > https://crash-stats.mozilla.com/report/index/6005970a-c1f1-4b5d-83d8- > 4e5f02160907#allthreads We have further changed the way we wait for the updater to terminate in bug 1250901 since the patch here landed. This crash report indicates that the updater fails to exit in a timely manner and it is not our "wait-for-process-to-terminate" logic in Firefox that's flawed.
Stephen, what determines "exit in a timely manner"?
Flags: needinfo?(spohl.mozilla.bugs)
The watchdog thread crashes the Firefox process during shutdown if the process is still running after a certain amount of time. The timeout is set in the toolkit.asyncshutdown.crash_timeout pref and it appears to be set to 60000ms by default. So it looks like the updater in this case fails to exit within one minute.
Flags: needinfo?(spohl.mozilla.bugs)
Thanks! I filed bug 1301572 in the hope that this approach will fix these types of crashes.
It appears that there are other cases than app update where this crash happens with this signature... not surprisingly.
Summary: Browser crashing while being restarted after an update → Browser crashing while being restarted after an update with nsUpdateDriver.cpp in the stack
Depends on: 1303834
See Also: → 1379619
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: