Closed
Bug 614958
Opened 14 years ago
Closed 14 years ago
Browser freezing on exit, requires forced quit after going offline/sleeping (event loop spinning forever)
Categories
(Firefox :: General, defect)
Tracking
()
RESOLVED
FIXED
Firefox 4.0b8
Tracking | Status | |
---|---|---|
blocking2.0 | --- | beta8+ |
People
(Reporter: rcampbell, Assigned: ehsan.akhgari)
References
Details
(Keywords: dogfood, hang, regression)
Attachments
(11 files, 1 obsolete file)
237.66 KB,
text/plain
|
Details | |
16.13 KB,
text/plain
|
Details | |
157.62 KB,
text/plain
|
Details | |
59.33 KB,
text/plain
|
Details | |
42.07 KB,
text/plain
|
Details | |
38.16 KB,
image/png
|
Details | |
6.48 KB,
application/x-gzip
|
Details | |
17.91 KB,
text/plain
|
Details | |
2.11 KB,
patch
|
benjamin
:
review+
Biesinger
:
review+
|
Details | Diff | Splinter Review |
2.39 KB,
patch
|
bzbarsky
:
review+
Biesinger
:
review+
|
Details | Diff | Splinter Review |
38.71 KB,
text/plain
|
Details |
buildid: 20101126 See attached stack from force quit.
Reporter | ||
Comment 1•14 years ago
|
||
Still seeing this on wake-from-sleep. Requesting blocking.
blocking2.0: --- → ?
Reporter | ||
Updated•14 years ago
|
Version: unspecified → Trunk
Comment 2•14 years ago
|
||
I was running into this as well, and removing Minefield from Applications and re-installing seemed to correct things.
Comment 3•14 years ago
|
||
Rob: when you come back from sleep, are you told that Firefox is offline?
Comment 4•14 years ago
|
||
I've run into this for at least my past few updates on OS X, including for today's update.
Comment 5•14 years ago
|
||
Blocking b8 for investigation; this could affect the upgrade path for our users, which is pretty serious.
blocking2.0: ? → beta8+
Comment 6•14 years ago
|
||
I was optimistic. After using it a little, it still hangs. Maybe plugin container, because when I quit, all UI disappears, but the container process is still running: kevin-needhams-imac-2:mozsrc kneedham$ ps -eaf |grep 63342 501 63342 1 0 0:37.80 ?? 3:35.40 /Applications/Minefield.app/Contents/MacOS/firefox-bin -foreground -foreground 501 63779 63342 0 0:03.39 ?? 0:15.15 /Applications/Minefield.app/Contents/MacOS/plugin-container.app/Contents/MacOS/plugin-container /Library/Internet Plug-Ins/Flash Player.plugin -omnijar /Applications/Minefield.app/Contents/MacOS/omni.jar 63342 gecko-crash-server-pipe.63342 org.mozilla.machname.1946510159 plugin 501 64094 56040 0 0:00.00 ttys001 0:00.00 grep 63342
Comment 7•14 years ago
|
||
I've been getting this too on Linux64 when restarting to get my nightly updates. The UI goes away, but firefox-bin and plugin-container are still running.
Reporter | ||
Comment 8•14 years ago
|
||
(In reply to comment #3) > Rob: when you come back from sleep, are you told that Firefox is offline? yes. Toggling the menu option doesn't change it.
Comment 9•14 years ago
|
||
Comment 10•14 years ago
|
||
Comment #7 was me signed in under the wrong account.
Comment 11•14 years ago
|
||
Can't reproduce with today's nightly. I just restarted a few times, both with and without suspending my laptop briefly.
blocking2.0: beta8+ → ?
Reporter | ||
Comment 12•14 years ago
|
||
catlee, did you mean to clear the blocking b8 flag?
Updated•14 years ago
|
blocking2.0: ? → beta8+
Comment 13•14 years ago
|
||
(In reply to comment #12) > catlee, did you mean to clear the blocking b8 flag? nope!
Assignee | ||
Comment 14•14 years ago
|
||
This is bug 614286, I think.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → DUPLICATE
Comment 15•14 years ago
|
||
This has happened to me sometimes, but I have attributed to the number of tabs (40) and content that I have open before I quit and those consume a lot of memory. I've seen this happen before, maybe for two months now.
Comment 16•14 years ago
|
||
I'm using a tinderbox build with bug 614286 landed: http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=e1983b9db75d&tochange=1b4041e3c177 Built from http://hg.mozilla.org/mozilla-central/rev/1b4041e3c177 I tested this particular bug by going to File -> Work Offline, and the File menu stayed blue. Duping other bugs to this one as this has blocking beta8+.
Assignee: nobody → ehsan
Status: RESOLVED → REOPENED
Depends on: 614286
Keywords: dogfood,
regression
Resolution: DUPLICATE → ---
Summary: Browser freezing on exit, requires forced quit → Browser freezing on exit, requires forced quit after going offline/sleeping (event loop spinning forever)
Assignee | ||
Comment 24•14 years ago
|
||
(In reply to comment #16) > I'm using a tinderbox build with bug 614286 landed: > > http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=e1983b9db75d&tochange=1b4041e3c177 > > Built from http://hg.mozilla.org/mozilla-central/rev/1b4041e3c177 > > I tested this particular bug by going to File -> Work Offline, and the File > menu stayed blue. > > Duping other bugs to this one as this has blocking beta8+. I tried reproducing this several times without success? Can you reproduce this?
Comment 25•14 years ago
|
||
I can't consistently reproduce this as per my comment from bug 613998 comment 3, but the way I test it is by just browsing normally and trying File -> Work Offline from time to time. If it hangs, then it's definitely bad, if not I browse some more and try again later. 20 ok? http://hg.mozilla.org/mozilla-central/rev/baa6cc2f72e4 21 bad http://hg.mozilla.org/mozilla-central/rev/be4b69a4babf Build from 1121 definitely is bad, but because I don't have a deterministic STR, 20 seems good so far. Doing File -> Work Offline can trigger this as well as it eventually calls SetOffline: nsIOService::SetOffline(int) nsSocketTransportService::Shutdown() nsThread::Shutdown() NS_ProcessNextEvent_P(nsIThread*, int) nsThread::ProcessNextEvent(int, int*) nsEventQueue::GetEvent(int, nsIRunnable**) PR_Wait PR_WaitCondVar _pthread_cond_wait __semwait_signal With that range, I've guessed at the TCP SYN change bug 592284. I'll re-download the build from november 20 to check again if I can hang.
Assignee | ||
Comment 26•14 years ago
|
||
So, reading comment 25, it seems that this doesn't have anything to do with bug 614286, right? And since your STR is not the same as comment 0, I'm not sure if you can just use the blocking flag set on this bug for this other problem. (FWIW, I haven't been able to reproduce this yet.)
Assignee | ||
Comment 27•14 years ago
|
||
OK, I just reproduced a case where the Firefox got stuck in offline mode, and when I attempted to quit, it just hung...
Comment 28•14 years ago
|
||
robcee, does this seem accurate? The STR in comment 0 is basically.. 1) wake the browser from sleep 2) notice the browser is stuck offline 3) quit firefox and notice it hanging I took a screenshot from several days ago when I hit this after putting my laptop to sleep: https://bug613998.bugzilla.mozilla.org/attachment.cgi?id=492586 I'm guessing some offline notification is being sent to eventually call nsIOService::SetOffline(int) which is the same ending call stack for when you do File -> Work Offline. (I originally only hit this when putting my laptop to sleep, but to come up with a STR that didn't involve closing the whole laptop, I used File -> Work Offline.) And yes, it doesn't seem to have anything to do with bug 614286 other than a bunch of duplicate stuck-offline bugs were marked potentially resolved by that when it initially landed.
Assignee | ||
Comment 29•14 years ago
|
||
Assignee | ||
Comment 30•14 years ago
|
||
Mardak, did you also get stuck in offline mode?
Assignee | ||
Updated•14 years ago
|
Status: REOPENED → ASSIGNED
Comment 31•14 years ago
|
||
Yes, after waking from sleep or doing File -> Work Offline, toggling it back to not Work Offline, loading pages in tabs results in "you're working offline!" as well as gmail showing its error message saying it can't reach the servers.
Comment 32•14 years ago
|
||
This is probably the cause for bug 573259 failing a lot more often recently then. That is caused by switching back online not working straight away for some reason.
Comment 33•14 years ago
|
||
Perhaps not useful by itself as we need to compare it to the number of pushes on those days.. But here's the count of recent comments posted on which days in bug 573259: 1 2010-11-02 2 2010-11-04 2 2010-11-05 1 2010-11-09 10 2010-11-10 5 2010-11-11 3 2010-11-12 3 2010-11-13 5 2010-11-14 6 2010-11-15 9 2010-11-16 7 2010-11-17 8 2010-11-18 9 2010-11-19 3 2010-11-21 7 2010-11-22 13 2010-11-23 8 2010-11-24 5 2010-11-25 4 2010-11-26 2 2010-11-28 15 2010-11-29 14 2010-11-30 11 2010-12-01 In october, there were almost 1 report a day on average (~26 reports).
Assignee | ||
Comment 34•14 years ago
|
||
(In reply to comment #32) > This is probably the cause for bug 573259 failing a lot more often recently > then. That is caused by switching back online not working straight away for > some reason. What's special about that test that could be affected by this?
Comment 35•14 years ago
|
||
(In reply to comment #34) > (In reply to comment #32) > > This is probably the cause for bug 573259 failing a lot more often recently > > then. That is caused by switching back online not working straight away for > > some reason. > > What's special about that test that could be affected by this? The test immediately before (browser_offline.js) flips offline and back online again. I have seen browser_relative.js fail and when it does the browser shows the offline error page even though by then it should be online.
Comment 36•14 years ago
|
||
I just got hit by this again, I did see some errors in the error console, screenshot coming up, and did a mac analysis, which looks somewhat different than the others here, so I'm attaching it. buildid 20101201030318, os/x 10.6.5
Comment 37•14 years ago
|
||
Comment 38•14 years ago
|
||
Comment 39•14 years ago
|
||
Comment on attachment 494658 [details] Live sample on Mac, different than others posted Whoops, just noticed my sample is the same/very similar to comment 25
Attachment #494658 -
Attachment is obsolete: true
Comment 40•14 years ago
|
||
I guess that the two notification.xml errors are from me pressing the "sync again" button on the sync-failed notification bar, fwiw.
Comment 41•14 years ago
|
||
Comment 42•14 years ago
|
||
Ehsan - this looks like a beta 8 long pole - do you have a plan of attack or, even better, a best/worst-case estimate for a fix?
Assignee | ||
Comment 43•14 years ago
|
||
(In reply to comment #35) > (In reply to comment #34) > > (In reply to comment #32) > > > This is probably the cause for bug 573259 failing a lot more often recently > > > then. That is caused by switching back online not working straight away for > > > some reason. > > > > What's special about that test that could be affected by this? > > The test immediately before (browser_offline.js) flips offline and back online > again. I have seen browser_relative.js fail and when it does the browser shows > the offline error page even though by then it should be online. Oh, right, makes a lot of sense. (In reply to comment #42) > Ehsan - this looks like a beta 8 long pole - do you have a plan of attack or, > even better, a best/worst-case estimate for a fix? Not yet, but I have managed to reproduce it, and I'm investigating it heavily right now (got distracted by the tree brokenness yesterday -- sorry).
Assignee | ||
Comment 44•14 years ago
|
||
OK, I figured out a big part of the mystery. Look at this stack (which is coming from my repeated clicks on File > Work Offline. Frame 61 shows the SetOffline(PR_TRUE) call. This function calls nsSocketTransportService::Shutdown which attempts to shut down the thread, which gets us on that event loop. But before the shutdown is finished, we start processing normal user events, which takes us to frame 0 where SetOffline(PR_FALSE) is called again, which just exits quickly because it has re-entrancy checks. This causes us to get stuck in offline mode, and will lead us to get trapped in the event loop initiated in frame 59 while shutting down, which is the root cause of this bug.
Assignee | ||
Comment 45•14 years ago
|
||
With the help of bent and sdwilsh, I understood what's happening here. Here's the problematic thread: (gdb) thread apply 80 bt Thread 80 (process 75871): #0 0x00007fff81080322 in semaphore_wait_signal_trap () #1 0x00007fff8108596d in pthread_mutex_lock () #2 0x0000000100b348e9 in PR_Lock (lock=0x1291dfc90) at /Users/ehsanakhgari/moz/mozilla-central/nsprpub/pr/src/pthreads/ptsynch.c:206 #3 0x00000001008ca171 in nsAutoLock::nsAutoLock (this=0x12a77ea60, aLock=0x1291dfc90, _notifier=@0x12a77ea90) at nsAutoLock.h:222 #4 0x0000000112a0b16a in nsSocketOutputStream::CloseWithStatus (this=0x12c3cad10, reason=2152136706) at /Users/ehsanakhgari/moz/mozilla-central/netwerk/base/src/nsSocketTransport2.cpp:654 #5 0x0000000112a0663f in nsSocketOutputStream::Close (this=0x12c3cad10) at /Users/ehsanakhgari/moz/mozilla-central/netwerk/base/src/nsSocketTransport2.cpp:540 #6 0x0000000112a075d9 in nsSocketOutputStream::Release (this=0x12c3cad10) at /Users/ehsanakhgari/moz/mozilla-central/netwerk/base/src/nsSocketTransport2.cpp:533 #7 0x00000001129dd6ce in nsCOMPtr<nsIAsyncOutputStream>::~nsCOMPtr (this=0x133d51980) at nsCOMPtr.h:533 #8 0x0000000112a9e89d in nsHttpConnection::~nsHttpConnection (this=0x133d518b0) at /Users/ehsanakhgari/moz/mozilla-central/netwerk/protocol/http/nsHttpConnection.cpp:118 #9 0x0000000112a9c3b7 in nsHttpConnection::Release (this=0x133d518b0) at /Users/ehsanakhgari/moz/mozilla-central/netwerk/protocol/http/nsHttpConnection.cpp:962 #10 0x00000001129dd979 in nsCOMPtr<nsITransportEventSink>::assign_assuming_AddRef (this=0x12c3cacd0, newPtr=0x0) at nsCOMPtr.h:518 #11 0x00000001129dda0e in nsCOMPtr<nsITransportEventSink>::assign_with_AddRef (this=0x12c3cacd0, rawPtr=0x0) at nsCOMPtr.h:1204 #12 0x00000001129ddf2b in nsCOMPtr<nsITransportEventSink>::operator= (this=0x12c3cacd0, rhs=0x0) at nsCOMPtr.h:663 #13 0x0000000112a0a668 in nsSocketTransport::OnSocketDetached (this=0x12c3cabb0, fd=0x11ebac6b0) at /Users/ehsanakhgari/moz/mozilla-central/netwerk/base/src/nsSocketTransport2.cpp:1630 #14 0x0000000112a0ebec in nsSocketTransportService::DetachSocket (this=0x101099e00, sock=0x101099e60) at /Users/ehsanakhgari/moz/mozilla-central/netwerk/base/src/nsSocketTransportService2.cpp:188 #15 0x0000000112a0ee70 in nsSocketTransportService::Run (this=0x101099e00) at /Users/ehsanakhgari/moz/mozilla-central/netwerk/base/src/nsSocketTransportService2.cpp:602 #16 0x00000001008ba96a in nsThread::ProcessNextEvent (this=0x130b43b30, mayWait=1, result=0x12a77ee3c) at /Users/ehsanakhgari/moz/mozilla-central/xpcom/threads/nsThread.cpp:626 #17 0x0000000100831b4e in NS_ProcessNextEvent_P (thread=0x130b43b30, mayWait=1) at nsThreadUtils.cpp:250 #18 0x00000001008bb3f4 in nsThread::ThreadFunc (arg=0x130b43b30) at /Users/ehsanakhgari/moz/mozilla-central/xpcom/threads/nsThread.cpp:278 #19 0x0000000100b3c157 in _pt_root (arg=0x12e90d170) at /Users/ehsanakhgari/moz/mozilla-central/nsprpub/pr/src/pthreads/ptthread.c:187 #20 0x00007fff810b9536 in _pthread_start () #21 0x00007fff810b93e9 in thread_start () It tries to lock mTransport->mLock in nsSocketOutputStream::CloseWithStatus, but that lock is already locked in nsSocketTransport::OnSocketDetached.
Comment 46•14 years ago
|
||
Potentially this was made worse by the 3x socket transport closes: http://hg.mozilla.org/mozilla-central/rev/be4b69a4babf#l2.149 Each mSocketTransport->Close(reason) calls both mInput.CloseWithStatus(reason) and mOutput.CloseWithStatus(reason).
Depends on: 592284
Assignee | ||
Comment 47•14 years ago
|
||
Attachment #494806 -
Flags: review?(benjamin)
Comment 48•14 years ago
|
||
Comment on attachment 494806 [details] [diff] [review] Patch (v1) r=me for landing in b8 conditional on you getting a real necko peer to review this later.
Attachment #494806 -
Flags: review?(benjamin) → review+
Assignee | ||
Comment 49•14 years ago
|
||
(In reply to comment #46) > Potentially this was made worse by the 3x socket transport closes: > http://hg.mozilla.org/mozilla-central/rev/be4b69a4babf#l2.149 > > Each mSocketTransport->Close(reason) calls both mInput.CloseWithStatus(reason) > and mOutput.CloseWithStatus(reason). It's possible, although maybe not as a direct result of that code, but as a result of us having to destroy three of these objects instead of one.
Assignee | ||
Comment 50•14 years ago
|
||
Comment on attachment 494806 [details] [diff] [review] Patch (v1) Asking bz for the real review, but I'll land it right now.
Attachment #494806 -
Flags: review?(bzbarsky)
Updated•14 years ago
|
Attachment #494806 -
Flags: review+
Assignee | ||
Comment 51•14 years ago
|
||
Comment on attachment 494806 [details] [diff] [review] Patch (v1) Clearing the request from bz as I now have a real review from a peer.
Attachment #494806 -
Flags: review?(bzbarsky)
Comment 52•14 years ago
|
||
fwiw the syn code won't actually give you 3x closes (some of those pointers are mutually exclusive).. the max is 2x and it realistically is less than that because a timer has to expire before we go past 1x.. you might consider if the patch queued in 613977 fixes this.. it's not clear to me that it would or would not. the patch is a follow on to the syn code which includes a problem where the socketservice global limit could be exceeded - I have noticed before that really odd lockups can occur when that happens. that patch is just an hour old and needs a review or two.
Comment 53•14 years ago
|
||
> Asking bz for the real review, but I'll land it right now.
Fwiw, I seem to recall someone asking me about hangs in this exact code a day or two back, and I told him to do just that for callbacks and event sink. Should we do this for callbacks as well in this bug?
Also, does that just fix the hang on shutdown? Or also the getting stuck in offline mode?
Assignee | ||
Comment 54•14 years ago
|
||
http://hg.mozilla.org/mozilla-central/rev/2c24ce211f2b (In reply to comment #52) > fwiw the syn code won't actually give you 3x closes (some of those pointers are > mutually exclusive).. the max is 2x and it realistically is less than that > because a timer has to expire before we go past 1x.. > > you might consider if the patch queued in 613977 fixes this.. it's not clear to > me that it would or would not. the patch is a follow on to the syn code which > includes a problem where the socketservice global limit could be exceeded - I > have noticed before that really odd lockups can occur when that happens. that > patch is just an hour old and needs a review or two. I don't see why it that patch would fix this problem. This is a deadlock situation that we can get into even without your tcp syn patch... (In reply to comment #53) > > Asking bz for the real review, but I'll land it right now. > > Fwiw, I seem to recall someone asking me about hangs in this exact code a day > or two back, and I told him to do just that for callbacks and event sink. > Should we do this for callbacks as well in this bug? I was going to do this for callbacks as well, but it wasn't clear to me if that's needed. Is it? > Also, does that just fix the hang on shutdown? Or also the getting stuck in > offline mode? Both, as they're the result of the same underlying problem.
Status: ASSIGNED → RESOLVED
Closed: 14 years ago → 14 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 4.0b8
Comment 55•14 years ago
|
||
> Is it? I don't know, but it's easy to do, and clearly safer, right? > Both, as they're the result of the same underlying problem. How does the stuck offline problem result from the Release() being under the lock?
Assignee | ||
Comment 56•14 years ago
|
||
(In reply to comment #55) > > Is it? > > I don't know, but it's easy to do, and clearly safer, right? OK, I'll write a patch for that too. > > Both, as they're the result of the same underlying problem. > > How does the stuck offline problem result from the Release() being under the > lock? The original SetOffline call calls mSocketTransportService->Shutdown(), but that never returns, and spins its own event loop, so the second SetOffline call happens while nsIOService::mSettingOffline is true (because the original call hasn't returned yet), so it just returns without doing anything, which means that the offfline status, whatever it was, can't be changed any more. Since the mSocketTransportService->Shutdown() call only happens when going offline, this can only cause you to get stuck in offline mode.
Assignee | ||
Comment 57•14 years ago
|
||
Attachment #494873 -
Flags: review?(bzbarsky)
Assignee | ||
Comment 58•14 years ago
|
||
Comment on attachment 494873 [details] [diff] [review] Part 2: protect mCallbacks as well I'd appreciate if someone can land this for me if I'm not around when this is reviewed either by bz or biesi, as I may not be around tonight.
Attachment #494873 -
Flags: review?(cbiesinger)
Comment 59•14 years ago
|
||
I took the hourly builds for a spin on Mac OS, trying out a few scenarios. Toggling offline/online browsing works fine every time. I tried turning wi-fi off, as well as having the computer go into sleep mode, and I was able to keep browsing and exit the application a couple of times with no hangs and normal exits. But I tried a few more times and at least a couple of times the application hanged on exit (after several minutes of waiting for it to quit), and I also got a seemingly unrelated crash: bp-b1395957-554b-43a7-8652-3d3a32101202 I'm attaching another sample where the application wouldn't exit.
Comment 60•14 years ago
|
||
I got this one too: bp-ad764f47-2ad5-477d-9c05-930792101202
Comment 61•14 years ago
|
||
Comment on attachment 494873 [details] [diff] [review] Part 2: protect mCallbacks as well r=me I don't think this needs additional review from biesi.
Attachment #494873 -
Flags: review?(cbiesinger)
Attachment #494873 -
Flags: review?(bzbarsky)
Attachment #494873 -
Flags: review+
Assignee | ||
Comment 62•14 years ago
|
||
Landed the second part as: http://hg.mozilla.org/mozilla-central/rev/f08d6796f9af
Assignee | ||
Comment 63•14 years ago
|
||
Juan, the crash links you posted don't seem relevant to this bug, but can you attach a sample of the hangs that you observed, please?
Comment 64•14 years ago
|
||
One sample is included in comment #59, but I can post a couple more if needed.
Assignee | ||
Comment 65•14 years ago
|
||
(In reply to comment #64) > One sample is included in comment #59, but I can post a couple more if needed. Hmm, yes, sorry, I somehow missed that. So, from a quick look, it doesn't look very similar to this bug. Do you have STRs for that (I assume not)? If not, can you please file a new bug on it? It seems to me that the PSM thread is failing to exit...
Comment 66•14 years ago
|
||
Ehsan, juan: there are already some bugs about that exact issue. Bug 430260 and bug 468736. In general, they all could be related or duplicates of bug 429304.
Updated•14 years ago
|
Attachment #494873 -
Flags: review+
You need to log in
before you can comment on or make changes to this bug.
Description
•