Closed Bug 355399 Opened 18 years ago Closed 18 years ago

unclean shutdown while safe browsing initializes (makes zombies)

Categories

(Toolkit :: Safe Browsing, defect)

2.0 Branch
defect
Not set
normal

Tracking

()

VERIFIED FIXED

People

(Reporter: tracy, Assigned: tony)

References

Details

(Keywords: regression, verified1.8.1.1)

Attachments

(2 files)

Seen on Windows, Mac and Linux 2.0rc2.2 (and older 2.0 builds)

-remove(rename) the firefox profile folder
-launch Firefox
-don't migrate a profile
-after launch, there should be the two default tabs for a first run profile. click the [X] window close button or File | Quit (Exit)
note: there may be timing issue involved in that step
-In the confirm close multiple tabs dialog, Leave the "Warn me..." checkbox checked, then click "Close tabs"

tested results:
window closes but the app is still running and present in processes.

expected results:
complete application shutdown.

note: unchecking the check box in the confirm close multiple tabs dialog the close tabs doesn't exhibit this bug.
This seems to be a regression in beta2 from beta1 when the additional first run tab was added. I haven't been able to reproduce with any other pair of tabs.  and changing the the content in one of the original two, say to the home page, makes the bug go away as well.
I've noticed that if you wait too long that this isn't reproducible.  On Windows at least, I can hear my hard drive still working on the initial launch. If I attempt to close with the given steps while the machine is still working, that is when the process gets left running.  If I wait until the hard drive is no longer writing, the bug isn't reproducible.
from ispiked's suggestion, I've found that creating a profile, then editing prefs.js so that user_pref("browser.safebrowsing.enabled", false); then trying to reproduce make this bug go away (or impossible to reproduce).
Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.8.1) Gecko/20061004 BonEcho/2.0 ID:2006100403

I see this problem too, where firefox is closed but left running in the process list. I use different reproduction steps tho; most noticeably the value of the "Warn me when closing multiple tabs" checkbox is irrelevant.

1. New profile
2. Start firefox with the newly created profile
3. (Optional) Tools > Options > Tabs > Untick 'Warn me when closing tabs'
4. Bookmarks > Mozilla Firefox > Open In New Tabs
5. Quickly hit alt-f4 to close the browser (as the tabs are loading, and before they have finished loading)
6. If you didn't do step 3, immediately hit enter (or click 'Close Tabs') at the 'You are about to close 4 tabs. Are you sure you want to continue?' prompt.

The result is a firefox.exe process left in the process list. If this does not work, then start again at step 2, but before step 4 choose Tools > Clear Private Data > Clear Private Data Now (the bug seems more likely to happen if you close the browser whilst tabs are loading; by clearing the cache you give yourself a bit more time for the tabs to load because they are not cached).

When doing what ispiked suggested, adding:
  user_pref("browser.safebrowsing.enabled", false);
to prefs.js, I am unable to reproduce the bug.
Component: Tabbed Browser → Phishing Protection
QA Contact: tabbed.browser → phishing.protection
Hmm, sounds like the background safe browsing process isn't dying. It might be easier to repro on a slow connection.  I'll look into it.
Repro-ed using this build: http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2.0rc2-candidates/rc2/firefox-2.0.en-US.linux-i686.tar.gz

gdb on the stuck process gives me:
#0  0x401b10d4 in __pthread_sigsuspend () from /lib/i686/libpthread.so.0
#1  0x401b0708 in __pthread_wait_for_restart_signal () from /lib/i686/libpthread.so.0
#2  0x401acfab in pthread_cond_wait@GLIBC_2.0 () from /lib/i686/libpthread.so.0
#3  0x40181437 in PR_WaitCondVar () from ./libnspr4.so
#4  0x4018168e in PR_Wait () from ./libnspr4.so
#5  0x086c6770 in nsXPTCVariant::Init ()
#6  0x086c6290 in nsXPTCVariant::Init ()
#7  0x4014239b in XPTC_InvokeByIndex () from ./libxpcom_core.so
#8  0x080a6954 in nsTHashtable<nsBaseHashtableET<nsDepCharHashKey, nsAutoPtr<nsINIParser::INIValue> > >::~nsTHashtable ()
#9  0x080ac52e in nsTHashtable<nsBaseHashtableET<nsDepCharHashKey, nsAutoPtr<nsINIParser::INIValue> > >::~nsTHashtable ()
#10 0x40054e20 in js_Invoke () from ./libmozjs.so
#11 0x40059733 in js_Interpret () from ./libmozjs.so
#12 0x40054edb in js_Invoke () from ./libmozjs.so
#13 0x40050623 in js_fun_toString () from ./libmozjs.so
#14 0x40054e20 in js_Invoke () from ./libmozjs.so
#15 0x40059733 in js_Interpret () from ./libmozjs.so
#16 0x40054edb in js_Invoke () from ./libmozjs.so
#17 0x080a225a in nsTHashtable<nsBaseHashtableET<nsDepCharHashKey, nsAutoPtr<nsINIParser::INIValue> > >::~nsTHashtable ()
#18 0x080a0188 in nsTHashtable<nsBaseHashtableET<nsDepCharHashKey, nsAutoPtr<nsINIParser::INIValue> > >::~nsTHashtable ()
#19 0x401424c0 in XPTC_InvokeByIndex () from ./libxpcom_core.so
#20 0x086c69aa in nsXPTCVariant::Init ()
#21 0x4014239b in XPTC_InvokeByIndex () from ./libxpcom_core.so
#22 0x40132bb0 in nsProxyObject::Post () from ./libxpcom_core.so
#23 0x4012d11f in PL_HandleEvent () from ./libxpcom_core.so
#24 0x4012d072 in PL_ProcessPendingEvents () from ./libxpcom_core.so
#25 0x4012e69f in nsEventQueueImpl::CheckForDeactivation () from ./libxpcom_core.so
#26 0x400f66aa in NS_ShutdownXPCOM_P () from ./libxpcom_core.so


Still trying to repro with a debug build so I can see what's happening.
updating summary to clarify exact issue.
Summary: unclean shutdown with first run pages close multiple tabs dialog → unclean shutdown while safe browsing initializes
Oh, so the theory is that the xpcom-shutdown observer in alarm.js is the culprit, but I haven't been able to get a working test case to prove it.  The Google Browser Sync team had a similar problem and solved it by removing the observer which is unnecessary (nsITimers are automatically cleaned up?)  I'll try to post a patch for this on trunk this week.
Status: NEW → ASSIGNED
Ok, I can reproduce by setting setting the pref:
user_pref("browser.safebrowsing.provider.0.updateURL", "http://ponderer.org/tests/slow.py?");

and clearing all the urlclassifier.tableversion.* prefs.  The above url just simulates a slow download.

What happens is xpcom-shutdown closes the db service background thread.  Then the table download is canceled, which makes a call into the db service to notify it that a download failed.  At this point, the db service doesn't realize the background thread was shutdown and sits waiting for it forever.

The fix is to not try to wait on the background thread after xpcom-shutdown.
Assignee: nobody → tony
Attachment #243026 - Flags: review?(darin)
Also, this bug doesn't exist on trunk because the background thread check is different.
Comment on attachment 243026 [details] [diff] [review]
don't wait for thread after it's shutdown

Looks good, r=darin
Attachment #243026 - Flags: review?(darin) → review+
Flags: blocking1.8.1.1?
*** Bug 358249 has been marked as a duplicate of this bug. ***
Attachment #243026 - Flags: approval1.8.1.1?
*** Bug 359664 has been marked as a duplicate of this bug. ***
Blocks: ghostproc
Alias: zonbie
Alias: zonbie
Flags: blocking1.8.1.1? → blocking1.8.1.1+
Summary: unclean shutdown while safe browsing initializes → unclean shutdown while safe browsing initializes (makes zombies)
Darin/Tony:  Patch looks simple enough, but without any baking on Trunk (where this is not needed), I am hesitant to get this into 1.8.1.1 this late in the game.  How risky is this patch and how thoroughly has this been tested?  Please advise.  

If this can wait until 1.8.1.2, we should get this in after 1.8.1.1 (2.0.0.1) is out. 
(In reply to comment #14)
> Darin/Tony:  Patch looks simple enough, but without any baking on Trunk (where
> this is not needed), I am hesitant to get this into 1.8.1.1 this late in the
> game.  How risky is this patch and how thoroughly has this been tested?  Please
> advise.  

gKeepRunning is only reference at one other point in the code during shutdown.  So it seem that worst case scenario, it'll do something bad during shutdown, but certainly no worse than a hang.  I think it's pretty low risk.  I've tested with the example in comment #9.

This patch has been waiting for approval for about 3 weeks.  What should I have done to get it on branch sooner for more testing?
Comment on attachment 243026 [details] [diff] [review]
don't wait for thread after it's shutdown

a=mconnor on behalf of drivers for 1.8.1.1 checkin
Attachment #243026 - Flags: approval1.8.1.1? → approval1.8.1.1+
on branch
Status: ASSIGNED → RESOLVED
Closed: 18 years ago
Keywords: fixed1.8.1
Resolution: --- → FIXED
Keywords: fixed1.8.1fixed1.8.1.1
I can reproduce this bug with
Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.8.1.1pre) Gecko/20061124 BonEcho/2.0.0.1pre ID:2006112405 (before patch landed)

I cannot now reproduce this bug with
Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.8.1.1pre) Gecko/20061124 BonEcho/2.0.0.1pre ID:2006112411 (post-patch landing)

So looks good to me!
Unable to reproduce:

Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en-US; rv:1.8.1.1pre) Gecko/20061130 BonEcho/2.0.0.1pre
Status: RESOLVED → VERIFIED
Keywords: fixed1.8.1.1
*** Bug 360534 has been marked as a duplicate of this bug. ***
Product: Firefox → Toolkit
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: