Closed Bug 548796 Opened 11 years ago Closed 11 years ago

nsIWifiMonitor causes deadlocks in OS X 10.6.x

Categories

(Core :: Widget: Cocoa, defect)

x86
macOS
defect
Not set
normal

Tracking


RESOLVED FIXED
Tracking Status
blocking2.0 --- beta1+
blocking1.9.2 --- needed
status1.9.2 --- .2-fixed
status1.9.1 --- .9-fixed

People

(Reporter: electronic, Assigned: jaas)

Details

Attachments

(2 files, 1 obsolete file)

User-Agent:       Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.2) Gecko/20100115 Firefox/3.6
Build Identifier: Firefox/3.5.5 and Firefox/3.6

This has been a mysterious problem that has been plaguing my plug-in on MacBooks and MacBook Pros ever since the release of Snow Leopard (OS X 10.6.0).

I use nsIWifiMonitor in my extension to allow the plug-in to identify when the user returns home. 

After an extended period of use, if and only if I am using WiFi with my extension enabled, the machine enters a state where any application attempting to open a new network socket deadlocks (beachballs). Force-Quit does not even work in this state, and I have to reboot by holding down the power button. This ONLY happens when Firefox is running with my plug-in loaded and WiFi scanning turned on.

The problem feels like a race condition, so it is not deterministically reproducible from a fixed sequence of steps.

BUT it occurs reliably, usually 3-5 times during a 12-hour period. I have isolated the fault to Firefox and the WiFi monitor by carefully testing on completely separate machines (including a brand-new MacBook Pro from the store with no preinstalled software).

It seems "wrong" that a bug in Firefox could cause the whole machine to freeze, so I think it's an interaction between Firefox and a bug in Snow Leopard that is violating process isolation. It "feels" like Firefox (or something) is not successfully leaving a critical section of network-related code (a syscall?), so when other apps try to enter it they queue up, resulting in a deadlock.

To reproduce:
  Get a MacBook or MacBook Pro running 10.6.0 or newer.
  Set up an extension with nsIWifiMonitor scanning.
  Use it actively for a few hours.

Please advise.

Reproducible: Sometimes

Steps to Reproduce:
I have detailed the steps t
Sounds like something is leaking processes or sockets or some other such kernel resource...

The relevant 10.6 code seems to be http://mxr.mozilla.org/mozilla-central/source/netwerk/wifi/src/osx_corewlan.mm#60, for what it's worth. My Objective-C is not good enough to see whether there's an obvious issue there.
Status: UNCONFIRMED → NEW
Component: General → Widget: Cocoa
Ever confirmed: true
QA Contact: general → cocoa
That code leaks just about everything, and apparently I reviewed it at one point. I can only assume I totally forgot to look at the file. Ew.
Assignee: nobody → joshmoz
Yes, it creates an autorelease pool without releasing it :-)
Attached patch fix v1.0 (obsolete)
We don't release the autorelease pool or the bundle.
Attachment #429156 - Flags: review?(smichaud)
I suggest we block on 1.9.1, 1.9.2, and 1.9.3. This is bad.
blocking1.9.1: --- → ?
blocking1.9.2: --- → ?
blocking2.0: --- → beta1
Attachment #429156 - Flags: review?(smichaud) → review+
Attached patch fix v1.1
This is a more paranoid patch, which should release the pool even if the main CoreWLAN code throws an exception.
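The difference between the original code, fix v1.0 and fix v1.1 can be modeled in a few lines. The following is a hypothetical Python sketch, not the actual patch (which is Objective-C++ and may use a different mechanism). `FakePool` stands in for the autorelease pool, and `scan` stands in for the CoreWLAN scanning call. Python's `try`/`finally` plays the role of the exception-safe cleanup that fix v1.1 is described as adding.

```python
class FakePool:
    """Hypothetical stand-in for an Objective-C autorelease pool.

    Python's garbage collector never calls release() here, so a pool
    that is not released explicitly stays counted as live, mimicking a
    leak under manual reference counting.
    """

    live = 0  # pools created but not yet released

    def __init__(self):
        FakePool.live += 1
        self.released = False

    def release(self):
        if not self.released:
            self.released = True
            FakePool.live -= 1


def scan_leaky(scan):
    """Like the original code: the pool is created but never released."""
    FakePool()  # leaked: nothing ever releases it
    return scan()


def scan_v1_0(scan):
    """Like fix v1.0 (as modeled here): release only on normal return."""
    pool = FakePool()
    result = scan()
    pool.release()  # skipped entirely if scan() raises
    return result


def scan_v1_1(scan):
    """Like fix v1.1 (as modeled here): release even if scan() raises."""
    pool = FakePool()
    try:
        return scan()
    finally:
        pool.release()
```

A monitor that calls `scan_leaky` once per polling interval grows `FakePool.live` without bound. `scan_v1_0` leaks only when a scan raises. `scan_v1_1` leaves `FakePool.live` at zero either way.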
Attachment #429171 - Flags: review?(smichaud)
Comment on attachment 429171
fix v1.1

Yes, this is better.
Attachment #429171 - Flags: review?(smichaud) → review+
Attachment #429156 - Attachment is obsolete: true
pushed to mozilla-central

http://hg.mozilla.org/mozilla-central/rev/1e30b2e41326
Reporter - thanks for the great bug report. I'm really glad we caught this.

Can you confirm that my patch here fixes the problem? The fix will be in tomorrow's trunk (Minefield) nightly build. Leaving this bug open until the reporter confirms the fix.
Wow, thanks for the speedy response and fix.  I will test it tomorrow (Saturday my time) and get back to you all.  Testing will probably take 24-48 hours since it normally takes a couple hours to reproduce.

Anyway, thanks everyone; I will be in touch shortly ~
Attachment #429171 - Flags: review?(jduell.mcbugs)
Not going to "block" on it, but do want the patch once you've proved it fixes the problem. Ask for branch approval when you're ready.
blocking1.9.1: ? → ---
blocking1.9.2: ? → ---
Comment on attachment 429171
fix v1.1

Necko review: there's no network logic changed by this patch, just Objective-C memory management, so r+
Attachment #429171 - Flags: review?(jduell.mcbugs) → review+
I need a bit of clarification on which build(s) have this patch (sorry! I'm not an expert on your nightly-build process).

Minefield (3.7a2) for OS X seems to work without crashing.  So if this is the version that has the patches described in this thread, then it looks like it fixed it! 

Nightly releases of 3.6.2 (Namoroka for OS X) candidates still seem to exhibit the crashing bug (the last one I downloaded was from 4am March 2), and it has locked up my machine twice. Does this tree not have the fix yet?

Please lmk.
The 3.7a2 builds are the ones that have this patch, yes.  Resolving fixed based on comment 14.  Thanks for testing those builds!

The 3.6.x builds don't have this fix yet; the fix needs to be approved for that branch first.  Josh, do you want to ask for approvals?
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Attachment #429171 - Flags: approval1.9.2.2?
Reporter - can you post a stack trace or a crash report ID for a crash from this bug? I'd like to see what it looks like so I can see if it shows up in our crash reports database.
Hi Josh,

It is rare that it results in a Firefox crash; it usually results in a "beach-ball" (colourwheel) hang of the whole system requiring me to hard-restart (it does not even respond to force-quits!)

Do you know how I might get a stack trace in that condition? Is there a kernel-hotkey for example? (Would it even be relevant if I could get one?)

Thanks
Yikes. Don't worry about the stack, thanks.
blocking1.9.2: --- → ?
Attachment #429171 - Flags: approval1.9.2.2? → approval1.9.2.2+
a=beltzner
blocking1.9.2: ? → needed
(In reply to comment #6)
> I suggest we block on 1.9.1, 1.9.2, and 1.9.3. This is bad.

Did you mean to request approval1.9.1.9? on attachment 429171 as well, do you need a different 1.9.1 patch, or did this turn out to be less bad than you thought on that branch?

There are roughly 3x as many 3.5.x users as 3.6.x users at the moment (of course that will change once we start serving the upgrade prompt, but 1.9.1 will still have a lot of users). If "this is bad", don't we want it fixed?
pushed to mozilla-1.9.2

http://hg.mozilla.org/releases/mozilla-1.9.2
I do want this fixed in 1.9.1 but we need a new patch there.
Attached patch 1.9.1 branch fix
Fix for 1.9.1 branch. Synced to mozilla-central version.
Attachment #431291 - Flags: approval1.9.1.9?
Comment on attachment 431291
1.9.1 branch fix

a1.9.1.9=beltzner, please land immediately
Attachment #431291 - Flags: approval1.9.1.9? → approval1.9.1.9+
pushed to mozilla-1.9.1:

http://hg.mozilla.org/releases/mozilla-1.9.1/rev/bccec907649a

and here is the correct link to the 1.9.2 commit:

http://hg.mozilla.org/releases/mozilla-1.9.2/rev/d4bc405a33b9
(In reply to comment #14)
> I need a bit of clarification on which build(s) have this patch (sorry! I'm not
> an expert on your nightly-build process).
> 
> Minefield (3.7a2) for OS X seems to work without crashing.  So if this is the
> version that has the patches described in this thread, then it looks like it
> fixed it! 
> 
> Nightly releases of 3.62 (Namoroka for OS X) candidates seem to still exhibit
> the crashing bug (the last one I downloaded was from 4am March 2) and it's
> locked up my machine twice.  Does this tree not have the fix yet?
> 
> Please lmk.

Can you try reproducing the problem with our release candidates for Firefox 3.5.9 and 3.6.2? 

You can get the 3.6.2 beta at ftp://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/3.6.2-candidates/build3/mac/en-US/.

You can get the 3.5.9 beta at ftp://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/3.5.9-candidates/build1/mac/en-US/.
Ok! I will test it using the above betas myself and will also get my Mac-equipped students at MIT to run the betas on their machines.  Since reproducing the bug takes time, I will get back to you in 48 or so hours.