nsIWifiMonitor causes deadlocks in OS X 10.6.x

RESOLVED FIXED

Status


Core
Widget: Cocoa
7 years ago
7 years ago

People

(Reporter: electronic Max, Assigned: Josh Aas)

Tracking

unspecified
x86
Mac OS X

Firefox Tracking Flags

(blocking2.0 beta1+, blocking1.9.2 needed, status1.9.2 .2-fixed, status1.9.1 .9-fixed)

Details

Attachments

(2 attachments, 1 obsolete attachment)

2.55 KB, patch (smichaud: review+, jduell: review+)
4.44 KB, patch
(Reporter)

Description

7 years ago
User-Agent:       Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.2) Gecko/20100115 Firefox/3.6
Build Identifier: Firefox/3.5.5 and Firefox/3.6

This mysterious problem has been plaguing my plug-in on MacBooks and MacBook Pros ever since the release of Snow Leopard (OS X 10.6.0).

I use nsIWifiMonitor in my extension to allow the plug-in to identify when the user returns home. 

After an extended period of use, if and only if I am using WiFi with my extension enabled, the machine enters a state where any application attempting to open a new network socket deadlocks (beachballs) to a halt.  Force-Quit does not even work in this state (and I have to reboot by holding the power button).  This ONLY happens when Firefox is running and when my plug-in is loaded with Wifi scanning turned on.

The problem feels like a race condition, and thus is not directly (deterministically) reproducible using a sequence of steps.  

BUT, it occurs reliably, usually 3-5 times during a 12 hour period. I have isolated the fault to Firefox and the wifi monitor by carefully testing on completely separate machines (including a brand new MacBook Pro straight from the store, with no additional software installed).

It seems "wrong" that a bug in Firefox could cause the whole machine to freeze.  Thus I think it's an interaction between Firefox and a bug in Snow Leopard that is violating process isolation.  It "feels" like Firefox (or something) is not successfully leaving a critical section of code pertaining to the network (sys call?), so that when other apps try to enter it, they queue up behind it, resulting in a deadlock.

To reproduce:
  Get a MacBook or MacBook Pro running 10.6.0 or newer
  Set up an extension with an nsIWifiMonitor scanning.  
  Use actively for a few hours.  

Please advise.

Reproducible: Sometimes

Steps to Reproduce:
I have detailed the steps t
Sounds like something is leaking processes or sockets or some other such kernel resource...

The relevant 10.6 code seems to be http://mxr.mozilla.org/mozilla-central/source/netwerk/wifi/src/osx_corewlan.mm#60 for what it's worth.  My objc is not good enough to see if there's an obvious issue there.
Status: UNCONFIRMED → NEW
Component: General → Widget: Cocoa
Ever confirmed: true
QA Contact: general → cocoa
(Assignee)

Comment 2

7 years ago
That code leaks just about everything, and apparently I reviewed it at one point. I can only assume I totally forgot to look at the file. Ew.
Assignee: nobody → joshmoz
Yes, it creates an autorelease pool without releasing it :-)
(Assignee)

Comment 4

7 years ago
Created attachment 429156 [details] [diff] [review]
fix v1.0
(Assignee)

Comment 5

7 years ago
We don't release the autorelease pool or the bundle.
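
For readers unfamiliar with the failure mode, here is a rough illustration in portable C++. This is not the Objective-C++ code from osx_corewlan.mm; `FakePool`, `live_pools`, `leaky_scan`, `fixed_scan`, and `leaked_after` are hypothetical names invented for this sketch. It assumes only what is stated above: each scan creates a pool-like object and returns without releasing it, so one object is left behind per scan.

```cpp
#include <cassert>

// Hypothetical stand-in for an autorelease pool; counts live instances.
static int live_pools = 0;

struct FakePool {
    FakePool() { ++live_pools; }
    ~FakePool() { --live_pools; }
};

// Mirrors the reported pattern: the pool is created but never released.
void leaky_scan() {
    FakePool* pool = new FakePool();
    (void)pool;  // scan work would happen here
    // missing: delete pool;
}

// The basic fix: release what was created before returning.
void fixed_scan() {
    FakePool* pool = new FakePool();
    // scan work would happen here
    delete pool;
}

// Runs `scan` the given number of times and reports how many pools it left alive.
int leaked_after(int scans, void (*scan)()) {
    int before = live_pools;
    for (int i = 0; i < scans; ++i) scan();
    return live_pools - before;
}
```

Calling `leaked_after(100, leaky_scan)` reports 100 objects left alive, while `leaked_after(100, fixed_scan)` reports 0. With periodic scanning over many hours, the leaky version grows without bound.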
(Assignee)

Updated

7 years ago
Attachment #429156 - Flags: review?(smichaud)
(Assignee)

Comment 6

7 years ago
I suggest we block on 1.9.1, 1.9.2, and 1.9.3. This is bad.
blocking1.9.1: --- → ?
blocking1.9.2: --- → ?
blocking2.0: --- → beta1
Attachment #429156 - Flags: review?(smichaud) → review+
(Assignee)

Comment 7

7 years ago
Created attachment 429171 [details] [diff] [review]
fix v1.1

This is a more paranoid patch which should release the pool even if the main corewlan code throws an exception.
Attachment #429171 - Flags: review?(smichaud)
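
The idea behind the more paranoid version can be sketched in portable C++. This is an analogy only, not the contents of the attached patch, which is Objective-C++ and is not reproduced here; `ScopedPool`, `live_scoped_pools`, `do_scan`, and `guarded_scan` are hypothetical names. The sketch assumes the scan body may throw and shows one common way to guarantee the release anyway: tie the pool's lifetime to a scope so that its cleanup runs on both the normal and the exceptional path.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-in for an autorelease pool; counts live instances.
static int live_scoped_pools = 0;

struct ScopedPool {
    ScopedPool() { ++live_scoped_pools; }
    ~ScopedPool() { --live_scoped_pools; }  // runs on return and during unwinding
    ScopedPool(const ScopedPool&) = delete;
    ScopedPool& operator=(const ScopedPool&) = delete;
};

// Hypothetical scan body that may fail by throwing.
void do_scan(bool fail) {
    if (fail) throw std::runtime_error("scan failed");
}

// Returns true on success, false if the scan threw.
// Either way the pool is destroyed before this function returns.
bool guarded_scan(bool fail) {
    try {
        ScopedPool pool;
        do_scan(fail);
        return true;
    } catch (const std::runtime_error&) {
        return false;
    }
}
```

After either `guarded_scan(false)` or `guarded_scan(true)`, `live_scoped_pools` is back to 0, so a failing scan no longer leaves a pool behind.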
Comment on attachment 429171 [details] [diff] [review]
fix v1.1

Yes, this is better.
Attachment #429171 - Flags: review?(smichaud) → review+
(Assignee)

Updated

7 years ago
Attachment #429156 - Attachment is obsolete: true
(Assignee)

Comment 9

7 years ago
pushed to mozilla-central

http://hg.mozilla.org/mozilla-central/rev/1e30b2e41326
(Assignee)

Comment 10

7 years ago
Reporter - thanks for the great bug report. I'm really glad we caught this.

Can you confirm that my patch here fixes the problem? The fix will be in tomorrow's trunk (Minefield) nightly build. Leaving this bug open until the reporter confirms the fix.
(Reporter)

Comment 11

7 years ago
Wow, thanks for the speedy response and fix.  I will test it tomorrow (Saturday my time) and get back to you all.  Testing will probably take 24-48 hours since it normally takes a couple hours to reproduce.

Anyway, thanks everyone. I will be in touch shortly ~
(Assignee)

Updated

7 years ago
Attachment #429171 - Flags: review?(jduell.mcbugs)
Not going to "block" on it, but do want the patch once you've proved it fixes the problem. Ask for branch approval when you're ready.
blocking1.9.1: ? → ---
blocking1.9.2: ? → ---
status1.9.1: --- → wanted
status1.9.2: --- → wanted
Comment on attachment 429171 [details] [diff] [review]
fix v1.1

Necko review: no network logic is changed by this patch, just Objective-C memory management, so r+
Attachment #429171 - Flags: review?(jduell.mcbugs) → review+
(Reporter)

Comment 14

7 years ago
I need a bit of clarification on which build(s) have this patch (sorry! I'm not an expert on your nightly-build process).

Minefield (3.7a2) for OS X seems to work without crashing.  So if this is the version that has the patches described in this thread, then it looks like it fixed it! 

Nightly builds of the 3.6.2 (Namoroka for OS X) candidates still seem to exhibit the crashing bug (the last one I downloaded was from 4am March 2), and it has locked up my machine twice.  Does this tree not have the fix yet?

Please lmk.
The 3.7a2 builds are the ones that have this patch, yes.  Resolving fixed based on comment 14.  Thanks for testing those builds!

The 3.6.x builds don't have this fix yet; the fix needs to be approved for that branch first.  Josh, do you want to ask for approvals?
Status: NEW → RESOLVED
Last Resolved: 7 years ago
Resolution: --- → FIXED
(Assignee)

Updated

7 years ago
Attachment #429171 - Flags: approval1.9.2.2?
(Assignee)

Comment 16

7 years ago
Reporter - can you post a stack trace or a crash report ID for a crash from this bug? I'd like to see what it looks like so I can see if it shows up in our crash reports database.
(Reporter)

Comment 17

7 years ago
Hi Josh,

It is rare that it results in a Firefox crash; it usually results in a "beach-ball" (colourwheel) hang of the whole system requiring me to hard-restart (it does not even respond to force-quits!)

Do you know how I might get a stack trace in that condition? Is there a kernel-hotkey for example? (Would it even be relevant if I could get one?)

Thanks
(Assignee)

Comment 18

7 years ago
Yikes. Don't worry about the stack, thanks.
blocking1.9.2: --- → ?
Attachment #429171 - Flags: approval1.9.2.2? → approval1.9.2.2+
a=beltzner
blocking1.9.2: ? → needed
(In reply to comment #6)
> I suggest we block on 1.9.1, 1.9.2, and 1.9.3. This is bad.

Did you mean to request approval1.9.1.9? on attachment 429171 [details] [diff] [review] as well, do you need a different 1.9.1 patch, or did this turn out to be less bad than you thought on that branch?

There are roughly 3x more 3.5.x users than 3.6.x users at the moment (of course that will change once we start serving the upgrade prompt, but 1.9.1 will still have a lot of users). If "this is bad", don't we want it fixed?
(Assignee)

Comment 21

7 years ago
pushed to mozilla-1.9.2

http://hg.mozilla.org/releases/mozilla-1.9.2
status1.9.2: wanted → .2-fixed
(Assignee)

Comment 22

7 years ago
I do want this fixed in 1.9.1 but we need a new patch there.
(Assignee)

Comment 23

7 years ago
Created attachment 431291 [details] [diff] [review]
1.9.1 branch fix

Fix for 1.9.1 branch. Synced to mozilla-central version.
Attachment #431291 - Flags: approval1.9.1.9?
Comment on attachment 431291 [details] [diff] [review]
1.9.1 branch fix

a1.9.1.9=beltzner, please land immediately
Attachment #431291 - Flags: approval1.9.1.9? → approval1.9.1.9+
(Assignee)

Comment 25

7 years ago
pushed to mozilla-1.9.1:

http://hg.mozilla.org/releases/mozilla-1.9.1/rev/bccec907649a

and here is the correct link to the 1.9.2 commit:

http://hg.mozilla.org/releases/mozilla-1.9.2/rev/d4bc405a33b9
status1.9.1: wanted → .9-fixed
(In reply to comment #14)
> I need a bit of clarification on which build(s) have this patch (sorry! I'm not
> an expert on your nightly-build process).
> 
> Minefield (3.7a2) for OS X seems to work without crashing.  So if this is the
> version that has the patches described in this thread, then it looks like it
> fixed it! 
> 
> Nightly releases of 3.6.2 (Namoroka for OS X) candidates seem to still exhibit
> the crashing bug (the last one I downloaded was from 4am March 2) and it's
> locked up my machine twice.  Does this tree not have the fix yet?
> 
> Please lmk.

Can you try reproducing the problem with our release candidates for Firefox 3.5.9 and 3.6.2? 

You can get the 3.6.2 beta at ftp://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/3.6.2-candidates/build3/mac/en-US/.

You can get the 3.5.9 beta at ftp://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/3.5.9-candidates/build1/mac/en-US/.
(Reporter)

Comment 27

7 years ago
Ok! I will test it using the above betas myself and will also get my Mac-equipped students at MIT to run the betas on their machines.  Since reproducing the bug takes time, I will get back to you in 48 or so hours.