Closed Bug 1311251 Opened 3 years ago Closed 3 years ago

Crash in PR_Connect | mozilla::net::nsUDPSocket::Connect

Categories

(Core :: Networking, defect, critical)

Unspecified
Windows 10
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla53
Tracking Status
firefox50 --- unaffected
firefox51 --- fixed
firefox52 --- fixed
firefox53 --- fixed

People

(Reporter: kanru, Assigned: schien)

Details

(Keywords: crash, regression, Whiteboard: [necko-active])

Crash Data

Attachments

(1 file, 1 obsolete file)

This bug was filed from the Socorro interface and is 
report bp-c903d537-e670-4ed9-925e-d5d042161018.
=============================================================

17 crashes on nightly since build 20161014060324
but total 84 crashes since 20160702030219

Correlation on aurora shows:

(100.0% in signature vs 12.02% overall) address = 0x0
(75.00% in signature vs 09.02% overall) Addon "Video DownloadHelper" = true
(62.50% in signature vs 08.78% overall) useragent_locale = fr

https://mozilla.github.io/stab-crashes/correlations.html?product=Firefox&channel=aurora&signature=PR_Connect%20|%20mozilla::net::nsUDPSocket::Connect
SC, when you find time, could you take a look at this?  I know you have some experience with our UDP code.

Thanks.
Flags: needinfo?(schien)
Whiteboard: [necko-next]
This seems to be issue of null pointer mFD when calling PR_Connect().
(In reply to Shian-Yow Wu [:swu] from comment #2)
> This seems to be issue of null pointer mFD when calling PR_Connect().

Thanks for the information! It's still not obvious to me why nsUDPSocket::Connect is called if previous socket binding is failed. However we should always check the validity of mFD before using it.
Flags: needinfo?(schien)
Whiteboard: [necko-next] → [necko-active]
Assignee: nobody → schien
Comment on attachment 8815154 [details]
Bug 1311251 - null check for mFD before use it.

Before I can r this patch I need to ask: on which thread can nsUDPSocket::CloseSocket() be called?  

the author didn't bother to add a thread assertion to that method, so hard to say and maybe it's intended to be called on more than a single thread.  If it can be called on the main thread, which probably can (as it's called from an IDL Close() method), then we have a bigger problem and this patch is not enough.
Note that nsUDPSocket::Close calls CloseSocket() under a lock, but it seems like it just protects the listener and not mFD.  This needs some larger cleanup.
Comment on attachment 8815154 [details]
Bug 1311251 - null check for mFD before use it.

Sure, let me check if any other assertion/error handling need to add as well.
Attachment #8815154 - Flags: review?(honzab.moz)
(In reply to Honza Bambas (:mayhemer) from comment #5)
> Comment on attachment 8815154 [details]
> Bug 1311251 - null check for mFD before use it.
> 
> Before I can r this patch I need to ask: on which thread can
> nsUDPSocket::CloseSocket() be called?  
> 
> the author didn't bother to add a thread assertion to that method, so hard
> to say and maybe it's intended to be called on more than a single thread. 
> If it can be called on the main thread, which probably can (as it's called
> from an IDL Close() method), then we have a bigger problem and this patch is
> not enough.

The thread model is complicated because nsIUDPSocket allows to be used on either main thread or PBackground thread. So, we can separate functions in nsUDPSocket into three groups:
 1) runs on main thread or PBackground thread, i.e. IDL methods/attributes
 2) runs on STS thread, i.e. socket control / IO
 3) don't care about thread, i.e. utility functions
Any updates on this? We're running out of time to get a fix landed on 51.
Flags: needinfo?(schien)
(In reply to Ryan VanderMeulen [:RyanVM] from comment #9)
> Any updates on this? We're running out of time to get a fix landed on 51.

Didn't notice we need this bug fixed on 51 in the beginning. To provide a minimal patch, I'm thinking about just making nsUDPSocket::Close thread-safe (which addressed the concern in comment #5).
Flags: needinfo?(schien)
Comment on attachment 8822644 [details]
make sure PR_Close is on STS thread

@Honza, is making nsUDPSocket::Close thread-safe addressing your previous concern in comment #5?
Attachment #8815154 - Flags: review?(honzab.moz)
Attachment #8822644 - Flags: review?(honzab.moz)
Keywords: regression
I understand this is an urgent bug, but the fix is still not complete.  I believe we should use the same mechanism as we have in nsSocketTransport - a class that wraps FD with a refcounter and let the code accesses the mFD member only under a lock, while doing any IO on it outside the lock.  Close of the FD is then done when the ref count drops to zero.

It's more work.  But I'm afraid the two patches here (and mainly the CloseSocket one) will just shift the problem, not fix it.

Or, would just the FD non-null check in nsUDPSocket::Connect be enough to quick-fix the obvious crash here?  That might be OK to uplift (to stop this being a top-crash, at least) and solve the rest in a followup.
Flags: needinfo?(schien)
Comment on attachment 8815154 [details]
Bug 1311251 - null check for mFD before use it.

https://reviewboard.mozilla.org/r/96184/#review104262

Let's uplift this up to beta.  It should at least fix the immediate crash here.
Attachment #8815154 - Flags: review?(honzab.moz) → review+
Comment on attachment 8822644 [details]
make sure PR_Close is on STS thread

https://reviewboard.mozilla.org/r/101518/#review104264

No, let's rather do this properly and let it ride the trains.  See comment 15.
Attachment #8822644 - Flags: review?(honzab.moz)
Attachment #8822644 - Attachment is obsolete: true
Thanks for the advise. I filed Bug 1330180 as a follow-up to comment #15.
Flags: needinfo?(schien)
Pushed by schien@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/b29345665e1d
null check for mFD before use it. r=mayhemer
https://hg.mozilla.org/mozilla-central/rev/b29345665e1d
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla53
Please request Aurora/Beta approval on this when you get a chance.
Flags: needinfo?(schien)
Comment on attachment 8815154 [details]
Bug 1311251 - null check for mFD before use it.

Approval Request Comment
[Feature/Bug causing the regression]: bug 1194259
[User impact if declined]: user might hit top 1 firefox crash
[Is this code covered by automated tests?]: no
[Has the fix been verified in Nightly?]: no
[Needs manual test from QE? If yes, steps to reproduce]: n/a
[List of other uplifts needed for the feature/fix]: n/a
[Is the change risky?]: no
[Why is the change risky/not risky?]: simply do null check to prevent crash
[String changes made/needed]: n/a
Flags: needinfo?(schien)
Attachment #8815154 - Flags: approval-mozilla-beta?
Attachment #8815154 - Flags: approval-mozilla-aurora?
Comment on attachment 8815154 [details]
Bug 1311251 - null check for mFD before use it.

Fix a crash. Beta51+ & Aurora52+. Should be in 51 beta 14.
Attachment #8815154 - Flags: approval-mozilla-beta?
Attachment #8815154 - Flags: approval-mozilla-beta+
Attachment #8815154 - Flags: approval-mozilla-aurora?
Attachment #8815154 - Flags: approval-mozilla-aurora+
You need to log in before you can comment on or make changes to this bug.