Closed Bug 846137 Opened 7 years ago Closed 7 years ago

intermittent test_secureAll.html, test_unsecureIframeRedirect.html, test_stricttransportsecurity.html, test_bug215405.xul | Assertion count 1 is greater than expected range 0-0 assertions. ASSERTION: wrong thread: 'PR_GetCurrentThread() == gSocketThread

Categories

(Core :: Networking, defect, critical)

defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla23
Tracking Status
firefox22 --- fixed
firefox23 --- fixed

People

(Reporter: dbaron, Assigned: jesup)

References

Details

(Keywords: intermittent-failure)

Attachments

(1 file, 1 obsolete file)

I believe this is a low-enough frequency intermittent assertion that it should be filed and starred rather than annotated as known to assert.

https://tbpl.mozilla.org/php/getParsedLog.php?id=20166809&tree=Mozilla-Inbound
Rev4 MacOSX Lion 10.7 mozilla-inbound debug test mochitest-5 on 2013-02-27 18:46:17 PST for push 2186eacb635c
slave: talos-r4-lion-084

18:57:42 INFO - 163435 ERROR TEST-UNEXPECTED-FAIL | /tests/security/ssl/mixedcontent/test_unsecureIframeRedirect.html | Assertion count 1 is greater than expected range 0-0 assertions.

18:57:40     INFO -  ^G###!!! ASSERTION: wrong thread: 'PR_GetCurrentThread() == gSocketThread', file ../../../../netwerk/base/src/NetworkActivityMonitor.cpp, line 273
18:57:40     INFO -  ssl_DefSend [security/nss/lib/ssl/ssldef.c:95]
18:57:40     INFO -  ssl3_SendRecord [security/nss/lib/ssl/ssl3con.c:2556]
18:57:40     INFO -  SSL3_SendAlert [security/nss/lib/ssl/ssl3con.c:2847]
18:57:40     INFO -  ssl_SecureClose [security/nss/lib/ssl/sslsecur.c:1061]
18:57:40     INFO -  ssl_Close [security/nss/lib/ssl/sslsock.c:2050]
18:57:40     INFO -  nsSSLIOLayerClose [security/manager/ssl/src/nsNSSIOLayer.cpp:721]
18:57:40     INFO -  nsSocketTransport::GetSelfAddr(mozilla::net::NetAddr*) [netwerk/base/src/nsSocketTransport2.cpp:1432]
18:57:40     INFO -  mozilla::net::nsHttpChannel::OnTransportStatus(nsITransport*, tag_nsresult, unsigned long long, unsigned long long) [obj-firefox/dist/include/nsCOMPtr.h:783]
18:57:40     INFO -  _ZThn776_N7mozilla3net13nsHttpChannel17OnTransportStatusEP12nsITransport12tag_nsresultyy [netwerk/protocol/http/nsHttpChannel.cpp:5192]
18:57:40     INFO -  nsTransportStatusEvent::Run() [netwerk/base/src/nsTransportUtils.cpp:78]
18:57:40     INFO -  nsThread::ProcessNextEvent(bool, bool*) [xpcom/threads/nsThread.cpp:627]
18:57:40     INFO -  NS_ProcessPendingEvents_P(nsIThread*, unsigned int) [obj-firefox/xpcom/build/nsThreadUtils.cpp:188]
18:57:40     INFO -  nsBaseAppShell::NativeEventCallback() [widget/xpwidgets/nsBaseAppShell.cpp:98]
18:57:40     INFO -  nsAppShell::ProcessGeckoEvents(void*) [widget/cocoa/nsAppShell.mm:388]
18:57:40     INFO -  CoreFoundation + 0x12b51
18:57:40     INFO -  CoreFoundation + 0x123bd
18:57:40     INFO -  CoreFoundation + 0x391a9
18:57:40     INFO -  CoreFoundation + 0x38ae6
18:57:40     INFO -  HIToolbox + 0x23d3
18:57:40     INFO -  HIToolbox + 0x963d
18:57:40     INFO -  HIToolbox + 0x94ca
18:57:40     INFO -  AppKit + 0x93f1
18:57:40     INFO -  -AppKit + 0x8cf5
18:57:40     INFO -  -[GeckoNSApplication nextEventMatchingMask:untilDate:inMode:dequeue:] [widget/cocoa/nsAppShell.mm:164]
18:57:40     INFO -  -AppKit + 0x562d
18:57:40     INFO -  nsAppShell::Run() [widget/cocoa/nsAppShell.mm:741]
18:57:40     INFO -  nsAppStartup::Run() [toolkit/components/startup/nsAppStartup.cpp:288]
18:57:40     INFO -  XREMain::XRE_mainRun() [toolkit/xre/nsAppRunner.cpp:3885]
18:57:40     INFO -  XREMain::XRE_main(int, char**, nsXREAppData const*) [toolkit/xre/nsAppRunner.cpp:3952]
18:57:40     INFO -  XRE_main [toolkit/xre/nsAppRunner.cpp:4155]
18:57:40     INFO -  main [browser/app/nsBrowserApp.cpp:224]
https://tbpl.mozilla.org/php/getParsedLog.php?id=20173190&tree=Mozilla-Inbound


Rev3 Fedora 12x64 mozilla-inbound debug test mochitest-5 on 2013-02-27 22:24:16 PST for push 49b4afb3ab2f
slave: talos-r3-fed64-051

22:32:15     INFO -  163534 ERROR TEST-UNEXPECTED-FAIL | /tests/security/ssl/mixedcontent/test_secureAll.html | Assertion count 1 is greater than expected range 0-0 assertions.


22:32:13     INFO -  ^G###!!! ASSERTION: wrong thread: 'PR_GetCurrentThread() == gSocketThread', file ../../../../netwerk/base/src/NetworkActivityMonitor.cpp, line 273
22:32:13     INFO -  nsNetMon_Send [netwerk/base/src/NetworkActivityMonitor.cpp:89]
22:32:14     INFO -  ssl_DefSend [security/nss/lib/ssl/ssldef.c:95]
22:32:14     INFO -  ssl3_SendRecord [security/nss/lib/ssl/ssl3con.c:2556]
22:32:14     INFO -  SSL3_SendAlert [security/nss/lib/ssl/ssl3con.c:2847]
22:32:14     INFO -  ssl_SecureClose [security/nss/lib/ssl/sslsecur.c:1063]
22:32:14     INFO -  ssl_Close [security/nss/lib/ssl/sslsock.c:2050]
22:32:14     INFO -  nsNSSSocketInfo::CloseSocketAndDestroy(nsNSSShutDownPreventionLock const&) [security/manager/ssl/src/nsNSSIOLayer.cpp:729]
22:32:14     INFO -  nsSSLIOLayerClose [security/manager/ssl/src/nsNSSIOLayer.cpp:702]
22:32:14     INFO -  nsSocketTransport::ReleaseFD_Locked(PRFileDesc*) [netwerk/base/src/nsSocketTransport2.cpp:1432]
22:32:14     INFO -  nsSocketTransport::GetSelfAddr(mozilla::net::NetAddr*) [netwerk/base/src/nsSocketTransport2.cpp:1957]
22:32:14     INFO -  mozilla::net::nsHttpChannel::OnTransportStatus(nsITransport*, tag_nsresult, unsigned long, unsigned long) [netwerk/protocol/http/nsHttpChannel.cpp:5166]
22:32:14     INFO -  nsTransportStatusEvent::Run() [netwerk/base/src/nsTransportUtils.cpp:81]
22:32:14     INFO -  nsThread::ProcessNextEvent(bool, bool*) [xpcom/threads/nsThread.cpp:627]
22:32:14     INFO -  NS_ProcessNextEvent_P(nsIThread*, bool) [obj-firefox/xpcom/build/nsThreadUtils.cpp:238]
22:32:14     INFO -  mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*) [ipc/glue/MessagePump.cpp:83]
22:32:14     INFO -  MessageLoop::RunInternal() [ipc/chromium/src/base/message_loop.cc:217]
22:32:14     INFO -  MessageLoop::Run() [ipc/chromium/src/base/message_loop.cc:503]
22:32:14     INFO -  nsBaseAppShell::Run() [widget/xpwidgets/nsBaseAppShell.cpp:165]
22:32:14     INFO -  nsAppStartup::Run() [toolkit/components/startup/nsAppStartup.cpp:289]
22:32:14     INFO -  XREMain::XRE_mainRun() [toolkit/xre/nsAppRunner.cpp:3886]
22:32:14     INFO -  XREMain::XRE_main(int, char**, nsXREAppData const*) [toolkit/xre/nsAppRunner.cpp:3952]
22:32:14     INFO -  XRE_main [toolkit/xre/nsAppRunner.cpp:4156]
22:32:14     INFO -  do_main [browser/app/nsBrowserApp.cpp:216]
22:32:14     INFO -  main [browser/app/nsBrowserApp.cpp:524]
Summary: intermittent test_unsecureIframeRedirect.html | Assertion count 1 is greater than expected range 0-0 assertions. → intermittent test_secureAll.html, test_unsecureIframeRedirect.html | Assertion count 1 is greater than expected range 0-0 assertions. ASSERTION: wrong thread: 'PR_GetCurrentThread() == gSocketThread
https://tbpl.mozilla.org/php/getParsedLog.php?id=20197164&tree=Mozilla-Inbound

Rev5 MacOSX Mountain Lion 10.8 mozilla-inbound debug test mochitest-other on 2013-02-28 14:14:41 PST for push 57fc2a37629f
slave: talos-mtnlion-r5-040

14:16:45     INFO -  945 ERROR TEST-UNEXPECTED-FAIL | chrome://mochitests/content/chrome/content/base/test/chrome/test_csp_bug768029.html | Assertion count 1 is greater than expected range 0-0 assertions.

14:16:29     INFO -  ^G###!!! ASSERTION: wrong thread: 'PR_GetCurrentThread() == gSocketThread', file ../../../../netwerk/base/src/NetworkActivityMonitor.cpp, line 273
14:16:29     INFO -  ssl_DefSend [security/nss/lib/ssl/ssldef.c:95]
14:16:29     INFO -  ssl3_SendRecord [security/nss/lib/ssl/ssl3con.c:2556]
14:16:29     INFO -  SSL3_SendAlert [security/nss/lib/ssl/ssl3con.c:2847]
14:16:29     INFO -  ssl_SecureClose [security/nss/lib/ssl/sslsecur.c:1061]
14:16:29     INFO -  ssl_Close [security/nss/lib/ssl/sslsock.c:2050]
14:16:45     INFO -  nsSSLIOLayerClose [security/manager/ssl/src/nsNSSIOLayer.cpp:721]
14:16:45     INFO -  nsSocketTransport::GetSelfAddr(mozilla::net::NetAddr*) [netwerk/base/src/nsSocketTransport2.cpp:1432]
14:16:45     INFO -  mozilla::net::nsHttpChannel::OnTransportStatus(nsITransport*, tag_nsresult, unsigned long long, unsigned long long) [obj-firefox/dist/include/nsCOMPtr.h:783]
14:16:45     INFO -  _ZThn776_N7mozilla3net13nsHttpChannel17OnTransportStatusEP12nsITransport12tag_nsresultyy [netwerk/protocol/http/nsHttpChannel.cpp:5192]
14:16:45     INFO -  nsTransportStatusEvent::Run() [netwerk/base/src/nsTransportUtils.cpp:78]
14:16:45     INFO -  nsThread::ProcessNextEvent(bool, bool*) [xpcom/threads/nsThread.cpp:627]
14:16:45     INFO -  NS_ProcessPendingEvents_P(nsIThread*, unsigned int) [obj-firefox/xpcom/build/nsThreadUtils.cpp:188]
14:16:45     INFO -  nsBaseAppShell::NativeEventCallback() [widget/xpwidgets/nsBaseAppShell.cpp:98]
14:16:45     INFO -  nsAppShell::ProcessGeckoEvents(void*) [widget/cocoa/nsAppShell.mm:388]
14:16:45     INFO -  CoreFoundation + 0x12841
14:16:45     INFO -  CoreFoundation + 0x12165
14:16:45     INFO -  CoreFoundation + 0x354e5
14:16:45     INFO -  CoreFoundation + 0x34dd2
14:16:45     INFO -  HIToolbox + 0x5f774
14:16:45     INFO -  HIToolbox + 0x5f512
14:16:45     INFO -  HIToolbox + 0x5f3a3
14:16:45     INFO -  AppKit + 0x156fa3
14:16:45     INFO -  -AppKit + 0x156862
14:16:45     INFO -  -[GeckoNSApplication nextEventMatchingMask:untilDate:inMode:dequeue:] [widget/cocoa/nsAppShell.mm:164]
14:16:45     INFO -  -AppKit + 0x14dc03
14:16:45     INFO -  nsAppShell::Run() [widget/cocoa/nsAppShell.mm:741]
14:16:45     INFO -  nsAppStartup::Run() [toolkit/components/startup/nsAppStartup.cpp:288]
14:16:45     INFO -  XREMain::XRE_mainRun() [toolkit/xre/nsAppRunner.cpp:3885]
14:16:45     INFO -  XREMain::XRE_main(int, char**, nsXREAppData const*) [toolkit/xre/nsAppRunner.cpp:3952]
14:16:45     INFO -  XRE_main [toolkit/xre/nsAppRunner.cpp:4155]
14:16:45     INFO -  main [browser/app/nsBrowserApp.cpp:224]
https://tbpl.mozilla.org/php/getParsedLog.php?id=20188640&tree=Mozilla-Inbound
Summary: intermittent test_secureAll.html, test_unsecureIframeRedirect.html | Assertion count 1 is greater than expected range 0-0 assertions. ASSERTION: wrong thread: 'PR_GetCurrentThread() == gSocketThread → intermittent test_secureAll.html, test_unsecureIframeRedirect.html, test_stricttransportsecurity.html | Assertion count 1 is greater than expected range 0-0 assertions. ASSERTION: wrong thread: 'PR_GetCurrentThread() == gSocketThread
https://tbpl.mozilla.org/php/getParsedLog.php?id=20244500&tree=Mozilla-Inbound
Summary: intermittent test_secureAll.html, test_unsecureIframeRedirect.html, test_stricttransportsecurity.html | Assertion count 1 is greater than expected range 0-0 assertions. ASSERTION: wrong thread: 'PR_GetCurrentThread() == gSocketThread → intermittent test_secureAll.html, test_unsecureIframeRedirect.html, test_stricttransportsecurity.html, test_bug215405.xul | Assertion count 1 is greater than expected range 0-0 assertions. ASSERTION: wrong thread: 'PR_GetCurrentThread() == gSocketThread
Blocks: 851962
Blocks: 854519
I landed:
https://hg.mozilla.org/integration/mozilla-inbound/rev/db744bd0ec5b
adding a comment reflecting that one of the more common places this bug occurs is actually annotated.
Assignee: dbaron → nobody
Assignee: nobody → michal.novotny
Some investigation:

Initial read is someone is playing fast-and-loose with thread requirements for NSS sockets by having Transport Events go to the consumer on MainThread, which will seem to work "usually" until NSS has to close a socket and still has data to send.  NSS *has* to run on the STS thread, and this violates that contract.

The actual consumer is hidden behind a couple of layers of proxys/runnables, they need to ensure that the target is the STS thread.  Or, if you prefer, make it impossible for OnTransportStatus to do any actions that require being on the STS thread (like closing a FD).

Adding 
    NS_ASSERTION(PR_GetCurrentThread() == gSocketThread, "wrong thread");
to ReleaseFD_Locked() will help ferret out anyone causing PR_Close off STS and may help find most or all of the MainThread watchers of Transport status.

Right now this looks like the HTTP transport is the culprit.  Bad things can happen when NSS is run off-main-thread.
Severity: normal → critical
So, is such an assertion in ReleaseFD_Locked() correct?  If so, we've got problems; loading a simple page generated like 10 assertions and a bunch more off a timer later.
Per discussion with Patrick in #necko:

This is a race between Socket shutdown and other things, and the Runnable has the last pseudo-reference to the socket, so when it's done with GetSelfAddr(), it calls PR_Close(): but it's not on STS thread, which is a no-no.  Patrick's suggestion is to set mFD to nullptr, but dispatch a runnable to STS to actually do the PR_Close().

I'll take a quick crack at a patch
Proposed fix to thunk PR_Close() to STS thread
Attachment #735409 - Flags: review?(mcmanus)
Component: Security: PSM → Networking
Attachment #735409 - Flags: review?(mcmanus) → review+
https://hg.mozilla.org/integration/mozilla-inbound/rev/e2d34d74ca35

I think the affects all targets...
Target Milestone: --- → mozilla23
move nested class out of function to avoid visibility warning from NS_IMETHOD
Attachment #735409 - Attachment is obsolete: true
Comment on attachment 735510 [details] [diff] [review]
Don't call PR_Close() off STS thread

carry forward r=mcmanus
review on bustage to smaug
Attachment #735510 - Flags: review?(bugs)
Comment on attachment 735510 [details] [diff] [review]
Don't call PR_Close() off STS thread


>+class thunkPRClose : public nsRunnable
Class names tend to start with capital letter.

>+{
>+public:
>+  thunkPRClose(PRFileDesc *fd) : mFD(fd) {}
>+
>+  NS_IMETHOD Run() {
{ should apparently be on the next line even in necko. Can't comment other coding style :)
Attachment #735510 - Flags: review?(bugs) → review+
I mean, if there isn't ns* or some other prefix, class names tend to start with capital letter.
https://hg.mozilla.org/mozilla-central/rev/2e924c300aae
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Assignee: michal.novotny → rjesup
Comment on attachment 735510 [details] [diff] [review]
Don't call PR_Close() off STS thread

[Approval Request Comment]
Bug caused by (feature/regressing bug #): N/A -- though something might have made this more likely to be hit

User impact if declined: Random oranges and shutdown-time crashes.  Possible security impacts

Testing completed (on m-c, etc.): on m-c for some time; oranges are gone.

Risk to taking this patch (and alternatives if risky): low risk; few things would depend on a network socket actually closing synchronously when released on the MainThread (which is an odd edge-case to begin with)

String or IDL/UUID changes made by this patch: none
Attachment #735510 - Flags: approval-mozilla-aurora?
(In reply to Randell Jesup [:jesup] from comment #35)
> Comment on attachment 735510 [details] [diff] [review]
> Don't call PR_Close() off STS thread
> 
> [Approval Request Comment]
> Bug caused by (feature/regressing bug #): N/A -- though something might have
> made this more likely to be hit
> 
> User impact if declined: Random oranges and shutdown-time crashes.  Possible
> security impacts
> 
> Testing completed (on m-c, etc.): on m-c for some time; oranges are gone.
> 
> Risk to taking this patch (and alternatives if risky): low risk; few things
> would depend on a network socket actually closing synchronously when
> released on the MainThread (which is an odd edge-case to begin with)

Based on the risk here, this looks ok to be uplifted on aurora given it fixes the intermittent failure. But before than, can you please further explain on how (un)common is a user scenario as mentioned above. We have often seen enterprise users hitting edge cases like these where a regular FX user may not hit day-day . If there is any additional testing needed or not?

> 
> String or IDL/UUID changes made by this patch: none
It's very hard for something to depend on when the actual socket closes, since the code that causes this case (release on MainThread) is supposed to happen, and in particular, this must be happening in a runnable dispatched with a "reference" (not an XPCOM reference, but equivalent) from STS thread to MainThread, and before the Main Thread can process the runnable, the STS thread manages to release it's own "reference" to the socket (perhaps only possible at shutdown).

I can't really think of any way something could depend on this.  It may change some timings slightly, but releasing the socket on main-thread could cause random crashes and other undefined behavior, as NSS doesn't want to be called off-STS-thread.  This makes me concerned from a security aspect, though I have no proof anything can come of it.

If you want a second opinion, I'd suggest Patrick McManus.

The sheriffs want it to reduce random oranges.
Comment on attachment 735510 [details] [diff] [review]
Don't call PR_Close() off STS thread

Thanks for the explanation Randell.Based on your comment's this looks good to be uplifted on aurora.
Attachment #735510 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
You need to log in before you can comment on or make changes to this bug.