Intermittent Assertion failure: OnSocketThread() (not on socket thread), at z:/build/build/src/netwerk/base/nsSocketTransport2.cpp:278

RESOLVED FIXED in Firefox 61

Status

()

defect
P3
normal
RESOLVED FIXED
Last year
Last year

People

(Reporter: intermittent-bug-filer, Assigned: valentin)

Tracking

({assertion, intermittent-failure})

unspecified
mozilla62
Points:
---

Firefox Tracking Flags

(firefox-esr52 unaffected, firefox-esr60 unaffected, firefox60 unaffected, firefox61 fixed, firefox62 fixed)

Details

(Whiteboard: [necko-triaged])

Attachments

(1 attachment)

The assertion stack is not that pretty helpful but for postery here the details:

07:29:20     INFO -  #01: ???[Z:\task_1522824245\build\application\firefox\xul.dll +0xccb787]
07:29:20     INFO -  #02: ???[Z:\task_1522824245\build\application\firefox\xul.dll +0xcd6583]
07:29:20     INFO -  #03: ???[Z:\task_1522824245\build\application\firefox\xul.dll +0xbe5bdb]
07:29:20     INFO -  #04: ???[Z:\task_1522824245\build\application\firefox\xul.dll +0xbf3b5b]
07:29:20     INFO -  #05: ???[Z:\task_1522824245\build\application\firefox\xul.dll +0xbebe5c]
07:29:20     INFO -  #06: ???[Z:\task_1522824245\build\application\firefox\xul.dll +0xbf5a67]
07:29:20     INFO -  #07: DumpFrameArray[Z:\task_1522824245\build\application\firefox\xul.dll +0x4ada77a]
07:29:20     INFO -  #08: ???[Z:\task_1522824245\build\application\firefox\xul.dll +0xcd6e72]
07:29:20     INFO -  #09: ???[Z:\task_1522824245\build\application\firefox\xul.dll +0xbe5bdb]
07:29:20     INFO -  #10: ???[Z:\task_1522824245\build\application\firefox\xul.dll +0xbf3b5b]
07:29:20     INFO -  #11: soundtouch::SoundTouch::operator=[Z:\task_1522824245\build\application\firefox\xul.dll +0x1157a2e]
07:29:20     INFO -  #12: ???[Z:\task_1522824245\build\application\firefox\xul.dll +0x111bf3e]
07:29:20     INFO -  #13: ???[Z:\task_1522824245\build\application\firefox\xul.dll +0x111bb12]
07:29:20     INFO -  #14: ???[Z:\task_1522824245\build\application\firefox\xul.dll +0xbed21e]
07:29:20     INFO -  #15: PR_NativeCreateThread[Z:\task_1522824245\build\application\firefox\nss3.dll +0xc6b1e]
07:29:20     INFO -  #16: PR_MD_WAIT_CV[Z:\task_1522824245\build\application\firefox\nss3.dll +0xb827e]
07:29:20     INFO -  #17: o__strtoui64[C:\Windows\System32\ucrtbase.dll +0x20369]
07:29:20     INFO -  #18: BaseThreadInitThunk[C:\Windows\System32\KERNEL32.DLL +0x12774]
07:29:20     INFO -  #19: RtlUserThreadStart[C:\Windows\SYSTEM32\ntdll.dll +0x70d61]
07:2
I extracted the trace from the logs:

Thread 9 (crashed)
mozilla::net::nsSocketInputStream::OnSocketReady(nsresult) [nsSocketTransport2.cpp:278]
mozilla::net::nsSocketTransport::OnSocketEvent(unsigned int,nsresult,nsISupports *) [nsSocketTransport2.cpp:2070]
mozilla::net::nsSocketEvent::Run() [nsSocketTransport2.cpp:90]
nsThread::ProcessNextEvent(bool,bool *) [nsThread.cpp:1096]
NS_ProcessNextEvent(nsIThread *,bool) [nsThreadUtils.cpp:519]
nsThread::Shutdown() [nsThread.cpp:814]
nsThreadPool::Shutdown() [nsThreadPool.cpp:344]
mozilla::psm::StopSSLServerCertVerificationThreads() [SSLServerCertVerification.cpp:196]
mozilla::net::nsSocketTransportService::Run() [nsSocketTransportService2.cpp:1072]
nsThread::ProcessNextEvent(bool,bool *) [nsThread.cpp:1096]
NS_ProcessNextEvent(nsIThread *,bool) [nsThreadUtils.cpp:519]
mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate *) [MessagePump.cpp:364]
MessageLoop::RunHandler() [message_loop.cc:319]
MessageLoop::Run() [message_loop.cc:299]
nsThread::ThreadFunc(void *) [nsThread.cpp:425]
nss3.dll!PR_NativeRunThread [pruthr.c:397]
nss3.dll!static unsigned int pr_root(void *) [w95thred.c:137]
ucrtbase.dll!RtlGetFullPathName_Ustr

From what I can tell, when shutting down, in nsSocketTransportService::Run we first do
gSocketThread = nullptr;
and then we call psm::StopSSLServerCertVerificationThreads()
This causes us to hit the assertion when shutting down the thread, as the current thread no longer equals gSocketTread.

Not sure why this crash is only popping up now. That code hasn't changed for about 5 years.
I'll submit a quick fix to Honza for review.
Assignee: nobody → valentin.gosu
Priority: P5 → P3
Whiteboard: [necko-triaged]

Comment 4

Last year
mozreview-review
Comment on attachment 8964833 [details]
Bug 1451262 - Call NS_ProcessPendingEvents after StopSSLServerCertVerificationThreads

https://reviewboard.mozilla.org/r/233584/#review239634
Attachment #8964833 - Flags: review?(honzab.moz) → review+

Comment 5

Last year
Pushed by valentin.gosu@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/1093016046ef
Clear gSocketThread after finishing all operations so we don't trigger thread assertions r=mayhemer

Comment 6

Last year
bugherder
https://hg.mozilla.org/mozilla-central/rev/1093016046ef
Status: NEW → RESOLVED
Closed: Last year
Resolution: --- → FIXED
Target Milestone: --- → mozilla61
This is still happening.
Recent log:
https://treeherder.mozilla.org/logviewer.html#?job_id=172598100&repo=autoland&lineNumber=62564
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Target Milestone: mozilla61 → ---
Comment hidden (Intermittent Failures Robot)
Please also note the line before the assertion:

> 23:34:01     INFO -  [Parent 5316, Socket Thread] WARNING: cannot post event if not initialized: file z:/build/build/src/netwerk/protocol/http/nsHttpConnectionMgr.cpp, line 301
> 23:34:01     INFO -  Assertion failure: OnSocketThread() (not on socket thread), at z:/build/build/src/netwerk/base/nsSocketTransport2.cpp:278
2
(In reply to Henrik Skupin (:whimboo) from comment #9)
> Please also note the line before the assertion:
> 
> > 23:34:01     INFO -  [Parent 5316, Socket Thread] WARNING: cannot post event if not initialized: file z:/build/build/src/netwerk/protocol/http/nsHttpConnectionMgr.cpp, line 301
> > 23:34:01     INFO -  Assertion failure: OnSocketThread() (not on socket thread), at z:/build/build/src/netwerk/base/nsSocketTransport2.cpp:278
> 2

we might turn the warning to an assertion temporarily to get a stack or maybe MOZ_ASSERT in nsHttpConnectionMgr.cpp all the time at that place?
Flags: needinfo?(valentin.gosu)
Comment hidden (Intermittent Failures Robot)
Comment hidden (Intermittent Failures Robot)
Comment hidden (Intermittent Failures Robot)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)

Comment 16

Last year
Pushed by valentin.gosu@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/b859a16d7450
Call NS_ProcessPendingEvents after StopSSLServerCertVerificationThreads r=mayhemer
Flags: needinfo?(valentin.gosu)

Comment 17

Last year
bugherder
https://hg.mozilla.org/mozilla-central/rev/b859a16d7450
Status: REOPENED → RESOLVED
Closed: Last yearLast year
Resolution: --- → FIXED
Target Milestone: --- → mozilla62
Please request Beta approval on this when you get a chance.
Flags: needinfo?(valentin.gosu)
Keywords: assertion
Comment on attachment 8964833 [details]
Bug 1451262 - Call NS_ProcessPendingEvents after StopSSLServerCertVerificationThreads

Approval Request Comment
[Feature/Bug causing the regression]: Unknown.
[User impact if declined]: Debug assertions may be triggered in some conditions.
[Is this code covered by automated tests?]: Yes.
[Has the fix been verified in Nightly?]: It just landed.
[Needs manual test from QE? If yes, steps to reproduce]: No. 
[List of other uplifts needed for the feature/fix]: None.
[Is the change risky?]: No.
[Why is the change risky/not risky?]: It just reorders some events in order to avoid assertions. No actual functionality should be changed.
[String changes made/needed]: None.
Flags: needinfo?(valentin.gosu)
Attachment #8964833 - Flags: approval-mozilla-beta?
Comment on attachment 8964833 [details]
Bug 1451262 - Call NS_ProcessPendingEvents after StopSSLServerCertVerificationThreads

Minor code reordering to avoid an assertion from triggering intermittently in CI. Approved for 61.0b5.
Attachment #8964833 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Comment hidden (Intermittent Failures Robot)
You need to log in before you can comment on or make changes to this bug.