Closed Bug 1396307 Opened 7 years ago Closed 7 years ago

Crash in RtlEnterCriticalSection | mozilla::RecursiveMutexAutoLock::RecursiveMutexAutoLock

Categories

(Core :: Networking: HTTP, defect, P1)

56 Branch
x86_64
Windows
defect

Tracking

()

RESOLVED FIXED
mozilla57
Tracking Status
firefox-esr52 --- unaffected
firefox55 --- unaffected
firefox56 --- fixed
firefox57 --- fixed

People

(Reporter: philipp, Assigned: valentin)

References

(Blocks 1 open bug)

Details

(Keywords: crash, regression, Whiteboard: [necko-active])

Crash Data

Attachments

(4 files, 1 obsolete file)

This bug was filed from the Socorro interface and is report bp-b5956322-c77f-403e-9f74-6cb3a0170831. ============================================================= Crashing Thread (0) Frame Module Signature Source 0 ntdll.dll RtlEnterCriticalSection 1 xul.dll mozilla::RecursiveMutexAutoLock::RecursiveMutexAutoLock(mozilla::RecursiveMutex&) obj-firefox/dist/include/mozilla/RecursiveMutex.h:77 2 xul.dll nsInputStreamPump::Suspend() netwerk/base/nsInputStreamPump.cpp:233 3 xul.dll mozilla::net::nsHttpChannel::ContinueProcessResponse2(nsresult) netwerk/protocol/http/nsHttpChannel.cpp:2546 4 xul.dll mozilla::net::nsHttpChannel::ContinueProcessResponse1() netwerk/protocol/http/nsHttpChannel.cpp:2401 5 xul.dll mozilla::net::nsHttpChannel::AsyncContinueProcessResponse() netwerk/protocol/http/nsHttpChannel.cpp:2315 6 xul.dll mozilla::detail::RunnableMethodImpl<mozilla::net::nsHttpChannel*, void ( mozilla::net::nsHttpChannel::*)(void), 1, 0>::Run() obj-firefox/dist/include/nsThreadUtils.h:1172 7 xul.dll nsThread::ProcessNextEvent(bool, bool*) xpcom/threads/nsThread.cpp:1446 8 xul.dll mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*) ipc/glue/MessagePump.cpp:97 9 xul.dll MessageLoop::RunHandler() ipc/chromium/src/base/message_loop.cc:319 10 xul.dll MessageLoop::Run() ipc/chromium/src/base/message_loop.cc:299 11 xul.dll nsBaseAppShell::Run() widget/nsBaseAppShell.cpp:156 12 xul.dll nsAppShell::Run() widget/windows/nsAppShell.cpp:278 13 xul.dll nsAppStartup::Run() toolkit/components/startup/nsAppStartup.cpp:287 14 xul.dll XREMain::XRE_mainRun() toolkit/xre/nsAppRunner.cpp:4608 15 xul.dll XREMain::XRE_main(int, char** const, mozilla::BootstrapConfig const&) toolkit/xre/nsAppRunner.cpp:4772 16 xul.dll XRE_main(int, char** const, mozilla::BootstrapConfig const&) toolkit/xre/nsAppRunner.cpp:4867 17 firefox.exe NS_internal_main(int, char**, char**) browser/app/nsBrowserApp.cpp:309 18 firefox.exe wmain toolkit/xre/nsWindowsWMain.cpp:115 19 firefox.exe __scrt_common_main_seh f:/dd/vctools/crt/vcstartup/src/startup/exe_common.inl:253 20 kernel32.dll BaseThreadInitThunk 21 ntdll.dll RtlUserThreadStart crash reports with this signature are newly showing up in firefox 56 on 64bit versions of the browser on windows. the comment to the referenced crash report says "Systematically crashes when logging on to .htaccess password protected websites. Same problem on separate Win 10 PC's. Disabling add-ons resolves issue." three quarters of the reports have the following extension present: https://addons.mozilla.org/en-US/firefox/addon/passifox/
I see mozilla::net::nsHttpChannel::ContinueProcessResponse2 in all of these stacks.
Component: XPCOM → Networking: HTTP
Gary, is it related to the fix in bug 1382178?
Flags: needinfo?(xeonchen)
If the password protected page could be served from the cache, the cause could be that the cache won and mTransactionPump was nulled out in nsHttpChannel::ReadFromCache. Valentin, could you please have a look at it?
Assignee: nobody → valentin.gosu
Whiteboard: [necko-active]
(In reply to Michal Novotny (:michal) from comment #3) > If the password protected page could be served from the cache, the cause > could be that the cache won and mTransactionPump was nulled out in > nsHttpChannel::ReadFromCache. Valentin, could you please have a look at it? Theoretically racing shouldn't be enabled for beta. Maybe we are wrongly setting mRaceCacheWithNetwork to true, even though we shouldn't.
See Also: → 1397037
Comment on attachment 8904918 [details] Bug 1396307 - Make sure we only set mRaceCacheWithNetwork to true when the feature is enabled https://reviewboard.mozilla.org/r/176700/#review181676 ::: netwerk/protocol/http/nsHttpChannel.cpp:3979 (Diff revision 1) > > if (!mCacheOpenDelay) { > MOZ_ASSERT(NS_IsMainThread(), "Should be called on the main thread"); > mCacheAsyncOpenCalled = true; > if (mNetworkTriggered) { > - mRaceCacheWithNetwork = true; > + mRaceCacheWithNetwork = sRCWNEnabled; How is this going to fix the crash once we enable racing?
Attachment #8904918 - Flags: review?(michal.novotny) → review-
(In reply to Shian-Yow Wu [:swu] (56 Regression Engineering support) from comment #2) > Gary, is it related to the fix in bug 1382178? It's unlikely related to bug 1382178, and looks like Michal and Valentin are working at this (thank you!).
Flags: needinfo?(xeonchen)
Valentin any updates here? We are going to run out of time for 56 soon.
Flags: needinfo?(valentin.gosu)
Michal, can we consider the patch for beta? I haven't been able to figure out the exact flow that triggers this bug, and uplifting the fix in the last days of beta is quite risky anyway. You are correct in saying that the patch is working around the bug of why we are setting mRaceCacheWithNetwork to true, but we will not be enabling RCWN on beta anyway.
Flags: needinfo?(valentin.gosu) → needinfo?(michal.novotny)
I'm still not sure how this patch will fix the problem. If you mean that we set mRaceCacheWithNetwork incorrectly, so it's true even if we're not actually racing, then it won't fix the problem, because we would null out the pump only if we really race. And if you mean that we really race in case we shouldn't (because it's disabled), then this patch would set mRaceCacheWithNetwork to false in case we really race and it's IMO pretty bad.
Flags: needinfo?(michal.novotny)
(In reply to Michal Novotny (:michal) from comment #10) > I'm still not sure how this patch will fix the problem. Just preventing mRaceCacheWithNetwork from being set to true when racing is not enabled. We still need separate patch for fx57 to fix the root cause. > If you mean that we set mRaceCacheWithNetwork incorrectly, so it's true even > if we're not actually racing, then it won't fix the problem, because we > would null out the pump only if we really race. If we are setting mRacheCacheWithNetwork incorrectly then we would also null out the pump incorrectly in nsHttpChannel::ReadFromCache. > And if you mean that we really race in case we shouldn't (because it's > disabled), then this patch would set mRaceCacheWithNetwork to false in case > we really race and it's IMO pretty bad. I don't think that's the case. In any case, we should not be racing on beta anyway.
(In reply to Valentin Gosu [:valentin] from comment #11) > > And if you mean that we really race in case we shouldn't (because it's > > disabled), then this patch would set mRaceCacheWithNetwork to false in case > > we really race and it's IMO pretty bad. > > I don't think that's the case. In any case, we should not be racing on beta > anyway. Users can enable it manually and obviously some of them did it: https://telemetry.mozilla.org/new-pipeline/dist.html#!cumulative=0&end_date=2017-09-11&keys=__none__!__none__!__none__&max_channel_version=beta%252F56&measure=NETWORK_RACE_CACHE_WITH_NETWORK_USAGE_2&min_channel_version=null&processType=*&product=Firefox&sanitize=0&sort_keys=submissions&start_date=2017-08-29&table=1&trim=1&use_submission_date=0
(In reply to Michal Novotny (:michal) from comment #12) > (In reply to Valentin Gosu [:valentin] from comment #11) > > > And if you mean that we really race in case we shouldn't (because it's > > > disabled), then this patch would set mRaceCacheWithNetwork to false in case > > > we really race and it's IMO pretty bad. > > > > I don't think that's the case. In any case, we should not be racing on beta > > anyway. > > Users can enable it manually and obviously some of them did it: We've fixed a bunch of other bugs with RCWN and not uplifted them to beta. I think we should either disable it completely on beta (ignore the pref) - or let those users that flipped the pref deal with the crashes (as we do for other bugs). Still, I want to make sure mRaceCacheWithNetwork is never true when it shouldn't. At the moment it seems that is a possibility.
(In reply to Valentin Gosu [:valentin] from comment #13) > Still, I want to make sure mRaceCacheWithNetwork is never true when it > shouldn't. At the moment it seems that is a possibility. Why we don't simply add a null check as in bugs 1351340 and 1382178? AFAICS, Gary forgot to add a null check here in bug 1382178.
(In reply to Michal Novotny (:michal) from comment #14) > (In reply to Valentin Gosu [:valentin] from comment #13) > > Still, I want to make sure mRaceCacheWithNetwork is never true when it > > shouldn't. At the moment it seems that is a possibility. > > Why we don't simply add a null check as in bugs 1351340 and 1382178? AFAICS, > Gary forgot to add a null check here in bug 1382178. We can do that too. I'm worried there might be other crashes caused by setting mRaceCacheWithNetwork to true incorrectly.
So add MOZ_RELEASE_ASSERT(sRCWNEnabled) where we set mRaceCacheWithNetwork = true. Then we would know for sure there is some problem.
See Also: → 1399541
Priority: -- → P1
So, I managed to create a test case that triggers the cache. The stack trace is exactly as one on crash-stats. This crash happens when we have a cached 200 response for a page that now returns a 401. We race, network wins, but for some reason mRaceCacheWithNetwork isn't set. Then OnCacheEntryCheck is triggered, with a valid cache entry. In OnCacheEntryAvailable we call ReadFromCache, which nulls out the transaction pump. When we get OnAuthAvailable, the pump is null, so we crash.
Blocks: RCWN
Comment on attachment 8908311 [details] Bug 1396307 - Add null check in nsHttpChannel::OnAuthAvailable https://reviewboard.mozilla.org/r/179940/#review185154
Attachment #8908311 - Flags: review?(michal.novotny) → review+
Comment on attachment 8904918 [details] Bug 1396307 - Make sure we only set mRaceCacheWithNetwork to true when the feature is enabled https://reviewboard.mozilla.org/r/176700/#review185160
Attachment #8904918 - Flags: review?(michal.novotny) → review+
Comment on attachment 8908322 [details] Bug 1396307 - Set mCacheAsyncOpenCalled back to flase when AsyncOpenURI fails https://reviewboard.mozilla.org/r/179960/#review185166
Attachment #8908322 - Flags: review?(michal.novotny) → review+
Pushed by valentin.gosu@gmail.com: https://hg.mozilla.org/integration/autoland/rev/77cd8fa838ba Make sure we only set mRaceCacheWithNetwork to true when the feature is enabled r=michal https://hg.mozilla.org/integration/autoland/rev/d4ec49c6206d Add null check in nsHttpChannel::OnAuthAvailable r=michal https://hg.mozilla.org/integration/autoland/rev/ea0aae8835fe Set mCacheAsyncOpenCalled back to flase when AsyncOpenURI fails r=michal
Attachment #8907838 - Attachment is obsolete: true
Crash Signature: [@ RtlEnterCriticalSection | mozilla::RecursiveMutexAutoLock::RecursiveMutexAutoLock] → [@ RtlEnterCriticalSection | mozilla::RecursiveMutexAutoLock::RecursiveMutexAutoLock] [@ RtlEnterCriticalSection | nsInputStreamPump::Resume ]
Comment on attachment 8908426 [details] [diff] [review] [beta] Make sure we only set mRaceCacheWithNetwork to true when the feature is enabled Approval Request Comment [Feature/Bug causing the regression]: bug 1325081 [User impact if declined]: possible crashes when the race condition happens, most usually with pages that require user authentication. [Is this code covered by automated tests?]: yes [Has the fix been verified in Nightly?]: just landed [Needs manual test from QE? If yes, steps to reproduce]: no, it is unlikely the race can be intentionally triggered. [List of other uplifts needed for the feature/fix]: attachment 8908311 [details] and attachment 8908322 [details] in this same bug apply cleanly over this one. [Is the change risky?]: no. [Why is the change risky/not risky?]: each of these patches contains a small change that ensures we don't mistakenly set a variable to true, and that we check pointers before using them. [String changes made/needed]: none
Attachment #8908426 - Flags: approval-mozilla-beta?
Comment on attachment 8908426 [details] [diff] [review] [beta] Make sure we only set mRaceCacheWithNetwork to true when the feature is enabled 56 is on m-r now.
Attachment #8908426 - Flags: approval-mozilla-beta? → approval-mozilla-release?
(In reply to Michal Novotny (:michal) from comment #23) > Comment on attachment 8904918 [details] > Bug 1396307 - Make sure we only set mRaceCacheWithNetwork to true when the > feature is enabled > > https://reviewboard.mozilla.org/r/176700/#review185160 This patch was a hotfix for releases where is RCWN disabled. Valentin, please file a follow up bug to revert the change once RCWN is enabled.
Comment on attachment 8908426 [details] [diff] [review] [beta] Make sure we only set mRaceCacheWithNetwork to true when the feature is enabled This crash is not high volume, but it newly appeared in 56 and may become a worse problem with release users. Seems worth the risk to take this last minute fix (for the 56 RC build)
Attachment #8908426 - Flags: approval-mozilla-release? → approval-mozilla-release+
(In reply to Valentin Gosu [:valentin] from comment #29) > [Is this code covered by automated tests?]: yes > [Has the fix been verified in Nightly?]: just landed > [Needs manual test from QE? If yes, steps to reproduce]: no, it is unlikely > the race can be intentionally triggered. Setting qe-verify- based on Valentin's assessment on manual testing needs and the fact that this fix has automated coverage.
Flags: qe-verify-
If this is this bug: https://crash-stats.mozilla.com/report/index/1b27e2ca-5b2e-495d-84c2-fc9bf0170926#tab-bugzilla - I can reproduce this with 100% reliability on a TFS-system - thus if there is any way I can help to get this crunched, please feel free to contact me.
Blocks: 1403137
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: