Closed
Bug 1396307
Opened 7 years ago
Closed 7 years ago
Crash in RtlEnterCriticalSection | mozilla::RecursiveMutexAutoLock::RecursiveMutexAutoLock
Categories
(Core :: Networking: HTTP, defect, P1)
Tracking
()
RESOLVED
FIXED
mozilla57
Tracking | Status | |
---|---|---|
firefox-esr52 | --- | unaffected |
firefox55 | --- | unaffected |
firefox56 | --- | fixed |
firefox57 | --- | fixed |
People
(Reporter: philipp, Assigned: valentin)
References
(Blocks 1 open bug)
Details
(Keywords: crash, regression, Whiteboard: [necko-active])
Crash Data
Attachments
(4 files, 1 obsolete file)
59 bytes,
text/x-review-board-request
|
michal
:
review+
|
Details |
59 bytes,
text/x-review-board-request
|
michal
:
review+
|
Details |
59 bytes,
text/x-review-board-request
|
michal
:
review+
|
Details |
2.22 KB,
patch
|
lizzard
:
approval-mozilla-release+
|
Details | Diff | Splinter Review |
This bug was filed from the Socorro interface and is
report bp-b5956322-c77f-403e-9f74-6cb3a0170831.
=============================================================
Crashing Thread (0)
Frame Module Signature Source
0 ntdll.dll RtlEnterCriticalSection
1 xul.dll mozilla::RecursiveMutexAutoLock::RecursiveMutexAutoLock(mozilla::RecursiveMutex&) obj-firefox/dist/include/mozilla/RecursiveMutex.h:77
2 xul.dll nsInputStreamPump::Suspend() netwerk/base/nsInputStreamPump.cpp:233
3 xul.dll mozilla::net::nsHttpChannel::ContinueProcessResponse2(nsresult) netwerk/protocol/http/nsHttpChannel.cpp:2546
4 xul.dll mozilla::net::nsHttpChannel::ContinueProcessResponse1() netwerk/protocol/http/nsHttpChannel.cpp:2401
5 xul.dll mozilla::net::nsHttpChannel::AsyncContinueProcessResponse() netwerk/protocol/http/nsHttpChannel.cpp:2315
6 xul.dll mozilla::detail::RunnableMethodImpl<mozilla::net::nsHttpChannel*, void ( mozilla::net::nsHttpChannel::*)(void), 1, 0>::Run() obj-firefox/dist/include/nsThreadUtils.h:1172
7 xul.dll nsThread::ProcessNextEvent(bool, bool*) xpcom/threads/nsThread.cpp:1446
8 xul.dll mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*) ipc/glue/MessagePump.cpp:97
9 xul.dll MessageLoop::RunHandler() ipc/chromium/src/base/message_loop.cc:319
10 xul.dll MessageLoop::Run() ipc/chromium/src/base/message_loop.cc:299
11 xul.dll nsBaseAppShell::Run() widget/nsBaseAppShell.cpp:156
12 xul.dll nsAppShell::Run() widget/windows/nsAppShell.cpp:278
13 xul.dll nsAppStartup::Run() toolkit/components/startup/nsAppStartup.cpp:287
14 xul.dll XREMain::XRE_mainRun() toolkit/xre/nsAppRunner.cpp:4608
15 xul.dll XREMain::XRE_main(int, char** const, mozilla::BootstrapConfig const&) toolkit/xre/nsAppRunner.cpp:4772
16 xul.dll XRE_main(int, char** const, mozilla::BootstrapConfig const&) toolkit/xre/nsAppRunner.cpp:4867
17 firefox.exe NS_internal_main(int, char**, char**) browser/app/nsBrowserApp.cpp:309
18 firefox.exe wmain toolkit/xre/nsWindowsWMain.cpp:115
19 firefox.exe __scrt_common_main_seh f:/dd/vctools/crt/vcstartup/src/startup/exe_common.inl:253
20 kernel32.dll BaseThreadInitThunk
21 ntdll.dll RtlUserThreadStart
crash reports with this signature are newly showing up in firefox 56 on 64bit versions of the browser on windows.
the comment to the referenced crash report says "Systematically crashes when logging on to .htaccess password protected websites. Same problem on separate Win 10 PC's. Disabling add-ons resolves issue."
three quarters of the reports have the following extension present: https://addons.mozilla.org/en-US/firefox/addon/passifox/
Comment 1•7 years ago
|
||
I see mozilla::net::nsHttpChannel::ContinueProcessResponse2 in all of these stacks.
Component: XPCOM → Networking: HTTP
Comment 3•7 years ago
|
||
If the password protected page could be served from the cache, the cause could be that the cache won and mTransactionPump was nulled out in nsHttpChannel::ReadFromCache. Valentin, could you please have a look at it?
Assignee: nobody → valentin.gosu
Whiteboard: [necko-active]
Assignee | ||
Comment 4•7 years ago
|
||
(In reply to Michal Novotny (:michal) from comment #3)
> If the password protected page could be served from the cache, the cause
> could be that the cache won and mTransactionPump was nulled out in
> nsHttpChannel::ReadFromCache. Valentin, could you please have a look at it?
Theoretically racing shouldn't be enabled for beta.
Maybe we are wrongly setting mRaceCacheWithNetwork to true, even though we shouldn't.
Comment hidden (mozreview-request) |
Comment 6•7 years ago
|
||
mozreview-review |
Comment on attachment 8904918 [details]
Bug 1396307 - Make sure we only set mRaceCacheWithNetwork to true when the feature is enabled
https://reviewboard.mozilla.org/r/176700/#review181676
::: netwerk/protocol/http/nsHttpChannel.cpp:3979
(Diff revision 1)
>
> if (!mCacheOpenDelay) {
> MOZ_ASSERT(NS_IsMainThread(), "Should be called on the main thread");
> mCacheAsyncOpenCalled = true;
> if (mNetworkTriggered) {
> - mRaceCacheWithNetwork = true;
> + mRaceCacheWithNetwork = sRCWNEnabled;
How is this going to fix the crash once we enable racing?
Attachment #8904918 -
Flags: review?(michal.novotny) → review-
Comment 7•7 years ago
|
||
(In reply to Shian-Yow Wu [:swu] (56 Regression Engineering support) from comment #2)
> Gary, is it related to the fix in bug 1382178?
It's unlikely related to bug 1382178, and looks like Michal and Valentin are working at this (thank you!).
Flags: needinfo?(xeonchen)
Comment 8•7 years ago
|
||
Valentin any updates here? We are going to run out of time for 56 soon.
Flags: needinfo?(valentin.gosu)
Assignee | ||
Comment 9•7 years ago
|
||
Michal, can we consider the patch for beta?
I haven't been able to figure out the exact flow that triggers this bug, and uplifting the fix in the last days of beta is quite risky anyway.
You are correct in saying that the patch is working around the bug of why we are setting mRaceCacheWithNetwork to true, but we will not be enabling RCWN on beta anyway.
Flags: needinfo?(valentin.gosu) → needinfo?(michal.novotny)
Comment 10•7 years ago
|
||
I'm still not sure how this patch will fix the problem.
If you mean that we set mRaceCacheWithNetwork incorrectly, so it's true even if we're not actually racing, then it won't fix the problem, because we would null out the pump only if we really race.
And if you mean that we really race in case we shouldn't (because it's disabled), then this patch would set mRaceCacheWithNetwork to false in case we really race and it's IMO pretty bad.
Flags: needinfo?(michal.novotny)
Assignee | ||
Comment 11•7 years ago
|
||
(In reply to Michal Novotny (:michal) from comment #10)
> I'm still not sure how this patch will fix the problem.
Just preventing mRaceCacheWithNetwork from being set to true when racing is not enabled.
We still need separate patch for fx57 to fix the root cause.
> If you mean that we set mRaceCacheWithNetwork incorrectly, so it's true even
> if we're not actually racing, then it won't fix the problem, because we
> would null out the pump only if we really race.
If we are setting mRacheCacheWithNetwork incorrectly then we would also null out the pump incorrectly in nsHttpChannel::ReadFromCache.
> And if you mean that we really race in case we shouldn't (because it's
> disabled), then this patch would set mRaceCacheWithNetwork to false in case
> we really race and it's IMO pretty bad.
I don't think that's the case. In any case, we should not be racing on beta anyway.
Comment 12•7 years ago
|
||
(In reply to Valentin Gosu [:valentin] from comment #11)
> > And if you mean that we really race in case we shouldn't (because it's
> > disabled), then this patch would set mRaceCacheWithNetwork to false in case
> > we really race and it's IMO pretty bad.
>
> I don't think that's the case. In any case, we should not be racing on beta
> anyway.
Users can enable it manually and obviously some of them did it:
https://telemetry.mozilla.org/new-pipeline/dist.html#!cumulative=0&end_date=2017-09-11&keys=__none__!__none__!__none__&max_channel_version=beta%252F56&measure=NETWORK_RACE_CACHE_WITH_NETWORK_USAGE_2&min_channel_version=null&processType=*&product=Firefox&sanitize=0&sort_keys=submissions&start_date=2017-08-29&table=1&trim=1&use_submission_date=0
Assignee | ||
Comment 13•7 years ago
|
||
(In reply to Michal Novotny (:michal) from comment #12)
> (In reply to Valentin Gosu [:valentin] from comment #11)
> > > And if you mean that we really race in case we shouldn't (because it's
> > > disabled), then this patch would set mRaceCacheWithNetwork to false in case
> > > we really race and it's IMO pretty bad.
> >
> > I don't think that's the case. In any case, we should not be racing on beta
> > anyway.
>
> Users can enable it manually and obviously some of them did it:
We've fixed a bunch of other bugs with RCWN and not uplifted them to beta.
I think we should either disable it completely on beta (ignore the pref) - or let those users that flipped the pref deal with the crashes (as we do for other bugs).
Still, I want to make sure mRaceCacheWithNetwork is never true when it shouldn't. At the moment it seems that is a possibility.
Comment 14•7 years ago
|
||
(In reply to Valentin Gosu [:valentin] from comment #13)
> Still, I want to make sure mRaceCacheWithNetwork is never true when it
> shouldn't. At the moment it seems that is a possibility.
Why we don't simply add a null check as in bugs 1351340 and 1382178? AFAICS, Gary forgot to add a null check here in bug 1382178.
Assignee | ||
Comment 15•7 years ago
|
||
(In reply to Michal Novotny (:michal) from comment #14)
> (In reply to Valentin Gosu [:valentin] from comment #13)
> > Still, I want to make sure mRaceCacheWithNetwork is never true when it
> > shouldn't. At the moment it seems that is a possibility.
>
> Why we don't simply add a null check as in bugs 1351340 and 1382178? AFAICS,
> Gary forgot to add a null check here in bug 1382178.
We can do that too. I'm worried there might be other crashes caused by setting mRaceCacheWithNetwork to true incorrectly.
Comment 16•7 years ago
|
||
So add MOZ_RELEASE_ASSERT(sRCWNEnabled) where we set mRaceCacheWithNetwork = true. Then we would know for sure there is some problem.
Updated•7 years ago
|
Comment 17•7 years ago
|
||
Bulk priority update: https://bugzilla.mozilla.org/show_bug.cgi?id=1399258
Priority: -- → P1
Assignee | ||
Comment 18•7 years ago
|
||
So, I managed to create a test case that triggers the cache. The stack trace is exactly as one on crash-stats.
This crash happens when we have a cached 200 response for a page that now returns a 401.
We race, network wins, but for some reason mRaceCacheWithNetwork isn't set.
Then OnCacheEntryCheck is triggered, with a valid cache entry. In OnCacheEntryAvailable we call ReadFromCache, which nulls out the transaction pump. When we get OnAuthAvailable, the pump is null, so we crash.
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment 21•7 years ago
|
||
mozreview-review |
Comment on attachment 8908311 [details]
Bug 1396307 - Add null check in nsHttpChannel::OnAuthAvailable
https://reviewboard.mozilla.org/r/179940/#review185154
Attachment #8908311 -
Flags: review?(michal.novotny) → review+
Comment hidden (mozreview-request) |
Comment 23•7 years ago
|
||
mozreview-review |
Comment on attachment 8904918 [details]
Bug 1396307 - Make sure we only set mRaceCacheWithNetwork to true when the feature is enabled
https://reviewboard.mozilla.org/r/176700/#review185160
Attachment #8904918 -
Flags: review?(michal.novotny) → review+
Comment hidden (mozreview-request) |
Comment 25•7 years ago
|
||
mozreview-review |
Comment on attachment 8908322 [details]
Bug 1396307 - Set mCacheAsyncOpenCalled back to flase when AsyncOpenURI fails
https://reviewboard.mozilla.org/r/179960/#review185166
Attachment #8908322 -
Flags: review?(michal.novotny) → review+
Comment 26•7 years ago
|
||
Pushed by valentin.gosu@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/77cd8fa838ba
Make sure we only set mRaceCacheWithNetwork to true when the feature is enabled r=michal
https://hg.mozilla.org/integration/autoland/rev/d4ec49c6206d
Add null check in nsHttpChannel::OnAuthAvailable r=michal
https://hg.mozilla.org/integration/autoland/rev/ea0aae8835fe
Set mCacheAsyncOpenCalled back to flase when AsyncOpenURI fails r=michal
Assignee | ||
Updated•7 years ago
|
Attachment #8907838 -
Attachment is obsolete: true
Assignee | ||
Updated•7 years ago
|
Crash Signature: [@ RtlEnterCriticalSection | mozilla::RecursiveMutexAutoLock::RecursiveMutexAutoLock] → [@ RtlEnterCriticalSection | mozilla::RecursiveMutexAutoLock::RecursiveMutexAutoLock]
[@ RtlEnterCriticalSection | nsInputStreamPump::Resume ]
Assignee | ||
Comment 28•7 years ago
|
||
MozReview-Commit-ID: FrLjfuExmYV
Assignee | ||
Comment 29•7 years ago
|
||
Comment on attachment 8908426 [details] [diff] [review]
[beta] Make sure we only set mRaceCacheWithNetwork to true when the feature is enabled
Approval Request Comment
[Feature/Bug causing the regression]: bug 1325081
[User impact if declined]: possible crashes when the race condition happens, most usually with pages that require user authentication.
[Is this code covered by automated tests?]: yes
[Has the fix been verified in Nightly?]: just landed
[Needs manual test from QE? If yes, steps to reproduce]: no, it is unlikely the race can be intentionally triggered.
[List of other uplifts needed for the feature/fix]: attachment 8908311 [details] and attachment 8908322 [details] in this same bug apply cleanly over this one.
[Is the change risky?]: no.
[Why is the change risky/not risky?]: each of these patches contains a small change that ensures we don't mistakenly set a variable to true, and that we check pointers before using them.
[String changes made/needed]: none
Attachment #8908426 -
Flags: approval-mozilla-beta?
Comment 30•7 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/77cd8fa838ba
https://hg.mozilla.org/mozilla-central/rev/d4ec49c6206d
https://hg.mozilla.org/mozilla-central/rev/ea0aae8835fe
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla57
Comment 31•7 years ago
|
||
Comment on attachment 8908426 [details] [diff] [review]
[beta] Make sure we only set mRaceCacheWithNetwork to true when the feature is enabled
56 is on m-r now.
Attachment #8908426 -
Flags: approval-mozilla-beta? → approval-mozilla-release?
Comment 32•7 years ago
|
||
(In reply to Michal Novotny (:michal) from comment #23)
> Comment on attachment 8904918 [details]
> Bug 1396307 - Make sure we only set mRaceCacheWithNetwork to true when the
> feature is enabled
>
> https://reviewboard.mozilla.org/r/176700/#review185160
This patch was a hotfix for releases where is RCWN disabled. Valentin, please file a follow up bug to revert the change once RCWN is enabled.
Comment 33•7 years ago
|
||
Comment on attachment 8908426 [details] [diff] [review]
[beta] Make sure we only set mRaceCacheWithNetwork to true when the feature is enabled
This crash is not high volume, but it newly appeared in 56 and may become a worse problem with release users. Seems worth the risk to take this last minute fix (for the 56 RC build)
Attachment #8908426 -
Flags: approval-mozilla-release? → approval-mozilla-release+
Comment 34•7 years ago
|
||
bugherder uplift |
Comment 35•7 years ago
|
||
(In reply to Valentin Gosu [:valentin] from comment #29)
> [Is this code covered by automated tests?]: yes
> [Has the fix been verified in Nightly?]: just landed
> [Needs manual test from QE? If yes, steps to reproduce]: no, it is unlikely
> the race can be intentionally triggered.
Setting qe-verify- based on Valentin's assessment on manual testing needs and the fact that this fix has automated coverage.
Flags: qe-verify-
Comment 36•7 years ago
|
||
If this is this bug: https://crash-stats.mozilla.com/report/index/1b27e2ca-5b2e-495d-84c2-fc9bf0170926#tab-bugzilla - I can reproduce this with 100% reliability on a TFS-system - thus if there is any way I can help to get this crunched, please feel free to contact me.
You need to log in
before you can comment on or make changes to this bug.
Description
•