Closed Bug 1351340 Opened 3 years ago Closed 3 years ago

Crash in mozilla::net::nsHttpChannel::ContinueProcessResponse3

Categories

(Core :: Networking, defect, critical)

52 Branch
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla55
Tracking Status
firefox-esr45 --- unaffected
firefox52 --- wontfix
firefox-esr52 --- fixed
firefox53 --- wontfix
firefox54 --- fixed
firefox55 --- fixed

People

(Reporter: philipp, Assigned: xeonchen)

Details

(Keywords: crash, regression, Whiteboard: [necko-active])

Crash Data

Attachments

(1 file)

This bug was filed from the Socorro interface and is 
report bp-ebbad5ee-3305-42dd-9af2-ca9ac2170327.
=============================================================
Crashing Thread (0)
Frame 	Module 	Signature 	Source
0 	xul.dll 	mozilla::net::nsHttpChannel::ContinueProcessResponse3(nsresult) 	netwerk/protocol/http/nsHttpChannel.cpp:2313
1 	xul.dll 	mozilla::net::nsHttpChannel::OnRedirectVerifyCallback(nsresult) 	netwerk/protocol/http/nsHttpChannel.cpp:7696
2 	xul.dll 	mozilla::net::nsAsyncVerifyRedirectCallbackEvent::Run() 	netwerk/base/nsAsyncRedirectVerifyHelper.cpp:40
3 	xul.dll 	nsThread::ProcessNextEvent(bool, bool*) 	xpcom/threads/nsThread.cpp:1240
4 	xul.dll 	mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*) 	ipc/glue/MessagePump.cpp:96
5 	xul.dll 	MessageLoop::RunHandler() 	ipc/chromium/src/base/message_loop.cc:231
6 	xul.dll 	MessageLoop::Run() 	ipc/chromium/src/base/message_loop.cc:211
7 	xul.dll 	nsBaseAppShell::Run() 	widget/nsBaseAppShell.cpp:156
8 	xul.dll 	nsAppShell::Run() 	widget/windows/nsAppShell.cpp:262
9 	xul.dll 	nsAppStartup::Run() 	toolkit/components/startup/nsAppStartup.cpp:283
10 	xul.dll 	XREMain::XRE_mainRun() 	toolkit/xre/nsAppRunner.cpp:4477
11 	xul.dll 	XREMain::XRE_main(int, char** const, mozilla::BootstrapConfig const&) 	toolkit/xre/nsAppRunner.cpp:4654
12 	xul.dll 	XRE_main(int, char** const, mozilla::BootstrapConfig const&) 	toolkit/xre/nsAppRunner.cpp:4745
13 	xul.dll 	mozilla::BootstrapImpl::XRE_main(int, char** const, mozilla::BootstrapConfig const&) 	toolkit/xre/Bootstrap.cpp:45
14 	firefox.exe 	wmain 	toolkit/xre/nsWindowsWMain.cpp:115
15 	firefox.exe 	__scrt_common_main_seh 	f:/dd/vctools/crt/vcstartup/src/startup/exe_common.inl:253
16 	kernel32.dll 	BaseThreadInitThunk 	
17 	ntdll.dll 	__RtlUserThreadStart 	
18 	ntdll.dll 	_RtlUserThreadStart

this cross-platform crash signature is regressing in firefox 52 and upwards.
many of the user comments indicate that they have just submitted their email address in a web form field when the crash occurred.
potentially proxy. gary?
Flags: needinfo?(xeonchen)
I'm not that familiar with this part, so I may be wrong.

In the first glance, it crashed while calling |ProcessFailedProxyConnect(mRedirectType)|, where |mRedirectType| is |uint32_t|.
So it is probably because |this| is deleted?
But the operation of |mTransaction| above this line seems working, as well as |OnRedirectVerifyCallback|.

The potential call path from [1] to [2], the |mOldChan| is held by smart pointers all the time, so I can't see the reason it crashed.

I think we need help from someone more experienced.

[1] https://dxr.mozilla.org/mozilla-beta/rev/62467dd7218ffb48be7d4406d98ded5d0909fcfc/netwerk/base/nsBaseChannel.cpp#136
[2] https://dxr.mozilla.org/mozilla-beta/rev/62467dd7218ffb48be7d4406d98ded5d0909fcfc/netwerk/base/nsAsyncRedirectVerifyHelper.cpp#196
Flags: needinfo?(xeonchen) → needinfo?(mcmanus)
daniel, can you help gary?
Flags: needinfo?(mcmanus) → needinfo?(daniel)
I'll assist from my end
Ok, there are crashes on all OSes on all versions basically. Not terribly many, but still...

The crash addresses are usually in the 0x17f - 0x18f area, which I think hints at a null pointer object being used with a class member  referenced at that index. I suspect 'mTransacion' at the end of the nsHttpChannel::ContinueProcessResponse3() function in nsHttpChannel.cpp:

    LOG(("ContinueProcessResponse3 got failure result [rv=%" PRIx32 "]\n",
         static_cast<uint32_t>(rv)));
>>> if (mTransaction->ProxyConnectFailed()) {
        return ProcessFailedProxyConnect(mRedirectType);
    }
    return ProcessNormal();

I don't claim to see how the flow goes to *how* it ends up there with a null pointer, but I also see that there are a lot of functions in this source file that checks for mTransaction being null before using it...

Or am I barking up the completely wrong tree here?
Flags: needinfo?(daniel)
(In reply to Daniel Stenberg [:bagder] from comment #5)
> Ok, there are crashes on all OSes on all versions basically. Not terribly
> many, but still...
> 
> The crash addresses are usually in the 0x17f - 0x18f area, which I think
> hints at a null pointer object being used with a class member  referenced at
> that index. I suspect 'mTransacion' at the end of the
> nsHttpChannel::ContinueProcessResponse3() function in nsHttpChannel.cpp:
> 
>     LOG(("ContinueProcessResponse3 got failure result [rv=%" PRIx32 "]\n",
>          static_cast<uint32_t>(rv)));
> >>> if (mTransaction->ProxyConnectFailed()) {
>         return ProcessFailedProxyConnect(mRedirectType);
>     }
>     return ProcessNormal();
> 

Maybe I'm asking stupid questions...

What I can't understand here is how the crash addresses can help us?
I mean what exactly does that value mean in these crashes?

Or we can just feel it like a null-pointer access violation because the the range of the crash addresses are limited?

> I don't claim to see how the flow goes to *how* it ends up there with a null
> pointer, but I also see that there are a lot of functions in this source
> file that checks for mTransaction being null before using it...
> 

Yeah, that's easier, thanks!

> Or am I barking up the completely wrong tree here?
(In reply to Gary Chen [:xeonchen] (use ni? please) from comment #6)

> What I can't understand here is how the crash addresses can help us?
> I mean what exactly does that value mean in these crashes?

Let me try to explain my reasoning:

The crash is an EXCEPTION_ACCESS_VIOLATION_READ. Firefox read memory that we weren't allowed to. Where?

The reported address this happened on is 0x18f or 0x17f. Why would we ever read from such an address? I can think of one really good explanation: that's the offset into a object/struct to a member that we want to access. If the object pointer is NULL, the class is fairly big and the specific class member we trying to access is a bit towards the end of that big class it isn't inconceivable to imagine that 0x017f is the index into the object where "our" member lies.

Then we look at the stack trace where the crash occurs and it repeatedly points to a particular place in the code. And what do you know, just the source line *before* that spot in the code that keeps being identified as the crashing point there's a reference to a member of a class/object. Maybe that's the one?

Admittedly I haven't yet tried to verify that the pointer to 'ProxyConnectFailed()' is actually stored at index 0x17f (or 0x18f) of the 'mTransaction' object, but given a cursory look it seems to be the most likely explanation for what I've seen so far in the crash reports.

*If* ProxyConnectFailed() is at that index of the nsHttpTransaction class, then mTransaction pretty much has to have been NULL there to cause this.

This is based on reasoning around the data we have. It isn't conclusive or certain and I might very well have missed something important, but it seems reasonable to me.
Comment on attachment 8853609 [details]
Bug 1351340 - check null pointer before using;

https://reviewboard.mozilla.org/r/125690/#review128380
Attachment #8853609 - Flags: review?(daniel) → review+
Pushed by gachen@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/6e2af1bec3fd
check null pointer before using; r=bagder
Assignee: nobody → xeonchen
leave this open for a while and see if the crash rate decreases.
Keywords: leave-open
Whiteboard: [necko-active]
Comment on attachment 8853609 [details]
Bug 1351340 - check null pointer before using;

In the crash-stats, no more crash on Nightly for past 7 days, I think we can give it a try on aurora/beta.

Approval Request Comment
[Feature/Bug causing the regression]: N/A
[User impact if declined]: May randomly crash
[Is this code covered by automated tests?]: no
[Has the fix been verified in Nightly?]: yes
[Needs manual test from QE? If yes, steps to reproduce]: no
[List of other uplifts needed for the feature/fix]: no
[Is the change risky?]: not at all
[Why is the change risky/not risky?]: simply checks pointer before access.
[String changes made/needed]: no
Attachment #8853609 - Flags: approval-mozilla-beta?
Attachment #8853609 - Flags: approval-mozilla-aurora?
Too late for beta uplift, we are already building the release candidate and this doesn't look like a high volume crash or security issue that would block 53 release.
Attachment #8853609 - Flags: approval-mozilla-beta?
Attachment #8853609 - Flags: approval-mozilla-beta-
Attachment #8853609 - Flags: approval-mozilla-aurora?
Attachment #8853609 - Flags: approval-mozilla-aurora+
Calling this fixed based on comment 13.
Status: NEW → RESOLVED
Closed: 3 years ago
Keywords: leave-open
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
Attachment #8853609 - Flags: approval-mozilla-esr52?
(In reply to Gary Chen [:xeonchen] (use ni? please) from comment #13)
> [Is this code covered by automated tests?]: no
> [Has the fix been verified in Nightly?]: yes
> [Needs manual test from QE? If yes, steps to reproduce]: no

Setting qe-verify- based on Gary's assessment on manual testing needs.
Flags: qe-verify-
Comment on attachment 8853609 [details]
Bug 1351340 - check null pointer before using;

null check, there are a few crash reports on esr52, ESR52.2+
Attachment #8853609 - Flags: approval-mozilla-esr52? → approval-mozilla-esr52+
You need to log in before you can comment on or make changes to this bug.