Closed Bug 1206199 Opened 4 years ago Closed 4 years ago

Extend channelwrapper to mediate OnStartRequest, OnStopRequest, OnDataAvailable

Categories

(Core :: DOM: Security, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
mozilla44
Tracking Status
firefox42 --- fixed
firefox43 --- fixed
firefox44 --- fixed

People

(Reporter: ckerschb, Assigned: ckerschb)

References

Details

Attachments

(7 files, 8 obsolete files)

3.60 KB, patch
sicking
: review+
Details | Diff | Splinter Review
33.37 KB, text/plain
Details
88.36 KB, text/plain
Details
89.47 KB, text/plain
Details
6.04 KB, text/plain
Details
4.26 KB, text/plain
Details
5.98 KB, text/plain
Details
No description provided.
Assignee: nobody → mozilla
Blocks: 1120487
Blocks: 1205410
I added an interception class called SecWrapChannelStreamListener which allows us to mediate calls to:
* OnStartRequest()
* OnStopRequest()
* OnDataAvailable()

I manually tested several addons and they seem to work fine. Obviously we should test really carefully. Matt and Kamil from our QA team will help to verify that this patch works before we can consider uplifting to 42.

Anyway, Jonas what do you think? Are we missing something?
Attachment #8663237 - Flags: review?(jonas)
Comment on attachment 8663237 [details] [diff] [review]
bug_1206199_fix_nssecwrapchannel.patch

Review of attachment 8663237 [details] [diff] [review]:
-----------------------------------------------------------------

Have you thought through in detail what happens when there is a redirect? I.e. when we redirect from one channel to another, and either (or both) of the channels are wrapped with a secwrapper?
Attachment #8663237 - Flags: review?(jonas) → review+
Kamil, can you help verify this please? Christoph can help point you to the right addons to verify that it works.
QA Contact: kjozwiak
(In reply to Jonas Sicking (:sicking) from comment #2)
> Comment on attachment 8663237 [details] [diff] [review]
> bug_1206199_fix_nssecwrapchannel.patch
> 
> Review of attachment 8663237 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Have you thought through in detail what happens when there is a redirect?
> I.e. when we redirect from one channel to another, and either (or both) of
> the channels are wrapped with a secwrapper?

I thought about it and I think that shouldn't make a difference, because either way the channel is going to be created through our own iOService. So in case the addon implements it's own protocolhandler then the initial channel as well as the redirected channel are going to be created through the same protocolhandler. So if the first channel is wrapped, then the second channel is also wrapped. The only time that wouldn't be the case if an addon implements it's own protocolhandler but then immediately redirects to e.g. an http channel so that the first channel is wrapped but the redirected channel is not. I am not sure how common that is. Also, if a channel gets redirected, then the lifetime of the old channel ends at that time. I think we are just fine either way.
So I applied the patch from comment # 4 into my newly pulled m-c and made sure it was applied using "hg qapplied". As per our earlier conversation on IRC/Vidyo, Christoph also mentioned that I need to comment out the following line:

* http://mxr.mozilla.org/mozilla-central/source/netwerk/base/nsIOService.cpp#845

Once I commented the above line, the build compiled without any problems using ./mach build. I installed the geolocation add-on from bug # 1205410 comment # 15 and played around with that for a few minutes without any problems but once I installed the vwc_cocoon.xpi add-on from bug # 1205410 comment # 10 I got an instant crash. Whenever I try to re-run fx using ./mach run, I get an instant crash:

- bp-09be630f-d411-432d-8f64-3a4af2150929
- bp-13e9e964-16c2-4a1d-a67e-92cd42150929
- bp-41387cfa-30c7-4e30-848c-90f952150929
- bp-44c3c4b5-9789-4e32-88e6-aa7fb2150929
- bp-c1476533-93fb-4952-8121-f47812150929
- bp-d0b1bd60-4d58-4d90-9333-128a82150929

Christoph, could this be an issue with the patch or maybe due to me missing something? I'm going to try removing the patch and restore nsIOService.cpp#845 and see if I still get the above crashes.
I basically went through the same thing once again and got the following crash when I clicked on "restart" once the vwc_cocoon.xpi add-on was installed. At that point, fx became none responsive and ends up crashing: (seen this a few times already but not always)

- 29637867-53c2-482f-bd00-c74282150929

However, when I re-ran fx again using ./mach run, it didn't crash instantly this time around. However the crashes are not really consistent as I can't reproduce it 100% when I remove/re-install the vwc_cocoon.xpi add-on.
I also instantly crashed when I visit google.ca/maps but at least fx launched every single time without instantly crashing:

- 288762d2-187f-426b-9d01-9e2fc2150929
- e5d6aec2-2cdb-4dfd-9d98-403de2150929
- 6342811e-09aa-4e0f-a4d2-e34302150929

Once I removed geolocater which I got from bug # 1205410 comment # 15, fx started crashing instantly once launched with vwc_cocoon.xpi being the only add-on currently installed:

- bp-c1476533-93fb-4952-8121-f47812150929
- bp-d0b1bd60-4d58-4d90-9333-128a82150929
- bp-e5d6aec2-2cdb-4dfd-9d98-403de2150929

Apologies for the spam :) I'll let Christoph take a look at the crashes and see if he can make any sense from this.
Because these are local builds, the crash reports are unfortunately useless. I think we need to reproduce with a debug build and look at the crash stack.
Attached file bt.txt
I've attached the bt info as well, hopefully it's more helpful than the crashes I added above. Christoph recommended the following change into HttpBaseChannel.cpp which definitely fixed the instant crash I was getting. However I'm not sure if there's similar issues hiding somewhere else.

>> if (controller && !mForceNoIntercept && mLoadInfo) {

(lldb) process launch
Process 663 launched: '/Users/kjozwiak/code/moz/mozilla-central/objdir-ff-release/dist/Nightly.app/Contents/MacOS/firefox' (x86_64)
Done
Process 663 stopped
* thread #1: tid = 0x43e2, 0x00000001015e98b4 XUL`mozilla::net::HttpBaseChannel::ShouldIntercept(this=0x000000011dd3d800) + 212 at HttpBaseChannel.cpp:2276, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
    frame #0: 0x00000001015e98b4 XUL`mozilla::net::HttpBaseChannel::ShouldIntercept(this=0x000000011dd3d800) + 212 at HttpBaseChannel.cpp:2276
   2273	  GetCallback(controller);
   2274	  bool shouldIntercept = false;
   2275	  if (controller && !mForceNoIntercept) {
-> 2276	    nsContentPolicyType type = mLoadInfo->InternalContentPolicyType();
   2277	    nsresult rv = controller->ShouldPrepareForIntercept(mURI,
   2278	                                                        IsNavigation(),
   2279	                                                        type,
Yeah, that explains why it was working for me but not anymore. In the meantime Bug 1184798 landed which accesses mLoadInfo without doing a nullcheck. I filed Bug 1209694 to get that fixed.
Depends on: 1209694
Status: NEW → ASSIGNED
Depends on: 1208755
Probably related but I also get the following crash when clicking and typing inside the URL while the vwc_cocoon.xpi add-on is installed even with the fix in HttpBaseChannel.cpp as per comment # 9:

(lldb) process launch
Process 12794 launched: '/Users/kjozwiak/code/moz/mozilla-central/objdir-ff-release/dist/Nightly.app/Contents/MacOS/firefox' (x86_64)
Done
Process 12794 stopped
* thread #1: tid = 0xed06, 0x00000001014846c3 XUL`nsCOMPtr_base::assign_with_AddRef(nsISupports*) [inlined] nsCOMPtr_base::assign_assuming_AddRef(this=<unavailable>, aNewPtr=0x0000000134b37b20) + 11 at nsCOMPtr.h:335, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT)
    frame #0: 0x00000001014846c3 XUL`nsCOMPtr_base::assign_with_AddRef(nsISupports*) [inlined] nsCOMPtr_base::assign_assuming_AddRef(this=<unavailable>, aNewPtr=0x0000000134b37b20) + 11 at nsCOMPtr.h:335
   332 	    NSCAP_LOG_ASSIGNMENT(this, aNewPtr);
   333 	    NSCAP_LOG_RELEASE(this, oldPtr);
   334 	    if (oldPtr) {
-> 335 	      NSCAP_RELEASE(this, oldPtr);
   336 	    }
   337 	  }
   338 	};
Aurora/BETA Results:
====================

With the vwc_cocoon.xpi add-on, once it's installed via about:addons, pressing the "Restart" button under the extension doorhanger will cause the browser to instantly crash. Once you get the initial crash, fx will crash every single time it's launched until you launch fx into safe mode and disable the add-on. Now if you go through the same process but rather than restarting the browser, you close it via the "Hamburger" menu, fx will not crash once it's re-opened. However, visiting google.com/maps in this example will crash fx instantly.

When you're going through the same process while in debug mode, you'll get the same behaviour and the crashes point to the same one I mentioned in comment # 11. The only difference is when you restart via the doorhanger once the add-on is installed, you'll get an OS crash report rather than a fx crash report. I've attached the report to the bug as *OSXRestartCrashAurora.txt*.

I'm also seeing this happen in m-b but it's a lot harder to reproduce. I see the same crash I got in comment # 11 but it definitely doesn't happen as often under beta. Takes me several minutes to reproduce it (opening tabs, quickly typing into the URL etc..) rather than getting it instantly as I do in m-c and m-a. However I also got a OSX crash under beta as well and attached the report as *OSXCrashBETA.txt*.
Same thing is happening with the COMPUTERBILD-Abzockschutz-1.0.59.xpi add-on from bug # 1205410 comment # 10 under m-a and m-b. It either produces a OSX crash like I mentioned in comment # 12 (attached two .txt files earlier) or you receive the crash that's mentioned in comment # 11.
Philipp, this is another issue with addons that we may need your help to test soon. I know you helped us really well with the addons related fix in bug 1205410. N-I'ing you to see if we can get similar help on try builds that Christoph can share tomm/day after. Christoph can provide guidance on what to test and how.
Flags: needinfo?(madperson)
hi, my available time during the next days will be quite sparse. if it's something quick to test i'll gladly try to help - if it's not that straightforward i can take a closer look in the second half of next week.
Christoph, looks like the same crash from comment # 11 is still happening even after bug # 1208755 has been fixed. I double checked both "/libjar/nsJARChannel.cpp" and "/netwerk/protocol/http/HttpBaseChannel.cpp" and made sure the "&& mLoadInfo" changes were present in both files that I'm using. Looks like this is a different crash.

The easiest way to reproduce is installing "COMPUTERBILD-Abzockschutz-1.0.59.xpi", select the add-on icon under the toolbar and click on "Abzock-Seite melden". It will open a new tab and you'll get the crash in the same area as comment # 11.
Flags: needinfo?(mozilla)
(In reply to Kamil Jozwiak [:kjozwiak] from comment #17)
> Christoph, looks like the same crash from comment # 11 is still happening
> even after bug # 1208755 has been fixed. I double checked both
> "/libjar/nsJARChannel.cpp" and "/netwerk/protocol/http/HttpBaseChannel.cpp"
> and made sure the "&& mLoadInfo" changes were present in both files that I'm
> using. Looks like this is a different crash.
> 
> The easiest way to reproduce is installing
> "COMPUTERBILD-Abzockschutz-1.0.59.xpi", select the add-on icon under the
> toolbar and click on "Abzock-Seite melden". It will open a new tab and
> you'll get the crash in the same area as comment # 11.

Thanks Kamil for the update! I will investigate myself today and will provide a detailed list of how to test again so other people can help us out testing different addons next week.
Flags: needinfo?(mozilla)
Flags: needinfo?(madperson)
Attached file btComment11.txt
Thanks Christoph. I've attached the stacktrace of the crash that I've been getting from comment #11.

If anyone else is interested in testing this right now (philipp?) and see if they get the same results, you can use the following steps:

* NOTE: these steps are for OSX but should be similar to other OS's *

- hg pull http://hg.mozilla.org/mozilla-central/
- once you've pulled the code, apply Christoph's patch using "hg qimport bug_1206199_fix_nssecwrapchannel.patch"
- once the patch has been applied, comment out the following line: http://mxr.mozilla.org/mozilla-central/source/netwerk/base/nsIOService.cpp#845
- ensure that the patch is applied using "hg qapplied"
- ensure that nsIOService.cpp is being listed as a change file using "hg status"
- once you've applied the patch and commented out the line from nsIOService.cpp, build using "./mach build > out.txt 2>&1"
- "./mach run" once the build is complete and fx should launch

If you end up receiving a crash while installing/using the add-ons, please attach the stacktrace's using the following steps:

- ./mach run --debug (this should launch (lldb))
- once inside "(lldb)", launch fx using "process launch"
- reproduce the crash and you should see it appear under "(lldb)"
- get the stacktrace by typing "bt" after you receive the crash in lldb
(In reply to Kamil Jozwiak [:kjozwiak] from comment #19)
> Created attachment 8669064 [details]
> btComment11.txt
> 
> Thanks Christoph. I've attached the stacktrace of the crash that I've been
> getting from comment #11.
> 
> If anyone else is interested in testing this right now (philipp?) and see if
> they get the same results, you can use the following steps:
> 
> * NOTE: these steps are for OSX but should be similar to other OS's *
> 
> - hg pull http://hg.mozilla.org/mozilla-central/
> - once you've pulled the code, apply Christoph's patch using "hg qimport
> bug_1206199_fix_nssecwrapchannel.patch"
> - once the patch has been applied, comment out the following line:
> http://mxr.mozilla.org/mozilla-central/source/netwerk/base/nsIOService.
> cpp#845
> - ensure that the patch is applied using "hg qapplied"
> - ensure that nsIOService.cpp is being listed as a change file using "hg
> status"
> - once you've applied the patch and commented out the line from
> nsIOService.cpp, build using "./mach build > out.txt 2>&1"
> - "./mach run" once the build is complete and fx should launch
> 
> If you end up receiving a crash while installing/using the add-ons, please
> attach the stacktrace's using the following steps:
> 
> - ./mach run --debug (this should launch (lldb))
> - once inside "(lldb)", launch fx using "process launch"
> - reproduce the crash and you should see it appear under "(lldb)"
> - get the stacktrace by typing "bt" after you receive the crash in lldb

Looking at the stacktrace I realized that you have e10s enabled. I agree getting it fixed and working in e10s is important, but at the moment it seems to me we should test in regular mode so we can get the release version up and running first.
Jason, in case addons implement their own protocolHandler but do not provide an implementation for newChannel2 we are wrapping the channel so that we can attach a Loadinfo on the wrapped channel. We were facing a problem with our wrapper because we did not overwrite on OnStartRequest/OnStopRequest/OnDataAvailable and implementations were falling apart because the first argument of those methods was not the wrapped channel but rather the inner channel [see also 1 for more details]. Within this bug we are going to fix that problem. So far that seems to work (at least I did not see any crashes in regular mode), but we are still facing problems for e10s.

Looking at the stacktrace underneath could you guess what might go wrong? Interesting fact is that Kamil gets different crashes on OSX but also around HttpChannelParent::DoAsyncOpen() (see stacktrace from comment 19). Probably we need more help from Necko folks to get this problem resolved on current beta. Thanks!

[1] http://mxr.mozilla.org/mozilla-central/source/netwerk/base/nsIOService.cpp#767
[2] https://bugzilla.mozilla.org/show_bug.cgi?id=1205410#c1


0x00007f0fceb9bf3d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
#1  0x00007f0fceb9bdd4 in __sleep (seconds=0) at ../sysdeps/unix/sysv/linux/sleep.c:137
#2  0x00007f0fc142ba26 in ah_crap_handler (signum=11) at /home/ckerschb/moz/mc/toolkit/xre/nsSigHandlers.cpp:103
#3  0x00007f0fc1409fd0 in nsProfileLock::FatalSignalHandler (signo=11, info=0x7ffed97a66f0, context=0x7ffed97a65c0) at /home/ckerschb/moz/mc/toolkit/profile/nsProfileLock.cpp:195
#4  0x00007f0fc2450cbf in AsmJSFaultHandler (signum=11, info=0x7ffed97a66f0, context=0x7ffed97a65c0) at /home/ckerschb/moz/mc/js/src/asmjs/AsmJSSignalHandlers.cpp:1160
#5  <signal handler called>
#6  0x00007f0fbc6b379c in NS_LogCOMPtrRelease (aCOMPtr=0x7f0f9f56bbd8, aObject=0x7f0f9e6ea7a0) at /home/ckerschb/moz/mc/xpcom/base/nsTraceRefcnt.cpp:1318
#7  0x00007f0fbc74ac08 in nsCOMPtr<nsIURI>::assign_assuming_AddRef (this=0x7f0f9f56bbd8, aNewPtr=0x7f0f9c160700) at ../../dist/include/nsCOMPtr.h:378
#8  0x00007f0fbc7a7b96 in nsCOMPtr<nsIURI>::assign_with_AddRef (this=0x7f0f9f56bbd8, aRawPtr=0x7f0f9c160700) at ../dist/include/nsCOMPtr.h:1066
#9  0x00007f0fbc7a64af in nsCOMPtr<nsIURI>::operator= (this=0x7f0f9f56bbd8, aRhs=0x7f0f9c160700) at ../dist/include/nsCOMPtr.h:578
#10 0x00007f0fbcb2e460 in mozilla::net::HttpBaseChannel::SetDocumentURI (this=0x7f0f9f56bb08, aDocumentURI=0x7f0f9c160700)
    at /home/ckerschb/moz/mc/netwerk/protocol/http/HttpBaseChannel.cpp:1722
#11 0x00007f0fbcb435eb in mozilla::net::HttpChannelParent::DoAsyncOpen (this=0x7f0f9bf4a700, aURI=..., aOriginalURI=..., aDocURI=..., aReferrerURI=...,
    aReferrerPolicy=@0x7ffed97a8838: 0, aAPIRedirectToURI=..., aTopWindowURI=..., aLoadFlags=@0x7ffed97a8860: 6881280, requestHeaders=..., requestMethod=..., uploadStream=...,
    uploadStreamHasHeaders=@0x7ffed97a8890: false, priority=@0x7ffed97a8892: 0, classOfService=@0x7ffed97a8894: 0, redirectionLimit=@0x7ffed97a8898: 20 '\024',
    allowPipelining=@0x7ffed97a8899: true, allowSTS=@0x7ffed97a889a: true, thirdPartyFlags=@0x7ffed97a889c: 4, doResumeAt=@0x7ffed97a88a0: false,
    startPos=@0x7ffed97a88a8: 18446744073709551615, entityID=..., chooseApplicationCache=@0x7ffed97a88c0: true, appCacheClientID=..., allowSpdy=@0x7ffed97a88d8: true,
    allowAltSvc=@0x7ffed97a88d9: true, aFds=..., aLoadInfoArgs=..., aSynthesizedResponseHead=..., aSecurityInfoSerialization=..., aCacheKey=@0x7ffed97a8a28: 0,
    aSchedulingContextID=..., aCorsPreflightArgs=..., aInitialRwin=@0x7ffed97a8aa0: 0) at /home/ckerschb/moz/mc/netwerk/protocol/http/HttpChannelParent.cpp:434
#12 0x00007f0fbcb42b88 in mozilla::net::HttpChannelParent::Init (this=0x7f0f9bf4a700, aArgs=...) at /home/ckerschb/moz/mc/netwerk/protocol/http/HttpChannelParent.cpp:123
#13 0x00007f0fbcc515d1 in mozilla::net::NeckoParent::RecvPHttpChannelConstructor (this=0x7f0f96d73180, aActor=0x7f0f9bf4a700, aBrowser=..., aSerialized=..., aOpenArgs=...)
    at /home/ckerschb/moz/mc/netwerk/ipc/NeckoParent.cpp:249
#14 0x00007f0fbd0a96e4 in mozilla::net::PNeckoParent::OnMessageReceived (this=0x7f0f96d73180, msg__=...) at ./PNeckoParent.cpp:1058
#15 0x00007f0fbd4f6a28 in mozilla::dom::PContentParent::OnMessageReceived (this=0x7f0fb0284c00, msg__=...) at ./PContentParent.cpp:3364
#16 0x00007f0fbce1a1a6 in mozilla::ipc::MessageChannel::DispatchAsyncMessage (this=0x7f0fb0284c68, aMsg=...) at /home/ckerschb/moz/mc/ipc/glue/MessageChannel.cpp:1353
#17 0x00007f0fbce19555 in mozilla::ipc::MessageChannel::DispatchMessage (this=0x7f0fb0284c68, aMsg=...) at /home/ckerschb/moz/mc/ipc/glue/MessageChannel.cpp:1279
#18 0x00007f0fbce153cf in mozilla::ipc::MessageChannel::OnMaybeDequeueOne (this=0x7f0fb0284c68) at /home/ckerschb/moz/mc/ipc/glue/MessageChannel.cpp:1250
#19 0x00007f0fbce35393 in DispatchToMethod<mozilla::ipc::MessageChannel, bool (mozilla::ipc::MessageChannel::*)()> (obj=0x7f0fb0284c68,
    method=(bool (mozilla::ipc::MessageChannel::*)(mozilla::ipc::MessageChannel * const)) 0x7f0fbce15240 <mozilla::ipc::MessageChannel::OnMaybeDequeueOne()>, arg=...)
    at /home/ckerschb/moz/mc/ipc/chromium/src/base/tuple.h:387
#20 0x00007f0fbce3528e in RunnableMethod<mozilla::ipc::MessageChannel, bool (mozilla::ipc::MessageChannel::*)(), Tuple0>::Run (this=0x7f0fa28d6480)
    at /home/ckerschb/moz/mc/ipc/chromium/src/base/task.h:323
#21 0x00007f0fbce37508 in mozilla::ipc::MessageChannel::RefCo
Flags: needinfo?(jduell.mcbugs)
Investigating further I actually hit:
>  ENSURE_CALLED_BEFORE_CONNECT();
How can I resolve this? And can I resolve this at all? Probably that's not even related to the channelwrapper. I see it was added over in Bug 811669. Not sure what to do about this, but definitely worth mentioning here int he bug.
Philipp, Kamil, I was not able to reproduce any of the errors in regular (non e10s) mode. Here is a try push (where I also commented out the ASSERTION so we can test debug builds as well):
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=5c92c25e202f

We should soon have a bigger list of addons we can test, but for now we can focus on:
* vwc_cocoon.xpi
* COMPUTERBILD-Abzockschutz-1.0.59.xpi
* https://addons.mozilla.org/firefox/addon/geolocater/ 

Thanks for your help and please keep in mind that stacktraces are most useful.
Flags: needinfo?(madperson)
Flags: needinfo?(kjozwiak)
Attached file linuxCrashBT.txt
Thanks Christoph, I'll go through the try builds and see if I can reproduce any crashes. BTW, I tried the same thing on the weekend with my Ubuntu machine and got the following crash several times:

[5542] ###!!! ASSERTION: Got a channel redirect for an unknown channel!: 'mChannel == oldChannel', file /home/kamil/code/mozilla-central/image/imgRequest.cpp, line 1205

I'm not sure if it's related to this issue but I've attached the trace just in case.
Flags: needinfo?(kjozwiak)
(In reply to Kamil Jozwiak [:kjozwiak] from comment #24)
> Created attachment 8669777 [details]
> linuxCrashBT.txt
> 
> Thanks Christoph, I'll go through the try builds and see if I can reproduce
> any crashes. BTW, I tried the same thing on the weekend with my Ubuntu
> machine and got the following crash several times:
> 
> [5542] ###!!! ASSERTION: Got a channel redirect for an unknown channel!:
> 'mChannel == oldChannel', file
> /home/kamil/code/mozilla-central/image/imgRequest.cpp, line 1205
> 
> I'm not sure if it's related to this issue but I've attached the trace just
> in case.

Kamil, this is exactly that kind of crash we need to fix - can you somehow reproduce that problem reliably? If so, please post it here so I can try that myself.
Flags: needinfo?(kjozwiak)
Keywords: leave-open
Here is a WIP patch on how we could mediate AsyncOnChannelRedirect. It's not working properly yet.

Steps to reproduce the error:
* Comment out the assertion within ioService:
  http://mxr.mozilla.org/mozilla-central/source/netwerk/base/nsIOService.cpp#852
* Install COMPUTERBILD-Abzockschutz-1.0.59.xpi
* Click the addon button in the right corner and select any of the items in the drop down menu.
* Observe that the assertion within imgRequest gets triggered:
  http://mxr.mozilla.org/mozilla-central/source/image/imgRequest.cpp#1204

Expected results:
* The assertion should not be triggered.
> Kamil, this is exactly that kind of crash we need to fix - can you somehow
> reproduce that problem reliably? If so, please post it here so I can try
> that myself.

I tried reproducing the crash reliably but I only get it here and there. However, I did get a new crash several times when trying to reproduce the one I mentioned in comment # 24.

Program received signal SIGSEGV, Segmentation fault.
0x00007fffea269854 in mozilla::net::HttpChannelParent::DoAsyncOpen (this=this@entry=0x7fffb3e9fe80, aURI=..., aOriginalURI=..., aDocURI=..., aReferrerURI=..., 
    aReferrerPolicy=@0x7fffffffbfe0: 0, aAPIRedirectToURI=..., aTopWindowURI=..., aLoadFlags=@0x7fffffffc008: 6898368, requestHeaders=..., requestMethod=..., uploadStream=..., 
    uploadStreamHasHeaders=@0x7fffffffc038: false, priority=@0x7fffffffc03a: 20, classOfService=@0x7fffffffc03c: 0, redirectionLimit=@0x7fffffffc040: 20 '\024', 
    allowPipelining=@0x7fffffffc041: true, allowSTS=@0x7fffffffc042: true, thirdPartyFlags=@0x7fffffffc044: 4, doResumeAt=@0x7fffffffc048: false, 
    startPos=@0x7fffffffc050: 18446744073709551615, entityID=..., chooseApplicationCache=@0x7fffffffc068: true, appCacheClientID=..., allowSpdy=@0x7fffffffc080: true, 
    allowAltSvc=@0x7fffffffc081: true, aFds=..., aLoadInfoArgs=..., aSynthesizedResponseHead=..., aSecurityInfoSerialization=..., aCacheKey=@0x7fffffffc1d0: 0, 
    aSchedulingContextID=..., aCorsPreflightArgs=..., aInitialRwin=@0x7fffffffc248: 0) at /home/kamil/code/mozilla-central/netwerk/protocol/http/HttpChannelParent.cpp:423
423	  mChannel->SetTimingEnabled(true);

Here's the steps that I used for the above crash and the crash in comment # 24:

- setup the environment using the steps I added in comment # 19
- disabled e10s via about:preferences (double checked to make sure it was disabled)
- xpinstall.signatures.required;false
- install both vwc_cocoon.xpi and COMPUTERBILD-Abzockschutz-1.0.59.xpi and restart fx
- Click the COMPUTERBILD-Abzockschutz-1.0.59.xpi add-on button in the right corner and select any of the items in the drop down menu
- quickly close the tab that was opened and re-open a new tab using the "+"

Usually doing the above will cause some kind of crash.
Flags: needinfo?(kjozwiak)
We also tested across platforms using steps provided by Kamil in comment 19 on Mac and Linux.
We installed the three add-ons (geolocater, cocoon, and abzockschutz), and we managed to get some crashes.
We had some technical difficulties building Firefox on Mac OS X 10.8.5 but I think we can provide some stacks there as well later today. Please check this etherpad for the stacks: https://etherpad-mozilla.org/1206199.
(In reply to Kamil Jozwiak [:kjozwiak] from comment #28)
> Created attachment 8670072 [details]
> HttpChannelParent.cpp:423.txt
> 
> > Kamil, this is exactly that kind of crash we need to fix - can you somehow
> > reproduce that problem reliably? If so, please post it here so I can try
> > that myself.
> 
> I tried reproducing the crash reliably but I only get it here and there.
> However, I did get a new crash several times when trying to reproduce the
> one I mentioned in comment # 24.
> 
> Program received signal SIGSEGV, Segmentation fault.
> 0x00007fffea269854 in mozilla::net::HttpChannelParent::DoAsyncOpen
> (this=this@entry=0x7fffb3e9fe80, aURI=..., aOriginalURI=..., aDocURI=...,
> aReferrerURI=..., 
>     aReferrerPolicy=@0x7fffffffbfe0: 0, aAPIRedirectToURI=...,
> aTopWindowURI=..., aLoadFlags=@0x7fffffffc008: 6898368, requestHeaders=...,
> requestMethod=..., uploadStream=..., 
>     uploadStreamHasHeaders=@0x7fffffffc038: false, priority=@0x7fffffffc03a:
> 20, classOfService=@0x7fffffffc03c: 0, redirectionLimit=@0x7fffffffc040: 20
> '\024', 
>     allowPipelining=@0x7fffffffc041: true, allowSTS=@0x7fffffffc042: true,
> thirdPartyFlags=@0x7fffffffc044: 4, doResumeAt=@0x7fffffffc048: false, 
>     startPos=@0x7fffffffc050: 18446744073709551615, entityID=...,
> chooseApplicationCache=@0x7fffffffc068: true, appCacheClientID=...,
> allowSpdy=@0x7fffffffc080: true, 
>     allowAltSvc=@0x7fffffffc081: true, aFds=..., aLoadInfoArgs=...,
> aSynthesizedResponseHead=..., aSecurityInfoSerialization=...,
> aCacheKey=@0x7fffffffc1d0: 0, 
>     aSchedulingContextID=..., aCorsPreflightArgs=...,
> aInitialRwin=@0x7fffffffc248: 0) at
> /home/kamil/code/mozilla-central/netwerk/protocol/http/HttpChannelParent.cpp:
> 423
> 423	  mChannel->SetTimingEnabled(true);
> 
> Here's the steps that I used for the above crash and the crash in comment #
> 24:
> 
> - setup the environment using the steps I added in comment # 19
> - disabled e10s via about:preferences (double checked to make sure it was
> disabled)
> - xpinstall.signatures.required;false
> - install both vwc_cocoon.xpi and COMPUTERBILD-Abzockschutz-1.0.59.xpi and
> restart fx
> - Click the COMPUTERBILD-Abzockschutz-1.0.59.xpi add-on button in the right
> corner and select any of the items in the drop down menu
> - quickly close the tab that was opened and re-open a new tab using the "+"
> 
> Usually doing the above will cause some kind of crash.


Sorry for asking again, are you sure that e10s is disabled? HttpChannelParent::DoAsyncOpen should not be called if e10s is not on, really not - if it is called this is a bigger problem.
When you disable e10s in preference, have you restarted firefox? (the pref is change only if it is restared after changeing pref)
Flags: needinfo?(kjozwiak)
> Sorry for asking again, are you sure that e10s is disabled?
> HttpChannelParent::DoAsyncOpen should not be called if e10s is not on,
> really not - if it is called this is a bigger problem.
> When you disable e10s in preference, have you restarted firefox? (the pref
> is change only if it is restared after changeing pref)

Yup, 100% that it was disabled. I've had the same problem on OSX when I was helping Christoph debug. This is usually the workflow before I test the add-ons:

- pull in the new changes
- apply the comment/patch and build using ./mach build (make sure everything is applied using hg status & hg qapplied)
- ./mach run once the build is finished and disable e10s via about:preferences and close fx
- ./mach run --debug and than "r" under "gdb"
- install the add-ons (restart each time it's installed)
- checked about:preferences once again to make sure that e10s is disabled
- once fx gets into a state where it crashes on startup, I run ./mach run -safe-mode and remove the add-ons (also check and make sure e10s is still disabled)
- once the add-ons have been removed, I go through the same thing once again

Now I'm not sure if there's a build flag I'm supposed to use under .mozconfig if I want to completely disable e10s? Is there a better way to make sure it's being disabled? Initially I thought it could be because ./mach might use different profiles, but when I change "xpinstall.signatures.required;false" so I can install the add-ons, it sticks around so I'm pretty sure it's using the same profile.

Suggestions? Christoph and myself tried to figure out why it was happening but we couldn't figure it out. I've ran into the same thing with OSX. Sometimes I get crashes that included HttpChannelParent (which Chris mentioned are e10s related), but about:preferences shows that the feature has being disabled.
Flags: needinfo?(kjozwiak)
(In reply to Dragana Damjanovic [:dragana] from comment #31)
> HttpChannelParent::DoAsyncOpen should not be called if e10s is not on,
> really not - if it is called this is a bigger problem.

I can confirm that I have seen appearing HttpChannelParent::DoAsyncOpen() in the stacktrace and I am a 100% certain I have switched off e10s. So something must go wrong there as well. Thanks for investigating Dragana. Where you able to reproduce that as well? The stacktraces from Comment 29 also indicates that there is something fundamentally wrong.
Can you please help me debug this, I cannot reproduce it. It is really not good that there is HttpChannelParent on non-e10s build.

Can you set a break point at:
http://mxr.mozilla.org/mozilla-central/source/netwerk/protocol/http/nsHttpHandler.cpp#1909
This is the place there we decide if it is e10s or not e10s.

and step into function IsNeckoChild()

or just make break inside here http://mxr.mozilla.org/mozilla-central/source/netwerk/ipc/NeckoCommon.h#98
and see what value is set to "amChild" variable.

thanks a lot.
Flags: needinfo?(kjozwiak)
(In reply to Dragana Damjanovic [:dragana] from comment #34)
> Can you please help me debug this, I cannot reproduce it. It is really not
> good that there is HttpChannelParent on non-e10s build.
> 
> Can you set a break point at:
> http://mxr.mozilla.org/mozilla-central/source/netwerk/protocol/http/
> nsHttpHandler.cpp#1909
> This is the place there we decide if it is e10s or not e10s.
> 
> and step into function IsNeckoChild()

Breakpoint 1, mozilla::net::nsHttpHandler::NewProxiedChannel2 (this=0x7fffdc44f800, uri=0x7fffc5f8db00, givenProxyInfo=0x0, proxyResolveFlags=0, proxyURI=0x0, aLoadInfo=
    0x7fffc548fe10, result=0x7fffffff7380) at /home/kamil/code/mozilla-central/netwerk/protocol/http/nsHttpHandler.cpp:1909
1909	    if (IsNeckoChild()) {
(gdb) step
mozilla::net::IsNeckoChild () at ../../dist/include/mozilla/net/NeckoCommon.h:92
92	  if (!didCheck) {
(gdb) p amChild
$1 = false
(gdb) info b
Num     Type           Disp Enb Address            What
1       breakpoint     keep y   0x00007fffea246564 in mozilla::net::nsHttpHandler::NewProxiedChannel2(nsIURI*, nsIProxyInfo*, unsigned int, nsIURI*, nsILoadInfo*, nsIChannel**) at /home/kamil/code/mozilla-central/netwerk/protocol/http/nsHttpHandler.cpp:1909
	breakpoint already hit 3 times
(gdb) del 1
(gdb) continue

After opening a few tabs and closing them right away, I ran into the same crash:

Program received signal SIGSEGV, Segmentation fault.
0x00007fffea269854 in mozilla::net::HttpChannelParent::DoAsyncOpen (this=this@entry=0x7fffcf293300, aURI=..., aOriginalURI=..., aDocURI=..., aReferrerURI=..., 
    aReferrerPolicy=@0x7fffffffbfe0: 0, aAPIRedirectToURI=..., aTopWindowURI=..., aLoadFlags=@0x7fffffffc008: 6898368, requestHeaders=..., requestMethod=..., uploadStream=..., 
    uploadStreamHasHeaders=@0x7fffffffc038: false, priority=@0x7fffffffc03a: 20, classOfService=@0x7fffffffc03c: 0, redirectionLimit=@0x7fffffffc040: 20 '\024', 
    allowPipelining=@0x7fffffffc041: true, allowSTS=@0x7fffffffc042: true, thirdPartyFlags=@0x7fffffffc044: 4, doResumeAt=@0x7fffffffc048: false, 
    startPos=@0x7fffffffc050: 18446744073709551615, entityID=..., chooseApplicationCache=@0x7fffffffc068: true, appCacheClientID=..., allowSpdy=@0x7fffffffc080: true, 
    allowAltSvc=@0x7fffffffc081: true, aFds=..., aLoadInfoArgs=..., aSynthesizedResponseHead=..., aSecurityInfoSerialization=..., aCacheKey=@0x7fffffffc1d0: 0, 
    aSchedulingContextID=..., aCorsPreflightArgs=..., aInitialRwin=@0x7fffffffc248: 0) at /home/kamil/code/mozilla-central/netwerk/protocol/http/HttpChannelParent.cpp:423
423	  mChannel->SetTimingEnabled(true);

> or just make break inside here
> http://mxr.mozilla.org/mozilla-central/source/netwerk/ipc/NeckoCommon.h#98
> and see what value is set to "amChild" variable.

Breakpoint 1, IsNeckoChild () at ../../dist/include/mozilla/net/NeckoCommon.h:98
98	      amChild = XRE_IsContentProcess();
(gdb) whatis amChild
type = bool
(gdb) p amChild
$1 = false
(gdb) continue

After opening a few tabs and closing them right away, I ran into the same crash:

Program received signal SIGSEGV, Segmentation fault.
0x00007fffea269854 in mozilla::net::HttpChannelParent::DoAsyncOpen (this=this@entry=0x7fffb426cfc0, aURI=..., aOriginalURI=..., aDocURI=..., aReferrerURI=..., 
    aReferrerPolicy=@0x7fffffffbfe0: 0, aAPIRedirectToURI=..., aTopWindowURI=..., aLoadFlags=@0x7fffffffc008: 6898368, requestHeaders=..., requestMethod=..., uploadStream=..., 
    uploadStreamHasHeaders=@0x7fffffffc038: false, priority=@0x7fffffffc03a: 20, classOfService=@0x7fffffffc03c: 0, redirectionLimit=@0x7fffffffc040: 20 '\024', 
    allowPipelining=@0x7fffffffc041: true, allowSTS=@0x7fffffffc042: true, thirdPartyFlags=@0x7fffffffc044: 4, doResumeAt=@0x7fffffffc048: false, 
    startPos=@0x7fffffffc050: 18446744073709551615, entityID=..., chooseApplicationCache=@0x7fffffffc068: true, appCacheClientID=..., allowSpdy=@0x7fffffffc080: true, 
    allowAltSvc=@0x7fffffffc081: true, aFds=..., aLoadInfoArgs=..., aSynthesizedResponseHead=..., aSecurityInfoSerialization=..., aCacheKey=@0x7fffffffc1d0: 0, 
    aSchedulingContextID=..., aCorsPreflightArgs=..., aInitialRwin=@0x7fffffffc248: 0) at /home/kamil/code/mozilla-central/netwerk/protocol/http/HttpChannelParent.cpp:423
423	  mChannel->SetTimingEnabled(true);

Hopefully this helps! Let me know if I've missed something :)
Flags: needinfo?(kjozwiak)
I also checked if e10s was disabled via the browser console using the following method:

- ./mach run --debug
- (gdb) r
- go into about:config and set "devtools.chrome.enabled;true"
- launch the browser console from the about:config tab via the Dev Tools
- Services.appinfo.browserTabsRemoteAutostart --> false
Sorry once again: can you see what  is the value of amChild at the end of function IsNeckoChild (http://mxr.mozilla.org/mozilla-central/source/netwerk/ipc/NeckoCommon.h#101)

I am interested if httpChannel = new HttpChannelChild();
 at http://mxr.mozilla.org/mozilla-central/source/netwerk/protocol/http/nsHttpHandler.cpp#1910 is call or 2 lines down httpChannel = new nsHttpChannel();


if you print amChild at the line 98 it is still not set so you need to print it after that line.


Thanks a lot!!!
Flags: needinfo?(kjozwiak)
When we do redirect callbacks are moved from one channel to the other and if there is SecWrapChannelEventSink on the old  channel callbacks chain it will be attached to the new channel as well and it should not be attached.
Verifying the check for imgRequest, I have notice this:
[37123] ###!!! ASSERTION: Wrong Document Channel: 'request == mDocumentRequest', file /home/dragana/dragana_work/necko/mozilla-central/uriloader/base/nsDocLoader.cpp, line 1384

I have noScript addon installed as well,(it does a redirect manually from the addon so I wanted to check if this new redirect wrapper does not brake it). 

We should also find an addod that uses nsIHttpChannel.redirectTo just to be sure that we do not break something.
Maybe we should check https Everywhere and donottrackplus as well. I think they do redirects too, but I have not look at their code..
Comment on attachment 8670535 [details] [diff] [review]
bug_1206199_fix_nssecwrapchannel_mediate_asynconchannelredirect.patch fix redirect

Review of attachment 8670535 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks Dragana for localizing the problem and updating the patch. As a general note I am not sure about the assumptions I made within ::SetNotificationCallbacks and also ::GetNotificationCallbacks, maybe that is not the right thing to do there.

::: netwerk/base/nsSecCheckWrapChannel.cpp
@@ +208,5 @@
> +
> +NS_IMETHODIMP
> +SecWrapChannelEventSink::GetCallbacks(nsIInterfaceRequestor **aCallbacks)
> +{
> +  NS_IF_ADDREF(*aCallbacks = mNotificationCallbacks);

return NS_OK;

::: netwerk/protocol/http/HttpBaseChannel.cpp
@@ +2436,5 @@
> +  nsCOMPtr<nsISecCheckEventSinkWrap> wrap = do_QueryInterface(mCallbacks);
> +  if (wrap) {
> +    wrap->GetCallbacks(getter_AddRefs(callbacks));
> +  }
> +  newChannel->SetNotificationCallbacks(callbacks);

I think ideally we would have a fix within the channelwrapper code, because otherwise we might have to chase so many locations in the codebase (and will definitely miss one or the other).

I think your feedback in comment 39 also indicates that there is another similar issue somewhere else in the codebase.

Jonas, can you have a look at the patch and provide some feedback as well?
Attachment #8670535 - Flags: feedback?(jonas)
Comment on attachment 8663237 [details] [diff] [review]
bug_1206199_fix_nssecwrapchannel.patch

> Approval Request Comment
This is going to be the first patch that we need to uplift to aurora and beta so addons implementing their own protocolHandlers do not crash. We already verified that this patch solves many corner cases so the browser does not crash anymore. Please not there are still some other corner cases we are tracing in the other patches within that bug, but we should not wait with uplifting verified fixes.

> [Feature/regressing bug #]:
https://bugzilla.mozilla.org/show_bug.cgi?id=1120487

Please not that we backed out the channelwrapper from 41 within:
https://bugzilla.mozilla.org/show_bug.cgi?id=1205410

> [User impact if declined]:
Users installing addons that implement their own protocolHandlers will crash the browser.

> [Describe test coverage new/current, TreeHerder]:
We manually tested several addons including:
* vwc_cocoon.xpi
* COMPUTERBILD-Abzockschutz-1.0.59.xpi
* https://addons.mozilla.org/firefox/addon/geolocater/ 
and verified fixes of the known issue within onStartRequest/OnStopRequest/OnDataAvailable.

> [Risks and why]:
Low, because the patch only applies to users that install addons which implement their own protocol handlers (on AMO we where able to find ~150 such addons).

> [String/UUID change made/needed]:
no
Attachment #8663237 - Flags: approval-mozilla-beta?
Attachment #8663237 - Flags: approval-mozilla-aurora?
Hey Kamil, interestingly I can't reproduce the issue with 'HttpChannelParent::DoAsyncOpen' when e10s is disabled anymore. Here is a patch which you could apply which would provide a stacktrace which would help us localizing that issue. Thanks again for helping us out here.
Attachment #8670535 - Attachment is obsolete: true
Attachment #8670535 - Flags: feedback?(jonas)
Honza, can you review this. You are the best person to review redirects.
Attachment #8670736 - Flags: feedback?(jonas)
Honza, can you review this. Thanks.
Attachment #8670736 - Attachment is obsolete: true
Attachment #8670736 - Flags: feedback?(jonas)
Flags: needinfo?(honzab.moz)
Attachment #8670738 - Flags: feedback?(jonas)
I am putting it here not to get lost. Probably the reason is new fake LoadInfo

With COMPUTERBILD-Abzockschutz-1.0.59.xpi, I can reproduce this crash:

0x00007fffe98b4b5c in mozilla::BasePrincipal::GetAppId (this=<optimized out>, 
    aAppId=<optimized out>)
    at /home/dragana/dragana_work/necko/mozilla-central/caps/BasePrincipal.cpp:389
389	    MOZ_ASSERT(false);
(gdb) bt
#0  0x00007fffe98b4b5c in mozilla::BasePrincipal::GetAppId (this=<optimized out>, 
    aAppId=<optimized out>)
    at /home/dragana/dragana_work/necko/mozilla-central/caps/BasePrincipal.cpp:389
#1  0x00007fffe913662a in NS_IsAppOffline (principal=<optimized out>)
    at /home/dragana/dragana_work/necko/mozilla-central/netwerk/base/nsNetUtil.cpp:1419
#2  0x00007fffea4dd0f6 in nsHTMLDNSPrefetch::IsAllowed (aDocument=0x7fffca5c2600)
    at /home/dragana/dragana_work/necko/mozilla-central/dom/html/nsHTMLDNSPrefetch.cpp:100
#3  0x00007fffea463e4d in mozilla::dom::HTMLAnchorElement::BindToTree (this=0x7fffcf6ee100, 
    aDocument=<optimized out>, aParent=0x7fffca311060, aBindingParent=0x0, 
    aCompileEventHandlers=<optimized out>)
    at /home/dragana/dragana_work/necko/mozilla-central/dom/html/HTMLAnchorElement.cpp:163
#4  0x00007fffe9c49137 in nsINode::doInsertChildAt (this=0x7fffca311060, aKid=0x7fffcf6ee100, 
    aIndex=5, aNotify=<optimized out>, aChildArray=...)
    at /home/dragana/dragana_work/necko/mozilla-central/dom/base/nsINode.cpp:1589
#5  0x00007fffe990cef1 in nsHtml5TreeOperation::Append (aNode=0x7fffcf6ee100, 
    aParent=0x7fffca311060, aBuilder=aBuilder@entry=0x7fffca565b00)
    at /home/dragana/dragana_work/necko/mozilla-central/parser/html/nsHtml5TreeOperation.cpp:181
#6  0x00007fffe990eaeb in nsHtml5TreeOperation::Perform (this=this@entry=0x7fffc18cbf98, 
    aBuilder=aBuilder@entry=0x7fffca565b00, 
    aScriptElement=aScriptElement@entry=0x7fffffffcc98)
    at /home/dragana/dragana_work/necko/mozilla-central/parser/html/nsHtml5TreeOperation.cpp:647
#7  0x00007fffe9901a5f in nsHtml5TreeOpExecutor::RunFlushLoop (this=0x7fffca565b00)
    at /home/dragana/dragana_work/necko/mozilla-central/parser/html/nsHtml5TreeOpExecutor.cpp:448
#8  0x00007fffe990c0d2 in nsHtml5ExecutorFlusher::Run (this=0x7fffc320fbe0)
    at /home/dragana/dragana_work/necko/mozilla-central/parser/html/nsHtml5StreamParser.cpp:127
#9  0x00007fffe90a91aa in nsThread::ProcessNextEvent (this=0x7ffff6876400, 
    aMayWait=<optimized out>, aResult=0x7fffffffcd9f)
Comment on attachment 8670535 [details] [diff] [review]
bug_1206199_fix_nssecwrapchannel_mediate_asynconchannelredirect.patch fix redirect

Review of attachment 8670535 [details] [diff] [review]:
-----------------------------------------------------------------

::: netwerk/base/nsSecCheckWrapChannel.cpp
@@ +185,5 @@
> +  // if it wants to get notifications for them.  But we
> +  // need to see these notifications for proper functioning.
> +  if (aIID.Equals(NS_GET_IID(nsIChannelEventSink))) {
> +    mChannelEventSink = do_GetInterface(mNotificationCallbacks);
> +    *aResult = static_cast<nsIChannelEventSink*>(this);

Shouldn't we only return something here if the do_GetInterface above returned a non-null value?

I guess for the most part it doesn't change behavior to have a dummy nsIChannelEventSink which doesn't do anything, so it's probably ok.

But theoretically someone could attach a nsIChannelEventSink to the notification callbacks on the loadgroup, which we wouldn't fall back to if GetInterface on the channel succeeds.

@@ +189,5 @@
> +    *aResult = static_cast<nsIChannelEventSink*>(this);
> +    NS_ADDREF_THIS();
> +    return NS_OK;
> +  }
> +  return QueryInterface(aIID, aResult);

Shouldn't we defer to mNotificationCallbacks here? I.e.

return mNotificationCallbacks->GetInterface(aIID, aResult);

@@ +198,5 @@
> +                                                nsIChannel* newChannel,
> +                                                uint32_t redirFlags,
> +                                                nsIAsyncVerifyRedirectCallback *cb)
> +{
> +  if (!mChannelEventSink) {

If you do the above, then mChannelEventSink should always be non-null here I think.

::: netwerk/protocol/http/HttpBaseChannel.cpp
@@ +2436,5 @@
> +  nsCOMPtr<nsISecCheckEventSinkWrap> wrap = do_QueryInterface(mCallbacks);
> +  if (wrap) {
> +    wrap->GetCallbacks(getter_AddRefs(callbacks));
> +  }
> +  newChannel->SetNotificationCallbacks(callbacks);

Hmm.. yeah.. this doesn't feel right.

What if the channel that we redirect to is also getting a wrapper?

I wonder if the right solution here would be to make SecWrapChannelEventSink::AsyncOnChannelRedirect only replace the first argument with mOuterChannel if oldChannel==<the wrapped channel>.

So make SecWrapChannelEventSink::AsyncOnChannelRedirect something like

return mChannelEventSink->AsyncOnChannelRedirect(
  oldChannel = mInnerChannel ? mOuterChannel : oldChannel,
  newChannel, redirFlags, cb);
Attachment #8670535 - Attachment is obsolete: false
The patch above was obsolete and I canceled feedback request, because, I did another better version that I have marked for feedback where this all is fixed.
Attachment #8670535 - Attachment is obsolete: true
Attachment #8670740 - Flags: feedback?(jonas)
>>>> LOCALIZING THE PROBLEM OF HTTPCHANNELPARENT::DOASYNCOPEN() <<<<

For lots of our testing we disabled e10s but still got HttpChannelParent::foo() and such. Reason is that we do thumbnail capturing in the background which fires up the e10s machinery. If an addon uses it's own protoclHandler then the following line becomes problematic:

> mChannel = static_cast<nsHttpChannel *>(channel.get());
http://mxr.mozilla.org/mozilla-central/source/netwerk/protocol/http/HttpChannelParent.cpp#422

Which is also why we have seen slightly different crashes, because the static_cast succeeds but the memory access shortly after crashes, but not deterministic.

I think this static casting might also be problematic without our channelwrapper, so we should get that fixed. But I don't know how at this point. Dragana any suggestions?


Steps to reproduce:
* Fire up Firefox using the following command > MOZ_DEBUG_CHILD_PROCESS=1 ./firefox -P -no-remote
* Got to a webpage you have not been before
* Hit Ctrl+T to open a new tab
* Now you should see something like:
    CHILDCHILDCHILDCHILD
    debug me @ 12209
* Open a new tab in your console, fire up gdb an attach to that id > attach 12209
* That should already crash; if not, visit another webpage you have never visited before and open a new tab again
Flags: needinfo?(dd.mozilla)
I think we can dramatically simplify things here.

Rather than trying to make it work to have a nsHttpChannel which is wrapped with a secwrapper, we can do something like the following:

1. Call protocolHandler->NewChannel2
2. If it succeeds, return the resulting channel.
3. Call protocolHandler->NewChannel
4. Check if the resulting channel is an nsHttpChannel
5. If it is, just call SetLoadInfo on it and return the nsHttpChannel
6. Wrap the channel in a secwrapper

This would mean that a addon-provided protocolHandler which sometimes calls into the gecko http-protocolhandler to return a gecko-provided nsHttpChannel, wouldn't result in a wrapper being created. This is safe since the gecko-provided nsHttpChannel has a perfectly working implementation of AsyncOpen2.

What I'm not 100% sure of is how to do step 4. I.e. how do we check that a given nsIChannel is a gecko nsHttpChannel? The easiest way would be to QI to some non-scriptable interface only implemented by nsHttpChannel.

I'm not sure if such an interface already exists, or if we should add a new one.
> For lots of our testing we disabled e10s but still got
> HttpChannelParent::foo() and such. Reason is that we do thumbnail capturing
> in the background which fires up the e10s machinery. If an addon uses it's
> own protoclHandler then the following line becomes problematic:
> 
> > mChannel = static_cast<nsHttpChannel *>(channel.get());
> http://mxr.mozilla.org/mozilla-central/source/netwerk/protocol/http/
> HttpChannelParent.cpp#422

Looks like that's definitely the case. Disabling thumbnail capturing via the "browser.pagethumbnails.capturing_disabled;true" pref completely stops the crashes that I was previously getting in comment # 11 and comment # 28. I checked both OSX and Ubuntu and couldn't get the crashes to happen after I applied the pref.
Flags: needinfo?(kjozwiak)
(In reply to Jonas Sicking (:sicking) from comment #51)
> I think we can dramatically simplify things here.
> 
> Rather than trying to make it work to have a nsHttpChannel which is wrapped
> with a secwrapper, we can do something like the following:
> 
> 1. Call protocolHandler->NewChannel2
> 2. If it succeeds, return the resulting channel.
> 3. Call protocolHandler->NewChannel
> 4. Check if the resulting channel is an nsHttpChannel
> 5. If it is, just call SetLoadInfo on it and return the nsHttpChannel
> 6. Wrap the channel in a secwrapper
> 
> This would mean that a addon-provided protocolHandler which sometimes calls
> into the gecko http-protocolhandler to return a gecko-provided
> nsHttpChannel, wouldn't result in a wrapper being created. This is safe
> since the gecko-provided nsHttpChannel has a perfectly working
> implementation of AsyncOpen2.
> 
> What I'm not 100% sure of is how to do step 4. I.e. how do we check that a
> given nsIChannel is a gecko nsHttpChannel? The easiest way would be to QI to
> some non-scriptable interface only implemented by nsHttpChannel.
> 
> I'm not sure if such an interface already exists, or if we should add a new
> one.

There is nsIHttpChannelInternal that is scriptable but is should be used only internally
(an another that can be made noscrip)
And just realize if some addon implements a ftp handler current code is going to make a mess.
the ptch is comming
/s/The patch i coming.
Flags: needinfo?(dd.mozilla)
Attachment #8671307 - Flags: feedback?(jonas)
(In reply to Christoph Kerschbaumer [:ckerschb] from comment #50)
> >>>> LOCALIZING THE PROBLEM OF HTTPCHANNELPARENT::DOASYNCOPEN() <<<<
> 
> For lots of our testing we disabled e10s but still got
> HttpChannelParent::foo() and such. Reason is that we do thumbnail capturing
> in the background which fires up the e10s machinery.


I will open another bug for this.


> > mChannel = static_cast<nsHttpChannel *>(channel.get());
> http://mxr.mozilla.org/mozilla-central/source/netwerk/protocol/http/
> HttpChannelParent.cpp#422
> 
> Which is also why we have seen slightly different crashes, because the
> static_cast succeeds but the memory access shortly after crashes, but not
> deterministic.
> 
> I think this static casting might also be problematic without our
> channelwrapper, so we should get that fixed. But I don't know how at this
> point. Dragana any suggestions?
> 

I will open another bug for this.
Christoph, this bug is a bit confusing.
You confirm that you do want to uplift this patch to aurora & beta? Thanks
Flags: needinfo?(mozilla)
Comment on attachment 8671307 [details] [diff] [review]
bug_1206199_check_if_addonProtocolHandler_isnsHttpChannel.patch

Review of attachment 8671307 [details] [diff] [review]:
-----------------------------------------------------------------

::: netwerk/base/nsIForcePendingChannel.idl
@@ +8,5 @@
>   * nsIForcePending interface exposes a function that enables overwriting of the normal 
>   * behavior for the channel's IsPending(), forcing 'true' to be returned.
>   */
>  
> +[noscript, uuid(2ac3e1ca-049f-44c3-a519-f0681f51e9b1)]

why exactly this change?

::: netwerk/base/nsIOService.cpp
@@ +770,5 @@
> +                do_QueryInterface(channel);
> +            nsCOMPtr<nsIForcePendingChannel> isGeckoChannel =
> +                do_QueryInterface(channel);
> +            if (httpChannelInt && isGeckoChannel) {
> +                channel->SetLoadInfo(aLoadInfo);

so, to make sure I understand, the goal of this whole enterprise, involving nsSecCheckWrapChannel, is to make sure that AsyncOpen2 on the returned channel is implemented and calls nsContentSecurityManager::doContentSecurityCheck just once, right?

@@ +775,5 @@
> +            } else if (httpChannelInt) {
> +                // If it is http channel we can wrap it.
> +                // we have to wrap that channel
> +                channel = new nsSecCheckWrapChannel(channel, aLoadInfo);
> +            }

Hmm... I'm not sure I follow what you are trying to do here.

I would expect you want to simplify stuff and NOT wrap nsHttpChannel (as suggested in comment 51).  This code does something else tho (the opposite).


I could imagine a following method:

static already_AddRefed<nsIChannel> nsSecCheckWrapChannel::MaybeWrap(nsIChannel* aChannel, nsILoadInfo* aLoadInfo);

that would inspect the channel instance and either return the channel directly or wrap it with nsSecCheckWrapChannel.
Attachment #8671307 - Flags: review-
Comment on attachment 8670738 [details] [diff] [review]
bug_1206199_fix_nssecwrapchannel_mediate_asynconchannelredirect_v2.patch

Review of attachment 8670738 [details] [diff] [review]:
-----------------------------------------------------------------

This patch needs explanation why the callbacks need to be wrapped at all.  I cannot find any in the code nor in the bug.

::: netwerk/base/nsSecCheckWrapChannel.cpp
@@ +192,5 @@
> +    mChannelEventSink = do_GetInterface(mNotificationCallbacks);
> +    *aResult = static_cast<nsIChannelEventSink*>(this);
> +    NS_ADDREF_THIS();
> +    return NS_OK;
> +  } else if (aIID.Equals(NS_GET_IID(nsIRedirectResultListener))) {

else after return

@@ +215,5 @@
> +  uint32_t redirFlags,
> +  nsIAsyncVerifyRedirectCallback *cb)
> +{
> +  if (!mChannelEventSink) {
> +    return NS_OK;

you must call the callback when you return NS_OK

@@ +219,5 @@
> +    return NS_OK;
> +  }
> +  mNewRedirectChannel = newChannel;
> +  return mChannelEventSink->AsyncOnChannelRedirect(
> +    (mOuterChannel) ? mOuterChannel.get() : oldChannel,

.get() is IMO not necessary

btw, how would it ever happen there were no mOuterChannel at the first place?

@@ +229,5 @@
> +NS_IMETHODIMP
> +SecWrapChannelEventSink::OnRedirectResult(bool aSucceeded)
> +{
> +  if (!mNewRedirectChannel) {
> +    aSucceeded = false;

you cannot override this at all, why are you doing it?
Attachment #8670738 - Flags: review-
Comment on attachment 8670740 [details] [diff] [review]
bug_1206199_fix_nssecwrapchannel_mediate_asynconchannelredirect_withoutRedirecResultCallbacks.patch

Review of attachment 8670740 [details] [diff] [review]:
-----------------------------------------------------------------

::: netwerk/base/nsSecCheckWrapChannel.cpp
@@ +224,5 @@
> +
> +  if (oldChannel != mInnerChannel) {
> +    // If a protocol handler does not call OnRedirectResult, but finish
> +    // redirect properly oldChannel and this will not match and we can do as
> +    // if we go OnRedirectResult(true).

sorry, the comment doesn't make much sense to me.

@@ +226,5 @@
> +    // If a protocol handler does not call OnRedirectResult, but finish
> +    // redirect properly oldChannel and this will not match and we can do as
> +    // if we go OnRedirectResult(true).
> +    mOuterChannel = nullptr;
> +    mNewRedirectChannel = nullptr;

why nullifying when you reset it a line below?

the code doesn't make much sense
Attachment #8670740 - Flags: review-
Flags: needinfo?(honzab.moz)
(In reply to Honza Bambas (not reviewing) (:mayhemer) from comment #58)
> Comment on attachment 8671307 [details] [diff] [review]
> bug_1206199_check_if_addonProtocolHandler_isnsHttpChannel.patch
> 
> Review of attachment 8671307 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: netwerk/base/nsIForcePendingChannel.idl
> @@ +8,5 @@
> >   * nsIForcePending interface exposes a function that enables overwriting of the normal 
> >   * behavior for the channel's IsPending(), forcing 'true' to be returned.
> >   */
> >  
> > +[noscript, uuid(2ac3e1ca-049f-44c3-a519-f0681f51e9b1)]
> 
> why exactly this change?

It was a Jonas' suggestion, that we need a non-scriptable interface that we can check. If a return channel implements a non-scriptable interface that the gecko nsHttpChannel implements it must be a gecko nsHttpChannel. This makes sense to me.

Instead of making a new interface I used nsIForcePendingChannel that can be made non-scriptable since it is only used in the native code.


> 
> ::: netwerk/base/nsIOService.cpp
> @@ +770,5 @@
> > +                do_QueryInterface(channel);
> > +            nsCOMPtr<nsIForcePendingChannel> isGeckoChannel =
> > +                do_QueryInterface(channel);
> > +            if (httpChannelInt && isGeckoChannel) {
> > +                channel->SetLoadInfo(aLoadInfo);
> 
> so, to make sure I understand, the goal of this whole enterprise, involving
> nsSecCheckWrapChannel, is to make sure that AsyncOpen2 on the returned
> channel is implemented and calls
> nsContentSecurityManager::doContentSecurityCheck just once, right?

Yes the goal is to make sure AsyncOpen2 is implemented.

> 
> @@ +775,5 @@
> > +            } else if (httpChannelInt) {
> > +                // If it is http channel we can wrap it.
> > +                // we have to wrap that channel
> > +                channel = new nsSecCheckWrapChannel(channel, aLoadInfo);
> > +            }
> 
> Hmm... I'm not sure I follow what you are trying to do here.
> 
> I would expect you want to simplify stuff and NOT wrap nsHttpChannel (as
> suggested in comment 51).  This code does something else tho (the opposite).
> 

Yes but if it is not gecko nsHttpChannel we do want to wrap it and that is what this code is doing. Here I am just making sure that it is HttpChannel and not FtpChannel.
> 
> I could imagine a following method:
> 
> static already_AddRefed<nsIChannel>
> nsSecCheckWrapChannel::MaybeWrap(nsIChannel* aChannel, nsILoadInfo*
> aLoadInfo);
> 
> that would inspect the channel instance and either return the channel
> directly or wrap it with nsSecCheckWrapChannel.

I  can do that.
(In reply to Honza Bambas (not reviewing) (:mayhemer) from comment #59)
> Comment on attachment 8670738 [details] [diff] [review]
> bug_1206199_fix_nssecwrapchannel_mediate_asynconchannelredirect_v2.patch
> 
> Review of attachment 8670738 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> This patch needs explanation why the callbacks need to be wrapped at all.  I
> cannot find any in the code nor in the bug.


I can add explanation.
The reason for overwriting is that because of wrapping  the old channel is the gecko nsHttpChannel and the listener olny knows the wrapper. (e.g. "[5542] ###!!! ASSERTION: Got a channel redirect for an unknown channel!: 'mChannel == oldChannel', file /home/kamil/code/mozilla-central/image/imgRequest.cpp, line 1205")
> 
> ::: netwerk/base/nsSecCheckWrapChannel.cpp
> @@ +192,5 @@
> > +    mChannelEventSink = do_GetInterface(mNotificationCallbacks);
> > +    *aResult = static_cast<nsIChannelEventSink*>(this);
> > +    NS_ADDREF_THIS();
> > +    return NS_OK;
> > +  } else if (aIID.Equals(NS_GET_IID(nsIRedirectResultListener))) {
> 
> else after return

why?

> 
> @@ +219,5 @@
> > +    return NS_OK;
> > +  }
> > +  mNewRedirectChannel = newChannel;
> > +  return mChannelEventSink->AsyncOnChannelRedirect(
> > +    (mOuterChannel) ? mOuterChannel.get() : oldChannel,
> 
> .get() is IMO not necessary
> 

() ? : does not like it when they are not of the same type.

> btw, how would it ever happen there were no mOuterChannel at the first place?

If we have redirect, notiificationCallback are transfer from old channel to the new one in the gecko nsHttpChannel that does not know that a wrapper exists. So this listener will be transfer too. As soon as redirect is finished (that is why I am using RedirectResult to know when redirect is finished) mOuterChannel should be set too nullptr and this listener will do nothing and send calls further down.

If we do not set mOuterChannel to nullptr, in a case of a redirect of a redirected channel a wrong mOuterChannel will be set. 

The next patch does a bit of a hack (but I do not know how to do it differently): If RedirectResult are not called we will not know whether redirect succeeded or not whether we need to set mOuterChannel to nullptr. So the only way to figure out that something is wrong its if oldChannel does not match to wat we expect it to be.

> 
> @@ +229,5 @@
> > +NS_IMETHODIMP
> > +SecWrapChannelEventSink::OnRedirectResult(bool aSucceeded)
> > +{
> > +  if (!mNewRedirectChannel) {
> > +    aSucceeded = false;
> 
> you cannot override this at all, why are you doing it?
(In reply to Sylvestre Ledru [:sylvestre] from comment #57)
> Christoph, this bug is a bit confusing.
> You confirm that you do want to uplift this patch to aurora & beta? Thanks

Hey Sylveste, sorry for the confusion. Let me try to clarify: What's happening is that there are several different problems we need to overcome from a technical perspective. Hence we will need different patches which *all* need to be uplifted to aurora and beta.

I tried to mention that in my uplift request in Comment 42, whenever we have a patch ready which definitely fixes one of those problems we should uplift that patch already because it already fixes known issues and hence get a little more bake time on aurora and beta.

To answer your question: I confirm we want to uplift that patch to aurora and beta. There will be more patches in this bug in the near future. I keep an eye on all of them and as soon as one is ready that needs uplifting, I will reach out to your directly. Thanks!
Flags: needinfo?(mozilla)
Flags: needinfo?(jduell.mcbugs)
Christoph, from a QA perspective, is the patch from comment # 42 currently the most stable one to test with? Should it be combined with the patch from comment # 4?
Flags: needinfo?(mozilla)
Honze can you review patch bug_1206199_check_if_addonProtocolHandler_isnsHttpChannel.patch
Attachment #8671307 - Attachment is obsolete: true
Attachment #8671307 - Flags: feedback?(jonas)
Flags: needinfo?(honzab.moz)
Trying to uplift several patches one by one in a same bug won't scale for Release management and the sheriff.
We have two solutions here:
* open a bug per issues addressed
* request the uplift to all patches at once.

I would prefer the first solution.
Attachment #8670738 - Attachment is obsolete: true
Attachment #8670738 - Flags: feedback?(jonas)
Attachment #8670740 - Attachment is obsolete: true
Attachment #8670740 - Flags: feedback?(jonas)
Attachment #8672955 - Attachment is obsolete: true
Flags: needinfo?(honzab.moz)
Depends on: 1214122
Attachment #8670068 - Attachment is obsolete: true
Flags: needinfo?(mozilla)
Attachment #8670608 - Attachment is obsolete: true
(In reply to Sylvestre Ledru [:sylvestre] from comment #66)
> Trying to uplift several patches one by one in a same bug won't scale for
> Release management and the sheriff.
> We have two solutions here:
> * open a bug per issues addressed
> * request the uplift to all patches at once.
> 
> I would prefer the first solution.

Sounds reasonable to me. We can create different patches for different issues. Dragana already created Bug 1214122 for one of them. I marked all of the patches within this bug as obsolete so we can avoid any kind of confusion. The patch that still has the uplift-flags would need to uplifted for this patch though.

Sounds good?
Flags: needinfo?(sledru)
Comment on attachment 8663237 [details] [diff] [review]
bug_1206199_fix_nssecwrapchannel.patch

This is perfect, thanks!
Flags: needinfo?(sledru)
Attachment #8663237 - Flags: approval-mozilla-beta?
Attachment #8663237 - Flags: approval-mozilla-beta+
Attachment #8663237 - Flags: approval-mozilla-aurora?
Attachment #8663237 - Flags: approval-mozilla-aurora+
(In reply to Kamil Jozwiak [:kjozwiak] from comment #64)
> Christoph, from a QA perspective, is the patch from comment # 42 currently
> the most stable one to test with? Should it be combined with the patch from
> comment # 4?

Sorry, it cleared my needinfo earlier already. Anyway, we should get some more patches landed on aurora and beta. Once we have all those landed and have a stable-version I will reach out to you again and we should do some final testing to make sure we haven't missed anything. Thanks for your help.
(In reply to Dragana Damjanovic [:dragana] from comment #61)
> (In reply to Honza Bambas (not reviewing) (:mayhemer) from comment #58)
> > @@ +775,5 @@
> > > +            } else if (httpChannelInt) {
> > > +                // If it is http channel we can wrap it.
> > > +                // we have to wrap that channel
> > > +                channel = new nsSecCheckWrapChannel(channel, aLoadInfo);
> > > +            }
> > 
> > Hmm... I'm not sure I follow what you are trying to do here.
> > 
> > I would expect you want to simplify stuff and NOT wrap nsHttpChannel (as
> > suggested in comment 51).  This code does something else tho (the opposite).
> > 
> 
> Yes but if it is not gecko nsHttpChannel we do want to wrap it 

Wrap <=> not gecko?  Or how is the condition exactly?  Because this code does something else.

We want to make sure AsyncOpen2 is implemented (what in case of a gecko-channel is) and there is load info set on the channel - what we can do simply with SetLoadInfo when it's a gecko-channel.  Right?  So I don't understand.  Anyway, your new patch in bug 1214122 seems to do it correctly.

> and that is
> what this code is doing. Here I am just making sure that it is HttpChannel
> and not FtpChannel.

Hmm.. you are checking whether the channel implementes nsIHttpChannelInternal.  That is definitely a scriptable interface that our nsHttpChannel implements.  Add-ons may implement it too.  But also may not to, right?  

I think you want to check nsIHttpChannel interface, no?

(In reply to Dragana Damjanovic [:dragana] from comment #62)
> I can add explanation.
> The reason for overwriting is that because of wrapping  the old channel is
> the gecko nsHttpChannel and the listener olny knows the wrapper. (e.g.
> "[5542] ###!!! ASSERTION: Got a channel redirect for an unknown channel!:
> 'mChannel == oldChannel', file
> /home/kamil/code/mozilla-central/image/imgRequest.cpp, line 1205")

Ah.  Yes..  The notification origins from nsHttpChannel.  It puts itself to the "old-channel" arg to AsyncOnChannelRedirect.  But listeners of AsyncOnChannelRedirect expect the outer channel (=the wrapper) as "old-channel".

I'm afraid you still need this nc wrapper.

> > else after return
> 
> why?

Wrong:

if (...) {
  ...
  return;
} else {
  do_something_else();
}

Correct:

if (...) {
  ...
  return;
}

do_something_else();


> > btw, how would it ever happen there were no mOuterChannel at the first place?
> 
> If we have redirect, notiificationCallback are transfer from old channel to
> the new one in the gecko nsHttpChannel that does not know that a wrapper
> exists. So this listener will be transfer too. As soon as redirect is
> finished (that is why I am using RedirectResult to know when redirect is
> finished) mOuterChannel should be set too nullptr and this listener will do
> nothing and send calls further down.
> 
> If we do not set mOuterChannel to nullptr, in a case of a redirect of a
> redirected channel a wrong mOuterChannel will be set. 

Got it!  This all had to be in put to the code as a comment :)

> 
> The next patch does a bit of a hack (but I do not know how to do it
> differently): If RedirectResult are not called 

That is ensured to definitely be called after the redirect veto async questioning is done, always.

> we will not know whether
> redirect succeeded or not whether we need to set mOuterChannel to nullptr.
> So the only way to figure out that something is wrong its if oldChannel does
> not match to wat we expect it to be.

That seems weird to me, if I follow your explanation (which I probably don't).
Flags: needinfo?(madperson)
Depends on: 1214786
Blocks: 1216214
No longer blocks: 1216214
Summary: Fix nsSecCheckWrapChannel → Extend channelwrapper to mediate OnStartRequest, OnStopRequest, OnDataAvailable
I created bug 1216214 to track all the needed changes for the channelwrapper to work in 42. Since all the patches in this bug landed, we can close this bug now.
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla44
Blocks: 1216214
No longer blocks: 1216214
No longer depends on: 1214122, 1214786
Blocks: 1216214
No longer blocks: 1120487, 1205410
Keywords: leave-open
You need to log in before you can comment on or make changes to this bug.