Tab abnormally stops loading content or app is abnormally killed when viewing certain wired.com articles

VERIFIED FIXED in Firefox 21

Status

Firefox OS
General
VERIFIED FIXED
5 years ago
4 years ago

People

(Reporter: jsmith, Assigned: jduell)

Tracking

unspecified
B2G C4 (2jan on)
ARM
Gonk (Firefox OS)
Dependency tree / graph

Firefox Tracking Flags

(blocking-b2g:tef+, firefox19 wontfix, firefox20 wontfix, firefox21 fixed, b2g18 fixed, b2g18-v1.0.0 fixed, b2g18-v1.0.1 fixed)

Details

Attachments

(3 attachments)

(Reporter)

Description

5 years ago
Break out of bug 832796 for the tab abnormally stopping load based on https://bugzilla.mozilla.org/show_bug.cgi?id=832796#c20.

Jason indicates the following:

"This means code *using* an HTTP channel was misbehaving by not passing security info (i.e. not setting a TabChild as the channel callbacks).   Not a necko bug.  We need to find the offending client code and fork a bug to fix it."

Logcats can be found on that bug for more info.
(Reporter)

Comment 1

5 years ago
I think I've seen this bug as well in app mode, so nominating for the same reasons why bug 832796 was +.
blocking-b2g: --- → tef?
Component: Gaia::Browser → General
QA Contact: nhirata.bugzilla
(Reporter)

Updated

5 years ago
Summary: Tab abnormally stops loading content from certain wired.com articles → Tab abnormally stops loading content or app is abnormally killed when viewing certain wired.com articles
blocking-b2g: tef? → tef+

Comment 2

5 years ago
I'll try to find the problem tomorrow.
Assignee: nobody → josh

Comment 3

5 years ago
Hot damn:

Breakpoint 3, mozilla::net::PNeckoChild::SendPHttpChannelConstructor (this=0x4276e680, actor=0x4397c000, browser=0x0, loadContext=...) at /run/media/jdm/ssd/b2g/B2G/objdir-gecko/ipc/ipdl/PNeckoChild.cpp:253
253	{

#0  mozilla::net::PNeckoChild::SendPHttpChannelConstructor (this=0x4276e680, actor=0x4397c000, browser=0x0, loadContext=...) at /run/media/jdm/ssd/b2g/B2G/objdir-gecko/ipc/ipdl/PNeckoChild.cpp:253

#1  0x404b3fda in mozilla::net::HttpChannelChild::AsyncOpen (this=0x4397c000, listener=<value optimized out>, aContext=<value optimized out>)
    at /run/media/jdm/ssd/b2g/B2G/gecko/netwerk/protocol/http/HttpChannelChild.cpp:1043

#2  0x40684ef6 in nsScriptLoader::StartLoad (this=0x43971c40, aRequest=0x4391a340, aType=<value optimized out>) at /run/media/jdm/ssd/b2g/B2G/gecko/content/base/src/nsScriptLoader.cpp:330

#3  0x40684f88 in nsScriptLoader::PreloadURI (this=0x43971c40, aURI=<value optimized out>, aCharset=..., aType=..., aCrossOrigin=...) at /run/media/jdm/ssd/b2g/B2G/gecko/content/base/src/nsScriptLoader.cpp:1250

#4  0x40849c24 in nsHtml5TreeOpExecutor::PreloadScript (this=<value optimized out>, aURL=<value optimized out>, aCharset=..., aType=..., aCrossOrigin=...)
    at /run/media/jdm/ssd/b2g/B2G/gecko/parser/html/nsHtml5TreeOpExecutor.cpp:1058
#5  0x4084cc60 in nsHtml5SpeculativeLoad::Perform (this=<value optimized out>, aExecutor=0x45c47b50) at /run/media/jdm/ssd/b2g/B2G/gecko/parser/html/nsHtml5SpeculativeLoad.cpp:35
#6  0x4084a3ea in nsHtml5TreeOpExecutor::RunFlushLoop (this=0x45c47b50) at /run/media/jdm/ssd/b2g/B2G/gecko/parser/html/nsHtml5TreeOpExecutor.cpp:514

#7  0x4084c286 in nsHtml5ExecutorFlusher::Run (this=<value optimized out>) at /run/media/jdm/ssd/b2g/B2G/gecko/parser/html/nsHtml5StreamParser.cpp:127

#8  0x40c191b2 in nsThread::ProcessNextEvent (this=0x41a06880, mayWait=<value optimized out>, result=0xbec64fd7) at /run/media/jdm/ssd/b2g/B2G/gecko/xpcom/threads/nsThread.cpp:620
#9  0x40bf95de in NS_ProcessNextEvent_P (thread=0x43971c40, mayWait=false) at /run/media/jdm/ssd/b2g/B2G/objdir-gecko/xpcom/build/nsThreadUtils.cpp:237
#10 0x40b16c00 in mozilla::ipc::MessagePump::Run (this=0x41a022e0, aDelegate=0xbec658e8) at /run/media/jdm/ssd/b2g/B2G/gecko/ipc/glue/MessagePump.cpp:82
#11 0x40b16cb2 in mozilla::ipc::MessagePumpForChildProcess::Run (this=0x41a022e0, aDelegate=0xbec658e8) at /run/media/jdm/ssd/b2g/B2G/gecko/ipc/glue/MessagePump.cpp:231

#12 0x40c3af90 in MessageLoop::RunInternal (this=0x1000000) at /run/media/jdm/ssd/b2g/B2G/gecko/ipc/chromium/src/base/message_loop.cc:215
#13 0x40c3b046 in MessageLoop::RunHandler (this=0xbec658e8) at /run/media/jdm/ssd/b2g/B2G/gecko/ipc/chromium/src/base/message_loop.cc:208
#14 MessageLoop::Run (this=0xbec658e8) at /run/media/jdm/ssd/b2g/B2G/gecko/ipc/chromium/src/base/message_loop.cc:182
#15 0x40a9d6f0 in nsBaseAppShell::Run (this=0x4323f040) at /run/media/jdm/ssd/b2g/B2G/gecko/widget/xpwidgets/nsBaseAppShell.cpp:163

#16 0x4043c4dc in XRE_RunAppShell () at /run/media/jdm/ssd/b2g/B2G/gecko/toolkit/xre/nsEmbedFunctions.cpp:646
#17 0x40b16c80 in mozilla::ipc::MessagePumpForChildProcess::Run (this=0x41a022e0, aDelegate=0xbec658e8) at /run/media/jdm/ssd/b2g/B2G/gecko/ipc/glue/MessagePump.cpp:198

#18 0x40c3af90 in MessageLoop::RunInternal (this=0x4323f040) at /run/media/jdm/ssd/b2g/B2G/gecko/ipc/chromium/src/base/message_loop.cc:215
#19 0x40c3b046 in MessageLoop::RunHandler (this=0xbec658e8) at /run/media/jdm/ssd/b2g/B2G/gecko/ipc/chromium/src/base/message_loop.cc:208
#20 MessageLoop::Run (this=0xbec658e8) at /run/media/jdm/ssd/b2g/B2G/gecko/ipc/chromium/src/base/message_loop.cc:182
#21 0x4043c880 in XRE_InitChildProcess (aArgc=<value optimized out>, aArgv=<value optimized out>, aProcess=GeckoProcessType_Content) at /run/media/jdm/ssd/b2g/B2G/gecko/toolkit/xre/nsEmbedFunctions.cpp:485
#22 0x00008410 in main (argc=5, argv=0xbec65a44) at /run/media/jdm/ssd/b2g/B2G/gecko/ipc/app/MozillaRuntimeMain.cpp:48

Comment 4

5 years ago
It's interesting. I've seen failures in SendPHttpChannel, GetCookieString, SetCookieString, and WebSocketChild::SendAsyncOpen all stemming from the same problem, where the PBrowser argument is null. Having caught the websocket one in gdb, I can see that we actually have a loadgroup, but are surprisingly not able to obtain an nsITabChild from it. Boris, do you have any suggestions for how to figure out why this is?
Flags: needinfo?(bzbarsky)

Comment 5

5 years ago
I sorted this out with Boris today: it looks like loads are starting from pages/frames after the containing page has already been destroyed, so there's no tree owner for the docshell to retrieve the nsITabChild from. This presumably isn't a problem with non-IPC channels; the only good solution I see here is to abort the AsyncOpen call if a null nsITabChild is encountered. The downside here is that we lose the high visibility failure case for when our channels are simply set up incorrectly, but the alternative is a very ill-defined investigation into how to keep the loads from starting in the first place.
Flags: needinfo?(bzbarsky)
(Assignee)

Comment 6

5 years ago
> the only good solution I see here is to abort the AsyncOpen call if a null 
> nsITabChild is encountered

I was thinking the same thing.  Just return NS_ERROR_ILLEGAL_VALUE or something.  I think synchronous failure here should be ok--it's certainly easier to code.
(Assignee)

Comment 7

5 years ago
We could potentially throw in a MOZ_ASSERT(TabChild) here too, and then we'd catch other cases where we don't setup the channels correctly.  Of course, if we have cases in our test harness that hit this because of the containing-page-has-gone-away issue, we'd get random oranges.  But I suspect we don't have any, so try throwing it in and let's see if it passes try OK.

I also think issuing a printf_stderr("WARNING: child channel tried to open channel w/o security info" msg is worth issuing here too).

I have a patch that does some of the infrastructure here. jdm is off IRC.  At the risk of duplicating effort I'm going to steal this and just finish that patch.
Assignee: josh → jduell.mcbugs
(Assignee)

Comment 8

5 years ago
Created attachment 706233 [details] [diff] [review]
part1: infrastructure

Move the usingSecurity variables out from being static in NeckoParent/Child so we can check in other files whether security is being used or not.  Could probably do a fix with less lines of code this is pretty much risk free, just boilerplate.
Attachment #706233 - Flags: review?(josh)
(Assignee)

Comment 9

5 years ago
Created attachment 706234 [details] [diff] [review]
Part 2: don't kill child in release builds if navigation causes missing TabChild

This doesn't cover the appcache (see OfflineCacheUpdateChild.cpp, etc)--I need to look into that tomorrow.
Attachment #706234 - Flags: review?(josh)
Comment on attachment 706234 [details] [diff] [review]
Part 2: don't kill child in release builds if navigation causes missing TabChild

Review of attachment 706234 [details] [diff] [review]:
-----------------------------------------------------------------

::: netwerk/protocol/wyciwyg/WyciwygChannelChild.cpp
@@ +602,5 @@
>    URIParams originalURI;
>    SerializeURI(mOriginalURI, originalURI);
>  
>    mozilla::dom::TabChild* tabChild = GetTabChild(this);
> +  if (MissingRequiredTabChild(tabChild, "http")) {

wyciwyg
Attachment #706234 - Flags: review?(josh) → review+

Updated

5 years ago
Attachment #706233 - Flags: review?(josh) → review+
I'll fix this up and land both patches. Also, I looked at OfflineCacheUpdateChild and it already has null tabchild detection and synchronous abort, so I'm updating it to use the same mechanism.
(Assignee)

Comment 14

5 years ago
Created attachment 706714 [details] [diff] [review]
Part 3:  stop using static var per compilation unit

jdm: how do you feel about landing this too as part of this bug?  I realized that by putting static vars inside an inline function we'll wind up creating a copy of them per compilation unit, which seems... unintuitive.  GCC at least isn't creating the static in compilation units that don't use the function, but I could imagine other compilers might.  Anyway, seems messy. 

Can also get split out into a new bug, but it's only been a day since we landed part 1/2.

Also: have you checked to make sure the JS TCP socket API won't kill the child if we try to open one when the containing page has gone away?
Attachment #706714 - Flags: review?(josh)
(Assignee)

Comment 15

5 years ago
This and bug 832796 need to be verified (i.e. no more crashes in B2G on wired.com?)
Keywords: qawanted
(Reporter)

Updated

5 years ago
Keywords: qawanted → verifyme
QA Contact: jsmith
Comment on attachment 706714 [details] [diff] [review]
Part 3:  stop using static var per compilation unit

Review of attachment 706714 [details] [diff] [review]:
-----------------------------------------------------------------

::: netwerk/ipc/NeckoCommon.cpp
@@ +11,5 @@
> +namespace net {
> +
> +namespace NeckoCommonInternal {
> +  bool securityDisabled = true;
> +  bool registeredBool = false;

Use a g prefix for the name, please.
Attachment #706714 - Flags: review?(josh) → review+
(In reply to Jason Duell (:jduell) from comment #14)
> Also: have you checked to make sure the JS TCP socket API won't kill the
> child if we try to open one when the containing page has gone away?

Good catch. I just looked, and we don't actually use the necko security stuff o.O We just merrily go on our way in the parent if no PBrowser is present. I'll fix that.

Updated

5 years ago
Blocks: 835038
(Reporter)

Updated

5 years ago
Whiteboard: [needs-b2g-v1.0 uplift]
(Assignee)

Comment 21

5 years ago
jdm: I'm guessing we should fork the TCP socket fix to another bug at this point.

Landed part3 on v1.0.0: 

  https://hg.mozilla.org/releases/mozilla-b2g18_v1_0_0/rev/dc3e991c5ef0

Since I don't see a tracking flags for b2gv1.0.0, I'm assuming clearing the whiteboard is all I need to do for that.
Whiteboard: [needs-b2g-v1.0 uplift]
status-firefox19: --- → wontfix
status-firefox20: --- → wontfix
Target Milestone: --- → B2G C4 (2jan on)
status-b2g18-v1.0.0: --- → fixed
(Reporter)

Comment 22

5 years ago
lgtm on 1/30 build.
Status: RESOLVED → VERIFIED
Keywords: verifyme
status-b2g18-v1.0.1: --- → fixed
Depends on: 959391
You need to log in before you can comment on or make changes to this bug.