Closed Bug 833935 Opened 11 years ago Closed 11 years ago

Tab abnormally stops loading content or app is abnormally killed when viewing certain wired.com articles

Categories

(Firefox OS Graveyard :: General, defect)

ARM
Gonk (Firefox OS)
defect
Not set
normal

Tracking

(blocking-b2g:tef+, firefox19 wontfix, firefox20 wontfix, firefox21 fixed, b2g18 fixed, b2g18-v1.0.0 fixed, b2g18-v1.0.1 fixed)

VERIFIED FIXED
B2G C4 (2jan on)
blocking-b2g tef+
Tracking Status
firefox19 --- wontfix
firefox20 --- wontfix
firefox21 --- fixed
b2g18 --- fixed
b2g18-v1.0.0 --- fixed
b2g18-v1.0.1 --- fixed

People

(Reporter: jsmith, Assigned: jduell.mcbugs)

References

Details

Attachments

(3 files)

Break out of bug 832796 for the tab abnormally stopping load based on https://bugzilla.mozilla.org/show_bug.cgi?id=832796#c20.

Jason indicates the following:

"This means code *using* an HTTP channel was misbehaving by not passing security info (i.e. not setting a TabChild as the channel callbacks).   Not a necko bug.  We need to find the offending client code and fork a bug to fix it."

Logcats can be found on that bug for more info.
I think I've seen this bug as well in app mode, so nominating for the same reasons why bug 832796 was +.
blocking-b2g: --- → tef?
Component: Gaia::Browser → General
QA Contact: nhirata.bugzilla
Summary: Tab abnormally stops loading content from certain wired.com articles → Tab abnormally stops loading content or app is abnormally killed when viewing certain wired.com articles
blocking-b2g: tef? → tef+
I'll try to find the problem tomorrow.
Assignee: nobody → josh
Hot damn:

Breakpoint 3, mozilla::net::PNeckoChild::SendPHttpChannelConstructor (this=0x4276e680, actor=0x4397c000, browser=0x0, loadContext=...) at /run/media/jdm/ssd/b2g/B2G/objdir-gecko/ipc/ipdl/PNeckoChild.cpp:253
253	{

#0  mozilla::net::PNeckoChild::SendPHttpChannelConstructor (this=0x4276e680, actor=0x4397c000, browser=0x0, loadContext=...) at /run/media/jdm/ssd/b2g/B2G/objdir-gecko/ipc/ipdl/PNeckoChild.cpp:253

#1  0x404b3fda in mozilla::net::HttpChannelChild::AsyncOpen (this=0x4397c000, listener=<value optimized out>, aContext=<value optimized out>)
    at /run/media/jdm/ssd/b2g/B2G/gecko/netwerk/protocol/http/HttpChannelChild.cpp:1043

#2  0x40684ef6 in nsScriptLoader::StartLoad (this=0x43971c40, aRequest=0x4391a340, aType=<value optimized out>) at /run/media/jdm/ssd/b2g/B2G/gecko/content/base/src/nsScriptLoader.cpp:330

#3  0x40684f88 in nsScriptLoader::PreloadURI (this=0x43971c40, aURI=<value optimized out>, aCharset=..., aType=..., aCrossOrigin=...) at /run/media/jdm/ssd/b2g/B2G/gecko/content/base/src/nsScriptLoader.cpp:1250

#4  0x40849c24 in nsHtml5TreeOpExecutor::PreloadScript (this=<value optimized out>, aURL=<value optimized out>, aCharset=..., aType=..., aCrossOrigin=...)
    at /run/media/jdm/ssd/b2g/B2G/gecko/parser/html/nsHtml5TreeOpExecutor.cpp:1058
#5  0x4084cc60 in nsHtml5SpeculativeLoad::Perform (this=<value optimized out>, aExecutor=0x45c47b50) at /run/media/jdm/ssd/b2g/B2G/gecko/parser/html/nsHtml5SpeculativeLoad.cpp:35
#6  0x4084a3ea in nsHtml5TreeOpExecutor::RunFlushLoop (this=0x45c47b50) at /run/media/jdm/ssd/b2g/B2G/gecko/parser/html/nsHtml5TreeOpExecutor.cpp:514

#7  0x4084c286 in nsHtml5ExecutorFlusher::Run (this=<value optimized out>) at /run/media/jdm/ssd/b2g/B2G/gecko/parser/html/nsHtml5StreamParser.cpp:127

#8  0x40c191b2 in nsThread::ProcessNextEvent (this=0x41a06880, mayWait=<value optimized out>, result=0xbec64fd7) at /run/media/jdm/ssd/b2g/B2G/gecko/xpcom/threads/nsThread.cpp:620
#9  0x40bf95de in NS_ProcessNextEvent_P (thread=0x43971c40, mayWait=false) at /run/media/jdm/ssd/b2g/B2G/objdir-gecko/xpcom/build/nsThreadUtils.cpp:237
#10 0x40b16c00 in mozilla::ipc::MessagePump::Run (this=0x41a022e0, aDelegate=0xbec658e8) at /run/media/jdm/ssd/b2g/B2G/gecko/ipc/glue/MessagePump.cpp:82
#11 0x40b16cb2 in mozilla::ipc::MessagePumpForChildProcess::Run (this=0x41a022e0, aDelegate=0xbec658e8) at /run/media/jdm/ssd/b2g/B2G/gecko/ipc/glue/MessagePump.cpp:231

#12 0x40c3af90 in MessageLoop::RunInternal (this=0x1000000) at /run/media/jdm/ssd/b2g/B2G/gecko/ipc/chromium/src/base/message_loop.cc:215
#13 0x40c3b046 in MessageLoop::RunHandler (this=0xbec658e8) at /run/media/jdm/ssd/b2g/B2G/gecko/ipc/chromium/src/base/message_loop.cc:208
#14 MessageLoop::Run (this=0xbec658e8) at /run/media/jdm/ssd/b2g/B2G/gecko/ipc/chromium/src/base/message_loop.cc:182
#15 0x40a9d6f0 in nsBaseAppShell::Run (this=0x4323f040) at /run/media/jdm/ssd/b2g/B2G/gecko/widget/xpwidgets/nsBaseAppShell.cpp:163

#16 0x4043c4dc in XRE_RunAppShell () at /run/media/jdm/ssd/b2g/B2G/gecko/toolkit/xre/nsEmbedFunctions.cpp:646
#17 0x40b16c80 in mozilla::ipc::MessagePumpForChildProcess::Run (this=0x41a022e0, aDelegate=0xbec658e8) at /run/media/jdm/ssd/b2g/B2G/gecko/ipc/glue/MessagePump.cpp:198

#18 0x40c3af90 in MessageLoop::RunInternal (this=0x4323f040) at /run/media/jdm/ssd/b2g/B2G/gecko/ipc/chromium/src/base/message_loop.cc:215
#19 0x40c3b046 in MessageLoop::RunHandler (this=0xbec658e8) at /run/media/jdm/ssd/b2g/B2G/gecko/ipc/chromium/src/base/message_loop.cc:208
#20 MessageLoop::Run (this=0xbec658e8) at /run/media/jdm/ssd/b2g/B2G/gecko/ipc/chromium/src/base/message_loop.cc:182
#21 0x4043c880 in XRE_InitChildProcess (aArgc=<value optimized out>, aArgv=<value optimized out>, aProcess=GeckoProcessType_Content) at /run/media/jdm/ssd/b2g/B2G/gecko/toolkit/xre/nsEmbedFunctions.cpp:485
#22 0x00008410 in main (argc=5, argv=0xbec65a44) at /run/media/jdm/ssd/b2g/B2G/gecko/ipc/app/MozillaRuntimeMain.cpp:48
It's interesting. I've seen failures in SendPHttpChannel, GetCookieString, SetCookieString, and WebSocketChild::SendAsyncOpen all stemming from the same problem, where the PBrowser argument is null. Having caught the websocket one in gdb, I can see that we actually have a loadgroup, but are surprisingly not able to obtain an nsITabChild from it. Boris, do you have any suggestions for how to figure out why this is?
Flags: needinfo?(bzbarsky)
I sorted this out with Boris today: it looks like loads are starting from pages/frames after the containing page has already been destroyed, so there's no tree owner for the docshell to retrieve the nsITabChild from. This presumably isn't a problem with non-IPC channels; the only good solution I see here is to abort the AsyncOpen call if a null nsITabChild is encountered. The downside here is that we lose the high visibility failure case for when our channels are simply set up incorrectly, but the alternative is a very ill-defined investigation into how to keep the loads from starting in the first place.
Flags: needinfo?(bzbarsky)
> the only good solution I see here is to abort the AsyncOpen call if a null 
> nsITabChild is encountered

I was thinking the same thing.  Just return NS_ERROR_ILLEGAL_VALUE or something.  I think synchronous failure here should be ok--it's certainly easier to code.
We could potentially throw in a MOZ_ASSERT(TabChild) here too, and then we'd catch other cases where we don't setup the channels correctly.  Of course, if we have cases in our test harness that hit this because of the containing-page-has-gone-away issue, we'd get random oranges.  But I suspect we don't have any, so try throwing it in and let's see if it passes try OK.

I also think issuing a printf_stderr("WARNING: child channel tried to open channel w/o security info" msg is worth issuing here too).

I have a patch that does some of the infrastructure here. jdm is off IRC.  At the risk of duplicating effort I'm going to steal this and just finish that patch.
Assignee: josh → jduell.mcbugs
Move the usingSecurity variables out from being static in NeckoParent/Child so we can check in other files whether security is being used or not.  Could probably do a fix with less lines of code this is pretty much risk free, just boilerplate.
Attachment #706233 - Flags: review?(josh)
This doesn't cover the appcache (see OfflineCacheUpdateChild.cpp, etc)--I need to look into that tomorrow.
Attachment #706234 - Flags: review?(josh)
Comment on attachment 706234 [details] [diff] [review]
Part 2: don't kill child in release builds if navigation causes missing TabChild

Review of attachment 706234 [details] [diff] [review]:
-----------------------------------------------------------------

::: netwerk/protocol/wyciwyg/WyciwygChannelChild.cpp
@@ +602,5 @@
>    URIParams originalURI;
>    SerializeURI(mOriginalURI, originalURI);
>  
>    mozilla::dom::TabChild* tabChild = GetTabChild(this);
> +  if (MissingRequiredTabChild(tabChild, "http")) {

wyciwyg
Attachment #706234 - Flags: review?(josh) → review+
Attachment #706233 - Flags: review?(josh) → review+
I'll fix this up and land both patches. Also, I looked at OfflineCacheUpdateChild and it already has null tabchild detection and synchronous abort, so I'm updating it to use the same mechanism.
jdm: how do you feel about landing this too as part of this bug?  I realized that by putting static vars inside an inline function we'll wind up creating a copy of them per compilation unit, which seems... unintuitive.  GCC at least isn't creating the static in compilation units that don't use the function, but I could imagine other compilers might.  Anyway, seems messy. 

Can also get split out into a new bug, but it's only been a day since we landed part 1/2.

Also: have you checked to make sure the JS TCP socket API won't kill the child if we try to open one when the containing page has gone away?
Attachment #706714 - Flags: review?(josh)
This and bug 832796 need to be verified (i.e. no more crashes in B2G on wired.com?)
Keywords: qawanted
Keywords: qawantedverifyme
QA Contact: jsmith
Comment on attachment 706714 [details] [diff] [review]
Part 3:  stop using static var per compilation unit

Review of attachment 706714 [details] [diff] [review]:
-----------------------------------------------------------------

::: netwerk/ipc/NeckoCommon.cpp
@@ +11,5 @@
> +namespace net {
> +
> +namespace NeckoCommonInternal {
> +  bool securityDisabled = true;
> +  bool registeredBool = false;

Use a g prefix for the name, please.
Attachment #706714 - Flags: review?(josh) → review+
(In reply to Jason Duell (:jduell) from comment #14)
> Also: have you checked to make sure the JS TCP socket API won't kill the
> child if we try to open one when the containing page has gone away?

Good catch. I just looked, and we don't actually use the necko security stuff o.O We just merrily go on our way in the parent if no PBrowser is present. I'll fix that.
Blocks: 835038
Whiteboard: [needs-b2g-v1.0 uplift]
jdm: I'm guessing we should fork the TCP socket fix to another bug at this point.

Landed part3 on v1.0.0: 

  https://hg.mozilla.org/releases/mozilla-b2g18_v1_0_0/rev/dc3e991c5ef0

Since I don't see a tracking flags for b2gv1.0.0, I'm assuming clearing the whiteboard is all I need to do for that.
Whiteboard: [needs-b2g-v1.0 uplift]
Target Milestone: --- → B2G C4 (2jan on)
lgtm on 1/30 build.
Status: RESOLVED → VERIFIED
Keywords: verifyme
Depends on: 959391
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: