Investigate HttpChannelChild missing LOAD_BACKGROUND flag

NEW
Assigned to

Status

()

defect
P3
normal
5 years ago
8 months ago

People

(Reporter: valentin.gosu, Assigned: valentin.gosu)

Tracking

(Blocks 1 bug)

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [necko-next])

Bug 1118256 and bug 1116124 seem to be caused by an assertion failing in HttpChannelChild::DoOnStatus. It expects status to be NS_NET_STATUS_RECEIVING_FROM or NS_NET_STATUS_READING, but it it instead NS_OK.

A comment in HttpChannelParent::OnDataAvailable says:
>  // OnDataAvailable is always preceded by OnStatus/OnProgress calls that set
>  // mStoredStatus/mStoredProgress(Max) to appropriate values, unless
>  // LOAD_BACKGROUND set.  In that case, they'll have garbage values, but
>  // child doesn't use them.

So mStoredStatus is NS_OK, by default, and this is what is triggering the assertion.

But the assertion shouldn't be triggered if the child had the LOAD_BACKGROUND flag. We probably end up in a situation where the child doesn't have the flag.

We could probably disable the assertion until we find the source of this problem.
Possible scenario: One of the only places that removes the LOAD_BACKGROUND flag is 
https://dxr.mozilla.org/mozilla-central/source/dom/base/nsXMLHttpRequest.cpp#2685
Maybe the child creates an xmlhttprequest, with the LOAD_BACKGROUND flag set, which then gets reset. The channel on the parent would have the LOAD_BACKGROUND flag, but the child removes it, which eventually leads to the assertion.

Is this possible? What if a redirect occurs?
Flags: needinfo?(mcmanus)
so it looks like mLoadFlags, and more imptly SetLoadFlags() doesn't have an e10s implementation. ChannelChild uses them in open, but I think if they are changed via set flags in the child they won't get propogated to the parent..

is that what you are suggesting in comment 1? Seems plausible.

We can ask jason, that's really his expertise, but I would start on a patch to deal with that if you agree. It seems like it would imapct more than this :)
Flags: needinfo?(mcmanus) → needinfo?(jduell.mcbugs)
(In reply to Patrick McManus [:mcmanus] from comment #2)
> so it looks like mLoadFlags, and more imptly SetLoadFlags() doesn't have an
> e10s implementation. ChannelChild uses them in open, but I think if they are
> changed via set flags in the child they won't get propogated to the parent..
> 
> is that what you are suggesting in comment 1? Seems plausible.

Yes, that's was interpretation as well, but I had trouble expressing it :)

> 
> We can ask jason, that's really his expertise, but I would start on a patch

Should I do it, or have you already started working on a patch?
valentin - its all you
It's true that we currently only copy the child channel's loadFlags to the parent during AsyncOpen (i.e. when we create the parent channel), so any changes to them are not propagated to the parent.  But...

   https://dxr.mozilla.org/mozilla-central/source/dom/base/nsXMLHttpRequest.cpp#2685

This particular code is modifying loadFlags before AsyncOpen is called, so we should copy the correct value to the parent, and we shouldn't see the assert hit.

There are two other places in the (C++ at least) tree that remove LOAD_BACKGROUND from the flags:

   https://dxr.mozilla.org/mozilla-central/source/dom/media/MediaResource.cpp?from=MediaResource.cpp#472

This happens in OnStopRequest, so we shouldn't be sending any more OnStatus events after that.  So also shouldn't hit the assert.

  https://dxr.mozilla.org/mozilla-central/source/dom/jsurl/nsJSProtocolHandler.cpp?from=nsJSProtocolHandler.cpp#851

This one I'm less clear about, because I don't know the calls sites for nsJSChannel::SetLoadFlags. So this would be my best guess for the culprit here, unless I'm missing something.

As far as a fix goes: trying to notify the parent that the loadFlags have changed is a losing game: by the time the message arrives, many (possibly all) OnStatus messages will already have been skipped.  The only real fix I can think of would be to never set LOAD_BACKGROUND in the parent--i.e. always have status notifications delivered to the HttpChannelParent, which in turns sends them to the child.  Then in the child simply drop notifications if LOAD_BACKGROUND is set.  We should get perfectly correct behavior with that.  Of course it's not the most efficient approach in the world, but I don't think it's that much overhead, so I'm not too worried about it.

Valentin, if you're got the time to throw in an assert in HttpBaseChannel::SetLoadFlags which barfs if LOAD_BACKGROUND is set after asyncOpen has been called (see ENSURE_CALLED_BEFORE_ASYNC_OPEN() for how to check for this), we could see more of what's going here (assuming you can reproduce this).  There may be a more efficient solution out there.   If that's too hard I'm fine with the architecture I mentioned.
Flags: needinfo?(jduell.mcbugs)
so should setloadflags in the child after open assert and fail if it can't actually get to the networking code in the parent?
Flags: needinfo?(jduell.mcbugs)
I'm not sure.  There are clearly some existing uses of unsetting LOAD_BACKGROUND during OnStopRequest so that the loadGroup fires OnStopRequest in some cases

  https://dxr.mozilla.org/mozilla-central/source/dom/media/MediaResource.cpp?from=MediaResource.cpp#464

But we could work around that (either by whitelisting setting LOAD_BACKGROUND, and/or by allowing setting flags during OnStopRequest, when we know they will no longer affect how the parent does anything).  

From a glance over the loadFlags (which are scattered across nsIRequest/nsIChannel/nsICachingChannel), many of them are clearly bogus to try to set after AsyncOpen (i.e. it's quite possibly too late to affect any behavior, since you're now racing with the socket transport and/or cache threads).  That would apply to all the flags in nsIRequest (except LOAD_BACKGROUND).  

LOAD_RETARGETED_DOCUMENT_URI and LOAD_TARGETED *do* get set after AsyncOpen, by URILoader, for valid reasons.  

I'm not sure about LOAD_DOCUMENT_URI (I suspect it's always set before AsyncOpen, but I haven't checked thoroughly). LOAD_REPLACE appears to be mostly set internally by HTTP: it looks like it would possibly be fine to fail it after AsyncOpen (OTOH these flags don't affect how actual networking happens IIUC, so it shouldn't matter if the parent sees them). LOAD_CLASSIFY also looks like it's only set before AsyncOpen.

LOAD_MEDIA_SNIFFER_OVERRIDES_CONTENT_TYPE && LOAD_EXPLICIT_CREDENTIALS also look pre-AsynOpen only.

So... I suspect that we could probably write a function (IsValidLoadFlagChange) that exclusive ORs the existing and new loadFlags, and then we could AND that against a blacklist (which would contain most but not all loadflags) and fail the function if we've detected a change in a flag that doesn't make sense to change. AFAICT the flags we wouldn't blacklist are {LOAD_RETARGETED_DOCUMENT_URI, LOAD_TARGETED, LOAD_BACKGROUND} but I'd like Valentin to double check to see if I've missed any (especially LOAD_DOCUMENT_URI and LOAD_REPLACE).  

If we write that function we might as well have HttpBaseChannel::SetLoadFlags call it, i.e. enforce the semantics in both the child and parent.  (It's a racy footgun to set most of these loadFlags after AsyncOpen in the parent).

Sadly this may still mean that we need to always send OnStatus messages to the child (i.e never set LOAD_BACKGROUND in the parent) in order to get the semantics there right.  (Unless Valentin can figure out what's trying to unset it in the child and it turns out not to be needed).
Flags: needinfo?(jduell.mcbugs)
> LOAD_RETARGETED_DOCUMENT_URI and LOAD_TARGETED *do* get set after AsyncOpen, by 
> URILoader, for valid reasons.  

and hopefully they don't need to get propagated to the parent--they're used by DocManager.cpp/docshell/uriloader, which are on the child, and the security manager (nsSecureBrowserUIImpl), which I'm hoping also lives on the child.

Steve, do you know (or know who'd know) whether nsSecureBrowserUIImpl lives in the child or parent in e10s?
Flags: needinfo?(sworkman)
(In reply to Jason Duell [:jduell] (needinfo? me) from comment #7)
> semantics there right.  (Unless Valentin can figure out what's trying to
> unset it in the child and it turns out not to be needed).

I'm fairly sure at this point that the culprit is this changeset:
https://hg.mozilla.org/mozilla-central/rev/edaf95c2be57

The crashes happened only for dom/media tests, and its commit date is in line with the time when the asserts started failing.
Keeler, see comment #8. Jduell wants to know if nsSecureBrowserUIImpl is in both the parent and child. Assuming it does exits in the child, do you know if the load flags NEED to be propagated from child to parent after AsyncOpen?
Flags: needinfo?(sworkman) → needinfo?(dkeeler)
> https://hg.mozilla.org/mozilla-central/rev/edaf95c2be57

That looks like a valid use for changing LOAD_BACKGROUND in the middle of loads.  So let's go ahead and do 1) never set LOAD_BACKGROUND in the parent and filter events in the child instead, and 2) let's implement the 'fail if changed after AsyncOpen' for most other flags as described in comment 7.
nsSecureBrowserUIImpl is in both the parent and child. I don't think the flags need to be propagated from the child to the parent at the moment. My understanding is that the child determines what its security state is (i.e. whether or not to show the lock icon).
Flags: needinfo?(dkeeler)
These failures are pretty frequent on B2G. Any news here? Can we just go ahead and disable the assertion if nobody has time to work on it beyond that right now?
Flags: needinfo?(valentin.gosu)
I have posted a temporary fix on bug 1116124.

I also have a patch fixing the underlying issue, but it uncovered a leak in mochitest-e10s-1 which I am trying to debug. https://treeherder.mozilla.org/#/jobs?repo=try&revision=929867917fac
Flags: needinfo?(valentin.gosu)
this looks like something to finish up for e10s
Whiteboard: [necko-active]
Valentin: any update here?
Flags: needinfo?(valentin.gosu)
We may fix this as part of bug 1260527. Flagging as necko-next.
Blocks: 1260527
Flags: needinfo?(valentin.gosu)
Whiteboard: [necko-active] → [necko-next]
Bulk change to priority: https://bugzilla.mozilla.org/show_bug.cgi?id=1399258
Priority: -- → P2
Moving to p3 because no activity for at least 1 year(s).
See https://github.com/mozilla/bug-handling/blob/master/policy/triage-bugzilla.md#how-do-you-triage for more information
Priority: P2 → P3
Moving to p3 because no activity for at least 1 year(s).
See https://github.com/mozilla/bug-handling/blob/master/policy/triage-bugzilla.md#how-do-you-triage for more information
You need to log in before you can comment on or make changes to this bug.