e10s-ify the network predictor

RESOLVED FIXED in Firefox 41

Status

()

defect
RESOLVED FIXED
6 years ago
4 years ago

People

(Reporter: keeler, Assigned: u408661)

Tracking

unspecified
mozilla41
Points:
---
Dependency tree / graph
Bug Flags:
in-testsuite +

Firefox Tracking Flags

(e10s+, firefox41 fixed)

Details

Attachments

(1 attachment, 6 obsolete attachments)

bug 948574 makes it fatal if a child process attempts to instantiate nsISiteSecurityService directly. It turns out, on b2g, layout/generic/test/test_bug448860.html causes the Seer to call SpeculativeConnect in the child process (through nsDocShell::OnOverLink), which uses nsISiteSecurityService (as well as does some networking).

Comment 1

6 years ago
Nick: we should teach nsIOService to (if it's on the child) ship SpeculativeConnect calls to the parent (via a new method in PNecko.ipdl).  Ask me for help if you need it with IPDL.

It would be nice to start adding some assertions to our nsIOService code and socketTransport to catch this sort of thing (right now it's still possible to instantiate them and call pretty much anything).  Not sure if we can actually turn off the socket transport thread (see bug 607924) but we could at least add some asserts in the right places.  I'm checking with :ekr to see what stuff WebRTC actually needs here.
Assignee: nobody → hurley
Assignee

Comment 2

6 years ago
Jason - I'm not sure about shipping SpeculativeConnect calls off to the parent process (at least not directly). Problem is, one of the args to it is an nsIInterfaceRequestor, which means (I think) we'd have to be able to serialize arbitrary objects - not exactly an exciting prospect to implement!

Instead, what about making the seer e10s-aware? It would still be a bit ugly (we'd have to either serialize an nsILoadGroup (maybe already done?) or change the seer interface to just take a bool regarding private browsing mode, plus we'd have to ship nsINetworkSeerVerifier information across processes), but those sound like much more tractable problems than serializing arbitrary, unknown, opaque objects :)

Thoughts?
Flags: needinfo?(jduell.mcbugs)
(In reply to Nicholas Hurley [:hurley] from comment #2)
> Jason - I'm not sure about shipping SpeculativeConnect calls off to the
> parent process (at least not directly). Problem is, one of the args to it is
> an nsIInterfaceRequestor, which means (I think) we'd have to be able to
> serialize arbitrary objects - not exactly an exciting prospect to implement!

We wouldn't need to be able to serialize any nsIInterfaceRequestor. At the e10s boundary, the parent needs to synthesize (or find) its own nsIInterfaceRequestor that implements the necessary interfaces. Note that this is already an issue with other things like XHR so I would expect it to be relatively simple to fix this by copying what XHR and friends do.

> Instead, what about making the seer e10s-aware? It would still be a bit ugly
> (we'd have to either serialize an nsILoadGroup (maybe already done?) or
> change the seer interface to just take a bool regarding private browsing
> mode, plus we'd have to ship nsINetworkSeerVerifier information across
> processes), but those sound like much more tractable problems than
> serializing arbitrary, unknown, opaque objects :)

For the purposes of sandboxing, which is being built on top of e10s, I would recommend against doing this because it makes the process boundary interface much larger than doing things the other way: have the child process notify the parent about hover events (and/or whatever else kicks of Seer) and then have the parent-process event handler initiate the speculative requests. This way, the interface between the child and parent stays simple, small, and more clearly secure.

Comment 4

6 years ago
As of yet the parent is blissfully unaware of nsILoadGroups on the child, and let's try to keep it that way if we can.

re: passing nsIInterfaceRequestor across IPDL. Yeah, I freaked out about that too when I first wrote the IPDL code :)  The general strategy here is to figure out which interfaces actually get requested on the parent, and what args you need to pass to IPDL to help those interfaces work correctly on the parent. And then, as Brian notes, you synthesize your own nsIInterfaceRequestor (often the parent IPDL class does that, or a member variable that it delegates to). The most obvious thing to pass is a SerializedLoadContext--all necko channels pass that, and we (after security checks--see NeckoParent.cpp) create from it a nsILoadContext that gives important info like which app is making the request (so we get things like cookies and private browsing right).  See all the calls to CreateChannelLoadContext() in NeckoParent.cpp. Note that doing the security checks requires a PBrowser (IPDL uses that to make sure an app doesn't lie about its appId, for instance): you get that by querying the callbacks of the channel on the child.  This all sounds confusing :) but just look at what FTPChannelChild does in AsyncOpen to get ready to call SendPFTPChannelConstructor and you'll have the recipe.

I see that you've also got a nsISpeculativeConnectionOverrider interface which can be queried.  If that lives on the child you may need to ship its values (the long, bool, bool) to the child and set up a dummy object that you hand off to GetInterface, just like LoadContext objects do for nsILoadContext).

I didn't dig down to see if the wrappedCallbacks passed to the nullHttpTransaction use any interfaces out of the ordinary--I assume not.

Re: Brian's comments.  Hmm, sounds like we're talking about 3 possible layers to do the IPDL:  1) sending mouse hover events, etc to parent; 2) making the seer span parent/child; or 3) sending the low-level network events (SpeculativeConnect) to the parent.  My sense is that doing this at the SpeculativeConnect level keeps the boundary interface smaller than starting to shlep higher-level objects/events around (for one thing, it's a single message type). That's certainly how we've done things in the past with DNSPrefetch, for instance.  Brian, feel free to make the case some more, but I think adding a SpeculativeConnect message to PNecko.ipdl is the way to go here.
Flags: needinfo?(jduell.mcbugs)
(In reply to Jason Duell (:jduell) from comment #4)
> And then, as Brian notes, you synthesize your own
> nsIInterfaceRequestor (often the parent IPDL class does that, or a member
> variable that it delegates to). The most obvious thing to pass is a
> SerializedLoadContext

Not sure we even need SerializedLoadContext for simple speculative connections. If we're doing speculative loads, then we probably do.

> all necko channels pass that, and we (after security
> checks--see NeckoParent.cpp) create from it a nsILoadContext that gives
> important info like which app is making the request (so we get things like
> cookies and private browsing right).

Is the parent process really using information from the serialized load context to determine whether to act like we're in private browsing mode and/or whether to send cookies? If so, I would like to file a follow-up bug to change that, because in a sandboxed world those decisions should be made from information within the parent process. (It may be the case that currently this information isn't available in the parent process, but this is the way we should make it work eventually, if not right away.)

> I see that you've also got a nsISpeculativeConnectionOverrider interface
> which can be queried.  If that lives on the child you may need to ship its
> values (the long, bool, bool) to the child and set up a dummy object that
> you hand off to GetInterface, just like LoadContext objects do for
> nsILoadContext).

It seems to me like nsISpeculativeConnectionOverrider needs to be in the parent because it is trying to enforce global behavior across all tabs. With there will be multiple child processes.

> My sense is that doing this at
> the SpeculativeConnect level keeps the boundary interface smaller than
> starting to shlep higher-level objects/events around (for one thing, it's a
> single message type). That's certainly how we've done things in the past
> with DNSPrefetch, for instance.  Brian, feel free to make the case some
> more, but I think adding a SpeculativeConnect message to PNecko.ipdl is the
> way to go here.

I think that is reasonable. My concern is that we want to limit the number of distinct messages that we send child->parent, even if those messages are fully asynchronous, for efficiency reasons. If we already are sending hover events to the parent (seems likely, to update the UI) then it would be better to just add the necessary information to the existing hover event. If we aren't sending hover events but we're still sending DNS prefetch events, then we should consider whether or not it makes sense to coalesce those two things into one type of event.
Flags: needinfo?(jduell.mcbugs)
One more thing: I would consider most of the prefs that control speculative connect to be "privacy prefs" and generally we should enforce behavior controlled by privacy & security prefs in the parent process. For example, instead of the child process saying "if (seer is enabled ) { send message to parent; }", we should have the parent do "if (seer is enabled) { process the message from the child; }". It seems like it might be useful to *also* do the check in the child, because it is wasteful to send the message if the parent is going to ignore it; however, since seer is almost always going to be enabled (eventually), that check ends up being mostly redundant and even confusing with respect to where such checks belong, so IMO it is better to skip it.

Comment 7

6 years ago
> Is the parent process really using information from the serialized load
> context to determine whether to act like we're in private browsing mode and/or
> whether to send cookies?

In "real life" we enforce using the TabParent of the channel to get the appID (and thus what cookies to send). (for xpcshell tests there's no TabParent to supply, so we allow child tests to fake being an app via the SerializedLoadContext). We do trust the child (i.e. the SerializedLoadContext) as to whether it's in private browsing mode or not, as no one seemed to know of a good way to determine that info independently on the parent (i.e. via IPDL).  It also didn't seem like a significant security hole (it only gives a compromised process the ability to choose whether to use the app's regular or private cookie/etc databases).  If there's something to worry about there let's open a new bug for it.

> It seems to me like nsISpeculativeConnectionOverrider needs to be in the parent

Not sure--I don't how how that code works, but perhaps Nick can say.

> If we already are sending hover events to the parent (seems likely, to update
> the UI) then it would be better to just add the necessary information to the
> existing hover event.

It's an interesting idea. I don't actually know if we're sending hover events or not.  A lot of the prefetch stuff is launched by the HTTP parser as it sees links, so we'd still want the necko-level DNSPrefetch, etc, for that.

> generally we should enforce behavior controlled by privacy & security prefs in
> the parent process.

I've got sympathy for that approach, all things being equal, so if it's easy to add checks in the parent, let's do them.  I'm not going to sweat the efficiency loss of doing them twice (or sending a IPDL msgs if they're disabled).  So far in my experience IPDL messages are cheap (so long as they're async and there's not a deluge of them).  That said it ought to be a one-liner to skip sending.
Flags: needinfo?(jduell.mcbugs)
Assignee

Comment 8

6 years ago
Posted patch patch (obsolete) — Splinter Review
OK, you were right, that was a lot of boilerplate. I almost thought I was writing Java!

Anyway, I'm almost certain I've done something horribly wrong here (even though xpcshell tests seem to be passing so far - https://tbpl.mozilla.org/?tree=Try&rev=9a1b9d503c97 - debug build may prove me wrong, though), hence just f? instead of r?. Why am I almost certain I've done something horribly wrong, you ask? Because with all the junk I had to do to send a message from one place to another, how could I have NOT gotten something wrong? :)
Attachment #8364627 - Flags: feedback?(jduell.mcbugs)
Assignee

Comment 9

6 years ago
Oh yeah, debug build proved me wrong. The test timed out. I still think the patch is worthy of f? for the general bits, but there's definitely some work to be done.
Assignee

Comment 10

6 years ago
And joy of joys... debug build is perma-fail on try, and perma-succeed on my local machine. Guess it's time to request a test slave to mess around with.
Assignee

Updated

6 years ago
Depends on: 964449
Comment on attachment 8364627 [details] [diff] [review]
patch

Review of attachment 8364627 [details] [diff] [review]:
-----------------------------------------------------------------

::: netwerk/base/src/nsIOService.cpp
@@ +1256,5 @@
> +            connectionOverrider.Init(overrider);
> +        }
> +
> +        net::gNeckoChild->SendSpeculativeConnect(loadContext,
> +                                                 connectionOverrider, uri);

AFAICT, the nsISpeculativeConnectionOverrider should be retrieved from the parent process, not sent from the child to the parent so that we can properly control the networking activity that spans multiple child processes. For example, consider parallelSpeculativeConnectLimit; this is supposed to be global across all content processes, not local to each content process, right?
Attachment #8364627 - Flags: feedback-
Assignee

Comment 12

6 years ago
Posted patch patch v2 (obsolete) — Splinter Review
Here's one, exact same clause you suggested. Running through try, let's see what breaks: https://tbpl.mozilla.org/?tree=Try&rev=0824fb33138a
Attachment #8364627 - Attachment is obsolete: true
Attachment #8364627 - Flags: feedback?(jduell.mcbugs)
Attachment #8366892 - Flags: review?(mcmanus)
Assignee

Comment 13

6 years ago
Comment on attachment 8364627 [details] [diff] [review]
patch

Argh, wrong bug!
Attachment #8364627 - Attachment is obsolete: false
Attachment #8364627 - Flags: feedback- → feedback?(jduell.mcbugs)
Assignee

Updated

6 years ago
Attachment #8366892 - Attachment is obsolete: true
Attachment #8366892 - Flags: review?(mcmanus)
Assignee

Comment 14

6 years ago
(In reply to Brian Smith (:briansmith, :bsmith; NEEDINFO? for response) from comment #11)
> AFAICT, the nsISpeculativeConnectionOverrider should be retrieved from the
> parent process, not sent from the child to the parent so that we can
> properly control the networking activity that spans multiple child
> processes. For example, consider parallelSpeculativeConnectLimit; this is
> supposed to be global across all content processes, not local to each
> content process, right?

Just because the overrider happens to come from the child process doesn't mean that the limits being used for that speculative connection apply only to that child process' connections. The speculative connection code has no concept of "per process" limits, and this doesn't change that. Perhaps a better name for the IDL would've been nsISpeculativeConnectionClassifier (since the limits are whether it's coming from the seer or some other call to SpeculativeConnect, ie, two classes), but that ship has sailed. Poor naming is not necessarily indicative of incorrect operation :)
Assignee

Comment 15

6 years ago
(the tl;dr of my previous comment is "no, the operation is exactly what is desired in this situation, nothing per-process about it)

Comment 16

5 years ago
Comment on attachment 8364627 [details] [diff] [review]
patch

Review of attachment 8364627 [details] [diff] [review]:
-----------------------------------------------------------------

After talking to Nick it looks like we want a different architecture here--having the seer exist on just the parent instead of each child process (which will consume more memory).
Attachment #8364627 - Flags: feedback?(jduell.mcbugs)
Mass tracking-e10s flag change. Filter bugmail on "2be0fcce-e36a-4e2c-aa80-0e3d33eb5406".
tracking-e10s: --- → +
Assignee: hurley → jaypoulz
Hey Jeremy, what's the status of this work? bug 948574 is shaping up to land but afaict, it can't according to comment 1 without this landing first. Both these bugs block the e10s m1 milestone which we'd like to complete here in the next couple weeks. Do you have time to finish this work up and get it landed?
Flags: needinfo?(jaypoulz)
FWIW bug 948574 is green on try so it doesn't look like it should be blocked by this bug.

Comment 21

5 years ago
I'm not clear from the comments in bug 948574 why it's blocked on this bug.  Does the seer in the child (even when pref'd off, as it is now) call into nsISiteSecurityService? (Jeremy, take a look and see?).  If it does then we should stop doing that.
See comment 1. In theory, if the child process attempts to directly create any new connection to an http url, we'll directly look the host up in nsISiteSecurityService to see if it should be upgraded to https. This shouldn't happen in the child process.
Jim - I'm not making any promises, especially since :hurley said this would take me longer than I thought it might. That being said, I've got a small patch I'll be finishing today, and then my cycles will be spent primarily working on this bug.

The seer doesn't call call into anything now that it's pref'd off. It, however, possible for the enabled seer to create a connection that gets upgraded to https in nsHttpChannel/Handler. I'm not sure that this would happen in the current configuration (that only allows up to preconnect), given that SpeculativeConnect only makes a new half open connection, but it would definetely happen in the prefetch code I wrote (which theorhetically could land in the coming weeks as well).
Flags: needinfo?(jaypoulz)
Assignee

Comment 24

5 years ago
The seer won't ever call into nsISiteSecurityService while it's disabled, no. Given that the seer is going to stay disabled for a while, I suggest we at least contemplate landing bug 948574, remove this bug from blocking e10s m1, and make this bug block prefing the seer on by default (no bug for that, just a list of things in my head - I should make a whiteboard tag, really).
(In reply to Nicholas Hurley [:hurley] from comment #24)
> The seer won't ever call into nsISiteSecurityService while it's disabled,
> no. Given that the seer is going to stay disabled for a while, I suggest we
> at least contemplate landing bug 948574, remove this bug from blocking e10s
> m1, and make this bug block prefing the seer on by default (no bug for that,
> just a list of things in my head - I should make a whiteboard tag, really).

Excellent, thanks guys. Unhooking from bug 948574 / e10s-m1.
No longer blocks: 948574, e10s-m1
(In reply to David Keeler (:keeler) [use needinfo?] from comment #0)
> It turns out, on b2g,
> layout/generic/test/test_bug448860.html causes the Seer to call
> SpeculativeConnect in the child process (through nsDocShell::OnOverLink),
> which uses nsISiteSecurityService (as well as does some networking).

Nit: as well as quietly failing to do some networking.  B2G inherits Android's security features, and content processes are treated like Android apps without network access permissions; attempts to create Internet-domain sockets fail with EACCES.  (Sandboxing imposes more restrictions on networking-related system calls, but WebRTC currently depends on socket creation being a non-fatal error.)
Posted patch bug-959752-fix.patch (obsolete) — Splinter Review
Hi Jason, I was hoping you could take a look at this and tell me if I'm barking up the right tree. Thanks!! :)
Attachment #8364627 - Attachment is obsolete: true
Attachment #8461926 - Flags: feedback?(jduell.mcbugs)

Comment 28

5 years ago
Comment on attachment 8461926 [details] [diff] [review]
bug-959752-fix.patch

Review of attachment 8461926 [details] [diff] [review]:
-----------------------------------------------------------------

Looks like a plausible e10s skeleton to build on.

::: netwerk/base/src/Predictor.cpp
@@ +202,5 @@
>    ,mLastCleanupTime(0)
>  {
> +  // Only allowed to run in the parent process
> +  if (net::IsNeckoChild()) {
> +    return;

sounds like you want a 

  MOZ_RELEASE_ASSERT(!IsNeckoChild) 

here instead of the early return? If something should really be impossible that's often the best way to go (and it's hard/scary to add RELEASE_ASSERTs later on).

@@ +217,5 @@
>  Predictor::~Predictor()
>  {
> +  // Only allowed to run in the parent process
> +  if (net::IsNeckoChild()) {
> +    return;

If you've got RELEASE_ASSERT in constructor you don't need it here too.

@@ +438,5 @@
>  Predictor::Init()
>  {
> +  // Only allowed to run in the parent process
> +  if (net::IsNeckoChild()) {
> +    return NS_OK;

ditto--not needed

@@ +876,5 @@
>  Predictor::Shutdown()
>  {
> +  // Only allowed to run in the parent process
> +  if (net::IsNeckoChild()) {
> +    return;

ditto

@@ +1734,5 @@
>                     PredictorPredictReason reason,
>                     nsILoadContext *loadContext,
>                     nsINetworkPredictorVerifier *verifier)
>  {
> +  // TODO e10s-ify

If the client on the child is handed a PredictorChild, there's nothing to do here, right?

@@ +2264,5 @@
>  Predictor::Learn(nsIURI *targetURI, nsIURI *sourceURI,
>                   PredictorLearnReason reason,
>                   nsILoadContext *loadContext)
>  {
> +  // TODO e10s-ify

ditto for all the rest of these TODO: e10s-ify.  AFAICT the code to e10sify is the Predictor{Child|Parent} code that schleps the arguments to the child process and then call into these functions.  The Predictor class itself doesn't need to know squat about e10s, hopefully.

::: netwerk/base/src/Predictor.h
@@ +64,5 @@
>    friend class PredictorCommitTimerInitEvent;
>    friend class PredictorNewTransactionEvent;
>    friend class PredictorCleanupEvent;
> +  friend class PredictorChild;
> +  friend class PredictorParent;

I'm not a fan of friend classes getting out of hand, but looks like it's a lost cause for this codebase :)

::: netwerk/base/src/PredictorParent.cpp
@@ +28,5 @@
> +  nsCOMPtr<nsIURI> targetURI = ipc::DeserializeURI(aTargetURI);
> +  nsCOMPtr<nsIURI> sourceURI = ipc::DeserializeURI(aSourceURI);
> +
> +  // We only actually care about the loadContext.mPrivateBrowsing, so we'll just
> +  // pass dummy params for nestFrameId, inBrowser and appId

A little gross but might be fine for now.

@@ +70,5 @@
> +}
> +
> +void PredictorParent::ActorDestroy(ActorDestroyReason aWhy)
> +{
> +  // Implement me! Bug 1005186

So follow the lead of  HttpChannelParent::ActorDestroy() and set a mIPCClosed variable, and whenever you send an IPDL message to the child (which for this patch is just for the OnPredict{DNS|Preconnect} stuff) make sure to check for !mIPCClosed beforehand. Otherwise the parent process will die if it tries to send a message to a dead child process.

::: netwerk/base/src/PredictorParent.h
@@ +6,5 @@
> +
> +namespace mozilla {
> +namespace net {
> +
> +class PredictorParent : public PPredictorParent

This class will probably also need to implement nsINetworkPredictorVerifier so that it can pass itself (when test mode is on) to get the callbacks (and then send those back to the child).

::: netwerk/ipc/PNecko.ipdl
@@ +74,5 @@
>    PUDPSocket(nsCString host, uint16_t port, nsCString filter);
>  
>    PDNSRequest(nsCString hostName, uint32_t flags);
>  
> +  PPredictor();

So, given that the predictor is a singleton in each process, and basically lives throughout the life of the process (i.e. from first time the service is asked for until shutdown), we could easily enough not have a separate PPredictor protocol, and just put all its functions in PNecko instead.   That's my impression at least.  But it does no harm to have a separate protocol, just more code and a little more complexity.  I could go either way.  For now you might as well keep it, since you've written the code.  And it's a little nicer to keep the e10s code in the separate Predictor{Child|Parent} classes.

::: netwerk/ipc/PPredictor.ipdl
@@ +30,5 @@
> +
> +child:
> +  OnPredictPrefetch(URIParams uri);
> +  OnPredictPreconnect(URIParams uri);
> +  OnPredictDNS(URIParams uri);

So... AFAICT the "OnPredirectPreconnect|DNS" functions are only used for testing purposes.  Assuming that's true we don't want to send IPDL traffic between processes for them, unless testing mode is on.  That should probably be a pref that we can turn on in tests.

I see no mention of OnPredirectPrefetch in the code anywhere?  It's not in nsINetworkPredictorVerifier and grep /netwerk didn't find it either...
Attachment #8461926 - Flags: feedback?(jduell.mcbugs) → feedback+
Posted patch bug-959752-fix.patch (obsolete) — Splinter Review
This expands from my previous patch and moves this in the same direction. I'm not sure how to test this though.

Nick - any ideas?
Attachment #8461926 - Attachment is obsolete: true
Flags: needinfo?(hurley)
Flags: needinfo?(hurley)
Assignee

Comment 30

5 years ago
This needs a better name to explain what it is we're doing here...
Summary: Seer can call SpeculativeConnect in the child process → e10s-ify the network predictor
Assignee

Updated

4 years ago
Assignee: jaypoulz → hurley
Assignee

Comment 32

4 years ago
Posted patch patch (obsolete) — Splinter Review
Patrick - nothing particularly crazy here, just a lot of boilerplate (as to be expected with e10s), plus a lot of logging I added during debugging. The tricky part is the test, and even that isn't exceedingly tricky, but it did take some thought to get right :)

Try seems reasonably happy with this: https://treeherder.mozilla.org/#/jobs?repo=try&revision=4752fa1db20c
Attachment #8471970 - Attachment is obsolete: true
Attachment #8594915 - Attachment is obsolete: true
Attachment #8620522 - Flags: review?(mcmanus)
Comment on attachment 8620522 [details] [diff] [review]
patch

Review of attachment 8620522 [details] [diff] [review]:
-----------------------------------------------------------------

I would prefer to change the RELEASE_ASSERTs here to MOZ_DIAGNOSTIC_ASSERT (which works on opt builds like RELEASE_ASSERT, but only for pre-release channels). If you're worried about them really getting hit, then runtime check and return an error after the DIAGNOSTIC_ASSERT.. 

message passing ftw
Attachment #8620522 - Flags: review?(mcmanus) → review+
Assignee

Comment 34

4 years ago
SGTM, I'll update the patch as such and get it landed.
Assignee

Comment 35

4 years ago
Once again, my m-i timing is impeccably bad. Gonna set checkin-needed on this one. (This patch updated with mcmanus's comments.)
Attachment #8620522 - Attachment is obsolete: true
Attachment #8624379 - Flags: review+
Assignee

Updated

4 years ago
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/185a9bd1dc62
Status: NEW → RESOLVED
Closed: 4 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Target Milestone: --- → mozilla41
Depends on: 1176612
Depends on: 1176613
You need to log in before you can comment on or make changes to this bug.