Closed Bug 945066 Opened 6 years ago Closed 6 years ago

e10s: need name resolution service

Categories

(Core :: Networking: DNS, defect)

x86
macOS
defect
Not set

Tracking

()

RESOLVED FIXED
mozilla29
Tracking Status
firefox28 --- fixed
firefox29 --- fixed
b2g-v1.3 --- fixed

People

(Reporter: ekr, Assigned: jduell.mcbugs)

References

Details

Attachments

(3 files, 11 obsolete files)

1.34 KB, patch
jduell.mcbugs
: review+
Details | Diff | Splinter Review
52.82 KB, patch
jduell.mcbugs
: review+
Details | Diff | Splinter Review
1.28 KB, patch
ekr
: review+
Details | Diff | Splinter Review
We need to be able to do name resolution in the renderer process for WebRTC
and this means we need a DNS resolution service....
Assignee: nobody → doug.turner
> ekr wrote:

Here is the relevant code that needs to use this:

  http://dxr.mozilla.org/mozilla-central/source/media/mtransport/nriceresolver.cpp
Assignee: doug.turner → jduell.mcbugs
Blocks: 923363
Jason - Are you be able to get this implemented before Monday's merge?
Flags: needinfo?(jduell.mcbugs)
Discussed in IRC:

jsmith	jduell: Context - bug 945066. Do you know if it's that possible to get done before Monday's merge?
jduell	I'm still working on it. I've lined up a reviewer who can look at it this weekend. Still cautiously optimistic
Flags: needinfo?(jduell.mcbugs)
jduell,

Please advise if you need any assistance here.
Blocks: 947721
Attached patch 945066_dns_e10s (obsolete) — Splinter Review
This covers much of the API surface of nsIDNSServer, including the only part we care about immediately (AsyncResolve) for B2G WebRTC.  I've verified that it works with e10s xpcshell on my desktop, but I could use someone to run this patch on a phone build and tell me if WebRTC is happy.  EKR, could you do that?

try run (but I'm not sure if any of the B2G builds will actually have a locked-down child process, which is what we need to test against):

  https://tbpl.mozilla.org/?tree=Try&rev=9fa0d8fba80e
Attachment #8344361 - Flags: review?(josh)
Flags: needinfo?(ekr)
jdm: all the XPCOM service goop I did (to make a different object for nsIDNSService get instantiated on the child) is copied from the way we do that already for nsICookieService.
Jduell: I am testing now.
ekr: nothing worth rebuilding here for:  just some vim/emacs modeline changes I made were wrong, a few typos in comments, and a no longer needed IsNeckoChild check in the parent DNS service.
Attachment #8344361 - Attachment is obsolete: true
Attachment #8344361 - Flags: review?(josh)
Attachment #8344363 - Flags: review?(josh)
Comment on attachment 8344361 [details] [diff] [review]
945066_dns_e10s

Review of attachment 8344361 [details] [diff] [review]:
-----------------------------------------------------------------

This looks good, apart from the lifetime issue, and I don't think any of my comments necessitate me taking another look at it.

::: netwerk/dns/ChildDNSService.cpp
@@ +75,5 @@
> +  // Send request to Parent process.
> +  DNSRequestChild *child =
> +    static_cast<DNSRequestChild*>(gNeckoChild->SendPDNSRequestConstructor(
> +                                                nsCString(hostname), flags));
> +  child->SetListener(listener);

Add a constructor to DNSRequestChild that takes a listener. Then you can explicitly create it and pass it as an argument to SendPDNSRequestConstructor.

@@ +132,5 @@
> +  // is configured
> +  bool disablePrefetch = false;
> +  int  proxyType = nsIProtocolProxyService::PROXYCONFIG_DIRECT;
> +
> +

Kill the extra newline.

@@ +137,5 @@
> +  nsCOMPtr<nsIPrefBranch> prefs = do_GetService(NS_PREFSERVICE_CONTRACTID);
> +  prefs->GetBoolPref(kPrefNameDisablePrefetch, &disablePrefetch);
> +  if (prefs) {
> +    prefs->GetIntPref("network.proxy.type", &proxyType);
> +    prefs->GetBoolPref(kPrefNameDisablePrefetch, &disablePrefetch);

I'd prefer to use Preferences::AddFooVarCache instead of the observer goop.

::: netwerk/dns/DNS.cpp
@@ +219,5 @@
> +           (this->inet6.scope_id == other.inet6.scope_id);
> +#if defined(XP_UNIX) || defined(XP_OS2)
> +  } else if (this->raw.family == AF_LOCAL) {
> +    return PL_strncmp(this->local.path, other.local.path,
> +                      sizeof(this->local.path));

mozilla::ArrayLength

::: netwerk/dns/DNSRequestChild.cpp
@@ +133,5 @@
> +NS_IMETHODIMP
> +ChildDNSRecord::ReportUnusable(uint16_t aPort)
> +{
> +  // "We thank you for your feedback" == >/dev/null
> +  // TODO: we could send info back to child.

To the parent?

::: netwerk/dns/DNSRequestParent.cpp
@@ +18,5 @@
> +
> +namespace mozilla {
> +namespace net {
> +
> +DNSRequestParent::DNSRequestParent(const nsCString& aHost, const uint32_t& aFlags)

Just initialize mFlags to 0 here and don't bother taking arguments.

@@ +30,5 @@
> +
> +}
> +
> +void
> +DNSRequestParent::DoAsyncResolve(const nsACString &hostname, uint32_t flags)

Set mFlags here instead.

@@ +79,5 @@
> +    }
> +
> +    unused << Send__delete__(this, DNSRequestResponse(DNSRecord(cname, array)));
> +  } else {
> +    unused << Send__delete__(this, DNSRequestResponse(status));

There's no guarantee that the child is still alive. We need the usual mIPCOpen and ActorDestroy.
Attachment #8344361 - Attachment is obsolete: false
Comment on attachment 8344363 [details] [diff] [review]
v2: just some minor code cleanups: no semantic changes

Review of attachment 8344363 [details] [diff] [review]:
-----------------------------------------------------------------

See previous review.
Attachment #8344363 - Flags: review?(josh) → review+
> @@ +79,5 @@
> > +    }
> > +
> > +    unused << Send__delete__(this, DNSRequestResponse(DNSRecord(cname, array)));
> > +  } else {
> > +    unused << Send__delete__(this, DNSRequestResponse(status));
> 
> There's no guarantee that the child is still alive. We need the usual
> mIPCOpen and ActorDestroy.

Sorry, I meant RemoveIPDLReference instead of ActorDestroy.
Attached patch Fix compile issues on try (obsolete) — Splinter Review
Try boxes need a few includes that my machine mysteriously was happy without.

I haven't incorporated jdm's comments yet--doing that now.  Just wanted to post this in case ekr is hitting the same errors as try.

new try build:

  https://tbpl.mozilla.org/?tree=Try&rev=432aa9efeac6
Attachment #8344361 - Attachment is obsolete: true
Attachment #8344363 - Attachment is obsolete: true
Attached patch v4: now with even less suckage (obsolete) — Splinter Review
Not all of jdm's comment addressed yet, but this *should* compile and be testable.  Hopefully :)
Attachment #8344365 - Attachment is obsolete: true
Attached file crash stack on phone (obsolete) —
Status:  last patch posted compiles and runs fine on try (with xpcshell test that hits the codepath):

  https://tbpl.mozilla.org/?tree=Try&rev=f4e50b20ceb4

but ekr is reporting his phone crashes (stack attached).  Child side of IPDL class is segfaulting in Release() during IPDL channel shutdown.  But only on the phone.

I tried to push a build onto my Keon so I can debug this, but I somehow just managed to wedge it (boots into splash screen and hangs: I can no longer reach it via adb), so I'm stuck here until I can get that issue resolved.  I'll look into ASAP tomorrow.
Flags: needinfo?(ekr)
Comment on attachment 8344369 [details] [diff] [review]
v4: now with even less suckage

Review of attachment 8344369 [details] [diff] [review]:
-----------------------------------------------------------------

::: netwerk/dns/ChildDNSService.cpp
@@ +57,5 @@
> +ChildDNSService::AsyncResolve(const nsACString  &hostname,
> +                              uint32_t           flags,
> +                              nsIDNSListener    *listener,
> +                              nsIEventTarget    *target_,
> +                              nsICancelable    **result)

This outparam is never set.

@@ +75,5 @@
> +  // Send request to Parent process.
> +  DNSRequestChild *child =
> +    static_cast<DNSRequestChild*>(gNeckoChild->SendPDNSRequestConstructor(
> +                                                nsCString(hostname), flags));
> +  child->SetListener(listener);

Child should be placed in an nsRefPtr here which is forgotten into result.
to be used to enable peerconnections when this bug lands, and for testing this bug
FYI - This is moving to 1.4. We're out of time to get this landed for 1.3 at this point. Branching has already started.
Even though jsmith is correct (branching has started), I'd still love to get this fixed and landed ASAP in m-c.  We're still targeting video calling for v1.4.
we can make a call about uplifting when the patch has been completed, reviewed, and has baked.
(In reply to Doug Turner (:dougt) from comment #19)
> we can make a call about uplifting when the patch has been completed,
> reviewed, and has baked.

That's not allowed. We do not pref on features post feature complete for Firefox OS.
Attached patch v5 (obsolete) — Splinter Review
Now with JDM's suggestions and bugfix (which I'm hoping will fix the crash--it's very likely).

i haven't been able to get my B2G builds working for my Keon, so I need someone (Fabrice?  or someone he finds.  ekr can do it too once he's not on a plane over the Pacific any more :)

Here are (as I understand it) the steps to test/repro: (should crash with patch v4, and work with patch v5--both those datapoints are worth having, so I'm not hiding patch v4 yet.)

1) apply my patch (until I post the new patch, this will be steps to repro the crash, but that's progress)
2) apply randell's patch in the bug too to enable the prefs.
3) push build to phone
4) follow these steps from ekr:

ekr: how are you running your test?
<ekr> jduell: go to http://webrtcme.herokuapp.com/r/<something>
<ekr> That will instantiate a PeerConnection
<jduell> in the firefox OS browser?
<ekr> yeah
<ekr> If it doesn't crash that's good
<jduell> ekr: ok, I'll try to replicate that. Thanks
<ekr> if you go there twice, it will actually do CreateOffer
<ekr> and then you are looking at the logs to see if they have an 'a=candidate
… srflx'
<ekr> if they do, you are good to go
<jduell> ekr: so I go to that URI and hit reload?
<jduell> which logs do I look at?
<ekr> yeah. you are looking for "caller=true" I think
<ekr> it will be in the console
<jduell> ekr: been a while--how do I see the B2G console?
<ekr> sorry, not the console…. will show up at the bottom of the page
<ekr> like on the screen
<ekr> no need for anything fancy
* jib has quit (Quit: jib)
<jduell> ekr: shows up on the phone?
<ekr> yeah in the browser window.
<ekr> there are debugging logs as part of the test
Attachment #8344877 - Flags: review+
Flags: needinfo?(fabrice)
Sorry for slightly garbled instructions.  Ignore stuff in parentheses in step #1
The latest patch worked for me. I got prompted to share my camera, and no crash.
Flags: needinfo?(fabrice)
This is just the patch that worked for Fabrice with a #include "moizilla/Util.h" to see "ArrayLength" (Fabrice's compile didn't see it, so we worked around it).

It'd give me a warm feeling to see someone else from webRTC give this a try before we land.  But since we're not landing for 1.3 I supposed we can also just land now and let it shake out.  I'll leave that call to ekr/jesup.
Attachment #8344369 - Attachment is obsolete: true
Attachment #8344430 - Attachment is obsolete: true
Attachment #8344877 - Attachment is obsolete: true
Attachment #8345000 - Flags: review+
Flags: needinfo?(ekr)
Since this is just going to m-i and then m-c and it worked for Fabrice, I'd like to land this now, along with the pref on patch from jesup (also on this bug), and ask QA to verify it works with a "verifyme".  If we have any problems, then we'll deal with them.  If it does work, then we've saved ourselves some time because I'd like QA to verify this anyway.
(In reply to Maire Reavy [:mreavy] from comment #25)
> Since this is just going to m-i and then m-c and it worked for Fabrice, I'd
> like to land this now, along with the pref on patch from jesup (also on this
> bug), and ask QA to verify it works with a "verifyme".  If we have any
> problems, then we'll deal with them.  If it does work, then we've saved
> ourselves some time because I'd like QA to verify this anyway.

I think the pref doesn't land until bug 853356 gets the immediate bustage fix to resolve the contacts API fallout.
(In reply to Jason Smith [:jsmith] from comment #26)
> (In reply to Maire Reavy [:mreavy] from comment #25)
> > Since this is just going to m-i and then m-c and it worked for Fabrice, I'd
> > like to land this now, along with the pref on patch from jesup (also on this
> > bug), and ask QA to verify it works with a "verifyme".  If we have any
> > problems, then we'll deal with them.  If it does work, then we've saved
> > ourselves some time because I'd like QA to verify this anyway.
> 
> I think the pref doesn't land until bug 853356 gets the immediate bustage
> fix to resolve the contacts API fallout.

I see the traffic on bug 853356.  I'd still like to land the main e10s patch for this bug and fix the problem in bug 853356 ASAP.  Then we can pref on in Nightly and evaluate where we stand.
(In reply to Maire Reavy [:mreavy] from comment #27)
> (In reply to Jason Smith [:jsmith] from comment #26)
> > (In reply to Maire Reavy [:mreavy] from comment #25)
> > > Since this is just going to m-i and then m-c and it worked for Fabrice, I'd
> > > like to land this now, along with the pref on patch from jesup (also on this
> > > bug), and ask QA to verify it works with a "verifyme".  If we have any
> > > problems, then we'll deal with them.  If it does work, then we've saved
> > > ourselves some time because I'd like QA to verify this anyway.
> > 
> > I think the pref doesn't land until bug 853356 gets the immediate bustage
> > fix to resolve the contacts API fallout.
> 
> I see the traffic on bug 853356.  I'd still like to land the main e10s patch
> for this bug and fix the problem in bug 853356 ASAP.  Then we can pref on in
> Nightly and evaluate where we stand.

Okay, that sounds fine to me.
Still getting a crash:
#0  0xb57b754a in mozilla::net::DNSRequestChild::Release (this=0xb340b700)
    at ../../../netwerk/dns/DNSRequestChild.cpp:184
#1  0xb58394c2 in mozilla::net::NeckoChild::DeallocPDNSRequestChild (this=<optimized out>, aChild=<optimized out>)
    at ../../../netwerk/ipc/NeckoChild.cpp:247
#2  0xb58e8028 in mozilla::net::PNeckoChild::RemoveManagee (this=0xb48ad100, aProtocolId=<optimized out>, 
    aListener=0xb340b700) at PNeckoChild.cpp:871
#3  0xb58c0908 in OnMessageReceived (__msg=..., this=0xb340b700) at PDNSRequestChild.cpp:168
#4  mozilla::net::PDNSRequestChild::OnMessageReceived (this=0xb340b700, __msg=...) at PDNSRequestChild.cpp:134
#5  0xb58ba9a0 in mozilla::dom::PContentChild::OnMessageReceived (this=0xb4846c18, __msg=...)
    at PContentChild.cpp:3155
#6  0xb5888b62 in mozilla::ipc::MessageChannel::DispatchAsyncMessage (this=0xb4846c48, aMsg=...)
    at ../../../ipc/glue/MessageChannel.cpp:986
#7  0xb588a818 in mozilla::ipc::MessageChannel::OnMaybeDequeueOne (this=<optimized out>)
    at ../../../ipc/glue/MessageChannel.cpp:887
#8  0xb5888614 in DispatchToMethod<mozilla::ipc::MessageChannel, void (mozilla::ipc::MessageChannel::*)()> (
    method=
    (void (mozilla::ipc::MessageChannel::*)(mozilla::ipc::MessageChannel * const)) 0xb588a79b <mozilla::ipc::MessageChannel::OnMaybeDequeueOne()>, obj=<optimized out>, arg=<optimized out>)
    at ../../../ipc/chromium/src/base/tuple.h:383
#9  RunnableMethod<mozilla::ipc::MessageChannel, void (mozilla::ipc::MessageChannel::*)(), Tuple0>::Run (
    this=<optimized out>) at ../../../ipc/chromium/src/base/task.h:307
#10 0xb588840e in Run (this=<optimized out>) at ../../dist/include/mozilla/ipc/MessageChannel.h:441
#11 mozilla::ipc::MessageChannel::DequeueTask::Run (this=<optimized out>)
    at ../../dist/include/mozilla/ipc/MessageChannel.h:458
#12 0xb588218c in MessageLoop::RunTask (this=0xbeb57fd8, task=0xb2b0d528)
    at ../../../ipc/chromium/src/base/message_loop.cc:340
#13 0xb5882832 in MessageLoop::DeferOrRunPendingTask (this=<optimized out>, pending_task=<optimized out>)
    at ../../../ipc/chromium/src/base/message_loop.cc:348
#14 0xb588384c in DoWork (this=<optimized out>) at ../../../ipc/chromium/src/base/message_loop.cc:448
#15 MessageLoop::DoWork (this=0xbeb57fd8) at ../../../ipc/chromium/src/base/message_loop.cc:427
#16 0xb588b312 in mozilla::ipc::DoWorkRunnable::Run (this=<optimized out>) at ../../../ipc/glue/MessagePump.cpp:45
#17 0xb5768f24 in ProcessNextEvent (result=0xbeb57ed7, mayWait=<optimized out>, this=0xb4814b20)
    at ../../../xpcom/threads/nsThread.cpp:612
#18 nsThread::ProcessNextEvent (this=0xb4814b20, mayWait=<optimized out>, result=0xbeb57ed7)
    at ../../../xpcom/threads/nsThread.cpp:545
#19 0xb573df9e in NS_ProcessNextEvent (thread=<optimized out>, mayWait=<optimized out>)
    at ../../../xpcom/glue/nsThreadUtils.cpp:263
#20 0xb588b4bc in mozilla::ipc::MessagePump::Run (this=0xb4802b50, aDelegate=0xbeb57fd8)
    at ../../../ipc/glue/MessagePump.cpp:85
#21 0xb588211a in MessageLoop::RunInternal (this=<optimized out>)
    at ../../../ipc/chromium/src/base/message_loop.cc:222
#22 0xb58821cc in RunHandler (this=0xbeb57fd8) at ../../../ipc/chromium/src/base/message_loop.cc:215
#23 MessageLoop::Run (this=0xbeb57fd8) at ../../../ipc/chromium/src/base/message_loop.cc:189
#24 0xb5c16aa6 in nsBaseAppShell::Run (this=0xb3b67700) at ../../../widget/xpwidgets/nsBaseAppShell.cpp:161
#25 0xb61c8ef6 in XRE_RunAppShell () at ../../../toolkit/xre/nsEmbedFunctions.cpp:679
#26 0xb588211a in MessageLoop::RunInternal (this=<optimized out>)
    at ../../../ipc/chromium/src/base/message_loop.cc:222
#27 0xb58821cc in RunHandler (this=0xbeb57fd8) at ../../../ipc/chromium/src/base/message_loop.cc:215
#28 MessageLoop::Run (this=0xbeb57fd8) at ../../../ipc/chromium/src/base/message_loop.cc:189
#29 0xb61c927e in XRE_InitChildProcess (aArgc=5, aArgv=<optimized out>, aProcess=<optimized out>)
    at ../../../toolkit/xre/nsEmbedFunctions.cpp:516
#30 0x000086ee in main (argc=7, argv=0xbeb58964) at ../../../ipc/app/MozillaRuntimeMain.cpp:137
(gdb) Quit
Flags: needinfo?(ekr)
I wonder if we should be duplicating the extra addref/resolve that nsDNSService does: http://mxr.mozilla.org/mozilla-central/source/netwerk/dns/nsDNSService2.cpp#695 and http://mxr.mozilla.org/mozilla-central/source/netwerk/dns/nsDNSService2.cpp#301 . I don't really understand the logic behind it, though.
Tried this patch but got an assertion. ChildDNSService::AsyncResolve need to submit an runnable to main thread if it's not invoked on main thread at the first place.

#0  mozilla::ipc::MessageChannel::AssertWorkerThread (this=<value optimized out>) at /Users/hellfire/workspace/hg/mozilla-central/ipc/chromium/src/base/message_loop.h:223
#1  0x40a41312 in CxxStackFrame (this=0x440fc974, that=..., direction=10, msg=0xd) at ../../dist/include/mozilla/ipc/MessageChannel.h:309
#2  0x40a422cc in mozilla::ipc::MessageChannel::Send (this=0x40442848, aMsg=0x44f99680) at /Users/hellfire/workspace/hg/mozilla-central/ipc/glue/MessageChannel.cpp:248
#3  0x40ad677e in mozilla::net::PNeckoChild::SendPDNSRequestConstructor (this=0x441eaa60, actor=0x4443e730, hostName=..., flags=@0x440fca04) at /Users/hellfire/workspace/github/B2G/objdir-gecko.noindex/ipc/ipdl/PNeckoChild.cpp:564
#4  0x40ad67f8 in mozilla::net::PNeckoChild::SendPDNSRequestConstructor (this=0x441eaa60, hostName=..., flags=@0x440fca04) at /Users/hellfire/workspace/github/B2G/objdir-gecko.noindex/ipc/ipdl/PNeckoChild.cpp:530
#5  0x40918334 in mozilla::net::ChildDNSService::AsyncResolve (this=<value optimized out>, hostname=<value optimized out>, flags=32, listener=0x4443e6d0, target_=0x40424984, result=0x4443e6d4)
    at /Users/hellfire/workspace/hg/mozilla-central/netwerk/dns/ChildDNSService.cpp:78
#6  0x40b9594a in mozilla::NrIceResolver::resolve (this=<value optimized out>, resource=<value optimized out>, cb=0x417f63fd <nr_ice_candidate_resolved_cb>, cb_arg=0x441a5adc, handle=0x441a5c5c)
    at /Users/hellfire/workspace/hg/mozilla-central/media/mtransport/nriceresolver.cpp:165
#7  0x40b95acc in mozilla::NrIceResolver::resolve (obj=0x441fa280, resource=<value optimized out>, cb=warning: (Internal error: pc 0xa in read in psymtab, but not in symtab.)

0xa, cb_arg=0xd, handle=0x441a5c5c) at /Users/hellfire/workspace/hg/mozilla-central/media/mtransport/nriceresolver.cpp:142
#8  0x417fb10e in nr_resolver_resolve (resolver=<value optimized out>, resource=<value optimized out>, cb=<value optimized out>, cb_arg=0xd, handle=0x441a5c5c)
    at /Users/hellfire/workspace/hg/mozilla-central/media/mtransport/third_party/nICEr/src/net/nr_resolver.c:79
#9  0x417f6890 in nr_ice_candidate_initialize (cand=0x441a5adc, ready_cb=0x417f8a5d <nr_ice_initialize_finished_cb>, cb_arg=<value optimized out>)
    at /Users/hellfire/workspace/hg/mozilla-central/media/mtransport/third_party/nICEr/src/ice/ice_candidate.c:493
> ChildDNSService::AsyncResolve need to submit an runnable to main thread if it's not invoked on main thread at the first place

Will do. That could be our bug right there--I don't know why I assumed the DNS request was on the main-thread.

> I wonder if we should be duplicating the extra addref/resolve that nsDNSService does: 

IIUC it's the same as the way that Httpchannel, etc, all add a ref to a listener and then release it during OnStopRequest.  We have to make sure the object doesn't go away since we call callbacks on it.  But we've already got the equivalent with our IPDL reference that we add/release, so I *think* we have it already.  I'll think about it some more.
Attached patch diff of v6 -> v7 (obsolete) — Splinter Review
in case jdm wants to see a diff instead of just the full patch.

The DNSListenerProxy changes don't need any brainpower--just moving into a separate file so I can see them outside nsDNSService.cpp.

Basically, this is all just making non-main thread consumers work.  EKR reports it's working for him.
Attached patch 945066_dns_e10s.v7 (obsolete) — Splinter Review
Attachment #8345000 - Attachment is obsolete: true
Attachment #8345730 - Flags: review?(josh)
Comment on attachment 8345729 [details] [diff] [review]
diff of v6 -> v7

Review of attachment 8345729 [details] [diff] [review]:
-----------------------------------------------------------------

::: netwerk/dns/ChildDNSService.cpp
@@ +75,5 @@
>    if (mOffline) {
>      flags |= RESOLVE_OFFLINE;
>    }
>  
> +  // make sure JS callers get notification on the main thread

This stuff for handling JS objects down to the DNSListenerProxy creation is copied verbatim from existing code in nsDNSService.cpp.
Let's try that again:

   https://tbpl.mozilla.org/?tree=Try&rev=db007df5a051

The culprot (too good a typo to correct) is "ArrayLength", which apparently needs #include mozilla/ArrayUtils.h on m-c but mozilla/Util.h on some b2g branches.  I'm not posting new patch yet as jdm is reviewing the last one.
Comment on attachment 8345729 [details] [diff] [review]
diff of v6 -> v7

>+    // we don't need an 'nIPCOpen' variable until/unless we add calls that might

mIPCOpen
Attachment #8345730 - Flags: review?(josh) → review+
This works. Can we land it?
  https://hg.mozilla.org/integration/b2g-inbound/rev/3072c9af15b3

Tip o' the hat to jdm for the review at 4 AM (!) Toronto time with <1 hour latency.
Sheriffs: if Windows appears to be burning on b2g-inbound, it's almost definitely not the fault of this bug (yes, I know it's a DNS file that's busted), but rather this:

  https://bugzilla.mozilla.org/show_bug.cgi?id=942317#c11

Don't back me out, bro!
landed the patch that turns on the pref, too:

  https://hg.mozilla.org/integration/b2g-inbound/rev/53377797571a
"So quick bright things come to confusion." - Shakespeare.

Somehow this patch is getting very strange compile errors on Windows that seem related somehow to bug 942317 (it's code that bug added, though my patch AFAICT shouldn't affect it in any way).  Getting backed out.

It's time to bring in Superman.  Assigning to Nick to try to figure out what's up with these errors:

  https://tbpl.mozilla.org/php/getParsedLog.php?id=31797213&tree=B2g-Inbound
Assignee: jduell.mcbugs → hurley
(In reply to Jason Duell (:jduell) from comment #41)
> Sheriffs: if Windows appears to be burning on b2g-inbound, it's almost
> definitely not the fault of this bug (yes, I know it's a DNS file that's
> busted), but rather this:
> 
>   https://bugzilla.mozilla.org/show_bug.cgi?id=942317#c11
> 
> Don't back me out, bro!

hey sorry i had to backout this changes in https://tbpl.mozilla.org/?tree=B2g-Inbound&rev=beb3f7812a1e (teamworking with jduell) for bustage on windows builds like https://tbpl.mozilla.org/php/getParsedLog.php?id=31797463&tree=B2g-Inbound
Attached patch windows build fix (obsolete) — Splinter Review
Once more, this time with feeling...

  https://hg.mozilla.org/integration/b2g-inbound/rev/3e44ba449b0f
Assignee: hurley → jduell.mcbugs
Status: NEW → ASSIGNED
Attachment #8346057 - Flags: review+
blocking-b2g: --- → 1.3?
https://hg.mozilla.org/mozilla-central/rev/3e44ba449b0f
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla29
This isn't applicable to 1.3 anymore - P2P support has been punted to 1.4.
blocking-b2g: 1.3? → ---
Attachment #8345729 - Attachment is obsolete: true
Attachment #8345730 - Attachment is obsolete: true
Attachment #8345968 - Attachment is obsolete: true
(In reply to Jason Smith [:jsmith] from comment #48)
> This isn't applicable to 1.3 anymore - P2P support has been punted to 1.4.

Well, that's exactly the question we are debating here. If we
land these two patches, then tada P2P support will work in 1.3.
FWIW, I am strongly in favor of doing this, and as far as I
know, the objections to it are largely procedural. The P2P
features do not significantly interact with other features
and therefore are unlikely to cause regressions.
yeah, we have to be a bit procedural sometimes, because if we aren't people just believe that it's fine to backport ad vitam eternam.

What I'd really like to know is whether we have tests for the P2P features? I guess not on b2g since this e10s issue was caught late in the cycle.
The DNS patch has tests.
(In reply to Doug Turner (:dougt) from comment #51)
> The DNS patch has tests.

That's not my question...
(In reply to Fabrice Desré [:fabrice] from comment #50)
> yeah, we have to be a bit procedural sometimes, because if we aren't people
> just believe that it's fine to backport ad vitam eternam.
> 
> What I'd really like to know is whether we have tests for the P2P features?
> I guess not on b2g since this e10s issue was caught late in the cycle.

We have a full suite of mochitests but they don't do any multiple
machine testing and therefore will not catch this kind of issue. We
have a new guy tasked to work on this (Nils) but he just started.
Ok guys, let's land that on 1.3
blocking-b2g: --- → 1.3+
(In reply to Fabrice Desré [:fabrice] from comment #54)
> Ok guys, let's land that on 1.3

This has no reason to land on 1.3. P2P has moved to 1.4.
blocking-b2g: 1.3+ → ---
(In reply to Eric Rescorla (:ekr) from comment #53)
> (In reply to Fabrice Desré [:fabrice] from comment #50)
> > yeah, we have to be a bit procedural sometimes, because if we aren't people
> > just believe that it's fine to backport ad vitam eternam.
> > 
> > What I'd really like to know is whether we have tests for the P2P features?
> > I guess not on b2g since this e10s issue was caught late in the cycle.
> 
> We have a full suite of mochitests but they don't do any multiple
> machine testing and therefore will not catch this kind of issue. We
> have a new guy tasked to work on this (Nils) but he just started.

Those mochitests aren't preffed on b2g right now.

See http://hg.mozilla.org/mozilla-central/file/f1abdc201968/testing/mochitest/b2g.json#l305.
Please stop changing flags without discussing first. Comment 49 states that the patches in this bug are the only ones blocking p2p to work on 1.3. So it looks clearly better to take them, the other option being to backout/disable p2p code that is already on 1.3.
(In reply to Fabrice Desré [:fabrice] from comment #57)
> Please stop changing flags without discussing first. Comment 49 states that
> the patches in this bug are the only ones blocking p2p to work on 1.3. So it
> looks clearly better to take them, the other option being to backout/disable
> p2p code that is already on 1.3.

Flipping a blocking flag will force a landing, which isn't happening here, as this is not a committed feature. It is not better to take them on b2g, as the P2P code is already disabled on 1.3 anyways via a preference. None of the P2P tests are turned on in B2G right now, so there's no proof of safety that this can be turned on 1.3. P2P is a highly complex feature, so trying to rush a landing is not a good idea.

No tests running reliably + highly complex feature = bad idea for 1.3 at this point.
(In reply to Jason Smith [:jsmith] from comment #58)
> (In reply to Fabrice Desré [:fabrice] from comment #57)
> > Please stop changing flags without discussing first. Comment 49 states that
> > the patches in this bug are the only ones blocking p2p to work on 1.3. So it
> > looks clearly better to take them, the other option being to backout/disable
> > p2p code that is already on 1.3.
> 
> Flipping a blocking flag will force a landing, which isn't happening here,
> as this is not a committed feature. It is not better to take them on b2g, as
> the P2P code is already disabled on 1.3 anyways via a preference.

But this bug is what changes the preference. And the tests are turned off
because the feature is turned off. So, we should be able to turn the subset
of tests that now should work on now.


> None of
> the P2P tests are turned on in B2G right now, so there's no proof of safety
> that this can be turned on 1.3. P2P is a highly complex feature, so trying
> to rush a landing is not a good idea.

It's a highly complex feature which has already seen very extensive deployment
in desktop. What we are talking about here is only the changes which are
needed to make it work in b2g.


> No tests running reliably + highly complex feature = bad idea for 1.3 at
> this point.

I'm really not following your argument. Nearly all the code for this feature
is in the tree already and we have an easy kill switch in case we find it
causes damage. What exactly is the risk you see to landing this patch now
given that we can always turn it off later?
ekr, can you push to try these patch + one that re-enables tests on b2g?
Willdo.
(In reply to Eric Rescorla (:ekr) from comment #59)
> (In reply to Jason Smith [:jsmith] from comment #58)
> > (In reply to Fabrice Desré [:fabrice] from comment #57)
> > > Please stop changing flags without discussing first. Comment 49 states that
> > > the patches in this bug are the only ones blocking p2p to work on 1.3. So it
> > > looks clearly better to take them, the other option being to backout/disable
> > > p2p code that is already on 1.3.
> > 
> > Flipping a blocking flag will force a landing, which isn't happening here,
> > as this is not a committed feature. It is not better to take them on b2g, as
> > the P2P code is already disabled on 1.3 anyways via a preference.
> 
> But this bug is what changes the preference. And the tests are turned off
> because the feature is turned off. So, we should be able to turn the subset
> of tests that now should work on now.

The tests are designed to flip the preference when the run, so they should be possible to ran even with the preference disabled. 

> 
> 
> > None of
> > the P2P tests are turned on in B2G right now, so there's no proof of safety
> > that this can be turned on 1.3. P2P is a highly complex feature, so trying
> > to rush a landing is not a good idea.
> 
> It's a highly complex feature which has already seen very extensive
> deployment
> in desktop. What we are talking about here is only the changes which are
> needed to make it work in b2g.

Making P2P work on b2g still carries complexity however around the e10s work that was done. I know that was worked on for multiple months way before P2P became a targeted feature. On that regard, I'd imagine there's likely a significant amount of complexity around the e10s implementation for P2P.

> 
> 
> > No tests running reliably + highly complex feature = bad idea for 1.3 at
> > this point.
> 
> I'm really not following your argument. Nearly all the code for this feature
> is in the tree already and we have an easy kill switch in case we find it
> causes damage. What exactly is the risk you see to landing this patch now
> given that we can always turn it off later?

I think my main concern here is that there's no tests actively running to protect against P2P regressions. This feature is highly complex, so having mochitests running on emulator builds to protect against basic test flows is critical. We're also past FC, so we really shouldn't be turning on targeted features post FC unless they are low in complexity (i.e. small enhancements).
(In reply to Jason Smith [:jsmith] from comment #62)
> (In reply to Eric Rescorla (:ekr) from comment #59)
> > (In reply to Jason Smith [:jsmith] from comment #58)
> > > (In reply to Fabrice Desré [:fabrice] from comment #57)
> > > > Please stop changing flags without discussing first. Comment 49 states that
> > > > the patches in this bug are the only ones blocking p2p to work on 1.3. So it
> > > > looks clearly better to take them, the other option being to backout/disable
> > > > p2p code that is already on 1.3.
> > > 
> > > Flipping a blocking flag will force a landing, which isn't happening here,
> > > as this is not a committed feature. It is not better to take them on b2g, as
> > > the P2P code is already disabled on 1.3 anyways via a preference.
> > 
> > But this bug is what changes the preference. And the tests are turned off
> > because the feature is turned off. So, we should be able to turn the subset
> > of tests that now should work on now.
> 
> The tests are designed to flip the preference when the run, so they should
> be possible to ran even with the preference disabled.


Then it sounds like we should have already turned some of them on (though
note that some of those tests are video and so won't work anyway).

As Fabrice asked, I am starting a try run with the tests reenabled. Will
advise in the bug when that is done.



> > > None of
> > > the P2P tests are turned on in B2G right now, so there's no proof of safety
> > > that this can be turned on 1.3. P2P is a highly complex feature, so trying
> > > to rush a landing is not a good idea.
> > 
> > It's a highly complex feature which has already seen very extensive
> > deployment
> > in desktop. What we are talking about here is only the changes which are
> > needed to make it work in b2g.
> 
> Making P2P work on b2g still carries complexity however around the e10s work
> that was done. I know that was worked on for multiple months way before P2P
> became a targeted feature. On that regard, I'd imagine there's likely a
> significant amount of complexity around the e10s implementation for P2P.

Sure. But actually that code landed. We're only talking about the
DNS code here which is like a week's patch.


> I think my main concern here is that there's no tests actively running to
> protect against P2P regressions. This feature is highly complex, so having
> mochitests running on emulator builds to protect against basic test flows is
> critical.

I agree we should have tests running, but again, if it turns out this
is busted we just turn it off. That seems pretty cheap.


> We're also past FC, so we really shouldn't be turning on targeted
> features post FC unless they are low in complexity (i.e. small enhancements).

This seems like a procedural point. I recognize we're looking to make an
exception and the question is whether that exception is one we should be
making and that needs to be argued on the merits. Given that (a) we can
turn it off easily and (b) it's an isolated feature, I think the merits
here are pretty strong.
(In reply to Eric Rescorla (:ekr) from comment #63)
> (In reply to Jason Smith [:jsmith] from comment #62)
> > (In reply to Eric Rescorla (:ekr) from comment #59)
> > > (In reply to Jason Smith [:jsmith] from comment #58)
> > > > (In reply to Fabrice Desré [:fabrice] from comment #57)
> > > > > Please stop changing flags without discussing first. Comment 49 states that
> > > > > the patches in this bug are the only ones blocking p2p to work on 1.3. So it
> > > > > looks clearly better to take them, the other option being to backout/disable
> > > > > p2p code that is already on 1.3.
> > > > 
> > > > Flipping a blocking flag will force a landing, which isn't happening here,
> > > > as this is not a committed feature. It is not better to take them on b2g, as
> > > > the P2P code is already disabled on 1.3 anyways via a preference.
> > > 
> > > But this bug is what changes the preference. And the tests are turned off
> > > because the feature is turned off. So, we should be able to turn the subset
> > > of tests that now should work on now.
> > 
> > The tests are designed to flip the preference when the run, so they should
> > be possible to ran even with the preference disabled.
> 
> 
> Then it sounds like we should have already turned some of them on (though
> note that some of those tests are video and so won't work anyway).

They aren't on. Any tests that included the exclude tests portion of http://hg.mozilla.org/mozilla-central/file/f1abdc201968/testing/mochitest/b2g.json won't be ran on checkin. Same thing goes for debug emulators - http://hg.mozilla.org/mozilla-central/file/f1abdc201968/testing/mochitest/b2g.json & desktop builds - http://hg.mozilla.org/mozilla-central/file/f1abdc201968/testing/mochitest/b2g-desktop.json.

> > > > None of
> > > > the P2P tests are turned on in B2G right now, so there's no proof of safety
> > > > that this can be turned on 1.3. P2P is a highly complex feature, so trying
> > > > to rush a landing is not a good idea.
> > > 
> > > It's a highly complex feature which has already seen very extensive
> > > deployment
> > > in desktop. What we are talking about here is only the changes which are
> > > needed to make it work in b2g.
> > 
> > Making P2P work on b2g still carries complexity however around the e10s work
> > that was done. I know that was worked on for multiple months way before P2P
> > became a targeted feature. On that regard, I'd imagine there's likely a
> > significant amount of complexity around the e10s implementation for P2P.
> 
> Sure. But actually that code landed. We're only talking about the
> DNS code here which is like a week's patch.

It's landed, but it hasn't gone through any end to end testing yet from QA.

> 
> 
> > I think my main concern here is that there's no tests actively running to
> > protect against P2P regressions. This feature is highly complex, so having
> > mochitests running on emulator builds to protect against basic test flows is
> > critical.
> 
> I agree we should have tests running, but again, if it turns out this
> is busted we just turn it off. That seems pretty cheap.
> 
> 
> > We're also past FC, so we really shouldn't be turning on targeted
> > features post FC unless they are low in complexity (i.e. small enhancements).
> 
> This seems like a procedural point. I recognize we're looking to make an
> exception and the question is whether that exception is one we should be
> making and that needs to be argued on the merits. Given that (a) we can
> turn it off easily and (b) it's an isolated feature, I think the merits
> here are pretty strong.

I think I'd like to know more why an exception needs to made here. In driver & product discussions, we've been pretty adamant on sticking to the FC deadline to ensure that there's established focus on stabilization post FC, not features.
FYI - for any discussion happening above about pref on, let's move the discussion over to bug 945256.
Blocks: b2g-webrtc
Discussed offline - this is now okay to take onto 1.3, as we're going to move forward trying to get this into 1.3. This needs to be nominated for aurora approval for uplift.

Leaving needinfo on jduell to put aurora approval when we're back in the office in Jan.
Flags: needinfo?(jduell.mcbugs)
Comment on attachment 8346057 [details] [diff] [review]
v8: with windows bustage fix.

[Approval Request Comment]
Bug caused by (feature/regressing bug #): needed for P2P e10s
User impact if declined:  P2P e10s won't work (but can be disabled)
Testing completed (on m-c, etc.): xpcshell test
Risk to taking this patch (and alternatives if risky): low
String or IDL/UUID changes made by this patch: none

There's a lot of code here but it's all boilerplate IPDL. The actual logic is fairly trivial--we're just remoting the args for AsyncResolve to the parent, and then schlepping back the answer. The hard part was getting it to compile (plus one dumb refcount error I had to debug). Overall I'd say it's one of the lower-risk IPDL patches I've written.   Also this has zero-touch for existing codepaths--if e10s P2P doesn't work and gets turned off, nothing else will run this code.
Attachment #8346057 - Flags: approval-mozilla-aurora?
Flags: needinfo?(jduell.mcbugs)
Comment on attachment 8344580 [details] [diff] [review]
enable PeerConnections for b2g rs=ekr

This patch is also needed on aurora if we take the other patch--turns on e10s P2P.
Attachment #8344580 - Flags: review+
Attachment #8344580 - Flags: approval-mozilla-aurora?
Attachment #8346057 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Attachment #8344580 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Does this need testing in Firefox Desktop builds? If so, please advise.
Flags: needinfo?(jduell.mcbugs)
No testing for desktop needed until we start using multiprocess desktop.  For now it's Firefox OS specific.
Flags: needinfo?(jduell.mcbugs)
The pref-on patch never re-landed on inbound after the backout

https://hg.mozilla.org/integration/mozilla-inbound/rev/811f4c3267b6

Added fun: this has been checked into Aurora, which it should not have been - for Aurora, we aren't enabling video for peerconnections, so I'll need to land an update to that on Aurora (turn off one pref).
Comment on attachment 8361937 [details] [diff] [review]
disable video peerconnections on B2G 1.3 (accidentally enabled)

[Approval Request Comment]
Bug caused by (feature/regressing bug #): this bug - accidental uplift of patch only meant for inbound/29

User impact if declined: video peerconnections will be on in B2G 1.3 (and they're not supposed to be)

Testing completed (on m-c, etc.): N/A
 
Risk to taking this patch (and alternatives if risky): None

String or IDL/UUID changes made by this patch: none
Attachment #8361937 - Flags: review?(ekr)
Attachment #8361937 - Flags: approval-mozilla-aurora?
Attachment #8361937 - Attachment is obsolete: true
Attachment #8361937 - Flags: review?(ekr)
Attachment #8361937 - Flags: approval-mozilla-aurora?
Comment on attachment 8361938 [details] [diff] [review]
disable video peerconnections on B2G 1.3 (accidentally enabled)

[Approval Request Comment]
Bug caused by (feature/regressing bug #): this bug - accidental uplift of patch only meant for inbound/29

User impact if declined: video peerconnections will be on in B2G 1.3 (and they're not supposed to be)

Testing completed (on m-c, etc.): N/A
 
Risk to taking this patch (and alternatives if risky): None

String or IDL/UUID changes made by this patch: none
Attachment #8361938 - Flags: review?(ekr)
Attachment #8361938 - Flags: approval-mozilla-aurora?
Comment on attachment 8361938 [details] [diff] [review]
disable video peerconnections on B2G 1.3 (accidentally enabled)

Review of attachment 8361938 [details] [diff] [review]:
-----------------------------------------------------------------

lgtm
Attachment #8361938 - Flags: review?(ekr) → review+
Duplicate of this bug: 945256
Attachment #8361938 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
You need to log in before you can comment on or make changes to this bug.