Closed Bug 1341506 Opened 7 years ago Closed 7 years ago

Make nsChannelClassifier::IsTrackerWhitelisted() async

Categories

(Toolkit :: Safe Browsing, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla54
Tracking Status
firefox54 --- fixed

People

(Reporter: hchang, Assigned: hchang)

References

Details

Attachments

(2 files, 2 obsolete files)

nsChannelClassifier::IsTrackerWhitelisted() is a sync call because it makes use of nsIURIClassifier::ClassifyWithLocalTables(), which is also sync call.

Even though ClassifyWithLocalTables() is only a in-memory binary search, the thread which serves the search also serves all the similar search tasks in the entire gecko. I don't know the average waiting time for ClassifyWithLocalTables()
in the task queue but since it's called on the main thread, it's always better to
be a async call if it could be blocked by other tasks.
Assignee: nobody → hchang
Blocks: 1334616
It becomes more complicated than I imagined when taking the IPC into account (but should be still straightforward actually.)
(In reply to Henry Chang [:henry][:hchang] from comment #1)
> It becomes more complicated than I imagined when taking the IPC into account
> (but should be still straightforward actually.)

I wish I saw this [1] earlier so that I would have not spent hours adding IPC stuff :(

[1] http://searchfox.org/mozilla-central/rev/39e4b25a076c59d2e7820297d62319f167871449/netwerk/base/nsChannelClassifier.cpp#713
Attachment #8839836 - Flags: review?(francois)
Attachment #8839836 - Flags: review?(ehsan)
Hi Ehsan,

I am not sure how much you are familiar with nsChannelClassifier for the review
so please feel free to drop the review if you can't do the review.

The patch is to have a async version of classifyWithLocalTables and make use of it.
That's it. I also did a IPC version of it but realized it can only be called
in the parent process so I removed the IPC part.

The idea is to early return from the first |OnClassifyComplete| (if we initially
identified the URI is a tracker) then await another |OnClassifyComplete| callback.
In the second callback we will know if the URI is a real tracker.

In short, I rename the original impl of |OnClassifyComplete| to |OnClassifyCompleteInternal|.
The first and second |OnClassifyComplete|s will delegate to |OnClassifyCompleteInternal|
with a flag to indicate if the whitelist tracker check has been done.

No test case, no thorough test. Only play around to make sure there's no obvious crash.
I will consider this patch as a plan "A+" if the "background DB update" cannot mitigate
Bug 1334616 very well.
This is Monica's explanation on why ClassifyLocalWithTables() has to be synchronous: https://bugzilla.mozilla.org/show_bug.cgi?id=1100024#c2

That call is done within nsHttpChannel::BeginConnect(): http://searchfox.org/mozilla-central/rev/b1044cf7c2000c3e75e8181e893236a940c8b6d2/netwerk/protocol/http/nsHttpChannel.cpp#6115

and the presence on the blacklist disables speculative connections: http://searchfox.org/mozilla-central/rev/b1044cf7c2000c3e75e8181e893236a940c8b6d2/netwerk/protocol/http/nsHttpChannel.cpp#598

without looking at whether or not the tracker is whitelisted.

The place where we call IsTrackerWhitelisted() is from OnClassifyComplete(): http://searchfox.org/mozilla-central/rev/b1044cf7c2000c3e75e8181e893236a940c8b6d2/netwerk/base/nsChannelClassifier.cpp#745

which is the callback for the asynchronous Classify() call from within AsyncOpen2(): https://dxr.mozilla.org/mozilla-central/rev/b06968288cff469814bf830aa90f1c84da490f61/netwerk/base/nsBaseChannel.cpp#318

So as far as I can tell, Monica's comments don't apply to this and so your change should be safe.

However, ClassifyLocalWithTables() is supposed to only take a few milliseconds at most: https://telemetry.mozilla.org/new-pipeline/dist.html#!cumulative=0&end_date=2017-02-21&keys=__none__!__none__!__none__&max_channel_version=nightly%252F54&measure=URLCLASSIFIER_CLASSIFYLOCAL_TIME&min_channel_version=null&product=Firefox&sanitize=1&sort_keys=submissions&start_date=2017-02-08&table=0&trim=1&use_submission_date=0

and if it's not, we'll be in trouble when we call it from within HttpChannel::BegingConnect(). So making only one use of it (checking against the whitelist) asynchronous might be hiding problems with our implementation.

So I think your change should be safe, but I'm not sure it's a good idea to do it since it would hide the root cause of ClassifyLocalWithTables() taking too long.

What do you think? Is my analysis correct?
Attachment #8839836 - Flags: review?(francois)
This profile <https://perfht.ml/2miKkrd> shows that this can block on disk I/O, so it's not possible to asset that this is supposed to take a few milliseconds.  If we want to ensure that, we'd need to do asynchronous I/O.

(Please note that this is blocking a Quantum DOM feature that I'm hoping to disentangle from the issue of asynchronous I/O, so I think this patch is valuable if it avoids the main thread blocking wait (I still haven't managed to look at the patch, sorry...)
Francois,

Thanks for pointing me Monica's comment!

Regarding "hiding the root cause", what do you think if we measure the 
average waiting time of each lookup event? (i.e. the time that a request
is added to the queue and the time that a request is processed.)

Anyways, let's see how the background update reduce the blocking time.
If a few millisecond turns out unacceptable on the main thread, we can then
take the async ClassifyLocalWithTables() approach.
Attachment #8840386 - Flags: review?(ehsan)
I've added the e10s support back so that we can now have a mochitest test case 
for the new function.

So, it sounds like we'd better land this patch according to comment 7.
As for the telemetry, I can instead measure the time difference between
the async function start time and the callback time, which can still be
an indicator of "how much improvement is brought by background DB
update."

Francois, do you have any objection or concern about landing this patch?
(In reply to Henry Chang [:henry][:hchang] from comment #8)
> Anyways, let's see how the background update reduce the blocking time.
> If a few millisecond turns out unacceptable on the main thread, we can then
> take the async ClassifyLocalWithTables() approach.

FTR I consider *any* wait on the UI thread to be unacceptable.  We've done a lot of work to ensure the UI stays responsive at all times, and in my experience any synchronous wait no matter how small we think it is on the average case hurts a lot in bad cases (for example when the system is otherwise under load.)  Also remember that our frame budget is 16ms, so a few milliseconds is a large amount of time to waste even if there is a guarantee that the wait will never exceed a few milliseconds.
(In reply to François Marier [:francois] from comment #6)
> This is Monica's explanation on why ClassifyLocalWithTables() has to be
> synchronous: https://bugzilla.mozilla.org/show_bug.cgi?id=1100024#c2
> 
> That call is done within nsHttpChannel::BeginConnect():
> http://searchfox.org/mozilla-central/rev/
> b1044cf7c2000c3e75e8181e893236a940c8b6d2/netwerk/protocol/http/nsHttpChannel.
> cpp#6115
> 
> and the presence on the blacklist disables speculative connections:
> http://searchfox.org/mozilla-central/rev/
> b1044cf7c2000c3e75e8181e893236a940c8b6d2/netwerk/protocol/http/nsHttpChannel.
> cpp#598
> 
> without looking at whether or not the tracker is whitelisted.

Note bug 1324053 here which made speculative connections to work when we're only annotating the channel with tracking status.  So this path is only in effect when TP is really on, not just when we're annotating channels.  (IOW, I agree with your evaluation.)
(In reply to François Marier [:francois] from comment #6)
> However, ClassifyLocalWithTables() is supposed to only take a few
> milliseconds at most:
> https://telemetry.mozilla.org/new-pipeline/dist.html#!
> cumulative=0&end_date=2017-02-21&keys=__none__!__none__!
> __none__&max_channel_version=nightly%252F54&measure=URLCLASSIFIER_CLASSIFYLOC
> AL_TIME&min_channel_version=null&product=Firefox&sanitize=1&sort_keys=submiss
> ions&start_date=2017-02-08&table=0&trim=1&use_submission_date=0
> 
> and if it's not, we'll be in trouble when we call it from within
> HttpChannel::BegingConnect().

FWIW that's exactly what's happening in bug 1325054.
Comment on attachment 8839836 [details]
Bug 1341506 - Part 1: Implement and use nsIURIClassifier.asyncClassifyLocalWithTables.

https://reviewboard.mozilla.org/r/114416/#review116442

Note that I'm not familiar much with the url-classifier bits, so I defer to Francois on this parts.

::: netwerk/base/nsChannelClassifier.cpp:690
(Diff revision 4)
> +
> +  NS_DECL_THREADSAFE_ISUPPORTS
> +  NS_DECL_NSIURICLASSIFIERCALLBACK
> +
> +private:
> +  virtual ~IsTrackerWhitelistedCallback() = default;

This shouldn't need to be virtual.

::: netwerk/base/nsChannelClassifier.cpp:776
(Diff revision 4)
>  
>    nsCOMPtr<nsIURI> whitelistURI;
>    rv = NS_NewURI(getter_AddRefs(whitelistURI), whitelistEntry);
>    NS_ENSURE_SUCCESS(rv, rv);
>  
> -  // Check whether or not the tracker is in the entity whitelist
> +  IsTrackerWhitelistedCallback* cb =

Please avoid using raw pointers on the stack, make this a RefPtr instead.

::: netwerk/base/nsChannelClassifier.cpp:796
(Diff revision 4)
> +nsresult
> +nsChannelClassifier::OnClassifyCompleteInternal(nsresult aErrorCode,
> +                                                const nsACString& aList,
> +                                                const nsACString& aProvider,
> +                                                const nsACString& aPrefix,
> +                                                bool aDidCheckWhitelistedTracker)

This boolean argument sucks.  Why not just move the block about IsTrackerWhitelisted() to OnClassifyComplete(), and call OnClassifyCompleteInternal() in both OnClassifyComplete() and IsTrackerWhitelistedCallback::OnClassifyComplete()?

::: toolkit/components/url-classifier/nsUrlClassifierDBService.cpp:1678
(Diff revision 4)
> +    }
> +
> +    nsCOMPtr<nsIRunnable> cbRunnable =
> +      NS_NewRunnableFunction([callback, matchedLists] () -> void {
> +        // |callback| is captured as const value so ...
> +        auto cb = const_cast<nsIURIClassifierCallback*>(callback.get());

What's the const_cast for?  The argument isn't const.
Attachment #8839836 - Flags: review?(ehsan) → review+
Comment on attachment 8840386 [details]
Bug 1341506 - Part 2: Support e10s and pull url classifier e10s message in PURLClassifier.

https://reviewboard.mozilla.org/r/114902/#review116446

::: dom/ipc/PURLClassifier.ipdl:28
(Diff revision 1)
> -protocol PURLClassifier
> +sync protocol PURLClassifier
>  {
>    manager PContent;
>  
> +parent:
> +  sync Classify(Principal principal, bool useTrackingProtection)

We have decided to not add more sync IPC messages (see bug 1336919).  So I'd rather not do it like this, especially since this API is asynchronous.
Attachment #8840386 - Flags: review?(ehsan) → review-
Comment on attachment 8839836 [details]
Bug 1341506 - Part 1: Implement and use nsIURIClassifier.asyncClassifyLocalWithTables.

https://reviewboard.mozilla.org/r/114416/#review116442

Thanks for review Ehsan!

> This shouldn't need to be virtual.

If I remember correctly, the compiler complained about that. I'll check again.

> This boolean argument sucks.  Why not just move the block about IsTrackerWhitelisted() to OnClassifyComplete(), and call OnClassifyCompleteInternal() in both OnClassifyComplete() and IsTrackerWhitelistedCallback::OnClassifyComplete()?

Not very clear what you suggested here but I'll try to remvoe the boolean arg. The overall call flow might be refactored a little bit to avoid the use of the boolean arg.

> What's the const_cast for?  The argument isn't const.

It's because the captured values for c++ lambda are "const" by default :(
There are many other ways to get around this (e.g. mutable or [&]) and I think
de-constness is the best one.
(In reply to :Ehsan Akhgari from comment #7)
> This profile <https://perfht.ml/2miKkrd> shows that this can block on disk
> I/O, so it's not possible to asset that this is supposed to take a few
> milliseconds.  If we want to ensure that, we'd need to do asynchronous I/O.

Ah I hadn't realized this was blocking on _disk_ I/O too.

It's definitely blocking on network I/O in the case of updates (hence the worker thread refactor), but if this is blocking on disk too, then I agree it makes sense to make IsTrackerWhitelisted() async.
(In reply to Henry Chang [:henry][:hchang] from comment #18)
> Comment on attachment 8839836 [details]
> Bug 1341506 - Part 1: Implement and use
> nsIURIClassifier.asyncClassifyLocalWithTables.
> 
> https://reviewboard.mozilla.org/r/114416/#review116442
> 
> Thanks for review Ehsan!
> 
> > This shouldn't need to be virtual.
> 
> If I remember correctly, the compiler complained about that. I'll check
> again.

The compiler complains if your dtor is *public* and non-virtual, I'm pretty sure.  The Release() function is what calls the dtor, and here the Release() function is defined in the leaf class so it will always call the correct dtor if it isn't virtual.

> > This boolean argument sucks.  Why not just move the block about IsTrackerWhitelisted() to OnClassifyComplete(), and call OnClassifyCompleteInternal() in both OnClassifyComplete() and IsTrackerWhitelistedCallback::OnClassifyComplete()?
> 
> Not very clear what you suggested here but I'll try to remvoe the boolean
> arg. The overall call flow might be refactored a little bit to avoid the use
> of the boolean arg.

Let me rephrase: I think you can make OnClassifyComplete() check IsTrackerWhitelisted() and make OnClassifyCompleteInternal() assume the caller has dealt with it.  This way, if OnClassifyComplete() needs to check whether a tracker is white-listed, it kicks of the async work, and when it's finished it calls OnClassifyCompleteInternal() to continue, otherwise if we don't need to check whether a tracker is white-listed, you can continue to call OnClassifyCompleteInternal() directly.  Basically make OnClassifyCompleteInternal() only do what it does when in your current patch the bool arg is true.

> > What's the const_cast for?  The argument isn't const.
> 
> It's because the captured values for c++ lambda are "const" by default :(
> There are many other ways to get around this (e.g. mutable or [&]) and I
> think
> de-constness is the best one.

Silly C++!  OK, this is fine.
Comment on attachment 8840386 [details]
Bug 1341506 - Part 2: Support e10s and pull url classifier e10s message in PURLClassifier.

https://reviewboard.mozilla.org/r/114902/#review116446

> We have decided to not add more sync IPC messages (see bug 1336919).  So I'd rather not do it like this, especially since this API is asynchronous.

It's inheritedly sync [1]. 

To explain a bit more about the change:

We now have two IPC functions for URIClassifier in PContent.ipdl:

1) sync PURLClassifier(Principal principal, bool useTrackingProtection)
        returns (bool success);
2) sync ClassifyLocal(URIParams uri, nsCString tables)
        returns (nsresult rv, nsCString[] results);

In this patch, I'd like to add an async IPC for AsyncClassifyLocal. 
The simplest way is to add yet another new async IPC to PContent.ipdl.
However, that would make every new URLClassifier IPC call expand PContent.ipdl
a little bit. 

What I chose is to avoid adding every URIClassifier IPC calls to PContent.
So, I decided to pull every URLClassifier IPC calls into PURLClassifier.
(That said, I forgot moving "sync ClassifyLocal" from  PContent to PURLClassifier)

If you think my approach is too confusing or not making sense, I can just
add a new IPC call to PContent.ipdl.

[1] http://searchfox.org/mozilla-central/rev/b1044cf7c2000c3e75e8181e893236a940c8b6d2/dom/ipc/PContent.ipdl#833
(In reply to François Marier [:francois] from comment #19)
> (In reply to :Ehsan Akhgari from comment #7)
> > This profile <https://perfht.ml/2miKkrd> shows that this can block on disk
> > I/O, so it's not possible to asset that this is supposed to take a few
> > milliseconds.  If we want to ensure that, we'd need to do asynchronous I/O.
> 
> Ah I hadn't realized this was blocking on _disk_ I/O too.
> 
> It's definitely blocking on network I/O in the case of updates (hence the
> worker thread refactor), but if this is blocking on disk too, then I agree
> it makes sense to make IsTrackerWhitelisted() async.

Out of curiosity, isn't waiting on the network worse than waiting on the disk?  :-)
(In reply to Henry Chang [:henry][:hchang] from comment #21)
> Comment on attachment 8840386 [details]
> Bug 1341506 - Part 2: Deal with content process case.
> 
> https://reviewboard.mozilla.org/r/114902/#review116446
> 
> > We have decided to not add more sync IPC messages (see bug 1336919).  So I'd rather not do it like this, especially since this API is asynchronous.
> 
> It's inheritedly sync [1]. 
> 
> To explain a bit more about the change:
> 
> We now have two IPC functions for URIClassifier in PContent.ipdl:
> 
> 1) sync PURLClassifier(Principal principal, bool useTrackingProtection)
>         returns (bool success);
> 2) sync ClassifyLocal(URIParams uri, nsCString tables)
>         returns (nsresult rv, nsCString[] results);
> 
> In this patch, I'd like to add an async IPC for AsyncClassifyLocal. 
> The simplest way is to add yet another new async IPC to PContent.ipdl.
> However, that would make every new URLClassifier IPC call expand
> PContent.ipdl
> a little bit. 

Thanks for the explanation, now this makes more sense.

> What I chose is to avoid adding every URIClassifier IPC calls to PContent.
> So, I decided to pull every URLClassifier IPC calls into PURLClassifier.
> (That said, I forgot moving "sync ClassifyLocal" from  PContent to
> PURLClassifier)

I think that's what confused me and made me think that you are actually adding a new sync IPC call and not just moving the existing ones.  If you remove the PContent sync message then I think your patch makes more sense to me.  Note that you still need an IPC peer review for the sync-messages.ini changes, but that shouldn't be an issue if you're just moving stuff around.
Comment on attachment 8840386 [details]
Bug 1341506 - Part 2: Support e10s and pull url classifier e10s message in PURLClassifier.

https://reviewboard.mozilla.org/r/114902/#review116472

::: dom/ipc/PContent.ipdl:834
(Diff revision 2)
>  
>      async PFlyWebPublishedServer(nsString name, FlyWebPublishOptions params);
>  
> -    sync PURLClassifier(Principal principal, bool useTrackingProtection)
> +    async PURLClassifier();
> -        returns (bool success);
>      sync ClassifyLocal(URIParams uri, nsCString tables)

Didn't we agree to remove this?  :-)
Attachment #8840386 - Flags: review?(ehsan) → review-
Comment on attachment 8839836 [details]
Bug 1341506 - Part 1: Implement and use nsIURIClassifier.asyncClassifyLocalWithTables.

https://reviewboard.mozilla.org/r/114416/#review116462

The patch looks good to me. Once you address Ehsan's comments and restore the missing LOG line, r+.

Also, please add the following comment to nsHttpChannel::BeginConnect(): http://searchfox.org/mozilla-central/rev/b1044cf7c2000c3e75e8181e893236a940c8b6d2/netwerk/protocol/http/nsHttpChannel.cpp#6117

    // We cannot check the entity whitelist here (IsTrackerWhitelisted()) because that method is asynchronous
    // and we need to run synchronously here. See https://bugzilla.mozilla.org/show_bug.cgi?id=1100024#c2.

I want to make sure we don't accidentally break this in the future.

::: netwerk/base/nsChannelClassifier.cpp:711
(Diff revision 4)
> +                                                 const nsACString& /*aProvider*/,
> +                                                 const nsACString& /*aPrefix*/)
> +{
> +  nsresult rv;
> +  if (aLists.IsEmpty()) {
> +    // Not on the white list. Still an tracker URI.

Missing LOG().

::: netwerk/base/nsChannelClassifier.cpp:803
(Diff revision 4)
>      // Should only be called in the parent process.
>      MOZ_ASSERT(XRE_IsParentProcess());
>  
>      if (aErrorCode == NS_ERROR_TRACKING_URI &&
> -        NS_SUCCEEDED(IsTrackerWhitelisted())) {
> -      LOG(("nsChannelClassifier[%p]:OnClassifyComplete tracker found "
> +        !aDidCheckWhitelistedTracker &&
> +        NS_SUCCEEDED(IsTrackerWhitelisted(aList, aProvider, aPrefix))) {

We should keep this LOG line because it's very useful for debugging.

::: toolkit/components/url-classifier/nsUrlClassifierDBService.cpp:1679
(Diff revision 4)
> +
> +    nsCOMPtr<nsIRunnable> cbRunnable =
> +      NS_NewRunnableFunction([callback, matchedLists] () -> void {
> +        // |callback| is captured as const value so ...
> +        auto cb = const_cast<nsIURIClassifierCallback*>(callback.get());
> +        cb->OnClassifyComplete(NS_OK,           // don't care.

nit: instead of "don't care", you could also say "not used"
Attachment #8839836 - Flags: review?(francois) → review+
Comment on attachment 8840386 [details]
Bug 1341506 - Part 2: Support e10s and pull url classifier e10s message in PURLClassifier.

https://reviewboard.mozilla.org/r/114902/#review116472

> Didn't we agree to remove this?  :-)

I only wanted to push a new patch part 3 for telemetry but the reviewboard automatically requested a review for all previous patches. That's so annoying. Sorry :(
Haha, no worries, reviewboard can be quite surprising sometimes.  :-)
Comment on attachment 8840507 [details]
Bug 1341506 - Part 2: Add telemetry for AsyncClassifyLocalWithTables.

https://reviewboard.mozilla.org/r/114990/#review116498

datareview+
Attachment #8840507 - Flags: review+
Comment on attachment 8840386 [details]
Bug 1341506 - Part 2: Support e10s and pull url classifier e10s message in PURLClassifier.

Kanru,

Could you help review new sync messages which I moved around? They were PContent sync messages and now are sync messages of PURLClassifier. One of the sync message is associated with a bug.

For your information, the purpose of this bug is to add a new url-classifier related message. Url-classifier related messages used to be PContent messages. In order to not grow PContent every time we add new url-classifier IPC message, I decided to put all url-classifer messages to PURLClassifier.
Attachment #8840386 - Flags: review?(kchen)
Attachment #8840386 - Flags: review?(ehsan)
Attachment #8840386 - Flags: review?(wmccloskey)
In case Kan-ru is too busy to do the review, add Bill as reviewer, too.
Comment on attachment 8840386 [details]
Bug 1341506 - Part 2: Support e10s and pull url classifier e10s message in PURLClassifier.

https://reviewboard.mozilla.org/r/114902/#review116662

::: ipc/ipdl/sync-messages.ini:1010
(Diff revision 3)
> +[PURLClassifier::Classify]
> +description = Was PContent::PURLClassifier.
> +[PURLClassifier::ClassifyLocal]
> +description = To be replaced by PURLClassifier::AsyncClassifyLocal. See Bug 1342333

r=me since this only moves around the sync messages.

It would be clearer if the descriptions say why sync messages are needed here.
Attachment #8840386 - Flags: review?(kchen) → review+
Attachment #8840386 - Flags: review?(wmccloskey)
Comment on attachment 8840386 [details]
Bug 1341506 - Part 2: Support e10s and pull url classifier e10s message in PURLClassifier.

https://reviewboard.mozilla.org/r/114902/#review116916

::: dom/ipc/PURLClassifier.ipdl:29
(Diff revision 3)
>  {
>    manager PContent;
>  
> +parent:
> +  // Moved from PContent.ipdl.
> +  sync Classify(Principal principal, bool useTrackingProtection)

Do you mind filing a similar bug for this one as well?  Kudos if you want to take that one as well.  :-)

::: dom/ipc/PURLClassifier.ipdl:41
(Diff revision 3)
> +    returns (nsresult rv, nsCString[] results);
> +
> +  async AsyncClassifyLocal(URIParams uri, nsCString tables);
> +
>  child:
>    async __delete__(MaybeInfo info, nsresult errorCode);

This is a pretty weird protocol now, with peculiar semantics.  You're supposed to setup the actors, and then send either of the three messages to the parent, and the parent deletes the actor and delivers the result no matter which message you sent!!!

I think this is fine for now since we're planning to make more changes, but please at least document how this setup currently works for the benefit of others reading this code.

::: dom/ipc/URLClassifierParent.cpp:54
(Diff revision 3)
> +                                       nsTArray<nsCString>* aResults)
> +{
> +  MOZ_ASSERT(aResults);
> +  nsCOMPtr<nsIURI> uri = DeserializeURI(aURI);
> +  if (!uri) {
> +    return IPC_FAIL_NO_REASON(this);

returning IPC_FAIL_NO_REASON crashe the child, pretty sure you don't want that anywhere here.  r- for that.

::: dom/ipc/URLClassifierParent.cpp:89
(Diff revision 3)
> +    // classifier returns successfully but doesn't perform a lookup as the
> +    // classification not yielding any results, so we just kill the child actor
> +    // without ever calling out callback in both cases.
> +    // This means that code using this in the child process will only get a hit
> +    // on its callback if some classification actually happens.
> +    ClassificationFailed();

Why do you need to do this here?  This is an async function, so if you want to communicate failure you can do that aynchronously.

::: ipc/ipdl/sync-messages.ini:1011
(Diff revision 3)
>  [PHandlerService::Exists]
>  description =
>  [PHandlerService::GetTypeFromExtension]
>  description =
> +[PURLClassifier::Classify]
> +description = Was PContent::PURLClassifier.

This one should get a bug number as well (see above)

::: toolkit/components/url-classifier/nsUrlClassifierDBService.cpp:1571
(Diff revision 3)
>    NS_ENSURE_ARG(aPrincipal);
>  
>    if (XRE_IsContentProcess()) {
>      using namespace mozilla::dom;
>      auto actor = static_cast<URLClassifierChild*>
> -      (ContentChild::GetSingleton()->
> +      (ContentChild::GetSingleton()->SendPURLClassifierConstructor());

Now we're doing two sync IPCs here.  :(  We should really fix this quickly, preferably in the same release as when this patch lands.

::: toolkit/components/url-classifier/nsUrlClassifierDBService.cpp:1643
(Diff revision 3)
> +  // Check and deal with content process case.
> +  if (XRE_IsContentProcess()) {
> +    using namespace mozilla::dom;
> +    using namespace mozilla::ipc;
> +    auto actor = static_cast<URLClassifierChild*>
> +      (ContentChild::GetSingleton()->SendPURLClassifierConstructor());

Another sync IPC incurred here too.  :((

::: toolkit/components/url-classifier/nsUrlClassifierDBService.cpp:1725
(Diff revision 3)
>    nsresult rv;
>    if (XRE_IsContentProcess()) {
>      using namespace mozilla::dom;
>      using namespace mozilla::ipc;
> +    auto actor = static_cast<URLClassifierChild*>
> +      (ContentChild::GetSingleton()->SendPURLClassifierConstructor());

Here also we're sending two sync IPC messages now... :(
Attachment #8840386 - Flags: review?(ehsan) → review-
Comment on attachment 8840386 [details]
Bug 1341506 - Part 2: Support e10s and pull url classifier e10s message in PURLClassifier.

https://reviewboard.mozilla.org/r/114902/#review116916

Thank for the review!

I noted that the next merging day is coming (3/6), do you think this patch (ipc part) is too risky to be landed one week before?

Some alternatives:

1) Skip the IPC part since it will not be called in the child process. http://searchfox.org/mozilla-central/rev/90d1cbb4fd3dc249cdc11fe5c3e0394d22d9c680/netwerk/base/nsChannelClassifier.cpp#738

2) Re-do something like http://searchfox.org/mozilla-central/rev/90d1cbb4fd3dc249cdc11fe5c3e0394d22d9c680/dom/ipc/PContent.ipdl#833 (i.e. yet another protocol with no message other than delete.)

> This is a pretty weird protocol now, with peculiar semantics.  You're supposed to setup the actors, and then send either of the three messages to the parent, and the parent deletes the actor and delivers the result no matter which message you sent!!!
> 
> I think this is fine for now since we're planning to make more changes, but please at least document how this setup currently works for the benefit of others reading this code.

Do you mean the original design was also weird or it becomes weired after the changes?

The very first e10s support for URLClassifier was written by you so I assume it becomes weired when I pull every thing to URLClassifier protocol.

> returning IPC_FAIL_NO_REASON crashe the child, pretty sure you don't want that anywhere here.  r- for that.

It's copied from

http://searchfox.org/mozilla-central/rev/90d1cbb4fd3dc249cdc11fe5c3e0394d22d9c680/dom/ipc/ContentParent.cpp#4994

Let me find if any error else I should return here.

> Now we're doing two sync IPCs here.  :(  We should really fix this quickly, preferably in the same release as when this patch lands.

Is there any other approach I can take? Sorry for such a dumb question :(

I mean the parent-side actor constuction is inevitably to be sync and SendClassify() is inheritedly to be sync.
Maybe call SendClassify() asynchronoulsy in the next tick to yield to other main thread task a little bit?

Or you just refer to the necessity of making Classify() a async call?

> Here also we're sending two sync IPC messages now... :(

Also, since there is no callback assocaited with this function, the actor will never be deleted :(
Attachment #8840386 - Flags: review?(ehsan)
Comment on attachment 8841512 [details]
Bug 1341506 - Part 2.1: Refined IPC.

Hi Ehsan,

I tried to address the "weirdness" issue of modified PURLClassifier 
in the new patch. (just to demonstrate my idea and see if the change 
is too risky too land one week before merge day so I didn't override 
the old patch.)

1) Now DBService will have a persistent PURLClassifierChild (i.e. composition)

2) The child-side DBService will create PURLClassifierChild on Init() 
   and destroy on ~DBService() 
   (actually I am not sure about the destruction time. Can it be right 
    before ~DBService()?)

3) For async call, PURLClassifierChild will associate the callback with an id 
   and id will be exchanged in IPC calls. (I guess many other gecko IPCs use this idiom, too.)

That's all. The idea is fairly simple but the code change is massive. 
I had a preliminary test and it seemed good.

Do you mind having a look and suggesting the approach I should take at this point?
Thanks!
Attachment #8841512 - Flags: feedback?(ehsan)
(In reply to Henry Chang [:henry][:hchang] from comment #39)
> I noted that the next merging day is coming (3/6), do you think this patch
> (ipc part) is too risky to be landed one week before?
> 
> Some alternatives:
> 
> 1) Skip the IPC part since it will not be called in the child process.
> http://searchfox.org/mozilla-central/rev/
> 90d1cbb4fd3dc249cdc11fe5c3e0394d22d9c680/netwerk/base/nsChannelClassifier.
> cpp#738
> 
> 2) Re-do something like
> http://searchfox.org/mozilla-central/rev/
> 90d1cbb4fd3dc249cdc11fe5c3e0394d22d9c680/dom/ipc/PContent.ipdl#833 (i.e. yet
> another protocol with no message other than delete.)

Hmm, are we planning to use this new function in the child process for anything?  If not, I think the best plan going forward is to return NS_ERROR_NOT_IMPLEMENTED in the child process and not try to fix this unless we have to when we have a use case for this function in the child process.  What do you think?

> > This is a pretty weird protocol now, with peculiar semantics.  You're supposed to setup the actors, and then send either of the three messages to the parent, and the parent deletes the actor and delivers the result no matter which message you sent!!!
> > 
> > I think this is fine for now since we're planning to make more changes, but please at least document how this setup currently works for the benefit of others reading this code.
> 
> Do you mean the original design was also weird or it becomes weired after
> the changes?
> 
> The very first e10s support for URLClassifier was written by you so I assume
> it becomes weired when I pull every thing to URLClassifier protocol.

In the original e10s support change, I intended PURLClassifier to be a one-off protocol that only exists for the purpose of delivering the asynchronous delete response.  Basically the goal was for the child to create the protocol to ask a question about the classification of a URL and get the answer as the protocol deletion message.  The other way to do that would be by just having a couple of messages on PContent.  I didn't really expect us to add more functionality to the protocol.  :-)

If we want to add more functionality to the protocol, I think we need to change its lifetime rule to not be around asking a single question from the parent process, something like your new patch is doing.  However, this is a lot of work, and I prefer to not have to do it unless if we have a good reason to.  :-)

> > returning IPC_FAIL_NO_REASON crashe the child, pretty sure you don't want that anywhere here.  r- for that.
> 
> It's copied from
> 
> http://searchfox.org/mozilla-central/rev/
> 90d1cbb4fd3dc249cdc11fe5c3e0394d22d9c680/dom/ipc/ContentParent.cpp#4994
> 
> Let me find if any error else I should return here.

Those are probably my mistakes...  But it seems like they haven't blown up yet, most likely because we aren't taking those failure paths.  See bug 1325255 about the same issue in a slightly earlier part of the code.

> > Now we're doing two sync IPCs here.  :(  We should really fix this quickly, preferably in the same release as when this patch lands.
> 
> Is there any other approach I can take? Sorry for such a dumb question :(
> 
> I mean the parent-side actor constuction is inevitably to be sync and
> SendClassify() is inheritedly to be sync.
> Maybe call SendClassify() asynchronoulsy in the next tick to yield to other
> main thread task a little bit?
> 
> Or you just refer to the necessity of making Classify() a async call?

The better approach would be to fix the lifetime of the protocol to not have to create it synchronously like this.  But again, I prefer to not do that if we can avoid it.  The reason is that the existing protocol has sync IPC issues that we should fix, so I'm really reluctant to add more if we can avoid it.

> > Here also we're sending two sync IPC messages now... :(
> 
> Also, since there is no callback assocaited with this function, the actor
> will never be deleted :(

Yeah, again this is due to the lifetime of the protocol being framed around asking a single question from the parent.
Flags: needinfo?(hchang)
(In reply to :Ehsan Akhgari from comment #45)
> (In reply to Henry Chang [:henry][:hchang] from comment #39)
> > I noted that the next merging day is coming (3/6), do you think this patch
> > (ipc part) is too risky to be landed one week before?
> > 
> > Some alternatives:
> > 
> > 1) Skip the IPC part since it will not be called in the child process.
> > http://searchfox.org/mozilla-central/rev/
> > 90d1cbb4fd3dc249cdc11fe5c3e0394d22d9c680/netwerk/base/nsChannelClassifier.
> > cpp#738
> > 
> > 2) Re-do something like
> > http://searchfox.org/mozilla-central/rev/
> > 90d1cbb4fd3dc249cdc11fe5c3e0394d22d9c680/dom/ipc/PContent.ipdl#833 (i.e. yet
> > another protocol with no message other than delete.)
> 
> Hmm, are we planning to use this new function in the child process for
> anything?  If not, I think the best plan going forward is to return
> NS_ERROR_NOT_IMPLEMENTED in the child process and not try to fix this unless
> we have to when we have a use case for this function in the child process. 
> What do you think?
> 

Given AsyncClassifyLocalWithTables is to replace ClassifyLocalWithTables,
if we can make sure all occurrences of ClassifyLocalWithTables() are on parent process
then we don't need the content side support:

1) http://searchfox.org/mozilla-central/rev/90d1cbb4fd3dc249cdc11fe5c3e0394d22d9c680/dom/base/nsDocument.cpp#13067

Not sure if it can be on child.

2) http://searchfox.org/mozilla-central/rev/90d1cbb4fd3dc249cdc11fe5c3e0394d22d9c680/netwerk/protocol/http/nsHttpChannel.cpp#6126

IIRC, nsHttpChannel::BeginConnect can only be called on parent.

3) http://searchfox.org/mozilla-central/rev/90d1cbb4fd3dc249cdc11fe5c3e0394d22d9c680/netwerk/base/nsChannelClassifier.cpp#720

What I am going to fix in this patch.

4) http://searchfox.org/mozilla-central/rev/90d1cbb4fd3dc249cdc11fe5c3e0394d22d9c680/dom/ipc/ContentParent.cpp#5001

Should be gone away if we fixed (1)(2)(3).

If (1) will never be on child side, we can forget the e10s support for AsyncClassifyLocalWithTables
(at least in the current code base ^^")
Flags: needinfo?(hchang)
Comment on attachment 8841512 [details]
Bug 1341506 - Part 2.1: Refined IPC.

Drop the feedback according to comment 45.
Attachment #8841512 - Flags: feedback?(ehsan)
Hi Ehsan,

According to comment 45, patch for IPC support is not needed in this bug. (return NS_ERROR_NOT_IMPLEMENTED instead.) Can I just strip patch part 2 and add "return NS_ERROR_NOT_IMPLEMENTED" to part 1 then land the patches? Thanks!
Flags: needinfo?(ehsan)
(1) in comment 45 always happens in the child process.  That being said, we can fix that without using asyncClassifyLocalWithTables (at least in theory).  The rest of the cases are parent-only.  So let's just return NS_ERROR_NOT_IMPLEMENTED here.  :-)

Sorry I didn't think of this earlier.
Flags: needinfo?(ehsan)
Attachment #8840386 - Attachment is obsolete: true
Attachment #8841512 - Attachment is obsolete: true
Blocks: 1343425
Pushed by hchang@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/399c4050991d
Part 1: Implement and use nsIURIClassifier.asyncClassifyLocalWithTables. r=Ehsan,francois
https://hg.mozilla.org/integration/autoland/rev/927f95814b29
Part 2: Add telemetry for AsyncClassifyLocalWithTables. r=francois
https://hg.mozilla.org/mozilla-central/rev/399c4050991d
https://hg.mozilla.org/mozilla-central/rev/927f95814b29
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla54
Blocks: 1342333
You need to log in before you can comment on or make changes to this bug.