Closed Bug 1100024 Opened 5 years ago Closed 5 years ago

Tracking Protection does not block some resources, although it reports that it has.

Categories

(Core :: DOM: Security, defect)

36 Branch
x86_64
Windows 7
defect
Not set

Tracking

()

RESOLVED FIXED
mozilla38

People

(Reporter: michael.oneill, Assigned: mmc)

References

(Depends on 1 open bug, )

Details

Attachments

(2 files, 5 obsolete files)

User Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:36.0) Gecko/20100101 Firefox/36.0
Build ID: 20141115030205

Steps to reproduce:

Load Fiddler.

Load Firefox 36.0a1.

Enable WebConsole

Visit http://stltoday.com




Actual results:

See screenshots here http://baycloud.com/FireFoxTrackingProtectionbug

Fiddler shows that resource in domain leeenterprises.112.2o7.net has in fact not been blocked, although WebConsole.console and WebConsole.network both show it as being blocked.


Expected results:

The resource should have been blocked. Notice the request contains a UID cookie so the user was tracked.
Tracking Protection was enabled (obviously)
Hardware: x86 → x86_64
Blocks: 1029886
Michael, thanks for the detailed bug report and screenshots. Pat and I looked at this and the short answer is that the only reason that tracking protection ever works is that it when wins a race condition between making speculative connections (possibly including pre-fetching the entire resource) and cancelling the channel on finding that a URI should be blocked.

nsChannelClassifier was originally intended for preventing loading of phishing and malware pages. When it is invoked, it suspends the channel until a verdict is reached (https://mxr.mozilla.org/mozilla-central/source/netwerk/base/src/nsChannelClassifier.cpp#290). However, suspension refers only to receiving network events like onStart and onStop and actually occurs after nsHttpChannel::BeginConnect (https://mxr.mozilla.org/mozilla-central/source/netwerk/protocol/http/nsHttpChannel.cpp#4964). The actual channel is still alive on a separate networking thread and may still send and receive requests. If the classification callback is received before any data (handshake, prefetching, etc.) is transferred then things work as intended. If not, then there is a problem. In the phishing case this is fine because the DOM will not render though the resource is fetched. In the malware case, this may result in driveby downloads (which should be mitigated by application reputation) but nothing that requires user interaction will be able to compromise the user. For tracking protection, this is completely broken.

If the Classify API were synchronous, we could simply invoke it before calling BeginConnect. However, since local lookups may spawn remote gethash network requests, we can't just make the Classify API synchronous. After talking to Pat, here is one possible solution that allows us to take advantage of speculative fetching in the common case but not do it when there is a possibility that the URI appears on any blocklist.
1) Split nsIUrlClassifier.Classify into 2 parts, ClassifyLocal and ClassifyRemote. ClassifyLocal is blocking, ClassifyRemote is async.
2) Call ClassifyLocal in nsHttpChannel::BeginConnect before any speculative connections are made. If the result is OK, make speculative connections as usual.
3) If the result is not OK, don't do any speculative connections and wait on the result of ClassifyRemote to unsuspend the channel as usual. This would limit any performance hit from skipping speculative connections to channels which we are likely to kill anyway.

Note that nsChannelClassifier appears in both nsHttpChannel and nsBaseChannel so both callers would have to be fixed.

gcp, what do you think? Pat, please correct me if I summarized wrong.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Flags: needinfo?(mcmanus)
Flags: needinfo?(gpascutto)
Sounds good to me. One thing to think about: the false positive hit rate in the DB is like 1/7000? So once every 7k requests you'll have a delay. That sounds acceptable/negligible. If it were more, we could consider splitting the behavior by table type. My thinking is that the tracking DB is a lot smaller (would that always be true?) so it'll have far fewer false positives and the perf impact would be essentially zero. (And you'd only block for tracking hits, not phishing/malware hits). But it doesn't sound like that'd be needed.
Flags: needinfo?(gpascutto)
Thanks Monica, quick question: How do ad blocker extensions work? Do they get in before nsHttpChannel::BeginConnect, if so why not just do that?

Mike
Ad blockers don't query a remote server to verify false positives, I presume. The SafeBrowsing DB is heavily compressed.
(In reply to michael.oneill from comment #4)
> Thanks Monica, quick question: How do ad blocker extensions work? Do they
> get in before nsHttpChannel::BeginConnect, if so why not just do that?

They use nsIContentPolicy.shouldLoad, which is called before the channel is even created in each separate object loader. The reason we didn't take this approach is that shouldLoad is synchronous and the safebrowsing lookup API is async, so it is hard to wrap an async call in a synchronous one without causing lots of problems.

> Mike
(In reply to Gian-Carlo Pascutto [:gcp] from comment #3)
> Sounds good to me. One thing to think about: the false positive hit rate in
> the DB is like 1/7000? So once every 7k requests you'll have a delay. That
> sounds acceptable/negligible. If it were more, we could consider splitting
> the behavior by table type. My thinking is that the tracking DB is a lot
> smaller (would that always be true?) so it'll have far fewer false positives
> and the perf impact would be essentially zero. (And you'd only block for
> tracking hits, not phishing/malware hits). But it doesn't sound like that'd
> be needed.

Alrighty then. I think that's an OK false positive rate. I filed https://bugzilla.mozilla.org/show_bug.cgi?id=1108009 to do the safebrowsing work.
right - just before begin connect do classify local.. and if classify remote is required return from async open without doing beginConnect.. as long as classifyRemote() remote is a rare event that ought to be fine.

happy to help review things here as needed.
Flags: needinfo?(mcmanus)
Assignee: nobody → mmc
Status: NEW → ASSIGNED
Comment on attachment 8540947 [details] [diff] [review]
Skip SpeculativeConnect if the principal is on a local blocklist

Review of attachment 8540947 [details] [diff] [review]:
-----------------------------------------------------------------

This skips SpeculativeConnect() and DNS prefetching if the channel's principal is on a local blocklist. In theory we could restrict this for principals that are on the malware or tracking blocklists -- however since people very rarely visit malware or phishing sites, that is probably fine.
Attachment #8540947 - Flags: review?(sworkman)
Attachment #8540947 - Flags: review?(mcmanus)
hey monica - this diff only addresses one, minor subset, of this bug at least as I understand it.. is it just part of a series?
Hey Pat,

The safebrowsing work was done in bug 1100024. I ended up not adding ClassifyRemote -- because in order for that call to make sense the caller would have to save hash prefixes and so on that ClassifyRemote needs to save work. So instead I just added nsHttpChannel::mLocalBlocklist to store the result of ClassifyLocal, and if that's true, skip SpeculativeConnect.

Are there additional sources of speculative connections?

Thanks,
Monica
(In reply to [:mmc] Monica Chew (please use needinfo) from comment #13)
> The safebrowsing work was done in bug 1100024.

Um, I meant https://bugzilla.mozilla.org/show_bug.cgi?id=1108009.
So this would suppress speculative connect (i.e. tcp and ssl handshakes), but it won't suppress actual connects and http transactions :) (tcp and ssl handshakes + GET and cookies!)

For that to happen you really need to prevent Connect() from being called.. one way to do that is setting mCanceled = true from your tracking list check.
Comment on attachment 8540947 [details] [diff] [review]
Skip SpeculativeConnect if the principal is on a local blocklist

Review of attachment 8540947 [details] [diff] [review]:
-----------------------------------------------------------------

I think we need to block Connect() not just the speculative bits..

::: netwerk/protocol/http/nsHttpChannel.cpp
@@ +241,5 @@
>      , mConcurentCacheAccess(0)
>      , mIsPartialRequest(0)
>      , mHasAutoRedirectVetoNotifier(0)
>      , mPushedStream(nullptr)
> +    , mClassifier(nullptr)

don't init a smart ptr, but I think this should be a stack ptr instead of a member anyhow

@@ +441,5 @@
> +    // don't speculate if we are on a local blocklist, on uses of the offline
> +    // application cache, if we are offline, when doing http upgrade (i.e.
> +    // websockets bootstrap), or if we can't do keep-alive (because then we
> +    // couldn't reuse the speculative connection anyhow).
> +    if (mLocalBlocklist || mApplicationCache || gIOService->IsOffline() ||

you might be able to turn mLocalBlocklist into mCanceled and get rid of the former

@@ +4873,5 @@
>      }
> +    // Check to see if this principal exists on local blocklists.
> +    if (mLoadFlags & LOAD_CLASSIFY_URI) {
> +        if (!mClassifier) {
> +            mClassifier = do_GetService(NS_URICLASSIFIERSERVICE_CONTRACTID);

generally this code is only going to be called once for the channel instance, so mClassifier can just live on the stack instead of as a member

@@ +4882,5 @@
> +                                           false);
> +            nsresult response = NS_OK;
> +            mClassifier->ClassifyLocal(principal, tp, &response);
> +            if (NS_FAILED(response)) {
> +                LOG(("nsHttpChannel::Skipping speculative connection [this=%p]", this));

probly log that it failed a classification check rather than the speculative bit which is actually being skipped elsewhere

@@ +4883,5 @@
> +            nsresult response = NS_OK;
> +            mClassifier->ClassifyLocal(principal, tp, &response);
> +            if (NS_FAILED(response)) {
> +                LOG(("nsHttpChannel::Skipping speculative connection [this=%p]", this));
> +                mLocalBlocklist = true;

consider whether it is enough to make this mCanceled = true and getting rid of the new bool. I'm ok with the bool if we need it.
Attachment #8540947 - Flags: review?(mcmanus) → review-
Is it kosher to set mCanceled to false after setting it to true? If not, there may be a new interface in my future...

nsChannelClassifier will need to call nsHttpChannel::Connect() as a result of a good verdict.
Attachment #8540947 - Flags: review?(sworkman)
(In reply to [:mmc] Monica Chew (please use needinfo) from comment #17)
> Is it kosher to set mCanceled to false after setting it to true?

I would say no.

> If not,
> there may be a new interface in my future...
> 
> nsChannelClassifier will need to call nsHttpChannel::Connect() as a result
> of a good verdict.

I'm not sure I understand this.. classifylocal can decide a principal should be blocked (and call ::Cancel(reason)).. but later the normal classification code would think it should not be blocked?
if it helps - the callonmodifyrequestobserver() stub is where addons will often cancel the channel. You definitely want your code to run shortly after that as on-modify-request can modify all the important things (including perhaps the url, though I'd need to double check that).
> > nsChannelClassifier will need to call nsHttpChannel::Connect() as a result
> > of a good verdict.
> 
> I'm not sure I understand this.. classifylocal can decide a principal should
> be blocked (and call ::Cancel(reason)).. but later the normal classification
> code would think it should not be blocked?

So, safebrowsing makes gethash requests over the network in case a hash is found on a local blocklist. This allows us to keep truncated hash data (32 bits instead of 256 bits). The gethash response includes all of the full hashes that match prefixes that were found on the blocklist. That's why nsIURIClassifier.Classify is async.

I added nsIURIClassifier.ClassifyLocal just to check for presence on local blocklists, but it's not enough data to cancel the channel -- the prefix match may be a collision with something else on the (full hash) blocklist.
classifyLocal needs to return OK or MAYBE.. in the case of maybe a async classifyRemote needs to be done to determine the result. This all has to be done after on-modyify-request but before Connect(). If you don't do it before Connect() the networking will happen on another thread - sending your connection and cookies on the wire even if channel events are suppressed via Suspend().

Given that constraint its easy enough to put it before the speculative stuff.

you're going to need to break BeginConnect() into BeginConnect and ContinnueBeginConnect() (or call them PreConnect1() and PreConnect2() if you like).. if classifyLocal() returns OK then call ContinueBeginConnect() directly, but if you need to call classifyRemote then call ContinueBeginConnect() from the classifyRemote() callback (canceling the channel before doing so if classifyRemote says that's the right thing to do).

make sense?

I don't really understand comment 13 - but I don't see a way around adding a classifyRemote path.
Attachment #8540947 - Attachment is obsolete: true
Attachment #8542615 - Attachment is obsolete: true
Comment on attachment 8542616 [details] [diff] [review]
Don't call Connect if the principal is on a local or remote blocklist

Review of attachment 8542616 [details] [diff] [review]:
-----------------------------------------------------------------

I think this addresses comment 15 and comment 21. I had to add an extra interface to nsIHttpChannelInternal -- since this code runs in the parent I did not implement it in the child.
Attachment #8542616 - Flags: review?(mcmanus)
Attachment #8542616 - Attachment is obsolete: true
Attachment #8542616 - Flags: review?(mcmanus)
Attachment #8542617 - Flags: review?(mcmanus)
Comment on attachment 8542617 [details] [diff] [review]
Don't call Connect if the principal is on a local or remote blocklist

Review of attachment 8542617 [details] [diff] [review]:
-----------------------------------------------------------------

::: netwerk/base/src/nsChannelClassifier.cpp
@@ +210,5 @@
>      return NS_OK;
>  }
>  
>  nsresult
>  nsChannelClassifier::Start(nsIChannel *aChannel)

this has been changed so that it can only return NS_OK.. so it should really be changed to be void and defined to always succeed in the header.

@@ +490,1 @@
>  #ifdef DEBUG

this feedback isn't really part of this patch - this ifdef should be removed. The log macro will opt-out of the real work if logging isn't setup.

(the one a few lines above is more defensible because it does a bunch of URI things before calling the LOG macro

::: netwerk/base/src/nsChannelClassifier.h
@@ +21,5 @@
>      NS_DECL_NSIURICLASSIFIERCALLBACK
>  
> +    // Calls nsIURIClassifier.Classify with the principal of the given channel,
> +    // and cancels the channel on a bad verdict.  If aChannel is
> +    // nsIHttpChannelInternal, nsChannelClassifier must call

I think this contract breaks down badly if someone uses our http stack but implements a different nsI*classifier

@@ +28,4 @@
>      nsresult Start(nsIChannel *aChannel);
>  
>  private:
> +    // True if the channel is on the allow list.

i think this is a stray comment now

::: netwerk/protocol/http/HttpBaseChannel.cpp
@@ +1403,5 @@
>  
>  NS_IMETHODIMP
> +HttpBaseChannel::ContinueBeginConnect()
> +{
> +  return NS_ERROR_NOT_IMPLEMENTED;

First, assert this is the child
Second, assert false as this shouldn't happen in the child :)

::: netwerk/protocol/http/nsHttpChannel.cpp
@@ +4951,5 @@
> +        return ContinueBeginConnect();
> +    }
> +    MOZ_ASSERT(!mCanceled && mLocalBlocklist);
> +    // nsChannelClassifier must call ContinueBeginConnect after optionally
> +    // cancelling the channel once we have a remote verdict.

please extend this comment to make it clear you're calling a concrete class that knows about this contract, instead of calling an nsI* that might be overridden.

::: netwerk/protocol/http/nsIHttpChannelInternal.idl
@@ +228,5 @@
> +    /**
> +     * Used only by nsChannelClassifier to resume connecting or abort the
> +     * channel after a remote classification verdict.
> +     */
> +    void continueBeginConnect();

I think this would be better done as a explicit closure on the classifier::start method - but I'm ok with committing what you have here.
Attachment #8542617 - Flags: review?(mcmanus) → review+
(In reply to Patrick McManus [:mcmanus] from comment #29)
> Comment on attachment 8542617 [details] [diff] [review]
>
> 
> I think this contract breaks down badly if someone uses our http stack but
> implements a different nsI*classifier

disregard this part of the comment. I thought I had deleted it - I wrote it before I realized you were calling a concrete netwerk class for ::start from nsHttpChannel.
Comment on attachment 8544714 [details] [diff] [review]
Don't call Connect if the principal is on a local or remote blocklist (

Review of attachment 8544714 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks, fixed except for adding a closure to Start() -- I tried it but it looked a little weird since nsBaseChannel calls it, too, and we're passing this twice.
Attachment #8544714 - Flags: review+
Attachment #8542617 - Attachment is obsolete: true
Backed out for Talos xperf regressions.
https://hg.mozilla.org/integration/mozilla-inbound/rev/02f0874e51bf

https://treeherder.mozilla.org/logviewer.html#?job_id=5127204&repo=mozilla-inbound

 TEST-UNEXPECTED-FAIL | mainthreadio | File '{profile}\safebrowsing\goog-phish-shavar.cache' was accessed and we were not expecting it: {'Count': 1, 'Duration': 0.02304, 'RunCount': 1}
Thanks ryanvm, it looks like that caught a bug where safebrowsing is not disabled properly in ClassifyLocal (which only checks for empty table names instead of the prefs to enable/disable). I am assuming that talos already disables safebrowsing because otherwise these files would already be on the whitelist.
Attachment #8544863 - Flags: review?(gpascutto)
Comment on attachment 8544863 [details] [diff] [review]
ClassifyLocal should not lookup in any tables if safebrowsing is disabled (

Review of attachment 8544863 [details] [diff] [review]:
-----------------------------------------------------------------

See remarks, but I can live with this either way.

::: toolkit/components/url-classifier/nsUrlClassifierDBService.cpp
@@ +1286,5 @@
>      tables.Append(phishing);
>    }
>    nsAutoCString tracking;
>    Preferences::GetCString(TRACKING_TABLE_PREF, &tracking);
>    if (aTrackingProtectionEnabled && !tracking.IsEmpty()) {

Hmm, given this change it would make much more sense if this was a very similar if (mCheckTrackingProtection && !tracking.IsEmpty()).

If there's nothing preventing aTrackingProtectionEnabled -> mCheckTrackingProtection then r- else r+.
Attachment #8544863 - Flags: review?(gpascutto) → review+
(In reply to Gian-Carlo Pascutto [:gcp] from comment #44)
> Hmm, given this change it would make much more sense if this was a very
> similar if (mCheckTrackingProtection && !tracking.IsEmpty()).
> 
> If there's nothing preventing aTrackingProtectionEnabled ->
> mCheckTrackingProtection then r- else r+.

Unfortunately there is -- nsChannelClassifier sets this as a result of whether or not the load is third-party (because we don't want to block top-level navigations) or on a whitelisted page in addition to the preference. Thanks for the review.
(In reply to Wes Kocher (:KWierso) from comment #41)
> This bc1 failure appears to be from this patch also:
> https://treeherder.mozilla.org/ui/logviewer.
> html#?job_id=5136398&repo=mozilla-inbound

I retriggered and got some green runs on https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=8acec4afd6f8. However, it does look like this patch makes that test more flaky. I submitted a patch for review to disable the flaky test on windows.
Depends on: 1120145
https://hg.mozilla.org/mozilla-central/rev/6c8be2f5b065
https://hg.mozilla.org/mozilla-central/rev/671ad56e6e12
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla37
I think this is causing regressions, backing out.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Btw, the only part that needs to be backed out is https://hg.mozilla.org/integration/mozilla-inbound/rev/671ad56e6e12, and that can be relanded after https://bugzilla.mozilla.org/show_bug.cgi?id=1120499 is complete.
Depends on: 1120547
Per bug 1120547 comment 3 - 4, this probably should include a uuid-rev when it re-lands.
https://hg.mozilla.org/mozilla-central/rev/fbe8067063a0
Status: REOPENED → RESOLVED
Closed: 5 years ago5 years ago
Resolution: --- → FIXED
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Target Milestone: mozilla37 → ---
* * *
Bug 1120547 - Mark the overrides of nsIHttpChannelInternal::ContinueBeginConnect as override and rev the uuid of the interface; r=mmc
Attachment #8544714 - Attachment is obsolete: true
Comment on attachment 8548548 [details] [diff] [review]
Don't call Connect if the principal is on a local or remote blocklist (

Review of attachment 8548548 [details] [diff] [review]:
-----------------------------------------------------------------

Folded in Ehsan's uuid fix and trying again.
Attachment #8548548 - Flags: review+
https://hg.mozilla.org/mozilla-central/rev/25d9ba8ceb7a
Status: REOPENED → RESOLVED
Closed: 5 years ago5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla38
Depends on: 1122691
Depends on: 1128219
You need to log in before you can comment on or make changes to this bug.