Closed Bug 1545033 Opened 4 months ago Closed 4 months ago

Consider filling hash value

Categories

(Toolkit :: Safe Browsing, enhancement, P1)

enhancement

Tracking

()

RESOLVED FIXED
mozilla68
Tracking Status
firefox68 --- fixed

People

(Reporter: xeonchen, Assigned: dimi)

References

Details

Attachments

(4 files)

I'm working on Origin Telemetry, and the goal is to report the usage of each tracking protection rules. In the current desgin, we don't really understand the reason a resource is blocked, instead we only maintain the origin domain.
In [1] there are some rules for yandex with path included, increase the difficulty to compare the rule and the URI which we're connecting to, especially in ContentBlockingLog, which has only prepath stored.

If we can obtain the hash of the resource, it would make it much easier to compare with the table and make the telemetry more accurate.

[1] https://searchfox.org/mozilla-central/rev/d33d470140ce3f9426af523eaa8ecfa83476c806/toolkit/components/telemetry/core/TelemetryOriginData.inc

Flags: needinfo?(dlee)
Type: defect → enhancement
Priority: -- → P1

Hi Ehsan,
I would like to hear your suggestion here.
The problem is that origin telemetry needs the hash matched in safe browsing database to map the blocked URL to the entry in our tracking list.
My question is, if trackingAnnotationcan code[1] can get the matched "hash", what is the best way to pass the information to ContentBlockingLog.
I spent some time checking related code, but can't figure out a clever way to do that(not sure if store the information in the channel is too messy), do you have any suggestion? thanks!

[1] https://searchfox.org/mozilla-central/rev/ee3905439acbf81e9c829ece0b46d09d2fa26c5c/netwerk/url-classifier/UrlClassifierFeatureTrackingAnnotation.cpp#118

Flags: needinfo?(dlee) → needinfo?(ehsan)
Assignee: nobody → dlee
Status: NEW → ASSIGNED

Discussed with Gary, the plan is to implement something like SetMatchInfo[1]

[1] https://searchfox.org/mozilla-central/rev/ee3905439acbf81e9c829ece0b46d09d2fa26c5c/netwerk/base/nsIClassifiedChannel.idl#28

Flags: needinfo?(ehsan)

Hi Gary,
Would you mind checking if this patch works for you? thanks!

Comment on attachment 9059303 [details] [diff] [review]
WIP patch for testing.patch

Review of attachment 9059303 [details] [diff] [review]:
-----------------------------------------------------------------

I tested by loading reddit.com, and during page loading it always hits an assertion that the both list/hash are empty.
It's because sometimes |HttpChannelChild::SetMatchedTrackingInfo| is called after |nsGlobalWindowOuter::NotifyContentBlockingEvent|

We probably need some kind of synchronization mechanism.

(In reply to Gary Chen [:xeonchen] from comment #4)

Comment on attachment 9059303 [details] [diff] [review]
WIP patch for testing.patch

Review of attachment 9059303 [details] [diff] [review]:

I tested by loading reddit.com, and during page loading it always hits an
assertion that the both list/hash are empty.
It's because sometimes |HttpChannelChild::SetMatchedTrackingInfo| is called
after |nsGlobalWindowOuter::NotifyContentBlockingEvent|

We probably need some kind of synchronization mechanism.

This is not because |HttpChannelChild::SetMatchedTrackingInfo| is called after |nsGlobalWindowOuter::NotifyContentBlockingEvent|.

The assertion happened because |AntiTrackingCommon::NotifyBlockingDecision|[1] passes top-level window's channel(not a tracker), and in |nsGlobalWindowOuter::NotifyContentBlockingEvent|, we expect the channel has tracker's information.

I think what we need to do is to make sure |nsGlobalWindowOuter::NotifyContentBlockingEvent| is the right place to record since the "aChannel" parameter in the function is not always the "channel" marked as a tracker.

[1] https://searchfox.org/mozilla-central/rev/f46e2bf881d522a440b30cbf5cf8d76fc212eaf4/toolkit/components/antitracking/AntiTrackingCommon.cpp#1789

(In reply to Dimi Lee [:dimi][:dlee] from comment #5)

The assertion happened because |AntiTrackingCommon::NotifyBlockingDecision|[1] passes top-level window's channel(not a tracker), and in |nsGlobalWindowOuter::NotifyContentBlockingEvent|, we expect the channel has tracker's information.

[1] https://searchfox.org/mozilla-central/rev/f46e2bf881d522a440b30cbf5cf8d76fc212eaf4/toolkit/components/antitracking/AntiTrackingCommon.cpp#1789

I don't know if the behavior you mentioned is expected.
It seems to me that nsGlobalWindowOuter::NotifyContentBlockingEvent should be called with the corresponding channel that was blocked/unblocked.

I bet Ehsan knows more about it.

Flags: needinfo?(ehsan)

I'm not quite sure what the question being asked here is...

Going over the comments here, there seems to be an assumption around NotifyContentBlockingEvent being always called with channels belonging to a tracker. This is certainly not true. In fact, there is nothing guaranteeing that this function is called for anything that is related to the presence of trackers on the page whatsoever. (e.g. the function can get called if you're on foo.example which sets a single cookie and doesn't load any third-party resources.)

This function can be called with many different content blocking codes: https://searchfox.org/mozilla-central/rev/f46e2bf881d522a440b30cbf5cf8d76fc212eaf4/dom/base/nsGlobalWindowOuter.cpp#5424. The channel argument passed to each one would be a different channel object, and it depends on the content blocking event code which exact channel object you'd receive here. For example, as comment 5 noted for STATE_COOKIES_LOADED we pass the channel belonging to the top-level window. That is an example of a case where no trackers would be involved at all.

Please let me know if this helps, and if not please give me some more information about what you need to know here. Thanks!

Flags: needinfo?(ehsan)

(In reply to :Ehsan Akhgari from comment #7)

I'm not quite sure what the question being asked here is...

Going over the comments here, there seems to be an assumption around NotifyContentBlockingEvent being always called with channels belonging to a tracker. This is certainly not true. In fact, there is nothing guaranteeing that this function is called for anything that is related to the presence of trackers on the page whatsoever. (e.g. the function can get called if you're on foo.example which sets a single cookie and doesn't load any third-party resources.)

This function can be called with many different content blocking codes: https://searchfox.org/mozilla-central/rev/f46e2bf881d522a440b30cbf5cf8d76fc212eaf4/dom/base/nsGlobalWindowOuter.cpp#5424. The channel argument passed to each one would be a different channel object, and it depends on the content blocking event code which exact channel object you'd receive here. For example, as comment 5 noted for STATE_COOKIES_LOADED we pass the channel belonging to the top-level window. That is an example of a case where no trackers would be involved at all.

Please let me know if this helps, and if not please give me some more information about what you need to know here. Thanks!

This refers to my WIP patch in Attachment 9060025 [details] [diff], where the event code is always nsIWebProgressListener::STATE_COOKIES_BLOCKED_TRACKER.

Shouldn't aChannel the trakcer channel when aEvent is STATE_COOKIES_BLOCKED_TRACKER?

Flags: needinfo?(ehsan)

(In reply to Gary Chen [:xeonchen] from comment #8)

(In reply to :Ehsan Akhgari from comment #7)

I'm not quite sure what the question being asked here is...

Going over the comments here, there seems to be an assumption around NotifyContentBlockingEvent being always called with channels belonging to a tracker. This is certainly not true. In fact, there is nothing guaranteeing that this function is called for anything that is related to the presence of trackers on the page whatsoever. (e.g. the function can get called if you're on foo.example which sets a single cookie and doesn't load any third-party resources.)

This function can be called with many different content blocking codes: https://searchfox.org/mozilla-central/rev/f46e2bf881d522a440b30cbf5cf8d76fc212eaf4/dom/base/nsGlobalWindowOuter.cpp#5424. The channel argument passed to each one would be a different channel object, and it depends on the content blocking event code which exact channel object you'd receive here. For example, as comment 5 noted for STATE_COOKIES_LOADED we pass the channel belonging to the top-level window. That is an example of a case where no trackers would be involved at all.

Please let me know if this helps, and if not please give me some more information about what you need to know here. Thanks!

This refers to my WIP patch in Attachment 9060025 [details] [diff], where the event code is always nsIWebProgressListener::STATE_COOKIES_BLOCKED_TRACKER.

Shouldn't aChannel the trakcer channel when aEvent is STATE_COOKIES_BLOCKED_TRACKER?

No, not necessarily, you need to look at the call sites to see what they pass in. :-) For example, see this call site: https://searchfox.org/mozilla-central/rev/ec489aa170b6486891cf3625717d6fa12bcd11c1/toolkit/components/antitracking/AntiTrackingCommon.cpp#872. Here we pass the document channel for the parent window. This of course is another example: https://searchfox.org/mozilla-central/rev/ec489aa170b6486891cf3625717d6fa12bcd11c1/toolkit/components/antitracking/AntiTrackingCommon.cpp#899.

Yet another example is https://searchfox.org/mozilla-central/rev/ec489aa170b6486891cf3625717d6fa12bcd11c1/toolkit/components/antitracking/AntiTrackingCommon.cpp#1233 (where we notify here: https://searchfox.org/mozilla-central/rev/ec489aa170b6486891cf3625717d6fa12bcd11c1/dom/base/nsContentUtils.cpp#8453)

I stopped looking at this point but pretty sure there are more examples.

If you explain what you're trying to do perhaps I can help suggest a better approach? I don't completely understand comment 0 nor your WIP patch...

Flags: needinfo?(ehsan)

In origin telemetry, we want to record the matching statistic of each entry
in our tracking tables. To identify which entry a given URL matches, it needs
the hash value that matches the safe browsing database.

This patch passes the hash value to ProcessChannel so Features can obtain the
information and pass it.

Note that it is possible that an URL may find multiple matches. If an URL matches
hash A of list 1 and hash B of list 2, the parameter in ProcessChannel looks like:
aList = [list 1, list2]
aHashes = [hash A, hash B]

This patch adds |setMatchedTrackingInfo| to channel to report matches that
are found in tracking annotation tables.

We have done something similar in nsIClassifiedChannel::setMatchedInfo to
report phishing protection matches.

(In reply to Dimi Lee [:dimi][:dlee] from comment #12)

The ongoing discussion here doesn't affect the way SafeBrowsing/UrlClassifier stores the information.
Submit the patch to review.

(In reply to :Ehsan Akhgari from comment #9)

If you explain what you're trying to do perhaps I can help suggest a better approach? I don't completely understand comment 0 nor your WIP patch...

In bug 1539536, we tried to use prio to perform the origin telemetry.
The telemetry API requires an aOrigin argument that matches one of the predefined items.
For example, doubleclick.net or maps.google.com.
In ContentBlockingLog, the mOrigin keeps the nsIURI.prePath of the origin, so

  1. we don't know how to convert from prePath to valid origin. We considered removing scheme and try eTLD+1 and eTLD+2, but it doesn't seem like a good solution.
  2. There's no way to match rules containing path such as yandex.ru/clck/click.

So, the WIP of bug 1544598 and this bug, we tried to use hashes that was matched by UrlClassifier instead of origins.
This will add an extra argument containing the hash to ContentBlockingLog::RecordLog, where the hash is queried from the nsIChannel here. And like you mentioned above, this channel isn't necessary to be the tracking resource.

I'm thinking alternative solutions here, which could be:

  1. save the hash in ContentBlockingLog as a Maybe<nsCString>, and skip those LogEntry without hash.
  2. find another way to link the channel and the entry in ContentBlockingLog.

The first one seems eaiser, but I'm not sure about the correctness for the study, can you give some comments?

Flags: needinfo?(ehsan)

Thanks a lot for the explanation, now I understand the goal here.

Unfortunately this isn't something the code currently supports. Our reporting infrastructure (AntiTrackingCommon::NotifyBlockingDecision()) is only designed to attach reports to top-level pages right now, and we never had any requirements to keep data from a third-party tracking channel that could have resulted in a blocking decision around, so we now need to do the work of making this information available.

The task here is basically adding a new optional channel argument to AntiTrackingCommon::NotifyBlockingDecision(), let's say aTrackingChannel which would be the third-party tracking channel that was involved in making the blocking decision, if any. Now, NotifyBlockingDecision() is called in different places in the code where we determine something was blocked or unblocked for various reasons. In each place we need to see if a) the blocking had something to do with the channel being tracking, and b) which channel was tested to be a third-party tracking resource (the channel which we call IsThirdPartyTrackingResource() or equivalent on), and pipe that channel all the way down to where we call NotifyBlockingDecision and pass it as aTrackingChannel.

Once you have that channel there, you can then easily pass it to NotifyContentBlockingEvent and from there QI to nsIClassifiedChannel and grab the hash etc.

I hope this helps, let me know if you have any questions.

Flags: needinfo?(ehsan)
Pushed by dlee@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/979eea4eabff
P1. Pass matched hash values to ProcessChannel. r=baku
https://hg.mozilla.org/integration/autoland/rev/77ca645609fa
P2. Add SetMatchedTrackingInfo in nsIClassifiedChannel. r=baku
https://hg.mozilla.org/integration/autoland/rev/411778eb8daf
P3. SetTrackingInfo in UrlClassifierFeatureTrackingAnnotation. r=baku
Status: ASSIGNED → RESOLVED
Closed: 4 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla68
Blocks: 1548486
Depends on: 1548517
You need to log in before you can comment on or make changes to this bug.