Consider filling hash value
Categories
(Toolkit :: Safe Browsing, enhancement, P1)
Tracking
()
| Tracking | Status | |
|---|---|---|
| firefox68 | --- | fixed |
People
(Reporter: xeonchen, Assigned: dimi)
References
Details
Attachments
(4 files)
I'm working on Origin Telemetry, and the goal is to report the usage of each tracking protection rules. In the current desgin, we don't really understand the reason a resource is blocked, instead we only maintain the origin domain.
In [1] there are some rules for yandex with path included, increase the difficulty to compare the rule and the URI which we're connecting to, especially in ContentBlockingLog, which has only prepath stored.
If we can obtain the hash of the resource, it would make it much easier to compare with the table and make the telemetry more accurate.
| Assignee | ||
Updated•7 years ago
|
| Assignee | ||
Comment 1•7 years ago
|
||
Hi Ehsan,
I would like to hear your suggestion here.
The problem is that origin telemetry needs the hash matched in safe browsing database to map the blocked URL to the entry in our tracking list.
My question is, if trackingAnnotationcan code[1] can get the matched "hash", what is the best way to pass the information to ContentBlockingLog.
I spent some time checking related code, but can't figure out a clever way to do that(not sure if store the information in the channel is too messy), do you have any suggestion? thanks!
| Assignee | ||
Updated•7 years ago
|
| Assignee | ||
Comment 2•7 years ago
|
||
Discussed with Gary, the plan is to implement something like SetMatchInfo[1]
| Assignee | ||
Comment 3•7 years ago
|
||
Hi Gary,
Would you mind checking if this patch works for you? thanks!
| Reporter | ||
Comment 4•7 years ago
•
|
||
| Assignee | ||
Comment 5•7 years ago
|
||
(In reply to Gary Chen [:xeonchen] from comment #4)
Comment on attachment 9059303 [details] [diff] [review]
WIP patch for testing.patchReview of attachment 9059303 [details] [diff] [review]:
I tested by loading reddit.com, and during page loading it always hits an
assertion that the both list/hash are empty.
It's because sometimes |HttpChannelChild::SetMatchedTrackingInfo| is called
after |nsGlobalWindowOuter::NotifyContentBlockingEvent|We probably need some kind of synchronization mechanism.
This is not because |HttpChannelChild::SetMatchedTrackingInfo| is called after |nsGlobalWindowOuter::NotifyContentBlockingEvent|.
The assertion happened because |AntiTrackingCommon::NotifyBlockingDecision|[1] passes top-level window's channel(not a tracker), and in |nsGlobalWindowOuter::NotifyContentBlockingEvent|, we expect the channel has tracker's information.
I think what we need to do is to make sure |nsGlobalWindowOuter::NotifyContentBlockingEvent| is the right place to record since the "aChannel" parameter in the function is not always the "channel" marked as a tracker.
| Reporter | ||
Comment 6•7 years ago
|
||
(In reply to Dimi Lee [:dimi][:dlee] from comment #5)
The assertion happened because |AntiTrackingCommon::NotifyBlockingDecision|[1] passes top-level window's channel(not a tracker), and in |nsGlobalWindowOuter::NotifyContentBlockingEvent|, we expect the channel has tracker's information.
I don't know if the behavior you mentioned is expected.
It seems to me that nsGlobalWindowOuter::NotifyContentBlockingEvent should be called with the corresponding channel that was blocked/unblocked.
I bet Ehsan knows more about it.
Comment 7•7 years ago
|
||
I'm not quite sure what the question being asked here is...
Going over the comments here, there seems to be an assumption around NotifyContentBlockingEvent being always called with channels belonging to a tracker. This is certainly not true. In fact, there is nothing guaranteeing that this function is called for anything that is related to the presence of trackers on the page whatsoever. (e.g. the function can get called if you're on foo.example which sets a single cookie and doesn't load any third-party resources.)
This function can be called with many different content blocking codes: https://searchfox.org/mozilla-central/rev/f46e2bf881d522a440b30cbf5cf8d76fc212eaf4/dom/base/nsGlobalWindowOuter.cpp#5424. The channel argument passed to each one would be a different channel object, and it depends on the content blocking event code which exact channel object you'd receive here. For example, as comment 5 noted for STATE_COOKIES_LOADED we pass the channel belonging to the top-level window. That is an example of a case where no trackers would be involved at all.
Please let me know if this helps, and if not please give me some more information about what you need to know here. Thanks!
| Reporter | ||
Comment 8•7 years ago
|
||
(In reply to :Ehsan Akhgari from comment #7)
I'm not quite sure what the question being asked here is...
Going over the comments here, there seems to be an assumption around
NotifyContentBlockingEventbeing always called with channels belonging to a tracker. This is certainly not true. In fact, there is nothing guaranteeing that this function is called for anything that is related to the presence of trackers on the page whatsoever. (e.g. the function can get called if you're on foo.example which sets a single cookie and doesn't load any third-party resources.)This function can be called with many different content blocking codes: https://searchfox.org/mozilla-central/rev/f46e2bf881d522a440b30cbf5cf8d76fc212eaf4/dom/base/nsGlobalWindowOuter.cpp#5424. The channel argument passed to each one would be a different channel object, and it depends on the content blocking event code which exact channel object you'd receive here. For example, as comment 5 noted for
STATE_COOKIES_LOADEDwe pass the channel belonging to the top-level window. That is an example of a case where no trackers would be involved at all.Please let me know if this helps, and if not please give me some more information about what you need to know here. Thanks!
This refers to my WIP patch in Attachment 9060025 [details] [diff], where the event code is always nsIWebProgressListener::STATE_COOKIES_BLOCKED_TRACKER.
Shouldn't aChannel the trakcer channel when aEvent is STATE_COOKIES_BLOCKED_TRACKER?
| Reporter | ||
Updated•7 years ago
|
Comment 9•7 years ago
|
||
(In reply to Gary Chen [:xeonchen] from comment #8)
(In reply to :Ehsan Akhgari from comment #7)
I'm not quite sure what the question being asked here is...
Going over the comments here, there seems to be an assumption around
NotifyContentBlockingEventbeing always called with channels belonging to a tracker. This is certainly not true. In fact, there is nothing guaranteeing that this function is called for anything that is related to the presence of trackers on the page whatsoever. (e.g. the function can get called if you're on foo.example which sets a single cookie and doesn't load any third-party resources.)This function can be called with many different content blocking codes: https://searchfox.org/mozilla-central/rev/f46e2bf881d522a440b30cbf5cf8d76fc212eaf4/dom/base/nsGlobalWindowOuter.cpp#5424. The channel argument passed to each one would be a different channel object, and it depends on the content blocking event code which exact channel object you'd receive here. For example, as comment 5 noted for
STATE_COOKIES_LOADEDwe pass the channel belonging to the top-level window. That is an example of a case where no trackers would be involved at all.Please let me know if this helps, and if not please give me some more information about what you need to know here. Thanks!
This refers to my WIP patch in Attachment 9060025 [details] [diff], where the event code is always
nsIWebProgressListener::STATE_COOKIES_BLOCKED_TRACKER.Shouldn't
aChannelthe trakcer channel whenaEventisSTATE_COOKIES_BLOCKED_TRACKER?
No, not necessarily, you need to look at the call sites to see what they pass in. :-) For example, see this call site: https://searchfox.org/mozilla-central/rev/ec489aa170b6486891cf3625717d6fa12bcd11c1/toolkit/components/antitracking/AntiTrackingCommon.cpp#872. Here we pass the document channel for the parent window. This of course is another example: https://searchfox.org/mozilla-central/rev/ec489aa170b6486891cf3625717d6fa12bcd11c1/toolkit/components/antitracking/AntiTrackingCommon.cpp#899.
Yet another example is https://searchfox.org/mozilla-central/rev/ec489aa170b6486891cf3625717d6fa12bcd11c1/toolkit/components/antitracking/AntiTrackingCommon.cpp#1233 (where we notify here: https://searchfox.org/mozilla-central/rev/ec489aa170b6486891cf3625717d6fa12bcd11c1/dom/base/nsContentUtils.cpp#8453)
I stopped looking at this point but pretty sure there are more examples.
If you explain what you're trying to do perhaps I can help suggest a better approach? I don't completely understand comment 0 nor your WIP patch...
| Assignee | ||
Comment 10•7 years ago
|
||
In origin telemetry, we want to record the matching statistic of each entry
in our tracking tables. To identify which entry a given URL matches, it needs
the hash value that matches the safe browsing database.
This patch passes the hash value to ProcessChannel so Features can obtain the
information and pass it.
Note that it is possible that an URL may find multiple matches. If an URL matches
hash A of list 1 and hash B of list 2, the parameter in ProcessChannel looks like:
aList = [list 1, list2]
aHashes = [hash A, hash B]
| Assignee | ||
Comment 11•7 years ago
|
||
This patch adds |setMatchedTrackingInfo| to channel to report matches that
are found in tracking annotation tables.
We have done something similar in nsIClassifiedChannel::setMatchedInfo to
report phishing protection matches.
| Assignee | ||
Comment 12•7 years ago
|
||
| Assignee | ||
Comment 13•7 years ago
•
|
||
(In reply to Dimi Lee [:dimi][:dlee] from comment #12)
The ongoing discussion here doesn't affect the way SafeBrowsing/UrlClassifier stores the information.
Submit the patch to review.
| Reporter | ||
Comment 14•7 years ago
|
||
(In reply to :Ehsan Akhgari from comment #9)
If you explain what you're trying to do perhaps I can help suggest a better approach? I don't completely understand comment 0 nor your WIP patch...
In bug 1539536, we tried to use prio to perform the origin telemetry.
The telemetry API requires an aOrigin argument that matches one of the predefined items.
For example, doubleclick.net or maps.google.com.
In ContentBlockingLog, the mOrigin keeps the nsIURI.prePath of the origin, so
- we don't know how to convert from
prePathto valid origin. We considered removing scheme and try eTLD+1 and eTLD+2, but it doesn't seem like a good solution. - There's no way to match rules containing path such as
yandex.ru/clck/click.
So, the WIP of bug 1544598 and this bug, we tried to use hashes that was matched by UrlClassifier instead of origins.
This will add an extra argument containing the hash to ContentBlockingLog::RecordLog, where the hash is queried from the nsIChannel here. And like you mentioned above, this channel isn't necessary to be the tracking resource.
I'm thinking alternative solutions here, which could be:
- save the hash in
ContentBlockingLogas aMaybe<nsCString>, and skip thoseLogEntrywithout hash. - find another way to link the channel and the entry in
ContentBlockingLog.
The first one seems eaiser, but I'm not sure about the correctness for the study, can you give some comments?
Comment 15•7 years ago
|
||
Thanks a lot for the explanation, now I understand the goal here.
Unfortunately this isn't something the code currently supports. Our reporting infrastructure (AntiTrackingCommon::NotifyBlockingDecision()) is only designed to attach reports to top-level pages right now, and we never had any requirements to keep data from a third-party tracking channel that could have resulted in a blocking decision around, so we now need to do the work of making this information available.
The task here is basically adding a new optional channel argument to AntiTrackingCommon::NotifyBlockingDecision(), let's say aTrackingChannel which would be the third-party tracking channel that was involved in making the blocking decision, if any. Now, NotifyBlockingDecision() is called in different places in the code where we determine something was blocked or unblocked for various reasons. In each place we need to see if a) the blocking had something to do with the channel being tracking, and b) which channel was tested to be a third-party tracking resource (the channel which we call IsThirdPartyTrackingResource() or equivalent on), and pipe that channel all the way down to where we call NotifyBlockingDecision and pass it as aTrackingChannel.
Once you have that channel there, you can then easily pass it to NotifyContentBlockingEvent and from there QI to nsIClassifiedChannel and grab the hash etc.
I hope this helps, let me know if you have any questions.
Comment 16•7 years ago
|
||
Comment 17•7 years ago
|
||
| bugherder | ||
https://hg.mozilla.org/mozilla-central/rev/979eea4eabff
https://hg.mozilla.org/mozilla-central/rev/77ca645609fa
https://hg.mozilla.org/mozilla-central/rev/411778eb8daf
Description
•