Closed Bug 1459037 Opened 6 years ago Closed 6 years ago

Break Disconnect category domains into the appropriate category blocklists

Categories

(Cloud Services :: Server: Shavar, enhancement, P1)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: esawin, Assigned: francois)

References

(Depends on 1 open bug, Blocks 1 open bug)

Details

(Whiteboard: [geckoview:klar:p1])

In bug 1425075 we've split the base-track-digest256 blocklist by tracker categories (ads, social, analytics, content).

However, we haven't exposed the "Disconnect" category list individually, which contains URLs that are not duplicated in any other category's list (see https://bugzilla.mozilla.org/show_bug.cgi?id=1425075#c25).

There are two ways to solve this:
1. Distribute "Disconnect" URLs to the corresponding semantic lists (ads, social, analytics, content).
2. Expose "Disconnect" as an individual blocklist, with internal name disconnect-track-digest256.

The advantage of #1 is that no Gecko/GeckoView code needs to change and no unrelated trackers would be blocked.
However, I assume it would not be trivial and would require a manual mapping that someone has to maintain.

Supporting #2 in shavar should be trivial. In GeckoView, I don't think it would make sense to expose a Disconnect category, so we would default to blocking Disconnect trackers if any of the ads, social, analytics, or content tracker options are selected.
Do we agree on the summary and are we OK with going with solution #2?
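
A minimal sketch of what solution #2 would mean on the client side, assuming the per-category lists from bug 1425075 are named `ads-track-digest256`, `social-track-digest256`, and so on. The `tables_for` helper and the option names are hypothetical and are not GeckoView API:

```python
# Hypothetical sketch of solution #2: the Disconnect list is never exposed as
# its own option, but it is enabled whenever any tracker category is selected.
CATEGORY_TABLES = {
    "ads": "ads-track-digest256",            # assumed list names
    "social": "social-track-digest256",
    "analytics": "analytics-track-digest256",
    "content": "content-track-digest256",
}
DISCONNECT_TABLE = "disconnect-track-digest256"  # internal name proposed above


def tables_for(selected):
    """Return the blocklist tables to enable for the selected categories."""
    tables = [CATEGORY_TABLES[name] for name in sorted(selected)]
    if tables:
        # Err on the side of blocking too much: any selection pulls in Disconnect.
        tables.append(DISCONNECT_TABLE)
    return tables
```

With this shape, selecting only "social" would still enable the Disconnect list, which is the over-blocking trade-off described above.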
Flags: needinfo?(lcrouch)
Flags: needinfo?(francois)
It seems to me that #1 would be ideal since the "Disconnect" category doesn't really make sense to users. The manual mapping is unfortunate, but it's also unlikely to change very often.

If I'm not mistaken, that's what Focus and iOS already do.
Flags: needinfo?(francois)
(In reply to François Marier [:francois] from comment #2)
> It seems to me that #1 would be ideal since the "Disconnect" category
> doesn't really make sense to users. The manual mapping is unfortunate, but
> it's also unlikely to change very often.
> 
> If I'm not mistaken, that's what Focus and iOS already do.

I would love to take solution #1, but I'm not the one who would have to maintain that :)
Solution #2 would be acceptable for GeckoView since we would just err on the side of blocking too much without exposing the Disconnect category in the API.
We need to bump priority here as we need this for Klar/Focus.

Can we proceed with option 2?
Flags: needinfo?(francois)
Priority: -- → P1
Whiteboard: [geckoview:klar]
Whiteboard: [geckoview:klar] → [geckoview:klar:p1]
Luke, what did you do with the Disconnect category when you split the lists in the list creation script?

Can we fix it so that it's distributed in the same way as Klar/Focus does it today? (i.e. Option 1 from comment 0)
Flags: needinfo?(francois)
(In reply to David Bolter [:davidb] (NeedInfo me for attention) from comment #4)
> Can we proceed with option 2?

Note that this option will take a lot longer (i.e. a few weeks) since it requires a stage run-through and a new deploy or config update.

As far as I know, Option 1 only requires modifying the list creation script. Luke can correct me if I'm wrong.
Sebastian, do you know how Focus iOS maps the entries from the Disconnect category into Advertising/Social/Analytics? Is there a manual mapping somewhere?
Flags: needinfo?(s.kaspari)
If I can find the mappings that Klar/Focus is currently using for Disconnect->{categories} entries, I can re-create that on shavar within an hour or so.

:davidb - do you know how Focus iOS maps the entries from Disconnect into the other categories?
Flags: needinfo?(lcrouch)
Flags: needinfo?(dbolter)
No idea... Stefan do you know?
Flags: needinfo?(dbolter) → needinfo?(sarentz)
Since :francois and I review the changes to the shavar list contents, we can check the manual mappings.

:francois - which would you like to do: codify this mapping into the lists2safebrowsing.py script, or should we simply modify the list entries in the shavar-prod-lists repo?
Flags: needinfo?(francois)
Thanks for the links Eugen.

(In reply to Luke Crouch [:groovecoder] from comment #11)
> :francois - which would you like to do: codify this mapping into the
> lists2safebrowsing.py script, or should we simply modify the list entries in
> the shavar-prod-lists repo?

We should codify the mapping in the list creation script. We don't want to diverge from the upstream Disconnect list.

I'd suggest moving the https://github.com/mozilla-mobile/focus-android/blob/5868770104a8645c31b086a5e9916f4ddb507cb3/shavar-prod-lists/google_mapping.json file into shavar-prod-lists and then creating twitter_mapping.json:

{
  "categories": {
    "Social": [
      {
        "Twitter": {
          "https://twitter.com/": [
            "backtype.com",
            "crashlytics.com",
            "tweetdeck.com",
            "twimg.com",
            "twitter.com",
            "twitter.jp"
          ]
        }
      }
    ]
  }
}

and then facebook_mapping.json:

{
  "categories": {
    "Social": [
      {
        "Facebook": {
          "http://www.facebook.com/": [
            "facebook.com",
            "facebook.de",
            "facebook.fr",
            "facebook.net",
            "fb.com",
            "atlassolutions.com",
            "friendfeed.com"
          ]
        }
      }
    ]
  }
}

We can move things around later if we want to categorize Facebook Analytics differently (or maybe Twitter's crashlytics.com), but that's not urgent. Going with the existing Focus behavior is a good start.
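
For illustration only, here is a rough sketch of how a script could read the nested per-entity format above and reduce it to a domain-to-category lookup. The `flatten_mapping` name and the trimmed sample data are assumptions; this is not code from lists2safebrowsing.py:

```python
import json

# Trimmed example in the nested category -> entity -> URL -> domains format.
TWITTER_MAPPING = json.loads("""
{
  "categories": {
    "Social": [
      {"Twitter": {"https://twitter.com/": ["twimg.com", "twitter.com"]}}
    ]
  }
}
""")


def flatten_mapping(mapping):
    """Turn the nested structure into a flat {domain: category} dict."""
    flat = {}
    for category, entities in mapping["categories"].items():
        for entity in entities:
            # Each entity is {name: {homepage_url: [domains...]}}.
            for urls in entity.values():
                for domains in urls.values():
                    for domain in domains:
                        flat[domain] = category
    return flat


print(flatten_mapping(TWITTER_MAPPING))
```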
Flags: needinfo?(francois)
I think we have enough info to move forward with the mapping approach. Is there anything I can help with?
Flags: needinfo?(lcrouch)
Yup, I'm going to start the mapping work today, thanks everyone! I'm using the work today to on-board a new contractor who will also be helping with shavar + TP list work.
Flags: needinfo?(lcrouch)
Looking thru the code, it will be easier to use a mapping file format like so:

{
  "facebook.com": "Social",
  "facebook.de": "Social",
  ...
  ...
  "google-analytics.com": "Analytics",
  ...
  ...
  "mail.google.com": "Social"
}

That would make it a 5-10 line code change in the lists2safebrowsing.py script.

If that works for everyone, I'll start a new disconnect_mapping.json file in the shavar-prod-lists repo and we can pull that for the mapping when we pull the block-list.

Sound good?
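
A sketch of the idea, under the assumption that the blocklist has already been loaded into a dict of category name to domain list. The `remap_disconnect` helper and the sample domains are hypothetical, and the real change in lists2safebrowsing.py may look quite different:

```python
def remap_disconnect(domains_by_category, mapping):
    """Move each "Disconnect" domain to the category named in `mapping`.

    Domains without a mapping entry stay under "Disconnect".
    """
    result = {
        category: list(domains)
        for category, domains in domains_by_category.items()
        if category != "Disconnect"
    }
    unmapped = []
    for domain in domains_by_category.get("Disconnect", []):
        target = mapping.get(domain)
        if target is None:
            unmapped.append(domain)
        else:
            result.setdefault(target, []).append(domain)
    if unmapped:
        result["Disconnect"] = unmapped
    return result


# Made-up sample data in the flat mapping format proposed above.
blocklist = {
    "Social": ["example-social.test"],
    "Disconnect": ["facebook.com", "google-analytics.com", "unmapped.test"],
}
mapping = {"facebook.com": "Social", "google-analytics.com": "Analytics"}
remapped = remap_disconnect(blocklist, mapping)
```

The flat format keeps the lookup to a single `mapping.get(domain)` per entry, which is why the script change can stay small.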
Flags: needinfo?(francois)
Flags: needinfo?(esawin)
Sounds good to me.
Flags: needinfo?(esawin)
Eugen, after the new category mappings are live in the shavar list, do we need to make any GeckoView or Klar code changes to use the new categories?
Flags: needinfo?(esawin)
(In reply to Luke Crouch [:groovecoder] from comment #15)
> If that works for everyone, I'll start a new disconnect_mapping.json file in
> the shavar-prod-lists repo and we can pull that for the mapping when we pull
> the block-list.
> 
> Sound good?

Works for me too. Hopefully the iOS browsers can pull from that too, so that we have a consistent mapping across products.
Flags: needinfo?(francois)
(In reply to Chris Peterson [:cpeterson] from comment #17)
> Eugen, after the new category mappings are live in the shavar list, do we
> need to make any GeckoView or Klar code changes to use the new categories?

With the mapping being done in shavar, we don't need any changes in the clients. The Disconnect category entries will be distributed to the existing Advertisement, Social, Analytics and Content categories.
Flags: needinfo?(esawin)
Summary: Break out Disconnect category into its own blocklist → Break Disconnect category domains into the appropriate category blocklists
https://github.com/mozilla-services/shavar-list-creation/pull/54 contains the code for this.

When it's merged, we will need to update stage.ini in https://github.com/mozilla-services/shavar-list-creation-config/ to include ",Disconnect" in the disconnect_categories.
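
As a rough illustration of that config edit, the sketch below appends "Disconnect" to every `disconnect_categories` value in an ini file. The section names and existing values are assumptions made up for the example and are not copied from the real stage.ini:

```python
import configparser

# Hypothetical stage.ini fragment; section names and values are assumptions.
STAGE_INI = """
[ads-list]
disconnect_categories=Advertising

[social-list]
disconnect_categories=Social
"""


def add_category(config, category="Disconnect"):
    """Append `category` to every disconnect_categories value that lacks it."""
    for section in config.sections():
        if not config.has_option(section, "disconnect_categories"):
            continue
        current = config.get(section, "disconnect_categories").split(",")
        if category not in current:
            current.append(category)
            config.set(section, "disconnect_categories", ",".join(current))


config = configparser.ConfigParser()
config.read_string(STAGE_INI)
add_category(config)
```

Running `add_category` a second time leaves the values unchanged, so the edit is safe to repeat.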
Assigning to get this off our GeckoView triage list. Thanks Luke!
Assignee: nobody → lcrouch
If something needs to happen in Firefox for iOS, can someone please open a new bug with a clear description. I am reading through this bug but I am having a hard time understanding if there is something actionable for my team.
Flags: needinfo?(sarentz)
Blocks: 1467609
(In reply to Stefan Arentz [:st3fan] from comment #22)
> If something needs to happen in Firefox for iOS, can someone please open a
> new bug with a clear description. I am reading through this bug but I am
> having a hard time understanding if there is something actionable for my
> team.

I filed bug 1467609.
Flags: needinfo?(s.kaspari)
(In reply to Luke Crouch [:groovecoder] [on leave until July 16] from comment #20)
> https://github.com/mozilla-services/shavar-list-creation/pull/54 contains
> the code for this.
> 
> When it's merged, we will need to update stage.ini in
> https://github.com/mozilla-services/shavar-list-creation-config/ to include
> ",Disconnect" in the disconnect_categories.

Looks like Luke is on leave, can someone else merge this?
Flags: needinfo?(francois)
We're talking about it on #shavar right now.
Flags: needinfo?(francois)
Commits pushed to master at https://github.com/mozilla-services/shavar-list-creation-config

https://github.com/mozilla-services/shavar-list-creation-config/commit/14924adb2e494ecfe300d70570ec621d4ada2488
Bug 1459037 - Add Disconnect category to ads, analytics and social

This change requires matching changes in the list creation script:

  https://github.com/mozilla-services/shavar-list-creation/pull/54

https://github.com/mozilla-services/shavar-list-creation-config/commit/64eedc8c7977307545955405c7114669e74ddf1c
Merge pull request #29 from fmarier/update-for-bug1459037

Bug 1459037 - Add Disconnect category to ads, analytics and social
It's live on stage now.

If you set this in about:config:

  browser.safebrowsing.provider.mozilla.updateURL = https://shavar.stage.mozaws.net/downloads?client=SAFEBROWSING_ID&appver=%VERSION%&pver=2.2

the Disconnect entries should now be in their respective categories. Can you confirm that it works?
Flags: needinfo?(esawin)
Depends on: 1469699
(In reply to François Marier [:francois] from comment #27)
> It's live on stage now.
> 
> If you set this in about:config:
> 
>   browser.safebrowsing.provider.mozilla.updateURL = https://shavar.stage.mozaws.net/downloads?client=SAFEBROWSING_ID&appver=%VERSION%&pver=2.2
> 
> the Disconnect entries should now be in their respective categories. Can you
> confirm that it works?

I have done some local testing and it looks good, e.g., facebook.net and doubleclick.net are blocked in the appropriate categories.
Flags: needinfo?(esawin)
Using my new test page for the Disconnect mappings: https://mozilla.github.io/tracking-test/disconnect.html

I confirmed that the test page fails on prod but passes on stage (using the updateURL from comment 27).
Assignee: lcrouch → francois
Commit pushed to master at https://github.com/mozilla-services/shavar-list-creation

https://github.com/mozilla-services/shavar-list-creation/commit/efc5808ff57e2cfcb053141bfb3cb409a0574721
Merge pull request #54 from mozilla-services/disconnect-mapping

fix bug #1459037: map Disconnect category domains into their other categories
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
(In reply to Luke Crouch [:groovecoder] [on leave until July 16] from comment #20)
> https://github.com/mozilla-services/shavar-list-creation/pull/54 contains
> the code for this.
> 
> When it's merged, we will need to update stage.ini in
> https://github.com/mozilla-services/shavar-list-creation-config/ to include
> ",Disconnect" in the disconnect_categories.

https://github.com/mozilla-services/shavar-list-creation-config/pull/31 is the prod.ini
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
This has landed in production and the numbers look good: https://github.com/mozilla-services/shavar-list-creation-config/pull/31#issuecomment-398879275

I also manually tested each category using https://mozilla.github.io/tracking-test/disconnect.html.
Status: REOPENED → RESOLVED
Closed: 6 years ago → 6 years ago
Resolution: --- → FIXED