Open Bug 1362149 Opened 6 years ago Updated 5 years ago

Add ability to ship and use default lists for the URL Classifier

Categories

(Toolkit :: Safe Browsing, enhancement, P5)

55 Branch
enhancement

Tracking

()

People

(Reporter: Felipe, Unassigned)

References

Details

Attachments

(5 files)

As part of the work to make Flash default to Click-to-Play (bug 1317856), we created lists to automatically allow/deny flash usage (bug 1307604), which the main goal is to help users to not see the "Ask to Activate" infobar for the case of 3rd-party ad-trackers that use Flash.

But the problem is that we can't reasonably set flash to Click-to-Play without having these lists already in place, because the user will see a lot of these infobars and have a bad experience, while also they will end up adding these sites to the "Allow and Remember" exceptions, while they shouldn't.

We need some new mechanism to ship a initial list built-in, while still allowing it to be updated. Tracking protection would probably also be interested on it. And if we get it, we can probably get rid of the manual set-up for the test domains that happens in SafeBrowsing.addMozEntries(), which would represent a perf win (see for example bug 779008 and bug 1337043).

Talking with François on #shavar one hacky idea that came up was to ship the db files and immediatelly copy them to the ~/.cache/{profile-path}/safebrowsing folder.

Another thing that I thought would be slightly better (if it works..) would be to have a read-time-only fallback that reads these files directly from the application folder, only if the cached folder doesn't exist. It would still write to the cache destination, so that the default list read from the application folder gets written to the cached folder, and then it continues to work/update as is.
(This is mostly speculation without knowing a lot about how the internals work)
Attached patch proof of conceptSplinter Review
This very simple proof-of-concept works! I'll try to make it into something usable soon
Assignee: nobody → felipc
Status: NEW → ASSIGNED
Blocks: 1365714
Hey Glandium, can you help me here? I'm trying this patch, what I expected to work, but it's not working.. I want to add a new "safebrowsing." directory under "defaults/", but it's not showing up.. Do you know if there's any special handling of this directory to make this not work?

If I change it to something else, e.g., changing

FINAL_TARGET_FILES.defaults.safebrowsing to FINAL_TARGET_FILES.foobar
and
@RESPATH@/defaults/safebrowsing/* to @RESPATH@/foobar/*

then this patch works as expected.
Attachment #8869098 - Flags: feedback?(mh+mozilla)
(also sent the same patch through MozReview if you prefer to read it there)
fwiw, I'm testing this on Mac. The expected file shows up correctly under obj-dir/dist/Nightly.app/Contents/Resources/defaults/safebrowsing/, but not on the .dmg version when I do a mach package
Summary: Ship built-in default lists for the URL Classifier → Add ability to ship and use default lists for the URL Classifier
Comment on attachment 8869288 [details]
Bug 1362149 - Minor code cleanup.

https://reviewboard.mozilla.org/r/140848/#review144412

Thanks for clean this up!

::: toolkit/components/url-classifier/HashStore.cpp:531
(Diff revision 1)
>                      sizeof(struct SubComplete) * mHeader.numSubCompletes -
>                      nsCheckSummedOutputStream::CHECKSUM_SIZE;
>  
>    nsCOMPtr<nsISeekableStream> seekable = do_QueryInterface(mInputStream);
>  
>    rv = seekable->Seek(nsISeekableStream::NS_SEEK_SET, offset);

require nsresult here now
Attachment #8869288 - Flags: review?(dlee) → review+
Hi Felipe,
We will have a Safe Browsing work week next week in Taipei and will discuss this topic at that time.
We want to make sure we think through everything, do you minding waiting for a while?
Comment on attachment 8869098 [details] [diff] [review]
ship-preloaded-lists

Review of attachment 8869098 [details] [diff] [review]:
-----------------------------------------------------------------

I don't know what to tell you... I applied your patch, built, and got objdir/dist/bin/defaults/safebrowsing/felipe.test as a result...

::: browser/components/safebrowsing/preloaded-lists/moz.build
@@ +6,5 @@
> +
> +with Files("**"):
> +    BUG_COMPONENT = ("Toolkit", "Safe Browsing")
> +
> +DIST_SUBDIR = ''

I'd argue that if you want something outside dist/bin/browser/, it shouldn't be in browser/
Attachment #8869098 - Flags: feedback?(mh+mozilla)
Ah, missed comment 5... Of course it's not in the dmg... not directly. It's in omni.ja, like everything else. (Note that default/pref/channel-prefs.js is an exception)
(In reply to Mike Hommey [:glandium] from comment #11)
> Ah, missed comment 5... Of course it's not in the dmg... not directly. It's
> in omni.ja, like everything else. (Note that default/pref/channel-prefs.js
> is an exception)

Is there a way to make it go outside of omni.ja? For two reasons:
- these files will be rarely used and I don't want to incur their cost on omni.ja
- the code that uses them assumes a file in the filesystem and I'd need to make bigger changes to accomodate it inside omni.ja
Flags: needinfo?(mh+mozilla)
(In reply to Dimi Lee[:dimi][:dlee] from comment #9)
> Hi Felipe,
> We will have a Safe Browsing work week next week in Taipei and will discuss
> this topic at that time.
> We want to make sure we think through everything, do you minding waiting for
> a while?

Yeah, no problem. Just keep in mind that this is needed for Firefox 55 (to ship Flash as Click-to-Play, bug 1365714), so ideally I want to have this finished before the next merge.
(In reply to :Felipe Gomes (needinfo me!) from comment #12)
> (In reply to Mike Hommey [:glandium] from comment #11)
> > Ah, missed comment 5... Of course it's not in the dmg... not directly. It's
> > in omni.ja, like everything else. (Note that default/pref/channel-prefs.js
> > is an exception)
> 
> Is there a way to make it go outside of omni.ja? For two reasons:
> - these files will be rarely used and I don't want to incur their cost on
> omni.ja

What cost on omni.ja?

> - the code that uses them assumes a file in the filesystem and I'd need to
> make bigger changes to accomodate it inside omni.ja

Man, I really wish we just had one API to access files, that would be agnostic to whether it's on the FS or in resource:/// or chrome:/// etc.

Anyways, yes, there is a way to put things outside of omni.ja (see OmniJarSubFormatter.is_resource), but it should really be avoided.
Flags: needinfo?(mh+mozilla)
Comment on attachment 8869284 [details]
Bug 1362149 - Let URL Classifier use a default fallback file from the app directory, when a cached file does not exist.

https://reviewboard.mozilla.org/r/140842/#review145432

We (Dimi, Francois, Henry and Thomas) discussed this "preloading" feature and we're generally happy with it. We'd like to restrict it to reduce the likelihood that it will cause problems.

The first thing this needs is a killswitch in case we need to disable it on release in a hotfix. Maybe a boolean like browser.safebrowsing.provider.mozilla.preloadLists = true. We would not set this on the google and google4 providers because we never want to preload these.

Secondly, in order to prevent possible DoS on the shavar server, we need to make sure that the list we're preloading is included in urlclassifier.disallow_completions.

Thirdly, we should restrict preloading to little endian machines. Big Endian machines should not use this as the bundled files are unlikely to load successfully there.

Finally, let's enable browser.safebrowsing.provider.mozilla.preloadLists = true only on Desktop so that it doesn't run on Fennec.

If you need help with any of these, let us know.
Attachment #8869284 - Flags: review-
Thanks for the feedback! I'll start working on these changes
(In reply to Mike Hommey [:glandium] from comment #14)
> (In reply to :Felipe Gomes (needinfo me!) from comment #12)
> > (In reply to Mike Hommey [:glandium] from comment #11)
> > > Ah, missed comment 5... Of course it's not in the dmg... not directly. It's
> > > in omni.ja, like everything else. (Note that default/pref/channel-prefs.js
> > > is an exception)
> > 
> > Is there a way to make it go outside of omni.ja? For two reasons:
> > - these files will be rarely used and I don't want to incur their cost on
> > omni.ja
> 
> What cost on omni.ja?

Since this file won't be read often, I want to avoid adding an extra 100kb in omni.ja to be processed at every startup

> 
> > - the code that uses them assumes a file in the filesystem and I'd need to
> > make bigger changes to accomodate it inside omni.ja
> 
> Man, I really wish we just had one API to access files, that would be
> agnostic to whether it's on the FS or in resource:/// or chrome:/// etc.

Indeed

> 
> Anyways, yes, there is a way to put things outside of omni.ja (see
> OmniJarSubFormatter.is_resource), but it should really be avoided.

Ok, thanks.

Btw, covering a previous point:

(In reply to Mike Hommey [:glandium] from comment #10)
> > +DIST_SUBDIR = ''
> 
> I'd argue that if you want something outside dist/bin/browser/, it shouldn't
> be in browser/

The way I was thinking about this, is that safebrowsing is a toolkit feature, but it's the browser app that is choosing to ship this preloaded lists. This has the benefit of automatically not shipping it to Android.
(In reply to :Felipe Gomes (needinfo me!) from comment #17)
> (In reply to Mike Hommey [:glandium] from comment #14)
> > (In reply to :Felipe Gomes (needinfo me!) from comment #12)
> > > (In reply to Mike Hommey [:glandium] from comment #11)
> > > > Ah, missed comment 5... Of course it's not in the dmg... not directly. It's
> > > > in omni.ja, like everything else. (Note that default/pref/channel-prefs.js
> > > > is an exception)
> > > 
> > > Is there a way to make it go outside of omni.ja? For two reasons:
> > > - these files will be rarely used and I don't want to incur their cost on
> > > omni.ja
> > 
> > What cost on omni.ja?
> 
> Since this file won't be read often, I want to avoid adding an extra 100kb
> in omni.ja to be processed at every startup

All files in omni.ja aren't processed at every startup. I don't know where you got that from. Only the files used during startup are processed during startup. There is no inherent cost to adding files in there.

> Btw, covering a previous point:
> 
> (In reply to Mike Hommey [:glandium] from comment #10)
> > > +DIST_SUBDIR = ''
> > 
> > I'd argue that if you want something outside dist/bin/browser/, it shouldn't
> > be in browser/
> 
> The way I was thinking about this, is that safebrowsing is a toolkit
> feature, but it's the browser app that is choosing to ship this preloaded
> lists. This has the benefit of automatically not shipping it to Android.

You're making it sound like this should all be in resource://app/ ;)
Comment on attachment 8869284 [details]
Bug 1362149 - Let URL Classifier use a default fallback file from the app directory, when a cached file does not exist.

https://reviewboard.mozilla.org/r/140842/#review145740
Attachment #8869284 - Flags: review?(dlee)
Sorry for jumping in. We are having a small safebrowsing workweek in Taipei right now.

We just came out with a very simple solution to avoid changing the load/update
code path at all. The solution is to just copy the pre-loaded lists to .cache
directory without overriding the exiting ones then removed the preloaded one.

This significantly decrease the complexity but retain the feature that this
bug is trying to address.

Felipc, what do you think :)
Flags: needinfo?(felipc)
(In reply to Henry Chang [:hchang] from comment #20)
> This significantly decrease the complexity but retain the feature that this
> bug is trying to address.

It will also eliminate a few corner cases and make it more likely that we can support incremental updates in the future.
(In reply to Henry Chang [:hchang] from comment #20)
> Sorry for jumping in. We are having a small safebrowsing workweek in Taipei
> right now.
> 
> We just came out with a very simple solution to avoid changing the
> load/update
> code path at all. The solution is to just copy the pre-loaded lists to .cache
> directory without overriding the exiting ones then removed the preloaded one.
> 
> This significantly decrease the complexity but retain the feature that this
> bug is trying to address.
> 
> Felipc, what do you think :)

Hmm I don't know, that sounds to me way more complex than this one. Here it's just one extra file read from a fallback location, if one in the .cache doesn't exist. This other option would need to do the same check "does a file exist in .cache?", and then perform a file copy and do the read again. It seems it strictly more work.. But I don't have other context to know what sorts of corner cases there are or how this adds complexity..

One thing I'll mention is that that would add a file write during startup, which is likely to be bad for performance. I was actually hope to enable a significant performance win with this here by entirely removing the addMozEntries() function, which is known to be a perf problem (bug 779008)
Flags: needinfo?(felipc)
Sorry for the late reply, there was a national holiday.

(In reply to :Felipe Gomes (needinfo me!) from comment #22)
> Hmm I don't know, that sounds to me way more complex than this one. Here
> it's just one extra file read from a fallback location, if one in the .cache
> doesn't exist. This other option would need to do the same check "does a
> file exist in .cache?", and then perform a file copy and do the read again.
> It seems it strictly more work.. But I don't have other context to know what
> sorts of corner cases there are or how this adds complexity..
> 

One case we have discussed in WW is that if something goes wrong with the data in fallback directory, for example, .pset and .sbstore files are not sync, we'll detect this while update and treat it as a failure. A fail update will reset files in 'cache' directory but not files in 'fallback' directory, in this case, we will end up with being not able to update because corrupt files never get deleted.

> One thing I'll mention is that that would add a file write during startup,
> which is likely to be bad for performance. I was actually hope to enable a

I thought we should use "move" instead of "copy".
"copy" doesn't solve the issue described above because next time we will still get files from fallback directory (files in cache directory are deleted and fallback directory still has data).

"move" has a problem that we can only use it once, if somehow files in 'cache' directory are deleted then we will not load data from pre-load files (in fallback directory) next time. I am not sure if this is acceptable but at least this prevents a potential problem that Safebrowsing data gets polluted from files never get deleted.

Also, the file operation mentioned here should be done by worker thread, so main thread won't be blocked.

> significant performance win with this here by entirely removing the
> addMozEntries() function, which is known to be a perf problem (bug 779008)

Hi francois,
I remember we have discussed this in WW and you said that we may not get rid of addMozEntries, is that correct? do you still remember the detail?
Flags: needinfo?(francois)
(In reply to Dimi Lee[:dimi][:dlee] from comment #23)
> "copy" doesn't solve the issue described above because next time we will
> still get files from fallback directory (files in cache directory are
> deleted and fallback directory still has data).
> 

Sorry I think i didn't make this clear enough. I'll try to summarize again.

When .pset and .sbstore files in fallback directory are not sync:

The read data from fallback directory method will cause update always fail since HashStore and LookupCache always get corrupt data from fallback directory.

The 'copy' pre-loaded files at startup method only affects the first update after browser launch, it fails at the first time and the next update (without relaunching) we will do a fresh update since HashStore and LookupCache is empty.

The problem of 'copy method' is if we don't have a chance to do a second update after the first one fail. If we only update once and then close browser before next update, the next time browser is opened we will still get data from corrupt pre-loaded files.
(In reply to Dimi Lee[:dimi][:dlee] from comment #23)
> I remember we have discussed this in WW and you said that we may not get rid
> of addMozEntries, is that correct? do you still remember the detail?

In order to generate the test-*-simple files that we want to bundle, we would most likely run a copy of Firefox which calls `addMozEntries` on a few hard-coded URLs.

Also, as per comment 15, not every build of Firefox (e.g. ARM builds) will be able to take advantage of preloaded lists.
Flags: needinfo?(francois)
No longer blocks: 1365714
We've dropped this bug as a blocker to ship Flash as CTP because the license from the original Disconnect.me list does not allow it to be bundled with Firefox.

So I won't be working more on this bug in the near future, but I still think it would be a nice thing to have, specially if it would allow us to get rid of addMozEntries() somehow.
Assignee: felipc → nobody
Status: ASSIGNED → NEW
Priority: -- → P5
You need to log in before you can comment on or make changes to this bug.