Closed Bug 842828 Opened 11 years ago Closed 11 years ago

Suppress application reputation checks for binaries already found in malware/whitelists

Categories

(Toolkit :: Downloads API, defect)

x86_64
Linux
defect
Not set
normal

Tracking


RESOLVED FIXED
mozilla27

People

(Reporter: mmc, Assigned: mmc)

References

(Blocks 1 open bug)

Details

(Keywords: dev-doc-needed)

Attachments

(3 files, 5 obsolete files)

In Chrome, before the target file is set, there's a call to check locally if the download already has a verdict (good or bad), before calling out to the application reputation API.

http://code.google.com/searchframe#OAMlx_jo-ck/src/chrome/browser/download/chrome_download_manager_delegate.cc&q=checkdownloadurl%20file:%5Esrc/chrome/browser/&l=222

We should do the same thing. This also involves changing the safebrowsing code to download the allowlists and blocklists that Chrome is using.
It looks like we start by adding goog-badbinurl-shavar and goog-downloadwhite-digest256 to the prefs: http://mxr.mozilla.org/mozilla-central/source/browser/app/profile/firefox.js#766


765 // The list of tables that use the gethash request to confirm partial results.
766 pref("urlclassifier.gethashtables", "goog-phish-shavar,goog-malware-shavar");
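As a sketch of where this ended up (based on the prefs quoted later in this bug), the download tables eventually landed as separate, named prefs rather than additions to the order-sensitive gethashtables list:

```js
// Sketch based on the prefs discussed later in this bug: separate,
// named table prefs instead of appending to the order-sensitive
// urlclassifier.gethashtables list.
pref("urlclassifier.download_block_table", "goog-badbinurl-shavar");
pref("urlclassifier.download_allow_table", "goog-downloadwhite-digest256");
```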
Assignee: nobody → mmc
I chatted with gcp about this last week. There are currently 2 tables used for safebrowsing local lookups, the application reputation stuff would add 2 more, and there's no mechanism for restricting lookups to the set of tables that are relevant. This is probably a mistake, since we do safebrowsing checks for nearly every http load.

There are 2 classification idls, nsIURIClassifier (which takes an nsIURIClassifierCallback) and nsIUrlClassifierDBService (which takes an nsIUrlClassifierCallback)

http://mxr.mozilla.org/mozilla-central/source/toolkit/components/url-classifier/nsIUrlClassifierDBService.idl#88
http://mxr.mozilla.org/mozilla-central/source/netwerk/base/public/nsIURIClassifier.idl#50

The second one (nsIURIClassifier) is actually used by the nsChannelClassifier (nsBaseChannel, nsHttpChannel); the first one (nsIUrlClassifierDBService) seems to be used mostly for the DB update interfaces.

To my mind nsIURIClassifier is the better interface because it hides the need to know table names from the caller, and just returns a phishing or malware verdict.

This patch extends nsIURIClassifier to include ClassifyDownload, which requires all calls that trickle down into Classifier::Check to include an isDownload param. We could also extend nsIPrincipal to include this, but I think that's not a good place (we seem to be using nsIPrincipal only to stuff a URI into anyway). We could also extend nsIUrlClassifierDBService to include LookupDownload, which would require the same changes to Classifier::Check and friends and also require the caller to parse the table names, which seems suboptimal.

This patch does not include parsing updates for the new sha256 format and therefore doesn't include the test changes that we talked about.

gcp, what do you think about this approach?
Attachment #789261 - Flags: feedback?(paolo.mozmail)
Attachment #789261 - Flags: feedback?(paolo.mozmail) → feedback?(gpascutto)
Before reviewing this, a question/remark:
goog-badbinurl-shavar
goog-downloadwhite-digest256

Now, the first one seems similar enough to the current format. But the second one is going to check a SHA256 digest of the download instead of a URL, right? So that one actually needs specific, different handling in Classifier & co that does not try to split the URL and generate all the subhashes to check. (It may also need specific update handling - I haven't seen that part of the spec)

I would recommend we deal with that first. Not checking the normal tables when passing a download URL (i.e. not checking "phish-shavar" and "malware-shavar" when we check "goog-badbinurl-shavar") is a nice optimization, but I don't think it's the best approach to implement that before the basic functionality - dealing with the "digest256" table - is in place.
Comment on attachment 789261 [details] [diff] [review]
Add ClassifyDownload to nsIURIClassifier

Review of attachment 789261 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/components/url-classifier/Classifier.cpp
@@ +29,5 @@
>  #define TO_DELETE_DIR_SUFFIX NS_LITERAL_CSTRING("-to_delete")
>  #define BACKUP_DIR_SUFFIX    NS_LITERAL_CSTRING("-backup")
>  
> +// Are these safe to define? Does Thunderbird or other clients use different
> +// table names? No, order matters. God, this is dumb.

Iceweasel and co do actually use different names, so this won't work. Aside from being evil.

If you're going to implement it this way, I'd filter on containing "malware" "phish" "badbinurl" "downloadwhite". 

What seems preferable, though, would be to split up the preference (urlclassifier.gethashtables) which contains the actual tablenames into 4 new, separate ones. So you can obtain the current names from there instead, and fix the craziness we have in the current code where the order in which you list the tables in urlclassifier.gethashtables matters for their meaning.

@@ +220,1 @@
>  

Note that "isDownload" probably also implies that we have to treat "aSpec" in a totally different manner, i.e. the "check each lookup fragment against the entries in the DB" probably doesn't apply at all.

@@ +463,5 @@
> +    } else if (!isDownload &&
> +               (aTables[i].Equals(NS_LITERAL_CSTRING(DOWNLOADS_MALWARE)) ||
> +                aTables[i].Equals(NS_LITERAL_CSTRING(DOWNLOADS_WHITELIST)))) {
> +      // Download tables don't apply to non-downloads.
> +      aTables.RemoveElementAt(i);

It may be more efficient to construct a new list with the correct elements, given that RemoveElementAt on an nsTArray probably causes the entire array to get copied several times.
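For illustration, a minimal standalone sketch of this suggestion, using std::vector in place of nsTArray (the helper name and the filtering-by-substring approach are hypothetical, not from the patch):

```cpp
#include <string>
#include <vector>

// Sketch: rather than calling RemoveElementAt() inside the loop (each
// removal shifts the whole tail of the array), build a new list that
// keeps only the tables relevant to this kind of lookup. Table names
// match the ones discussed in this bug.
std::vector<std::string> FilterTables(const std::vector<std::string>& aTables,
                                      bool aIsDownload) {
  std::vector<std::string> result;
  result.reserve(aTables.size());
  for (const std::string& table : aTables) {
    const bool isDownloadTable =
        table.find("badbinurl") != std::string::npos ||
        table.find("downloadwhite") != std::string::npos;
    // Download tables apply only to download lookups, and vice versa.
    if (isDownloadTable == aIsDownload) {
      result.push_back(table);
    }
  }
  return result;
}
```

Building the filtered list is O(n) regardless of how many elements are dropped, whereas repeated in-place removal is O(n²) in the worst case.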

::: toolkit/components/url-classifier/nsUrlClassifierDBService.cpp
@@ +1047,5 @@
>    tables.BeginReading(begin);
>    tables.EndReading(end);
> +  if (!mIsDownload) {
> +    if (mCheckMalware &&
> +        FindInReadable(NS_LITERAL_CSTRING("-malware-"), begin, end)) {

See the first remark, if you go this way then put these strings as a constant on top of the file.

Or avoid hardcoding this and split the preference.
Attachment #789261 - Flags: feedback?(gpascutto) → feedback-
To clarify what I mean:

The current patch pretends that we have:

*-badbinurl-shavar
*-downloadwhite-digest256
vs
*-phish-shavar
*-malware-shavar

when in reality from the POV of how the code has to work, we have:

*-downloadwhite-digest256
vs
*-badbinurl-shavar
*-phish-shavar
*-malware-shavar

And eliminating an extra "badbinurl" or "malware+phishing" check in some circumstances is a nice extra. But the "isDownload" bool doesn't look like it allows you to do that, because you still need a way to tell whether the aSpec being passed is a URL or a SHA256 hash of a binary.

Ergo, deal with the "digest256" table first and then see how to do the optimization, may be as simple as changing the bool into an enum.
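To make the "bool into an enum" idea concrete, here is a minimal editorial sketch; the names and the mapping function are hypothetical (and reflect gcp's reading at this point in the thread that digest256 entries would be binary hashes):

```cpp
#include <string>

// Sketch of replacing the isDownload bool with an enum: the classifier
// then knows not just which tables apply, but how to interpret the
// value it was handed. Names here are hypothetical, not from the patch.
enum class LookupKind {
  URL,           // chop into fragments and check the shavar URL tables
  DownloadURL,   // URL lookup against the download blocklist
  BinaryDigest,  // SHA-256 of the file, checked against digest256
};

// Illustrative mapping from lookup kind to the table(s) it consults.
std::string TablesFor(LookupKind kind) {
  switch (kind) {
    case LookupKind::DownloadURL:
      return "goog-badbinurl-shavar";
    case LookupKind::BinaryDigest:
      return "goog-downloadwhite-digest256";
    default:
      return "goog-phish-shavar,goog-malware-shavar";
  }
}
```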
No longer blocks: 904607
Depends on: 904607
Attachment #789261 - Attachment is obsolete: true
Attachment #806942 - Attachment is obsolete: true
Comment on attachment 807435 [details] [diff] [review]
Check local list to suppress remote lookups (

Hi Paolo and Gian-Carlo,

This patch integrates the nsIUrlClassifierDBService.lookup check with the ApplicationReputationService. gcp, after looking at the lookup code I decided it was efficient enough to leave nsIUrlClassifierDBService.idl as-is instead of adding a separate lookup(download=true) flag or a lookupDownload API. What do you think? I can also break up the separate lists into separate flags instead of a single flag for all the tables, if that's what you think is best.

Paolo, I ended up restructuring the code in ApplicationReputation.cpp not to have a native implementation of nsIApplicationReputationQuery. Instead, all of the callback code for nsIStreamListener and nsIUrlClassifierCallback is encapsulated in a private class, PendingLookup, which the application reputation service keeps track of in a pending-lookup array. I wasn't sure of the best way to handle memory cleanups, suggestions welcome.

Thanks,
Monica
Attachment #807435 - Flags: review?(paolo.mozmail)
Attachment #807435 - Flags: feedback?(gpascutto)
>after looking at the lookup code I decided it was efficient enough to leave nsIUrlClassifierDbService.idl instead of adding a separate lookup(download=true) flag, or a lookupDownload API. What do you think

Don't you need a way to signal whether you're passing a SHA256 of a downloaded binary, or the URL of the site you downloaded from (the latter being chopped up and every partial URL being SHA256ed and tested against the DB)?
Also, can you include your entire current patch stack? Just don't ask for review on patches you think are not ready for that. Seeing all the modified code makes understanding this easier designwise, IMHO.
(In reply to Gian-Carlo Pascutto (:gcp) from comment #9)
> >after looking at the lookup code I decided it was efficient enough to leave nsIUrlClassifierDbService.idl instead of adding a separate lookup(download=true) flag, or a lookupDownload API. What do you think
> 
> Don't you need a way to signal whether you're passing a SHA256 of a
> downloaded binary, or an URL of the site where you downloaded from (the
> latter being chopped up and every partial URL being SHA256ed and tested
> against the DB)?

Hi gcp,

The interface for the digest256 lists also takes URLs, actually -- the only thing that differs between digest256 and shavar lists is that digest256 lists always contain the full hash of the URL. The API doesn't actually take the sha256 of downloaded binaries.
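As an editorial aside, the "chopped up" step gcp refers to can be sketched roughly as follows; this is a simplified standalone version (no canonicalization or query-string handling, and the function name is hypothetical):

```cpp
#include <string>
#include <vector>

// Simplified sketch of Safe Browsing URL fragment generation: build
// host-suffix / path-prefix combinations; each combination is then
// hashed and checked against the database. Real code also canonicalizes
// the URL, caps the number of suffixes, and handles query strings.
std::vector<std::string> LookupFragments(const std::string& host,
                                         const std::string& path) {
  // Host suffixes: the full host, plus trailing components with at
  // least two labels (e.g. "a.evil.com" also yields "evil.com").
  std::vector<std::string> hosts{host};
  std::string h = host;
  size_t pos;
  while ((pos = h.find('.')) != std::string::npos) {
    h = h.substr(pos + 1);
    if (h.find('.') != std::string::npos) {
      hosts.push_back(h);
    }
  }
  // Path prefixes: "/", each intermediate directory, then the full path.
  std::vector<std::string> paths{"/"};
  for (size_t i = 1; i + 1 < path.size(); ++i) {
    if (path[i] == '/') {
      paths.push_back(path.substr(0, i + 1));
    }
  }
  if (path != "/") {
    paths.push_back(path);
  }
  std::vector<std::string> fragments;
  for (const auto& hs : hosts) {
    for (const auto& ps : paths) {
      fragments.push_back(hs + ps);
    }
  }
  return fragments;
}
```

For "a.evil.com" with path "/x/y.exe" this yields two host suffixes times three path prefixes, i.e. six fragments to hash and check. The difference mmc describes is only in *what* is stored: digest256 lists hold full hashes, shavar lists hold partial hashes that are confirmed via gethash.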

I don't have any other patches at the moment -- the remaining patch would be to add the new local lists to be fetched by the list manager and optionally break up that preference into multiple ones. I can work on that and add it (it's really short).

Thanks,
Monica
Comment on attachment 807878 [details] [diff] [review]
Clean up urlclassifier.gethashtables preferences (

Hi Gian-Carlo,

I started touching urlclassifier.gethashtables but ended up cleaning up a lot more stuff to use the newer Preferences:: API. I am not sure we really need to observe all these prefs; not doing so would make the code a lot cleaner. Also, as far as I know these checks are not even enabled on mobile or b2g, but I changed those prefs anyway.

Let me know if you want me to shrink this patch to just the gethashtables changes.

Thanks,
Monica
Attachment #807878 - Flags: review?(gpascutto)
Comment on attachment 807878 [details] [diff] [review]
Clean up urlclassifier.gethashtables preferences (

As I understand it, preferences that are common to all products and used by Toolkit code can be defined once in "modules/libpref/src/init/all.js". They can be optionally overridden in product-specific files when different values are needed.
Comment on attachment 807435 [details] [diff] [review]
Check local list to suppress remote lookups (

Review of attachment 807435 [details] [diff] [review]:
-----------------------------------------------------------------

This looks like an improvement and a simplification, thanks! I'm doing a first-pass review, some questions may seem trivial because I'm less familiar with C++ than JavaScript.

::: toolkit/components/downloads/ApplicationReputation.cpp
@@ +74,5 @@
>  
> +  /**
> +   * Private destructor since we want to be gc'ed by reference counting.
> +   */
> +  ~PendingLookup();

Why is a private destructor needed here?
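For illustration, here is a minimal standalone sketch of the pattern the comment in the patch refers to (not the actual XPCOM refcounting macros; class and helper names are made up):

```cpp
// Minimal sketch of why a refcounted class like PendingLookup makes its
// destructor private: destruction can then only happen through
// Release(), so callers can't "delete" the object directly or allocate
// it on the stack and bypass the reference count.
class RefCountedSketch {
 public:
  void AddRef() { ++mRefCnt; }
  void Release() {
    if (--mRefCnt == 0) {
      delete this;  // the only code path that reaches the destructor
    }
  }
  static bool sDestroyed;  // for demonstration only
 private:
  ~RefCountedSketch() { sDestroyed = true; }  // private on purpose
  int mRefCnt = 0;
};

bool RefCountedSketch::sDestroyed = false;

// Exercise the lifecycle: take one strong reference, then release it.
bool DemoLifecycle() {
  RefCountedSketch* lookup = new RefCountedSketch();
  lookup->AddRef();
  lookup->Release();  // refcount hits zero; object deletes itself
  return RefCountedSketch::sDestroyed;
}
```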

@@ +104,5 @@
>  }
>  
> +PendingLookup::~PendingLookup() {
> +  mQuery = nullptr;
> +  mCallback = nullptr;

I don't think these are needed?

@@ +111,5 @@
>  nsresult
> +PendingLookup::OnComplete(bool shouldBlock, nsresult rv) {
> +  nsresult res = mCallback->OnComplete(shouldBlock, rv);
> +  mCallback = nullptr;
> +  mQuery = nullptr;

Also these may be unneeded.

@@ +136,2 @@
>    safe_browsing::ClientDownloadRequest req;
> +  nsCOMPtr<nsIURI> aURI;

nit: hm, this shouldn't have had the "a" prefix from the start, I think I missed this in my previous review.

@@ +136,4 @@
>    safe_browsing::ClientDownloadRequest req;
> +  nsCOMPtr<nsIURI> aURI;
> +  nsresult rv = mQuery->GetSourceURI(getter_AddRefs(aURI));
> +  NS_ENSURE_SUCCESS(rv, rv);

We'll need a technique similar to OnStopRequest/OnStopRequestInternal to ensure that failures here still result in an OnComplete call.

@@ +141,4 @@
>    rv = aURI->GetSpec(spec);
>    NS_ENSURE_SUCCESS(rv, rv);
>  
>    req.set_url(spec.get());

nit: empty line after set_url instead of before

::: toolkit/components/downloads/ApplicationReputation.h
@@ +44,5 @@
> +  /**
> +   * Keeps track of pending lookups. These will be garbage collected when
> +   * removed from the array.
> +   */
> +  nsTArray<nsRefPtr<PendingLookup> > mPendingLookups;

I don't think we need a table of pending lookups, we don't have code to cancel them. The references in the objects for which PendingLookup is a callback should be enough to keep it alive.

::: toolkit/components/downloads/nsIApplicationReputation.idl
@@ +67,5 @@
>     * The SHA256 hash of the downloaded file in raw bytes. If this is not set by
>     * the caller, it will be passed as an empty string but the query won't
>     * produce any useful information.
>     */
>    attribute ACString sha256Hash;

At this point, maybe we should make these attributes readonly? You should also ask for a super-review on the interface change.

I also wonder if the name "nsIApplicationReputationQuery" still makes sense or we should change it.

I've not reviewed the tests yet.
Attachment #807435 - Flags: review?(paolo.mozmail)
(In reply to :Paolo Amadini from comment #14)
> Comment on attachment 807878 [details] [diff] [review]
> Clean up urlclassifier.gethashtables preferences (
> 
> As I understand it, preferences that are common to all products and used by
> Toolkit code can be defined once in "modules/libpref/src/init/all.js". They
> can be optionally overridden in product-specific files when different values
> are needed.

Awesome! In that case I will delete them from all the product-specific files, since they are identical.

Thanks for the review, I will make the changes, and agree on making the attributes readonly.

Do you mean that nsIApplicationReputationQuery doesn't make sense because it also queries a local db? If so, I think it's ok. The feature itself is called Application Reputation in the docs (https://wiki.mozilla.org/Security/Features/Application_Reputation), which I think was named after the IE feature (http://blogs.msdn.com/b/ie/archive/2011/05/17/smartscreen-174-application-reputation-in-ie9.aspx). I don't think it matters much whether the query is local-only or not, only whether people can figure out what it does from the name.

Thanks,
Monica
Attachment #807878 - Flags: review?(gpascutto) → review+
(In reply to Monica Chew [:mmc] (please use needinfo) from comment #16)
> Do you mean that nsIApplicationReputationQuery doesn't make sense because it
> also queries a local db?

I mean that, since with this new approach the interface nsIApplicationReputationQuery does not correspond to a controllable object that actually executes the "query", but to a different, passive object that only provides the parameters, maybe the word "query" should be replaced with something else. But it's also fine as is.
Attachment #807435 - Attachment is obsolete: true
Attachment #807435 - Flags: feedback?(gpascutto)
Attachment #807878 - Attachment is obsolete: true
Comment on attachment 808715 [details] [diff] [review]
Clean up urlclassifier.gethashtables preferences (

r=gcp in comment 16.5
Moved shared prefs to modules/libpref/src/init/all.js as Paolo recommended in comment 15
Addressed Paolo's comments.
Attachment #808787 - Flags: review?(paolo.mozmail)
Attachment #808714 - Attachment is obsolete: true
Attachment #808773 - Flags: review?(dtownsend+bugmail) → review+
Comment on attachment 808787 [details] [diff] [review]
Check local list to suppress remote lookups (

Review of attachment 808787 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks!

::: toolkit/components/downloads/ApplicationReputation.cpp
@@ +139,5 @@
> +  // We did not find a local result, so fire off the query to the application
> +  // reputation service.
> +  safe_browsing::ClientDownloadRequest req;
> +  nsCOMPtr<nsIURI> uri;
> +  nsresult rv = mQuery->GetSourceURI(getter_AddRefs(uri));

nit: use standalone "nsresult rv;" declaration

@@ +325,5 @@
> +}
> +
> +ApplicationReputationService::~ApplicationReputationService() {
> +  mDBService = nullptr;
> +  mSecurityManager = nullptr;

Do we need to set these to null explicitly?

@@ +331,5 @@
> +
> +NS_IMETHODIMP
> +ApplicationReputationService::QueryReputation(
> +  nsIApplicationReputationQuery* aQuery,
> +  nsIApplicationReputationCallback* aCallback) {

nit: indent these parameters some more

@@ +338,5 @@
> +
> +  nsresult rv = QueryReputationInternal(aQuery, aCallback);
> +  if (NS_FAILED(rv)) {
> +    aCallback->OnComplete(false, rv);
> +    aCallback = nullptr;

Setting to null here does not have any effect.

@@ +354,5 @@
> +    NS_ENSURE_SUCCESS(rv, rv);
> +  }
> +  if (!mSecurityManager) {
> +    mSecurityManager = do_GetService("@mozilla.org/scriptsecuritymanager;1",
> +      &rv);

nit: indentation

@@ +378,5 @@
> +  // Create a new pending lookup.
> +  nsRefPtr<PendingLookup> lookup(new PendingLookup(aQuery, aCallback));
> +  NS_ENSURE_STATE(lookup);
> +
> +  nsCOMPtr<nsIURI> aURI;

nit: the usual "a" prefix :-)

@@ +384,5 @@
> +  NS_ENSURE_SUCCESS(rv, rv);
> +  // If the URI hasn't been set, bail
> +  NS_ENSURE_STATE(aURI);
> +  nsCOMPtr<nsIPrincipal> principal;
> +  mSecurityManager->GetNoAppCodebasePrincipal(aURI, getter_AddRefs(principal));

Missing nsresult check?

By the way, I trust you that GetNoAppCodebasePrincipal is the way the lookup service is supposed to obtain the URL to look up. It seems to me this security function is actually used as a way to clean up the source URI. Maybe an explanatory comment on the mechanism would be useful here.

::: toolkit/components/downloads/ApplicationReputation.h
@@ +10,5 @@
>  #include "nsIApplicationReputation.h"
>  #include "nsIRequestObserver.h"
>  #include "nsIStreamListener.h"
>  #include "nsISupports.h"
> +#include "nsIUrlClassifierDBService.h" // For nsIUrlClassifierCallback

nit: remove part after //

@@ +54,5 @@
>                                     nsIApplicationReputationCallback* aCallback);
> +  /**
> +   * Delete lookups that are already finished.
> +   */
> +  nsresult DeleteFinishedLookups();

Should be removed.

::: toolkit/components/downloads/test/unit/test_app_rep.js
@@ +14,5 @@
> +
> +function readFileToString(aFilename) {
> +  let f = do_get_file(aFilename);
> +  let stream = Cc["@mozilla.org/network/file-input-stream;1"]
> +    .createInstance(Ci.nsIFileInputStream);

global indentation nit: avoid two-character continuation indentation when possible. In this case:

let stream = Cc["@mozilla.org/network/file-input-stream;1"]
               .createInstance(Ci.nsIFileInputStream);

@@ +40,5 @@
> +
> +  gHttpServ.registerPathHandler(redirectPath, function(request, response) {
> +    do_print("Mock safebrowsing server handling request for " + redirectPath);
> +    let contents = readFileToString(aFilename);
> +    do_print("contents of " + aFilename + ": " + contents.length);

"Length of "

@@ +108,4 @@
>    query.sourceURI = createURI("http://evil.com");
>    query.fileSize = 12;
>  
>    gAppRep.queryReputation(query, function onComplete(aShouldBlock, aStatus) {

global nit:

gAppRep.queryReputation({
  sourceURI: createURI("http://evil.com"),
  fileSize: 12,
}, function onComplete(aShouldBlock, aStatus) {
Attachment #808787 - Flags: review?(paolo.mozmail) → review+
Ugh, I think I pushed old versions of the patch that were missing a file. I'm sorry. Try was green: https://tbpl.mozilla.org/?tree=Try&rev=8fc6381b0ab5
New try, with all the patches folded into one (ryanvm says that patches should not depend on each other, to avoid bisecting pain): https://tbpl.mozilla.org/?tree=Try&rev=9e3ab52a7bb1
(In reply to Monica Chew [:mmc] (please use needinfo) from comment #16)
> (In reply to :Paolo Amadini from comment #14)

> Awesome! In that case I will delete them from all the product-specific
> files, since they are identical.

SeaMonkey isn't using identical files. We use "googpub-phish-shavar" rather than "goog-phish-shavar". Would appreciate a heads up the next time you change table names etc. Thanks muchly.
(In reply to Philip Chee from comment #27)
> SeaMonkey isn't using identical files. We use "googpub-phish-shavar" rather
> than "goog-phish-shavar". Would appreciate a heads up the next time you
> change table names etc. Thanks muchly.

Sorry for that, we should definitely be more careful when changing core preferences. This is related to the fact we changed the preference names, rather than the fact they've been centralized (which is still good).

We should define an override for the new "urlclassifier.phish_table" preference in "suite/browser/browser-prefs.js", and remove the obsolete preferences. Phil, I'm not sure about how we should proceed for safely landing a synchronized comm-central patch for this, suggestions?

I also think this should be mentioned in developer notes for Firefox 27, I've added the dev-doc-needed keyword.
Keywords: dev-doc-needed
(In reply to :Paolo Amadini from comment #28)

> Sorry for that, we should definitely be more careful when changing core
> preferences. This is related to the fact we changed the preference names,
> rather than the fact they've been centralized (which is still good).

I agree this is a good move. No complaints here.

> We should define an override for the new "urlclassifier.phish_table"
> preference in "suite/browser/browser-prefs.js", and remove the obsolete
> preferences. Phil, I'm not sure about how we should proceed for safely
> landing a synchronized comm-central patch for this, suggestions?

I'm going to file a SeaMonkey bug and set a dependency here. If the patch(es) here stick just send me a NEEDINFO and I'll take care of the SeaMonkey side of things.
Blocks: 920951
One last question:
> +pref("urlclassifier.download_block_table", "goog-badbinurl-shavar");
> +pref("urlclassifier.download_allow_table", "goog-downloadwhite-digest256");
We (SeaMonkey) use our own API key as we aren't Firefox. Are these two tables available to non-Firefox, non-Chrome projects/products? Linux distros which have agreements with Google are probably using their own API key as well.
(In reply to Philip Chee from comment #30)
> One last question:
> > +pref("urlclassifier.download_block_table", "goog-badbinurl-shavar");
> > +pref("urlclassifier.download_allow_table", "goog-downloadwhite-digest256");
> We (SeaMonkey) use our own API key as we aren't Firefox. Are these two
> tables available to non-Firefox, non-Chrome projects/products? Linux distros
> which have agreements with Google are probably using their own API key as
> well.

Hi Philip,

Not at the moment. During talks we've agreed to Firefox usage only. The bug for flipping the Firefox API key on is bug 887044.

The b2g breakage on the last push was real; this check should not be enabled for b2g. Fixed: https://tbpl.mozilla.org/?tree=Try&rev=28187852a71a

Thanks,
Monica
https://hg.mozilla.org/mozilla-central/rev/fe62ae1404a4
Status: NEW → RESOLVED
Closed: 11 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Target Milestone: --- → mozilla27
>  17.102 +  // Do we *really* need to be able to change all of these at runtime?
This came up previously when it was noted that Safebrowsing had code implying that its prefs could be changed at runtime, but that didn't actually work. The conclusion was to either make it work or stop pretending that it was possible.