Closed Bug 1287058 Opened 8 years ago Closed 8 years ago

Supports SafeBrowsing v4 partial update

Categories

(Toolkit :: Safe Browsing, defect, P2)

defect

Tracking

()

RESOLVED DUPLICATE of bug 1305801

People

(Reporter: hchang, Assigned: dimi)

References

Details

(Whiteboard: #sbv4-m1)

Attachments

(2 files, 4 obsolete files)

      No description provided.
Whiteboard: #sbv4-m1
This feature might depend on

1) Save states in HashStore
2) HashStore/PrefixSet to support index removal
Assignee: nobody → hchang
Summary: [SafeBrowsing] Supports v4 partial update → Supports SafeBrowsing v4 partial update
Depends on: 1287059
Priority: -- → P2
Hi Henry,
Do you mind me taking this bug ? I am working on stuff related to this.
(In reply to Dimi Lee[:dimi][:dlee] from comment #2)
> Hi Henry,
> Do you mind me taking this bug ? I am working on stuff related to this.

Sure! To be honest, at the time of filing this bug, I haven't decided what/how to do with this feature. After Bug 1287059 is fixed, the states would be stored to preferences. Afterward, listmanager would fetch the states from prefs and build the request. 

I would expect this bug to have two patches:

1) A API for ProtocolParserProtocolbuf to save states and 
   another API for listmanager to get the saved states.

2) ProtocolParserProtobuf/listmanager to use the APIs in (1) 
   to set/get states

I can help doing (2) once the APIs are ready. Does it make sense?
(In reply to Henry Chang [:henry][:hchang] from comment #3)
> (In reply to Dimi Lee[:dimi][:dlee] from comment #2)
> I would expect this bug to have two patches:
> 
> 1) A API for ProtocolParserProtocolbuf to save states and 
>    another API for listmanager to get the saved states.
> 
> 2) ProtocolParserProtobuf/listmanager to use the APIs in (1) 
>    to set/get states
> 
> I can help doing (2) once the APIs are ready. Does it make sense?

I am going to work on 
3) PrefixSet to support index removal and merge new prefixes

So if you are ok i guess i'll file another bug for (3) and you can work on (1)(2) in this bug?
Offline discussed with henry, we will keep state in preference for now.
So this bug will only implement "Merge and remove" prefixes.
Assignee: hchang → dlee
Status: NEW → ASSIGNED
Attached patch WIP Patch (obsolete) — Splinter Review
Attached patch P1. Support partial update (obsolete) — Splinter Review
Attachment #8788319 - Attachment is obsolete: true
Attachment #8789183 - Flags: review?(hchang)
Attached patch P2. Testcase (obsolete) — Splinter Review
Test case still require some time to refine.
I will upload a better version later for review.
Comment on attachment 8789183 [details] [diff] [review]
P1. Support partial update

Review of attachment 8789183 [details] [diff] [review]:
-----------------------------------------------------------------

This is definitely an awesome patch for partial update! Great job!

I don't know who would be the next reviewer but maybe an illustration for LookupCacheV4::ApplyPartialUpdate will accelerate the following review pretty much.

Thanks!

::: toolkit/components/url-classifier/LookupCache.cpp
@@ +572,5 @@
> +// The different is prefixes from update use std:string(to avoid additional copy)
> +// and prefixes from VLPrefixSet use nsCString.
> +// This class provides a common interface for partial update algorithm to operate
> +// on two different kind prefix string map easier.
> +class V4PrefixSet

I like the idea wrapping PrefixStringMap and TableUpdateV4::PrefixesStringMap to one class. I would feel it deserves a better name than V4PrefixSet. V4PrefixSet is a little confusing to me but I can't figure out a better name actually. So maybe just leave it there until we find a better one :p

@@ +726,5 @@
> +        oldPSet.Merge(aOutputMap);
> +        break;
> +      }
> +    }
> +

From what I understand, there is no need to deal with the above two special cases. If one of the prefix maps is empty, the main routine would still be applicable. The reason you specifically handle the special cases is obtaining the smallest prefix from a prefix map is not trivial (because the enumerated key is not guaranteed ordered). Do I understand correctly?
Attachment #8789183 - Flags: review?(hchang) → feedback+
Thanks for your review! Henry.

(In reply to Henry Chang [:henry][:hchang] from comment #9)
> 
> I don't know who would be the next reviewer but maybe an illustration for
> LookupCacheV4::ApplyPartialUpdate will accelerate the following review
> pretty much.
> 
> Thanks!

Good idea, i will try to write one.

> 
> ::: toolkit/components/url-classifier/LookupCache.cpp
> @@ +572,5 @@
> I like the idea wrapping PrefixStringMap and
> TableUpdateV4::PrefixesStringMap to one class. I would feel it deserves a
> better name than V4PrefixSet. V4PrefixSet is a little confusing to me but I
> can't figure out a better name actually. So maybe just leave it there until
> we find a better one :p
> 

I agree, that naming is really bad :(
I'll see if i can come out a better one, hopefully.

> @@ +726,5 @@
> > +        oldPSet.Merge(aOutputMap);
> > +        break;
> > +      }
> > +    }
> > +
> 
> From what I understand, there is no need to deal with the above two special
> cases. If one of the prefix maps is empty, the main routine would still be
> applicable. The reason you specifically handle the special cases is
> obtaining the smallest prefix from a prefix map is not trivial (because the
> enumerated key is not guaranteed ordered). Do I understand correctly?

Yes, the main purpose here is to speed up. Once one map is empty then we can merge it without
comparing the smallest one in current map.
Attached patch P2. Testcase (obsolete) — Splinter Review
Hi Henry,
This should be the same testcase file you have already reviewed in bug 1283009.
I add testcases for partial update in this patch, but i also changes lots of things due to partial update.
So i think you'll have to check it again, sorry for the inconvenience.
Attachment #8789184 - Attachment is obsolete: true
Attachment #8790633 - Flags: feedback?(hchang)
Attachment #8790633 - Flags: feedback?(hchang)
Hi gcp,
This patch is based on bug 1283009 and only focus on partial update.
Could you help review this ?

I made a very simple slide to explain the basic idea of partial update:
https://docs.google.com/a/mozilla.com/presentation/d/1DxmTh_7SBG40NWpcm1jQYMRy2C3Uq1ZJ97DhAcS5A0E/edit?usp=sharing

Please let me know if there is any question, thanks!
Comment on attachment 8792452 [details]
Bug 1287058 - Supports SafeBrowsing v4 partial update.

https://reviewboard.mozilla.org/r/79508/#review78504

::: toolkit/components/url-classifier/Classifier.cpp:955
(Diff revision 1)
> +
> +  // prefixes2 is only used in partial update. If there are multiple
> +  // updates for the same table, prefixes1 & prefixes2 will act as
> +  // input and output in turn to reduce memory copy overhead.
> +  PrefixStringMap prefixes1, prefixes2;
> +  PrefixStringMap* output = &prefixes1;

These will always point to a valid object, so you might want to use references instead. The code below will become cleaner.

::: toolkit/components/url-classifier/Classifier.cpp:979
(Diff revision 1)
> -        prefixes.Put(iter.Key(), prefix);
> +        output->Put(iter.Key(), prefix);
>        }
>      } else {
> -      // TODO: Bug 1287058, partial update
> +      PrefixStringMap* input = nullptr;
> +      // If both prefix set is empty, this means we are doing a partial update
> +      // without a prior full/partiual update in the loop. In this case we should

...partial update...from the lookup cache...

::: toolkit/components/url-classifier/LookupCache.cpp:591
(Diff revision 1)
>      LOG(("Update: %s", str.get()));
>    }
>  }
>  #endif
>  
> +// Prefixes come from update and VLPrefixSet are both stored in the HashTable

...coming from...updates...

::: toolkit/components/url-classifier/LookupCache.cpp:592
(Diff revision 1)
>    }
>  }
>  #endif
>  
> +// Prefixes come from update and VLPrefixSet are both stored in the HashTable
> +// which key, value pair is prefix size and lexicographic-sorted string.

...where they (key, value) pair is a...and a...

::: toolkit/components/url-classifier/LookupCache.cpp:593
(Diff revision 1)
>  }
>  #endif
>  
> +// Prefixes come from update and VLPrefixSet are both stored in the HashTable
> +// which key, value pair is prefix size and lexicographic-sorted string.
> +// The different is prefixes from update use std:string(to avoid additional copy)

...the difference is...prefixes from updates...additional copies...

...for the partial update algorithm...to make it easier to...

The std::string is there because that's the output from the Protobuff library?

::: toolkit/components/url-classifier/LookupCache.cpp:597
(Diff revision 1)
> +// which key, value pair is prefix size and lexicographic-sorted string.
> +// The different is prefixes from update use std:string(to avoid additional copy)
> +// and prefixes from VLPrefixSet use nsCString.
> +// This class provides a common interface for partial update algorithm to operate
> +// on two different kind prefix string map easier.
> +class V4PrefixSet

This should probably go into its own file? Maybe together with all LookupCacheV4 code?

::: toolkit/components/url-classifier/LookupCache.cpp:601
(Diff revision 1)
> +// on two different kind prefix string map easier.
> +class V4PrefixSet
> +{
> +public:
> +  V4PrefixSet(const PrefixStringMap& aMap);
> +  V4PrefixSet(const TableUpdateV4::PrefixesStringMap& aMap);

PrefixStringMap
vs
TableUpdateV4::PrefixesStringMap

is very confusing...what's the difference? Is the latter with std::string? Then maybe name it PrefixStdStringMap. Together with the comment above noting the existence of a std::string map, it's much more obvious that's what it is.

::: toolkit/components/url-classifier/LookupCache.cpp:718
(Diff revision 1)
> +  V4PrefixSet addPSet(aTableUpdate->Prefixes());
> +
> +  // RemovalIndiceArray is a sorted integer array indicates the index of prefix we should
> +  // remove from the old prefix set(according to lexigraphic order).
> +  // |removalIndex| is the current index of RemovalIndiceArray.
> +  // |numOldPrefixPicked| is used to record how many prefixes we are picked from old map

...we picked from the old map.

::: toolkit/components/url-classifier/LookupCache.cpp:732
(Diff revision 1)
> +    // Get smallest prefix from old prefix set if we don't have one
> +    if (smallestOldPrefix.IsEmpty()) {
> +      // If prefixes from old prefix set are all merged,
> +      // then we can merge entire add prefix set directly.
> +      if (!oldPSet.GetSmallestPrefix(smallestOldPrefix)) {
> +        AppendPrefixToMap(aOutputMap, smallestAddPrefix);

I think there's a hidden assumpion here that we won't receive a "partial update" on an empty prefixset, i.e. that smallestAddPrefix isn't (still) empty here?

I saw later on a test actually excercises this, so AppendPrefixToMap is a no-op if smallestAddPrefix is empty?

::: toolkit/components/url-classifier/LookupCache.cpp:759
(Diff revision 1)
> +    // lexigraphic order.
> +    if (smallestOldPrefix < smallestAddPrefix ||
> +        smallestAddPrefix.IsEmpty()) {
> +      numOldPrefixPicked++;
> +
> +      // If number of picks from old map matches the removalIndice, then this prefix

...the number of pics...removalIndex...

::: toolkit/components/url-classifier/LookupCache.cpp:780
(Diff revision 1)
> +    }
> +  }
> +
> +  if (removalIndex < removalArray.Length()) {
> +    // Still removal left after merging.
> +    NS_WARNING("There is still removal left after applying update.");

Not very clear, maybe: "There are still prefixes to remove after exhausting the old PrefixSet."

::: toolkit/components/url-classifier/LookupCache.cpp:809
(Diff revision 1)
> +V4PrefixSet::Merge(PrefixStringMap& aPrefixMap) {
> +  for (auto iter = mMap.ConstIter(); !iter.Done(); iter.Next()) {
> +    nsCString* prefixString = aPrefixMap.LookupOrAdd(iter.Key());
> +    PrefixString* str = iter.Data();
> +
> +    if (str->get()) {

What's the case where the string would be empty? Do we want to WARN/ASSERT here?

::: toolkit/components/url-classifier/tests/gtest/TestUrlClassifierTableUpdateV4.cpp:160
(Diff revision 1)
>    file->AppendRelativeNativePath(NS_LITERAL_CSTRING("safebrowsing/gtest-malware-proto.pset"));
>    file->Remove(false);
>  }
>  
>  static void
> +testUpdateFail(nsTArray<TableUpdate*>& tableUpdates)

This should preferably test whether the database has the original contents after the update fails, i.e. make sure our updates are transactional.

::: toolkit/components/url-classifier/tests/gtest/TestUrlClassifierTableUpdateV4.cpp:329
(Diff revision 1)
> +  }
> +
> +  Clear();
> +}
> +
> +// Expecet failure because partial update contains prefix already

...Expect...

::: toolkit/components/url-classifier/tests/gtest/TestUrlClassifierTableUpdateV4.cpp:330
(Diff revision 1)
> +
> +  Clear();
> +}
> +
> +// Expecet failure because partial update contains prefix already
> +// in old prefix set.

You can also add a "failure test" for removal indices that are larger than the length of the original prefix set.
Comment on attachment 8792452 [details]
Bug 1287058 - Supports SafeBrowsing v4 partial update.

https://reviewboard.mozilla.org/r/79508/#review78514

Looks good but I have some comments/questions and I've like to re-review before landing.
Attachment #8792452 - Flags: review?(gpascutto) → review-
Comment on attachment 8792452 [details]
Bug 1287058 - Supports SafeBrowsing v4 partial update.

https://reviewboard.mozilla.org/r/79508/#review78504

> These will always point to a valid object, so you might want to use references instead. The code below will become cleaner.

I'll get following error when using refernece
error: object of type 'nsClassHashtable<nsUint32HashKey, nsCString>' cannot be assigned because its copy assignment operator is implicitly deleted

And i think' reference should not be reassigned so i use pointer here.

> ...the difference is...prefixes from updates...additional copies...
> 
> ...for the partial update algorithm...to make it easier to...
> 
> The std::string is there because that's the output from the Protobuff library?

Yes, we use std::string because its the output of protocol buffer

> This should probably go into its own file? Maybe together with all LookupCacheV4 code?

Agree, i will separate it

> PrefixStringMap
> vs
> TableUpdateV4::PrefixesStringMap
> 
> is very confusing...what's the difference? Is the latter with std::string? Then maybe name it PrefixStdStringMap. Together with the comment above noting the existence of a std::string map, it's much more obvious that's what it is.

Yes TableUpdateV4::PrefixesStringMap use std::string. I'll rename according to your suggestion, thanks!

> I think there's a hidden assumpion here that we won't receive a "partial update" on an empty prefixset, i.e. that smallestAddPrefix isn't (still) empty here?
> 
> I saw later on a test actually excercises this, so AppendPrefixToMap is a no-op if smallestAddPrefix is empty?

Yes, if smallestAddPrefix is empty, then AppendPrefixToMap will just return without doing anything.

> What's the case where the string would be empty? Do we want to WARN/ASSERT here?

The string would be empty if it is already merged.
So we just skip it here

> This should preferably test whether the database has the original contents after the update fails, i.e. make sure our updates are transactional.

Currently we will reset database and in-memory data when update fail[1], should we change that in v4 ?
If you think we shouldn't reset when update fail then i'll file another bug for it.

[1] https://dxr.mozilla.org/mozilla-central/source/toolkit/components/url-classifier/Classifier.cpp#546
(In reply to Dimi Lee[:dimi][:dlee] from comment #16)

> And i think' reference should not be reassigned so i use pointer here.

Ah right :(

> > This should preferably test whether the database has the original contents after the update fails, i.e. make sure our updates are transactional.
> 
> Currently we will reset database and in-memory data when update fail[1],

Well, not always. OOM errors won't lead to a database reset. Firefox crashes or power failures in the middle of an update won't either. Conversely, when the update data can't be reconciled, we reset the database so we can recover eventually.

If we can test either or both that would be nice.
(In reply to Gian-Carlo Pascutto [:gcp] from comment #17)
> 
> Well, not always. OOM errors won't lead to a database reset. Firefox crashes
> or power failures in the middle of an update won't either. Conversely, when
> the update data can't be reconciled, we reset the database so we can recover
> eventually.
> 
> If we can test either or both that would be nice.

oh, sorry just saw your comment after asking for review, should i ask for review after this testcase is finished ? BTW, do you have any idea how to simulate those failures you talked about in gtest ?
>oh, sorry just saw your comment after asking for review, should i ask for review after this testcase is finished ? BTW, do you have any idea how to simulate those failures you talked about in gtest ?

No, we can add more tests in followups, I'll review what you have.

Simulating the "keep the DB" failures doesn't look easy. You want to bail out in Classifier::ApplyUpdates halfway, but the only way that happens is on OOM. Testing that the database gets cleared on a failure is feasible with what you already have.
Comment on attachment 8792452 [details]
Bug 1287058 - Supports SafeBrowsing v4 partial update.

https://reviewboard.mozilla.org/r/79508/#review79412

::: toolkit/components/url-classifier/LookupCacheV4.cpp:39
(Diff revision 2)
> +  // Find the smallest string from the map in V4PrefixSet.
> +  bool GetSmallestPrefix(nsDependentCSubstring& aOutString);
> +
> +private:
> +  // PrefixString structure contains a lexicographic-sorted string with
> +  // a |pos| variable to indicate which substring we are pointing right now.

pointing to

::: toolkit/components/url-classifier/LookupCacheV4.cpp:139
(Diff revision 2)
> +                                  PrefixStringMap& aOutputMap)
> +{
> +  MOZ_ASSERT(aOutputMap.IsEmpty());
> +
> +  // oldPSet contains prefixes we already have or we just merged last round.
> +  // addPSet contains prefixes store in tableUpdate and should be merged with oldPSet.

stored in...which should be merged

::: toolkit/components/url-classifier/LookupCacheV4.cpp:143
(Diff revision 2)
> +  // oldPSet contains prefixes we already have or we just merged last round.
> +  // addPSet contains prefixes store in tableUpdate and should be merged with oldPSet.
> +  V4PrefixSet oldPSet(aInputMap);
> +  V4PrefixSet addPSet(aTableUpdate->Prefixes());
> +
> +  // RemovalIndiceArray is a sorted integer array indicates the index of prefix we should

indicating

::: toolkit/components/url-classifier/LookupCacheV4.cpp:157
(Diff revision 2)
> +  nsDependentCSubstring smallestAddPrefix;
> +
> +  while(true) {
> +    // Get smallest prefix from old prefix set if we don't have one
> +    if (smallestOldPrefix.IsEmpty()) {
> +      // If prefixes from old prefix set are all merged,

from the old prefix set...the entire add prefix set

::: toolkit/components/url-classifier/LookupCacheV4.cpp:168
(Diff revision 2)
> +      }
> +    }
> +
> +    // Get smallest prefix from add prefix set if we don't have one
> +    if (smallestAddPrefix.IsEmpty()) {
> +      // If add prefixes are all merged and there is no removalIndices left.

...left, then merge the entire...

::: toolkit/components/url-classifier/LookupCacheV4.cpp:169
(Diff revision 2)
> +    }
> +
> +    // Get smallest prefix from add prefix set if we don't have one
> +    if (smallestAddPrefix.IsEmpty()) {
> +      // If add prefixes are all merged and there is no removalIndices left.
> +      // Then merge entire old prefix set directly. If there is still

...if there are still...

::: toolkit/components/url-classifier/LookupCacheV4.cpp:180
(Diff revision 2)
> +        oldPSet.Merge(aOutputMap);
> +        break;
> +      }
> +    }
> +
> +    // Compare smallest string in old prefix set and add prefix set, merge the

Compare the smallest strings in the...
Attachment #8792452 - Flags: review?(gpascutto) → review+
Attachment #8790633 - Attachment is obsolete: true
Attachment #8789183 - Attachment is obsolete: true
(In reply to Gian-Carlo Pascutto [:gcp] from comment #21)
> Comment on attachment 8792452 [details]
> Bug 1287058 - Supports SafeBrowsing v4 partial update.
> 
Thanks for review!
I am sorry about having so many grammar errors in the comments, i'll try to improve that in the future :)
Comment on attachment 8792452 [details]
Bug 1287058 - Supports SafeBrowsing v4 partial update.

https://reviewboard.mozilla.org/r/79508/#review79854

As per our discussion, here are a few suggestions:

- Using "VLPrefixSet" instead of "V4PrefixSet".
- Exporting the Google Docs Presentation to a PDF and attaching it to the bug so that it can be found easily later. Maybe link to it from the algorithm in the code too.
- In the case of a prefix being in the add list and also in the original prefix set, we may want to issue a warning and to ignore it instead of returning an error. Perhaps with telemetry as well.
- Replacing "while(true)" in the merging code with a bounded for loop (to eliminate any possibility of an infinite loop). Telemetry for this could be useful too.
(In reply to François Marier [:francois] from comment #25)
> Comment on attachment 8792452 [details]
> Bug 1287058 - Supports SafeBrowsing v4 partial update.
> 
> https://reviewboard.mozilla.org/r/79508/#review79854
> 
> As per our discussion, here are a few suggestions:
> 

> - In the case of a prefix being in the add list and also in the original
> prefix set, we may want to issue a warning and to ignore it instead of
> returning an error. Perhaps with telemetry as well.

As discussed we will keep current behavior that just return an error because the algorithm itself is hard to handle the case if an add prefix is already in the original prefix.(Chromium code also does the same thing).
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → DUPLICATE
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: