1287058 - Supports SafeBrowsing v4 partial update

(In reply to Dimi Lee[:dimi][:dlee] from comment #2) > Hi Henry, > Do you mind me taking this bug ? I am working on stuff related to this. Sure! To be honest, at the time of filing this bug, I haven't decided what/how to do with this feature. After Bug 1287059 is fixed, the states would be stored to preferences. Afterward, listmanager would fetch the states from prefs and build the request. I would expect this bug to have two patches: 1) A API for ProtocolParserProtocolbuf to save states and another API for listmanager to get the saved states. 2) ProtocolParserProtobuf/listmanager to use the APIs in (1) to set/get states I can help doing (2) once the APIs are ready. Does it make sense?

Dimi Lee [:dimi]

Assignee

Comment 4

•

8 years ago

(In reply to Henry Chang [:henry][:hchang] from comment #3) > (In reply to Dimi Lee[:dimi][:dlee] from comment #2) > I would expect this bug to have two patches: > > 1) A API for ProtocolParserProtocolbuf to save states and > another API for listmanager to get the saved states. > > 2) ProtocolParserProtobuf/listmanager to use the APIs in (1) > to set/get states > > I can help doing (2) once the APIs are ready. Does it make sense? I am going to work on 3) PrefixSet to support index removal and merge new prefixes So if you are ok i guess i'll file another bug for (3) and you can work on (1)(2) in this bug?

Dimi Lee [:dimi]

Assignee

Comment 5

•

8 years ago

Offline discussed with henry, we will keep state in preference for now. So this bug will only implement "Merge and remove" prefixes.

Assignee: hchang → dlee

Status: NEW → ASSIGNED

Dimi Lee [:dimi]

Assignee

Comment 6

•

8 years ago

Attached patch WIP Patch (obsolete) — Details — Splinter Review

Dimi Lee [:dimi]

Assignee

Comment 7

•

8 years ago

Attached patch P1. Support partial update (obsolete) — Details — Splinter Review

Attachment #8788319 - Attachment is obsolete: true

Attachment #8789183 - Flags: review?(hchang)

Dimi Lee [:dimi]

Assignee

Comment 8

•

8 years ago

Attached patch P2. Testcase (obsolete) — Details — Splinter Review

Test case still require some time to refine. I will upload a better version later for review.

Henry Chang [:hchang]

Reporter

Comment 9

•

8 years ago

Comment on attachment 8789183 [details] [diff] [review] P1. Support partial update Review of attachment 8789183 [details] [diff] [review]: ----------------------------------------------------------------- This is definitely an awesome patch for partial update! Great job! I don't know who would be the next reviewer but maybe an illustration for LookupCacheV4::ApplyPartialUpdate will accelerate the following review pretty much. Thanks! ::: toolkit/components/url-classifier/LookupCache.cpp @@ +572,5 @@ > +// The different is prefixes from update use std:string(to avoid additional copy) > +// and prefixes from VLPrefixSet use nsCString. > +// This class provides a common interface for partial update algorithm to operate > +// on two different kind prefix string map easier. > +class V4PrefixSet I like the idea wrapping PrefixStringMap and TableUpdateV4::PrefixesStringMap to one class. I would feel it deserves a better name than V4PrefixSet. V4PrefixSet is a little confusing to me but I can't figure out a better name actually. So maybe just leave it there until we find a better one :p @@ +726,5 @@ > + oldPSet.Merge(aOutputMap); > + break; > + } > + } > + From what I understand, there is no need to deal with the above two special cases. If one of the prefix maps is empty, the main routine would still be applicable. The reason you specifically handle the special cases is obtaining the smallest prefix from a prefix map is not trivial (because the enumerated key is not guaranteed ordered). Do I understand correctly?

Attachment #8789183 - Flags: review?(hchang) → feedback+

Dimi Lee [:dimi]

Assignee

Comment 10

•

8 years ago

Thanks for your review! Henry. (In reply to Henry Chang [:henry][:hchang] from comment #9) > > I don't know who would be the next reviewer but maybe an illustration for > LookupCacheV4::ApplyPartialUpdate will accelerate the following review > pretty much. > > Thanks! Good idea, i will try to write one. > > ::: toolkit/components/url-classifier/LookupCache.cpp > @@ +572,5 @@ > I like the idea wrapping PrefixStringMap and > TableUpdateV4::PrefixesStringMap to one class. I would feel it deserves a > better name than V4PrefixSet. V4PrefixSet is a little confusing to me but I > can't figure out a better name actually. So maybe just leave it there until > we find a better one :p > I agree, that naming is really bad :( I'll see if i can come out a better one, hopefully. > @@ +726,5 @@ > > + oldPSet.Merge(aOutputMap); > > + break; > > + } > > + } > > + > > From what I understand, there is no need to deal with the above two special > cases. If one of the prefix maps is empty, the main routine would still be > applicable. The reason you specifically handle the special cases is > obtaining the smallest prefix from a prefix map is not trivial (because the > enumerated key is not guaranteed ordered). Do I understand correctly? Yes, the main purpose here is to speed up. Once one map is empty then we can merge it without comparing the smallest one in current map.

Dimi Lee [:dimi]

Assignee

Comment 11

•

8 years ago

Attached patch P2. Testcase (obsolete) — Details — Splinter Review

Hi Henry, This should be the same testcase file you have already reviewed in bug 1283009. I add testcases for partial update in this patch, but i also changes lots of things due to partial update. So i think you'll have to check it again, sorry for the inconvenience.

Attachment #8789184 - Attachment is obsolete: true

Attachment #8790633 - Flags: feedback?(hchang)

Dimi Lee [:dimi]

Assignee

Updated

•

8 years ago

Attachment #8790633 - Flags: feedback?(hchang)

Comment hidden (mozreview-request)

Dimi Lee [:dimi]

Assignee

Comment 13

•

8 years ago

Hi gcp, This patch is based on bug 1283009 and only focus on partial update. Could you help review this ? I made a very simple slide to explain the basic idea of partial update: https://docs.google.com/a/mozilla.com/presentation/d/1DxmTh_7SBG40NWpcm1jQYMRy2C3Uq1ZJ97DhAcS5A0E/edit?usp=sharing Please let me know if there is any question, thanks!

Gian-Carlo Pascutto [:gcp]

Comment 14

•

8 years ago

mozreview-review

Comment on attachment 8792452 [details] Bug 1287058 - Supports SafeBrowsing v4 partial update. https://reviewboard.mozilla.org/r/79508/#review78504 ::: toolkit/components/url-classifier/Classifier.cpp:955 (Diff revision 1) > + > + // prefixes2 is only used in partial update. If there are multiple > + // updates for the same table, prefixes1 & prefixes2 will act as > + // input and output in turn to reduce memory copy overhead. > + PrefixStringMap prefixes1, prefixes2; > + PrefixStringMap* output = &prefixes1; These will always point to a valid object, so you might want to use references instead. The code below will become cleaner. ::: toolkit/components/url-classifier/Classifier.cpp:979 (Diff revision 1) > - prefixes.Put(iter.Key(), prefix); > + output->Put(iter.Key(), prefix); > } > } else { > - // TODO: Bug 1287058, partial update > + PrefixStringMap* input = nullptr; > + // If both prefix set is empty, this means we are doing a partial update > + // without a prior full/partiual update in the loop. In this case we should ...partial update...from the lookup cache... ::: toolkit/components/url-classifier/LookupCache.cpp:591 (Diff revision 1) > LOG(("Update: %s", str.get())); > } > } > #endif > > +// Prefixes come from update and VLPrefixSet are both stored in the HashTable ...coming from...updates... ::: toolkit/components/url-classifier/LookupCache.cpp:592 (Diff revision 1) > } > } > #endif > > +// Prefixes come from update and VLPrefixSet are both stored in the HashTable > +// which key, value pair is prefix size and lexicographic-sorted string. ...where they (key, value) pair is a...and a... ::: toolkit/components/url-classifier/LookupCache.cpp:593 (Diff revision 1) > } > #endif > > +// Prefixes come from update and VLPrefixSet are both stored in the HashTable > +// which key, value pair is prefix size and lexicographic-sorted string. > +// The different is prefixes from update use std:string(to avoid additional copy) ...the difference is...prefixes from updates...additional copies... ...for the partial update algorithm...to make it easier to... The std::string is there because that's the output from the Protobuff library? ::: toolkit/components/url-classifier/LookupCache.cpp:597 (Diff revision 1) > +// which key, value pair is prefix size and lexicographic-sorted string. > +// The different is prefixes from update use std:string(to avoid additional copy) > +// and prefixes from VLPrefixSet use nsCString. > +// This class provides a common interface for partial update algorithm to operate > +// on two different kind prefix string map easier. > +class V4PrefixSet This should probably go into its own file? Maybe together with all LookupCacheV4 code? ::: toolkit/components/url-classifier/LookupCache.cpp:601 (Diff revision 1) > +// on two different kind prefix string map easier. > +class V4PrefixSet > +{ > +public: > + V4PrefixSet(const PrefixStringMap& aMap); > + V4PrefixSet(const TableUpdateV4::PrefixesStringMap& aMap); PrefixStringMap vs TableUpdateV4::PrefixesStringMap is very confusing...what's the difference? Is the latter with std::string? Then maybe name it PrefixStdStringMap. Together with the comment above noting the existence of a std::string map, it's much more obvious that's what it is. ::: toolkit/components/url-classifier/LookupCache.cpp:718 (Diff revision 1) > + V4PrefixSet addPSet(aTableUpdate->Prefixes()); > + > + // RemovalIndiceArray is a sorted integer array indicates the index of prefix we should > + // remove from the old prefix set(according to lexigraphic order). > + // |removalIndex| is the current index of RemovalIndiceArray. > + // |numOldPrefixPicked| is used to record how many prefixes we are picked from old map ...we picked from the old map. ::: toolkit/components/url-classifier/LookupCache.cpp:732 (Diff revision 1) > + // Get smallest prefix from old prefix set if we don't have one > + if (smallestOldPrefix.IsEmpty()) { > + // If prefixes from old prefix set are all merged, > + // then we can merge entire add prefix set directly. > + if (!oldPSet.GetSmallestPrefix(smallestOldPrefix)) { > + AppendPrefixToMap(aOutputMap, smallestAddPrefix); I think there's a hidden assumpion here that we won't receive a "partial update" on an empty prefixset, i.e. that smallestAddPrefix isn't (still) empty here? I saw later on a test actually excercises this, so AppendPrefixToMap is a no-op if smallestAddPrefix is empty? ::: toolkit/components/url-classifier/LookupCache.cpp:759 (Diff revision 1) > + // lexigraphic order. > + if (smallestOldPrefix < smallestAddPrefix || > + smallestAddPrefix.IsEmpty()) { > + numOldPrefixPicked++; > + > + // If number of picks from old map matches the removalIndice, then this prefix ...the number of pics...removalIndex... ::: toolkit/components/url-classifier/LookupCache.cpp:780 (Diff revision 1) > + } > + } > + > + if (removalIndex < removalArray.Length()) { > + // Still removal left after merging. > + NS_WARNING("There is still removal left after applying update."); Not very clear, maybe: "There are still prefixes to remove after exhausting the old PrefixSet." ::: toolkit/components/url-classifier/LookupCache.cpp:809 (Diff revision 1) > +V4PrefixSet::Merge(PrefixStringMap& aPrefixMap) { > + for (auto iter = mMap.ConstIter(); !iter.Done(); iter.Next()) { > + nsCString* prefixString = aPrefixMap.LookupOrAdd(iter.Key()); > + PrefixString* str = iter.Data(); > + > + if (str->get()) { What's the case where the string would be empty? Do we want to WARN/ASSERT here? ::: toolkit/components/url-classifier/tests/gtest/TestUrlClassifierTableUpdateV4.cpp:160 (Diff revision 1) > file->AppendRelativeNativePath(NS_LITERAL_CSTRING("safebrowsing/gtest-malware-proto.pset")); > file->Remove(false); > } > > static void > +testUpdateFail(nsTArray<TableUpdate*>& tableUpdates) This should preferably test whether the database has the original contents after the update fails, i.e. make sure our updates are transactional. ::: toolkit/components/url-classifier/tests/gtest/TestUrlClassifierTableUpdateV4.cpp:329 (Diff revision 1) > + } > + > + Clear(); > +} > + > +// Expecet failure because partial update contains prefix already ...Expect... ::: toolkit/components/url-classifier/tests/gtest/TestUrlClassifierTableUpdateV4.cpp:330 (Diff revision 1) > + > + Clear(); > +} > + > +// Expecet failure because partial update contains prefix already > +// in old prefix set. You can also add a "failure test" for removal indices that are larger than the length of the original prefix set.

Gian-Carlo Pascutto [:gcp]

Comment 15

•

8 years ago

mozreview-review

Comment on attachment 8792452 [details] Bug 1287058 - Supports SafeBrowsing v4 partial update. https://reviewboard.mozilla.org/r/79508/#review78514 Looks good but I have some comments/questions and I've like to re-review before landing.

Attachment #8792452 - Flags: review?(gpascutto) → review-

Dimi Lee [:dimi]

Assignee

Comment 16

•

8 years ago

mozreview-review-reply

Comment on attachment 8792452 [details] Bug 1287058 - Supports SafeBrowsing v4 partial update. https://reviewboard.mozilla.org/r/79508/#review78504 > These will always point to a valid object, so you might want to use references instead. The code below will become cleaner. I'll get following error when using refernece error: object of type 'nsClassHashtable<nsUint32HashKey, nsCString>' cannot be assigned because its copy assignment operator is implicitly deleted And i think' reference should not be reassigned so i use pointer here. > ...the difference is...prefixes from updates...additional copies... > > ...for the partial update algorithm...to make it easier to... > > The std::string is there because that's the output from the Protobuff library? Yes, we use std::string because its the output of protocol buffer > This should probably go into its own file? Maybe together with all LookupCacheV4 code? Agree, i will separate it > PrefixStringMap > vs > TableUpdateV4::PrefixesStringMap > > is very confusing...what's the difference? Is the latter with std::string? Then maybe name it PrefixStdStringMap. Together with the comment above noting the existence of a std::string map, it's much more obvious that's what it is. Yes TableUpdateV4::PrefixesStringMap use std::string. I'll rename according to your suggestion, thanks! > I think there's a hidden assumpion here that we won't receive a "partial update" on an empty prefixset, i.e. that smallestAddPrefix isn't (still) empty here? > > I saw later on a test actually excercises this, so AppendPrefixToMap is a no-op if smallestAddPrefix is empty? Yes, if smallestAddPrefix is empty, then AppendPrefixToMap will just return without doing anything. > What's the case where the string would be empty? Do we want to WARN/ASSERT here? The string would be empty if it is already merged. So we just skip it here > This should preferably test whether the database has the original contents after the update fails, i.e. make sure our updates are transactional. Currently we will reset database and in-memory data when update fail[1], should we change that in v4 ? If you think we shouldn't reset when update fail then i'll file another bug for it. [1] https://dxr.mozilla.org/mozilla-central/source/toolkit/components/url-classifier/Classifier.cpp#546

Gian-Carlo Pascutto [:gcp]

Comment 17

•

8 years ago

(In reply to Dimi Lee[:dimi][:dlee] from comment #16) > And i think' reference should not be reassigned so i use pointer here. Ah right :( > > This should preferably test whether the database has the original contents after the update fails, i.e. make sure our updates are transactional. > > Currently we will reset database and in-memory data when update fail[1], Well, not always. OOM errors won't lead to a database reset. Firefox crashes or power failures in the middle of an update won't either. Conversely, when the update data can't be reconciled, we reset the database so we can recover eventually. If we can test either or both that would be nice.

Comment hidden (mozreview-request)

Dimi Lee [:dimi]

Assignee

Comment 19

•

8 years ago

(In reply to Gian-Carlo Pascutto [:gcp] from comment #17) > > Well, not always. OOM errors won't lead to a database reset. Firefox crashes > or power failures in the middle of an update won't either. Conversely, when > the update data can't be reconciled, we reset the database so we can recover > eventually. > > If we can test either or both that would be nice. oh, sorry just saw your comment after asking for review, should i ask for review after this testcase is finished ? BTW, do you have any idea how to simulate those failures you talked about in gtest ?

Gian-Carlo Pascutto [:gcp]

Comment 20

•

8 years ago

>oh, sorry just saw your comment after asking for review, should i ask for review after this testcase is finished ? BTW, do you have any idea how to simulate those failures you talked about in gtest ? No, we can add more tests in followups, I'll review what you have. Simulating the "keep the DB" failures doesn't look easy. You want to bail out in Classifier::ApplyUpdates halfway, but the only way that happens is on OOM. Testing that the database gets cleared on a failure is feasible with what you already have.

Gian-Carlo Pascutto [:gcp]

Comment 21

•

8 years ago

mozreview-review

Comment on attachment 8792452 [details] Bug 1287058 - Supports SafeBrowsing v4 partial update. https://reviewboard.mozilla.org/r/79508/#review79412 ::: toolkit/components/url-classifier/LookupCacheV4.cpp:39 (Diff revision 2) > + // Find the smallest string from the map in V4PrefixSet. > + bool GetSmallestPrefix(nsDependentCSubstring& aOutString); > + > +private: > + // PrefixString structure contains a lexicographic-sorted string with > + // a |pos| variable to indicate which substring we are pointing right now. pointing to ::: toolkit/components/url-classifier/LookupCacheV4.cpp:139 (Diff revision 2) > + PrefixStringMap& aOutputMap) > +{ > + MOZ_ASSERT(aOutputMap.IsEmpty()); > + > + // oldPSet contains prefixes we already have or we just merged last round. > + // addPSet contains prefixes store in tableUpdate and should be merged with oldPSet. stored in...which should be merged ::: toolkit/components/url-classifier/LookupCacheV4.cpp:143 (Diff revision 2) > + // oldPSet contains prefixes we already have or we just merged last round. > + // addPSet contains prefixes store in tableUpdate and should be merged with oldPSet. > + V4PrefixSet oldPSet(aInputMap); > + V4PrefixSet addPSet(aTableUpdate->Prefixes()); > + > + // RemovalIndiceArray is a sorted integer array indicates the index of prefix we should indicating ::: toolkit/components/url-classifier/LookupCacheV4.cpp:157 (Diff revision 2) > + nsDependentCSubstring smallestAddPrefix; > + > + while(true) { > + // Get smallest prefix from old prefix set if we don't have one > + if (smallestOldPrefix.IsEmpty()) { > + // If prefixes from old prefix set are all merged, from the old prefix set...the entire add prefix set ::: toolkit/components/url-classifier/LookupCacheV4.cpp:168 (Diff revision 2) > + } > + } > + > + // Get smallest prefix from add prefix set if we don't have one > + if (smallestAddPrefix.IsEmpty()) { > + // If add prefixes are all merged and there is no removalIndices left. ...left, then merge the entire... ::: toolkit/components/url-classifier/LookupCacheV4.cpp:169 (Diff revision 2) > + } > + > + // Get smallest prefix from add prefix set if we don't have one > + if (smallestAddPrefix.IsEmpty()) { > + // If add prefixes are all merged and there is no removalIndices left. > + // Then merge entire old prefix set directly. If there is still ...if there are still... ::: toolkit/components/url-classifier/LookupCacheV4.cpp:180 (Diff revision 2) > + oldPSet.Merge(aOutputMap); > + break; > + } > + } > + > + // Compare smallest string in old prefix set and add prefix set, merge the Compare the smallest strings in the...

Attachment #8792452 - Flags: review?(gpascutto) → review+

Dimi Lee [:dimi]

Assignee

Updated

•

8 years ago

Attachment #8790633 - Attachment is obsolete: true

Dimi Lee [:dimi]

Assignee

Updated

•

8 years ago

Attachment #8789183 - Attachment is obsolete: true

Dimi Lee [:dimi]

Assignee

Comment 22

•

8 years ago

(In reply to Gian-Carlo Pascutto [:gcp] from comment #21) > Comment on attachment 8792452 [details] > Bug 1287058 - Supports SafeBrowsing v4 partial update. > Thanks for review! I am sorry about having so many grammar errors in the comments, i'll try to improve that in the future :)

Comment hidden (mozreview-request)

François Marier [:francois]

Updated

•

8 years ago

Blocks: 1305486

François Marier [:francois]

Comment 25

•

8 years ago

mozreview-review

Comment on attachment 8792452 [details] Bug 1287058 - Supports SafeBrowsing v4 partial update. https://reviewboard.mozilla.org/r/79508/#review79854 As per our discussion, here are a few suggestions: - Using "VLPrefixSet" instead of "V4PrefixSet". - Exporting the Google Docs Presentation to a PDF and attaching it to the bug so that it can be found easily later. Maybe link to it from the algorithm in the code too. - In the case of a prefix being in the add list and also in the original prefix set, we may want to issue a warning and to ignore it instead of returning an error. Perhaps with telemetry as well. - Replacing "while(true)" in the merging code with a bounded for loop (to eliminate any possibility of an infinite loop). Telemetry for this could be useful too.

Dimi Lee [:dimi]

Assignee

Comment 26

•

8 years ago

Attached file Partial Update Algorithm.pdf — Details

Dimi Lee [:dimi]

Assignee

Comment 27

•

8 years ago

(In reply to François Marier [:francois] from comment #25) > Comment on attachment 8792452 [details] > Bug 1287058 - Supports SafeBrowsing v4 partial update. > > https://reviewboard.mozilla.org/r/79508/#review79854 > > As per our discussion, here are a few suggestions: > > - In the case of a prefix being in the add list and also in the original > prefix set, we may want to issue a warning and to ignore it instead of > returning an error. Perhaps with telemetry as well. As discussed we will keep current behavior that just return an error because the algorithm itself is hard to handle the case if an add prefix is already in the original prefix.(Chromium code also does the same thing).

Dimi Lee [:dimi]

Assignee

Updated

•

8 years ago

Status: ASSIGNED → RESOLVED

Closed: 8 years ago

Resolution: --- → DUPLICATE

WIP Patch 8 years ago Dimi Lee [:dimi] 21.28 KB, patch		Details \| Diff \| Splinter Review
P1. Support partial update 8 years ago Dimi Lee [:dimi] 12.82 KB, patch	hchang : feedback+	Details \| Diff \| Splinter Review
P2. Testcase 8 years ago Dimi Lee [:dimi] 17.92 KB, patch		Details \| Diff \| Splinter Review
P2. Testcase 8 years ago Dimi Lee [:dimi] 19.62 KB, patch		Details \| Diff \| Splinter Review
Bug 1287058 - Supports SafeBrowsing v4 partial update. 8 years ago Dimi Lee [:dimi] 58 bytes, text/x-review-board-request	gcp : review+	Details
Partial Update Algorithm.pdf 8 years ago Dimi Lee [:dimi] 92.69 KB, application/pdf		Details