Closed
Bug 1254763
Opened 9 years ago
Closed 9 years ago
Split Safe Browsing directory in per-provider sub-directories for V4 providers
Categories
(Toolkit :: Safe Browsing, defect, P2)
Toolkit
Safe Browsing
Tracking
()
RESOLVED
FIXED
mozilla51
Tracking | Status | |
---|---|---|
firefox51 | --- | fixed |
People
(Reporter: francois, Assigned: hchang)
References
Details
(Whiteboard: #sbv4-m1)
Attachments
(2 files)
gcp suggested the following in order to reduce churn from atomic updates.
Reporter | ||
Updated•9 years ago
|
Assignee: nobody → hchang
Status: NEW → ASSIGNED
Assignee | ||
Comment 1•9 years ago
|
||
Hi gcp,
I am taking this bug and would like to know how much performance improvement is it supposed to get. Changing the location is equivalent to updating a database schema so that the migration needs to be taken into account. Otherwise, the updating the browser would trigger a database re-download.
Thanks :)
Flags: needinfo?(gpascutto)
Comment 2•9 years ago
|
||
See this bug for some background: https://bugzilla.mozilla.org/show_bug.cgi?id=1164518
Normal performance improvement would be fairly little, as updating the Google lists normally hits all tables, and we only update our own providers (Tracking Protection) once a day. This is mostly to reduce the really bad edge cases, and to future proof if we add more list providers or more frequently updated lists of our own.
Migration is a hassle, I think the plan was to do this together with the protocol v4 update, at which point we need a new database download anyway. On the other hand, we really only need to copy some files once to migrate...
Flags: needinfo?(gpascutto)
Assignee | ||
Comment 3•9 years ago
|
||
(In reply to Gian-Carlo Pascutto [:gcp] from comment #2)
> See this bug for some background:
> https://bugzilla.mozilla.org/show_bug.cgi?id=1164518
>
> Normal performance improvement would be fairly little, as updating the
> Google lists normally hits all tables, and we only update our own providers
> (Tracking Protection) once a day. This is mostly to reduce the really bad
> edge cases, and to future proof if we add more list providers or more
> frequently updated lists of our own.
>
> Migration is a hassle, I think the plan was to do this together with the
> protocol v4 update, at which point we need a new database download anyway.
> On the other hand, we really only need to copy some files once to migrate...
Doing this only for v4 protocol seems a good idea to mitigate the migration impact :) Thanks!
Assignee | ||
Updated•9 years ago
|
Priority: -- → P2
Comment 4•9 years ago
|
||
Hi Francois,
As we know, bug 1167038 is used to track the preparation works for SB v4, which should be done before we start to implement v4.
Since we are not going to work on this bug until we start to implement v4, I feel it is more clear to remove the dependency of bug 1167038 on this bug. What do you think?
And we still need to track this bug in some way.
Do we already have a meta bug for implementing SafeBrowsing v4?
Flags: needinfo?(francois)
Reporter | ||
Updated•9 years ago
|
Flags: needinfo?(francois)
Summary: Split Safe Browsing directory in per-provider sub-directories → Split Safe Browsing directory in per-provider sub-directories for V4 providers
Reporter | ||
Comment 5•9 years ago
|
||
(In reply to Ethan Tseng [:ethan] from comment #4)
> As we know, bug 1167038 is used to track the preparation works for SB v4,
> which should be done before we start to implement v4.
> Since we are not going to work on this bug until we start to implement v4, I
> feel it is more clear to remove the dependency of bug 1167038 on this bug.
> What do you think?
You're right, it's getting a little confusing. I've renamed the two dependent bugs that we've decided to do only for V4 (bug 502932 and bug 1254763). Hopefully that makes the status of each bug clearer.
> And we still need to track this bug in some way.
> Do we already have a meta bug for implementing SafeBrowsing v4?
The meta bug for implementing V4 is bug 1167038. It contains the dependencies and the breakdown for that task. There are a few tasks missing (the ones we haven't allocated or started yet), but aside from that do you see anything that's missing or that you'd like organized in a different way?
Comment 6•9 years ago
|
||
(In reply to François Marier [:francois] from comment #5)
> The meta bug for implementing V4 is bug 1167038. It contains the
> dependencies and the breakdown for that task. There are a few tasks missing
> (the ones we haven't allocated or started yet), but aside from that do you
> see anything that's missing or that you'd like organized in a different way?
No. Actually I was confused and misunderstood the purpose of meta bug 1167038.
I thought it was just aimed at the preparation works for SB v4 and guessed that we'll have another meta bug for implementing v4.
Now I understand your plan is to do both (preparation and implementation for v4) in bug 1167038.
Thanks for clarification.
Assignee | ||
Updated•9 years ago
|
Whiteboard: #sbv4-m1
Reporter | ||
Updated•9 years ago
|
No longer blocks: safebrowsingv4
Assignee | ||
Comment 7•9 years ago
|
||
It's time to get started fixing this!
Status: ASSIGNED → UNCONFIRMED
Ever confirmed: false
Assignee | ||
Updated•9 years ago
|
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Comment hidden (mozreview-request) |
Assignee | ||
Updated•9 years ago
|
Attachment #8786595 -
Flags: review?(dlee)
Assignee | ||
Comment 9•9 years ago
|
||
I'll be writing a test case for Classifier::GetPrivateStoreDirectory.
Comment 10•9 years ago
|
||
mozreview-review |
Comment on attachment 8786595 [details]
Bug 1254763 - Part 1: Use per-provider directory for V4 databases.
https://reviewboard.mozilla.org/r/75526/#review73522
Looks good to me, thanks!
::: toolkit/components/url-classifier/Classifier.cpp:115
(Diff revision 1)
> + }
> +
> + nsDependentCSubstring provider = Substring(child, 0, dotPos);
> +
> + providers.PutEntry(provider);
> + }
Can we just use |providers.PutEntry(Substring(child, 0, dotPos))| ?
But both are good to me
::: toolkit/components/url-classifier/Classifier.cpp:166
(Diff revision 1)
> + &aTableName,
> + &providerName] () -> void {
> +
> + nsresult rv = DeriveProviderFromPref(aTableName, providerName);
> + if (NS_FAILED(rv)) {
> + LOG(("No provider found for %s", nsCString(aTableName).get()));
LOG_ENABLED
::: toolkit/components/url-classifier/Classifier.cpp:179
(Diff revision 1)
> + MutexAutoLock lock(mutex); // Prevent from being notified before calling condVar.Wait.
> + NS_DispatchToMainThread(r);
> + condVar.Wait();
> +
> + return providerName;
> +}
Nice approach!
::: toolkit/components/url-classifier/LookupCache.cpp:13
(Diff revision 1)
> #include "nsISeekableStream.h"
> #include "mozilla/Telemetry.h"
> #include "mozilla/Logging.h"
> #include "nsNetUtil.h"
> #include "prprf.h"
> +#include "nsIUrlListManager.h"
Do we need this header file ?
::: toolkit/components/url-classifier/LookupCache.cpp:105
(Diff revision 1)
> +
> + nsString path;
> + mStoreDirectory->GetPath(path);
> + LOG(("Private store directory for %s is %s", mTableName.get(),
> + NS_ConvertUTF16toUTF8(path).get()));
> +
wrap this with LOG_ENABLED
Attachment #8786595 -
Flags: review?(dlee) → review+
Assignee | ||
Comment 11•9 years ago
|
||
Thanks Dimi!
Regarding LOG_ENABLED(), from what I understand, is mainly to avoid
evaluating the arguments of LOG function when logging is not enabled.
If the argument evaluation is not expensive, checking LOG_ENABLED
is not necessary.
(In reply to Dimi Lee[:dimi][:dlee] from comment #10)
> Comment on attachment 8786595 [details]
> Bug 1254763 - Use per-provider directory for V4 databases.
>
> https://reviewboard.mozilla.org/r/75526/#review73522
>
> Looks good to me, thanks!
>
> ::: toolkit/components/url-classifier/Classifier.cpp:115
> (Diff revision 1)
> > + }
> > +
> > + nsDependentCSubstring provider = Substring(child, 0, dotPos);
> > +
> > + providers.PutEntry(provider);
> > + }
>
> Can we just use |providers.PutEntry(Substring(child, 0, dotPos))| ?
> But both are good to me
>
> ::: toolkit/components/url-classifier/Classifier.cpp:166
> (Diff revision 1)
> > + &aTableName,
> > + &providerName] () -> void {
> > +
> > + nsresult rv = DeriveProviderFromPref(aTableName, providerName);
> > + if (NS_FAILED(rv)) {
> > + LOG(("No provider found for %s", nsCString(aTableName).get()));
>
> LOG_ENABLED
>
> ::: toolkit/components/url-classifier/Classifier.cpp:179
> (Diff revision 1)
> > + MutexAutoLock lock(mutex); // Prevent from being notified before calling condVar.Wait.
> > + NS_DispatchToMainThread(r);
> > + condVar.Wait();
> > +
> > + return providerName;
> > +}
>
> Nice approach!
>
> ::: toolkit/components/url-classifier/LookupCache.cpp:13
> (Diff revision 1)
> > #include "nsISeekableStream.h"
> > #include "mozilla/Telemetry.h"
> > #include "mozilla/Logging.h"
> > #include "nsNetUtil.h"
> > #include "prprf.h"
> > +#include "nsIUrlListManager.h"
>
> Do we need this header file ?
>
> ::: toolkit/components/url-classifier/LookupCache.cpp:105
> (Diff revision 1)
> > +
> > + nsString path;
> > + mStoreDirectory->GetPath(path);
> > + LOG(("Private store directory for %s is %s", mTableName.get(),
> > + NS_ConvertUTF16toUTF8(path).get()));
> > +
>
> wrap this with LOG_ENABLED
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Assignee | ||
Updated•9 years ago
|
Attachment #8786595 -
Flags: review?(gpascutto)
Attachment #8787084 -
Flags: review?(gpascutto)
Comment 14•9 years ago
|
||
mozreview-review |
Comment on attachment 8786595 [details]
Bug 1254763 - Part 1: Use per-provider directory for V4 databases.
https://reviewboard.mozilla.org/r/75526/#review73904
This needs either
a) Some explanation in comments on why existing Sync dispatch options aren't suitable
b) Switching to using an existing sync dispatch
c) Reworking to push the provider name from the main thread to the worker thread instead of the other way around. Maybe nsUrlClassifierDBService can just tell Classifier when it's creating that object?
::: toolkit/components/url-classifier/Classifier.cpp:119
(Diff revision 2)
> + providers.PutEntry(provider);
> + }
> + NS_FREE_XPCOM_ALLOCATED_POINTER_ARRAY(childCount, childArray);
> +
> + // Now we have all providers. Check which one owns |aTableName|.
> + // e.g. The owning lists of provider "goolge" is defined in
typo: google
::: toolkit/components/url-classifier/Classifier.cpp:155
(Diff revision 2)
> +GetProviderByTableName(const nsACString& aTableName)
> +{
> + MOZ_ASSERT(!NS_IsMainThread(), "GetProviderByTableName MUST be called "
> + "on non-main thread.");
> +
> + Mutex mutex("LookupCache get provider lock");
I'd use something like LookupCache::GetProviderByTableNameLock, it will make finding the origin easier.
::: toolkit/components/url-classifier/Classifier.cpp:175
(Diff revision 2)
> + MutexAutoLock lock(mutex);
> + condVar.Notify();
> + });
> +
> + MutexAutoLock lock(mutex); // Prevent from being notified before calling condVar.Wait.
> + NS_DispatchToMainThread(r);
This looks like it's just reimplementing NS_DispatchToMainThread(event, NS_DISPATCH_SYNC); or reinventing SyncRunnable.
::: toolkit/components/url-classifier/Classifier.cpp:176
(Diff revision 2)
> + condVar.Notify();
> + });
> +
> + MutexAutoLock lock(mutex); // Prevent from being notified before calling condVar.Wait.
> + NS_DispatchToMainThread(r);
> + condVar.Wait();
This will fail with spurious wakeups. You should check the predicate that the Wait() is protecting. (In this case, DeriveProviderFromPref having run)
::: toolkit/components/url-classifier/Classifier.cpp:204
(Diff revision 2)
> + return NS_OK;
> + }
> +
> + nsCOMPtr<nsIFile> providerDirectory;
> +
> + // Clone first since we are gonna create new directory.
typo: create a new directory
::: toolkit/components/url-classifier/LookupCache.cpp:98
(Diff revision 2)
> + if (NS_FAILED(rv)) {
> + LOG(("Failed to get private store directory for %s", mTableName.get()));
> + mStoreDirectory = mRootStoreDirectory;
> + }
> +
> + nsString path;
Surround with MOZ_LOG_TEST.
Attachment #8786595 -
Flags: review?(gpascutto) → review-
Comment 15•9 years ago
|
||
mozreview-review |
Comment on attachment 8787084 [details]
Bug 1254763 - Part 2: Test cases.
https://reviewboard.mozilla.org/r/75942/#review73916
::: toolkit/components/url-classifier/HashStore.h:296
(Diff revision 1)
> SubCompleteArray mSubCompletes;
>
> uint32_t mFileSize;
> +
> + // For gtest to inspect private members.
> + friend class PerProviderDirectoryTestUtils;
I'm not sure declaring a friend class but then not linking it in (as in non-gtest builds) is legal C++, but I'm not enough versed in language lawyering to tell from the spec :-)
Comment 16•9 years ago
|
||
mozreview-review |
Comment on attachment 8787084 [details]
Bug 1254763 - Part 2: Test cases.
https://reviewboard.mozilla.org/r/75942/#review73918
Attachment #8787084 -
Flags: review?(gpascutto) → review+
Assignee | ||
Comment 17•9 years ago
|
||
Hi gcp,
Could you please help review these two patches?
The first patch is to store V4 databases to per-provider subdirectory. The rule is
0) If the table name does not end with '-proto', store the databases as before.
1) If the table name ends with '-proto', we derive the provider from prefs and
store databases to [root-store-path]/[provider]
2) If the table name ends with '-proto' but not owned by any provider, store the
database in [root-store-path]
The above rule is tested in the second patch, the test case.
The actual approach in the patch is to re-define the original store path to
"root store directory" (in |Classifier|) and add new concept "private store directory"
to |LookupCache| and |HashStore|:
Classifier::mStoreDirectory ==> renamed to Classifier::mRootStoreDirectory
HashStore::mStoreDirectory ==> indicates the "private store directory" now
LookupCache::mStoreDirectory ==> indicates the "private store directory" now
Thanks!
Comment 18•9 years ago
|
||
(In reply to Henry Chang [:henry][:hchang] from comment #17)
> Hi gcp,
>
> Could you please help review these two patches?
I just did?
Assignee | ||
Comment 19•9 years ago
|
||
mozreview-review-reply |
Comment on attachment 8786595 [details]
Bug 1254763 - Part 1: Use per-provider directory for V4 databases.
https://reviewboard.mozilla.org/r/75526/#review73904
I'd go for (b) since (c) requires a long chain bubbling the provider to HashStore/LookupCache. Thanks :)
> This looks like it's just reimplementing NS_DispatchToMainThread(event, NS_DISPATCH_SYNC); or reinventing SyncRunnable.
I have been looking for something like SyncRunnable and there it is! Thanks!
> This will fail with spurious wakeups. You should check the predicate that the Wait() is protecting. (In this case, DeriveProviderFromPref having run)
Thanks the "spurious wakeup" lesson again :) I'll change to using SyncRunnable so the mutex/cond would be entirely swiped out from my code.
Assignee | ||
Comment 20•9 years ago
|
||
mozreview-review-reply |
Comment on attachment 8787084 [details]
Bug 1254763 - Part 2: Test cases.
https://reviewboard.mozilla.org/r/75942/#review73916
Thanks for your quick review!
> I'm not sure declaring a friend class but then not linking it in (as in non-gtest builds) is legal C++, but I'm not enough versed in language lawyering to tell from the spec :-)
It's totally legal in C++ :)
Assignee | ||
Comment 21•9 years ago
|
||
(In reply to Gian-Carlo Pascutto [:gcp] from comment #18)
> (In reply to Henry Chang [:henry][:hchang] from comment #17)
> > Hi gcp,
> >
> > Could you please help review these two patches?
>
> I just did?
I really appreciate your light speed review! The comment was being written to explain
the patch right before I received your review :p Please ignore it.
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment 27•9 years ago
|
||
mozreview-review |
Comment on attachment 8786595 [details]
Bug 1254763 - Part 1: Use per-provider directory for V4 databases.
https://reviewboard.mozilla.org/r/75526/#review74878
Attachment #8786595 -
Flags: review?(gpascutto) → review+
Comment 28•9 years ago
|
||
Pushed by hchang@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/63ee473f6386
Part 1: Use per-provider directory for V4 databases. r=dimi,gcp
https://hg.mozilla.org/integration/autoland/rev/66c0022a3037
Part 2: Test cases. r=gcp
Comment 29•9 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/63ee473f6386
https://hg.mozilla.org/mozilla-central/rev/66c0022a3037
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
status-firefox51:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla51
Comment 30•9 years ago
|
||
Good job! We are one more step close to the completion of milestone 1.
Thanks Henry and gcp!
You need to log in
before you can comment on or make changes to this bug.
Description
•