Bug 1254766 (Closed)
Opened 9 years ago • Closed 9 years ago
Stop caching Safe Browsing completions to disk
Categories: Toolkit :: Safe Browsing, defect, P2
Status: RESOLVED FIXED
Target: mozilla51

| Tracking | Status | |
|---|---|---|
| firefox51 | --- | fixed |
People: Reporter: francois, Assigned: dimi
Whiteboard: #sbv4-m0
Attachments: 1 file, 20 obsolete files
63.95 KB, patch, dimi: review+
Chrome only caches completions in memory for the current browsing session. We should do the same to reduce the disk I/O we have to do.
gcp also made the following comment:
"Right now the completes are in .sbstore and .cache. This was a mistake, which hurt for example when I tried fixing https://bugzilla.mozilla.org/show_bug.cgi?id=1164518#c50"
In bug 1164518, I discovered that the on-disk completion cache probably isn't working anyway, because if it were, it would have made reproducing the bug a lot harder (i.e. I would have had to copy the safebrowsing cache prior to doing the gethash calls and then overwrite the updated files with the backup afterwards).
Reporter | Updated • 9 years ago
Summary: Stop caching completions on disk → Stop caching Safe Browsing completions to disk
Comment 1 • 9 years ago
See bug 806422 for the explanation that this is allowed in SafeBrowsing v2 (v2.2->v2.3).
Assignee | Updated • 9 years ago
Assignee: nobody → dlee
Reporter | Updated • 9 years ago
Status: NEW → ASSIGNED
Assignee | Comment 4 • 9 years ago
Just an update: I am working on this right now, but not yet writing the patch. I am still trying to figure out the .cache, .pset, and .sbstore files and all the logic around caching prefixes and completions.
Assignee | Comment 5 • 9 years ago
Assignee | Comment 6 • 9 years ago
Hi Francois,
One question: do we want to remove the .cache files from the old version, or just leave them there?
Flags: needinfo?(francois)
Assignee | Comment 7 • 9 years ago
(In reply to Dimi Lee[:dimi][:dlee] from comment #6)
> Hi Francouis,
> One question, do we want to remove those .cache files for the old version or
> just leave them there ?
Sorry, that comment was poorly worded; let me restate what I would like to ask.
I think we should remove the .cache files, but that means Gecko will always have to check for a .cache file at startup, and keep that behavior until someday we decide we can remove the code (or we never do).
So maybe keeping the .cache files is not such a bad idea: they are not very large, and they will not grow because we will stop using them.
I would like to know if you have any opinion on this. Thanks!
Comment 8 • 9 years ago
Whatever behavior you program, remember that people can switch between Nightly/Aurora/Beta, and it would probably be nice if that didn't wipe their DB each time. (You may be lucky here because IIRC missing .cache files are not considered corrupted profiles.)
Assignee | Comment 9 • 9 years ago
(In reply to Gian-Carlo Pascutto [:gcp] from comment #8)
> Whatever behavior you program, remember that people can switch between
> Nightly/Aurora/Beta, and it would probably be nice if that didn't wipe their
> DB each time. (You may be lucky here because IIRC missing .cache files are
> not considered corrupted profiles.)
Thanks for the information :) I will not remove the .cache files in this bug, then, to avoid wiping the DB files for other builds, as gcp suggested.
Reporter | Comment 10 • 9 years ago
(In reply to Dimi Lee[:dimi][:dlee] from comment #9)
> Thanks for the information :) , then i will not remove .cache files in this
> bug to avoid wiping db files for other builds as gcp suggested.
Sounds good to me too. These files are small and cleaning them up would be tricky. Better to leave them alone for now.
Flags: needinfo?(francois)
Assignee | Comment 11 • 9 years ago
Hi gcp, francois,
I am a little uncertain about what we are trying to achieve here.
The following is my understanding; it could be totally wrong, so please correct me if anything is off. Thanks!
The purpose of this bug is to stop caching completions to disk.
In the current v2 implementation, completions may come from an update[1] or from gethash[2]. In both cases the completions
are cached in |LookupCache::mCompletions| and eventually stored to .cache and .sbstore.
We could stop writing completions to the .cache files, but I guess we will still need to write completions
to .sbstore and read them back into |mCompletions| at startup, since .sbstore may contain completions from updates.
If that is the way we want to do this, I am confused about what we gain, because we would still write completions to disk and read them back, just from a more expensive file (.sbstore) compared to .cache.
Another question concerns the v4 implementation.
The v3 spec[3] says "Full-length hashes obtained in shavar add chunks must also be verified via a full-length
hash request prior to showing a warning", and where the v4 spec talks about caching, "caching" refers to responses from GetEncodedFullHashes. So I think we will need to separate completions from updates and from gethash in the v4 implementation.
If we separate completions from update and gethash, we could store the update completions to disk
while caching the gethash completions only in memory. But that leaves me with a question: if we get
completions and restart Firefox immediately, the completions will be lost because they were never written to disk. That seems fine, but the v4 spec's caching section defines positive and negative cache durations; if we don't store the cache on disk, we will not be able to honor those durations after a restart. Is that the correct behavior?
[1]https://dxr.mozilla.org/mozilla-central/source/toolkit/components/url-classifier/ProtocolParser.cpp#609
[2]https://dxr.mozilla.org/mozilla-central/source/toolkit/components/url-classifier/nsUrlClassifierDBService.cpp#697
[3]https://developers.google.com/safe-browsing/developers_guide_v3#Overview
Flags: needinfo?(gpascutto)
Flags: needinfo?(francois)
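The positive/negative cache-duration rule asked about above can be made concrete with a small sketch. All names here are illustrative, not the actual Gecko or Safe Browsing API: a cached gethash answer stays authoritative until its positive expiry, suppresses re-queries until its negative expiry, and requires a fresh request after both lapse.

```cpp
#include <cstdint>

// Hypothetical cache entry for one prefix's last gethash response.
struct CachedResult {
  int64_t positiveExpirySec; // hash is known unsafe until this time (0 = no match)
  int64_t negativeExpirySec; // no need to re-query the server until this time
};

enum class CacheDecision { Hit, Miss, MustQuery };

CacheDecision Classify(const CachedResult& aEntry, int64_t aNowSec) {
  if (aEntry.positiveExpirySec > aNowSec) {
    return CacheDecision::Hit;     // still listed as unsafe
  }
  if (aEntry.negativeExpirySec > aNowSec) {
    return CacheDecision::Miss;    // treat as safe without a network request
  }
  return CacheDecision::MustQuery; // both durations expired: ask the server
}
```

As the thread notes, an in-memory-only cache simply loses these entries on restart, so the worst case is an extra gethash request, never a missed warning.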
Comment 12 • 9 years ago
> We could stop writing completions to the .cache files, but I guess we will still need to write completions
> to .sbstore and read them back into |mCompletions| at startup, since it may contain completions from updates.

Yes. In fact, TP is *all* completions.
I think what we don't want to do is cache gethash completes.

> So I think we will need to separate completions from updates and from gethash in the v4 implementation.

Exactly, and maybe for v2 too (if we want to fix this bug!).

> If we get completions and restart Firefox immediately, the completions will be lost because they were never
> written to disk. That seems fine, but the v4 spec defines positive and negative cache durations; if we don't
> store the cache on disk, we will not be able to honor those durations after a restart. Is that the correct behavior?

You send out the HTTP GET to the Google server, and while the request is in flight I cut your network cable. Now nothing that respects the laws of physics will prevent you from breaking spec :-)
More seriously, I'd verify what the current Chromium implementation does and/or have Francois ask the Chrome guys about it. I agree that respecting cache durations means you'd have to store them on disk with the associated cache data, so maybe "don't store on disk" was specific to v2.
Flags: needinfo?(gpascutto)
Reporter | Comment 13 • 9 years ago
Like gcp said, we only want to stop caching gethash completions to disk. So as Dimi points out, that means splitting completions up so that we keep the ones that come from update.
We should do this in V2 so that we get the same behaviour on V4.
(In reply to Gian-Carlo Pascutto [:gcp] from comment #12)
> More seriously, I'd verify what the current Chromium implementation is doing
> and/or have Francois ask the Chrome guys about it. I agree respecting cache
> durations means you'd have to store them on disk with associated cache data,
> so maybe the "don't store on disk" was specific to v2.
Google told me that Chrome only caches gethash completions in memory. They don't care about clients asking for the same completions again in the case of a browser restart.
I would expect they will have the same policy in V4 since completions will be even rarer in V4.
Flags: needinfo?(francois)
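A minimal sketch of the split agreed on above, with std::vector standing in for nsTArray and a 32-byte array for the hash type: update completions are the only ones that would ever be serialized, while gethash completions live in memory and are dropped on every update (the v2.3 behaviour referenced in bug 806422). The class and method names are illustrative, not the actual patch.

```cpp
#include <algorithm>
#include <array>
#include <vector>

using Completion = std::array<unsigned char, 32>; // 32-byte full hash

struct LookupCacheSketch {
  // Completions that arrived in regular updates: persisted to disk.
  std::vector<Completion> mUpdateCompletions;
  // Completions returned by gethash: memory only, lost on restart.
  std::vector<Completion> mGetHashCache;

  void AddGetHashCompletion(const Completion& aHash) {
    mGetHashCache.push_back(aHash);
    std::sort(mGetHashCache.begin(), mGetHashCache.end());
  }

  void OnUpdateApplied() {
    // Clear the gethash cache on every update; keep update completions.
    mGetHashCache.clear();
  }

  bool Has(const Completion& aHash) const {
    return std::binary_search(mUpdateCompletions.begin(),
                              mUpdateCompletions.end(), aHash) ||
           std::binary_search(mGetHashCache.begin(),
                              mGetHashCache.end(), aHash);
  }
};
```

Keeping the two sources in separate containers is what makes the clear-on-update rule trivial: there is no need to work out, per entry, where a completion originally came from.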
Assignee | Comment 14 • 9 years ago
Thanks, gcp and francois!
That is really helpful. I know exactly what I should do next now :)
Assignee | Comment 15 • 9 years ago
One question:
For the completion cache, the v2 spec doesn't say we should clear the cache on update, and the v3 spec only says "Clients must clear cached full-length hashes each time they send an update request." in its "Changes since version 2.2" paragraph. So I think what we are doing right now (not clearing the cache on update) doesn't violate the v2 spec.
As for v4, I didn't find any part of the spec that talks about this (only cache durations), but I assume it remains the same as v3.
Since we will separate gethash and update completions in this bug, I propose that we also clear gethash completions on update in the current implementation. The benefits are:
1. It reduces complexity when an update occurs; otherwise we have to deal with things like checking whether to remove completions from the cache when the update contains expirations, sub completes, and so on.
2. It is easier to migrate to the v4 implementation.
This change won't affect the correctness of Safe Browsing; it only affects how often we send a gethash request.
gcp, francois, what do you think?
Comment 16 • 9 years ago
(In reply to Dimi Lee[:dimi][:dlee] from comment #15)
> One question,
> For completion cache, v2 spec doesn't mention that we should clear cache
> when update
See:
https://bugzilla.mozilla.org/show_bug.cgi?id=806422
This was a change requested by Google that they never documented.
> so i think what we are doing right now - not clear cache
> when update, doesn't violate v2 spec.
We do clear the gethash cache on updates. See the bug above for where the relevant code lives.
Assignee | Comment 17 • 9 years ago
(In reply to Gian-Carlo Pascutto [:gcp] from comment #16)
> We do clear the gethash cache on updates. See the bug above for where the
> relevant code lives.
Thanks!
One more question: in the ClearCompletes[1] code here, we don't clear the cache if the update contains a complete.
So if an update contains completions, we will not remove gethash completions from the cache.
Do we still want to keep this behavior, or should we always clear gethash completions from the cache when there is an update?
[1]https://dxr.mozilla.org/mozilla-central/source/toolkit/components/url-classifier/Classifier.cpp#652
Comment 18 • 9 years ago
The existing code doesn't know the difference between Completions coming from a "real" update from the server containing Completions in addition to Prefixes, and an "update" constructed from the result of a gethash request here:
https://dxr.mozilla.org/mozilla-central/source/toolkit/components/url-classifier/nsUrlClassifierDBService.cpp?from=nsUrlClassifierDBService.cpp#697
That is, if it has completions, it doesn't know where they came from.
The logic there makes the assumption that a real update would consist of a mix of Completions and Prefixes, so that in that case, we'd not want to drop the Completions from the update (immediately).
I suspect that logic may also be required in order to pass the tests or to make TP work, as they do use Completions in updates.
If you split the Completions in two classes then you can make the behavior more correct.
Assignee | Comment 19 • 9 years ago
(In reply to Gian-Carlo Pascutto [:gcp] from comment #18)
> The existing code doesn't know the difference between Completions coming
> from a "real" update from the server containing Completions in addition to
> Prefixes, and an "update" constructed from the result of a gethash request
> here:
>
> That is, if it has completions, it doesn't know where they came from.
>
I saw that there is an |mLocalUpdate| flag[1] which seems able to distinguish completions from updates versus gethash; why don't we use that? Or does the flag not work the way I think it does?
I am going to remove it in this bug, so I just want to make sure I understand the logic correctly.
Thanks in advance for answering so many of my questions :)
[1]https://dxr.mozilla.org/mozilla-central/source/toolkit/components/url-classifier/nsUrlClassifierDBService.cpp#708
Comment 20 • 9 years ago
I think the problem is that that flag doesn't tell you where the *existing* Completions were from, just the ones you're receiving now.
Comment 21 • 9 years ago
Also note:
a) that flag controls the updates of database liveliness, so you can't outright remove it without also breaking that logic. (I guess if the database update goes away entirely it doesn't matter)
b) If you have a database with prefixes and completes (not from gethash), and receive an update consisting of only prefixes, we'd wipe the completes part of the DB. But no such databases/server behavior exists in the wild that we've seen, which is why the current code still manages to get by.
Assignee | Comment 22 • 9 years ago
(In reply to Gian-Carlo Pascutto [:gcp] from comment #21)
> Also note:
>
> a) that flag controls the updates of database liveliness, so you can't
> outright remove it without also breaking that logic. (I guess if the
> database update goes away entirely it doesn't matter)
Got it, thanks!
Assignee | Updated • 9 years ago
Attachment #8745251 - Attachment is obsolete: true
Assignee | Comment 23 • 9 years ago
Assignee | Comment 24 • 9 years ago
Assignee | Comment 25 • 9 years ago
Assignee | Comment 26 • 9 years ago
Assignee | Comment 27 • 9 years ago
Assignee | Comment 28 • 9 years ago
Assignee | Comment 29 • 9 years ago
Just an update: I am working on writing the test cases (mochitest) right now.
The following is what I am going to test:
1. completions from update should work
2. completions from gethash should work
3. completions retrieved from update should still exist after restart (not sure how to simulate a restart)
4. completions retrieved from gethash should not exist after restart
5. an update will clear completions coming from gethash
6. an update will not clear completions coming from a previous update
Please let me know if anything else should be tested in this bug.
Reporter | Comment 30 • 9 years ago
(In reply to Dimi Lee[:dimi][:dlee] from comment #29)
> Following is what i am going to test:
> 1. completions from update should work
> 2. completions from gethash should work
> 3. completions retrieved from update should still exist after restart.(Not
> sure how to simulate restart)
> 4. completions retrieved from gethash should not exist after restart
> 5. an update will clear completions comes from gethash
> 6. an update will not clear completions comes from previous update
That looks both right and complete to me.
I'm not sure either how we can simulate a restart though.
Assignee | Comment 31 • 9 years ago
It seems we disable gethash for "test-xxx" tables in many places[1][2]; we always assume a test database should only contain complete hashes.
But I would like to trigger a gethash request in the test case. Do you have any suggestions?
Do we still want to block gethash for "test-" DBs?
[1]https://dxr.mozilla.org/mozilla-central/source/toolkit/components/url-classifier/nsUrlClassifierDBService.cpp#1630
[2]https://dxr.mozilla.org/mozilla-central/source/toolkit/components/url-classifier/nsUrlClassifierDBService.cpp#866
Flags: needinfo?(francois)
Comment 32 • 9 years ago
Maybe it makes more sense to (only) block gethashes off of the "disallow_completions" list in the prefs? You can then simply unset/reset that pref in the relevant test.
Reporter | Comment 33 • 9 years ago
(In reply to Gian-Carlo Pascutto [:gcp] from comment #32)
> Maybe it makes more sense to (only) block gethashes off of the
> "disallow_completions" list in the prefs? You can then simply unset/reset
> that pref in the relevant test.
That sounds like a good solution. We always add the test lists to urlclassifier.disallow_completions anyway.
Flags: needinfo?(francois)
Assignee | Comment 34 • 9 years ago
Thanks for the suggestion!
I opened bug 1272239 to remove the code that forbids "test-" lists from triggering completions. Then we can use "disallow_completions" in the test case as suggested.
Assignee | Comment 35 • 9 years ago
.cache files will only store completions from update
Attachment #8748997 - Attachment is obsolete: true
Attachment #8748999 - Attachment is obsolete: true
Attachment #8749001 - Attachment is obsolete: true
Attachment #8749002 - Attachment is obsolete: true
Attachment #8749003 - Attachment is obsolete: true
Attachment #8749004 - Attachment is obsolete: true
Attachment #8764811 - Flags: review?(francois)
Assignee | Comment 36 • 9 years ago
Attachment #8764812 - Flags: review?(francois)
Assignee | Comment 37 • 9 years ago
Attachment #8764813 - Flags: review?(francois)
Assignee | Comment 38 • 9 years ago
Attachment #8764814 - Flags: review?(francois)
Assignee | Comment 39 • 9 years ago
Attachment #8764815 - Flags: review?(francois)
Assignee | Comment 40 • 9 years ago
Attachment #8764816 - Flags: review?(francois)
Assignee | Comment 41 • 9 years ago
(In reply to Dimi Lee[:dimi][:dlee] from comment #40)
> Created attachment 8764816 [details] [diff] [review]
> P6. Testcase for stoping caching Safe Browsing completions to disk
Hi Francois,
I don't really have a good way to test restarts, so in this patch I added an API to "simulate" the behavior we go through on restart. Please let me know if you have any suggestions, thanks!
Reporter | Comment 42 • 9 years ago
Comment on attachment 8764811 [details] [diff] [review]
P1. Do not store completions from gethash to .cache files
Review of attachment 8764811 [details] [diff] [review]:
-----------------------------------------------------------------
::: toolkit/components/url-classifier/LookupCache.cpp
@@ +18,5 @@
> // the updates from the protocol.
>
> // This module has its own store, which stores the Completions,
> +// mCache caching lookups that have happened over the net.
> +// Full length hashes retriving by updating are stored in
nit: "retrieved"
@@ +19,5 @@
>
> // This module has its own store, which stores the Completions,
> +// mCache caching lookups that have happened over the net.
> +// Full length hashes retriving by updating are stored in
> +// mUpateCompletions. The prefixes are cached/checked by looking
typo: missing "d" in mUpdateCompletions
Attachment #8764811 - Flags: review?(gpascutto)
Attachment #8764811 - Flags: review?(francois)
Attachment #8764811 - Flags: feedback+
Reporter | Comment 43 • 9 years ago
Comment on attachment 8764812 [details] [diff] [review]
P2. Separate code path for gethash and update
Review of attachment 8764812 [details] [diff] [review]:
-----------------------------------------------------------------
::: toolkit/components/url-classifier/Classifier.cpp
@@ +697,5 @@
> + const nsACString& aTable)
> +{
> + LOG(("Classifier::UpdateCache(%s)", PromiseFlatCString(aTable).get()));
> +
> + // TODO : Implement it
I see that you've implemented this in Patch 5 so I think it makes sense to combine these two. There's not much point in splitting these patches if it leaves incomplete functions around.
Attachment #8764812 - Flags: review?(francois)
Reporter | Updated • 9 years ago
Attachment #8764815 - Flags: review?(francois)
Reporter | Updated • 9 years ago
Attachment #8764813 - Flags: review?(francois)
Reporter | Updated • 9 years ago
Attachment #8764814 - Flags: review?(francois)
Reporter | Comment 44 • 9 years ago
Comment on attachment 8764811 [details] [diff] [review]
P1. Do not store completions from gethash to .cache files
Review of attachment 8764811 [details] [diff] [review]:
-----------------------------------------------------------------
Patches 1-5 are not very big and also aren't exactly stand-alone. I think they would be easier to read as a single patch.
Once you've merged them, I think gcp would be the best person to review them since he's a lot more familiar with this code than I am.
Attachment #8764811 - Flags: review?(gpascutto)
Reporter | Comment 45 • 9 years ago
Comment on attachment 8764816 [details] [diff] [review]
Testcase for stoping caching Safe Browsing completions to disk
Review of attachment 8764816 [details] [diff] [review]:
-----------------------------------------------------------------
(In reply to Dimi Lee[:dimi][:dlee] from comment #41)
> I don't really have a good way to test restart, so in this patch i add an
> API to "simulate" the behavior we will do when restart. Please let me know
> if you have any suggestion, thanks!
I think you were going to try Dimi's suggestion (watching for changes to the nextupdatetime pref), right?
Attachment #8764816 - Flags: review?(francois)
Assignee | Comment 46 • 9 years ago
(In reply to François Marier [:francois] from comment #45)
> Comment on attachment 8764816 [details] [diff] [review]
> P6. Testcase for stoping caching Safe Browsing completions to disk
>
> Review of attachment 8764816 [details] [diff] [review]:
> -----------------------------------------------------------------
>
> (In reply to Dimi Lee[:dimi][:dlee] from comment #41)
> > I don't really have a good way to test restart, so in this patch i add an
> > API to "simulate" the behavior we will do when restart. Please let me know
> > if you have any suggestion, thanks!
>
> I think you were going to try Dimi's suggestion (watching for changes to the
> nextupdatetime pref), right?
Not this one; that is bug 1281083.
Assignee | Comment 47 • 9 years ago
Hi gcp,
Could you help review this patch? Thanks!
This patch merges patches P1-P5, per francois's suggestion.
Attachment #8770117 - Flags: review?(gpascutto)
Assignee | Updated • 9 years ago
Attachment #8764816 - Attachment description: P6. Testcase for stoping caching Safe Browsing completions to disk → Testcase for stoping caching Safe Browsing completions to disk
Attachment #8764816 - Flags: review?(gpascutto)
Comment 48 • 9 years ago
Comment on attachment 8770117 [details] [diff] [review]
Merged patch for stop caching Safe Browsing completions to disk.
Review of attachment 8770117 [details] [diff] [review]:
-----------------------------------------------------------------
I think there's a strong argument to be made that LookupCache shouldn't operate on the .cache files at all any more (and they should effectively be ignored and eventually disappear from new Firefox profiles), but just read in the mUpdateCompletions from .sbstore.
https://dxr.mozilla.org/mozilla-central/rev/88bebcaca249aeaca9197382e89d35b02be8292e/toolkit/components/url-classifier/HashStore.cpp#87
I do regret not putting all the fixed-length data *before* the variable size, ByteSlice/zlib stuff, which would have made that trivial. Now you'd have to read in the lengths of the compressed chunks and skip over them, rather than being able to seek to the right place in that file directly. But that may still be preferable than spreading the IO over LookupCache.cpp and HashStore.cpp, and duplicating the data.
I can live with not doing this but think about it and let me know your opinion.
::: toolkit/components/url-classifier/Classifier.cpp
@@ +350,5 @@
> + LOG(("Applying %d table gethashes.", aUpdates->Length()));
> +
> + for (uint32_t i = 0; i < aUpdates->Length(); i++) {
> + nsCString updateTable(aUpdates->ElementAt(i)->TableName());
> + nsresult rv = UpdateCache(aUpdates, updateTable);
updateTable applies to aUpdates[i], so passing aUpdates rather than aUpdates[i] looks wrong. I think the callee is needlessly looping over the array again trying to find updates that match the passed table name.
Let's rework this to just pass aUpdates[i] directly?
@@ +701,5 @@
> +{
> + LOG(("Classifier::UpdateCache(%s)", PromiseFlatCString(aTable).get()));
> +
> + if (NS_FAILED(CheckValidUpdate(aUpdates, aTable))) {
> + return NS_OK;
This if (NS_FAILED) return NS_OK looks pretty ugly. I'd suggest making CheckValidUpdate return a boolean instead.
::: toolkit/components/url-classifier/LookupCache.cpp
@@ +18,5 @@
> // the updates from the protocol.
>
> // This module has its own store, which stores the Completions,
> +// mCache caching lookups that have happened over the net.
> +// Full length hashes retriving by updating are stored in
..retrieved..
@@ +43,5 @@
> namespace mozilla {
> namespace safebrowsing {
>
> const uint32_t LOOKUPCACHE_MAGIC = 0x1231af3e;
> +const uint32_t CURRENT_VERSION = 3;
This is going to blow away the SafeBrowsing data of every Firefox user and force a redownload. And it will happen every time if they have for example Release and Dev Edition and switch between the two.
https://dxr.mozilla.org/mozilla-central/rev/88bebcaca249aeaca9197382e89d35b02be8292e/toolkit/components/url-classifier/LookupCache.cpp#327
https://dxr.mozilla.org/mozilla-central/rev/88bebcaca249aeaca9197382e89d35b02be8292e/toolkit/components/url-classifier/LookupCache.cpp#138
It's not clear to me what it's trying to accomplish, too. The on-disk format didn't change?
There might be a concern that there are fullhash-completion-Completes in the .cache file, which we now no longer will want there. But the update logic in Classifier should get rid of these after the first remote update, as the .cache file is rebuilt from the contents of .sbstore, which won't have those.
@@ +172,5 @@
> + if (mCache.BinaryIndexOf(aAddCompletes[i].CompleteHash()) == mCache.NoIndex) {
> + mCache.AppendElement(aAddCompletes[i].CompleteHash());
> + }
> + }
> + mCache.Sort();
I guess as soon as there's a few aAddCompletes this is indeed faster than InsertElementSorted.
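The trade-off noted here (one sort after a batch of appends versus repeated sorted inserts) can be sketched as follows, with std::vector standing in for nsTArray and int standing in for the hash type. The duplicate check is restricted to the still-sorted prefix of the array, since appending breaks the sorted-range precondition that binary search requires; this is a sketch of the technique, not the patch's actual code.

```cpp
#include <algorithm>
#include <vector>

// Append a batch of k new completes to a cache of n entries, then sort
// once: O((n+k) log(n+k)) total, versus the O(k*n) element moves that
// k individual sorted inserts (InsertElementSorted-style) would cost.
std::vector<int> MergeBatch(std::vector<int> aCache,
                            const std::vector<int>& aBatch) {
  const size_t sortedEnd = aCache.size(); // only this prefix is sorted
  for (int v : aBatch) {
    // Duplicate check, analogous to BinaryIndexOf in the patch; search
    // only the original sorted portion, because appends break sortedness.
    if (!std::binary_search(aCache.begin(), aCache.begin() + sortedEnd, v)) {
      aCache.push_back(v);
    }
  }
  std::sort(aCache.begin(), aCache.end()); // single sort at the end
  return aCache;
}
```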
Attachment #8770117 - Flags: review?(gpascutto) → review-
Comment 49 • 9 years ago
Comment on attachment 8764816 [details] [diff] [review]
Testcase for stoping caching Safe Browsing completions to disk
Review of attachment 8764816 [details] [diff] [review]:
-----------------------------------------------------------------
I don't seem to have toolkit/components/url-classifier/tests/mochitest/gethash.sjs
LGTM
::: toolkit/components/url-classifier/Classifier.cpp
@@ +381,5 @@
> +Classifier::GetLastUpdateTime(const nsACString& aTableName)
> +{
> + int64_t age;
> + bool found = mTableFreshness.Get(aTableName, &age);
> + return found ? age * PR_MSEC_PER_SEC: 0;
nit: add a space after PR_MSEC_PER_SEC
Maybe parentheses around age * PR_MSEC_PER_SEC for readability?
::: toolkit/components/url-classifier/tests/mochitest/gethash.sjs
@@ +49,3 @@
>
> + var counter = getState("counter");
> + counter = counter == "" ? "1" : (parseInt(counter) + 1).toString();
Can't getState/setState store integers?
::: toolkit/components/url-classifier/tests/mochitest/test_bug1254766.html
@@ +199,5 @@
> + ok(gCurGethashCounter > gPreGethashCounter, "Gethash request is triggered."); })
> + .then(reset);
> +}
> +
> +// This testcae is to make sure completions in update works:
nit: testcase
Attachment #8764816 - Flags: review?(gpascutto) → review+
Assignee | Comment 50 • 9 years ago
(In reply to Gian-Carlo Pascutto [:gcp] from comment #49)
> Comment on attachment 8764816 [details] [diff] [review]
> Testcase for stoping caching Safe Browsing completions to disk
>
> Review of attachment 8764816 [details] [diff] [review]:
> -----------------------------------------------------------------
>
> I don't seem to have
> toolkit/components/url-classifier/tests/mochitest/gethash.sjs
>
The patch is currently based on bug 1272239 which is not yet landed.
Assignee | Comment 51 • 9 years ago
Attachment #8764811 - Attachment is obsolete: true
Attachment #8764812 - Attachment is obsolete: true
Attachment #8764813 - Attachment is obsolete: true
Attachment #8764814 - Attachment is obsolete: true
Attachment #8764815 - Attachment is obsolete: true
Assignee | Updated • 9 years ago
Attachment #8770117 - Attachment is obsolete: true
Assignee | Comment 52 • 9 years ago
Thanks for the review! This is really helpful!
(In reply to Gian-Carlo Pascutto [:gcp] from comment #48)
> Comment on attachment 8770117 [details] [diff] [review]
> Merged patch for stop caching Safe Browsing completions to disk.
>
> Review of attachment 8770117 [details] [diff] [review]:
> -----------------------------------------------------------------
>
> I think there's a strong argument to be made that LookupCache shouldn't
> operate on the .cache files at all any more (and they should effectively be
> ignored and eventually disappear from new Firefox profiles), but just read
> in the mUpdateCompletions from .sbstore.
>
> https://dxr.mozilla.org/mozilla-central/rev/
> 88bebcaca249aeaca9197382e89d35b02be8292e/toolkit/components/url-classifier/
> HashStore.cpp#87
>
> I do regret not putting all the fixed-length data *before* the variable
> size, ByteSlice/zlib stuff, which would have made that trivial. Now you'd
> have to read in the lengths of the compressed chunks and skip over them,
> rather than being able to seek to the right place in that file directly. But
> that may still be preferable than spreading the IO over LookupCache.cpp and
> HashStore.cpp, and duplicating the data.
>
> I can live with not doing this but think about it and let me know your
> opinion.
>
To my understanding there were originally two reasons to have both .cache and .sbstore; please correct me if I am wrong:
1. .cache contains completions from both gethash and updates, while .sbstore only has completions from updates.
2. .cache is not as complex as .sbstore, so it is faster to read all the completions from it.
Since we no longer want to keep completions from gethash, the only remaining reason to keep .cache is fast access (that's why I kept it in the first patch).
But I think you are right: we can read completions directly from .sbstore as long as we don't have to parse
complex data structures to get the data we want. Then we can get rid of the .cache file entirely.
I implemented reading the fixed-size completions directly from .sbstore by calculating the offset from the end of the file, so we don't have to compute the size of the prefix sections. Please check the patch to see if you agree with this approach; if you are OK with it, I think we can keep the .sbstore file format as it is now.
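The offset arithmetic described here is simple precisely because the completions form a fixed-size trailing section of the file: with N 32-byte completions at the end of .sbstore, the read position is just the file size minus N times 32, so the variable-length, compressed prefix sections never need to be parsed. A sketch, with illustrative names:

```cpp
#include <cstdint>

// Each stored completion is a 32-byte full-length hash.
constexpr int64_t COMPLETE_SIZE = 32;

// Seek position of the completions section, measured from the start of
// the file. Seek here, then read aNumCompletions * COMPLETE_SIZE bytes;
// the compressed prefix chunks earlier in the file are skipped entirely.
int64_t CompletionsOffset(int64_t aFileSize, int64_t aNumCompletions) {
  return aFileSize - aNumCompletions * COMPLETE_SIZE;
}
```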
> > const uint32_t LOOKUPCACHE_MAGIC = 0x1231af3e;
> > +const uint32_t CURRENT_VERSION = 3;
>
> This is going to blow away the SafeBrowsing data of every Firefox user and
> force a redownload. And it will happen every time if they have for example
> Release and Dev Edition and switch between the two.
>
> https://dxr.mozilla.org/mozilla-central/rev/
> 88bebcaca249aeaca9197382e89d35b02be8292e/toolkit/components/url-classifier/
> LookupCache.cpp#327
> https://dxr.mozilla.org/mozilla-central/rev/
> 88bebcaca249aeaca9197382e89d35b02be8292e/toolkit/components/url-classifier/
> LookupCache.cpp#138
>
> It's not clear to me what it's trying to accomplish, too. The on-disk format
> didn't change?
>
> There might be a concern that there are fullhash-completion-Completes in the
> .cache file, which we now no longer will want there. But the update logic in
> Classifier should get rid of these after the first remote update, as the
> .cache file is rebuilt from the contents of .sbstore, which won't have those.
>
Yes, initially I bumped the version because there are fullhash-completion-Completes that we no longer want. But since we are now going to remove the .cache file entirely, this part will be removed.
Assignee | Updated • 9 years ago
Attachment #8770954 - Flags: review?(gpascutto)
Assignee | Updated • 9 years ago
Whiteboard: #sbv4-m0
Comment 53 • 9 years ago
Comment on attachment 8770954 [details] [diff] [review]
P1. Stop caching Safe Browsing completions to disk. v1
Review of attachment 8770954 [details] [diff] [review]:
-----------------------------------------------------------------
Doing the read by seeking from the end of the file is a great idea! The code simplifies a whole lot now that we can get rid of the .cache files, so these patches are great. Unfortunately, you'll have to do a bit more work to refactor HashStore::Open() to not do any extra work, else I foresee an extra performance penalty on startup. So r- because of the latter.
::: toolkit/components/url-classifier/LookupCache.cpp
@@ +19,5 @@
> +// This module has its own store, which stores the PrefixSet,
> +// mCache caching lookups that have happened over the net.
> +// Full length hashes retrieved by updating are stored in
> +// mUpateCompletions. The prefixes are cached/checked by looking
> +// them up in the PrefixSet.
Let's reword this, as PrefixSet isn't stored in LookupCache.
// This module provides a front for PrefixSet, mUpdateCompletions,
// and mCache, which together contain everything needed to
// provide a classification as long as the data is up to date.
// PrefixSet stores and provides lookups for 4-byte prefixes.
// mUpdateCompletions contains 32-byte completions which were
> // contained in updates. They are retrieved from HashStore/.sbstore
// on startup.
// mGetHashCache contains 32-byte completions which were
// returned from the gethash server. They are not serialized,
// only cached until the next update.
@@ +234,2 @@
>
> + nsresult rv = store.Open();
https://dxr.mozilla.org/mozilla-central/rev/88bebcaca249aeaca9197382e89d35b02be8292e/toolkit/components/url-classifier/HashStore.cpp#230
That function is now doing too much to be good for you.
a) fileSize should probably be a field in HashStore, so you don't have to open up things again in ReadCompletions just to get the fileSize (again) which you just retrieved in Open().
b) We probably don't care about detecting corruption when we just open the file to read Completions (which would be at startup). It's OK to delay detecting corruption (i.e. CheckChecksum) until we do an update. That avoids reading in the entire database at startup.
c) You don't care about ReadChunkNumbers either.
I suggest splitting HashStore::Open() into Open() and PrepareForUpdate() and moving (b) and (c) into the latter, so they don't get called when you're just setting up LookupCache.
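A minimal simulation of the suggested split (class and method names follow the review; the bodies are illustrative stand-ins, not the real mozilla-central implementation):

```cpp
#include <cassert>

// Hypothetical sketch: Open() stays cheap and only records the file size
// (point a), while checksum verification and chunk-number reading (points
// b and c) move into PrepareForUpdate(), so neither runs when LookupCache
// merely needs the completions at startup.
class HashStore {
 public:
  bool Open(long aFileSize) {
    mFileSize = aFileSize;  // cached so ReadCompletions can reuse it
    mOpened = true;
    return true;
  }
  bool PrepareForUpdate() {
    if (!mOpened) {
      return false;
    }
    mChecksumVerified = true;  // stand-in for CheckChecksum()
    mChunkNumbersRead = true;  // stand-in for ReadChunkNumbers()
    return true;
  }
  long FileSize() const { return mFileSize; }
  bool ChecksumVerified() const { return mChecksumVerified; }

 private:
  long mFileSize = 0;
  bool mOpened = false;
  bool mChecksumVerified = false;
  bool mChunkNumbersRead = false;
};
```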
::: toolkit/components/url-classifier/LookupCache.h
@@ +139,5 @@
> RefPtr<nsUrlClassifierPrefixSet> mPrefixSet;
> + // Full length hashes obtained in update request
> + CompletionArray mUpdatedCompletions;
> + // Full length hashes obtained in gethash request
> + CompletionArray mCache;
mGetHashCache?
Attachment #8770954 -
Flags: review?(gpascutto) → review-
| Assignee | ||
Comment 54•9 years ago
|
||
Hi gcp,
Thanks for your suggestions! This patch addresses comment 53.
But there is one change in this patch I would like to highlight:
|ReadCompletions| is now a private function. Callers use AddChunk() and AddCompletes() directly; inside AddChunk/AddCompletes we read the data if it hasn't been read before.
This change is needed because reading chunks/completes no longer happens in HashStore::Open(), so it ensures callers will not read empty data.
Attachment #8770954 -
Attachment is obsolete: true
Attachment #8771899 -
Flags: review?(gpascutto)
Comment 55•9 years ago
|
||
Comment on attachment 8771899 [details] [diff] [review]
P1. Stop caching Safe Browsing completions to disk. v2
Review of attachment 8771899 [details] [diff] [review]:
-----------------------------------------------------------------
Looks good to me.
::: toolkit/components/url-classifier/HashStore.cpp
@@ +470,5 @@
> +
> +nsresult
> +HashStore::BeginUpdate()
> +{
> + // Check of the file is corrupted and read the rest of the store
...check whether...
@@ +479,4 @@
> // Close input stream, won't be needed any more and
> // we will rewrite ourselves.
> if (mInputStream) {
> + mFileSize = 0;
I'd rather reset this at the point where we open the file for writing (which is really the place where it'll become 0-sized).
::: toolkit/components/url-classifier/LookupCache.h
@@ +138,4 @@
> // Set of prefixes known to be in the database
> RefPtr<nsUrlClassifierPrefixSet> mPrefixSet;
> + // Full length hashes obtained in update request
> + CompletionArray mUpdatedCompletions;
grammar: mUpdateCompletions (they haven't been updated, rather they belong to/come from an update)
Attachment #8771899 -
Flags: review?(gpascutto) → review+
| Assignee | ||
Comment 56•9 years ago
|
||
Attachment #8771899 -
Attachment is obsolete: true
| Assignee | ||
Comment 57•9 years ago
|
||
Hi gcp,
I have a question about |ApplyTableUpdates|: currently we loop through the entire TableUpdates array and apply an update if the table name matches[1]. This implies that more than one |TableUpdate| object with the same table name may exist.
To my understanding this may happen when gethash returns multiple results for the same table[2], but will it also happen when we process an update response? From what I saw, |TableUpdates| will not contain two objects with the same table name[3][4]. If this is correct, I think I should make some changes to this patch, because the gethash and update code paths are separate now.
[1] https://dxr.mozilla.org/mozilla-central/source/toolkit/components/url-classifier/Classifier.cpp#612
[2] https://dxr.mozilla.org/mozilla-central/source/toolkit/components/url-classifier/nsUrlClassifierDBService.cpp#936
[3] https://dxr.mozilla.org/mozilla-central/source/toolkit/components/url-classifier/nsUrlClassifierDBService.cpp#528
[4] https://dxr.mozilla.org/mozilla-central/source/toolkit/components/url-classifier/ProtocolParser.cpp#683
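For reference, the loop being discussed can be sketched like this (hypothetical types; the real TableUpdate carries add/sub chunks rather than a counter). The point is that every |TableUpdate| whose name matches is applied, so several entries for one table are tolerated:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the real TableUpdate.
struct TableUpdate {
  std::string table;
  int numAdds;
};

// Apply every update whose table name matches; the array may legitimately
// contain several TableUpdate objects for the same table (e.g. one per
// gethash result), and all of them are consumed.
int ApplyTableUpdates(const std::vector<TableUpdate>& aUpdates,
                      const std::string& aTable) {
  int applied = 0;
  for (const TableUpdate& update : aUpdates) {
    if (update.table == aTable) {
      applied += update.numAdds;
    }
  }
  return applied;
}
```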
Flags: needinfo?(gpascutto)
Comment 58•9 years ago
|
||
(In reply to Dimi Lee[:dimi][:dlee] from comment #57)
> Hi gcp,
> I have a question about the |ApplyTableUpdates|, currently we loop through
> entire TableUpdates array and apply update if table name matches[1]. So it
> means it is possible that there exists more than
> one |TableUpdate| object with same table name.
Or a TableUpdates array contains updates for multiple tables. I do not know if the code can construct it like that, you'd have to check.
> For my understanding it may happen when gethash return multiple results for
> same table[2], but will this happen when we process update response ? From
> what i saw the |TableUpdates| will not have two object with same table
> name[3][4]. So if this is correct, i think i should make some changes to
> this patch because code path of gethash and update are separate now.
For the code you linked, note that there is GetTableUpdate and GetTableUpdateS. Yes, that's horrible naming, sorry!
Flags: needinfo?(gpascutto)
| Reporter | ||
Updated•9 years ago
|
| Assignee | ||
Comment 59•9 years ago
|
||
Attachment #8764816 -
Attachment is obsolete: true
Attachment #8772248 -
Attachment is obsolete: true
Attachment #8776427 -
Flags: review+
| Assignee | ||
Updated•9 years ago
|
Keywords: checkin-needed
Comment 60•9 years ago
|
||
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/fx-team/rev/482f81651a2e
Stop caching Safe Browsing completions to disk. r=gcp
Keywords: checkin-needed
Comment 61•9 years ago
|
||
backed out for failures like https://treeherder.mozilla.org/logviewer.html#?job_id=10894640&repo=fx-team
Flags: needinfo?(dlee)
Comment 62•9 years ago
|
||
Backout by cbook@mozilla.com:
https://hg.mozilla.org/integration/fx-team/rev/3bc5415cc37d
Backed out changeset 482f81651a2e for failing on own test
| Assignee | ||
Comment 63•9 years ago
|
||
Hi francois, gcp.
After digging into the try failure, I found the root cause: on certain platforms, a testcase may run before SafeBrowsing.jsm is initialized, so nothing is ready at that moment.
Whether the problem occurs depends on timing, because we delay initializing SafeBrowsing.jsm by 2 seconds[1] (bug 778855, bug 779008).
So possible solutions might be:
1. Fix bug 779008: move addMozEntries somewhere else and remove the setTimeout[1].
2. Split the initialization of SafeBrowsing into two parts, put addMozEntries in the second part, and only delay the second one.
3. Do not use setTimeout for tests, but I am not sure whether this might affect other tests.
4. Set a flag indicating that SafeBrowsing.jsm is initialized; a testcase should not run until the flag is set. The problem is where to put this flag: listmanager? dbservice? a preference?
Personally I prefer the first two approaches, but I would like to know your opinion. Do you have any suggestions? Thanks!
[1] https://dxr.mozilla.org/mozilla-central/source/browser/base/content/browser.js#1180
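The flag/observer approach has one subtlety worth noting: a test may register for the "initialized" signal after it has already fired. A minimal sketch of that pattern (hypothetical names, not the actual nsIObserverService API):

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical notifier: callers that register after initialization has
// already completed get their callback invoked immediately, so they never
// wait for a notification that will not fire again.
class InitNotifier {
 public:
  void WaitForInit(std::function<void()> aCallback) {
    if (mInitialized) {
      aCallback();  // event already fired; run right away
    } else {
      mWaiters.push_back(std::move(aCallback));
    }
  }

  void NotifyInitialized() {
    mInitialized = true;
    for (auto& callback : mWaiters) {
      callback();
    }
    mWaiters.clear();
  }

 private:
  bool mInitialized = false;
  std::vector<std::function<void()>> mWaiters;
};
```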
Flags: needinfo?(gpascutto)
Flags: needinfo?(francois)
Comment 64•9 years ago
|
||
Take a look at https://bugzilla.mozilla.org/show_bug.cgi?id=1244259
SafeBrowsing testcases that require the service to be initialized, or the test entries to be present, should probably hold until that event has fired.
Flags: needinfo?(gpascutto)
| Assignee | ||
Comment 65•9 years ago
|
||
(In reply to Gian-Carlo Pascutto [:gcp] from comment #64)
> Take a look at https://bugzilla.mozilla.org/show_bug.cgi?id=1244259
>
> SafeBrowsing testcases that require the service to be initialized, or the
> test entries to be present, should probably hold until that event has fired.
Thanks for the info!
Flags: needinfo?(francois)
Flags: needinfo?(dlee)
| Assignee | ||
Comment 66•9 years ago
|
||
Hi gcp,
Could you help review this patch? Thanks!
Attachment #8778793 -
Flags: review?(gpascutto)
| Assignee | ||
Comment 67•9 years ago
|
||
Comment on attachment 8778793 [details] [diff] [review]
Patch fIx try error
Obsolete because I uploaded the wrong file
Attachment #8778793 -
Attachment is obsolete: true
Attachment #8778793 -
Flags: review?(gpascutto)
| Assignee | ||
Comment 68•9 years ago
|
||
Attachment #8778795 -
Flags: review?(gpascutto)
Comment 69•9 years ago
|
||
Comment on attachment 8778795 [details] [diff] [review]
Patch fix try error
Review of attachment 8778795 [details] [diff] [review]:
-----------------------------------------------------------------
::: toolkit/components/url-classifier/tests/mochitest/classifierCommon.js
@@ +55,5 @@
> sendAsyncMessage("reloadSuccess");
> }
>
> +// SafeBrowsing.jsm is initialized after mozEntries are added.
> +// Add observer to receive "finished" event. If this function
Wording: For the case when this function is called after the event had already been notified...
Attachment #8778795 -
Flags: review?(gpascutto) → review+
| Assignee | ||
Comment 70•9 years ago
|
||
Attachment #8776427 -
Attachment is obsolete: true
Attachment #8778795 -
Attachment is obsolete: true
| Assignee | ||
Comment 71•9 years ago
|
||
| Assignee | ||
Comment 72•9 years ago
|
||
Comment on attachment 8780334 [details] [diff] [review]
Patch - Stop caching Safe Browsing completions to disk
carry r+
Attachment #8780334 -
Flags: review+
| Assignee | ||
Updated•9 years ago
|
Keywords: checkin-needed
| Reporter | ||
Updated•9 years ago
|
Priority: -- → P2
Comment 73•9 years ago
|
||
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/fx-team/rev/05e5ceb799a1
Stop caching Safe Browsing completions to disk. r=gcp
Keywords: checkin-needed
Comment 74•9 years ago
|
||
| bugherder | ||
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
status-firefox51:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla51