Closed
Bug 1338970
Opened 8 years ago
Closed 8 years ago
Refine DB update scheme: update in a separate directory and swap in when finished
Categories
(Toolkit :: Safe Browsing, defect, P2)
Toolkit
Safe Browsing
Tracking
()
RESOLVED
FIXED
mozilla54
Tracking | Status | |
---|---|---|
firefox54 | --- | fixed |
People
(Reporter: hchang, Assigned: hchang)
References
Details
Attachments
(3 files, 3 obsolete files)
In order to improve the overall DB update blocking time in Bug 1338017, we need to firstly change the DB update scheme to
1) Copy the current DB to a backup directory (we actually already do this)
2) Update against the backup directory (instead of the one in use)
3) Swap the backup and the in-use directory. This is only the operation
where lock is required. After swapping, remove the backup directory.
For some special cases:
1) If we met crash during update, the in-use directory will still be sane.
In other words, if the backup directory is present after bootup, it implies
a crash occurred in the previous update and we can just remove the backup
directory.
Question: backward compatibility. If we were using an old firefox and hitting
update crash, the backup directory is sane but the in-use directory
is be a junk. What if we restart with new firefox under this
configuration?
Possible solution: just use a different name from "safebrowsing-backup" and
tell apart the two cases.
2) anything else?
Assignee | ||
Updated•8 years ago
|
Assignee: nobody → hchang
Assignee | ||
Comment 1•8 years ago
|
||
Comment hidden (mozreview-request) |
Assignee | ||
Updated•8 years ago
|
Summary: Improve DB update scheme to reduce lock time → Refine DB update scheme: update in a separate directory and swap in when finished
Assignee | ||
Comment 3•8 years ago
|
||
Hi Francois,
I haven't tested this patch but it should be not far from what I imagined.
Please refer to the attached diagram and the patch for a early sense that
what I am going to do.
There is totally no worries about the DB concurrent read/write issue since
this bug will not change the thread that the any Classifier function will
be running on. Moving update code to a separate thread will be done in
other bug.
I'll keep refining my patch but the basic idea will not change. You can
take a quick look at the diagram to make sure if we are on the same page.
Thanks.
Flags: needinfo?(francois)
Comment hidden (mozreview-request) |
Assignee | ||
Comment 5•8 years ago
|
||
Attachment #8836713 -
Attachment is obsolete: true
Comment hidden (mozreview-request) |
Assignee | ||
Updated•8 years ago
|
Attachment #8836716 -
Flags: review?(gpascutto)
Attachment #8836716 -
Flags: review?(francois)
Assignee | ||
Updated•8 years ago
|
Flags: needinfo?(francois)
Comment 7•8 years ago
|
||
mozreview-review |
Comment on attachment 8836716 [details]
Bug 1338970 - Introduce 2-step DB update scheme. (background update then swap in)
https://reviewboard.mozilla.org/r/112070/#review113636
I think the way this is written potentially leads to total data loss as you're missing at least one recovery scenario.
::: toolkit/components/url-classifier/Classifier.cpp:569
(Diff revision 3)
> +
> + LOG(("Swapping directories %s and %s...", dirName1.get(),
> + dirName2.get()));
> +
> + // 1. Rename "dirName1" to "temp"
> + rv = aDir1->RenameToNative(nullptr, TEMP_DIR_NAME);
What happens if we crash right after this operation? We've just moved mRootStoreDirectory away, and we haven't moved mUpdatingDirectory in its place.
The recovery code only tries to delete an incomplete updating directory.
At this point we don't know where our valid data is so we just lost everything.
::: toolkit/components/url-classifier/Classifier.cpp:573
(Diff revision 3)
> + // 1. Rename "dirName1" to "temp"
> + rv = aDir1->RenameToNative(nullptr, TEMP_DIR_NAME);
> + if (NS_FAILED(rv)) {
> + LOG(("Unable to rename %s to %s", dirName1.get(),
> + TEMP_DIR_NAME.get()));
> + return rv; // Nothting to roll back.
typo: Nothing
::: toolkit/components/url-classifier/Classifier.cpp:583
(Diff revision 3)
> + // object points to.
> + nsCOMPtr<nsIFile> tempDirectory;
> + rv = aParentDir->Clone(getter_AddRefs(tempDirectory));
> + rv = tempDirectory->AppendNative(TEMP_DIR_NAME);
> +
> + // 2. Rename "dirName2" to "dirName2".
typo: dirName1
Attachment #8836716 -
Flags: review?(gpascutto) → review-
Assignee | ||
Comment 8•8 years ago
|
||
I think the patch has been ready for review. The goal of the patch is described in comment 0.
As for some implementation detail:
1) Most of the work is done in Classifier, including
- Introduce mUpdatingDirectory (i.e. "safebrowsing-updating" for on-going update)
- Remove the use of mToDeleteDirectory/mBackupDirectory but still do clean and recover
in the very beginning just for backward compatibility.
- Implement |SwapDirectoryContent|. It will be used in |SwapInUpdatedTables|.
- Get rid of "2-step deletion". In the past, when we remove backup directory, we firstly
rename it to "safebrowsing-to_delete" then do the actual removal. I don't see any
necessity.
2) The overall flow is depicted in the attached diagram. The text version is
- In Classifier::OpenDb(), recover "safebrowsing-backup" (if existed),
remove "safebrowsing-to_delete" and "safebrowsing-updating"
- In Classifier::ApplyUpdate, we
a) Copy "safebrowsing" to "safebrowsing-updating",
b) Do in-memory update
c) Write out to "safebrowsing-updating"
d) If (b) or (c) failed, dump "safebrowsing" to "safebrowsing-failedupdate",
remove "safebrowsing-updating" and finish update.
e) Swap directory "safebrowsing" and "safebrowing-updating". After swap,
"safebrowsing" will contain the updated DB and "safebrowsing-updating"
will contain outdated DB.
f) Remove "safebrowsing-updating"
3) LookupCache::WriteFile, LookupCacheV4::WriteMetadat and HashStore::WriteFile
will take an additional argument for "write out location", which was the same
one for read. Note that we have to infer the actual write location by using
|Classifier::GetPrivateStoreDirectory|.
All the changes will not change the threading. I.e, everything will still run
on the worker thread until Bug 1339050.
Updated•8 years ago
|
Status: NEW → ASSIGNED
Priority: -- → P2
Assignee | ||
Comment 9•8 years ago
|
||
mozreview-review-reply |
Comment on attachment 8836716 [details]
Bug 1338970 - Introduce 2-step DB update scheme. (background update then swap in)
https://reviewboard.mozilla.org/r/112070/#review113636
I've addressed the case which you pointed out and can't think of others. All possible snapshots for a crash are
0) nothing: nothing to do.
1) "safebrowsing": Perfect!
2) "safebrowsing-updating" (new) and "safebrowsing" (old): new DB may have been downloaded but not active. Acceptable.
3) "safebrowsing-backup" (new) and "safebrowsing" (old): The case you pointed out. (assume we rename "updating" to "backup" first)
The new DB will be recovered after reboot.
4) "safebrowsing-backup" (new) and "safebrowsing-updating" (old): The new DB will be recovered after reboot.
5) "safebrowsing" (new) and "safebrowsing-updating" (old): Perfect!
New patch with a little change around "SwapInUpdatedTables" will be submitted later.
I forgot to indicate the "in-memory" tables should be also swapped atomically with in-disk tables.
But it will still be fine because we haven't introduce concurrent lookup and update.
They still both on the same thread in this patch.
> What happens if we crash right after this operation? We've just moved mRootStoreDirectory away, and we haven't moved mUpdatingDirectory in its place.
>
> The recovery code only tries to delete an incomplete updating directory.
>
> At this point we don't know where our valid data is so we just lost everything.
My solution is using "safebrowsing-backup" as the intermediary of swapping. If we crash after renaming "safebrowsing" to "safebrowsing-backup", we will recover the databases from "safebrowsing-backup". (I kept the recovery code for backward
compatibility at first but it turned out also effective in this case.)
Comment hidden (mozreview-request) |
Assignee | ||
Comment 11•8 years ago
|
||
I just greatly simplified the entire design to make the changes less error-prone:
Background:
Classifier maintains a list of LookupCache, which is the only interface to
read/write DB from/into disk and update in-memory data. In theory we don't
need such a list: just create a new one whenever needed. However, creating
a LookupCache requires opening files and data parsing. So we maintain
a list of created LookupCache to avoid re-creation.
Original solution (in the current patch):
In order to have a "background-y" update, I modified LookupCache::WriteFile()
so that we can assign a "special" directory to output files. Besides, I have to
also do something like in Bug 1340179 to deal with the in-memory update issue.
Simplified solution:
Bug 1340179 is fairly simple but really annoying. We have to take care of
every implementation detail. For example, we need to know what member variables
of LookupCache will be mutated while doing an update.
The idea behind the simplification is we don't have to stick to the current
maintained list of LookupCache. Analogue to writing file to a "temp" directory,
we can just allocate new LookupCaches for update only. Writing file and updating
in-memory data will not interfere the in-use ones.
Assume we originally had a list of LookupCaches like
[goog-phish-shavar, goog-unwanted-shavar]
and we've just downloaded something in memory for update. In addition to
copying "safebrowsing" directory to "safebrowsing-updating", we "conceptually"
clone the currently maintained LookupCache list. Since we haven't implemented
deep copy function or copy constructor for LookupCache, "re-open" is required
right now.
So after update, we have
in-use but old list |mLookupCache| => [goog-phish-shavar, goog-unwanted-shavar]
new list |mNewLookupCache| => [goog-phish-shavar, goog-unwanted-shavar, goog-malware-shavar]
and can then simply swap these two lists.
Assignee | ||
Comment 12•8 years ago
|
||
Attachment #8837079 -
Attachment is obsolete: true
Comment hidden (mozreview-request) |
Assignee | ||
Comment 14•8 years ago
|
||
The latest diagram attachment 8838502 [details] and patch revision 5 are for the new design in comment 11.
Assignee | ||
Comment 15•8 years ago
|
||
Assignee | ||
Comment 16•8 years ago
|
||
Attachment #8839096 -
Attachment is obsolete: true
Comment 17•8 years ago
|
||
mozreview-review |
Comment on attachment 8836716 [details]
Bug 1338970 - Introduce 2-step DB update scheme. (background update then swap in)
https://reviewboard.mozilla.org/r/112070/#review115406
::: toolkit/components/url-classifier/Classifier.cpp:620
(Diff revision 5)
> +void
> +Classifier::RemoveUpdateIntermediaries()
> +{
> + // Remove old LookupCaches.
> + for (auto c: mNewLookupCaches) {
> + delete c;
Maybe it's possible to add a debug assert somewhere, checking that if we are not in an update, this array is indeed cleared. Right now it seems all paths are properly covered, but manual memory management is something that can easily regress.
::: toolkit/components/url-classifier/Classifier.cpp:747
(Diff revision 5)
>
> } // End of scopedUpdatesClearer scope.
>
> - rv = RegenActiveTables();
> - NS_ENSURE_SUCCESS(rv, rv);
> -
> + rv = SwapInNewTablesAndCleanup();
> + if (NS_FAILED(rv)) {
> + LOG(("Failed to swap in new tables."));
Does it make sense to propagate the error here? If we can't update the tables, does it make sense to do a full reset? (We don't ever want to get stuck with non-updatable, old tables)
Attachment #8836716 -
Flags: review?(gpascutto) → review+
Comment 18•8 years ago
|
||
mozreview-review |
Comment on attachment 8836716 [details]
Bug 1338970 - Introduce 2-step DB update scheme. (background update then swap in)
https://reviewboard.mozilla.org/r/112070/#review115500
Looks good, just a few suggestions to make the code clearer.
::: toolkit/components/url-classifier/Classifier.cpp:588
(Diff revision 5)
> + rv = aDir2->RenameToNative(nullptr, dirName1);
> + if (NS_FAILED(rv)) {
> + LOG(("Failed to rename %s to %s. Rename temp directory back to %s",
> + dirName2.get(), dirName1.get(), dirName1.get()));
> + nsresult rbrv = tempDirectory->RenameToNative(nullptr, dirName1);
> + if (NS_FAILED(rbrv)) {
Given that we can't recover from this failure, should we do like in Step 3 and only have `NS_ENSURE_SUCCESS(rbrv, rbrv)` here?
::: toolkit/components/url-classifier/Classifier.cpp:626
(Diff revision 5)
> + }
> + mNewLookupCaches.Clear();
> +
> + // Remove the "old" directory. (despite its looking-new name)
> + if (NS_FAILED(mUpdatingDirectory->Remove(true))) {
> + LOG(("Failed to remove updating directory but we don't care."));
Why do we not care?
Maybe you can have "Failed to remove updating directory." in the LOG and then a comment above that explains why we ignore that error.
::: toolkit/components/url-classifier/Classifier.cpp:653
(Diff revision 5)
> +
> + // Step 2. Swap in in-memory tables.
> + mLookupCaches.SwapElements(mNewLookupCaches);
> +
> + // Step 3. Re-generate active tables based on on-disk tables.
> + // XXX: If step 2 failed, should we re-generate?
(Answering the question you ask) Is there an alternative? If we don't have a working lookup cache, Safe Browsing won't work, no?
We should answer the question before landing and remove that comment :)
::: toolkit/components/url-classifier/Classifier.cpp:997
(Diff revision 5)
> }
>
> -nsresult
> -Classifier::RemoveBackupTables()
> +void
> +Classifier::CopyInUseLookupCacheForUpdate()
> {
> - nsCString toDeleteName;
> + // TODO: Implement deep copy function or copy ctor. For now
Is this something we should open a follow-up bug for?
::: toolkit/components/url-classifier/Classifier.cpp:998
(Diff revision 5)
>
> -nsresult
> -Classifier::RemoveBackupTables()
> +void
> +Classifier::CopyInUseLookupCacheForUpdate()
> {
> - nsCString toDeleteName;
> - nsresult rv = mToDeleteDirectory->GetNativeLeafName(toDeleteName);
> + // TODO: Implement deep copy function or copy ctor. For now
> + // we just re-open every table we have opened. Another option
Can you clarify what "re-opening every table we have opened" does?
Unless I'm misunderstanding your code, I think what this does is: "Populate the new lookup cache based on the tables in the updating directory by loading them from disk."
Attachment #8836716 -
Flags: review?(francois) → review+
Assignee | ||
Comment 19•8 years ago
|
||
mozreview-review-reply |
Comment on attachment 8836716 [details]
Bug 1338970 - Introduce 2-step DB update scheme. (background update then swap in)
https://reviewboard.mozilla.org/r/112070/#review115406
Thanks!
> Maybe it's possible to add a debug assert somewhere, checking that if we are not in an update, this array is indeed cleared. Right now it seems all paths are properly covered, but manual memory management is something that can easily regress.
I might also have to add debug assert for intermediary directory.
> Does it make sense to propagate the error here? If we can't update the tables, does it make sense to do a full reset? (We don't ever want to get stuck with non-updatable, old tables)
It makes total sense. Thanks for reminding :)
Assignee | ||
Comment 20•8 years ago
|
||
mozreview-review-reply |
Comment on attachment 8836716 [details]
Bug 1338970 - Introduce 2-step DB update scheme. (background update then swap in)
https://reviewboard.mozilla.org/r/112070/#review115500
Thanks, too!
> Given that we can't recover from this failure, should we do like in Step 3 and only have `NS_ENSURE_SUCCESS(rbrv, rbrv)` here?
Yes I think we should :)
> Why do we not care?
>
> Maybe you can have "Failed to remove updating directory." in the LOG and then a comment above that explains why we ignore that error.
Because it's just a clean up and the failure will not cause anything to misbahave until the next udpate. I will add it to the comment then.
> (Answering the question you ask) Is there an alternative? If we don't have a working lookup cache, Safe Browsing won't work, no?
>
> We should answer the question before landing and remove that comment :)
The question should no longer be a question because step 2 is infalliable and the step 3 will not be reached if step 1 failed.
To sum up, if we failed to swap in update intermediaries, every thing in-use should remain untouched (since we just didn't touch the in-use stuff.)
> Is this something we should open a follow-up bug for?
Sure and I'll not add dependency here because it's a kind of optimization.
> Can you clarify what "re-opening every table we have opened" does?
>
> Unless I'm misunderstanding your code, I think what this does is: "Populate the new lookup cache based on the tables in the updating directory by loading them from disk."
In this stage, the update hasn't started and we'd only like to "mirror" the already opended LookupCaches for update. (just like we copy "safebrowsing" to "safebrowsing-updating" for update.)
Assignee | ||
Updated•8 years ago
|
Target Milestone: --- → mozilla53
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment 23•8 years ago
|
||
Pushed by hchang@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/2c1fc455e81e
Introduce 2-step DB update scheme. (background update then swap in) r=francois,gcp
Comment 24•8 years ago
|
||
bugherder |
Updated•8 years ago
|
Target Milestone: mozilla53 → mozilla54
You need to log in
before you can comment on or make changes to this bug.
Description
•