Closed Bug 1338970 Opened 8 years ago Closed 8 years ago

Refine DB update scheme: update in a separate directory and swap in when finished

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla54

Tracking Flags:

Tracking

Status

firefox54

---

fixed

People

(Reporter: hchang, Assigned: hchang)

References

Details

Attachments

(3 files, 3 obsolete files)

Safebrowsing new DB update scheme.png 8 years ago Henry Chang [:hchang] 106.20 KB, image/png		Details
Bug 1338970 - Introduce 2-step DB update scheme. (background update then swap in) 8 years ago Henry Chang [:hchang] 59 bytes, text/x-review-board-request	francois : review+ gcp : review+	Details
Safebrowsing new DB update scheme.png 8 years ago Henry Chang [:hchang] 116.38 KB, image/png		Details
Safebrowsing new DB update scheme(2).png 8 years ago Henry Chang [:hchang] 107.74 KB, image/png		Details
SafeBrowsing Threading (Bug 1338970) - Breaking down Classifier__ApplyUpdate.svg 8 years ago Henry Chang [:hchang] 5.31 MB, image/svg+xml		Details
SafeBrowsing Threading (Bug 1338970) - Breaking down Classifier__ApplyUpdate.pdf 8 years ago Henry Chang [:hchang] 2.95 MB, application/pdf		Details

Henry Chang [:hchang]

Assignee

Description

•

8 years ago

In order to improve the overall DB update blocking time in Bug 1338017, we need to firstly change the DB update scheme to 1) Copy the current DB to a backup directory (we actually already do this) 2) Update against the backup directory (instead of the one in use) 3) Swap the backup and the in-use directory. This is only the operation where lock is required. After swapping, remove the backup directory. For some special cases: 1) If we met crash during update, the in-use directory will still be sane. In other words, if the backup directory is present after bootup, it implies a crash occurred in the previous update and we can just remove the backup directory. Question: backward compatibility. If we were using an old firefox and hitting update crash, the backup directory is sane but the in-use directory is be a junk. What if we restart with new firefox under this configuration? Possible solution: just use a different name from "safebrowsing-backup" and tell apart the two cases. 2) anything else?

Henry Chang [:hchang]

Assignee

Updated

•

8 years ago

Blocks: 1338017

Henry Chang [:hchang]

Assignee

Updated

•

8 years ago

Assignee: nobody → hchang

Henry Chang [:hchang]

Assignee

Comment 1

•

8 years ago

Attached image Safebrowsing new DB update scheme.png (obsolete) — Details

Comment hidden (mozreview-request)

Henry Chang [:hchang]

Assignee

Updated

•

8 years ago

Summary: Improve DB update scheme to reduce lock time → Refine DB update scheme: update in a separate directory and swap in when finished

Henry Chang [:hchang]

Assignee

Comment 3

•

8 years ago

Hi Francois, I haven't tested this patch but it should be not far from what I imagined. Please refer to the attached diagram and the patch for a early sense that what I am going to do. There is totally no worries about the DB concurrent read/write issue since this bug will not change the thread that the any Classifier function will be running on. Moving update code to a separate thread will be done in other bug. I'll keep refining my patch but the basic idea will not change. You can take a quick look at the diagram to make sure if we are on the same page. Thanks.

Flags: needinfo?(francois)

Comment hidden (mozreview-request)

Henry Chang [:hchang]

Assignee

Comment 5

•

8 years ago

Attached image Safebrowsing new DB update scheme.png (obsolete) — Details

Attachment #8836713 - Attachment is obsolete: true

Comment hidden (mozreview-request)

Henry Chang [:hchang]

Assignee

Updated

•

8 years ago

Attachment #8836716 - Flags: review?(gpascutto)

Attachment #8836716 - Flags: review?(francois)

Henry Chang [:hchang]

Assignee

Updated

•

8 years ago

Flags: needinfo?(francois)

Gian-Carlo Pascutto [:gcp]

Comment 7

•

8 years ago

mozreview-review

Comment on attachment 8836716 [details] Bug 1338970 - Introduce 2-step DB update scheme. (background update then swap in) https://reviewboard.mozilla.org/r/112070/#review113636 I think the way this is written potentially leads to total data loss as you're missing at least one recovery scenario. ::: toolkit/components/url-classifier/Classifier.cpp:569 (Diff revision 3) > + > + LOG(("Swapping directories %s and %s...", dirName1.get(), > + dirName2.get())); > + > + // 1. Rename "dirName1" to "temp" > + rv = aDir1->RenameToNative(nullptr, TEMP_DIR_NAME); What happens if we crash right after this operation? We've just moved mRootStoreDirectory away, and we haven't moved mUpdatingDirectory in its place. The recovery code only tries to delete an incomplete updating directory. At this point we don't know where our valid data is so we just lost everything. ::: toolkit/components/url-classifier/Classifier.cpp:573 (Diff revision 3) > + // 1. Rename "dirName1" to "temp" > + rv = aDir1->RenameToNative(nullptr, TEMP_DIR_NAME); > + if (NS_FAILED(rv)) { > + LOG(("Unable to rename %s to %s", dirName1.get(), > + TEMP_DIR_NAME.get())); > + return rv; // Nothting to roll back. typo: Nothing ::: toolkit/components/url-classifier/Classifier.cpp:583 (Diff revision 3) > + // object points to. > + nsCOMPtr<nsIFile> tempDirectory; > + rv = aParentDir->Clone(getter_AddRefs(tempDirectory)); > + rv = tempDirectory->AppendNative(TEMP_DIR_NAME); > + > + // 2. Rename "dirName2" to "dirName2". typo: dirName1

Attachment #8836716 - Flags: review?(gpascutto) → review-

Henry Chang [:hchang]

Assignee

Comment 8

•

8 years ago

I think the patch has been ready for review. The goal of the patch is described in comment 0. As for some implementation detail: 1) Most of the work is done in Classifier, including - Introduce mUpdatingDirectory (i.e. "safebrowsing-updating" for on-going update) - Remove the use of mToDeleteDirectory/mBackupDirectory but still do clean and recover in the very beginning just for backward compatibility. - Implement |SwapDirectoryContent|. It will be used in |SwapInUpdatedTables|. - Get rid of "2-step deletion". In the past, when we remove backup directory, we firstly rename it to "safebrowsing-to_delete" then do the actual removal. I don't see any necessity. 2) The overall flow is depicted in the attached diagram. The text version is - In Classifier::OpenDb(), recover "safebrowsing-backup" (if existed), remove "safebrowsing-to_delete" and "safebrowsing-updating" - In Classifier::ApplyUpdate, we a) Copy "safebrowsing" to "safebrowsing-updating", b) Do in-memory update c) Write out to "safebrowsing-updating" d) If (b) or (c) failed, dump "safebrowsing" to "safebrowsing-failedupdate", remove "safebrowsing-updating" and finish update. e) Swap directory "safebrowsing" and "safebrowing-updating". After swap, "safebrowsing" will contain the updated DB and "safebrowsing-updating" will contain outdated DB. f) Remove "safebrowsing-updating" 3) LookupCache::WriteFile, LookupCacheV4::WriteMetadat and HashStore::WriteFile will take an additional argument for "write out location", which was the same one for read. Note that we have to infer the actual write location by using |Classifier::GetPrivateStoreDirectory|. All the changes will not change the threading. I.e, everything will still run on the worker thread until Bug 1339050.

François Marier [:francois]

Updated

•

8 years ago

Status: NEW → ASSIGNED

Priority: -- → P2

Henry Chang [:hchang]

Assignee

Comment 9

•

8 years ago

mozreview-review-reply

Comment on attachment 8836716 [details] Bug 1338970 - Introduce 2-step DB update scheme. (background update then swap in) https://reviewboard.mozilla.org/r/112070/#review113636 I've addressed the case which you pointed out and can't think of others. All possible snapshots for a crash are 0) nothing: nothing to do. 1) "safebrowsing": Perfect! 2) "safebrowsing-updating" (new) and "safebrowsing" (old): new DB may have been downloaded but not active. Acceptable. 3) "safebrowsing-backup" (new) and "safebrowsing" (old): The case you pointed out. (assume we rename "updating" to "backup" first) The new DB will be recovered after reboot. 4) "safebrowsing-backup" (new) and "safebrowsing-updating" (old): The new DB will be recovered after reboot. 5) "safebrowsing" (new) and "safebrowsing-updating" (old): Perfect! New patch with a little change around "SwapInUpdatedTables" will be submitted later. I forgot to indicate the "in-memory" tables should be also swapped atomically with in-disk tables. But it will still be fine because we haven't introduce concurrent lookup and update. They still both on the same thread in this patch. > What happens if we crash right after this operation? We've just moved mRootStoreDirectory away, and we haven't moved mUpdatingDirectory in its place. > > The recovery code only tries to delete an incomplete updating directory. > > At this point we don't know where our valid data is so we just lost everything. My solution is using "safebrowsing-backup" as the intermediary of swapping. If we crash after renaming "safebrowsing" to "safebrowsing-backup", we will recover the databases from "safebrowsing-backup". (I kept the recovery code for backward compatibility at first but it turned out also effective in this case.)

Comment hidden (mozreview-request)

Henry Chang [:hchang]

Assignee

Updated

•

8 years ago

Blocks: 1340179

Henry Chang [:hchang]

Assignee

Comment 11

•

8 years ago

I just greatly simplified the entire design to make the changes less error-prone: Background: Classifier maintains a list of LookupCache, which is the only interface to read/write DB from/into disk and update in-memory data. In theory we don't need such a list: just create a new one whenever needed. However, creating a LookupCache requires opening files and data parsing. So we maintain a list of created LookupCache to avoid re-creation. Original solution (in the current patch): In order to have a "background-y" update, I modified LookupCache::WriteFile() so that we can assign a "special" directory to output files. Besides, I have to also do something like in Bug 1340179 to deal with the in-memory update issue. Simplified solution: Bug 1340179 is fairly simple but really annoying. We have to take care of every implementation detail. For example, we need to know what member variables of LookupCache will be mutated while doing an update. The idea behind the simplification is we don't have to stick to the current maintained list of LookupCache. Analogue to writing file to a "temp" directory, we can just allocate new LookupCaches for update only. Writing file and updating in-memory data will not interfere the in-use ones. Assume we originally had a list of LookupCaches like [goog-phish-shavar, goog-unwanted-shavar] and we've just downloaded something in memory for update. In addition to copying "safebrowsing" directory to "safebrowsing-updating", we "conceptually" clone the currently maintained LookupCache list. Since we haven't implemented deep copy function or copy constructor for LookupCache, "re-open" is required right now. So after update, we have in-use but old list |mLookupCache| => [goog-phish-shavar, goog-unwanted-shavar] new list |mNewLookupCache| => [goog-phish-shavar, goog-unwanted-shavar, goog-malware-shavar] and can then simply swap these two lists.

Henry Chang [:hchang]

Assignee

Comment 12

•

8 years ago

Attached image Safebrowsing new DB update scheme(2).png — Details

Attachment #8837079 - Attachment is obsolete: true

Comment hidden (mozreview-request)

Henry Chang [:hchang]

Assignee

Comment 14

•

8 years ago

The latest diagram attachment 8838502 [details] and patch revision 5 are for the new design in comment 11.

Henry Chang [:hchang]

Assignee

Comment 15

•

8 years ago

Attached image SafeBrowsing Threading (Bug 1338970) - Breaking down Classifier__ApplyUpdate.svg (obsolete) — Details

Henry Chang [:hchang]

Assignee

Comment 16

•

8 years ago

Attached file SafeBrowsing Threading (Bug 1338970) - Breaking down Classifier__ApplyUpdate.pdf — Details

Attachment #8839096 - Attachment is obsolete: true

Gian-Carlo Pascutto [:gcp]

Comment 17

•

8 years ago

mozreview-review

Comment on attachment 8836716 [details] Bug 1338970 - Introduce 2-step DB update scheme. (background update then swap in) https://reviewboard.mozilla.org/r/112070/#review115406 ::: toolkit/components/url-classifier/Classifier.cpp:620 (Diff revision 5) > +void > +Classifier::RemoveUpdateIntermediaries() > +{ > + // Remove old LookupCaches. > + for (auto c: mNewLookupCaches) { > + delete c; Maybe it's possible to add a debug assert somewhere, checking that if we are not in an update, this array is indeed cleared. Right now it seems all paths are properly covered, but manual memory management is something that can easily regress. ::: toolkit/components/url-classifier/Classifier.cpp:747 (Diff revision 5) > > } // End of scopedUpdatesClearer scope. > > - rv = RegenActiveTables(); > - NS_ENSURE_SUCCESS(rv, rv); > - > + rv = SwapInNewTablesAndCleanup(); > + if (NS_FAILED(rv)) { > + LOG(("Failed to swap in new tables.")); Does it make sense to propagate the error here? If we can't update the tables, does it make sense to do a full reset? (We don't ever want to get stuck with non-updatable, old tables)

Attachment #8836716 - Flags: review?(gpascutto) → review+

François Marier [:francois]

Comment 18

•

8 years ago

mozreview-review

Comment on attachment 8836716 [details] Bug 1338970 - Introduce 2-step DB update scheme. (background update then swap in) https://reviewboard.mozilla.org/r/112070/#review115500 Looks good, just a few suggestions to make the code clearer. ::: toolkit/components/url-classifier/Classifier.cpp:588 (Diff revision 5) > + rv = aDir2->RenameToNative(nullptr, dirName1); > + if (NS_FAILED(rv)) { > + LOG(("Failed to rename %s to %s. Rename temp directory back to %s", > + dirName2.get(), dirName1.get(), dirName1.get())); > + nsresult rbrv = tempDirectory->RenameToNative(nullptr, dirName1); > + if (NS_FAILED(rbrv)) { Given that we can't recover from this failure, should we do like in Step 3 and only have `NS_ENSURE_SUCCESS(rbrv, rbrv)` here? ::: toolkit/components/url-classifier/Classifier.cpp:626 (Diff revision 5) > + } > + mNewLookupCaches.Clear(); > + > + // Remove the "old" directory. (despite its looking-new name) > + if (NS_FAILED(mUpdatingDirectory->Remove(true))) { > + LOG(("Failed to remove updating directory but we don't care.")); Why do we not care? Maybe you can have "Failed to remove updating directory." in the LOG and then a comment above that explains why we ignore that error. ::: toolkit/components/url-classifier/Classifier.cpp:653 (Diff revision 5) > + > + // Step 2. Swap in in-memory tables. > + mLookupCaches.SwapElements(mNewLookupCaches); > + > + // Step 3. Re-generate active tables based on on-disk tables. > + // XXX: If step 2 failed, should we re-generate? (Answering the question you ask) Is there an alternative? If we don't have a working lookup cache, Safe Browsing won't work, no? We should answer the question before landing and remove that comment :) ::: toolkit/components/url-classifier/Classifier.cpp:997 (Diff revision 5) > } > > -nsresult > -Classifier::RemoveBackupTables() > +void > +Classifier::CopyInUseLookupCacheForUpdate() > { > - nsCString toDeleteName; > + // TODO: Implement deep copy function or copy ctor. For now Is this something we should open a follow-up bug for? ::: toolkit/components/url-classifier/Classifier.cpp:998 (Diff revision 5) > > -nsresult > -Classifier::RemoveBackupTables() > +void > +Classifier::CopyInUseLookupCacheForUpdate() > { > - nsCString toDeleteName; > - nsresult rv = mToDeleteDirectory->GetNativeLeafName(toDeleteName); > + // TODO: Implement deep copy function or copy ctor. For now > + // we just re-open every table we have opened. Another option Can you clarify what "re-opening every table we have opened" does? Unless I'm misunderstanding your code, I think what this does is: "Populate the new lookup cache based on the tables in the updating directory by loading them from disk."

Attachment #8836716 - Flags: review?(francois) → review+

Henry Chang [:hchang]

Assignee

Comment 19

•

8 years ago

mozreview-review-reply

Comment on attachment 8836716 [details] Bug 1338970 - Introduce 2-step DB update scheme. (background update then swap in) https://reviewboard.mozilla.org/r/112070/#review115406 Thanks! > Maybe it's possible to add a debug assert somewhere, checking that if we are not in an update, this array is indeed cleared. Right now it seems all paths are properly covered, but manual memory management is something that can easily regress. I might also have to add debug assert for intermediary directory. > Does it make sense to propagate the error here? If we can't update the tables, does it make sense to do a full reset? (We don't ever want to get stuck with non-updatable, old tables) It makes total sense. Thanks for reminding :)

Henry Chang [:hchang]

Assignee

Comment 20

•

8 years ago

mozreview-review-reply

Comment on attachment 8836716 [details] Bug 1338970 - Introduce 2-step DB update scheme. (background update then swap in) https://reviewboard.mozilla.org/r/112070/#review115500 Thanks, too! > Given that we can't recover from this failure, should we do like in Step 3 and only have `NS_ENSURE_SUCCESS(rbrv, rbrv)` here? Yes I think we should :) > Why do we not care? > > Maybe you can have "Failed to remove updating directory." in the LOG and then a comment above that explains why we ignore that error. Because it's just a clean up and the failure will not cause anything to misbahave until the next udpate. I will add it to the comment then. > (Answering the question you ask) Is there an alternative? If we don't have a working lookup cache, Safe Browsing won't work, no? > > We should answer the question before landing and remove that comment :) The question should no longer be a question because step 2 is infalliable and the step 3 will not be reached if step 1 failed. To sum up, if we failed to swap in update intermediaries, every thing in-use should remain untouched (since we just didn't touch the in-use stuff.) > Is this something we should open a follow-up bug for? Sure and I'll not add dependency here because it's a kind of optimization. > Can you clarify what "re-opening every table we have opened" does? > > Unless I'm misunderstanding your code, I think what this does is: "Populate the new lookup cache based on the tables in the updating directory by loading them from disk." In this stage, the update hasn't started and we'd only like to "mirror" the already opended LookupCaches for update. (just like we copy "safebrowsing" to "safebrowsing-updating" for update.)

Henry Chang [:hchang]

Assignee

Updated

•

8 years ago

Target Milestone: --- → mozilla53

Comment hidden (mozreview-request)

Pulsebot

Comment 23

•

8 years ago

Pushed by hchang@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/2c1fc455e81e Introduce 2-step DB update scheme. (background update then swap in) r=francois,gcp

Wes Kocher (:KWierso) (Not reading bugmail; email directly if needed)

Comment 24

•

8 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/2c1fc455e81e

Status: ASSIGNED → RESOLVED

Closed: 8 years ago

status-firefox54: --- → fixed

Resolution: --- → FIXED

Ryan VanderMeulen [:RyanVM]

Updated

•

8 years ago

Target Milestone: mozilla53 → mozilla54

Ryan VanderMeulen [:RyanVM]

Updated

•

8 years ago

Depends on: 1342397

You need to log in before you can comment on or make changes to this bug.