Closed Bug 818584 Opened 13 years ago Closed 12 years ago

Discard consecutive duplicate bookmark backups by comparing hashes

Categories

(Toolkit :: Places, defect)

defect
Not set
normal

Tracking


VERIFIED FIXED
mozilla31

People

(Reporter: rnewman, Assigned: mak)

References

Details

(Whiteboard: p=5 s=it-31c-30a-29b.1 [qa-])

Attachments

(1 file, 13 obsolete files)

Blocks: 818587
No longer blocks: 818587
OS: Android → All
Hardware: ARM → All
The patch for bug 818593 uses a preference to store metadata for backups. Shall we also store the backup file's hash in that preference when a new backup file is written to the user's disk? Then, when the next automatic backup happens, the hash of the last backup file can be retrieved from the preference instead of reading its content to recalculate the hash for comparison. Consider that a backup file can be quite large. What do you think?
Assignee: nobody → raymond
Status: NEW → ASSIGNED
I would be very hesitant to suggest decoupling storage of a hash from its file.

* You lose the ability to detect file corruption or truncation!
* Prefs aren't always flushed, so you have to handle the case where the hash is missing.

The solution IMO is to do the work in the background. And in any case, it takes a modern machine 50msec to MD5 a 20MB file, and we're planning to do this once per day? Not something I'd be worried about.
(In reply to Richard Newman [:rnewman] from comment #2)
> I would be very hesitant to suggest decoupling storage of a hash from its
> file.
>
> * You lose the ability to detect file corruption or truncation!

These cases are uninteresting for our scope, though, and in 5 years I have yet to see either of them happen.

> * Prefs aren't always flushed, so you have to handle the case where the hash
> is missing.

Even then, it would still be a win to have it in most cases (and when we don't have one we will just fall back to calculating it). In the end we just need the hash for the last backup; 99% of the cases are covered, and the others are exotic enough.

That said, I'm not sure what solution you're suggesting. Recalculate the hash every time? How is that better than recalculating it only when it's missing?
(In reply to Marco Bonardo [:mak] (Away Mar 1) from comment #3)
> These cases are uninteresting for our scope though, and so far in 5 years I
> have yet to see any of them happening.

I think it's of some interest. These are backups, after all; we definitely don't want to say "oh, already got this backup" only to find that the file never got flushed correctly and is sitting at zero bytes. That you haven't seen it doesn't mean that there isn't a backup file on a user's disk somewhere that's truncated, no? If we stretch out the lifecycle of that backup file, the importance grows. I agree that it's not a massive risk, but if we can cover it, that'd be nice.

> That said, I'm not sure what's the solution you suggest, recalculate the
> hash everytime? how is that better than recalculating it only when it's
> missing?

Because prefs. Flush the new backup file (we have to do this anyway). If it's the same size as the previous entry, calculate the hash of both (we have to do one file anyway), and if they match, discard the oldest. That's nice and simple, and in many cases you won't need to do any work at all, because the sizes won't match.

Caching hashes in prefs can save you, say, 100msec spent in the background hashing a backup or two, but it adds a pref lookup, the need to clean up prefs as you clean up backups, and a pref write alongside each new backup (which might cause a disk write that takes even longer!), and you still need to handle the case of the pref being missing.

Phrased differently: why would you cache in prefs a value that you read less often than you write it, maybe once every two or three days? I would regard this optimization as premature until you get telemetry telling you otherwise. And even then, get telemetry for the version with prefs, and see whether the additional pref handling makes things better or worse.
As we found out from FHR, prefs are not a magical cheap storage layer :/

My hunch is that for the 99% case, just hashing the file when needed is the most straightforward approach. I chunked up a 7MB fake bookmark backup file (20 copies of my bookmark backups):

md5 /tmp/2.json  0.02s user 0.01s system 110% cpu 0.025 total

So long as it's backgrounded, that's fine IMO for once per day, a small percentage of the time.

And if you really want to speed things up, put some simple known identifier -- bookmark count? highest bookmark ID? -- in the backup filename. Now you don't even need to hash to spot most changes, and you only need to hash when two files have the same size and bookmark count.
(In reply to Richard Newman [:rnewman] from comment #4)
> I think it's of some interest. These are backups, after all; we definitely
> don't want to say "oh, already got this backup" only to find that the file
> never got flushed correctly and is sitting at zero bytes.

I'm still not sure I get the point: if we don't complete the write, we should store neither the backup nor its metadata. If that happens, there's a bug to fix. I know the new version will use atomic writes from OS.File, so I'm not sure what we are trying to cover.

> Caching hashes in prefs can avoid you spending, say, 100msec in the
> background to hash a backup or two, but it adds a pref lookup, the need to

We already pay this price.

> clean up prefs as you clean up backups, a pref write alongside each new
> backup

Again, we already pay this price.

> Phrased differently: why would you cache in prefs a value that you read less
> than you write, maybe once every two or three days?

Because we read and write it already, so there is no added cost.

> And if you really want to speed things up, put some simple known identifier
> -- bookmark count? highest bookmark ID? -- in the backup filename. Now you
> don't even need to hash to spot most changes, and you only need to hash if
> two files have the same size and bookmark count.

We can do that; we are adding the bookmark count, though to a pref.
(In reply to Marco Bonardo [:mak] (Away Mar 1) from comment #5)
> I'm still not sure I get the point, if we don't complete the write we should
> not store the backup nor its metadata. If that happens there's a bug to
> fix. I know new version will use atomic writes from OS.file so I'm not sure
> what are we trying to cover.

(a) if OS.File is wrong; (b) if the filesystem isn't transactional (it says it's written, but it's not necessarily all on disk yet); (c) if there is a post-write issue, like on-disk corruption, a poorly restored profile backup, cosmic rays, malware, etc.

This is why backups and downloads are verified out-of-band. There's a lot that can go wrong.

> cause we read and write it already, so there is no added cost.

You already write the hash? Or you already write a chunk of data that describes a new backup?

The song I've heard from the perf team is "prefs bad, prefs slow, don't use prefs". And in this case we're writing to prefs when IMO we don't need to, because it would be read so infrequently.

> we can do that, we are adding the bookmark count, to a pref though.

Adding it to the filename is free, and it works when users move bookmark backups between machines or profiles, or Reset Firefox.
(In reply to Richard Newman [:rnewman] from comment #6)
> This is why backups and downloads are verified out-of-band. There's a lot
> that can go wrong.

It's still so infrequent a case that it doesn't deserve added code complication, IMO. Btw, I still don't get where you are suggesting to store this hash: again in the filename?

> The song I've heard from the perf team is "prefs bad, prefs slow, don't use
> prefs". And in this case we're writing to prefs when IMO we don't need to,
> because it would be read so infrequently.

I'm not sure I understand the frequent/infrequent thing: in one case we have to read from the FS, in the other we have to read from a pref. The frequency is the same, basically once per day (when we back up, though if a backup fails it may be multiple times). The pref file could already be in the OS memory map, so the read could be really cheap or even free, considering how often we access prefs; accessing the backup file, on the other hand, has to hit the disk, since it's unlikely the backup file is in memory.

> > we can do that, we are adding the bookmark count, to a pref though.
>
> Adding it to the filename is free, and works when users move bookmark
> backups between machines or profiles, or Reset Firefox.

The code relies on the current file naming to sort and extract the date, so the risk of regressions in the filename parsing code is higher; I would not be surprised if external apps/add-ons have similar code that relies on the current name (we can check add-ons at least). The gain must clearly be higher than these costs. Moreover, the filename is less extensible (with new properties), and it complicates the code that has to handle all the missing cases due to old backups.

In the end I honestly don't have enough data to tell which of the two approaches is better. According to your suggestion this would end up something like bookmarks-date-count-hash-somefuturemetadata.json, is this correct?
Attached patch WIP - preference approach (obsolete) — Splinter Review
This patch is using the preference approach. I will wait for the final decision before carrying on with this bug.
Attachment #717949 - Attachment is patch: true
(In reply to Marco Bonardo [:mak] (Away Mar 1) from comment #7)
> It's still a so infrequent case that doesn't deserve added code
> complication, imo.
> Btw, I still don't get where are you suggesting to store this hash, again in
> the filename?

Oh, I'm not. That was a side comment. Pretend I never mentioned integrity checking :D

> I'm not sure I understand the frequent/infrequent thing, one case we have to
> read from the FS, the other case we have to read from a pref. The frequency
> is the same, that is basically once per day (when we backup, though if a
> backup fails may be multiple times). The pref file could already be in the
> OS memory map, thus the read could be really cheap or even free, considered
> how often we access prefs, while for sure accessing the backup file has to
> hit the disk, it's unlikely the backup file is in memory.

Storing the hash in a pref:

* Always compute the hash on write; write it to prefs.
* Read the old hash from prefs on write, if the count and size are the same.
* Delete pref entries on cleanup.
* Needs code for pref handling.
* Might prompt a pref flush.
* No verification that the hashed file contents match the current file contents.

Not storing it:

* Only compute the new file's hash on write if the previous file has the same count and size (this is a nice win).
* Compute the old file's hash on write in those same circumstances.
* No pref cleanup, write, or flush.
* Offers current file content comparison.

The only situation in which not storing is more costly:

* Where the new file is a duplicate of the old one, and
* Computing the old file's hash takes longer than the amortized cost of all of the pref interaction.

> the code relies on current file naming to sort and extract date, so the risk
> of regressions in the filename parsing code is higher, I would not be
> surprised if external apps/add-ons have similar code that relies on the
> current name (we can check add-ons at least). The gain must clearly be
> higher than these costs.

Fair enough.
> In the end I honestly don't have enough data to tell what is better across
> the two approaches.
> according to your suggestion this would end up something like
> bookmarks-date-count-hash-somefuturemetadata.json, is this correct?

Probably bookmarks-date-count.json. No need to over-engineer this, I think.
I'm not a dev, so forgive me if I missed something in scanning this bug or if this is a silly question, but are we able to calculate the hash before saving a backup to disk? If so, and we're able to find that two backups are identical, then how about we just update the filename of the already saved backup and discard the new one? This would save us from writing to disk.
(In reply to Mark S. from comment #10) > I'm not a dev, so forgive me if I missed something in scanning this bug or > if this is a silly question, but are we able to calculate the hash before > saving a backup to disk? If so, and we're able to find that two backups are > identical, then how about we just update the filename of the already saved > backup and discard the new one? This would save us from writing to disk. We could, by generating the backup in memory. That's probably not a win for a daily operation. A better approach is to have a generation number in the DB, but that incorporates a certain amount of trust at both ends: correct persistence and correct DB tracking.
(In reply to Richard Newman [:rnewman] from comment #11) > A better approach is to have a generation number in the DB, but that > incorporates a certain amount of trust at both ends: correct persistence and > correct DB tracking. I take it then we don't have that trust sufficiently at both ends.
(In reply to Mark S. from comment #12) > I take it then we don't have that trust sufficiently at both ends. Not really. It's possible to directly modify the places DB (inside or outside of Firefox). And it's possible for a backup file to be truncated, corrupted, overwritten, or whatever. It would be tragic to trust that an old backup is fine, never write a new one, and then discover when you most need it that it isn't usable.
We will add info to the filename; I just commented in bug 818593, thanks for the suggestion. While I think it's not a very flexible store, I also don't think we'll need that much added flexibility in the future, and we have to access the filesystem regardless.
Attachment #717949 - Attachment is obsolete: true
Depends on: 818593
Attached patch v2 (obsolete) — Splinter Review
This depends on the patch in bug 818593.
Attachment #726517 - Flags: review?(mak77)
Attached patch v3 (obsolete) — Splinter Review
Added test
Attachment #726517 - Attachment is obsolete: true
Attachment #726517 - Flags: review?(mak77)
Attachment #726533 - Flags: review?(mak77)
Comment on attachment 726533 [details] [diff] [review]
v3

Review of attachment 726533 [details] [diff] [review]:
-----------------------------------------------------------------

I can't review this until bug 818593 reaches r+; please re-request review at that time.

Btw, I'm not sure I got the approach you took; what we need is to save I/O, and this doesn't seem to do that. What we want is:

1. generate a JSON string containing the backup in memory
2. calculate the hash for this JSON string
3. extract the hash from the filename of the last backup
4. if they are the same, do nothing
5. otherwise write it to a new backup

If you need to convert the string to a byte array to calculate the hash, Yoric suggested then using OS.File to save the file, since it's cheaper.
Attachment #726533 - Flags: review?(mak77)
(In reply to Marco Bonardo [:mak] from comment #17)
> Btw, I'm not sure I got the approach you took, what we need is to save I/O
> and this doesn't seem to do that.

I actually disagree about the goal! The goal IMO is to save on redundant copies on disk, which has two benefits:

* In the same disk space we can store a longer span of backups.
* In the same set of backups we don't have duplicate bookmark sets, which are confusing when presented to the user (i.e., we have meaningful snapshots).

Furthermore, we can't save on I/O in every case. And I/O after the file has been written will be cheap: it'll be in every cache up and down the stack. And since we only do this once a day, I wouldn't be massively concerned with I/O cost here. See comment 9.

> What we want is:
> 1. generate json string containing the backup in memory
> 2. calculate hash for this json string
> 3. extract hash from filename of the last backup
> 4. if they are the same do nothing

I prefer:

* Check the previous backup filename's bookmark *count* (not hash). We store a count in the name, not the hash, as previously discussed. The count is just a cheap way of short-circuiting.
* If the count is identical, we need to check the hash.
* Compute the hash of the bookmark backup on disk. Do not persist the hash anywhere; compute it each day, if the prior criteria are all met.
* Compute the new hash in memory or on disk (doesn't matter too much).
* Write if necessary.
* If the count isn't identical, we don't need to check anything: just write out the backup.
(In reply to Richard Newman [:rnewman] from comment #18)
> I actually disagree about the goal! The goal IMO is to save on redundant
> copies on disk

We need to take the I/O problem into account from the beginning; it doesn't make sense to solve half of the problem.

> And I/O after the file has been written will be cheap

Indeed, I would like to avoid the "has been written" part. Backups are one of the greatest shutdown offenders as of now, and it looks like the "happens only sometimes" reasoning hasn't worked at all for them, so I don't think it will start working now. The most common case is that bookmarks don't change across sessions, which means that in your proposal the most common case is having to compute the hash of a possibly large file on disk.

> * If count is identical, we need to check the hash.
> * Compute the hash of the bookmark backup on disk.

That means reading all of the backup on the main thread; some backups are many MB of data.

I'd actually like to get constructive criticism on my proposal, which seems to solve both the I/O and the useless-backups problems.
The problems I see:

* You have eliminated all robustness against file system damage, external corruption, etc. Not only do we no longer make redundant copies, but we also don't verify the contents of the one we have. I think that's unsuitable for probably the only backup that most users have. And I'd wager that most users are not running on fault-tolerant file systems, and don't have versioned storage of their Places backups. This doesn't affect me, with Time Machine on a huge disk, but it affects pretty much every non-developer user I've ever talked to. My assertion is that before deciding *to not back up your bookmarks* we should positively determine that the existing backup is good. This is a fundamental point that dictates my suggested direction. See my analysis of the two approaches in comment 9. A solution to main-thread I/O: don't do it on the main thread.

* My intuition is that it's more efficient to write smaller JSON chunks to an output stream than to compose a whole document in memory. You can probably hash as you go, even for a stream.

* Adding the bookmark count check is an easy win. Whenever a user has added or removed a bookmark, you don't have to check hashes. This is worth adding even to your scheme, I think, perhaps enhanced to be lazy about computing hashes for the written file: if a user bookmarks a lot, you never even switch to "hash mode". For file content hashes it's a huge win.
(In reply to Richard Newman [:rnewman] from comment #20)
> * You have eliminated all robustness against file system damage, external
> corruption, etc. Not only do we no longer make redundant copies, but we also
> don't verify the contents of the one we have.

File system damage is not something we really care about; it can actually do much worse things than touching some bookmarks. As you said, users, or even the OS, already have tools to protect against file system damage; it's definitely not a browser's task. What we must care about is doing our best to ensure the data we write is written safely, with atomic writes or renames. Your paranoid check for validity can easily be added in the future if we ever find the need for it (e.g. we get many reports of backups invalid due to fs corruption).

> * My intuition is that it's more efficient to write smaller JSON chunks to
> an output stream than to compose a whole document in memory. You can
> probably hash as you go even for a stream.

Once we have measurements of what's more performant we can discuss it; intuition is a lost bet. I don't exclude that we may reach a point where we figure out writing chunks was better and go back. Measuring things first would surely be nicer.

> * Adding the bookmark count check is an easy win. Whenever a user has added
> or removed a bookmark, you don't have to check hashes.

We are already adding the bookmark count in bug 818593.
The fact is that the most common case is a user who is unlikely to touch bookmarks, so the "user who bookmarks a lot" case is not our target; our target is the user who adds a couple of bookmarks a week. That user will have their last backup hashed at every session, and a new backup written every day (and then removed), just to protect them from a corruption that may well make the whole computer unusable, let alone the browser.
(In reply to Richard Newman [:rnewman] from comment #20)
> * Adding the bookmark count check is an easy win. Whenever a user has added
> or removed a bookmark, you don't have to check hashes. This is worth adding
> even to your scheme, I think, perhaps enhanced to be lazy about computing
> hashes for the written file - if a user bookmarks a lot, you never even
> switch to "hash mode". For file content hashes it's a huge win.

Does the bookmark count check take edited bookmarks into account? Like, you have an existing bookmark with URL X. Now, instead of adding a new bookmark, you edit the existing bookmark to point to URL Y.
(In reply to mayankleoboy1 from comment #22) > Does bookmark count check take into account edited bookmarks ? Like, you > have an existing bookmark with URL X . Now, instead of adding a new > bookmark, you edit the existing bookmark to point to URL Y. Nope, but it doesn't need to -- it's a "positive shortcut", not a "negative shortcut". That is, if the count has changed we know we need to do a new backup. If it hasn't, we need to check contents.
Since the patch in bug 859695 also touches the same code, I will wait for bug 859695 to land first before working on this.
Depends on: 859695
Attached patch v4 WIP -w (obsolete) — Splinter Review
Marco: this is based on comment 17. Please have a look and give me some feedback. Thanks
Attachment #726533 - Attachment is obsolete: true
Attachment #798445 - Flags: feedback?(mak77)
Attached patch v4 (obsolete) — Splinter Review
I've added a new test and fixed issues in other tests.
Attachment #798445 - Attachment is obsolete: true
Attachment #798445 - Flags: feedback?(mak77)
Attachment #798768 - Flags: review?(mak77)
(In reply to Raymond Lee [:raymondlee] from comment #26) > Created attachment 798768 [details] [diff] [review] > v4 > > I've added a new test and fixed issues in other tests. This patch requires the one from bug 859695
Attachment #798768 - Flags: review?(mak77)
Attached patch v5 (obsolete) — Splinter Review
This requires bug 859695.
Attachment #798768 - Attachment is obsolete: true
Attachment #802176 - Flags: review?
Attachment #802176 - Flags: feedback?(mak77)
Attachment #802176 - Flags: review?
Comment on attachment 802176 [details] [diff] [review]
v5

Review of attachment 802176 [details] [diff] [review]:
-----------------------------------------------------------------

nit: I'd probably rename what you call "result" to "backupInfo", to avoid confusion with Places results.

::: toolkit/components/places/BookmarkJSONUtils.jsm
@@ +108,5 @@
> + * @resolves When node have been serialized.
> + * @rejects JavaScript exception.
> + */
> + serializeNodeAsJSONString: function BJU_serializeNodeAsJSONString(
> +   aNode, aIsUICommand, aResolveShortcuts, aExcludeItems) {

There's something wrong here: the method expects arguments, but all of the call points are passing nothing, and you serialize the root.

I suspect you may rather want to make this an exportToJSONString that just works like exportToFile?

::: toolkit/components/places/PlacesBackups.jsm
@@ +33,5 @@
>    localizedFilename.substr(0, localizedFilename.indexOf("-"));
>    delete this._filenamesRegex;
>    return this._filenamesRegex =
> +    new RegExp("^(bookmarks|" + localizedFilenamePrefix + ")-" +
> +               "([0-9-]+)(_[0-9]+){0,1}(_.{32}){0,1}\.(json|html)");

please file a bug to remove localizedFilenamePrefix from this code.

@@ +322,5 @@
>    let mostRecentBackupFile = yield this.getMostRecentBackup();
>
>    if (!aForceBackup) {
> +    if (aMaxBackups === 0) {
> +      yield this._deleteOldBackups(aMaxBackups);

You can just pass zero, and merge the if conditions into a single if.

@@ +353,3 @@
>
> +  try {
> +    let view = new Uint8Array(result.byteArray);

While I see where it comes from, I find the "view" naming a bit confusing in our code... maybe I'd just call this buffer or data.

::: toolkit/components/places/tests/bookmarks/test_818584-discard-duplicate-bookmarks.js
@@ +46,5 @@
> +  yield PlacesBackups.create();
> +  recentBackup = yield PlacesBackups.getMostRecentBackup();
> +  do_check_neq(recentBackup.path, oldBackupFile.path);
> +  do_check_eq((yield PlacesBackups.getBackupFiles()).length, 2);
> +

nit: trailing space
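For illustration, the quoted filename regex can be exercised like this. This is a Node sketch, not the patch's code: the localized-prefix alternative is collapsed to a fixed "bookmarks", and the group numbering (prefix, date, optional _count, optional _hash, extension) follows the quoted hunk:

```javascript
// Simplified form of the patch's _filenamesRegex: matches names like
// "bookmarks-2013-10-21_42_<32-char-hash>.json", where the count and
// hash segments are both optional for backward compatibility.
const filenamesRegex =
  new RegExp("^(bookmarks)-([0-9-]+)(_[0-9]+)?(_.{32})?\\.(json|html)");

function parseBackupName(name) {
  const m = filenamesRegex.exec(name);
  if (!m) return null;
  return {
    date: m[2],                                   // e.g. "2013-10-21"
    count: m[3] ? Number(m[3].slice(1)) : null,   // strip the leading "_"
    hash: m[4] ? m[4].slice(1) : null,            // strip the leading "_"
  };
}
```

Old backups without count or hash segments still parse, which is why both groups are optional rather than required.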
Attachment #802176 - Flags: feedback?(mak77) → feedback+
Blocks: 923406
Attached patch v6 (obsolete) — Splinter Review
Updated the patch based on the new patch in bug 859695.

(In reply to Marco Bonardo [:mak] from comment #29)
> nit: I'd probably rename what you call "result" to "backupInfo", to avoid
> confusion with Places results.
>
> There's something wrong here, the method expects arguments, but all of the
> callpoints are passing nothing, and you serialize the root.
>
> I suspect you may rather want to make this exportToJSONString that just
> works like exportToFile?

Updated.

> please file a bug to remove localizedFilenamePrefix from this code.

Bug 923406.

> You can just pass zero, and merge the if conditions into a single if
>
> while I see where it comes from, I find the "view" naming a bit confusing in
> our code... maybe I'd just call this buffer or data.

Done.
> ::: toolkit/components/places/tests/bookmarks/test_818584-discard-duplicate-bookmarks.js
> @@ +46,5 @@
> > +  do_check_eq((yield PlacesBackups.getBackupFiles()).length, 2);
> > +
>
> nit trailing space

Done.
Attachment #802176 - Attachment is obsolete: true
Attachment #813469 - Flags: review?(mak77)
Comment on attachment 813469 [details] [diff] [review]
v6

Review of attachment 813469 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/components/places/BookmarkJSONUtils.jsm
@@ +527,5 @@
> +    let buffer = new Uint8Array(exportedData.byteArray);
> +    yield OS.File.writeAtomic(aLocalFile.path, buffer,
> +                              {tmpPath: aLocalFile.path + ".tmp"});
> +  } catch (e) {
> +    throw e;

what's up here?

::: toolkit/components/places/PlacesBackups.jsm
@@ +346,1 @@
>    }

ditto

@@ +417,5 @@
> +
> +  if (matches && matches[4])
> +    hash = matches[4].replace(/^_/, "");
> +
> +  return hash;

I'd prefer to avoid the hash var, since it doesn't seem needed:
if (...) return ...
return null;

::: toolkit/components/places/tests/bookmarks/xpcshell.ini
@@ +29,5 @@
>  [test_675416.js]
>  [test_711914.js]
>  [test_protectRoots.js]
>  [test_818593-store-backup-metadata.js]
> +[test_818584-discard-duplicate-bookmarks.js]

looks like you forgot to add the file
Attachment #813469 - Flags: review?(mak77) → feedback+
Attached patch v7 (obsolete) — Splinter Review
(In reply to Marco Bonardo [:mak] from comment #31)
> > +  } catch (e) {
> > +    throw e;
>
> what's up here?

Removed.

> I'd prefer to avoid the hash var since doesn't seem needed
> if (...) return ...
> return null;

Updated.

> > +[test_818584-discard-duplicate-bookmarks.js]
>
> looks like you forgot to add the file

Added it back.

However, the tests for the debug build crashed. Still investigating...

https://tbpl.mozilla.org/?tree=Try&rev=5ff4741ee62e

02:50:31 INFO - WARNING: NS_ENSURE_TRUE(thread) failed: file ../../../../netwerk/base/src/nsSocketTransportService2.cpp, line 115
02:50:31 INFO - ###!!! ASSERTION: nsNSSComponent relies on profile manager to wait for synchronous shutdown of all network activity: 'mIsNetworkDown', file ../../../../../security/manager/ssl/src/nsNSSComponent.cpp, line 1835
Attachment #813469 - Attachment is obsolete: true
Attachment #814054 - Flags: feedback?(mak77)
Attached patch v8 (obsolete) — Splinter Review
I have spent some time investigating why this patch doesn't work on debug builds, but haven't had much luck.

It's definitely related to waiting for synchronous shutdown. Maybe it takes too long for the sync shutdown to complete and the code doesn't like that.

Running ./mach xpcshell-test browser/components/places/tests/unit/ shows the failing tests.

Marco: do you have any suggestions?
Attachment #814054 - Attachment is obsolete: true
Attachment #814054 - Flags: feedback?(mak77)
Flags: needinfo?(mak77)
Attached file console-log.txt (obsolete) —
Console log related to the issue
(In reply to Raymond Lee [:raymondlee] from comment #33)
> It's definitely related to wait for synchronous shutdown. May be it takes
> too long for the sync shutdown to complete and the code doesn't like that.

Maybe we should wait for bug 887043 and bug 855226.
Depends on: 824433
Flags: needinfo?(mak77)
Whiteboard: p=0
No longer blocks: 923406
No longer depends on: 824433
I'll start from unbitrotting the latest patch and see what's left to do.
Assignee: raymond → mak77
Attached patch v9 (obsolete) — Splinter Review
This passed all of the tests, plus the new one. The hash is MD5 with '/' replaced by '-' so it can be used in the filename.
Attachment #814326 - Attachment is obsolete: true
Attachment #814327 - Attachment is obsolete: true
Attachment #8379217 - Flags: review?(mano)
Blocks: 818587
Attached patch v10 (obsolete) — Splinter Review
Just an unbitrot
Attachment #8379217 - Attachment is obsolete: true
Attachment #8379217 - Flags: review?(mano)
Attachment #8387551 - Flags: review?(mano)
Comment on attachment 8387551 [details] [diff] [review]
v10

># HG changeset patch
># User Marco Bonardo <mbonardo@mozilla.com>
>Bug 818584 - Discard consecutive duplicate bookmark backups by comparing hashes r=mano
>Original patch by Raymond Lee <raymond@appcoast.com> f=mak
>
>diff --git a/toolkit/components/places/BookmarkJSONUtils.jsm b/toolkit/components/places/BookmarkJSONUtils.jsm
>--- a/toolkit/components/places/BookmarkJSONUtils.jsm
>+++ b/toolkit/components/places/BookmarkJSONUtils.jsm
>+function generateHash(aString) {
>+  let cryptoHash = Cc["@mozilla.org/security/hash;1"]
>+                     .createInstance(Ci.nsICryptoHash);
>+  cryptoHash.init(Ci.nsICryptoHash.MD5);
>+  let stringStream = Cc["@mozilla.org/io/string-input-stream;1"]
>+                       .createInstance(Ci.nsIStringInputStream);
>+  stringStream.data = aString;
>+  cryptoHash.updateFromStream(stringStream, -1);
>+  // base64 allows the '/' char, but we can't use it for filenames.
>+  return cryptoHash.finish(true).replace("/", "-");

At first I was worried because base64 is case-sensitive (while some filesystems are not), but I guess this works because it's not the hash that makes the file name unique. This should be documented somewhere here. Otherwise the method may be reused in the future in an improper way, leading to quite unfortunate bugs.

>+}
>+
> this.BookmarkJSONUtils = Object.freeze({
>   /**
>    * Import bookmarks from a url.
>    *
>    * @param aSpec
>    *        url of the bookmark data.
>    * @param aReplace
>    *        Boolean if true, replace existing bookmarks, else merge.
>@@ -94,23 +109,29 @@ this.BookmarkJSONUtils = Object.freeze({
>     });
>   },
>
>   /**
>    * Serializes bookmarks using JSON, and writes to the supplied file path.
>    *
>    * @param aFilePath
>    *        OS.File path string for the "bookmarks.json" file to be created.
>-   *
>+   * @param [optional] aOptions
>+   *        Object containing options for the export:
>+   *        - failIfSameHash: if the generated file would have the same hash
>+   *          defined here, will reject with ex.becauseSameHash

The naming suggests a boolean, maybe failIfHashIs?
>-  exportToFile: function BJU_exportToFile(aFilePath) {
>+  exportToFile: function BJU_exportToFile(aFilePath, aOptions) {

I would default-set aOptions to {}

>     if (aFilePath instanceof Ci.nsIFile) {
>       Deprecated.warning("Passing an nsIFile to BookmarksJSONUtils.exportToFile " +
>                          "is deprecated. Please use an OS.File path string instead.",
>                          "https://developer.mozilla.org/docs/JavaScript_OS.File");
>       aFilePath = aFilePath.path;
>     }
>     return Task.spawn(function* () {
>       let [bookmarks, count] = yield PlacesBackups.getBookmarksTree();
>@@ -120,22 +141,39 @@ this.BookmarkJSONUtils = Object.freeze({
>       try {
>         Services.telemetry
>                 .getHistogramById("PLACES_BACKUPS_TOJSON_MS")
>                 .add(Date.now() - startTime);
>       } catch (ex) {
>         Components.utils.reportError("Unable to report telemetry.");
>       }
>
>+      startTime = Date.now();
>+      let hash = generateHash(jsonString);
>+      // Report the time taken to generate the hash.
>+      try {
>+        Services.telemetry
>+                .getHistogramById("PLACES_BACKUPS_HASHING_MS")
>+                .add(Date.now() - startTime);
>+      } catch (ex) {
>+        Components.utils.reportError("Unable to report telemetry.");
>+      }
>+
>+      if (aOptions && hash === aOptions.failIfSameHash) {
>+        let e = new Error("Hash conflict");
>+        Object.defineProperty(e, "becauseSameHash", { get: () => true });

Any reason not to set e.becauseSameHash the simple way? If there's one, document it here. Or you could define a SameHash error object in this module and then use instanceof checks.

By the way, I wonder if we could somehow encapsulate the entire process in this module. This setup as a whole seems somewhat complex.
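To make the two alternatives concrete (a plain property set "the simple way" vs. a dedicated error type checked with instanceof), here is a hedged sketch; the names mirror the patch but the surrounding code is hypothetical:

```javascript
// Alternative 1: set the flag as a plain property (the "simple way").
// A getter is only needed if the flag must be read-only.
function makeHashConflictError() {
  let e = new Error("Hash conflict");
  e.becauseSameHash = true;
  return e;
}

// Alternative 2: a dedicated error type, so callers can use instanceof.
class SameHashError extends Error {
  constructor() {
    super("Hash conflict");
    this.name = "SameHashError";
  }
}

try {
  throw new SameHashError();
} catch (ex) {
  console.log(ex instanceof SameHashError); // true
  console.log(ex instanceof Error);         // true
}
```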
>diff --git a/toolkit/components/places/PlacesBackups.jsm b/toolkit/components/places/PlacesBackups.jsm
>--- a/toolkit/components/places/PlacesBackups.jsm
>+++ b/toolkit/components/places/PlacesBackups.jsm
>+  /**
>+   * Matches the backup filename:
>+   *  0: file name
>+   *  1: date in form Y-m-d
>+   *  2: bookmarks count
>+   *  3: contents hash
>+   *  4: file extension
>+   */
>+  get filenamesRegex() {
>+    delete this.filenamesRegex;
>+    return this.filenamesRegex =
>+      new RegExp("^bookmarks-([0-9\-]+)(?:_([0-9]+)){0,1}(?:_([a-z0-9=\+\-]{24})){0,1}\.(json|html)", "i");
>   },

Instead of making it even more public than it already is, take it out of this object. Tests can redefine it.

review+, but if you decide to structure this differently in a simpler way, I can look at it again.
Attachment #8387551 - Flags: review?(mano) → review+
(In reply to Mano from comment #39)
> By the way, I wonder if we could somehow encapsulate the entire process in
> this module. This setup as a whole seems somewhat complex.

What does this mean?

> >+  /**
> >+   * Matches the backup filename:
> >+   *  0: file name
> >+   *  1: date in form Y-m-d
> >+   *  2: bookmarks count
> >+   *  3: contents hash
> >+   *  4: file extension
> >+   */
> >+  get filenamesRegex() {
> >+    delete this.filenamesRegex;
> >+    return this.filenamesRegex =
> >+      new RegExp("^bookmarks-([0-9\-]+)(?:_([0-9]+)){0,1}(?:_([a-z0-9=\+\-]{24})){0,1}\.(json|html)", "i");
> >   },
>
> Instead of making it even more public than it already is, take it out of
> this object. Tests can redefine it.

I'm making this public precisely because I'm tired of having to redefine it in tests every time. Is there a reason I should not make this public? It's a good regex to match against our backup naming.
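As a quick sanity check, the regex from the patch can be exercised against the backup naming scheme; the sample filenames and the hash below are made up for illustration:

```javascript
// The backup-filename regex from the patch ({0,1} written as '?'):
// group 1 = date, group 2 = bookmark count, group 3 = contents hash,
// group 4 = file extension.
const filenamesRegex =
  /^bookmarks-([0-9\-]+)(?:_([0-9]+))?(?:_([a-z0-9=\+\-]{24}))?\.(json|html)/i;

// A modern name carrying both a count and a (made-up) hash.
const m = filenamesRegex.exec(
  "bookmarks-2014-03-19_227_Bmj2IzUrQUDiEEIPPmEkcg==.json");
console.log(m[1]); // "2014-03-19"
console.log(m[2]); // "227"
console.log(m[3]); // "Bmj2IzUrQUDiEEIPPmEkcg=="
console.log(m[4]); // "json"

// Older backup names without count or hash still match.
console.log(filenamesRegex.test("bookmarks-2012-12-05.json")); // true
```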
Flags: needinfo?(mano)
(In reply to Marco Bonardo [:mak] from comment #40)
> (In reply to Mano from comment #39)
> > By the way, I wonder if we could somehow encapsulate the entire process in
> > this module. This setup as a whole seems somewhat complex.
>
> What does this mean?

Ah, maybe I got it: I want the JSON utils API to be very simple (mirroring the HTML API) and to handle all of the complexity in PlacesBackups. It wouldn't make much sense to move our backups-specific behavior into a generic util for storing bookmarks to JSON, imo.
discussed over IRC
Flags: needinfo?(mano)
Attached patch v11 (obsolete) — Splinter Review
Attachment #8387551 - Attachment is obsolete: true
Flags: in-testsuite+
Target Milestone: --- → mozilla31
Whiteboard: p=0 → p=5 s=it-31c-30a-29b.1
The only distant relation I see is that Sync uses the PlacesBackups and BookmarkJSONUtils modules. I will wait to see if the oranges go away before trying to figure out what's up there, since this is very strange :/
Flagging as [qa-] since this has in-testsuite coverage. Please advise if this needs any special QA testing before we release this change to GA.
Whiteboard: p=5 s=it-31c-30a-29b.1 → p=5 s=it-31c-30a-29b.1 [qa-]
This is the recurring theme in all the failures:

10:02:09 INFO - 1395248230077 Sync.EngineManager ERROR Engine 'bookmarks' is already registered!
(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #49)
> This is the recurring theme in all the failures:
>
> 10:02:09 INFO - 1395248230077 Sync.EngineManager ERROR Engine
> 'bookmarks' is already registered!

This happens regardless of my patch, and regardless of a failure. It just happens every time.
Locally the test timed out here:

0:10.38 Sqlite.Connection.places.sqlite DEBUG Conn #10: Stmt #0 finished.
0:10.38 Sqlite.Connection.places.sqlite DEBUG Conn #10: Request to close connection.
0:10.39 Sqlite.Connection.places.sqlite DEBUG Conn #10: Finalizing connection.
0:10.39 Sqlite.Connection.places.sqlite DEBUG Conn #10: Calling asyncClose().
0:10.39 Sqlite.Connection.places.sqlite INFO Conn #10: Closed
0:10.83 Sync.Tracker.Bookmarks DEBUG Saving changed IDs to bookmarks
OK, looks like the problematic call is the yield PlacesBackups.create in BStore_wipe. I indeed touched this method, so investigating.
Attached patch patch v12Splinter Review
It was a very dumb issue: we were replacing only the first instance of the '/' char in the base64 hash; just adding the "g" flag to the replace makes us properly replace all of the instances. The leftover '/' characters were causing OS.File to fail to move the file, and since the hash value is effectively random, that explains the intermittent failure.
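For the record, this is the classic String.prototype.replace pitfall: a string pattern replaces only the first occurrence, while a regex with the g flag replaces them all. A minimal demonstration (the input is a made-up base64-like string):

```javascript
// String.prototype.replace with a string pattern only touches the
// FIRST match, so any later '/' survives into the filename.
const base64ish = "ab/cd/ef+gh=="; // made-up stand-in for a base64 MD5 hash

console.log(base64ish.replace("/", "-"));   // "ab-cd/ef+gh==" -- one '/' left (the bug)
console.log(base64ish.replace(/\//g, "-")); // "ab-cd-ef+gh==" -- all replaced (the fix)
```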
Attachment #8393453 - Attachment is obsolete: true
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Status: RESOLVED → VERIFIED
No longer blocks: fxdesktopbacklog
Flags: firefox-backlog+
Regressions: 1559403