Closed Bug 818584 Opened 13 years ago Closed 12 years ago

Discard consecutive duplicate bookmark backups by comparing hashes

Categories

(Toolkit :: Places, defect)

defect
Not set
normal

Tracking


VERIFIED FIXED
mozilla31

People

(Reporter: rnewman, Assigned: mak)

References

Details

(Whiteboard: p=5 s=it-31c-30a-29b.1 [qa-])

Attachments

(1 file, 13 obsolete files)

Blocks: 818587
No longer blocks: 818587
OS: Android → All
Hardware: ARM → All
The patch for bug 818593 uses a preference to store metadata for backups. Shall we also store the backup file's hash in that preference when a new backup file is written to the user's disk? Then, when the next automatic backup happens, the hash of the last backup file can be retrieved from the preference instead of reading its content to recalculate the hash for comparison. Consider that a backup file can be quite large. What do you think?
Assignee: nobody → raymond
Status: NEW → ASSIGNED
I would be very hesitant to suggest decoupling storage of a hash from its file.

* You lose the ability to detect file corruption or truncation!
* Prefs aren't always flushed, so you have to handle the case where the hash is missing.

The solution IMO is to do the work in the background. And in any case, it takes a modern machine 50msec to MD5 a 20MB file, and we're planning to do this once per day? Not something I'd be worried about.
(In reply to Richard Newman [:rnewman] from comment #2)
> I would be very hesitant to suggest decoupling storage of a hash from its
> file.
>
> * You lose the ability to detect file corruption or truncation!

These cases are uninteresting for our scope, though, and in 5 years I have yet to see either of them happen.

> * Prefs aren't always flushed, so you have to handle the case where the hash
> is missing.

Even then, it would still be a win to have it in most cases (and when we don't have one we will just fall back to calculating it). In the end we just need the hash for the last backup; 99% of the cases are covered, and the others are exotic enough.

That said, I'm not sure what solution you're suggesting. Recalculate the hash every time? How is that better than recalculating it only when it's missing?
(In reply to Marco Bonardo [:mak] (Away Mar 1) from comment #3)
> These cases are uninteresting for our scope though, and so far in 5 years I
> have yet to see any of them happening.

I think it's of some interest. These are backups, after all; we definitely don't want to say "oh, already got this backup" only to find that the file never got flushed correctly and is sitting at zero bytes. That you haven't seen it doesn't mean that there isn't a backup file on a user's disk somewhere that's truncated, no? If we stretch out the lifecycle of that backup file, the importance grows. I agree that it's not a massive risk, but if we can cover it, that'd be nice.

> That said, I'm not sure what's the solution you suggest, recalculate the
> hash everytime? how is that better than recalculating it only when it's
> missing?

Because prefs. Flush the new backup file (we have to do this anyway). If it's the same size as the previous entry, calculate the hash of both (we have to do one file anyway), and if they match, discard the oldest. That's nice and simple, and in many cases you won't need to do any work at all, because the sizes won't match.

Caching hashes in prefs can save you, say, 100msec spent in the background hashing a backup or two, but it adds a pref lookup, the need to clean up prefs as you clean up backups, and a pref write alongside each new backup (which might cause a disk write that takes even longer!), and you still need to handle the case of the pref being missing.

Phrased differently: why would you cache in prefs a value that you read less often than you write it, maybe once every two or three days? I would regard this optimization as premature until you get telemetry telling you otherwise. And even then, get telemetry for the version with prefs, and see whether the additional pref handling makes things better or worse.
As we found out from FHR, prefs are not a magical cheap storage layer :/

My hunch is that for the 99% case, just hashing the file when needed is the most straightforward approach. I chunked up a 7MB fake bookmark backup file (20 copies of my bookmark backups):

md5 /tmp/2.json  0.02s user 0.01s system 110% cpu 0.025 total

So long as it's backgrounded, that's fine IMO for once per day, a small percentage of the time.

And if you really want to speed things up, put some simple known identifier -- bookmark count? highest bookmark ID? -- in the backup filename. Now you don't even need to hash to spot most changes, and you only need to hash when two files have the same size and bookmark count.
(In reply to Richard Newman [:rnewman] from comment #4)
> I think it's of some interest. These are backups, after all; we definitely
> don't want to say "oh, already got this backup" only to find that the file
> never got flushed correctly and is sitting at zero bytes.

I'm still not sure I get the point: if we don't complete the write, we should store neither the backup nor its metadata. If that happens, there's a bug to fix. I know the new version will use atomic writes from OS.File, so I'm not sure what we are trying to cover.

> Caching hashes in prefs can avoid you spending, say, 100msec in the
> background to hash a backup or two, but it adds a pref lookup, the need to

We already pay this price.

> clean up prefs as you clean up backups, a pref write alongside each new
> backup

Again, we already pay this price.

> Phrased differently: why would you cache in prefs a value that you read less
> than you write, maybe once every two or three days?

Because we read and write it already, so there is no added cost.

> And if you really want to speed things up, put some simple known identifier
> -- bookmark count? highest bookmark ID? -- in the backup filename. Now you
> don't even need to hash to spot most changes, and you only need to hash if
> two files have the same size and bookmark count.

We can do that; we are adding the bookmark count, though to a pref.
(In reply to Marco Bonardo [:mak] (Away Mar 1) from comment #5)
> I'm still not sure I get the point, if we don't complete the write we should
> not store the backup nor its metadata. If that happens there's a bug to
> fix. I know new version will use atomic writes from OS.file so I'm not sure
> what are we trying to cover.

(a) if OS.File is wrong; (b) if the filesystem isn't transactional (it says it's written, but it's not necessarily all on disk yet); (c) if there is a post-write issue, like on-disk corruption, a poorly restored profile backup, cosmic rays, malware, etc.

This is why backups and downloads are verified out-of-band. There's a lot that can go wrong.

> cause we read and write it already, so there is no added cost.

You already write the hash? Or you already write a chunk of data that describes a new backup?

The song I've heard from the perf team is "prefs bad, prefs slow, don't use prefs". And in this case we're writing to prefs when IMO we don't need to, because it would be read so infrequently.

> we can do that, we are adding the bookmark count, to a pref though.

Adding it to the filename is free, and it works when users move bookmark backups between machines or profiles, or Reset Firefox.
(In reply to Richard Newman [:rnewman] from comment #6)
> This is why backups and downloads are verified out-of-band. There's a lot
> that can go wrong.

It's still so infrequent a case that it doesn't deserve added code complication, IMO. Btw, I still don't get where you are suggesting to store this hash: again in the filename?

> The song I've heard from the perf team is "prefs bad, prefs slow, don't use
> prefs". And in this case we're writing to prefs when IMO we don't need to,
> because it would be read so infrequently.

I'm not sure I understand the frequent/infrequent thing: in one case we have to read from the FS, in the other we have to read from a pref. The frequency is the same, basically once per day (when we back up, though if a backup fails it may be multiple times). The pref file could already be in the OS memory map, so the read could be really cheap or even free, considering how often we access prefs; accessing the backup file, on the other hand, has to hit the disk, since it's unlikely the backup file is in memory.

> > we can do that, we are adding the bookmark count, to a pref though.
>
> Adding it to the filename is free, and works when users move bookmark
> backups between machines or profiles, or Reset Firefox.

The code relies on the current file naming to sort and extract the date, so the risk of regressions in the filename parsing code is higher; I would not be surprised if external apps/add-ons have similar code that relies on the current name (we can check add-ons at least). The gain must clearly be higher than these costs. Moreover, the filename is less extensible (with new properties), and it complicates the code that has to handle all the missing cases due to old backups.

In the end I honestly don't have enough data to tell which of the two approaches is better. According to your suggestion this would end up something like bookmarks-date-count-hash-somefuturemetadata.json, is this correct?
Attached patch WIP - preference approach (obsolete) — Splinter Review
This patch is using the preference approach. I will wait for the final decision before carrying on with this bug.
Attachment #717949 - Attachment is patch: true
(In reply to Marco Bonardo [:mak] (Away Mar 1) from comment #7)
> It's still a so infrequent case that doesn't deserve added code
> complication, imo.
> Btw, I still don't get where are you suggesting to store this hash, again in
> the filename?

Oh, I'm not. That was a side comment. Pretend I never mentioned integrity checking :D

> I'm not sure I understand the frequent/infrequent thing, one case we have to
> read from the FS, the other case we have to read from a pref. The frequency
> is the same, that is basically once per day (when we backup, though if a
> backup fails may be multiple times). The pref file could already be in the
> OS memory map, thus the read could be really cheap or even free, considered
> how often we access prefs, while for sure accessing the backup file has to
> hit the disk, it's unlikely the backup file is in memory.

Storing the hash in a pref:

* Always compute the hash on write; write it to prefs.
* Read the old hash from prefs on write, if the count and size are the same.
* Delete pref entries on cleanup.
* Needs code for pref handling.
* Might prompt a pref flush.
* No verification that the hashed file contents match the current file contents.

Not storing it:

* Only compute the new file's hash on write if the previous file has the same count and size (this is a nice win).
* Compute the old file's hash on write in those same circumstances.
* No pref cleanup, write, or flush.
* Offers current file content comparison.

The only situation in which not storing is more costly:

* Where the new file is a duplicate of the old one, and
* Computing the old file's hash takes longer than the amortized cost of all of the pref interaction.

> the code relies on current file naming to sort and extract date, so the risk
> of regressions in the filename parsing code is higher, I would not be
> surprised if external apps/add-ons have similar code that relies on the
> current name (we can check add-ons at least). The gain must clearly be
> higher than these costs.

Fair enough.
> In the end I honestly don't have enough data to tell what is better across
> the two approaches.
> according to your suggestion this would end up something like
> bookmarks-date-count-hash-somefuturemetadata.json, is this correct?

Probably bookmarks-date-count.json. No need to over-engineer this, I think.
I'm not a dev, so forgive me if I missed something in scanning this bug or if this is a silly question, but are we able to calculate the hash before saving a backup to disk? If so, and we're able to find that two backups are identical, then how about we just update the filename of the already saved backup and discard the new one? This would save us from writing to disk.
(In reply to Mark S. from comment #10) > I'm not a dev, so forgive me if I missed something in scanning this bug or > if this is a silly question, but are we able to calculate the hash before > saving a backup to disk? If so, and we're able to find that two backups are > identical, then how about we just update the filename of the already saved > backup and discard the new one? This would save us from writing to disk. We could, by generating the backup in memory. That's probably not a win for a daily operation. A better approach is to have a generation number in the DB, but that incorporates a certain amount of trust at both ends: correct persistence and correct DB tracking.
(In reply to Richard Newman [:rnewman] from comment #11) > A better approach is to have a generation number in the DB, but that > incorporates a certain amount of trust at both ends: correct persistence and > correct DB tracking. I take it then we don't have that trust sufficiently at both ends.
(In reply to Mark S. from comment #12) > I take it then we don't have that trust sufficiently at both ends. Not really. It's possible to directly modify the places DB (inside or outside of Firefox). And it's possible for a backup file to be truncated, corrupted, overwritten, or whatever. It would be tragic to trust that an old backup is fine, never write a new one, and then discover when you most need it that it isn't usable.
We will add info to the filename; I just commented in bug 818593, thanks for the suggestion. While I think it's not a very flexible store, I also don't think we'll need that much added flexibility in the future, and we have to access the filesystem regardless.
Attachment #717949 - Attachment is obsolete: true
Depends on: 818593
Attached patch v2 (obsolete) — Splinter Review
This depends on the patch in bug 818593.
Attachment #726517 - Flags: review?(mak77)
Attached patch v3 (obsolete) — Splinter Review
Added test
Attachment #726517 - Attachment is obsolete: true
Attachment #726517 - Flags: review?(mak77)
Attachment #726533 - Flags: review?(mak77)
Comment on attachment 726533 [details] [diff] [review]
v3

Review of attachment 726533 [details] [diff] [review]:
-----------------------------------------------------------------

I can't review this until bug 818593 reaches r+; please re-request review at that time.

Btw, I'm not sure I got the approach you took; what we need is to save I/O, and this doesn't seem to do that. What we want is:

1. generate a JSON string containing the backup in memory
2. calculate the hash for this JSON string
3. extract the hash from the filename of the last backup
4. if they are the same, do nothing
5. otherwise write it to a new backup

If you need to convert the string to a byte array to calculate the hash, Yoric suggested then using OS.File to save the file, since it's cheaper.
Attachment #726533 - Flags: review?(mak77)
(In reply to Marco Bonardo [:mak] from comment #17)
> Btw, I'm not sure I got the approach you took, what we need is to save I/O
> and this doesn't seem to do that.

I actually disagree about the goal! The goal IMO is to save on redundant copies on disk, which has two benefits:

* In the same disk space we can store a longer span of backups.
* In the same set of backups we don't have duplicate bookmark sets, which are confusing when presented to the user (i.e., we have meaningful snapshots).

Furthermore, we can't save on I/O in every case. And I/O after the file has been written will be cheap: it'll be in every cache up and down the stack. And since we only do this once a day, I wouldn't be massively concerned with I/O cost here. See comment 9.

> What we want is:
> 1. generate json string containing the backup in memory
> 2. calculate hash for this json string
> 3. extract hash from filename of the last backup
> 4. if they are the same do nothing

I prefer:

* Check the previous backup filename's bookmark *count* (not hash). We store a count in the name, not the hash, as previously discussed. The count is just a cheap way of short-circuiting.
* If the count is identical, we need to check the hash.
* Compute the hash of the bookmark backup on disk. Do not persist the hash anywhere; compute it each day, if the prior criteria are all met.
* Compute the new hash in memory or on disk (doesn't matter too much).
* Write if necessary.
* If the count isn't identical, we don't need to check anything: just write out the backup.
(In reply to Richard Newman [:rnewman] from comment #18)
> I actually disagree about the goal! The goal IMO is to save on redundant
> copies on disk

We need to take the I/O problem into account from the beginning; it doesn't make sense to solve half of the problem.

> And I/O after the file has been written will be cheap

Indeed, I would like to avoid the "has been written" part. Backups are one of the greatest shutdown offenders as of now, and it looks like the "happens only sometimes" reasoning hasn't worked at all for them, so I don't think it will start working now. The most common case is that bookmarks don't change across sessions, which means that in your proposal the most common case is having to compute the hash of a possibly large file on disk.

> * If count is identical, we need to check the hash.
> * Compute the hash of the bookmark backup on disk.

That means reading all of the backup on the main thread; some backups are many MB of data.

I'd actually like to get constructive criticism on my proposal, which seems to solve both the I/O and the useless-backups problems.
The problems I see:

* You have eliminated all robustness against file system damage, external corruption, etc. Not only do we no longer make redundant copies, but we also don't verify the contents of the one we have. I think that's unsuitable for probably the only backup that most users have. And I'd wager that most users are not running on fault-tolerant file systems, and don't have versioned storage of their Places backups. This doesn't affect me, with Time Machine on a huge disk, but it affects pretty much every non-developer user I've ever talked to. My assertion is that before deciding *to not back up your bookmarks* we should positively determine that the existing backup is good. This is a fundamental point that dictates my suggested direction. See my analysis of the two approaches in comment 9. A solution to main-thread I/O: don't do it on the main thread.

* My intuition is that it's more efficient to write smaller JSON chunks to an output stream than to compose a whole document in memory. You can probably hash as you go, even for a stream.

* Adding the bookmark count check is an easy win. Whenever a user has added or removed a bookmark, you don't have to check hashes. This is worth adding even to your scheme, I think, perhaps enhanced to be lazy about computing hashes for the written file: if a user bookmarks a lot, you never even switch to "hash mode". For file content hashes it's a huge win.
(In reply to Richard Newman [:rnewman] from comment #20)
> * You have eliminated all robustness against file system damage, external
> corruption, etc. Not only do we no longer make redundant copies, but we also
> don't verify the contents of the one we have.

File system damage is not something we really care about; it can actually do much worse things than touching some bookmarks. As you said, users, or even the OS, already have tools to protect against file system damage; it's definitely not a browser's task. What we must care about is doing our best to ensure the data we write is written safely, with atomic writes or renames. Your paranoid check for validity can easily be added in the future if we ever find the need for it (e.g. we get many reports of backups invalid due to fs corruption).

> * My intuition is that it's more efficient to write smaller JSON chunks to
> an output stream than to compose a whole document in memory. You can
> probably hash as you go even for a stream.

Once we have measurements of what's more performant we can discuss it; intuition is a lost bet. I don't exclude that we may reach a point where we figure out writing chunks was better and go back. Measuring things first would surely be nicer.

> * Adding the bookmark count check is an easy win. Whenever a user has added
> or removed a bookmark, you don't have to check hashes.

We are already adding the bookmark count in bug 818593.
The fact is that the most common case is a user who is unlikely to touch bookmarks, so the "user who bookmarks a lot" case is not our target; our target is the user who adds a couple of bookmarks a week. That user will have their last backup hashed at every session, and a new backup written every day (and then removed), just to protect them from a corruption that may well make the whole computer unusable, let alone the browser.
(In reply to Richard Newman [:rnewman] from comment #20)
> * Adding the bookmark count check is an easy win. Whenever a user has added
> or removed a bookmark, you don't have to check hashes. This is worth adding
> even to your scheme, I think, perhaps enhanced to be lazy about computing
> hashes for the written file - if a user bookmarks a lot, you never even
> switch to "hash mode". For file content hashes it's a huge win.

Does the bookmark count check take edited bookmarks into account? Like, you have an existing bookmark with URL X. Now, instead of adding a new bookmark, you edit the existing bookmark to point to URL Y.
(In reply to mayankleoboy1 from comment #22) > Does bookmark count check take into account edited bookmarks ? Like, you > have an existing bookmark with URL X . Now, instead of adding a new > bookmark, you edit the existing bookmark to point to URL Y. Nope, but it doesn't need to -- it's a "positive shortcut", not a "negative shortcut". That is, if the count has changed we know we need to do a new backup. If it hasn't, we need to check contents.
Since the patch in bug 859695 also touches the same code, I will wait for bug 859695 to land first before working on this.
Depends on: 859695
Attached patch v4 WIP -w (obsolete) — Splinter Review
Marco: this is based on comment 17. Please have a look and give me some feedback. Thanks
Attachment #726533 - Attachment is obsolete: true
Attachment #798445 - Flags: feedback?(mak77)
Attached patch v4 (obsolete) — Splinter Review
I've added a new test and fixed issues in other tests.
Attachment #798445 - Attachment is obsolete: true
Attachment #798445 - Flags: feedback?(mak77)
Attachment #798768 - Flags: review?(mak77)
(In reply to Raymond Lee [:raymondlee] from comment #26) > Created attachment 798768 [details] [diff] [review] > v4 > > I've added a new test and fixed issues in other tests. This patch requires the one from bug 859695
Attachment #798768 - Flags: review?(mak77)
Attached patch v5 (obsolete) — Splinter Review
This requires bug 859695.
Attachment #798768 - Attachment is obsolete: true
Attachment #802176 - Flags: review?
Attachment #802176 - Flags: feedback?(mak77)
Attachment #802176 - Flags: review?
Comment on attachment 802176 [details] [diff] [review]
v5

Review of attachment 802176 [details] [diff] [review]:
-----------------------------------------------------------------

nit: I'd probably rename what you call "result" to "backupInfo", to avoid confusion with Places results.

::: toolkit/components/places/BookmarkJSONUtils.jsm
@@ +108,5 @@
> + * @resolves When node have been serialized.
> + * @rejects JavaScript exception.
> + */
> + serializeNodeAsJSONString: function BJU_serializeNodeAsJSONString(
> +   aNode, aIsUICommand, aResolveShortcuts, aExcludeItems) {

There's something wrong here: the method expects arguments, but all of the call points are passing nothing, and you serialize the root.

I suspect you may rather want to make this an exportToJSONString that just works like exportToFile?

::: toolkit/components/places/PlacesBackups.jsm
@@ +33,5 @@
>    localizedFilename.substr(0, localizedFilename.indexOf("-"));
>    delete this._filenamesRegex;
>    return this._filenamesRegex =
> +    new RegExp("^(bookmarks|" + localizedFilenamePrefix + ")-" +
> +               "([0-9-]+)(_[0-9]+){0,1}(_.{32}){0,1}\.(json|html)");

please file a bug to remove localizedFilenamePrefix from this code.

@@ +322,5 @@
>    let mostRecentBackupFile = yield this.getMostRecentBackup();
>
>    if (!aForceBackup) {
> +    if (aMaxBackups === 0) {
> +      yield this._deleteOldBackups(aMaxBackups);

You can just pass zero, and merge the if conditions into a single if.

@@ +353,3 @@
>
> +  try {
> +    let view = new Uint8Array(result.byteArray);

While I see where it comes from, I find the "view" naming a bit confusing in our code... maybe I'd just call this buffer or data.

::: toolkit/components/places/tests/bookmarks/test_818584-discard-duplicate-bookmarks.js
@@ +46,5 @@
> +  yield PlacesBackups.create();
> +  recentBackup = yield PlacesBackups.getMostRecentBackup();
> +  do_check_neq(recentBackup.path, oldBackupFile.path);
> +  do_check_eq((yield PlacesBackups.getBackupFiles()).length, 2);
> +

nit: trailing space
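For illustration, the quoted filename regex can be exercised like this. This is a Node sketch, not the patch's code: the localized-prefix alternative is collapsed to a fixed "bookmarks", and the group numbering (prefix, date, optional _count, optional _hash, extension) follows the quoted hunk:

```javascript
// Simplified form of the patch's _filenamesRegex: matches names like
// "bookmarks-2013-10-21_42_<32-char-hash>.json", where the count and
// hash segments are both optional for backward compatibility.
const filenamesRegex =
  new RegExp("^(bookmarks)-([0-9-]+)(_[0-9]+)?(_.{32})?\\.(json|html)");

function parseBackupName(name) {
  const m = filenamesRegex.exec(name);
  if (!m) return null;
  return {
    date: m[2],                                   // e.g. "2013-10-21"
    count: m[3] ? Number(m[3].slice(1)) : null,   // strip the leading "_"
    hash: m[4] ? m[4].slice(1) : null,            // strip the leading "_"
  };
}
```

Old backups without count or hash segments still parse, which is why both groups are optional rather than required.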
Attachment #802176 - Flags: feedback?(mak77) → feedback+
Blocks: 923406
Attached patch v6 (obsolete) — Splinter Review
Updated the patch based on the new patch in bug 859695.

(In reply to Marco Bonardo [:mak] from comment #29)
> nit: I'd probably rename what you call "result" to "backupInfo", to avoid
> confusion with Places results.
>
> There's something wrong here, the method expects arguments, but all of the
> callpoints are passing nothing, and you serialize the root.
>
> I suspect you may rather want to make this exportToJSONString that just
> works like exportToFile?

Updated.

> please file a bug to remove localizedFilenamePrefix from this code.

Bug 923406.

> You can just pass zero, and merge the if conditions into a single if
>
> while I see where it comes from, I find the "view" naming a bit confusing in
> our code... maybe I'd just call this buffer or data.

Done.
> ::: toolkit/components/places/tests/bookmarks/test_818584-discard-duplicate-bookmarks.js
> @@ +46,5 @@
> > +  do_check_eq((yield PlacesBackups.getBackupFiles()).length, 2);
> > +
>
> nit trailing space

Done.
Attachment #802176 - Attachment is obsolete: true
Attachment #813469 - Flags: review?(mak77)
Comment on attachment 813469 [details] [diff] [review]
v6

Review of attachment 813469 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/components/places/BookmarkJSONUtils.jsm
@@ +527,5 @@
> +    let buffer = new Uint8Array(exportedData.byteArray);
> +    yield OS.File.writeAtomic(aLocalFile.path, buffer,
> +                              {tmpPath: aLocalFile.path + ".tmp"});
> +  } catch (e) {
> +    throw e;

what's up here?

::: toolkit/components/places/PlacesBackups.jsm
@@ +346,1 @@
>    }

ditto

@@ +417,5 @@
> +
> +  if (matches && matches[4])
> +    hash = matches[4].replace(/^_/, "");
> +
> +  return hash;

I'd prefer to avoid the hash var, since it doesn't seem needed:
if (...) return ...
return null;

::: toolkit/components/places/tests/bookmarks/xpcshell.ini
@@ +29,5 @@
>  [test_675416.js]
>  [test_711914.js]
>  [test_protectRoots.js]
>  [test_818593-store-backup-metadata.js]
> +[test_818584-discard-duplicate-bookmarks.js]

looks like you forgot to add the file
Attachment #813469 - Flags: review?(mak77) → feedback+
Attached patch v7 (obsolete) — Splinter Review
(In reply to Marco Bonardo [:mak] from comment #31)
> > +  } catch (e) {
> > +    throw e;
>
> what's up here?

Removed.

> I'd prefer to avoid the hash var since doesn't seem needed
> if (...) return ...
> return null;

Updated.

> > +[test_818584-discard-duplicate-bookmarks.js]
>
> looks like you forgot to add the file

Added it back.

However, the tests for the debug build crashed. Still investigating...

https://tbpl.mozilla.org/?tree=Try&rev=5ff4741ee62e

02:50:31 INFO - WARNING: NS_ENSURE_TRUE(thread) failed: file ../../../../netwerk/base/src/nsSocketTransportService2.cpp, line 115
02:50:31 INFO - ###!!! ASSERTION: nsNSSComponent relies on profile manager to wait for synchronous shutdown of all network activity: 'mIsNetworkDown', file ../../../../../security/manager/ssl/src/nsNSSComponent.cpp, line 1835
Attachment #813469 - Attachment is obsolete: true
Attachment #814054 - Flags: feedback?(mak77)
Attached patch v8 (obsolete) — Splinter Review
I have spent some time investigating why this patch doesn't work on debug builds, but haven't had much luck.

It's definitely related to waiting for synchronous shutdown. Maybe it takes too long for the sync shutdown to complete and the code doesn't like that.

Running ./mach xpcshell-test browser/components/places/tests/unit/ shows the failing tests.

Marco: do you have any suggestions?
Attachment #814054 - Attachment is obsolete: true
Attachment #814054 - Flags: feedback?(mak77)
Flags: needinfo?(mak77)
Attached file console-log.txt (obsolete) —
Console log related to the issue
(In reply to Raymond Lee [:raymondlee] from comment #33)
> It's definitely related to wait for synchronous shutdown. May be it takes
> too long for the sync shutdown to complete and the code doesn't like that.

Maybe we should wait for bug 887043 and bug 855226.
Depends on: 824433
Flags: needinfo?(mak77)
Whiteboard: p=0
No longer blocks: 923406
No longer depends on: 824433
I'll start from unbitrotting the latest patch and see what's left to do.
Assignee: raymond → mak77
Attached patch v9 (obsolete) — Splinter Review
This passed all of the tests, plus the new one. The hash is MD5 with '/' replaced by '-' so it can be used in the filename.
Attachment #814326 - Attachment is obsolete: true
Attachment #814327 - Attachment is obsolete: true
Attachment #8379217 - Flags: review?(mano)
Blocks: 818587
Attached patch v10 (obsolete) — Splinter Review
Just an unbitrot
Attachment #8379217 - Attachment is obsolete: true
Attachment #8379217 - Flags: review?(mano)
Attachment #8387551 - Flags: review?(mano)
Comment on attachment 8387551 [details] [diff] [review]
v10

># HG changeset patch
># User Marco Bonardo <mbonardo@mozilla.com>
>Bug 818584 - Discard consecutive duplicate bookmark backups by comparing hashes r=mano
>Original patch by Raymond Lee <raymond@appcoast.com> f=mak
>
>diff --git a/toolkit/components/places/BookmarkJSONUtils.jsm b/toolkit/components/places/BookmarkJSONUtils.jsm
>--- a/toolkit/components/places/BookmarkJSONUtils.jsm
>+++ b/toolkit/components/places/BookmarkJSONUtils.jsm
>+function generateHash(aString) {
>+  let cryptoHash = Cc["@mozilla.org/security/hash;1"]
>+                     .createInstance(Ci.nsICryptoHash);
>+  cryptoHash.init(Ci.nsICryptoHash.MD5);
>+  let stringStream = Cc["@mozilla.org/io/string-input-stream;1"]
>+                       .createInstance(Ci.nsIStringInputStream);
>+  stringStream.data = aString;
>+  cryptoHash.updateFromStream(stringStream, -1);
>+  // base64 allows the '/' char, but we can't use it for filenames.
>+  return cryptoHash.finish(true).replace("/", "-");

At first I was worried because base64 is case-sensitive (while some filesystems are not), but I guess this works because it's not the hash that makes the file name unique. This should be documented somewhere here. Otherwise the method may be reused in the future in an improper way, leading to quite unfortunate bugs.

>+}
>+
> this.BookmarkJSONUtils = Object.freeze({
>   /**
>    * Import bookmarks from a url.
>    *
>    * @param aSpec
>    *        url of the bookmark data.
>    * @param aReplace
>    *        Boolean if true, replace existing bookmarks, else merge.
>@@ -94,23 +109,29 @@ this.BookmarkJSONUtils = Object.freeze({
>     });
>   },
>
>   /**
>    * Serializes bookmarks using JSON, and writes to the supplied file path.
>    *
>    * @param aFilePath
>    *        OS.File path string for the "bookmarks.json" file to be created.
>-   *
>+   * @param [optional] aOptions
>+   *        Object containing options for the export:
>+   *        - failIfSameHash: if the generated file would have the same hash
>+   *          defined here, will reject with ex.becauseSameHash

The naming suggests a boolean, maybe failIfHashIs?
>-  exportToFile: function BJU_exportToFile(aFilePath) {
>+  exportToFile: function BJU_exportToFile(aFilePath, aOptions) {

I would default-set aOptions to {}

>     if (aFilePath instanceof Ci.nsIFile) {
>       Deprecated.warning("Passing an nsIFile to BookmarksJSONUtils.exportToFile " +
>                          "is deprecated. Please use an OS.File path string instead.",
>                          "https://developer.mozilla.org/docs/JavaScript_OS.File");
>       aFilePath = aFilePath.path;
>     }
>     return Task.spawn(function* () {
>       let [bookmarks, count] = yield PlacesBackups.getBookmarksTree();
>@@ -120,22 +141,39 @@ this.BookmarkJSONUtils = Object.freeze({
>       try {
>         Services.telemetry
>                 .getHistogramById("PLACES_BACKUPS_TOJSON_MS")
>                 .add(Date.now() - startTime);
>       } catch (ex) {
>         Components.utils.reportError("Unable to report telemetry.");
>       }
>
>+      startTime = Date.now();
>+      let hash = generateHash(jsonString);
>+      // Report the time taken to generate the hash.
>+      try {
>+        Services.telemetry
>+                .getHistogramById("PLACES_BACKUPS_HASHING_MS")
>+                .add(Date.now() - startTime);
>+      } catch (ex) {
>+        Components.utils.reportError("Unable to report telemetry.");
>+      }
>+
>+      if (aOptions && hash === aOptions.failIfSameHash) {
>+        let e = new Error("Hash conflict");
>+        Object.defineProperty(e, "becauseSameHash", { get: () => true });

Any reason not to set e.becauseSameHash the simple way? If there's one, document it here. Or you could define a SameHash error object in this module and then use instanceof checks.

By the way, I wonder if we could somehow encapsulate the entire process in this module. This setup as a whole seems somewhat complex.
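To make the two alternatives concrete (a plain property set "the simple way" vs. a dedicated error type checked with instanceof), here is a hedged sketch; the names mirror the patch but the surrounding code is hypothetical:

```javascript
// Alternative 1: set the flag as a plain property (the "simple way").
// A getter is only needed if the flag must be read-only.
function makeHashConflictError() {
  let e = new Error("Hash conflict");
  e.becauseSameHash = true;
  return e;
}

// Alternative 2: a dedicated error type, so callers can use instanceof.
class SameHashError extends Error {
  constructor() {
    super("Hash conflict");
    this.name = "SameHashError";
  }
}

try {
  throw new SameHashError();
} catch (ex) {
  console.log(ex instanceof SameHashError); // true
  console.log(ex instanceof Error);         // true
}
```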
>diff --git a/toolkit/components/places/PlacesBackups.jsm b/toolkit/components/places/PlacesBackups.jsm
>--- a/toolkit/components/places/PlacesBackups.jsm
>+++ b/toolkit/components/places/PlacesBackups.jsm
>+  /**
>+   * Matches the backup filename:
>+   *  0: file name
>+   *  1: date in form Y-m-d
>+   *  2: bookmarks count
>+   *  3: contents hash
>+   *  4: file extension
>+   */
>+  get filenamesRegex() {
>+    delete this.filenamesRegex;
>+    return this.filenamesRegex =
>+      new RegExp("^bookmarks-([0-9\-]+)(?:_([0-9]+)){0,1}(?:_([a-z0-9=\+\-]{24})){0,1}\.(json|html)", "i");
>   },

Instead of making it even more public than it already is, take it out of this object. Tests can redefine it.

review+, but if you decide to structure this differently in a simpler way, I can look at it again.
Attachment #8387551 - Flags: review?(mano) → review+
(In reply to Mano from comment #39)
> By the way, I wonder if we could somehow encapsulate the entire process in
> this module. This setup as a whole seems somewhat complex.

What does this mean?

> >+  /**
> >+   * Matches the backup filename:
> >+   *  0: file name
> >+   *  1: date in form Y-m-d
> >+   *  2: bookmarks count
> >+   *  3: contents hash
> >+   *  4: file extension
> >+   */
> >+  get filenamesRegex() {
> >+    delete this.filenamesRegex;
> >+    return this.filenamesRegex =
> >+      new RegExp("^bookmarks-([0-9\-]+)(?:_([0-9]+)){0,1}(?:_([a-z0-9=\+\-]{24})){0,1}\.(json|html)", "i");
> >   },
>
> Instead of making it even more public than it already is, take it out of
> this object. Tests can redefine it.

I'm making this public precisely because I'm tired of having to redefine it in tests every time. Is there a reason I should not make this public? It's a good regex to match against our backup naming.
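As a quick sanity check, the regex from the patch can be exercised against the backup naming scheme; the sample filenames and the hash below are made up for illustration:

```javascript
// The backup-filename regex from the patch ({0,1} written as '?'):
// group 1 = date, group 2 = bookmark count, group 3 = contents hash,
// group 4 = file extension.
const filenamesRegex =
  /^bookmarks-([0-9\-]+)(?:_([0-9]+))?(?:_([a-z0-9=\+\-]{24}))?\.(json|html)/i;

// A modern name carrying both a count and a (made-up) hash.
const m = filenamesRegex.exec(
  "bookmarks-2014-03-19_227_Bmj2IzUrQUDiEEIPPmEkcg==.json");
console.log(m[1]); // "2014-03-19"
console.log(m[2]); // "227"
console.log(m[3]); // "Bmj2IzUrQUDiEEIPPmEkcg=="
console.log(m[4]); // "json"

// Older backup names without count or hash still match.
console.log(filenamesRegex.test("bookmarks-2012-12-05.json")); // true
```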
Flags: needinfo?(mano)
(In reply to Marco Bonardo [:mak] from comment #40)
> (In reply to Mano from comment #39)
> > By the way, I wonder if we could somehow encapsulate the entire process in
> > this module. This setup as a whole seems somewhat complex.
>
> What does this mean?

Ah, maybe I got it: I want the JSON utils API to be very simple (mirroring the HTML API) and to handle all of the complexity in PlacesBackups. It wouldn't make much sense to move our backups-specific behavior into a generic util for storing bookmarks to JSON, imo.
discussed over IRC
Flags: needinfo?(mano)
Attached patch v11 (obsolete) — Splinter Review
Attachment #8387551 - Attachment is obsolete: true
Flags: in-testsuite+
Target Milestone: --- → mozilla31
Whiteboard: p=0 → p=5 s=it-31c-30a-29b.1
The only distant relation I see is that Sync uses the PlacesBackups and BookmarkJSONUtils modules. I will wait to see if the oranges go away before trying to figure out what's up there, since this is very strange :/
Flagging as [qa-] since this has in-testsuite coverage. Please advise if this needs any special QA testing before we release this change to GA.
Whiteboard: p=5 s=it-31c-30a-29b.1 → p=5 s=it-31c-30a-29b.1 [qa-]
This is the recurring theme in all the failures:

10:02:09 INFO - 1395248230077 Sync.EngineManager ERROR Engine 'bookmarks' is already registered!
(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #49)
> This is the recurring theme in all the failures:
>
> 10:02:09 INFO - 1395248230077 Sync.EngineManager ERROR Engine
> 'bookmarks' is already registered!

This happens regardless of my patch, and regardless of a failure. It just happens every time.
Locally the test timed out here:

0:10.38 Sqlite.Connection.places.sqlite DEBUG Conn #10: Stmt #0 finished.
0:10.38 Sqlite.Connection.places.sqlite DEBUG Conn #10: Request to close connection.
0:10.39 Sqlite.Connection.places.sqlite DEBUG Conn #10: Finalizing connection.
0:10.39 Sqlite.Connection.places.sqlite DEBUG Conn #10: Calling asyncClose().
0:10.39 Sqlite.Connection.places.sqlite INFO Conn #10: Closed
0:10.83 Sync.Tracker.Bookmarks DEBUG Saving changed IDs to bookmarks
OK, looks like the problematic call is the yield PlacesBackups.create in BStore_wipe. I indeed touched this method, so investigating.
Attached patch patch v12Splinter Review
It was a very dumb issue: we were replacing only the first instance of the '/' char in the base64 hash; just adding the "g" flag to the replace makes us properly replace all of the instances. The leftover '/' characters were causing OS.File to fail to move the file, and since the hash value is effectively random, that explains the intermittent failure.
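For the record, this is the classic String.prototype.replace pitfall: a string pattern replaces only the first occurrence, while a regex with the g flag replaces them all. A minimal demonstration (the input is a made-up base64-like string):

```javascript
// String.prototype.replace with a string pattern only touches the
// FIRST match, so any later '/' survives into the filename.
const base64ish = "ab/cd/ef+gh=="; // made-up stand-in for a base64 MD5 hash

console.log(base64ish.replace("/", "-"));   // "ab-cd/ef+gh==" -- one '/' left (the bug)
console.log(base64ish.replace(/\//g, "-")); // "ab-cd-ef+gh==" -- all replaced (the fix)
```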
Attachment #8393453 - Attachment is obsolete: true
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Status: RESOLVED → VERIFIED
No longer blocks: fxdesktopbacklog
Flags: firefox-backlog+
Regressions: 1559403