Closed Bug 818587 Opened 12 years ago Closed 11 years ago

Compress bookmark backups

Categories

(Toolkit :: Places, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking

Status: VERIFIED FIXED
Target Milestone: mozilla32

People

(Reporter: rnewman, Assigned: ahameez)

References

(Depends on 1 open bug)

Details

(Whiteboard: [mentor=mak][lang=js] p=0 s=it-32c-31a-30b.3 [qa-])

Attachments

(1 file, 6 obsolete files)

See Bug 818399. Compress on disk, or compress as we write. I think gps just wrote some code to do on-the-fly stream compression in JS…
Blocks: 818399
No longer depends on: 818399, 818584
I think we need bug 490661, that is, async backups. Though we could just hand off the compression part.
We can do the compression in a stream-based manner, although I didn't write a convenient API for that. I did write an API for applying encodings to whole strings. If we are compressing JSON, there is sadly no stream-based API for obtaining the JSON representation. So, to compress JSON, you will need to obtain the monolithic JSON string from a monolithic object to feed into the compressor. At that point, you've already taken the memory hit for a large buffer. JSON typically compresses very well, so I'm not too worried about the additional memory hit from the compressed buffer. Others may have different opinions. I /really/ wish there were stream-based APIs all along the stack.
We could also compress the JSON files on disk. Write out first, then compress, either immediately or during the pruning process.
Note there is a fact we didn't consider here: on downgrade, the backups should still be readable... if we compress them, only versions that understand the compressed format will be able to use them.
(In reply to Marco Bonardo [:mak] from comment #4) > Note there is a fact we didn't consider here: on downgrade, the > backups should still be readable... if we compress them, only versions > that understand the compressed format will be able to use them. Land decompression support ASAP. Enable compression N releases later. Add a SUMO doc for people who are trying to downgrade by more than N releases. (Decompression support should be in way earlier than fx24, so it hits the next ESR release.)
Sure, we can do that; I was just pointing out an issue to take into account. I don't think there's much more here... well I'd like if the compressed files we create would be expandable with common compression tools, in case the user wants to share the json with other services.
(In reply to Marco Bonardo [:mak] from comment #6) > well I'd like if the compressed files > we create would be expandable with common compression tools, in case the > user wants to share the json with other services. Yes, I sometimes open JSON files for read-only purposes, so a .gz extension would be nice! That would also make supporting both uncompressed and compressed files easier. BTW gzip'ing a 1.3MB bookmarks JSON file takes 0.09s on a Celeron 1.8 GHz and 0.22s on a first-gen Atom clocked at 800MHz, and reduces it to 312 KB.
We can use lz4 compression from OS.File now, though we need to wait for bug 952335 since otherwise it doesn't work properly on Mac. I have a testing patch that creates .jsonlz4 backups (it doesn't read them yet, but that should not be hard).
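For reference, once that bug is fixed, both directions become a single OS.File call each. A minimal sketch, assuming the lz4 option on OS.File.writeAtomic and OS.File.read that bug 952335 unblocks; the path and data are hypothetical:

  // Write an lz4-compressed backup, then read it back decompressed.
  Task.spawn(function* () {
    let path = OS.Path.join(OS.Constants.Path.profileDir, "bookmarkbackups",
                            "bookmarks-2014-05-15.jsonlz4");
    let data = new TextEncoder().encode(JSON.stringify({ title: "root" }));
    yield OS.File.writeAtomic(path, data, { compression: "lz4" });
    // OS.File.read hands back the decompressed bytes.
    let bytes = yield OS.File.read(path, { compression: "lz4" });
    let backup = JSON.parse(new TextDecoder().decode(bytes));
  });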
Assignee: nobody → mak77
Status: NEW → ASSIGNED
Depends on: 952335
Whiteboard: p=0
I don't have the resources to work on this atm, though it would be a good project to mentor. Basically we want to lz4-compress only backups created in the bookmarkbackups folder, but not others created by the user. The code should still be able to read both formats. The extension for the compressed backups will be .jsonlz4, and it will be used to distinguish the behavior in importFromFile. The patch should be on top of bug 818584, should add a compress option to BookmarkJSONUtils.exportToFile's aOptions, and should use it in PlacesBackups.jsm wherever we store an automatic backup: create() and saveBookmarksToJSONFile(). The latter is a little more problematic since it copies an uncompressed backup to the backups folder, so it should probably open, read and recompress it instead of doing a plain copy. We clearly need to ensure existing xpcshell tests cover both formats and verify the expected behavior.
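In rough pseudo-code, the intended call sites would look like this (a sketch of the plan above, not the final patch; names follow the existing modules):

  // Automatic backups opt into compression via the new aOptions flag.
  yield BookmarkJSONUtils.exportToFile(backupPath, { compress: true });

  // importFromFile distinguishes the two formats by extension.
  if (aFilePath.endsWith(".jsonlz4")) {
    // Read and decompress, then import from the resulting JSON string.
  } else {
    // Plain .json import, as today.
  }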
Assignee: mak77 → nobody
Depends on: 818584
Whiteboard: p=0 → [mentor=mak][lang=js]p=0
No longer blocks: fxdesktopbacklog
Status: ASSIGNED → NEW
Flags: firefox-backlog+
Whiteboard: [mentor=mak][lang=js]p=0 → [mentor=mak][lang=js] p=0
Attached patch bug818587_compressionlz4.diff (obsolete) — Splinter Review
So I just started the process of making a patch for this. I have added the option to save the backup files using lz4 compression and updated create() in PlacesBackups.jsm to save using lz4. As I'm still a beginner in contributing and dealing with such a large code base, I'd like some feedback on what I have coded so far before proceeding to update saveBookmarksToJSONFile and the import methods. Thanks.
Attachment #8423388 - Flags: feedback?(mak77)
Comment on attachment 8423388 [details] [diff] [review] bug818587_compressionlz4.diff Review of attachment 8423388 [details] [diff] [review]: ----------------------------------------------------------------- (In reply to Althaf Hameez from comment #10) > So I just started the process of making a patch for this. I have added the > option to save the backup files using lz4 compression and updated create() > in PlacesBackups.jsm to save using lz4. Hi, first of all thank you for your contribution! Sure, there's still quite some work to do here, but no worries, there's time :) ::: toolkit/components/places/BookmarkJSONUtils.jsm @@ +169,5 @@ > > // Do not write to the tmp folder, otherwise if it has a different > // filesystem writeAtomic will fail. Eventual dangling .tmp files should > // be cleaned up by the caller. > + if (aOptions.compress){ per coding style, please add a space before the { @@ +179,1 @@ > { tmpPath: OS.Path.join(aFilePath + ".tmp") }); I think it would be nicer to build the options object separately, in pseudo-code: let backupOptions = { tmpPath: ...}; if (compress) backupOptions.compression = "lz4"; yield OS.File.write(bla, bla, backupOptions); ::: toolkit/components/places/PlacesBackups.jsm @@ +30,5 @@ > () => Components.Constructor("@mozilla.org/file/local;1", > "nsILocalFile", "initWithPath")); > > XPCOMUtils.defineLazyGetter(this, "filenamesRegex", > + () => new RegExp("^bookmarks-([0-9\-]+)(?:_([0-9]+)){0,1}(?:_([a-z0-9=\+\-]{24})){0,1}\.(jsonlz4|html|json)", "i") not important: why inverting json and html, just a random change or is there a reason? fwiw, we haven't supported html backups for a long time, so we should remove html from here and any other place where we do json|html... and that means we can simplify some regexes since .json will be in common between .json and .jsonlz4 @@ +233,3 @@ > * @return A bookmarks backup filename. > */ > + getFilenameForDate: function PB_getFilenameForDate(aDateObj,aCompress) { add space after comma @@ +236,2 @@ > let dateObj = aDateObj || new Date(); > + let compress = aCompress || false; there's no need to do this, since if aCompress is not passed in it will be undefined, and if (undefined) is basically equivalent in this case. @@ +239,4 @@ > // Use YYYY-MM-DD (ISO 8601) as it doesn't contain illegal characters > // and makes the alphabetical order of multiple backup files more useful. > + return "bookmarks-" + dateObj.toLocaleFormat("%Y-%m-%d") + ".jsonlz4"; > + }else{ remember that per coding style if/else should be spaced as: if (condition) { ... } else { ... } @@ +240,5 @@ > // and makes the alphabetical order of multiple backup files more useful. > + return "bookmarks-" + dateObj.toLocaleFormat("%Y-%m-%d") + ".jsonlz4"; > + }else{ > + return "bookmarks-" + dateObj.toLocaleFormat("%Y-%m-%d") + ".json"; > + } return "blablabla.json" + (compress ? "lz4" : ""); @@ +272,5 @@ > Deprecated.warning( > "PlacesBackups.getMostRecent is deprecated and will be removed in a future version", > "https://bugzilla.mozilla.org/show_bug.cgi?id=859695"); > > + let fileExt = aFileExt || "(jsonlz4|html|json)"; see below the getMostRecentBackup comment... @@ +292,5 @@ > * @result the path to the file.
> */ > getMostRecentBackup: function PB_getMostRecentBackup(aFileExt) { > return Task.spawn(function* () { > + let fileExt = aFileExt || "(jsonlz4|html|json)"; we should remove the aFileExt argument, it is only used in a couple of places with "json", which is the default btw (http://mxr.mozilla.org/mozilla-central/search?string=getMostRecentBackup) please file a separate bug to do that, and you can build the patch here on top of that one. @@ +375,5 @@ > }, > > /** > * Creates a dated backup in <profile>/bookmarkbackups. > + * Stores the bookmarks using lz4 compression this should still state JSON somewhere, I'd say "using a lz4 compressed JSON file." @@ +412,5 @@ > > // Ensure to initialize _backupFiles > if (!this._backupFiles) > yield this.getBackupFiles(); > + let newBackupFilename = this.getFilenameForDate(new Date(), true); you can pass undefined as the first param since it's optional @@ +439,5 @@ > let newFilenameWithMetaData; > try { > + let { count: nodeCount, hash: hash } = > + yield BookmarkJSONUtils.exportToFile(newBackupFile, > + { compress: true, please fix indentation so that the brace aligns with the previous argument. I also think you indented this one space more; code should be indented by 2 spaces for each level (no tabs, only spaces, please check your text editor settings). ::: toolkit/components/places/tests/bookmarks/test_818587_compress-bookmarks.js @@ +6,5 @@ > + > +function run_test() { > + do_test_pending(); > + > + Task.spawn(function() { we have better harness utils now, you can just do function run_test() { run_next_test(); } add_task(function* my_test_function() { ... }); just give the function a meaningful name and use it as a Task generator function. You don't need to call do_test_pending or do_test_finished, just yield where needed. @@ +14,5 @@ > + let files = bookmarksBackupDir.directoryEntries; > + while (files.hasMoreElements()) { > + let entry = files.getNext().QueryInterface(Ci.nsIFile); > + entry.remove(false); > + } Each xpcshell-test gets a clean profile, so I'd expect there are no existing backups at all. @@ +16,5 @@ > + let entry = files.getNext().QueryInterface(Ci.nsIFile); > + entry.remove(false); > + } > + // Check for jsonlz4 extension > + let todayFilename = PlacesBackups.getFilenameForDate(new Date(2014,04,15),true); space after comma @@ +17,5 @@ > + entry.remove(false); > + } > + // Check for jsonlz4 extension > + let todayFilename = PlacesBackups.getFilenameForDate(new Date(2014,04,15),true); > + do_check_eq(todayFilename,"bookmarks-2014-05-15.jsonlz4"); this works today, but will be broken tomorrow. You should just .test the string with a regexp that checks for the .jsonlz4 extension @@ +25,5 @@ > + // Check that a backup for today has been created and the regex works fine for lz4. > + do_check_eq((yield PlacesBackups.getBackupFiles()).length, 1); > + let mostRecentBackupFile = yield PlacesBackups.getMostRecentBackup(); > + do_check_neq(mostRecentBackupFile, null); > + do_check_true(PlacesBackups.filenamesRegex.test(OS.Path.basename(mostRecentBackupFile))); this is fine, but you should verify the name is the same as you built before (more literally, we care that this file has the .jsonlz4 extension). @@ +27,5 @@ > + let mostRecentBackupFile = yield PlacesBackups.getMostRecentBackup(); > + do_check_neq(mostRecentBackupFile, null); > + do_check_true(PlacesBackups.filenamesRegex.test(OS.Path.basename(mostRecentBackupFile))); > + > + // Cleanup. you should now try to restore the backup and check that the restore from this lz4 file succeeded.
@@ +29,5 @@ > + do_check_true(PlacesBackups.filenamesRegex.test(OS.Path.basename(mostRecentBackupFile))); > + > + // Cleanup. > + mostRecentBackupFile = new FileUtils.File(mostRecentBackupFile); > + mostRecentBackupFile.remove(false); please use OS.File in new code (https://developer.mozilla.org/en-US/docs/JavaScript_OS.File) we are trying to avoid nsIFile as far as possible @@ +30,5 @@ > + > + // Cleanup. > + mostRecentBackupFile = new FileUtils.File(mostRecentBackupFile); > + mostRecentBackupFile.remove(false); > + do_check_false(mostRecentBackupFile.exists()); unfortunately on Windows remove often throws due to long locks and thus it should be protected with a try, but please use OS.File, as I said ::: toolkit/components/places/tests/bookmarks/xpcshell.ini @@ +29,5 @@ > [test_711914.js] > [test_protectRoots.js] > [test_818593-store-backup-metadata.js] > [test_818584-discard-duplicate-backups.js] > +[test_818587_compress-bookmarks.js] \ No newline at end of file I'd put the word backups in the test name, we are not compressing any bookmark, just the backups. compress-bookmarks-backups may be fine... Note that there are many other tests that check backups and may start failing after these changes, since they may need adjustments.
Attachment #8423388 - Flags: feedback?(mak77) → feedback+
Assignee: nobody → althaf.mozilla
Status: NEW → ASSIGNED
I suggest using extension mozLz4, given that our wrapper is unfortunately not compatible with existing lz4 implementations.
I don't think we should hide the fact that it's still json; .jsonlz4 doesn't have to be compatible with any special lz4 decompressor, it's a custom extension that has the same value as mozlz4 (I'd agree if the suggestion had been .json.lz4)
Thank you for the very valuable feedback Marco. Just to clarify > not important: why inverting json and html, just a random change or is there a reason? This was something that caused me a lot of head scratching actually because the extension for the compressed file kept ending up as json instead of jsonlz4. This was due to the function appendMetaDataToFilename which added the hash and then replaced the filename with the first match found. Since json is found first even in jsonlz4, the jsonlz4 extension was replaced back with json. Hence the swap around. If html is to be removed maybe I can look into simplifying the regex further. Still learning how to write up tests and figuring everything out so I just modified an already existing test test_477583_json-backup-in-future.js hence the usage of FileUtils. I guess I can file a bug to remove it there too, and in any other existing tests that use FileUtils instead of OS.File.
(In reply to Althaf Hameez [:ahameez] from comment #14) > This was something that caused me a lot of head scratching actually because > the extension for the compressed file kept ending up as json instead of > jsonlz4. This was due to the function appendMetaDataToFilename which added > the hash and then replaced the filename with the first match found. Since json > is found first even in jsonlz4, the jsonlz4 extension was replaced back with > json. Hence the swap around. If html is to be removed maybe I can look into > simplifying the regex further. Ah nice find! You should usually document these findings in the code with a comment explaining why things are done in a certain way. This helps both future contributors and the reviewer :) > Still learning how to write up tests and figuring everything out so I just > modified an already existing test test_477583_json-backup-in-future.js hence > the usage of FileUtils. I guess I can file a bug to remove it there too, and > in any other existing tests that use FileUtils instead of OS.File That's ok, you don't have to know everything, we are here to help. We are not heads-down rewriting tests because there are too many; so far we try to avoid "bad" stuff in new tests, and when we modify old tests we fix them at the same time. I guess filing a bug for each test that should be converted may be nice from an organizational point of view, but also very time-consuming, so I don't think you should spend time on that now.
Depends on: 1011581
Sorry if I didn't answer on IRC, I was afk (even though I forgot to set away...). Your question was whether, instead of taking the uncompressed bookmarks file and compressing it, we could just export bookmarks again into the backups folder. Well, yes, we could do that, though collecting all of the bookmarks takes more than 1 second for some users (about 2% of them; you can use the telemetry dashboard to check public telemetry data: http://telemetry.mozilla.org/#release/30/PLACES_BACKUPS_BOOKMARKSTREE_MS ), and then we must still write this to a compressed file. Thus my guess is that, since we already have the bookmarks representation, it should be cheaper to just compress the existing file; reads are usually much faster than writes, so globally it should end up being faster.
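In code, recompressing the existing file is just a read plus a compressed write; a sketch, with illustrative variable names:

  // Copy the user's uncompressed backup into the backups folder,
  // recompressing along the way instead of doing a plain file copy.
  Task.spawn(function* () {
    let contents = yield OS.File.read(uncompressedBackupPath);
    yield OS.File.writeAtomic(compressedBackupPath, contents,
                              { compression: "lz4",
                                tmpPath: compressedBackupPath + ".tmp" });
  });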
So I added the options and tested everything manually, and it all seemed to work fine. These are the current additions I have made:
1) Default bookmark backups are saved using lz4 compression.
2) If an additional backup is saved by the user, the same backup is also stored in the bookmarkbackups folder using lz4 compression.
3) Ability to import bookmarks saved using lz4.
Right now, I haven't written any automated tests for it, and neither have I updated the old tests. From what I saw, the current changes only break 1 of the old tests, because of the direct usage of the .json extension. If the code changes are fine, then I can start working on the tests.
Attachment #8423388 - Attachment is obsolete: true
Attachment #8424648 - Flags: feedback?(mak77)
I missed your question on IRC: we allow only json export for the user; the compressed format is for internal use only so far.
Thanks. I finished up the tests and fixed the 1 broken test as well. All other tests under toolkit/components/places passed.
Attachment #8424648 - Attachment is obsolete: true
Attachment #8424648 - Flags: feedback?(mak77)
Attachment #8426160 - Flags: review?(mak77)
Comment on attachment 8426160 [details] [diff] [review] Compressing bookmark backups using lz4 v3 Review of attachment 8426160 [details] [diff] [review]: ----------------------------------------------------------------- This looks very good, most of my comments are small details. Though I will take a second look at the next version of patch, since I also want to do some local testing before final r+. So, I'm giving just a positive feedback for now. Thank you very much for working on this. ::: browser/components/places/content/places.js @@ +499,5 @@ > * Restores bookmarks from a JSON file. > */ > restoreBookmarksFromFile: function PO_restoreBookmarksFromFile(aFilePath) { > // check file extension > + if ((!aFilePath.endsWith("json")) && (!aFilePath.endsWith("jsonlz4"))) { there's no need to wrap the negations in parenthesis, the negation operator precedence is usually very clear and hard to get wrong. ::: browser/locales/en-US/chrome/browser/places/places.properties @@ +12,5 @@ > bookmarksRestoreAlertTitle=Revert Bookmarks > bookmarksRestoreAlert=This will replace all of your current bookmarks with the backup. Are you sure? > bookmarksRestoreTitle=Select a bookmarks backup > bookmarksRestoreFilterName=JSON > +bookmarksRestoreFilterExtension=*.json;*.jsonlz4 So, as a side note, when you change something in a localization file (be it a .properties or a .dtd) you should also change the identifier for that string, that way localizers notice a string is missing in the automated tools and update the localization. Though, in this case looks like we made the wrong choice in the past, this specific string (bookmarksRestoreFilterExtension) should not be localizable, indeed having it localizable allows localizers to break the feature. I found at least one case where this happens: http://mxr.mozilla.org/l10n-central/source/lg/browser/chrome/browser/places/places.properties#16 all of the other locales correctly left the string unchanged, that's indeed what I'd expect. So, please remove this string and any use of bookmarksRestoreFilterExtension from the tree, instead you should define a const RESTORE_FILEPICKER_FILTER_EXT at the top of browser/components/places/content/places.js and use that const where currently we read the localized string. ::: toolkit/components/places/BookmarkJSONUtils.jsm @@ +102,2 @@ > let importer = new BookmarkImporter(aReplace); > + if (aFilePath.match("(\.jsonlz4)")) { please use .endsWith, there's no need to complicate the check with a slower regex (fwiw instead of .match you should have used /\.jsonlz4$/.test(aFilePath) if we wanted a regex, since you don't need the matches but just to know if it matched, test may be faster and have a better memory footprint) @@ +102,3 @@ > let importer = new BookmarkImporter(aReplace); > + if (aFilePath.match("(\.jsonlz4)")) { > + // compressed import I (personally) prefer less cryptic comments, maybe even a bit verbose :) Something like // Import from a compressed json file. @@ +109,5 @@ > + let jsonString = converter.convertFromByteArray(aResult, > + aResult.length); > + yield importer.importFromJSON(jsonString); > + } else { > + yield importer.importFromURL(OS.Path.toFileURI(aFilePath)); please add a .importFromCompressedFile to BookmarkImporter and move the code there, so that here we are left with a simple (pseudo-code) if (compressed) { importFromCompressedFile } else { importFromURL } so the API functionality remains more readable and we delegate to the helper. 
@@ +183,5 @@ > + if (aOptions.compress) { > + writeOptions.compression = "lz4"; > + yield OS.File.writeAtomic(aFilePath, jsonString, writeOptions); > + }else{ > + yield OS.File.writeAtomic(aFilePath, jsonString, writeOptions); the writeAtomic call can be moved after the if/else so that it's unique, the only thing that changes is the contents of writeOptions (indeed the else is pointless here and can be removed) Also, spacing around else does not follow general nor module coding style (https://developer.mozilla.org/en-US/docs/Developer_Guide/Coding_Style) ::: toolkit/components/places/PlacesBackups.jsm @@ +29,5 @@ > XPCOMUtils.defineLazyGetter(this, "localFileCtor", > () => Components.Constructor("@mozilla.org/file/local;1", > "nsILocalFile", "initWithPath")); > > +// Regex has jsonlz4 first as otherwise json would be matched for both json and jsonlz4 file ext. we limit width of code/comments to 80 chars, so this should be split across 2 rows. Note this is not a strict rule, if splitting hurts code readability we accept exceptions (for example the regexp down here is an exception to the rule since it's easier to keep it one-lined) @@ +34,2 @@ > XPCOMUtils.defineLazyGetter(this, "filenamesRegex", > + () => new RegExp("^bookmarks-([0-9\-]+)(?:_([0-9]+)){0,1}(?:_([a-z0-9=\+\-]{24})){0,1}\.(jsonlz4|json)", "i") I suspect adding a $ at the end of the regex would make the problem of priority between json and jsonlz4 a non-issue since then the regexp would match up to the end of the string. Maybe worth a try, if we can remove this "strictness" the code would be more robust to future changes. @@ +243,2 @@ > /** > * Creates a Date object from a backup file. The date is the backup I think you wrongly removed a newline between these methods @@ +266,5 @@ > Deprecated.warning( > "PlacesBackups.getMostRecent is deprecated and will be removed in a future version", > "https://bugzilla.mozilla.org/show_bug.cgi?id=859695"); > > + let fileExt = "(jsonlz4|json)"; this is going away for the other patch in bug 1011581. @@ +283,5 @@ > * @result the path to the file. > */ > getMostRecentBackup: function PB_getMostRecentBackup() { > return Task.spawn(function* () { > + let fileExt = "(jsonlz4|json)"; ditto @@ +334,5 @@ > // user. See bug 424389. > let mostRecentBackupFile = yield this.getMostRecentBackup(); > if (!mostRecentBackupFile || > hash != getHashFromFilename(OS.Path.basename(mostRecentBackupFile))) { > + let name = this.getFilenameForDate(undefined,true); space after comma ::: toolkit/components/places/tests/bookmarks/test_466303-json-remove-backups.js @@ +14,5 @@ > + // Create 2 json dummy backups in the past. > + let oldJsonPath = OS.Path.join(backupFolder, "bookmarks-2008-01-01.json"); > + let oldJsonFile = yield OS.File.open(oldJsonPath, { truncate: true }); > + oldJsonFile.close(); > + do_check_true(yield OS.File.exists(oldJsonPath)); Ah I see, you had to change this because now we don't filter on html anymore... that's fine because we stopped supporting it a long time ago, so there should be no leftover files in profiles in the wild. So, I think here you can completely remove the old code referring to html, since converting it to json would just end up doing the same test twice.
::: toolkit/components/places/tests/bookmarks/test_818587_compress-bookmarks-backups.js @@ +11,5 @@ > +add_task(function* compress_bookmark_backups_test() { > + let backupFolder = yield PlacesBackups.getBackupFolder(); > + > + // Check for jsonlz4 extension > + let todayFilename = PlacesBackups.getFilenameForDate(new Date(2014,04,15), true); please always add space after comma per coding style @@ +12,5 @@ > + let backupFolder = yield PlacesBackups.getBackupFolder(); > + > + // Check for jsonlz4 extension > + let todayFilename = PlacesBackups.getFilenameForDate(new Date(2014,04,15), true); > + do_check_eq(todayFilename,"bookmarks-2014-05-15.jsonlz4"); ditto @@ +24,5 @@ > + do_check_true(PlacesBackups.filenamesRegex.test(OS.Path.basename(mostRecentBackupFile))); > + > + /* The most recent backup file has to be removed since saveBookmarksToJSONFile > + // will otherwise over-write the current backup, since it will be made on the > + same date */ please be consistent with comment style, I personally prefer line comments (//) because they are easier to handle when one wants to comment out a bunch of code for debug purposes. /** */ for javadocs and /* */ for long comments (more than 4 lines) @@ +25,5 @@ > + > + /* The most recent backup file has to be removed since saveBookmarksToJSONFile > + // will otherwise over-write the current backup, since it will be made on the > + same date */ > + do_print((yield OS.File.remove(mostRecentBackupFile))); what is this trying to print? If it fails, the test will fail automatically @@ +26,5 @@ > + /* The most recent backup file has to be removed since saveBookmarksToJSONFile > + // will otherwise over-write the current backup, since it will be made on the > + same date */ > + do_print((yield OS.File.remove(mostRecentBackupFile))); > + yield OS.File.remove(mostRecentBackupFile); this is doing the same thing as above @@ +29,5 @@ > + do_print((yield OS.File.remove(mostRecentBackupFile))); > + yield OS.File.remove(mostRecentBackupFile); > + do_check_false((yield OS.File.exists(mostRecentBackupFile))); > + > + // Check if user creates a custom backup, then another backup is made in our bookmarkbackups folder This comment could be clarified a little bit // Check that, if the user created a custom backup out of the default // backups folder, it gets copied (compressed) into it. @@ +42,5 @@ > + PlacesUtils.bookmarks.DEFAULT_INDEX, > + "bookmark"); > + > + // Force create a compressed backup, Remove the bookmark, the restore the backup > + yield PlacesBackups.create(undefined,true); space after comma @@ +45,5 @@ > + // Force create a compressed backup, Remove the bookmark, the restore the backup > + yield PlacesBackups.create(undefined,true); > + let recentBackup = yield PlacesBackups.getMostRecentBackup(); > + PlacesUtils.bookmarks.removeItem(bm); > + yield BookmarkJSONUtils.importFromFile(recentBackup,true); ditto ::: toolkit/components/places/tests/bookmarks/xpcshell.ini @@ +29,5 @@ > [test_711914.js] > [test_protectRoots.js] > [test_818593-store-backup-metadata.js] > [test_818584-discard-duplicate-backups.js] > +[test_818587_compress-bookmarks-backups.js] \ No newline at end of file I probably bitrotted this file with my other patches, so you should pull again and rebase the patch (sorry). If you need assistance with using Mercurial Queues feel free to ask.
Attachment #8426160 - Flags: review?(mak77) → feedback+
So with regards to ::: toolkit/components/places/tests/bookmarks/test_466303-json-remove-backups.js @@ +14,5 @@ > + // Create 2 json dummy backups in the past. > + let oldJsonPath = OS.Path.join(backupFolder, "bookmarks-2008-01-01.json"); > + let oldJsonFile = yield OS.File.open(oldJsonPath, { truncate: true }); > + oldJsonFile.close(); > + do_check_true(yield OS.File.exists(oldJsonPath)); The test was trying to ensure that the old html file is removed by respecting max_backups as 2. Therefore I changed the test to now have 2 JSON files of which the old one should be removed. If we remove the old one, we might as well remove that whole test since it's not testing for anything else right? Also as I asked in IRC but you were away, I'll just repeat here. if (aFilePath.endsWith("jsonlz4")) { yield importer.importFromCompressedFile(aFilePath); } else { yield importer.importFromURL(OS.Path.toFileURI(aFilePath)); } One method takes in an OS.Path and another OS.Path URI. Is that alright, since in importFromCompressedFile we will be using OS.File.read(aFilePath) I've incorporated all the other changes as requested, except for the RegExp changes. It seemed to fail even when I added the $ at the end, as follows: json(lz4)?$. I'll try debugging it further.
Forgot to add, > let rx = new RegExp("\.json(lz4)?$"); This RegExp works fine for getMostRecentBackup() and getMostRecent() it's just the long filenamesRegEx that's being stubborn. I'm assuming it's because of the use of ^ to match the beginning of the string, so therefore the $ sign is rendered useless.
(In reply to Althaf Hameez [:ahameez] from comment #21) > The test was trying to ensure that the old html file is removed by > respecting max_backups as 2. Therefore I changed the test to now have 2 JSON > files of which the old one should be removed. If we remove the old one, we > might as well remove that whole test since it's not testing for anything > else right? OK, sorry if I misread it > Also as I asked in IRC but you were away, I'll just repeat here. > > if (aFilePath.endsWith("jsonlz4")) { > yield importer.importFromCompressedFile(aFilePath); > } else { > yield importer.importFromURL(OS.Path.toFileURI(aFilePath)); > } > > One method takes in an OS.Path and another OS.Path URI. Is that alright, > since in importFromCompressedFile we will be using OS.File.read(aFilePath) I think it's ok since those methods are internal implementation details, it wouldn't be OK if we'd expose them to the outside world as an API. (In reply to Althaf Hameez [:ahameez] from comment #22) > Forgot to add, > > let rx = new RegExp("\.json(lz4)?$"); > > This RegExp works fine for getMostRecentBackup() and getMostRecent() > it's just the long filenamesRegEx that's being stubborn. I'm assuming it's > because of the use of ^ to match the beginning of the string, so therefore > the $ sign is rendered useless. Strange, if I do: let re = new RegExp("^bookmarks-([0-9\-]+)(?:_([0-9]+)){0,1}(?:_([a-z0-9=\+\-]{24})){0,1}\.(json|jsonlz4)$", "i"); "bookmarks-01-01-01_12_ssssssssaaaaaaaacccccccc.jsonlz4".match(re)[4]; I get "jsonlz4" as expected
Whoops, you were right about the RegEx; I was missing the brackets for the $ sign. Updated patch with all the changes.
Attachment #8426160 - Attachment is obsolete: true
Attachment #8426950 - Flags: review?(mak77)
Comment on attachment 8426950 [details] [diff] [review] Compressing bookmark backups using lz4 v4 Review of attachment 8426950 [details] [diff] [review]: ----------------------------------------------------------------- A couple of details, I think the next one will be the final one! ::: browser/components/places/content/places.js @@ +14,5 @@ > "resource://gre/modules/PlacesBackups.jsm"); > XPCOMUtils.defineLazyModuleGetter(this, "DownloadUtils", > "resource://gre/modules/DownloadUtils.jsm"); > > +const RESTORE_FILEPICKER_FILTER_EXT = "*.json;*.jsonlz4" missing semicolon at the end ::: toolkit/components/places/BookmarkJSONUtils.jsm @@ +190,5 @@ > /** > * Import bookmarks from a url. > * > * @param aSpec > + * OS.File path string of the bookmark data. I think you inverted the param here with importFromCompressedFile: this one is a url, the other one is an OS.File path string @@ +236,5 @@ > /** > + * Import bookmarks from a compressed file. > + * > + * @param aSpec > + * url of the compressed bookmark data. this one should also be named aFilePath since it's not a spec @@ +242,5 @@ > + * @return {Promise} > + * @resolves When the new bookmarks have been created. > + * @rejects JavaScript exception. > + */ > + importFromCompressedFile: function BI_importFromCompressedFile(aSpec) { ditto on argument naming. This one should also be a function* (star function) since it's a generator. @@ +248,5 @@ > + let converter = Cc["@mozilla.org/intl/scriptableunicodeconverter"]. > + createInstance(Ci.nsIScriptableUnicodeConverter); > + converter.charset = "UTF-8"; > + let jsonString = converter.convertFromByteArray(aResult, aResult.length); > + this.importFromJSON(jsonString); you should yield on importFromJSON, otherwise we are returning before it's done, which is wrong. It's unfortunate the test is not verifying this case, I guess it's because importFromJSON is not really asynchronous at the moment so the test cannot verify that :( ::: toolkit/components/places/PlacesBackups.jsm @@ +30,5 @@ > () => Components.Constructor("@mozilla.org/file/local;1", > "nsILocalFile", "initWithPath")); > > +// Regex has jsonlz4 first as otherwise json would be matched for both json > +// and jsonlz4 file ext. this comment is no longer valid @@ +334,5 @@ > // user. See bug 424389. > let mostRecentBackupFile = yield this.getMostRecentBackup(); > if (!mostRecentBackupFile || > hash != getHashFromFilename(OS.Path.basename(mostRecentBackupFile))) { > + let name = this.getFilenameForDate(undefined,true); missing whitespace after the comma @@ +436,3 @@ > newFilenameWithMetaData = appendMetaDataToFilename(newBackupFilename, > + { count: nodeCount, > + hash: hash }); the change in indentation here is unwanted ::: toolkit/components/places/tests/bookmarks/test_818587_compress-bookmarks-backups.js @@ +8,5 @@ > + run_next_test(); > +} > + > +add_task(function* compress_bookmark_backups_test() { > + let backupFolder = yield PlacesBackups.getBackupFolder(); the correct indentation is 2 spaces, not 4, please fix this test.
Attachment #8426950 - Flags: review?(mak77) → feedback+
Sorry! I think I rushed the last patch without going over it properly. Here is the updated one. Hopefully everything is good in this one.
Attachment #8426950 - Attachment is obsolete: true
Attachment #8427866 - Flags: review?(mak77)
Comment on attachment 8427866 [details] [diff] [review] Compressing bookmark backups using lz4 v5 Review of attachment 8427866 [details] [diff] [review]: ----------------------------------------------------------------- Yep, this one looks good, nice work! Next week, after this lands, I will try (I must update my blogging platform) to write a blog post on Planet Firefox about this awesome contribution, stay tuned :) Please, before updating the patch, pull a recent head; the current patch applies with some fuzz (likely due to some recent change to PlacesBackups.jsm). I pushed to the try server to globally check test coverage; the requested changes here won't affect the validity of this push. Once we have the final patch and a "green" from this push we can proceed with a checkin-needed https://tbpl.mozilla.org/?tree=Try&rev=58329322054d ::: toolkit/components/places/BookmarkJSONUtils.jsm @@ +190,5 @@ > /** > * Import bookmarks from a url. > * > * @param aSpec > + * url of the compressed bookmark data. this is not compressed, we don't use importFromUrl for compressed files; the old comment was fine, please just restore it. ::: toolkit/components/places/tests/bookmarks/test_818587_compress-bookmarks-backups.js @@ +7,5 @@ > +function run_test() { > + run_next_test(); > +} > + > + add_task(function* compress_bookmark_backups_test() { this single row should not be indented
Attachment #8427866 - Flags: review?(mak77) → review+
Added the comment and the indentation change.
Attachment #8427866 - Attachment is obsolete: true
Wrong version of v6 uploaded last time. This is the correct one with the fuzziness fixed.
Attachment #8428212 - Attachment is obsolete: true
Thank you! Note for sheriffs: the try run is in comment 27.
Flags: in-testsuite+
Keywords: checkin-needed
Whiteboard: [mentor=mak][lang=js] p=0 → [mentor=mak][lang=js] p=0[fixed-in-fx-team]
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Whiteboard: [mentor=mak][lang=js] p=0[fixed-in-fx-team] → [mentor=mak][lang=js] p=0
Target Milestone: --- → mozilla32
Whiteboard: [mentor=mak][lang=js] p=0 → [mentor=mak][lang=js] p=0 s=it-32c-31a-30b.2 [qa?]
Whiteboard: [mentor=mak][lang=js] p=0 s=it-32c-31a-30b.2 [qa?] → [mentor=mak][lang=js] p=0 s=it-32c-31a-30b.3 [qa?]
Whiteboard: [mentor=mak][lang=js] p=0 s=it-32c-31a-30b.3 [qa?] → [mentor=mak][lang=js] p=0 s=it-32c-31a-30b.3 [qa-]
Status: RESOLVED → VERIFIED
Depends on: 1016953
(In reply to Marco Bonardo [:mak] from comment #6) > ... well I'd like if the compressed files > we create would be expandable with common compression tools, in case the > user wants to share the json with other services. (In reply to David Rajchenbach Teller [:Yoric] from comment #12) > I suggest using extension mozLz4, given that our wrapper is unfortunately > not compatible with existing lz4 implementations. (In reply to Marco Bonardo [:mak] from comment #13) > ... .jsonlz4 doesn't > have to be compatible with any special lz4 decompressor, it's a custom > extension that has the same value as mozlz4 ... It's somewhat unfortunate that we ended up with a non-standard format which users can't examine without loading it into a compatible Firefox version (and even then not examine directly, but only as far as the details are exposed through the bookmarks system). Since the whole framework is apparently already in place, how hard or bad would it be to try to replace it with a more standard format which common enough tools can handle?
(In reply to Avi Halachmi (:avih) from comment #33) > It's somewhat unfortunate that we ended up with a non-standard format which > users can't examine without loading it into a compatible Firefox version > (and even then not examine directly, but only as far as the details > are exposed through the bookmarks system). The user can still export bookmarks as plain-text json through the UI; these compressed files are only for internal and automated use. > Since the whole framework is apparently already in place, how hard or bad > would it be to try to replace it with a more standard format which common > enough tools can handle? We don't have a more standard format available through OS.File at the moment. I think it would be nice to make a very simple add-on to extract the mozilla lz4 format; this would cover the rare need for advanced users to access this data. It should take just a few hours to make a restartless add-on for that.
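The core of such an add-on would be just a few lines of privileged code; a sketch, with file picking and UI omitted and hypothetical paths:

  // Hypothetical add-on core: decompress a .jsonlz4 backup to plain JSON.
  Task.spawn(function* () {
    let bytes = yield OS.File.read(jsonLz4Path, { compression: "lz4" });
    yield OS.File.writeAtomic(plainJsonPath, bytes);
  });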
Just adding some info. Our lz4 implementation is based on lz4 v1.3, and this version doesn't specify a file format: http://dxr.mozilla.org/mozilla-central/source/toolkit/components/workerlz4/lz4.js#49 The comment says that when lz4 has a standard file format, our implementation will be updated, staying backward compatible for decompression but starting to produce standard lz4 compressed files. So from this perspective, there's no need for a change to the backup code, because (new) backups will become standard once we modify our underlying lz4 code. In related news, lz4 now has a final specification v1.4 with a stream/container format which should probably also be suitable for file storage: http://fastcompression.blogspot.fr/2013/04/lz4-streaming-format-final.html So it's only a matter of implementation now, and the backups will automatically become standard lz4 files.
I am trying to restore bookmarks using a bookmark backup of the file type .jsonlz4 and I am getting an "unsupported file type" error. I am using Firefox 31.0 on ubuntu. (My apologies if this is not the appropriate place to put this)
Compressed backups are supported from Firefox 32 on.
If .jsonlz4 files are created using the LZ4 v1.3 format, then they should remain readable by the Windows binary of LZ4 v1.4, at: http://fastcompression.blogspot.fr/p/lz4.html It's really a temporary workaround, because the v1.3 format was never published, precisely to avoid it being used. For application interoperability, it is highly encouraged to use the final framing specification v1.4.1: http://fastcompression.blogspot.fr/2013/04/lz4-streaming-format-final.html The work to create an archive using this format should now be simplified thanks to the availability of the lz4frame library, which automatically encapsulates data streams using the framing specification. See: https://github.com/Cyan4973/lz4/blob/dev/lz4frame.h
we are using our internal lz4 compression for now, there's no reason to make these files openable by any external app (a Firefox add-on could easily be made if one would really need to). These are not intended to be lz4 files, indeed they have a custom file extension too.
(In reply to Marco Bonardo [::mak] (needinfo? me) from comment #39) > we are using our internal lz4 compression for now, there's no reason to make > these files openable by any external app... The thing is, when you need the backup it could mean that something has gone wrong, possibly very wrong - maybe to a level where Firefox itself is not usable. In that case, being able to get direct access to the backed-up content without a Firefox instance is a valid scenario IMO. That being said, per comment 35, we only need to follow up on the premise of the comment in the source code that we'll update the underlying implementation to standard lz4 v1.4. It should probably be filed as a different bug if there isn't one already.
(In reply to Marco Bonardo [::mak] (needinfo? me) from comment #39) > we are using our internal lz4 compression for now, there's no reason to make > these files openable by any external app (a Firefox add-on could easily be > made if one would really need to). > These are not intended to be lz4 files, indeed they have a custom file > extension too. That is a crock!!! Backup files should be able to be decoded using tools outside Firefox; just as Avi points out, Firefox may not be available or even usable at that point. I can also guarantee that Firefox, at least in its current form, will not be available at some point in the future. In my case - and probably the reason for the sudden activity on this topic - I requested information on exactly this: decoding the bookmark backups outside Firefox automatically (no manual GUI interaction). See Mozilla Forums... http://forums.mozillazine.org/viewtopic.php?f=38&t=2885435 I have been using these backup files for years as a way of accessing the bookmark data as a form of plain text. The code ran successfully until a recent update a month ago, when bookmarks converted to the "jsonlz4" format and the older standard-format "json" files rolled out of existence. I then struggled with attempts to decode the format, using standard tools like "lz4", to get the json data. Now, while I have a workaround (contributed via the forum), I still find it incredible that a developer has the attitude that... > there's no reason to make these files openable by any external app Unless the matter is either a security or intellectual property issue (and it is not), there is absolutely no reason that a developer should assume "that no one would want to read that data by other means". The developer has no knowledge of the future, and limiting access to internal non-standard methods is not useful. It is also actually a useful method of debugging or testing, especially between very distantly related versions. I have been working with UNIX computers for more than 30 years, and I store almost all my data as plain text. Why? Because it is future-proof. At least while 'ASCII' or even 'UNICODE' is in standard use. I can still use data files I wrote in 1978 (a raytracer project during my university days). And even my documentation is still readable, even though things like "WordPerfect" and "WordStar" (in common use by students back then) are long gone. And "Firefox", like its predecessors "Mosaic" and "Netscape", will also eventually be replaced. Making backup files only usable by the programs that created them just does not make any real sense. I truly believe you should rethink your strategy.
I may have come across a little harsh in my previous response, and I apologize if I have offended anyone. The point I was making, however, remains.
I think you misunderstood me, or I expressed myself badly. We'll sooner or later get support to decompress with external decompressors. That will happen when the underlying lz4 implementation is upgraded. But that's not one of the officially supported methods for data recovery, and thus we don't suggest doing that (nor should support). What I meant is that it's not a goal for us today to spend resources (we don't have) on changing that. lz4 is not a proprietary format, Firefox is open source, and thus there's no risk we are "hiding" any data; anyone can write a decompressor pretty easily. There's no lack of transparency here, so put down your sword please :) Here we just care that users don't lose bookmarks. There are many options available to recover bookmarks in case of problems:
1. use the profile reset feature
2. import a backup from a new profile (pointing to the old profile's backups folder)
3. copy the bookmarkbackups folder to a new profile
The bug that happened here is just due to the fact that we didn't take care to release a version that could open compressed backups before releasing a version that creates them. That happened mostly due to lack of resources; this bug itself was kindly fixed by a volunteer, because we didn't have internal resources to improve backups at all. Nothing more than that.
Just curious, what is the rationale behind this new format for bookmark backups? I've just tried to compress a .json backup in my Nightly profile and it shrank from 364k to 80k using gzip and to 81k using zip, whilst all the other .jsonlz4 backups are at least 100K in size - so why use this new format, for which I wasn't able to find a standalone decompressor?
It's supposedly about an order of magnitude faster at compressing and decompressing compared to zlib-based compression, and its compression ratio is still meaningful, which probably makes it faster in practice than accessing uncompressed files and therefore more suitable for on-the-fly disk access than gzip et al. Also, it is/was the only on-the-fly compression the API supports. And it was supposed to be standardized, but isn't yet. You're welcome to help speed this process up - the pointers appear in earlier comments.
(In reply to Marco Bonardo [::mak] from comment #43) > I think you misunderstood me, or I expressed myself badly. We'll sooner or later > get support to decompress with external decompressors. That will happen when > the underlying lz4 implementation is upgraded. I was wondering whether, more than a year after the above words were written, there is any progress with providing a stand-alone (i.e., that can be run independently/outside of Firefox) .jsonlz4 decompressor. I would also be curious to know the rationale behind compressing to a non-standard format, i.e., a format that the standard lz4 program cannot decompress due to an "Unrecognized header" error. I think most people would agree that shaving a few microseconds off compression by not writing proper header information would not qualify as a "valid" reason.
(In reply to orionbelt2 from comment #46) > I was wondering whether, more than a year after the above words were > written, there is any progress with providing a stand-alone (i.e., that can > be run independently/outside of Firefox) .jsonlz4 decompressor. not a priority. It will happen when possible. > I would also be curious to know the rationale behind compressing to a > non-standard format, i.e., a format that the standard lz4 program cannot > decompress due to an "Unrecognized header" error. It's the best compression API we had at the time (and still now).
(In reply to Marco Bonardo [::mak] from comment #47) > (In reply to orionbelt2 from comment #46) > > I was wondering whether, more than a year after the above words were > > written, there is any progress with providing a stand-alone (i.e., that can > > be run independently/outside of Firefox) .jsonlz4 decompressor. > > not a priority. It will happen when possible. I'm using linux and free software since 15 years, even professionally, because of the possibility to modify the software for my own uses. Even if I'm not a programmer (I'm a lawyer) I learned enough of some programming languages to be able to create the (almost) perfect systems and software for my daily uses. We are using kubuntu, and the search engines of kubuntu that I write (in particular for legal search tools) are transformed automagically by a custom script in xml search engines. Until last year, I was able to write a deb package with these xml custom search engines and a little script to add them to all the profiles of all the workers in my office, including the keywords, which means alt-f2 -> "keyword searchterm" worked exactly like "keyword searchterm" in firefox bar. And now it's not possible anymore, because of the compression algorithm which makes impossible the script access to the search engines and keywords. So the interest of free software, i. e. custom modifications not exactly in the direction thinked by the original programmer, is gone away with this not-exaclty-lz4 blackbox. Best regards
Depends on: 1209390
(In reply to Olivier Subilia from comment #48) > We are using kubuntu, and the search engines of kubuntu that I write (in > particular for legal search tools) are transformed automagically by a custom > script in xml search engines. Until last year, I was able to write a deb > package with these xml custom search engines and a little script to add them > to all the profiles of all the workers in my office, including the keywords, > which means alt-f2 -> "keyword searchterm" worked exactly like "keyword > searchterm" in firefox bar. I'm not sure why you are posting here, since this bug has nothing to do with search engines and their keywords. You may have more luck in the Firefox / Search component. That said, Firefox is open source, there's nothing secret about the format and the used libraries. This is far from being a black box.
To give a hint, the format is trivial: just take lz4.c and lz4.h. The first 8 bytes of the file identify the format ("mozLz4a\0"), the next 4 bytes are the decompressed size, and all the rest is the payload. You could probably write a simple decompressor in 10 minutes. http://mxr.mozilla.org/mozilla-central/source/toolkit/components/lz4/lz4.js#31
(In reply to Marco Bonardo [::mak] from comment #49) > (In reply to Olivier Subilia from comment #48) > > We are using kubuntu, and the search engines of kubuntu that I write (in > > particular for legal search tools) are transformed automagically by a custom > > script in xml search engines. Until last year, I was able to write a deb > > package with these xml custom search engines and a little script to add them > > to all the profiles of all the workers in my office, including the keywords, > > which means alt-f2 -> "keyword searchterm" worked exactly like "keyword > > searchterm" in firefox bar. > > I'm not sure why you are posting here, since this bug has nothing to do > with search engines and their keywords. > You may have more luck in the Firefox / Search component. > That said, Firefox is open source, there's nothing secret about the format > and the used libraries. This is far from being a black box. Sorry. I'm posting here because searching what the search.json.mozlz4 compression format was send me here, and I assumed the problem was the same, i. e. the compression format used (see comment 46 for example) which seems the case (I just saw the link to the bug 1209390). @ comment 50 : Thank you for the hints for the headers. I'll write my own decompressor 10 minutes after having learned C ;-)
See bug 1209390; once that is fixed, consumers will move to the new format. If the platform doesn't provide tools, consumers like backups or search are unlikely to change. Ideally the transition will be completely transparent to the consumers: new files could be created in the new format, while old files could still be supported by the decompressor. I hope someone can chime in on bug 1209390 and help fix it.
As a reminder, the code comment is erroneous: the magic header is "mozLz40\0", not "mozLz4a\0".
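With the corrected magic, a stand-alone decompressor is indeed short. A minimal Node.js sketch, assuming the third-party node-lz4 npm package and its decodeBlock helper, and assuming the 4-byte size field is little-endian; this is not an officially supported recovery path:

  // Unofficial sketch: read a mozLz4 file outside Firefox.
  const fs = require("fs");
  const lz4 = require("lz4"); // assumption: the node-lz4 npm package

  function readMozLz4(path) {
    const buf = fs.readFileSync(path);
    if (buf.toString("latin1", 0, 8) !== "mozLz40\0")
      throw new Error("not a mozLz4 file");
    const size = buf.readUInt32LE(8);    // assumed little-endian size header
    const out = Buffer.alloc(size);
    lz4.decodeBlock(buf.slice(12), out); // payload is a raw lz4 block
    return out.toString("utf8");
  }

  console.log(readMozLz4("bookmarks-2014-05-15.jsonlz4"));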
I've created an unofficial stand-alone decompressor for .jsonlz4 files. More details at bug 1209390 comment 4.
Finally some progress for this problem!
Though I have moved on to using a different browser, this bug was one of the key reasons (though not the only one) I made the switch!