Closed Bug 1156253 Opened 9 years ago Closed 9 years ago

Use OS.File lz4 compression for archived telemetry pings

Categories

(Toolkit :: Telemetry, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla40
Tracking Status
firefox40 --- fixed

People

(Reporter: gfritzsche, Assigned: Dexter)

References

(Blocks 1 open bug)

Details

Attachments

(3 files, 4 obsolete files)

A quick check shows that the OS.File lz4 compression gives a ~70% reduction with the pings i have locally.
Slightly slower loading from the ping archive should not be a problem, so this seems like an easy win.
Yoric, are there any caveats with us OS.File's lz4 compression? I seem to remember an issue with OS.File not using a standard format
Flags: needinfo?(dteller)
we are using it for bookmark backups, the only drawback is that the user can't use a third party decompressor, that means it's not a good idea to use .lz4 extension, we used .jsonlz4 for bookmarks.
Indeed, this is not standard lz4. I'd like to add support for standard lz4 one of these days, but I have found 0 time for that so far.
Flags: needinfo?(dteller)
How about gzip instead then? We don't really want to support our own compression file format
We are already using this in other places and OS.File supports it with just providing an option to read/writeAtomic, so this would be a straight-forward solution.

What concerns would we have with lz4?
Backwards compability would already be required by other modules (experiments, bookmarks, crash manager).
As mreid pointed out in the "New Telemetry ping upload & expiry rules" doc, we already gzip pings before sending, so we might as well just save the gzip compressed stream to disk.

The drawback of using our own lz4 format is that it's harder for devs and users to inspect the saved pings.
Yoric, how hard would it be to add gzip support to read & writeAtomic? We already have the compression/decompression code in the codebase
Flags: needinfo?(dteller)
After looking at zlib.h, I believe that I can do this in a few hours for the regular path (i.e. JS code). The fast path version will probably be reasonable, too, but it will take me some time to remember exactly how that code works.
Flags: needinfo?(dteller)
Per the meeting today, we will start to use lz4 compression for now because it's trivial to do.
We will use a ".jsonlz4" suffix for the files, so we can later tell them apart.

We will look into moving over to gzip in a non-blocking follow-up bug.
Blocks: 1157717
fwiw, writing a decompression restartless add-on would be trivial (and less time consuming than adding gzip support to OS.File imo) and would solve all of the "not inspectable" complains.
Assignee: nobody → alessio.placitelli
Attached patch bug1156253.patch (obsolete) — Splinter Review
This patch adds lz4 compression for archived pings. New archived pings are compressed and old uncompressed pings are still getting loaded correctly.

It also adds test coverage for compressed/uncompressed ping loading.
Attachment #8601495 - Flags: review?(gfritzsche)
Comment on attachment 8601495 [details] [diff] [review]
bug1156253.patch

Review of attachment 8601495 [details] [diff] [review]:
-----------------------------------------------------------------

I think it's more efficient to move the first patch from bug 1156355 over here.
Then the changes in this patch can be completely limited to the aborted session ping handling in TelemetryStorage.jsm (and tests).

::: toolkit/components/telemetry/TelemetryStorage.jsm
@@ +94,5 @@
>     *
>     * @param {string} id The pings id.
>     * @param {number} timestampCreated The pings creation timestamp.
>     * @param {string} type The pings type.
> +   * @param {bool} compressed If |true|, expect the ping to be compressed.

I don't think we need to keep track of whether archived pings are compressed.
When loading we can just test through {path, path+"lz4"} and load the first that exists.

@@ +121,5 @@
>     * @param {bool} overwrite If |true|, the file will be overwritten if it exists,
>     * if |false| the file will not be overwritten and no error will be reported if
>     * the file exists.
> +   * @param {bool} compress If |true|, the file will use lz4 compression. Otherwise no
> +   * compression will be used.

What is the default?
Attachment #8601495 - Flags: review?(gfritzsche)
Status: NEW → ASSIGNED
This refactors the aborted-session file handling details into TelemetryStorage. This is relatively simple and just moving code around per the Telemetry redesign draft. Note that all use of SaveSerializer will be in TelemetryStorage after the other refactors, so i didn't take care of sharing that between modules now.
Attachment #8602211 - Flags: review?(alessio.placitelli)
Blocks: 1156355
Comment on attachment 8602211 [details] [diff] [review]
Refactor aborted-session ping file details into TelemetryStorage

Review of attachment 8602211 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/components/telemetry/TelemetryController.jsm
@@ +279,3 @@
>     * @return {Promise} Promise that is resolved when the ping is saved.
>     */
> +  checkAbortedSessionPing: function addAbortedSessionPing() {

|function checkAbortedSessionPing|

@@ +712,4 @@
>    },
>  
>    /**
>     * Save an aborted-session ping to the pending pings and archive it.

This comment should probably be updated, along with the "@param" below.

@@ +716,5 @@
>     *
>     * @param {String} aFilePath The path to the aborted-session checkpoint ping.
>     * @return {Promise} Promise that is resolved when the ping is saved.
>     */
> +  checkAbortedSessionPing: Task.async(function* addAbortedSessionPing() {

|function* checkAbortedSessionPing()|

::: toolkit/components/telemetry/TelemetrySession.jsm
@@ +316,5 @@
>      return [parseInt(io.readBytes), parseInt(io.writeBytes)];
>    }
>  };
>  
> + /**

Nit: leading whitespace.

@@ +1887,5 @@
>            yield this.savePendingPings();
>            yield this._stateSaveSerializer.flushTasks();
>  
>            if (IS_UNIFIED_TELEMETRY) {
> +            TelemetryStorage.removeAbortedSessionPing();

Can't we do this in |TelemetryController|? TelemetrySession is using TelemetryController for all the aborted-session related operations, but it's directly interacting with the storage for this.

::: toolkit/components/telemetry/TelemetryStorage.jsm
@@ +310,5 @@
> + * We are using this to synchronize saving to the file that TelemetrySession persists
> + * its state in.
> + */
> +function SaveSerializer() {
> +  this._queuedOperations = [];

When are the other refactors taking place? I fear this might take some time and sharing the SaveSerializer (just exporting it) might be relatively cheap.

::: toolkit/components/telemetry/tests/unit/test_TelemetrySession.js
@@ +1355,5 @@
>    yield OS.File.setDates(PENDING_PING_FILE, null, Date.now() - OVERDUE_PING_FILE_AGE);
>    yield TelemetryController.reset();
>  
>    // Wait for the aborted-session ping.
> +  while (true) {

Why is this needed/why are you changing this part?
Attachment #8602211 - Flags: review?(alessio.placitelli)
(In reply to Georg Fritzsche [:gfritzsche] from comment #12)
> Comment on attachment 8601495 [details] [diff] [review]
> bug1156253.patch
> 
> Review of attachment 8601495 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> I think it's more efficient to move the first patch from bug 1156355 over
> here.
> Then the changes in this patch can be completely limited to the aborted
> session ping handling in TelemetryStorage.jsm (and tests).
> 
> ::: toolkit/components/telemetry/TelemetryStorage.jsm
> @@ +94,5 @@
> >     *
> >     * @param {string} id The pings id.
> >     * @param {number} timestampCreated The pings creation timestamp.
> >     * @param {string} type The pings type.
> > +   * @param {bool} compressed If |true|, expect the ping to be compressed.
> 
> I don't think we need to keep track of whether archived pings are compressed.
> When loading we can just test through {path, path+"lz4"} and load the first
> that exists.
> 

Ideally, in the future, we will be having the majority of compressed archived pings and a few uncompressed ones (archived before this patch lands). I think that testing through {path, path+"lz4"} would make us waste the first disk hit. Maybe we should try the lz4 first. What do you think?
Thanks for your review, Georg. I'm not keeping track of the compression status of archived pings anymore.

This stacks up on the other patch in this bug.
Attachment #8601495 - Attachment is obsolete: true
Attachment #8602655 - Flags: review?(gfritzsche)
Attachment #8602676 - Flags: review?(gfritzsche)
(In reply to Alessio Placitelli [:Dexter] from comment #14)
> ::: toolkit/components/telemetry/TelemetryStorage.jsm
> @@ +310,5 @@
> > + * We are using this to synchronize saving to the file that TelemetrySession persists
> > + * its state in.
> > + */
> > +function SaveSerializer() {
> > +  this._queuedOperations = [];
> 
> When are the other refactors taking place? I fear this might take some time
> and sharing the SaveSerializer (just exporting it) might be relatively cheap.

I don't think that wins us anything. If we want to avoid duplicating we should just move the _stateSaveSerializer (and related state file handling details) over to TelemetryStorage too in a separate patch here.

> ::: toolkit/components/telemetry/tests/unit/test_TelemetrySession.js
> @@ +1355,5 @@
> >    yield OS.File.setDates(PENDING_PING_FILE, null, Date.now() - OVERDUE_PING_FILE_AGE);
> >    yield TelemetryController.reset();
> >  
> >    // Wait for the aborted-session ping.
> > +  while (true) {
> 
> Why is this needed/why are you changing this part?

Dropped, that was from an earlier version of the ping.
Attachment #8602211 - Attachment is obsolete: true
Comment on attachment 8602676 [details] [diff] [review]
part 3 - Make ProfileAge use a different prefix

Review of attachment 8602676 [details] [diff] [review]:
-----------------------------------------------------------------

r=me with below change.

::: toolkit/components/telemetry/TelemetryEnvironment.jsm
@@ +155,5 @@
>  
>  const EXPERIMENTS_CHANGED_TOPIC = "experiments-changed";
>  
> +XPCOMUtils.defineLazyGetter(this, "gProfileAgeLogger",
> +  () => Log.repository.getLoggerWithMessagePrefix(LOGGER_NAME, "ProfileAge::"));

We don't need to keep that object alive after we used the profile accessor.
Just move this into _updateProfile with a local var:

const logger = Log.repository.getLoggerWithMessagePrefix(LOGGER_NAME, "ProfileAge - "));
... new ProfileAge(null, logger);

Also note the prefix change as ProfileAge.jsm doesn't expect a "foo::" prefixed logger.
Attachment #8602676 - Flags: review?(gfritzsche) → review+
Comment on attachment 8602707 [details] [diff] [review]
Refactor aborted-session ping file details into TelemetryStorage

Review of attachment 8602707 [details] [diff] [review]:
-----------------------------------------------------------------

Ok, there's no reason to move that right now then.
Attachment #8602707 - Flags: review?(alessio.placitelli) → review+
Comment on attachment 8602655 [details] [diff] [review]
part 2 - Add lz4 compression for archived pings. v2

Review of attachment 8602655 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/components/telemetry/TelemetryStorage.jsm
@@ +166,5 @@
>     * @param {bool} overwrite If |true|, the file will be overwritten if it exists,
>     * if |false| the file will not be overwritten and no error will be reported if
>     * the file exists.
> +   * @param {bool} [compress=false] If |true|, the file will use lz4 compression.
> +   * Otherwise no compression will be used.

No external caller uses that param. Please drop it from this public function.

@@ +296,5 @@
>  
>    /**
>     * Loads a ping file.
>     * @param {String} aFilePath The path of the ping file.
> +   * @param {Boolean} [aCompressed=false] If |true|, expects the file to be compressed using lz4.

No external caller uses that param. Please drop it from this public function.

@@ +431,3 @@
>      yield OS.File.makeDir(OS.Path.dirname(filePath), { ignoreExisting: true,
>                                                         from: OS.Constants.Path.profileDir });
> +    yield TelemetryStorage.savePingToFile(ping, filePath, /*overwrite*/true, /*compress*/true);

* this.savePingToFile
* if you are worried about readability let's just have the internal savePingToFile() function use an - optional - options object?

@@ +443,5 @@
>     */
>    loadArchivedPing: function(id, timestampCreated, type) {
>      this._log.trace("loadArchivedPing - id: " + id + ", timestampCreated: " + timestampCreated + ", type: " + type);
>      const path = getArchivedPingPath(id, new Date(timestampCreated), type);
> +    const pathCompressed = path + 'lz4';

Nit: this is the only time i see us using '' instead of "".

@@ +449,5 @@
> +    // for the uncompressed version.
> +    this._log.trace("loadArchivedPing - loading ping from: " + pathCompressed);
> +    return this.loadPingFile(pathCompressed, /*compressed*/true).catch(() => {
> +      this._log.trace("loadArchivedPing - compressed ping not found, loading: " + path);
> +      return this.loadPingFile(path, /*compressed*/false);

We only need to try other suffixes when we didn't find the "jsonlz4" file.
If e.g. it's invalid there is something else going wrong.
A task + try/catch should make that handling more readable.

Finally, let's at least have a space between |/*compressed*/true| etc. or use an options argument.

@@ +459,5 @@
>     *
>     * @param {string} id The pings id.
>     * @param {number} timestampCreated The pings creation timestamp.
>     * @param {string} type The pings type.
> +   * @param {bool} [compressed=true] If |true|, expect the ping to be compressed.

I don't think we need that. Let's just remove all instances of {pingFile, pingFile+"lz4"} that we find, otherwise we could e.g. fall back to an old uncompressed version.

@@ +467,5 @@
> +    this._log.trace("_removeArchivedPing - id: " + id + ", timestampCreated: " +
> +                    timestampCreated + ", type: " + type + ", compressed: " + compressed);
> +    let path = getArchivedPingPath(id, new Date(timestampCreated), type);
> +    if (compressed) {
> +      path += 'lz4';

Nit: '' vs. ""

@@ +543,5 @@
>     * @param {bool} overwrite If |true|, the file will be overwritten if it exists,
>     * if |false| the file will not be overwritten and no error will be reported if
>     * the file exists.
> +   * @param {bool} compress If |true|, the file will use lz4 compression. Otherwise no
> +   * compression will be used.

As per above, maybe it's time we use an options argument for the internal savePingToFile function?

@@ +752,5 @@
>  
>    /**
>     * Loads a ping file.
>     * @param {String} aFilePath The path of the ping file.
> +   * @param {Boolean} aCompressed If |true|, expects the file to be compressed using lz4.

As per above, maybe it's time we use an options argument for the internal loadPingFile function?

@@ +784,5 @@
>     *                  Otherwise an object with the extracted data in the form:
>     *                  { timestamp: <number>,
>     *                    id: <string>,
> +   *                    type: <string>,
> +   *                    compressed: <bool> }

Do we need this anywhere?

@@ +826,5 @@
>      return {
>        timestamp: timestamp,
>        id: uuid,
>        type: type,
> +      compressed: compressed,

Do we need this anywhere?
Attachment #8602655 - Flags: review?(gfritzsche)
As per IRC, I've left the comments and added a whitespace.
Attachment #8602655 - Attachment is obsolete: true
Attachment #8602742 - Flags: review?(gfritzsche)
Attachment #8602742 - Flags: review?(gfritzsche) → review+
Keywords: checkin-needed
You need to log in before you can comment on or make changes to this bug.