Make session restore history serialization asynchronous




Session Restore
5 years ago


(Reporter: Yoric, Assigned: Yoric)


(Depends on: 1 bug, Blocks: 1 bug)

Firefox Tracking Flags

(Not tracked)


(Whiteboard: [Snappy])


(1 attachment)

Summary: Make session history asynchronous → Make session restore asynchronous
Summary: Make session restore asynchronous → Make session serialization asynchronous
Summary: Make session serialization asynchronous → Make session restore serialization asynchronous
Summary: Make session restore serialization asynchronous → Make session restore history serialization asynchronous
I am currently hacking around to see if I can get a prototype of this working.
The general idea is the following:
1. instead of immediately serialized history entries, |tabData.entries| contains promises that can be resolved in the background;
2. once all entries are resolved, we record a non-promise version of |tabData| in the tab itself;
3. |_getCurrentState| returns the latest recorded version.
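The three steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the attached patch: `TabStateRecorder`, `update()` and `getCurrentState()` are hypothetical names, tabs are identified by a plain id, and `tabData` is assumed to be an object whose `entries` array may hold promises. One subtlety the sketch handles is ordering: if an older batch of entries resolves after a newer one, a per-tab counter keeps the stale result from overwriting the newer state.

```javascript
// Hypothetical sketch of "record the latest fully resolved tabData".
class TabStateRecorder {
  constructor() {
    this.latest = new Map();     // tab id -> last fully resolved tabData
    this.generation = new Map(); // tab id -> number of the most recent update() call
  }

  // Steps 1 and 2: entries may be promises resolved in the background;
  // once all of them resolve, store a plain (promise-free) copy.
  async update(tabId, tabData) {
    const gen = (this.generation.get(tabId) || 0) + 1;
    this.generation.set(tabId, gen);
    const entries = await Promise.all(tabData.entries);
    // A newer update() for this tab started while we waited: drop this result.
    if (this.generation.get(tabId) !== gen) return false;
    this.latest.set(tabId, { ...tabData, entries });
    return true;
  }

  // Step 3: a synchronous read of the latest recorded versions.
  getCurrentState() {
    return [...this.latest.values()];
  }
}
```

Because `getCurrentState()` never waits on pending promises it can stay synchronous; the trade-off is that it may return a snapshot lagging slightly behind the tab, which is exactly the question raised in the next comment.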

Tim, what do you think of this approach? If I understand correctly when |_getCurrentState| is called, then in all practical cases the latest recorded version will already be up to date at that point.
Flags: needinfo?(ttaubert)
I forgot to mention the changes for persisting the session restore file to disk. This should not be an issue, as this system is already asynchronous.

Comment 3

5 years ago
I'm not too familiar with the session restore code, but AFAIK it serializes large chunks of JSON, and I know these JSON blobs can get quite large.

As part of implementing Firefox Health Report (which also shuffles around large JS objects and JSON blobs), I spent some time researching what async and/or stream-based APIs exist for dealing with JSON. Sadly, it appears there are none. Since session restore is on the critical path for startup and shutdown performance, perhaps you will have more luck convincing people that these need to be implemented.

Essentially, we need an API for serializing JSON to and from streams. Currently, all JSON serialization is buffer based:

  JSON.stringify(obj) -> JSON string
  JSON.parse(JSON string)  -> obj

This is fine assuming the input or output is small. However, there are performance implications when we deal with large entities.

Let's take sessionstore.js as an example. Mine is 162 KB and consists of a single JSON object. Parsing it is conceptually:

  // Open a file and read all of its data.
  let fh = open('sessionstore.js', 'r');
  let content = fh.read(); // Raw bytes.
  fh.close();

  // Convert the raw byte stream to Unicode (required for JSON.parse).
  let decoded = content.decode('utf-8');

  // Finally, do the JSON parsing.
  let obj = JSON.parse(decoded);

(Note: I'm assuming this is how sessionstore.js loading works. If not, the pattern still applies to other code dealing with large JSON blobs, such as Firefox Health Report.)

Anyway, this is suboptimal on so many fronts:

1) We must fully read the file before parsing can occur. What if there is a parsing error early in the file? You just wasted a lot of I/O.
2) We have 2 representations that are thrown away: the raw byte stream from the file and the Unicode decoding of it. You've allocated 2 large strings for practically nothing.
3) I /believe/ that the JS engine stores all strings as UTF-16 and thus needs twice the on-disk size in memory: 162 KB just became 324 KB! Adding up the intermediate representations, we have ~324 KB (raw) + ~324 KB (Unicode) = ~648 KB of strings that we don't even use!
4) Lots of XPCOM crossings and potential JS compartment crossings. My understanding is that these carry overhead, possibly including buffer copies.
5) The final object contains a lot of string data that was present in the original buffers, but AFAIK we can't intern those strings into the larger buffer, so this is essentially a 3rd allocation of the original data!
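The arithmetic in point 3 can be checked with a small sketch. It assumes, as the comment does, that every JS string costs 2 bytes per UTF-16 code unit, that the raw file contents are also held in a JS string, and that the content is mostly ASCII (about one code unit per byte). The helper names are hypothetical; `TextEncoder` is the standard encoding API.

```javascript
// Bytes needed for `text` encoded as UTF-8 (roughly its on-disk size).
function utf8Bytes(text) {
  return new TextEncoder().encode(text).length;
}

// Bytes needed for `text` as a UTF-16 string (2 bytes per code unit).
function utf16Bytes(text) {
  return text.length * 2;
}

// Transient string memory for the read-then-decode pattern, in the same unit
// as fileSize (e.g. KB). Assumes mostly-ASCII content and that both the raw
// contents and the decoded text live in UTF-16 JS strings.
function transientStringEstimate(fileSize) {
  const raw = fileSize * 2;     // raw contents held as a JS string
  const decoded = fileSize * 2; // decoded Unicode text
  return { raw, decoded, total: raw + decoded };
}

console.log(transientStringEstimate(162).total); // 648
console.log(utf8Bytes('{"a":1}'), utf16Bytes('{"a":1}')); // 7 14
```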

Output has essentially the same problems.

For Firefox Health Report, it's even worse, particularly for the data upload case. In addition to serializing to JSON, we need to zlib-compress the result, which is another large string allocation in JS. Instead of writing to disk, we write to Necko, and I believe Necko performs an explicit copy of the buffer. I could be wrong.

Anyway, JSON + large objects is full of performance crapitude all the way down the stack.

I'd *really* like to see a stream-based API for doing JSON:

  let fh = open('sessionstore.js', 'r');

  let decoder = new StreamDecoder();
  decoder.addFilter('utf8'); // Decode input bytes as UTF-8.
  decoder.addFilter('json'); // Parse the Unicode stream as JSON and emit an object.
  decoder.read(fh, function onComplete(obj) { ... }); // Feed the file through the filters.
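`StreamDecoder` does not exist, but part of this pipeline can be approximated with the standard `TextDecoder` API, whose `{ stream: true }` option decodes UTF-8 incrementally and correctly handles a multi-byte character split across chunks. A minimal sketch, where `decodeJsonChunks` is a hypothetical helper and the chunks are assumed to be `Uint8Array`s from some file reader. Only the UTF-8 stage is incremental here: `JSON.parse` still needs the complete string, which is exactly the gap a streaming JSON parser would close.

```javascript
// Decode UTF-8 chunks incrementally, then parse the accumulated text as JSON.
function decodeJsonChunks(chunks) {
  const decoder = new TextDecoder('utf-8');
  const parts = [];
  for (const chunk of chunks) {
    // stream: true buffers an incomplete multi-byte sequence until the next chunk.
    parts.push(decoder.decode(chunk, { stream: true }));
  }
  parts.push(decoder.decode()); // Flush any bytes still buffered.
  return JSON.parse(parts.join(''));
}

// Usage: split the input inside the two-byte "é" to exercise streaming decoding.
const bytes = new TextEncoder().encode('{"title":"café"}');
console.log(decodeJsonChunks([bytes.slice(0, 14), bytes.slice(14)]).title); // café
```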

For the FHR case of uploading:

  let obj = {...}; // The object to be JSONified that constitutes our payload.

  let encoder = new StreamEncoder();
  encoder.addFilter('json'); // Convert the input to JSON.
  encoder.addFilter('utf8'); // Encode as UTF-8.
  encoder.addFilter('compress'); // zlib compress it.

  let request = new HttpRequest();

  // The computation is actually deferred. Yay stream processing!
  request.bodyStream = encoder.write(obj);

I invented some classes here to demonstrate conceptually how this would work. We kinda/sorta have stuff like this in Gecko, but the APIs aren't as simple. In reality, you likely wouldn't have a JSON filter, because the JSON side of the pipeline is an object rather than a stream. But it helps demonstrate my point.

Essentially, I'm asking for APIs that use streams everywhere. No pre-buffering. No excessive buffer allocations. No XPCOM or compartment shuffling. Everything self-contained and optimally implemented. Oh, and asynchronous, of course.
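For the upload case, something close to the `StreamEncoder` pipeline can be built with Node.js's `stream` and `zlib` modules (an assumption for illustration: these are Node APIs, not Gecko ones). The sketch assumes the payload is an array of JSON-serializable records, so it can be serialized one element at a time with `JSON.stringify` instead of building one large string; `jsonArrayChunks` and `encodeStream` are hypothetical names.

```javascript
const { Readable, pipeline } = require('stream');
const zlib = require('zlib');

// Emit the JSON text of an array piece by piece so the whole document never
// exists as one string. Assumes every item is JSON-serializable (not undefined).
function* jsonArrayChunks(items) {
  yield '[';
  let first = true;
  for (const item of items) {
    if (!first) yield ',';
    first = false;
    yield JSON.stringify(item);
  }
  yield ']';
}

// object -> JSON text -> UTF-8 bytes -> zlib (deflate), as a readable stream.
function encodeStream(items) {
  const deflate = zlib.createDeflate();
  // With objectMode: false, string chunks are pushed as UTF-8 Buffers.
  const source = Readable.from(jsonArrayChunks(items), { objectMode: false });
  // pipeline() destroys both streams if either fails, so a consumer of the
  // returned stream sees the failure as an 'error' event.
  pipeline(source, deflate, () => {});
  return deflate;
}
```

In an HTTP upload, `encodeStream(records)` would be piped into the request body rather than collected in memory. Each record is still stringified in full, so peak memory tracks the largest record rather than the whole payload.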
Agreed with almost everything, but this deserves its own bug. Filed as bug 832664.
Created attachment 704843
Assignee: nobody → dteller
Attachment #704843 - Flags: feedback?(ttaubert)
Moving to a different strategy: bug 838577.
Last Resolved: 5 years ago
Flags: needinfo?(ttaubert)
Resolution: --- → WONTFIX
Attachment #704843 - Flags: feedback?(ttaubert)