Session Restore generates sustained large disk traffic [meta]

NEW
Unassigned

Status

()

defect
3 years ago
22 days ago

People

(Reporter: gcp, Unassigned)

Tracking

(Depends on 1 bug, Blocks 1 bug, {meta})

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [fxperf])

Attachments

(1 attachment, 3 obsolete attachments)

Reporter

Description

3 years ago
Someone on IRC alerted me to this:
https://forums.servethehome.com/index.php?threads/firefox-is-chewing-through-your-nand.11346/

I have no idea how accurate the report is, but some observations:

A 1.2M or larger recovery.js doesn't seem to be exceptional. Saving that every 15 seconds leads to about 7GB of traffic per day, or 2.5TB/year. That seems rather large. It's still not a real issue given realistic SSD lifetimes (40T/year for 850 EVO for example). But on the other hand, I could imagine some people have way larger sessions, too.

In any case, I doubt anyone is expecting an idle Firefox session to write 7GB/day.

A local experiment shows zlib reduces my recovery.js by a factor of about 20. Worth considering?
Definitely worth considering. We also have a library for reading/writing lz4 compression (with non-standard headers, for historic reasons), so at least this (de)compression could be implemented very quickly.

Comment 2

3 years ago
If the session truly is idle, couldn't we get this size down significantly by only writing deltas? If the previous store is kept in memory and diffed, at the very least writes could be toned down in size, and best case, skipped altogether.
Yes, I developed an early prototype for diff-based updates a few years ago, but this died by lack of reviewer.
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment on attachment 8794300 [details]
Bug 1304389 - Indent code performing session write;

https://reviewboard.mozilla.org/r/80814/#review79494

::: browser/components/sessionstore/SessionWorker.js:170
(Diff revision 1)
>          // Move $Path.clean out of the way, to avoid any ambiguity as
>          // to which file is more recent.
>          File.move(this.Paths.clean, this.Paths.cleanBackup);
>        }
>  
> +      if (true) {

Er, what's the point of this `if(true)`?
Comment on attachment 8794301 [details]
Bug 1304389 - Don't write session file if content hasn't changed;

https://reviewboard.mozilla.org/r/80816/#review79496

::: browser/components/sessionstore/SessionWorker.js:11
(Diff revision 1)
>   * A worker dedicated to handle I/O for Session Store.
>   */
>  
>  "use strict";
>  
> +const { classes: Cc, interfaces: Ci } = Components;

Well, `Components` hasn't been available in workers since ~2011, as this would require just about everything in Firefox to use (slow) multi-threaded refcounting and garbage-collection.

So, you would need to reimplement some kind of hashing in JS.

::: browser/components/sessionstore/SessionWorker.js:186
(Diff revision 1)
>          // to which file is more recent.
>          File.move(this.Paths.clean, this.Paths.cleanBackup);
>        }
>  
> -      if (true) {
> +      /* Avoid I/O if the end result would be the same. */
> +      if (currentHash != this.lastWriteHash) {

Have you checked that this change preserves the invariants represented by `this.state`? I'm not entirely sure that it does.
Comment on attachment 8794301 [details]
Bug 1304389 - Don't write session file if content hasn't changed;

https://reviewboard.mozilla.org/r/80816/#review79500
Attachment #8794301 - Flags: review?(dteller)
Comment on attachment 8794300 [details]
Bug 1304389 - Indent code performing session write;

https://reviewboard.mozilla.org/r/80814/#review79502
Attachment #8794300 - Flags: review?(dteller)

Comment 10

3 years ago
Hashing all of this data in memory is a terrible idea, especially for example if you end up doing that on the main thread where there is almost always more important work to do.  It's also bad for battery usage.  It seems very hard to fix this bug by applying a band-aid since the fundamental design of sessionstore (collecting all info off of a timer without knowing what things have changed, and dumping down the entire data in JSON) make it difficult to do something efficient here, so ideally we should fix those problems.
Attachment #8794300 - Attachment is obsolete: true
Attachment #8794301 - Attachment is obsolete: true
(In reply to :Ehsan Akhgari from comment #10)
> Hashing all of this data in memory is a terrible idea, especially for
> example if you end up doing that on the main thread where there is almost
> always more important work to do.  It's also bad for battery usage.  It
> seems very hard to fix this bug by applying a band-aid since the fundamental
> design of sessionstore (collecting all info off of a timer without knowing
> what things have changed, and dumping down the entire data in JSON) make it
> difficult to do something efficient here, so ideally we should fix those
> problems.

I think its worth an experiment.  Hashing in CPU may be cheaper battery-wise than spinning disk to write a large file.  Same for compression.  Just my opinion.  Just as long as this work occurs off-main-thread.
1. Yes, we need to fix the design of Session Restore. That would need manpower, though.

2. gps was doing this in a worker, and compression would also happen in a worker (decompression still needs to be on the main thread iirc), so that's ok.

3. A long time ago, we measured the performance of compressing + writing vs. just writing and found out that compression actually made things faster. I don't remember on which disks, though – probably not on SSDs.
ehsan, bkelly, and I chatted about this in #developers.

I think we all agree that the current approach of sessionstore writing everything to disk all the time is very sub-optimal.

I think we all agree that attempting to write sessionstore when nothing has changed is horrible. Ideally we'd react to change events and only write when we know something has updated. (Read: no timers). But that requires a major architectural change.

I believe writing uncompressed JSON is worse than JSON+lz4. I would happily trade CPU cycles in the worker for less I/O and lz4 is very CPU efficient. Seems like a no-brainer for a quick I/O win.

We talked a bit about where CPU is spent as part of passing the session data to the worker, serializing it, and writing it. https://nolanlawson.com/2016/02/29/high-performance-web-worker-messages/ seems to indicate we'd save CPU by performing the JSON.stringify() before the postMessage(). That optimization is worth exploring. But it has nothing to do with I/O.

We disagreed a bit on hashing. I believe trading CPU for hashing to avoid I/O is worthwhile. The big question is where that hashing occurs. If it is in the main thread, that could incur jank. So we want to hash from the worker thread. However, I'm not sure how to access the hasher from the worker thread since it doesn't have access to Components. If someone can make that work, it is worth trying.

Anyway, I was willing to do this work as a quick band-aid. Since it appears it isn't trivial and fixing this isn't part of my job responsibility, I'm discarding my commits. Good luck to whoever picks it up.
(In reply to Gregory Szorc [:gps] from comment #13)
> We talked a bit about where CPU is spent as part of passing the session data
> to the worker, serializing it, and writing it.
> https://nolanlawson.com/2016/02/29/high-performance-web-worker-messages/
> seems to indicate we'd save CPU by performing the JSON.stringify() before
> the postMessage(). That optimization is worth exploring. But it has nothing
> to do with I/O.

That blog post is more about increasing throughput.  It doesn't measure if the increased throughput comes with greater CPU usage, etc.  I think we should measure carefully before doing anything like a stringify on the main thread before the postMessage().  That could jank.
(In reply to Gregory Szorc [:gps] from comment #13)
> We disagreed a bit on hashing. I believe trading CPU for hashing to avoid
> I/O is worthwhile. The big question is where that hashing occurs. If it is
> in the main thread, that could incur jank. So we want to hash from the
> worker thread. However, I'm not sure how to access the hasher from the
> worker thread since it doesn't have access to Components. If someone can
> make that work, it is worth trying.

WebCrypto is exposed on workers.  I think `self.crypto.subtle.digest()` does what you want:

https://developer.mozilla.org/en-US/docs/Web/API/SubtleCrypto/digest
(In reply to Gregory Szorc [:gps] from comment #13)
> ehsan, bkelly, and I chatted about this in #developers.
> 
> I think we all agree that the current approach of sessionstore writing
> everything to disk all the time is very sub-optimal.
>
> I think we all agree that attempting to write sessionstore when nothing has
> changed is horrible. Ideally we'd react to change events and only write when
> we know something has updated. (Read: no timers). But that requires a major
> architectural change.

Agreed. Manpower needed to solve this.

> I believe writing uncompressed JSON is worse than JSON+lz4. I would happily
> trade CPU cycles in the worker for less I/O and lz4 is very CPU efficient.
> Seems like a no-brainer for a quick I/O win.

Agreed. iirc, the only reason not to do this was that it broke some add-ons, but I believe that we should just go ahead and get in touch with the add-on owners while we proceed in this direction. I also believe that it's about 20 minutes of coding (plus tests), since ChromeWorkers already have access to lz4 (de)compression.

> 
> We talked a bit about where CPU is spent as part of passing the session data
> to the worker, serializing it, and writing it.
> https://nolanlawson.com/2016/02/29/high-performance-web-worker-messages/
> seems to indicate we'd save CPU by performing the JSON.stringify() before
> the postMessage(). That optimization is worth exploring. But it has nothing
> to do with I/O.

Unless it has changed in the last few years, we have already tested with or without JSON.stringify and performance deltas were somewhat random. Sometimes, JSON.stringify made things faster, sometimes it made things slower, so I'm not optimistic about this.

> We disagreed a bit on hashing. I believe trading CPU for hashing to avoid
> I/O is worthwhile. The big question is where that hashing occurs. If it is
> in the main thread, that could incur jank. So we want to hash from the
> worker thread. However, I'm not sure how to access the hasher from the
> worker thread since it doesn't have access to Components. If someone can
> make that work, it is worth trying.

Definitely worth benchmarking.
(In reply to Gregory Szorc [:gps] from comment #13)

> I think we all agree that attempting to write sessionstore when nothing has
> changed is horrible.

Do we have a way of assessing whether this is frequent or infrequent?

If, say, 70% of sessionstore writes are redundant -- that is, nothing has changed -- it's worth exploring the use of change flags within contributing modules to decide whether to skip each timed save. No new windows? No tabs changed? Don't write anything.

If most sessionstore writes write *something* new, even if it's insignificant, then only compression or a more thorough reworking (deltas, databases) is worth considering.
Duplicate of this bug: 1305164
Duplicate of this bug: 1305203
(In reply to Ben Kelly [:bkelly] from comment #15)
> WebCrypto is exposed on workers.  I think `self.crypto.subtle.digest()` does
> what you want:

I looked at this again briefly today. Unfortunately, the PromiseWorker implementation we're using assumes the method we dispatch to in the worker returns a result synchronously. And all of WebCrypto's APIs return promises. That means we need to teach PromiseWorker to handle promises returned by dispatched methods (and that's easier said than done since PromiseWorker assumes a FIFO ordering of requests to replies) or we need to hack around PromiseWorker and do manual postMessage foo. Choose your rabbit hole.

FWIW if the worker infrastructure "just worked," I would have submitted patches already. As it stands, our custom JS modules need more love.

Regarding performance concerns, my vote is to land my original hashing to avoid writes approach with telemetry that measures a) the time to hash b) how often we avoid a no-op write. I think performance concerns are premature until telemetry data proves otherwise. If this isn't the type of thing telemetry should be used for, I don't know what is.

Updated

3 years ago
Duplicate of this bug: 1305193
Speaking of Telemetry, FX_SESSION_RESTORE_FILE_SIZE_BYTES should answer some questions. For example, 89.25% of release channel users have a file < 1,120,000 bytes and 74.58% of users are < 317k. There is a worrying amount of users with large files though.

Comment 23

3 years ago
Why not just use the filesystem?  Imagine a layout like this:

    + profilename.default
    |--+ sessionrestore
       |--+ <timestamp>
          |--+ window1
          |  |--+ tab1
          |  |  |--- cookies
          |  |  |--- formdata
          |  |  |--- title
          |  |  |--- url
          |  ---+ tab2   
          |     |--- cookies
          |     |--- formdata
          |     |--- title
          |     |--- url
          ---+ window2
             |--+ tab1
             |  |--- cookies
             |  |--- formdata
             |  |--- title
             |  |--- url
             ---+ tab2   
                |--- cookies
                |--- formdata
                |--- title
                |--- url

One directory per timestamp, containing one directory per window, containing one directory per tab, containing one file per data type.  All plain-text.  Change a tab, just write to that tab's files, and update the timestamp directory name.  Every x minutes, make a new timestamp tree, keeping y old ones as backups.

No serializing and writing entire sessions to disk at once.  Only small writes to small files.  All plain-text, easy to read outside of the browser.  Easy to copy, backup, modify, troubleshoot.  No complex JSON, serializing, or database code.  Use the write-to-temp-file-then-fsync-then-rename-over-existing-files paradigm to get atomic updates.

Need to know if part of a session on disk is stale?  Check the mtime for that file, see if it's older than the last time that data was changed in memory.

Simple stuff.  Straightforward.  No overengineering.  No bloat.

Why not do this?
Several problems:

- suddenly, you need garbage-collection of files;

- reading 5000 files upon startup instead of 1 might slow down things considerably, especially since we already have I/O congestion during startup;

- suddenly, we need to handle the possibility that any number of 5000 files might be corrupted and may need recovery, instead of just one, so you need to handle backups (and garbage-collection of backups) and recovery for 500 files;

- we still need to rearchitecture session restore to find out about updates per-tab. Yes, that's something we need to do nevertheless, but it's a pretty big prerequisite.

Also, `write-to-temp-file-then-fsync-then-rename-over-existing-files` for 5000 files sounds like a bad idea for global system performance. Note that we currently don't use (or need) fsync for Session Restore.
Just some random thoughts about the hashing approach. We use it succesfully for bookmarks backups but there is a big difference. Bookmarks don't change very often, so the hashing approach saves a lot.

On the other side, Session Restore doesn't seem to make a difference between critical and noncritical data, for example IIRC it stores scrolling position of pages, right?
Just storing the scrolling position of pages may be enough to require a new write every time, but honestly it's not a critical enough information to be stored every 15s. In the worst case the user has to scroll again a little bit.

If we could set a flag when "critical" information changes (new tab open/close and such), then every 15s we could write *if and only if* some critical information changed, after N skipped writes, we enforce a write regardless.

The other problem I see in Sessionstore at the moment, is that it doesn't seem to have any handling of idle, why don't we do a single write after the user enters idle, and then stop until idle finishes? we only care to store users actions, not page actions, right?
> If we could set a flag when "critical" information changes (new tab open/close and such), then every 15s we could write *if and only if* some critical information changed, after N skipped writes, we enforce a write regardless.

Worth trying.

> The other problem I see in Sessionstore at the moment, is that it doesn't seem to have any handling of idle, why don't we do a single write after the user enters idle, and then stop until idle finishes? we only care to store users actions, not page actions, right?

I like this idea a lot. This wouldn't complicate Session Restore and it would be a big benefit.

Generally, we want to store page actions, too (e.g. Session Cookies, DOM Storage, etc.) but if the user is not in front of the computer when it happens, it's definitely much less critical. So we could at least reduce the timer from once per 15 seconds to once per 15 minutes during idle, for instance.
(In reply to Marco Bonardo [::mak] from comment #25)
> Just some random thoughts about the hashing approach. We use it succesfully
> for bookmarks backups but there is a big difference. Bookmarks don't change
> very often, so the hashing approach saves a lot.
> 
> On the other side, Session Restore doesn't seem to make a difference between
> critical and noncritical data, for example IIRC it stores scrolling position
> of pages, right?
> Just storing the scrolling position of pages may be enough to require a new
> write every time, but honestly it's not a critical enough information to be
> stored every 15s. In the worst case the user has to scroll again a little
> bit.
> 
> If we could set a flag when "critical" information changes (new tab
> open/close and such), then every 15s we could write *if and only if* some
> critical information changed, after N skipped writes, we enforce a write
> regardless.
> 
> The other problem I see in Sessionstore at the moment, is that it doesn't
> seem to have any handling of idle, why don't we do a single write after the
> user enters idle, and then stop until idle finishes? we only care to store
> users actions, not page actions, right?

These ideas are quite pragmatic and can already solve a large chunk of the problems, without requiring a refactor or rewrite or, even worse, a complete re-architecture of Sessionstore.
I'd like to vote for working on - at least - the following three things in the short term:

1. (lz4)Compression. I've talked about this with Marco before and I think reached the conclusion that we need only small modifications to get this working; many of the prerequisites are already in the tree. Bonus is that JSON is a format that lends itself well for compression.
2. Flexible write (flush) intervals provided by an idle tracker. (Great idea Marco!)
3. Even more flexible write (flush) intervals by introducing Sessionstore update notification priorities; scroll data would be low prio, form field data would be high prio.

My gut tells me that the flexible write (flush) interval work will yield the most noticeable difference and wins.

Thoughts?

Comment 28

3 years ago

**A side note:** Since lot of users with SSD follow this bug I want let them know that there is an Addon that will MITIGATE the issue Auto Unload Tabs: https://addons.mozilla.org/en-US/firefox/addon/auto-unload-tab/

It can unload tab on various condition like: Unload tab ffter X minutes, Unload tab if size is over X megabytes etc etc.

The Addon is fully compatible with E10.

I've set my tabs to unload after 15 minutes if did not use them.
(In reply to Gregory Szorc [:gps] from comment #20)
> (In reply to Ben Kelly [:bkelly] from comment #15)
> > WebCrypto is exposed on workers.  I think `self.crypto.subtle.digest()` does
> > what you want:
> 
> I looked at this again briefly today. Unfortunately, the PromiseWorker
> implementation we're using assumes the method we dispatch to in the worker
> returns a result synchronously. And all of WebCrypto's APIs return promises.
> That means we need to teach PromiseWorker to handle promises returned by
> dispatched methods (and that's easier said than done since PromiseWorker
> assumes a FIFO ordering of requests to replies) or we need to hack around
> PromiseWorker and do manual postMessage foo. Choose your rabbit hole.

Ok.  That seems weird, though, considering we have lots of async APIs on workers.

You could also use async WebCrypto from the main window.
(In reply to Ben Kelly [:bkelly] from comment #29)
> Ok.  That seems weird, though, considering we have lots of async APIs on
> workers.

It is a limitation of PromiseWorker, which is an abstraction that lets you essentially do an RPC dispatch to a worker and get a promise in return, without having to manage messages. It's a nice abstraction. It just appears a little half-baked :/
 
> You could also use async WebCrypto from the main window.

That would require serializing JSON from the main thread and that might introduce jank. Although there is likely already jank from having to serialize the JS Object to the worker. But that is hidden away in the bowels of the worker implementation, so perhaps it isn't as bad?
> It is a limitation of PromiseWorker, which is an abstraction that lets you essentially do an RPC dispatch to a worker and get a promise in return, without having to manage messages. It's a nice abstraction. It just appears a little half-baked :/

Well, it does what it was designed for :)

I guess I could extend it to handle async computations.
(In reply to David Teller [:Yoric] (please use "needinfo") from comment #31)
> I guess I could extend it to handle async computations.

https://hg.mozilla.org/users/gszorc_mozilla.com/firefox/rev/ea904df6580f is as far as I got. It is failing the test where a promise is rejected. Something to do with serializing the Error I think.

dump() in the worker isn't sending logs to xpcshell. Neither is console.log(). That makes things really difficult to debug. The rabbit hole got too deep for my liking.

Updated

3 years ago
Depends on: 810932

Comment 33

3 years ago
(In reply to Marco Bonardo [::mak] from comment #25)
> Just storing the scrolling position of pages may be enough to require a new
> write every time, but honestly it's not a critical enough information to be
> stored every 15s.

Wasn't this alread done in bug 506482 ?
"Don't write sessionstore.js to disk for "read only" events"
(In reply to hartnegg from comment #33)
> Wasn't this alread done in bug 506482 ?
> "Don't write sessionstore.js to disk for "read only" events"

Good find, yes looks like "not saving on scroll" was already fixed. We may still be listening for other "low-priority traffic" though, so it may be nice to have a look and figure out if there are other case we may exclude.
We should also check we didn't regress that fix. I see we have a scroll position listener here
http://searchfox.org/mozilla-central/rev/935627b015aaabbd2dffa90cce051521f22dd7e6/browser/components/sessionstore/content/content-sessionStore.js#366
that sends a "SessionStore:update" message, and I don't know if later there's some filtering for these events.

Comment 35

3 years ago
Sounds like that kind of info should best be updated only in RAM. But taken care of by a separate process (shared memory?), which writes the data to disk when the main Firefox process is terminated. 

Is there not already a cleanup/exception function, which is always executed, even when Firefox crashes? Many of my programs have this (different programming language though).

Comment 36

3 years ago
(In reply to hartnegg from comment #35)
> Sounds like that kind of info should best be updated only in RAM. But taken
> care of by a separate process (shared memory?), which writes the data to
> disk when the main Firefox process is terminated. 
> 
> Is there not already a cleanup/exception function, which is always executed,
> even when Firefox crashes? Many of my programs have this (different
> programming language though).

That is a poor approach as it doesnt help if you experience a power loss/kernel panic or other external issue.

Comment 37

3 years ago
(In reply to 6lobe from comment #36)
> That is a poor approach as it doesnt help if you experience a power
> loss/kernel panic or other external issue.

I had so far only seen Firefox crash cited as reason for frequently updating this file. Not rare events like kernel panic. And power loss looses relevance when more people switch to notebooks.

If an exception handler would ensure that loss of session data occurs only in such very rare cases, the saving interval could be increased from 15 seconds to a much longer time.

Comment 38

3 years ago
That's a really good point hartnegg, the system crashing is quite rare, and I don't see why it would be worth it for Firefox to hog the disk just so that it can recover the session in the rare event that the system does crash. Storing session restore in RAM seems like a good idea to me.

It could also be a good idea to just sync the differences between the RAM copy and the disk copy in order to reduce the I/O but still keep restoring from a computer crash a possibility. We could maybe even have separate timers for saving session to RAM and for syncing this to disk, to allow for a more updated recovery just if Firefox crashes and a somewhat more outdated recovery if the system crashes.

Comment 39

3 years ago
That doesnt make sense. You are proposing to fix this non-issue (disks are meant to be written to) with a real issue (data loss in case firefox cant exit cleanly).
I'd like to point out that Bugzilla is not a discussion forum, but a tracker for work. We are always appreciative of additional data points that aid developers in getting to a better solution and making more informed decisions.
Speculation is not data.
We hope to improve the situation as described in this bug, but we need to prioritize it against other work since our resources are very limited. (That's why we're always immensely grateful when volunteer contributors step up to work on issues.)
For what its worth, I did some investigation on de-duplicating writes today.  I actually don't see many (any?) duplicate writes based on the encoded output data.

Compression in the SessionWorker may still be a good idea, though.
I guess I can get some mostly duplicate writes still by:

1. Open a page like washingtonpost.com
2. Scroll to bottom
3. Wait for write
4. Scroll up a bit and then immediately back to bottom
5. Another write occurs

The write in (5) is nearly identical to the write in (3).  The only different is a "lastAccessed" timestamp on the washingtonpost docshell entry.
This patch works for me locally.  I'm not sure, though, how to:

1) Handle the file format migration.  Right now I just made it fall back to uncompressed reading if decompression fails.
2) Test this sort of thing in session store.

Marco, what do you think?
Attachment #8819573 - Flags: feedback?(mak77)
Another thing we can do is look at why we get extremely large restore.js files.  I have a browser session open right now with only 4 tabs.  The restore.js is 5+MB.

I dug into the file a bit and 60% of this file is for a single long lived tab containing a single-page app.  In this case it was a twitter tab.  It has about 30 entries, but many with about:blank child docshells.

I'm not sure if there is anything we can really do for this, but compression would help a lot.  Most of the size of the file seems to come from the serialized principals which seem to be repeated a lot for these entries.  Running a command line lz4 on this file shows a result that is only 2% of the original.  (5MB+ to 128KB)
Comment on attachment 8819573 [details] [diff] [review]
P1 LZ4 compress the session store restore.js file. r=mak

Review of attachment 8819573 [details] [diff] [review]:
-----------------------------------------------------------------

Note I'm not a direct session restore peer, but I can act as a browser peer.
First, I think the lz4 idea is pretty good and easily actionable. I was discussing it with Mikedeboer at the last all hands a few days ago.
Though, it should be moved to a dependency of this bug, that at this point is mostly a tracking bug for ideas on how we can improve the situation. Implementations should live and be discussed in dependent bugs.

For migration, there's an easy path we previously used for other features.
Name the new files with .jsonlz4 extension, make import code check the file ext and take the correct read path based on that.
Uplift only the import code to Aurora/Beta.
At that point, even in case of a downgrade of one or two versions, Firefox will be able to read both compressed and uncompressed files. We usually don't care much about older downgrades, since they can break many other things already.
Attachment #8819573 - Flags: feedback?(mak77) → feedback+
Duplicate of this bug: 934967
I made these changes based on previous patch by :bkelly and ::mak's suggesions.
Attachment #8872205 - Flags: feedback?(mdeboer)
For migration, after we loaded a session file with js extension, should we also delete all session files with js extension?
Flags: needinfo?(mdeboer)
Depends on: 934967
Flags: needinfo?(mdeboer)
Keywords: meta
Comment on attachment 8872205 [details] [diff] [review]
Compress session store using lz4

We discussed during our 1:1.
Attachment #8872205 - Attachment is obsolete: true
Attachment #8872205 - Flags: feedback?(mdeboer)
Comment hidden (offtopic)
Comment hidden (offtopic)

Comment 52

2 years ago
I am probably one of the first people (if not the first) who noticed this behavior of writing too much data by Firefox on SSD (https://support.mozilla.org/en-US/questions/975231) so I am very happy to see something was done for solving this.
However let me provide new data I found about writes, as I understand from Web-Extension jump? Correct me if I'm wrong, it's about webappstore.sqlite-wal in addition to everything else.
4 hours test with 30 tabs total (yet 27 of those weren't even loaded after I launched FF), basically with 3 or 4 active tabs.
Browsing wasn't active at all, 30 minutes could pass before jumping to next link.
Result: 973MB was written in FF Profile where almost half of it by browser extention data folder so it doesn't count.
So out of other 500MB webappstore.sqlite-wal took 188MB, webappstore-sqlite - 24MB (210MB total for 4 hours), following notoriously known cookies-sqlite-wal - 162MB, cookies-sqlite - 72MB, places-sqlite-wal - 37MB.

Comment 53

2 years ago
Sorry for doublepost. Didn't find a way to attach file or edit previous post.
https://screenshots.firefox.com/28IB1gumqPDrZ4hS/null
This image is lightly edited but it represents what it shows except results are from few minutes longer logging, so bascially not a big change.
(In reply to AjvarXX from comment #52)
> However let me provide new data I found about writes, as I understand from
> Web-Extension jump? Correct me if I'm wrong, it's about
> webappstore.sqlite-wal in addition to everything else.
> 4 hours test with 30 tabs total (yet 27 of those weren't even loaded after I
> launched FF), basically with 3 or 4 active tabs.
> Browsing wasn't active at all, 30 minutes could pass before jumping to next
> link.
> Result: 973MB was written in FF Profile where almost half of it by browser
> extention data folder so it doesn't count.
> So out of other 500MB webappstore.sqlite-wal took 188MB, webappstore-sqlite
> - 24MB (210MB total for 4 hours), following notoriously known
> cookies-sqlite-wal - 162MB, cookies-sqlite - 72MB, places-sqlite-wal - 37MB.

While you are obviously encountering a large amount of disk traffic from Firefox, "webappsstore" is a separate storage component from what this bug is focused on (session storage). Everything with "webappsstore" in its filename is related to DOM storage (e.g. LocalStorage API). As such, activity in that storage engine is significantly driven by sites you visit. In other words, if you visit sites that make heavy use of DOM storage, you'll see more disk activity to "webappsstore" files. You can see how sites you visit are using storage by using the Storage Inspector (https://developer.mozilla.org/en-US/docs/Tools/Storage_Inspector).

That being said, there are bugs on file tracking improvements to the way Firefox handles DOM storage. Bug 857888 and all the bugs under bug 1286798 seem like good places to start reading. The Core :: DOM bug component would be the appropriate place to file bugs related to "webappsstore" files.

Also, 200+ MB to the "cookies" files seems excessive to me. I would file a bug reporting that under the Core :: Networking: Cookies component.

Comment 55

2 years ago
Gratitude for explanation. Will start from there!
No longer blocks: 1419053
Summary: Session Restore generates sustained large disk traffic → Session Restore generates sustained large disk traffic [meta]
Whiteboard: [fxperf]

Comment 56

a year ago
Would it be possible when installing on Android or IOS to use a different default interval? (because these are likely to be using SSD), or at least warn the installer of this issue and suggest changing the interval? I have had firefox installed on a tablet for some time having forgotten all about this issue.

Comment 57

4 months ago
workaround

I have browser.sessionstore.interval set to 60, which is working quite well

Comment 58

23 days ago

I believe we can stop saving the session all the time: save it only when quitting Firefox. In case Firefox crash we can use an exception to save the information.

The only problem is in the rare case of a system crash. For those rare cases, we can only save the URL of all open tab when tab is open/closed. Saving URL in sqlite will be very few IO. In case of a crash, you will have to reload all the tab, and perhaps you lose some information. But is that really a problem? I will be perhaps affected once a year. On top of that, I don't think peoples are waiting for a full restore in case of a system crash.

Today, Firefox is using so much IO that I have to change browser.sessionstore.interval to have normal performance for other programs. The current situation is not viable.

Comment 59

22 days ago

(In reply to perdrisat from comment #58)

I believe we can stop saving the session all the time: save it only when quitting Firefox. In case Firefox crash we can use an exception to save the information.

Bad idea. But maybe it should be an option in about:config allowing to change time between savings, and to turn off this savings between Firefox restarts.

Comment 60

22 days ago

(In reply to Robert Ab from comment #59)

Bad idea.

For what it's worth, I strongly disagree. Firefox is constantly wasting battery power (via wakeups) and SSD longevity (by frequent and sizeable filesystem writes). This behavior is completely unnecessary, not at all typical for a program that is primarily a document reader, and hidden from the vast majority of users with no obvious way to disable it even if they do eventually discover it. This, the current behavior, is an ill-conceived, bloody awful idea.

Please don't be so dismissive of legitimate suggestions for fixing the problem. Let's instead try to make Firefox better.

Comment 61

22 days ago

(In reply to Forest from comment #60)

(In reply to Robert Ab from comment #59)

Bad idea.

For what it's worth, I strongly disagree. Firefox is constantly wasting battery power (via wakeups) and SSD longevity (by frequent and sizeable filesystem writes). This behavior is completely unnecessary, not at all typical for a program that is primarily a document reader, and hidden from the vast majority of users with no obvious way to disable it even if they do eventually discover it. This, the current behavior, is an ill-conceived, bloody awful idea.

Please don't be so dismissive of legitimate suggestions for fixing the problem. Let's instead try to make Firefox better.

But maybe it should be an option in about:config allowing to change time between savings, and to turn off this savings between Firefox restarts. So everybody can decide what it is better for him/her.

Another possible solution. Session managers (like Tab Session Manager) can be installed in Firefox. TSM is using IndexedDB now as extension storage, which make much easier on HDD/SSD.

In case Firefox crash we can use an exception to save the information.

Bad idea.

I think it's a fine idea, but not that this is not acceptable in general.

There are multiple situations beyond the "system crash" (which I take as meaning "the whole computer crashes") in which Firefox will not be given the chance to run exception handlers. For example, when a user force-kills Firefox because it hung irrecoverably, or on Linux when the out-of-memory killer kicks in (which triggers a SIGKILL signal where the process is terminated without any chance to run code at shutdown).

This means there must be some form of saving in between to guard against that.

However, it is totally sensible to reduce any form of "periodic" saving as much as possible.

For example, there's no point in saving the current state if it hasn't changed, and it might also make sense to save only the changes instead of the full state every time.

Also, giving about:config settings to tune the tradeoff between safety and resource usage sounds great. However, we should still make the general situation as good as possible.

On top of that, I don't think peoples are waiting for a full restore in case of a system crash.

This one I strongly oppose. I expect Firefox to recover in this situation.

I have some tabs open in Firefox for over 10 years, and it has never permanently lost them, across thousands of restarts and hundreds of hard crashes. This is one of the features where Firefox does a lot better than Chromiu. This is how good programs behave, and it is feasible (many modern programs have this property of being extremely reliable in not losing user data; Sublime Text is another example).

And it's critical for a good user experience to not have users lose data.

Comment 63

22 days ago

(In reply to Niklas Hambüchen from comment #62)

This means there must be some form of saving in between to guard against that.

I hope we can rephrase that to, "it makes sense to offer the option of saving in between, to guard against such crashes." After all, to think of it as a "must" would be to presume to know users' needs better than they do themselves, to waste their resources without their permission, and to assume that a browser cannot exist without this behavior. :)

For example, there's no point in saving the current state if it hasn't changed

Indeed. For example, none of the web pages/apps/sites that I use need their state saved when I haven't been typing or clicking in them. Save frequency can therefore be reduced to zero for such cases.

Optimizing all three of these would be great:

  • what to save
  • how to represent it
  • when to save it

Making the last one easily configurable, perhaps with options like "regularly", "infrequently and only after user input", and "never", would be flexible enough to work for most people?

(In reply to Forest from comment #63)

(In reply to Niklas Hambüchen from comment #62)
...

For example, there's no point in saving the current state if it hasn't changed

Indeed. For example, none of the web pages/apps/sites that I use need their state saved when I haven't been typing or clicking in them. Save frequency can therefore be reduced to zero for such cases.

Do you understand how this already works?

The time interval doesn't mean data is always resaved every X seconds. Interval is when state change is checked, and ONLY if the state has changed is anything written. If you are seeing writes when there was no state change then that is a bug you can file.

Comment 65

22 days ago

I expect Firefox to recover in this situation.

Perhaps I was not clear enough in my comment #58. I'm not saying we should not save tabs or take the risk to lose them.

There are two aspects:

Opened tabs/windows

We should constantly save which tab are open (this is not the case now, it's only append every 15secondes). With this information we can restore all the opened tabs in any case. But this is very light information (basically just URL and windows).

Sessions

Sessions hold a lot more information, this is the "page state". Session can sometimes change without interaction from the user. I believe we can store session when Firefox close instead of a regular interval. I believe session information are not critical as opened tab are.

I'm sure we can do something smarter, but this bug is 3 years old and nothings appended yet.

But maybe it should be an option in about:config
No. Option are not there to fix bugs. Having constantly ~1-2mb/s of IO is a bug, not a feature. This have dramatic implication when you run on a networked disk for example.

Comment 66

22 days ago

I'm going to restrict comments to users with editbugs because at this point, comments are going in circles and not adding new information.

A significant amount of work happened since this bug was first filed that has reduced the severity of the problem.

There's still more that can be done, that's why this bug is marked up as a metabug and there are still open related bugs.

We're not going to compromise on the ability of users to restore after a crash (of either the machine or Firefox). There is still plenty of scope to improve the situation without doing so.

If you're seeing dramatic performance issues unless you alter internal session store prefs, please file a new dependent bug as comment #64 suggested, with detailed steps about how to reproduce the issue starting with a clean Firefox profile (ie what add-ons to install, which sites show problems), and ideally after some checks with the Firefox profiler ( https://developer.mozilla.org/en-US/docs/Mozilla/Performance/Reporting_a_Performance_Problem ) and/or disk monitoring tools like those from sysinternals that show that the problem is actually sessionstore-related.

Restrict Comments: true
You need to log in before you can comment on or make changes to this bug.