Open Bug 1907728 Opened 1 year ago Updated 10 months ago

recovery.jsonlz4 is not created ("out of memory")

Categories

(Core :: JavaScript Engine, defect, P3)

Firefox 127
defect

Tracking

()

UNCONFIRMED

People

(Reporter: post, Unassigned)

References

(Blocks 2 open bugs)

Details

Attachments

(8 files, 6 obsolete files)

User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:127.0) Gecko/20100101 Firefox/127.0

Steps to reproduce:

My issue is similar to #1472470 which is closed an I cannot reopen.

My browsing sessions collect a large amount of tabs over the time.

Quit the browser (or crash the operating system)

Actual results:

recovery.jsonlz4/(sessionstore.jsonlz4) is not written, which means on next start the session is set-back to some old state.
Browser Toolbox shows that the Error happens in SessionWriter.sys.mjs Line 218 (else if (this.state == STATE_RECOVERY)) when calling IOUtils.writeJSON(). Error message is "out of memory". (Same behaviour is reproducible when changing owner and access rights of sessionstore-backups folder and recovery.jsonlz4 in user profile)

Unlike #1472470 functionality does not come back even when closing tabs.

Expected results:

"Session state correctly stored, or user presented with situation" (quoting #1472470)

It may be of interest that while the User Agent shows "Linux x86_64" the actual Firefox binary is compiled 32-bit.

I attached modified SessionFile.sys.mjs Just for reference, here's a small part of the diff in text and full part of the diff as attachment }, err => { // Catch and report any errors. - console.error("Could not write session state file ", err, err.stack); this._failures++; + let str = "Could not write session state file, Tabs: " + numActiveTabs + " (" + numClosedTabs + "), Entries: " + numActiveEntries + " (" + numClosedEntries + "), biggest meta string length Tab: " + longestStrsActiveEntries_ID + " " + longestStrsActiveEntries + " (" + longestStrsActiveEntries_Title + ") "; + console.error(str + " ", err, err.stack); + Services.prompt.alert(null, 'SessionStore error', "(One of " + this._failures + ' errors)\n\n' + str + '\n\n' + err) // By not doing anything special here we ensure that |promise| cannot // be rejected anymore. The shutdown/cleanup code at the end of the // function will thus always be executed.

To get the number linked, I write it without leading "#": Bug 1472470

To forestall questions: Yes, I cleaned the tabs from closed tabs settings page. Yes, I've got free disk memory and free RAM memory on my computer (otherwise browsing would not work).

I tested further, result is:

  1. Functionality comes back immediately when reproducing the error by taking and re-granting file access rights
  2. Functionality is broken after error "out of memory", even if amount of tabs is significantly reduced. (sometimes, not every time, sessionstore.jsonlz4 may be written correctly)
  3. Waiting for some time, browsing around worses the situation, even with similar amount of tabs. So maybe there's a neat contingent of multipurpose-memory?

Just happend now: functionality came back after error "out of memory" after closing few tabs.

In other words: It is not reproducible whether functionality comes back.

Component: Untriaged → Session Restore

Thanks for the report. A couple of questions:

  • When the session does successfully save, how big is that file?
  • Is this running on a 32 bit system?
  • What kind of output are you getting from the patch you made? Session restore needs to save the tab and its history as well as session cookies and sessionHistory. Is there a particular tab or other pattern that seems to be causing the issue?
Flags: needinfo?(post)
  • The last successful recovery.jsonlz4 was 2034682 B (2 MB), and I find previous.jsonlz4 from saturday with 2513223 B (2,5 MB) in sessionstore-backups.
  • Hardware/system:
    1. The first/main bug reporting machine has native 32 bit hardware, an old processor. It runs Firefox 127.0, as reported
    2. I also remember having these issues on a 64-bit system with 32-bit Firefox 115.6.0 esr. Shall I try to reproduce or is it sufficient that I find no sessionstore.jsonlz4 on this profile? (only jsonlz4 files are recovery.jsonlz4 and previous.jsonlz4 from 8th Jan and 10th Feb, respectively). Newest files/folders are datareporting (18th Feb.), SiteSecurityServiceState.txt (empty), prefs.js, permissions.sqlite, bookmarkbackups (all 12th Feb, from newer to older)
  • Output follows. Output is produced with a slightly enhanced variant which makes a question mark (?) for the number of closed entries instead of printing always 0:
Could not write session state file, Tabs: 376 (1), Entries: 1008 (?), biggest meta string length Tab: 221 43873 (G***n)   out of memory undefined SessionFile.sys.mjs:554:17
  • Observation: So far I found no pattern in non-working tabs. I have a breakpoint on successful write and a dialogue on failed write. Occasionally there was one (1) successful write but only in the beginning of the fails. One of the successes was after I opened Openstreetmap. Then hundreds of fails and no success. I cannot see a link to the reported string size of the tab. I closed the biggest (~200KB), no effect. Clearing closed tabs in Firefox View does not help. Clearing the debugging console does not help. Garbage collecting does not help. Complete system swapoff does not help.

I attach new SessionFile.sys.mjs and a screenshot of the dialogue (very nasty, comes up about every 30-60 sec because of my settings).

Attached file Newer SessionFile.sys.mjs (obsolete) —

Further modified SessionFile.sys.mjs which prints ? instead of 0 because that part does not get analyzed.

Attachment #9412640 - Attachment is obsolete: true
Attached file SessionFile.sys.mjs.diff (obsolete) —

Newer SessionFile.sys.mjs.diff. Compared to the shipped version.

Attachment #9412641 - Attachment is obsolete: true
Flags: needinfo?(post)

It may be of interest that in Parent process Browser Toolbox I can type let t = "abc" and it works.

But when a breakpoint in the failure part of the promise resolution happens, let t = "abc" leads to: Uncaught ReferenceError: self is not defined Even the grey result preview under the js expression input box shows that. This was by the way not so nice because it happened when I was writing my debug modification for SystemFile.sys.mjs which then had to be made somewhat "blind".

In fresh state, those "self is not defined" errors did not occur. As of now I cannot, however, confirm that let t = "" works exactly when sessionstore works. Maybe I should file another bug issue for it?

Following messages in Parent process Browser Toolbox might be of interest:

  • sizeToContent() is deprecated and will be removed in the future. commonDialog.js:132:10 --- this happens exactly when invoking the alert() dialogue for my debug message
  • Warning: attempting to write 6624 bytes to preference browser.topsites.contile.cachedTiles. This is bad for general performance and memory usage. Such an amount of data should rather be written to an external file. --- this takes about 10-20 time of my error dialogues/messages until it happens
  • X-Content-Type-Options header warning: value was “nosniff;”; did you mean to send “nosniff”? --- this happens 1-3 times between two of my error dialogues/messages

The severity field is not set for this bug.
:sfoster, could you have a look please?

For more information, please visit BugBot documentation.

Flags: needinfo?(sfoster)

I want to focus on this part of the report, because I suspect this is only tangentially related to Session Restore.

(In reply to u Ltd. from comment #0)

recovery.jsonlz4/(sessionstore.jsonlz4) is not written, which means on next start the session is set-back to some old state.
Browser Toolbox shows that the Error happens in SessionWriter.sys.mjs Line 218 (else if (this.state == STATE_RECOVERY)) when calling IOUtils.writeJSON(). Error message is "out of memory". (Same behaviour is reproducible when changing owner and access rights of sessionstore-backups folder and recovery.jsonlz4 in user profile)

So, we're throwing an exception in the call in SessionWriter.write where we call IOUtils.writeJSON.

:peterv, I see your name in the hg blame for the IOUtils.writeJSON implementation, so I'm hoping you can help direct this. For some reason, IOUtils.writeJSON is failing here, and its reporting "out of memory". SessionWriter runs in the parent process, and for a ~2MB file that seems like an unlikely cause - as the reporter indicates Firefox is otherwise functional and doesn't appear to be out of memory at all. Is there anyway to confirm this, or can you suggest anything to help get this bug triaged? Is the fact this is occurring on "native 32 bit hardware, an old processor" significant here?

Flags: needinfo?(sfoster) → needinfo?(peterv)

Following information from me which I collected since filing the bug:

  1. Reported on 127, happens also on Firefox 128.0
  2. The 2 MB of the session file occurred after I cleaned up some tabs with the highest amount of string data. Before it was 5 MB. So sfoster is right, the file size is arbitrary.
  3. After some time (somewhere 5h - 100h), session becomes very hardly usable because of massive drawing errors (many sections painted white, show regular content/button texts/... only in short flashes.
    • Console output is:
      [GFX1-]: Handling webrender error 2
      [GFX1-]: wr_renderer_render: OutOfMemory
      
      (This happens also before but in the moment of crash, when interacting with GUI, occurres very massively)
    • Just enough time to save some links with Ctrl-C, Ctrl-V from address bar
    • Little later, whole firefox crashes
    • Maybe I have to note that I'm using an very small graphic card Intel 945GM. The other computer uses Intel HD 3000 SNB GT2. I cannot recall whether it happens on other computers with non-Intel chipsets I know.
  4. I have debugged into the C code. Debugger is very slow as of lacking variable symbols in my distributions' symbol package and long searching time for method names, but I found that runs up to dom/system/IOUtils.cpp:630,631
    if (!nsContentUtils::StringifyJSON(cx, aValue, string, UndefinedIsNullStringLiteral)) {
    where StringifyJSON returns false. BTW StringifyJSON returns true for other, much smaller JSON structs but I don't recall whether they were in the same thread.

(In reply to Sam Foster [:sfoster] (he/him) from comment #14)

:peterv, I see your name in the hg blame for the IOUtils.writeJSON implementation, so I'm hoping you can help direct this. For some reason, IOUtils.writeJSON is failing here, and its reporting "out of memory". SessionWriter runs in the parent process, and for a ~2MB file that seems like an unlikely cause - as the reporter indicates Firefox is otherwise functional and doesn't appear to be out of memory at all. Is there anyway to confirm this, or can you suggest anything to help get this bug triaged? Is the fact this is occurring on "native 32 bit hardware, an old processor" significant here?

Not really, I only made that code actually report errors when creating the promise, before my change we were just dropping errors on the floor but still bailing out anyway, so the end result would have been the same.

If nsContentUtils::StringifyJSON is returning false as comment 15 claims then JS_Stringify is failing, which is JS engine code. It could be that the JS engine is running into OOM.

Flags: needinfo?(peterv)

(In reply to Peter Van der Beken [:peterv] from comment #16)

If nsContentUtils::StringifyJSON is returning false as comment 15 claims then JS_Stringify is failing, which is JS engine code. It could be that the JS engine is running into OOM.

Ok, this starts to sound like another variation on bug 1849393. IIRC the process of serializing the session state object graph will use quite a bit more memory than either the in-memory JS representation or the string that gets output.

It still doesn't explain why we are hitting this OOM though. Is there a bug here, or is this normal and expected? I.e. is it possible we are miscalculating either the amount of memory we'll need to serialize to JSON on this 32-bit system, or we're not successful in requesting more to complete the operation? Is that something you can address :willyelm? See comment 14 for a quick summary.

Flags: needinfo?(wmedina)

Can you get the output of free -h on this system, while Firefox is running? Is there actually memory that should be available or is the out-of-memory exception accurate?

Flags: needinfo?(post)
user:~> free -h
               total        used        free      shared  buff/cache   available
Mem:           2.9Gi       1.2Gi       1.0Gi       123Mi       1.1Gi       1.7Gi
Swap:          4.0Gi       518Mi       3.5Gi
user:~> # Session restore file successfully written 
user:~> # ... after some browsing. Two more tabs are open ...
user:~> # Error in session file writing
user:~> free -h
               total        used        free      shared  buff/cache   available
Mem:           2.9Gi       1.5Gi       461Mi       114Mi       1.4Gi       1.4Gi
Swap:          4.0Gi       519Mi       3.5Gi
user:~>

I can start image viewers, pdf viewers, spreadsheet calculations and GIMP (not necessarily at the same time) but there is definitely enough memory+swap to do that. Also my system did not show the signs of out-of-memory condition which esp. are freezing or very-slow reacting X server.

Flags: needinfo?(post)

I'm not sure what's going on here, but I could imagine this happening from a combination of fragmentation and bug 1774733, which could cause the memory required by the JSON to be 8x the actual size. This will only happen up to 128MB, at which time it'll switch to doubling instead. I could imagine that being a large enough contiguous allocation to run into problems with fragmentation on a 32bit system, even with 1.4GB available. The "... after some browsing" aspect makes fragmentation seem more likely.

I also notice that JS_Stringify is insisting on 16 bits chars, which could be another doubling on top for the memory that "should" be required for mostly or entirely ASCII data. (At least, I'm assuming that a sessionstore JSON blob, even if it contains Unicode titles, would still mostly be ASCII.)

I'm not that familiar with the browser code, but I'm guessing that all of this is happening in the parent process. Given that you seem to be very technically capable, would it be possible to get a copy of /proc/(pid)/maps and /proc/(pid)/smaps when it is misbehaving, where (pid) is the parent process pid? (You can get it from about:processes, it'll be the one listed at the top.) That may be too high a level to see the fragmentation, since mozjemalloc might grab a large mapping and then fragment its allocations internally, but it could still give some useful hints.

At any rate, the main part of this bug does seem to be about the OOM within the JS engine code. It also seems to me that the behavior when an OOM is hit here is suboptimal, but that appears to be (the stalled) bug 1034038.

Component: Session Restore → JavaScript Engine
Flags: needinfo?(post)
Product: Firefox → Core
Flags: needinfo?(wmedina)
Blocks: sm-runtime
Severity: -- → S3
Priority: -- → P3

When it comes to memory contents (and when I was quick reading the needinfo request I thought this was about it), I am cautious because of privacy.
So I set up another profile with no extensions active, opened some "normal" sites and looked whether the error would happen. By good look it happened in a more or less reproducible way. Steps to reproduce follow. It takes about 20 minutes to reproduce unfortunately. Maybe it could be automated:

(0. Restart Firefox)
.1. Install modified mjs files into browser/omni.ja to see custom error window
.2. Set sessionrestore intervals to very short time so that the error window will be shown soon after the memory failure happens.
.3. Disable ad blocker for www.innotrans.de which will make cross-origin requests
.4. Open https://www.innotrans.de/en/visit/exhibitor-directory/#/search/f=h-entity_orga;st=2;v_sg=0;v_fg=0;v_fpa=FUTURE
.5. Scroll down, at the end of the list click "Show more"
.6. Repeat this ~175 times, loading further 25 entries each time
.7.? Click on one company to see detail window - maybe not needed
.8. After having loaded ~2000 entries (step 6. not completed), open a new tab with some other website.
.9. Maybe now the error window appears. Otherwise finish step 6. and repeat steps 4./5./6./7./8.
.10. Probably now the error window appears.
.11. Reinstate old browser/omni.ja in case you don't want to look at this bug any more.

Please note that in version 128 which I am using the error window has a fully transparent client area until you click on it or resize it (same with the "About Mozilla Firefox" window). Probably this is a graphic driver topic for my specific machine.

I will attach screenshots how to reproduce and the requested maps of the main process. I also captured the maps ca. 30-200 seconds after the first error happened as requested by you. They have following PID structure:

   firefox(7528)-+-Isolated Web Co(7803)
                 |-Isolated Web Co(7808)
                 |-Isolated Web Co(7812)
                 |-Isolated Web Co(7885)
                 |-Privileged Cont(7696)
                 |-Socket Process(7583)
                 |-Utility Process(7794)
                 |-Web Content(7909)
                 |-Web Content(11198)
                 |-Web Content(11421)
                 `-WebExtensions(7745)

The to be selected list items are marked red in the screenshot

Flags: needinfo?(post)

After reproduce step 2. (replace omni.ja) obviously the browser has to be restarted again, so you can 0. close the browser, 1. install omni.ja and start the browser.

Update: I was able to more or less reproduce the behavior on Firefox 130 on Linux 6.11.0. The omni.ja has to be updated. Inside the SessionFile.sys.mjs is the same, I just changed the count message and will upload the file hereafter. (I also noticed that the file names of the former versions incorporate a wrong date).
It seemed to me to be somewhat harder to reproduce, but it is definitely there. From time to time I invoked sudo cat /proc/$mainpid/maps | wc and found out that the number raised and felled, so there seems to be some kind of garbage collecting.

Attachment #9412848 - Attachment is obsolete: true
Attached file Recent diff SessionFile.sys.mjs.diff (obsolete) —
Attachment #9412850 - Attachment is obsolete: true
Attachment #9427943 - Attachment is obsolete: true
Attachment #9427944 - Attachment is obsolete: true
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: