Closed Bug 1906755 Opened 5 months ago Closed 4 months ago

Memory spikes (leak) in extensions process

Categories

(WebExtensions :: General, defect)

Firefox 128
defect

Tracking

(Not tracked)

RESOLVED MOVED

People

(Reporter: juraj.masiar, Unassigned)

References

(Blocks 1 open bug)

Details

Attachments

(1 file)

Attached video memory_leak.mp4

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:128.0) Gecko/20100101 Firefox/128.0

Steps to reproduce:

For the past few weeks I've been noticing a huge memory spikes.
They last about two minutes - in this period, Firefox is not responding well, the CPU is used a lot and memory consumption changes rapidly.

As you can see in the attached video, for one moment the memory jumps to 17GB, then goes back down to 10GB and after few minutes it goes back to 5GB (which is what I expect since I have a lot of things opened).

This is happening in random intervals, for example shortly after waking up from sleep or randomly during the day.

Few weeks back it created and created this report:
https://crash-stats.mozilla.org/report/index/fe18e162-447c-46d5-97f8-8f7c40240627

The Bugbug bot thinks this bug should belong to the 'WebExtensions::Untriaged' component, and is moving the bug to that component. Please correct in case you think the bot is wrong.

Product: Firefox → WebExtensions

I've just remembered, I think this started shortly after I swapped my QHD monitor with 4K.
Which would explain, why I didn't see it happening in my work PC.

Hello,

Would you be able to provide some more detailed steps to reproduce by any chance?
Or maybe determine which extension you suspect might be causing this memory leak?

Flags: needinfo?(juraj.masiar)

Sadly, I can't even imagine a JavaScript code that would be able to allocate 10GB of RAM in a few seconds.
I mean, have you seen the video? The memory consumption is going up and down several GB within seconds.
Is there something I can run when I see it next time? Maybe some profiler?
And what about the crash report? Anything useful there?

Flags: needinfo?(juraj.masiar)

The crash report just confirms that the extension process crashed due to OOM (out of memory).

To help us with diagnosing this issue, could you capture a profile with profiler.firefox.com and share the profile? The profiler has a limited buffer, it will discard older data over time but still keep the most recent data, which should be sufficient for highlighting any issues here.

Flags: needinfo?(juraj.masiar)

It just happened again and I was able to capture the profile for a minute maybe, when it went up to 18GB.
And I managed to download it!
I've tried to upload it but it was still growing, then when it got above 40GB it crashed (I have 64GB RAM).

Can I upload the capture profile here? Or should I be worried about private data? It's my main browser full of private stuff :).
Rob, if you don't mind, I'll send it to you through PM in discourse.

Flags: needinfo?(juraj.masiar)

The redacted version of Juraj's recorded profile is at https://share.firefox.dev/3Wov2oH . It shows the WebExtensions + Parent Process with the Memory track enabled.

I can see that the number of API calls and event dispatches is not too strange, by clicking on "Parent Process" and filtering by "ExtensionParent". When I click in the blue areas within the "Parent Process" with the tab on "Call Tree", then most of the time "memcpy" is selected. The fact that so much time is spent there is a strong signal that the size of the underlying event data is significant.

Zooming in on a specific instance:

  • "WebExtensions", "Call Tree", click around 27.3 s in the graph (memcpy) with caller at _doSend, which is called ultimately via set (storage.session.set)
  • "Parent Process", "Marker Table", 28.366s: admin@fastaddons.com_GroupSpeedDial, api_call: storage.session.set - the graphs before show significant memcpy (first 258ms receiving via IPC, secondly 175ms from crossing IPC to internal message handler)
  • "Parent Process", "Marker Table", 28.366s: admin@fastaddons.com_GroupSpeedDial, api_event: storage.session.onChanged - this is when the parent wants to dispatch the event. The graphs show 940ms in memcpy in passing the event listener data (for onChanged) (Firefox source: get args()), 1182ms in memcpy for cloning the data in preparation for transfer over IPC (Firefox source: _doSend(), handled by nsFrameMessageManager::GetParamsForMessage), 1526ms in memcpy for the actual copying of the data for IPC (still in _doSend, handled by mozilla::dom::JSWindowActorParent::SendRawMessage).
  • So far, I can thus count about a half second for just copying memory for receiving the storage.session.set call, followed by 3.5 seconds to copy memory just for dispatching storage.session.onChanged. Not counting other processing time, this means that the parent process takes 4 full seconds for copying bytes to process your request to update storage.session. The memory spikes by 6 GB, until GC kicks in and frees most of it. I am really wondering how much data the extension is trying to store in practice
  • Then the WebExtensions process receives the data and dispatches the event. Just receiving and preparing for dispatching to the extension takes a bit over 2 seconds, spread over IPC and (lazy) deserialization of the data. The memory usage delta also peaks at almost 6 GB, of which 4 GB is immediately freed after GC. I wonder what happened with the 2 GB

So in short, the extension seems to be trying to save an excessive amount of data (this is a bug in the extension, and a bug in Firefox is that we don't enforce the 10 MB quota), and another issue is that somehow Firefox does not immediately free up the memory after receiving it.

Note that the extension seems to be alternating between storage.session.get+set and storage.session.onChanged, with some storage.local.get/set near the end of the profile. According to Juraj, the extension saves the last 90 "error" logs in storage.session, but it is not known what monstrosity is accounting for the gigabytes of values.

There is at least one actionable task here: We should enforce the documented quota for storage.session.set. Anything is better than the current unlimited quota.

Hello Rob,
My error logger stores an array of objects (up to 90), for example like this:

{
  "name": "takeScreenShot ",
  "context": "bc",
  "errName": "Error",
  "errMessage": "Invalid tab ID: 143",
  "err": "Error: Invalid tab ID: 143",
  "data": "{\"id\":143,\"index\":1,\"windowId\":1964,\"highlighted\":false,\"active\":false,\"attention\":false,\"pinned\":false,\"status\":\"complete\",\"hidden\":true,\"discarded\":false,\"incognito\":false,\"width\":1658,\"height\":926,\"lastAccessed\":1721242489752,\"audible\":false,\"autoDiscardable\":true,\"mutedInfo\":{\"muted\":true,\"reason\":\"extension\",\"extensionId\":\"admin@fastaddons.com_GroupSpeedDial\"},\"isInReaderMode\":false,\"sharingState\":{\"camera\":false,\"microphone\":false},\"successorTabId\":-1,\"cookieStoreId\":\"firefox-default\",\"url\":\"moz-extension://bdf368b4-8e8f-465a-9ac5-5517a6d89311/reload/reload.html\",\"title\":\"______________________________________________\",\"favIconUrl\":\"\"}",
  "time": 1721242492798,
  "stack": "u@moz-extension://bdf368b4-8e8f-465a-9ac5-5517a6d89311/background/background_gsd.js:2:105962\n5524/l/<@moz-extension://bdf368b4-8e8f-465a-9ac5-5517a6d89311/background/background_gsd.js:2:106180\n"
}

The "data" part may vary, depending on the error. So I can imagine, I'm by accident logging out something bigger, like a whole screenshot (my addon has a "Auto-reload" feature that I use a lot, that updates thumbnails of pages regularly, for example weather radar info).

And when we chat about a similar thing, where multiple get/set operations would cause a big memory consumption, you've mentioned it's because it's multiplied by the number of handlers. It turns out, it's also heavily multiplied by the size of the data.

So did an experiment now, and my Nightly managed to consume almost 20GB of RAM, just by running 20 get/set operations!
STR:

  1. install my speed dial: https://addons.mozilla.org/addon/groupspeeddial/
  2. open new tab
  3. execute this:
(async () => {
  console.time('memory leak experiment');
  for (let i = 0; i < 20; i++) {
    const {tmp = []} = await browser.storage.session.get('tmp');
    const data = {
      name: 'memory leak experiment',
      time: Date.now(),
      stack: Error().stack,
      screenshot: await browser.tabs.captureVisibleTab(),  // see, I'm "accidentally" logging a whole screenshot!
    };
    await browser.storage.session.set({tmp: tmp.concat(data).slice(-100)});
  }
  console.timeEnd('memory leak experiment');
})();

Note that after running this, there is "only" about 100MB of data in the session store:

JSON.stringify(await browser.storage.session.get()).length
101042504 

Here is a profiler capture:
https://share.firefox.dev/3yazIFi

This would explain, why it doesn't happen at my work PC, where I don't have 4K monitor (so screenshots are much smaller). Also these issues started after I upgraded to 4K monitor at home.

Depends on: 1908925

(In reply to juraj.masiar from comment #8)

Hello Rob,
My error logger stores an array of objects (up to 90), for example like this:

 { ...
  "data": "... favIconUrl\":\"data:image/gif;base64,... ",

Looks like you may also be storing tabs.Tab types. The faviconUrl field of these can also be relatively large. I wish that we did not put data:-URLs there, but a smaller URL; any work on that is tracked at bug 1315616.

The "data" part may vary, depending on the error. So I can imagine, I'm by accident logging out something bigger, like a whole screenshot (my addon has a "Auto-reload" feature that I use a lot, that updates thumbnails of pages regularly, for example weather radar info).

And when we chat about a similar thing, where multiple get/set operations would cause a big memory consumption, you've mentioned it's because it's multiplied by the number of handlers.

This is referring to bug 1842042. Although not the case in the original report here, Juraj tried to reproduce the issue before, in an extension that did happen to have multiple storage.onChanged / storage.session.onChanged listeners.

It turns out, it's also heavily multiplied by the size of the data.
...

So did an experiment now, and my Nightly managed to consume almost 20GB of RAM, just by running 20 get/set operations!

The test case calls storage.session.get, tabs.captureVisibleTab, and storage.session.set with the new data. 20x.
The profile (https://share.firefox.dev/3yazIFi) shows that the delta in the WebExtensions process is about 2 GB, and about 8 GB delta in the parent process (eventually GC'd).

Note that after running this, there is "only" about 100MB of data in the session store:

This is too much, I filed bug 1908925 about enforcing a quota.

This would explain, why it doesn't happen at my work PC, where I don't have 4K monitor (so screenshots are much smaller). Also these issues started after I upgraded to 4K monitor at home.

What is the size of the data from an individual tabs.captureVisibleTab call?

Blocks: webext-perf
Status: UNCONFIRMED → NEW
Component: Untriaged → General
Ever confirmed: true
See Also: → 1315616, 1842042

Each screenshot is about 900KB.
But I've just realized, while I was experimenting with it, I have executed it multiple times (trying to log different kinds of data to see the impact), each time in a new tab to "clean up" the old context... not realizing that the session storage is shared :), and already full of last 100 logs.
So it did run only 20 times, but each run updated about 100MB of data.
So reducing it to 10MB should help a lot!

I still can't believe I'm logging a whole screenshot though...
In any case, I've refactored my logger now, I've added size check, removed some logs, and reduced total items to 50.
In the past, my error logger lived as a simple array variable in the persistent background script, I've moved it to session store because of Chrome MV3.
Not fully realizing how expensive it can be, especially due to onChange handlers... I think will revert it back to simple array variable, just to be sure.

Thank you Rob for your help!

See Also: → 1910660

I'm closing this bug because I identified the actionable parts and split them off this bug:

  • bug 1908925 : enforce quota in storage.session - this is the immediate trigger for this bug. A bug in the add-on resulted in large amounts of data being written to storage.session, and a quota would have mitigates the impact on Firefox.
  • bug 1842042 : reduce overhead when dispatching onChanged events.
  • bug 1910660 : reduce overhead of extension event dispatch and API calls.

Not fully realizing how expensive it can be, especially due to onChange handlers... I think will revert it back to simple array variable, just to be sure.

In the future, if you encounter the need to update a bounded array (a ring buffer essentially), what you could do is to store each item at a key, and maintaining one key to keep track of the current index. This enables you to insert items in the list without having to read+write all data at once. As an example, the layout of the storage (of size 50) could be as follows:

- index: 0  // 0 until 49, then wrap around to 0.
- item_0: ...
- item_1: ...
- ...
- item_49: ...
Status: NEW → RESOLVED
Closed: 4 months ago
Resolution: --- → MOVED

Thanks a lot Rob!
And great idea with the ring buffer storage! I've actually used something similar before for storing list of data in storage.sync to avoid conflicts (since Firefox is a bit slow synchronizing it).

See Also: → 1913018
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: