Lots of memory used for activity stream PNGs encoded as data URIs

RESOLVED FIXED in Firefox 62

Status

()

defect
P1
normal
RESOLVED FIXED
a year ago
10 months ago

People

(Reporter: mccr8, Assigned: imjching)

Tracking

(Blocks 1 bug)

unspecified
Firefox 62
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox62 fixed)

Details

(Whiteboard: [MemShrink:P1][fxperf:p3])

Attachments

(2 attachments)

(Reporter)

Description

a year ago
I was looking in about:memory, and I see megabytes of strings that look like they are based 64 encoded data URIs for PNGs. The largest one is 1MB. I'm not entirely sure what all of these are, but I extracted the largest string, and another smaller one, pasted them into my URL bar, and they were both images from my about:newtab page.

edited down, the about:memory entries look like this:
261.90 MB (100.0%) -- explicit
├───58.07 MB (22.17%) ++ window-objects
├───46.15 MB (17.62%) -- js-non-window
│   ├──30.12 MB (11.50%) -- zones
│   │  ├──26.46 MB (10.10%) -- zone(0x7f97bfe79000)
│   │  │  ├───9.20 MB (03.51%) ++ compartment([System Principal], shared JSM global)
│   │  │  ├───7.22 MB (02.76%) ++ (22 tiny)
│   │  │  ├───7.04 MB (02.69%) -- strings
│   │  │  │   ├──1.64 MB (00.62%) ++ string(<non-notable strings>)
│   │  │  │   ├──1.00 MB (00.38%) ++ string(length=573258, copies=1, "data:image/png;base64,[...]" (truncated))
│   │  │  │   ├──0.50 MB (00.19%) ++ string(length=286406, copies=1, "data:image/png;base64,[...]" (truncated))
│   │  │  │   ├──0.50 MB (00.19%) ++ string(length=313370, copies=1, "data:image/png;base64,[...]" (truncated))
│   │  │  │   ├──0.50 MB (00.19%) ++ string(length=393482, copies=1, "data:image/png;base64,[...]" (truncated))
│   │  │  │   ├──0.50 MB (00.19%) ++ string(length=480114, copies=1, "data:image/png;base64,[...]" (truncated))
│   │  │  │   ├──0.25 MB (00.10%) ++ string(length=143278, copies=1, "data:image/png;base64,[...]" (truncated))

These images shouldn't be kept around in such a memory inefficient format.
I've seen similar entries recently, for example in bug 1427594, comment 6.
Flags: needinfo?(usarracini)
Getting rid of data URIs are definitely something we've wanted to do for a while. Bug 1184701 would let us use the moz-page-thumbs protocol which would mean we could just use img  tags with the src being the URL, and then we could ditch all these data URIs. In order for us to make the protocol work in the content process and with Activity Stream, we'd need to sort out bug 1385306 first.  If bug 1385306 gets done and we want Activity Stream working with the protocol, we need to get moz-page-thumbs:// working with either a codeBase principal or a null principal, which means we need to let the protocol be loadable by anyone or somehow special case about:newtab, which brings into question security. There's still a lot of questions around what the best approach is here... maybe using indexedDB and Blobs would avoid all of this? 

Overall there's a non trivial amount of work to get us into a state where we don't use data URIs but it's on our radar and we definitely want to do this.
Flags: needinfo?(usarracini)
Priority: -- → P3

Comment 3

a year ago
Potentially we can avoid some message passing of the full data uris if we cache them in indexedDB. Although now there would be copies of the data uri there and additional messaging and caching overhead.
(In reply to Ursula Sarracini (:ursula) from comment #2)
> Getting rid of data URIs are definitely something we've wanted to do for a
> while. Bug 1184701 would let us use the moz-page-thumbs protocol which would
> mean we could just use img  tags with the src being the URL, and then we
> could ditch all these data URIs. In order for us to make the protocol work
> in the content process and with Activity Stream, we'd need to sort out bug
> 1385306 first.  If bug 1385306 gets done and we want Activity Stream working
> with the protocol, we need to get moz-page-thumbs:// working with either a
> codeBase principal or a null principal, which means we need to let the
> protocol be loadable by anyone or somehow special case about:newtab, which
> brings into question security. There's still a lot of questions around what
> the best approach is here... maybe using indexedDB and Blobs would avoid all
> of this? 
> 
> Overall there's a non trivial amount of work to get us into a state where we
> don't use data URIs but it's on our radar and we definitely want to do this.

Can you just use blobs?
Whiteboard: [MemShrink] → [MemShrink:P1]
Whiteboard: [MemShrink:P1] → [MemShrink:P1][fxperf]
Whiteboard: [MemShrink:P1][fxperf] → [MemShrink:P1][fxperf:p3]

Updated

a year ago
Duplicate of this bug: 1450187
ni?ing over to ursula for comment 4
Flags: needinfo?(usarracini)
(In reply to Eric Rahm [:erahm] (please no mozreview requests) from comment #4)
> Can you just use blobs?


Can you expand on what you mean here? Can we send blobs over IPC? Store the blobs in indexedDB?
Flags: needinfo?(usarracini) → needinfo?(erahm)
(In reply to Ursula Sarracini (:ursula) from comment #7)
> (In reply to Eric Rahm [:erahm] (please no mozreview requests) from comment
> #4)
> > Can you just use blobs?
> 
> 
> Can you expand on what you mean here? Can we send blobs over IPC? Store the
> blobs in indexedDB?

I'm not really a blob expert, but yes IPC supports them [1 for example] and IDB seems to as well [2].

This is a somewhat old mozilla hacks article [3], but I think it covers the basics of blob support in idb and using createObjectURL to get a URL you can use to map an img's src attribute to a blob.

Let me know if I can help out further, odds are I'll redirect you to someone with more knowledge though :)

[1] https://searchfox.org/mozilla-central/rev/f5fb323246bf22a3a3b4185882a1c5d8a2c02996/dom/file/ipc/IPCBlobUtils.h#12-210
[2] https://searchfox.org/mozilla-central/rev/f5fb323246bf22a3a3b4185882a1c5d8a2c02996/dom/indexedDB/ActorsParent.cpp#6708-6736
[3] https://hacks.mozilla.org/2012/02/storing-images-and-files-in-indexeddb/
Flags: needinfo?(erahm)
Blob enthusiast here.  They key features of Blobs are that they are a handle on immutable cross-global memory that, under ideal conditions, does not duplicate the memory used when sent via postMessage().  That is, if you postMessage(k1MegabyteString) to 10 separate globals, you will end up with 1 megabyte used in each global.  You will also put 1 megabyte of data over the wire if IPC is involved.  If you do that a second time, each of those globals will end up with ANOTHER copy of the string and there will be the same wire overhead.  However, if you do `const blobString = new Blob([k1MegabyteString])` and postMessage that blob:
  - Bad news: creation of the blob will probably create its own copy of the string.  So you need to drop references to the string and let it be GC'ed if you want to be a responsible memory user.
  - Good news: Under ideal situations, at most one copy of that backing memory will end up in each process, and the wire size of the blob for IPC is super tiny.  Because when you postMessage the Blob around, you are just postMessaging a handle to the underlying memory.
  - Also good news: Many APIs are happy to provide you with a Blob directly so you don't need to round-trip through JS strings or ArrayBuffers.  For example, HTMLCanvasElement.toBlob() is more efficient than toDataURL().  If you need to manipulate Blobs, you can create composite Blobs from bits and pieces of other blobs using Blob.slice() and new Blob(["a string maybe", anArrayMaybe, otherBlob, otherOtherBlob.slice(100, 200)]).

You may have detected some hand-waving there about "under ideal conditions".  The short story of that is:
- IndexedDB-sourced Blobs are perfectly ideal.  They are references to on-disk files.  Passing them around never duplicates anything in memory.  The catch here is that if you do an IDB put({ fooBlob }) and then an IDB get() that returns `diskBackedFooBlob`, `fooBlob` is still a memory-backed Blob, not the ideal disk-backed `diskBackedFooBlob`.
- Because of Blob/File semantics, it is possible for us to further optimize things so that everything is an ideal condition.  For example, we have a bug on file to morph `fooBlob` so that it is the same as `diskBackedFooBlob` after the IDB transaction commits.  I think there may also be some cross-process optimizations for memory-backed Blobs that IDB already does that we could adopt in general.  Additionally, we have a mechanism to spill huge blobs generated by XHR and MediaRecorder to temporary files on disk, etc. etc.
One last thing to mention that may be relevant here: IndexedDB file-backed Blobs are reference counted over all the values stored in an IndexedDB database and in memory which means free data de-duplication and cleanup.

That is:
  const sitePic = await asyncCanvasToBlobUsingPromises(canvas);
  idb.put({ key: "visit-1", image: sitePic });
  // later, using either the same sitePic from memory or retrieved from IDB
  idb.put({ key: "visit-2", image: sitePic });

On disk, we store both values as structured clones that reference the single on-disk Blob.  Once both values are removed from the database and all in-memory references to the IndexedDB-sourced Blobs are forgotten, we will purge the sitePic from disk.  This is done using SQLite transactions and is fully atomic.

This means that if you're dealing with large images that may be referenced by multiple object structures, you don't need to build your own management layer that does its own reference counting or garbage collection sweeps or dealing with edge-cases related to crashes while your accounting was mutating something non-atomically.  You get it for free.  Plus, you don't have to worry if there are any in-memory references to the picture that might break/etc.

Caveats:
- De-duplication happens based on object identity, not content-addressible hashing or something magic.  If you new Blob(), you're going to get another copy of the Blob on disk.
- Each IDB database is its own island.  If you get() a blob from IDB database A and put() it into IDB databse B, there will be two copies on disk.
This 900MB is basically all the memory used in that Content Process....  also, what's up with two about:newtabs with different Id's?  (perhaps a newtab is actually open) - and what's with the 3rd copy in js-non-window??

Updated

11 months ago
Duplicate of this bug: 1452995

Updated

11 months ago
Assignee: nobody → jlim
Iteration: --- → 62.2 - Jun 4
Priority: P3 → P1

Updated

11 months ago
Blocks: 1445085

Comment 14

11 months ago
> what's up with two about:newtabs with different Id's?  (perhaps a newtab is actually open) - and what's with the 3rd copy in js-non-window??
about:newtab is preloaded per window so that the next new tab will be immediately shown. mconley has been working on ways to be smarter about when to preload in bug 1353013.
See Also: → 1353013

Comment 15

11 months ago
Commits pushed to master at https://github.com/mozilla/activity-stream

https://github.com/mozilla/activity-stream/commit/fc7a8a59059136ff23a25f9fecb2cf8df0d3180f
Bug 1436615 - Lots of memory used for activity stream PNGs encoded as data URIs

Signed-off-by: Jay Lim <jlim@mozilla.com>

https://github.com/mozilla/activity-stream/commit/a645823cbed8b3a202a25d9f9b5027d2a4dbe860
Merge pull request #4150 from imjching/master

Bug 1436615 - Lots of memory used for activity stream PNGs encoded as data URIs
Status: NEW → RESOLVED
Last Resolved: 11 months ago
Resolution: --- → FIXED

Updated

11 months ago
Blocks: 1466971

Updated

10 months ago
Blocks: 1467485
You need to log in before you can comment on or make changes to this bug.