Closed Bug 959925 Opened 6 years ago Closed 6 years ago

pdf.js: too many copies of images

Categories

(Firefox :: PDF Viewer, defect, P3)

defect

Tracking

()

RESOLVED FIXED
Firefox 29

People

(Reporter: njn, Assigned: njn)

References

(Blocks 1 open bug)

Details

Attachments

(4 files)

Decoding and painting an image in pdf.js involves lots of copies of the data.

(a) pdf.worker.js is a JS worker. It first decodes the colour data from the PDF
    into an RGB array (24 bits per pixel), and the opacity data into an A array
    (8 bits per pixel). These two combined are effectively copy 1.

(b) pdf.worker.js combines the RGB and A arrays into a new RGBA typed array,
    which has 32 bits per pixel. This is copy 2. Copy 1 is now dead, and will
    be reclaimed by the next GC.

(c) pdf.worker.js transfers copy 2 to the main pdf.js thread. (This is a
    transfer, not a copy, possible because it's a typed array.)

(d) pdf.js does |d = createImageData()|, which ends up in
    mozilla::dom::CreateImageData(), which allocates a Uint8ClampedArray. This
    is copy 3. pdf.js copies the typed arrays (copy 2) into the ImageData (copy
    3). Copy 2 is now dead, and will be reclaimed by the next GC.

(e) pdf.js does |putImageData(d, 0, 0)|, which ends up in
    mozilla::dom::CanvasRenderingContext2D::PutImageData_explicit.
    
(f) PutImageData_explicit() creates a new gfxImageSurface() and copies in the
    data. This is copy 4, and it is short-lived.

(g) PutImageData_explicit() then calls CreateSourceSurfaceFromData(), creating
    a cairo surface. This is copy 5, and it is short-lived. Copy 3 is now dead,
    and will be reclaimed by the next GC(?)

Each copy can easily be multiple MB, and in the worst case (which is probably
common) we have five copies of the data alive at once.

So, ways to improve it.

- In steps (d) and (e) we can split the image up into smaller chunks, and then
  create/put those chunks one at a time. This will make copies 3, 4 and 5 much
  smaller.

- In steps (a) and (b) it is possible to decode the data directly into the RGBA
  array, which would merge copies 1 and 2. However, it's not easy, due to the
  various kinds of colour encodings and the possibility of images being
  resized.

If we did both of those, we end up with one full-size copy, and three copies of
each chunk, but two of the copies of each chunk would be very short-lived.
Hmm, typically you get bad memory consumption when scrolling quickly through a scanned document, i.e. when we're dealing with many pages in quick succession. No individual page is a problem, it's the handling of many pages that adds up; in particular, the longer-lived copies add up quickly. So chunking actually won't help much -- copies 4 and 5 are so short-lived they don't matter much, and copy 3 just gets split up into lots of smaller pieces, all of which hang around just as long as a single larger array would. (And in some ways they're worse.)

However, since an ImageData is basically just a wrapper around a Uint8ClampedArray, I wonder if it's possible to avoid making copy 3 by making copy 2 an ImageData object. Is an ImageData transferable from a worker? Alternatively, can you construct an ImageData from an existing Uint8ClampedArray?
> copy 3 just gets split up into lots of smaller pieces

Actually, that's not right, because a single ImageData object can be reused for each piece.

> Is an ImageData transferable from a
> worker? Alternatively, can you construct an ImageData from an existing
> Uint8ClampedArray?

AFAICT, the answer is "no" in both cases :(
(In reply to Nicholas Nethercote [:njn] from comment #2)
> > copy 3 just gets split up into lots of smaller pieces
> 
> Actually, that's not right, because a single ImageData object can be reused
> for each piece.
> 
> > Is an ImageData transferable from a
> > worker? Alternatively, can you construct an ImageData from an existing
> > Uint8ClampedArray?
> 
> AFAICT, the answer is "no" in both cases :(

worker.postMessage(imageData, [imageData.data.buffer]); should work.
> worker.postMessage(imageData, [imageData.data.buffer]); should work.

That will transfer the array buffer, but then we need to be able to create a new ImageData from an existing ArrayBuffer without doing copying, and that isn't possible.
Oh, we don't support ImageData constructor yet :(
Attached file imagedata_worker.html
This code works on Chrome but doesn't on Firefox :(
Looks like we break imageData.data.length when we passed around the imageData.
Depends on: 959957
Depends on: 959958
The imagedata_worker testcase seems to work just fine for me in a Mac 2014-01-11 nightly....
Ah, I tested it with Firefox 26. Indeed it worked in Nightly. Sorry for the confusion.
(In reply to Nicholas Nethercote [:njn] from comment #4)
> > worker.postMessage(imageData, [imageData.data.buffer]); should work.
> 
> That will transfer the array buffer, but then we need to be able to create a
> new ImageData from an existing ArrayBuffer without doing copying, and that
> isn't possible.

So we could pass around ImageData objects to workaround the absence of the ImageData constructor.
> So we could pass around ImageData objects to workaround the absence of the
> ImageData constructor.

Ah, so we can transfer ImageData objects:

  worker.postMessage(imageData, [imageData.data.buffer]);

This means "copy the |imageData|, except transfer the |imageData.data.buffer| property". Nice.

However, if I instead take the approach of calling createImageData/putImageData on chunks of the pixel data, then I don't need to transfer the ImageData. And the chunk approach (a) also causes copy 4 and copy 5 to be smaller, and (b) doesn't rely on features that browsers might not implement reliably. So I will pursue the chunk approach. But thanks for the information!
Duplicate of this bug: 931468
This patch implements the chunking of the createImageData/putImageData. Each
chunk is 16 rows of the image. Referring to comment 0, it greatly reduces the
size of copies 3, 4 and 5. Of these, copy 3 is the most important, because it's
a JS allocation and so lives until the next GC (i.e. quite some time).

For example, if we have a 2000x2000 image (not untypical, and smaller than
many), a full copy will take 2000*2000*4 = 16,000,000 bytes. In comparison,
each chunk is only 2000*16*4 = 128,000 bytes, which is 125x smaller.

On a 3 page document (file2.pdf from comment 12), this reduces peak physical
memory when scrolling through the entire document from ~315 MiB to ~265 MiB.
The savings on bigger documents will be larger.

It also, believe it or not, is marginally faster than the old code. I measured
the time taken for the putBinaryImageData() function to execute. In a browser
that supports set() and subarray(), it improved the time from ~70ms to ~50ms.
In a browser that doesn't support those functions, the time was about the same.

Yury: if you are happy with this patch, would you mind again handling the pull
request on the github side, and testing? And please feel free to do any
additional performance testing.
Attachment #8360785 - Flags: review?(ydelendik)
Assignee: nobody → n.nethercote
Status: NEW → ASSIGNED
No longer depends on: 959958, 959957
This patch avoids creating the opacity (A) array for images that don't have a
mask (which is the common case for scanned documents) by writing the opacity
values directly into the RGBA array. This avoids one quarter of copy 1 (from
comment 0).

I renamed |buf| to make it clear which buffer was the A buffer and which was
the RGBA one.
Attachment #8360874 - Flags: review?(ydelendik)
Comment on attachment 8360874 [details] [diff] [review]
Don't create the opacity buffer for images that lack a mask.

Review of attachment 8360874 [details] [diff] [review]:
-----------------------------------------------------------------

Unable to run tests. Please submit as a pull request to the https://github.com/mozilla/pdf.js with review items addressed.

::: browser/extensions/pdfjs/content/build/pdf.worker.js
@@ +28178,3 @@
>        var smask = this.smask;
>        var mask = this.mask;
> +      var aBuf;

Unneeded renaming of the buf variable

@@ +28224,5 @@
> +      }
> +
> +      if (aBuf) {
> +        for (var i = 0, ii = width * actualHeight; i < ii; ++i) {
> +          rgbaBuf[4*i + 3] = aBuf[i];

No need to multiply by 4 here. Introduce e.g. j that will be incremented by 4 and started from 3.
Attachment #8360874 - Flags: review?(ydelendik) → feedback+
Comment on attachment 8360785 [details] [diff] [review]
Do createImageData/putImageData in chunks, to save memory.

Review of attachment 8360785 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good. Unable to run tests. Please submit as a pull request to the https://github.com/mozilla/pdf.js for testing.
Attachment #8360785 - Flags: feedback+
Attachment #8360785 - Flags: review?(ydelendik)
> Unable to run tests.

What does that mean?

I setup a git clone and did the testing as described at https://github.com/mozilla/pdf.js/wiki/Contributing. It seems like the testing coverage for images isn't very good -- for the RGB changes I'm having to change getRgbBuffer for every sub-class of ColorSpace, and I have |throw| statements in the ones I haven't managed to test manually (which is most of them) and none of the test are hitting them. Am I missing something?
> > +      var aBuf;
> 
> Unneeded renaming of the buf variable

As I mentioned in comment 14, we now have two buffers in that function: the RGBA buffer, and the A buffer. I'd like it to be clear which is which.
(In reply to Nicholas Nethercote [:njn] from comment #18)
 > > Unneeded renaming of the buf variable
> 
> As I mentioned in comment 14, we now have two buffers in that function: the
> RGBA buffer, and the A buffer. I'd like it to be clear which is which.

Oh, I see 'a' stands for 'alpha' (thought it's an article)

(In reply to Nicholas Nethercote [:njn] from comment #17)
> I setup a git clone and did the testing as described at
> https://github.com/mozilla/pdf.js/wiki/Contributing. It seems like the
> testing coverage for images isn't very good -- for the RGB changes I'm
> having to change getRgbBuffer for every sub-class of ColorSpace, and I have
> |throw| statements in the ones I haven't managed to test manually (which is
> most of them) and none of the test are hitting them. Am I missing something?

Not sure. We have tests cases for the most of color spaces. Probably external PDF files or ref images are missing. Could you submit a pull request(s)?
I did the chunking and skip-RGB-and-A approaches as pull requests:
https://github.com/mozilla/pdf.js/pull/4138
https://github.com/mozilla/pdf.js/pull/4139
(In reply to Nicholas Nethercote [:njn] from comment #20)
> I did the chunking and skip-RGB-and-A approaches as pull requests:
> https://github.com/mozilla/pdf.js/pull/4138
> https://github.com/mozilla/pdf.js/pull/4139

And with those in place, in the common cases (no resizing, no opacity mask) we drop from 5 copies to approximately 1.03 copies of the data. (The 0.3 comes from the three slices that replace copies 3, 4 and 5.)
Depends on: 960051
The chunking has been merged to pdf.js master: https://github.com/mozilla/pdf.js/pull/4138
Hmm. Further measurements show that the above changes only make a small difference.

Consider http://pbadupws.nrc.gov/docs/ML0909/ML090920467.pdf, a scanned PDF with 226 pages, and each page is approximately 34,000,000 bytes when represented in 32-bit RGBA form.

Follow these steps:
- Start measuring resident memory consumption (RSS), e.g. via top.
- Open the PDF.
- Scroll to the document's end as fast as possible by holding down the PgDn key.
- Wait for all the pages to be decoded and rendered.

On my fast desktop Linux machine, the waiting takes approximately one minute. More relevantly for this bug, RSS gradually climbs in an ascending sawtooth pattern to about 7,800 MiB, whether or not the patches I've written for this bug are applied. And guess what? 226 * 34,000,000 is over 7,328 MiB.

In other words, even though I reduced five copies of each page's date down to 1, the four copies I removed were all short-lived, whereas the remaining copy is very long-lived.

I haven't yet worked out what is holding onto the arrays for each page, and whether it's deliberate or accidental. The data does get released eventually... if I wait a while, and then trigger "minimize memory usage" in about:memory, then RSS drops back to ~200 MB (but minimizing immediately doesn't do it). bdahl, do you know what controls this, and whether it's deliberate? I'm hoping that it's accidental, and that with some judicious nulling of some references that it'll become collectable much earlier.

Finally: evince is the native PDF viewer on Linux. When I use it to view the document, I can't get its RSS to exceed 150 MB. I don't expect pdf.js should be as memory-efficient as evince, but currently it's not even close -- 7,800 MiB for a 226 page document is ridiculous.
Flags: needinfo?(bdahl)
So there is a mechanism to discard the imageData: PDFPageProxy.complete(), which calls _tryDestroy(). However, somehow when you scroll quickly the intra-thread messaging ordering occurs in such a way that none of the pages get to complete() until all the subsequent pages have been displayed. In fact, if you scroll slowly through the document -- e.g. one page per second, which is enough to render each page fully before moving to the next one -- the peak RSS is about 1 GB.
I worked it out. The problem is with continueCallback. If defined, it gets
called when a page's image is received by the main thread. At this point, the
page is almost done, and we'd really like to finish processing the last few ops
in order to reach complete(), whereupon the page's pixel data can be freed.

However, viewer.js tries to be clever. It makes continueCallback (which is
called after every 15ms of operation processing) return early if the page being
rendered isn't the "most important" page, which I guess is the page that is
currently in the viewer. So we end up postponing the completion of every page
prior to the final page until the final page is rendered, but that doesn't
happen until all the previous pages have sent their image data to the main
thread, because the image data sending is in sequential page order.

The attached patch just ignores continueCallback. With it applied, I get a
peak RSS of about 880 MB instead of 7,800. As it stands the patch probably
isn't reasonable, but something needs to be done.
Attachment #8362353 - Flags: feedback?(ydelendik)
Attachment #8362353 - Flags: feedback?(bdahl)
> I worked it out. The problem is with continueCallback. 

Yury pointed me at https://github.com/mozilla/pdf.js/issues/1887, which covers this.
The three patches landed (squashed into one):
https://github.com/mozilla/pdf.js/commit/f7e354dfe54b215c2539735701923c02908293f3

I filed bug 962351 for the scrolling issue in comment 25.
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Flags: needinfo?(bdahl)
Resolution: --- → FIXED
Let's leave this open for bug 960051.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Blocks: 949650
This has now been imported (bug 960051).
Status: REOPENED → RESOLVED
Closed: 6 years ago6 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 29
You need to log in before you can comment on or make changes to this bug.