Closed Bug 878397 Opened 11 years ago Closed 11 years ago

pdf.js: High "explicit" (though low "resident") due to lots of inefficient mask arrays

Categories

(Firefox :: PDF Viewer, defect, P3)

22 Branch
x86
macOS
defect

Tracking


VERIFIED FIXED
Firefox 29
Tracking Status
firefox29 --- verified

People

(Reporter: augustmiles, Assigned: n.nethercote)

References

Details

(Whiteboard: [pdfjs-c-performance][pdfjs-f-fixed-upstream] https://github.com/mozilla/pdf.js/pull/4122)

Attachments

(2 files, 2 obsolete files)

User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:22.0) Gecko/20100101 Firefox/22.0 (Beta/Release)
Build ID: 20130528181031

Steps to reproduce:

1) Start Firefox, so memory usage is low to begin with.
2) Open "about:memory".
3) In a new tab, open the small PDF file http://www.henle.de/media/review/0015.pdf (600 KB).
4) Switch to the about:memory tab and update.

This is a commentary on a Bach score and has a mixture of text and musical notation.

More than 1 GB of memory is used to render the page.

If I watch Activity Monitor, memory usage will sometimes stay high for several minutes, rendering the Mac unusable for that time - it is impossible to switch applications, or even to open the "Force Quit" window.


Actual results:

The machine becomes unresponsive and beach-balls. Going to "about:memory" shows:

1,727.82 MB (100.0%) -- explicit
├──1,247.53 MB (72.20%) -- window-objects
│  ├──1,211.97 MB (70.14%) -- top(http://www.henle.de/media/review/0015.pdf, id=25)
│  │  ├──1,205.29 MB (69.76%) -- active/window(http://www.henle.de/media/review/0015.pdf)
│  │  │  ├──1,198.55 MB (69.37%) -- js-compartment(resource://pdf.js/web/viewer.html, http://www.henle.de/media/review/0015.pdf)
│  │  │  │  ├──1,185.93 MB (68.64%) -- objects-extra
│  │  │  │  │  ├──1,185.61 MB (68.62%) ── elements/non-asm.js
│  │  │  │  │  └──────0.32 MB (00.02%) ── slots
│  │  │  │  └─────12.62 MB (00.73%) ++ (6 tiny)
│  │  │  └──────6.74 MB (00.39%) ++ (4 tiny)
│  │  └──────6.68 MB (00.39%) ++ js-zone(115984800)
│  ├─────18.63 MB (01.08%) ++ top(http://www.henle.de/en/detail/index.html?Title=The+Well-Tempered+Clavier+Part+I+BWV+846-869_15, id=14)
│  └─────16.94 MB (00.98%) ++ (4 tiny)



1,337.90 MB (100.0%) -- js-main-runtime
├──1,252.50 MB (93.62%) -- compartments
│  ├──1,192.87 MB (89.16%) -- objects-extra
│  │  ├──1,186.50 MB (88.68%) ── elements/non-asm.js
│  │  └──────6.37 MB (00.48%) ++ (2 tiny)
│  ├─────45.46 MB (03.40%) -- gc-heap
│  │     ├──24.40 MB (01.82%) ++ objects
│  │     ├──17.00 MB (01.27%) ++ shapes
│  │     └───4.07 MB (00.30%) ── scripts
│  └─────14.17 MB (01.06%) ++ (10 tiny)
├─────64.76 MB (04.84%) -- zones
│     ├──54.79 MB (04.10%) -- gc-heap
│     │  ├──27.10 MB (02.03%) -- strings
│     │  │  ├──19.62 MB (01.47%) ── short
│     │  │  └───7.48 MB (00.56%) ── normal
│     │  ├──25.22 MB (01.89%) ── unused-gc-things
│     │  └───2.47 MB (00.18%) ++ (3 tiny)
│     └───9.97 MB (00.75%) ++ (3 tiny)
├─────18.04 MB (01.35%) ── runtime
└──────2.60 MB (00.19%) ++ gc-heap


Expected results:

Much lower memory usage, as with Chrome, Preview, or other PDF viewers.
I tested and the initial memory usage was indeed high in Nightly.
I only got it as high as ~500MB. I waited a bit, measured again, and it dropped to ~120MB. After scrolling through the entire document it dropped to ~58MB.

Tested again:
* Starts low.
* Grows as high as ~590MB
* Waited 20 seconds without having the pdf in focus
* Measured it again at ~20MB
* Scrolled through the document and measured at ~104MB
* Waited for GC and memory usage ended at ~58MB.


Is this the same behavior you're experiencing, or does memory usage not drop for you?
Component: Untriaged → PDF Viewer
Whiteboard: [MemShrink]
Please provide the full memory log; there is a button to save it.
Attached file memory dump
requested memory dump from about:memory
Loic - the memory dump feature is only in Firefox 23, so I moved up from 22 to 23.

Jesper - if I have the about:memory tab open, the memory goes down after
about 20-30 seconds.
If, however, I just open the PDF and scroll, then Activity Monitor (i.e. top)
shows memory going up to 1.3 GB. I then leave Firefox and memory
remains at 1.3 GB for many minutes. The machine is then dreadfully unresponsive,
beachballing when using other programs.

As soon as I open a new tab, the memory drops back to a few hundred megabytes in top.

Thus, to reproduce the problem you must open the about:memory tab before
reading the PDF; otherwise opening the new tab seems to trigger the memory to be released.
Priority: -- → P3
Whiteboard: [MemShrink] → [MemShrink][pdfjs-c-performance]
Blocks: 881974
Whiteboard: [MemShrink][pdfjs-c-performance] → [pdfjs-c-performance]
I have seen similar memory usage for small PDFs consisting of scanned images. I can upload a sample if needed.
I am also seeing extremely high memory usage of 1GB or more when viewing PDF files of scanned images.  See further details at https://github.com/mozilla/pdf.js/issues/3630.  Please let me know if I can provide more information to help resolve this issue.
I've started investigating this. The first thing I found is something odd in about:memory's numbers:

1,152.41 MB (100.0%) -- explicit
├──1,022.56 MB (88.73%) -- window-objects
│  ├────954.06 MB (82.79%) -- top(file:///home/njn/bach.pdf, id=8)
│  │    ├──948.90 MB (82.34%) -- active/window(file:///home/njn/bach.pdf)
│  │    │  ├──942.93 MB (81.82%) -- js-compartment(resource://pdf.js/web/viewer.html, file:///home/njn/bach.pdf)
│  │    │  │  ├──942.14 MB (81.75%) -- objects
│  │    │  │  │  ├──934.63 MB (81.10%) -- malloc-heap
│  │    │  │  │  │  ├──932.89 MB (80.95%) ── elements/non-asm.js
...
    509.42 MB ── resident
    509.98 MB ── resident-unique
    1,868.29 MB ── vsize

It's *really* strange to see "resident" so much smaller than "explicit"; they're normally much closer. My guess is we're allocating a really big array but never touching most of it, so it doesn't get paged into physical memory.

----

I added some diagnostic printfs to jemalloc and found that we're doing hundreds of allocations of three sizes: 13.77 MiB, 11.21 MiB, and 19.89 MiB. With gdb I determined these were due to typed array allocations, and the typed arrays had 1-byte elements (i.e. Int8Array or Uint8Array or Uint8ClampedArray). I then added diagnostic printfs to pdf.js and pdf.worker.js and discovered that PDFImage.createMask() in pdf.worker.js is entirely responsible for these. Also, there are four distinct calling patterns, each of which happens multiple times:

( 1)     21 (47.7%, 47.7%): createMask: h=2833, w=1037
( 2)     18 (40.9%, 88.6%): createMask: h=2550, w=1416
( 3)      3 ( 6.8%, 95.5%): createMask: h=2741, w=1037
( 4)      2 ( 4.5%,100.0%): createMask: h=3683, w=1416

These correspond to the allocation sizes seen above, if we compute h*w*4 bytes -- e.g. 2550 * 1416 * 4 bytes = 13.77 MiB.

Now, the only caller of PDFImage.createMask is this code:

  operatorList.addOp(OPS.paintImageMaskXObject,
    [PDFImage.createMask(imgArray, width, height, inverseDecode)]
  );

Just for a laugh I replaced the second argument with []. I thought it would totally bomb but the PDF actually renders fairly well -- the text looks fine but the musical notation is missing, and the load throbber never stops spinning. Intriguing, nonetheless.
AFAICT, PDFImage_createMask takes an array that encodes 1 bit of (transparency?) information per pixel of an image, and converts that into an array that is 32x bigger. E.g. we take something that's 0.43MB and re-encode it so it takes 13.77 MB. This is extraordinarily inefficient, and surely we can do something smarter.
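
For illustration, the expansion amounts to something like the following sketch. It is not the actual createMask code: the function name and the assumption that the output is laid out as 4 RGBA-style bytes per pixel (which matches the 32x figure) are mine, and per-row byte padding is ignored.

  // Sketch only. Each 1-bit mask sample becomes 4 bytes (assumed RGBA) in the
  // output, hence the 32x blow-up.
  function expandMaskNaively(imgArray, width, height, inverseDecode) {
    var npixels = width * height;
    var expanded = new Uint8Array(npixels * 4);        // 4 bytes per pixel
    for (var i = 0; i < npixels; i++) {
      var bit = (imgArray[i >> 3] >> (7 - (i & 7))) & 1;
      if (inverseDecode) {
        bit ^= 1;
      }
      expanded[i * 4 + 3] = bit ? 255 : 0;             // alpha only; RGB stay 0
    }
    return expanded;
  }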
> Just for a laugh I replaced the second argument with []. I thought it would
> totally bomb but the PDF actually renders fairly well -- the text looks fine
> but the musical notation is missing, and the load throbber never stops
> spinning. Intriguing, nonetheless.

I forgot to mention that the memory usage was drastically lower when I tried this.
Summary: More the 1GB of memory for a small pdf file: pdf.js → pdf.js: High "explicit" (though low "resident") due to lots of inefficient mask arrays
Status: UNCONFIRMED → NEW
Ever confirmed: true
As best as I can tell:

- After PDFImage_createMask expands the compact array into the non-compact array, it puts it onto the operator list which eventually gets passed from the worker to the main thread (implemented in pdf.js) via a 'RenderPageChunk' message.

- And then it ends up being passed to CanvasGraphics_paintImageMaskXObject(), and then onto putBinaryImageData().

- putBinaryImageData() then creates an ImageData object (tmpImgDataPixels) and copies the non-compact array's data into it, and then calls putImageData() on tmpImgDataPixels.

One idea is to delay the expansion, and instead pass the compact array to the main thread, and then unpack it directly into the ImageData object. I.e. the non-compact array is just an inefficient intermediate, so remove it altogether. However, that requires also passing |inverseDecode|.

Another idea is to create the ImageData object immediately and pass it to the main thread, instead of passing the non-compact array.
> Another idea is to create the ImageData object immediately and pass it to
> the main thread, instead of passing the non-compact array.

Hmm, but creating the ImageData requires access to the maskCanvas, which the worker doesn't have.
> One idea is to delay the expansion, and instead pass the compact array to
> the main thread, and then unpack it directly into the ImageData object.

Hmm, but IIRC to do this the compact array must be a TypedArray.
Here's a rough draft of a patch that copies the compact alpha data into a typed
array, so it can be passed from the worker to the main thread, and expands it
directly into the ImageData object on the main thread. The two documents (see
comment 0 and comment 11) I've been testing appear to display correctly. My
rough measurements show that it reduces "explicit" memory usage greatly, though
I'm out of time today to do precise measurements.

I don't really have any idea what I'm doing, so please let me know if this
seems reasonable.
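
Roughly, the shape of what I'm doing is this (an illustrative sketch with made-up function names, not the patch itself; the exact sizing and alpha polarity details differ):

  // Worker side (sketch): keep the mask compact -- 1 bit per pixel -- and just
  // copy it into a typed array so it can be sent to the main thread.
  function createCompactMask(imgArray, width, height) {
    var data = new Uint8Array(imgArray.length);
    data.set(imgArray);
    return { data: data, width: width, height: height };
  }

  // Main-thread side (sketch): expand the compact mask straight into the
  // ImageData's pixel buffer, so the big intermediate array never exists.
  function putCompactMaskData(ctx, mask, inverseDecode) {
    var imgData = ctx.createImageData(mask.width, mask.height);
    var pixels = imgData.data;                    // RGBA, 4 bytes per pixel
    var npixels = mask.width * mask.height;
    for (var i = 0; i < npixels; i++) {
      var bit = (mask.data[i >> 3] >> (7 - (i & 7))) & 1;
      if (inverseDecode) {
        bit ^= 1;
      }
      pixels[i * 4 + 3] = bit ? 255 : 0;          // alpha only; exact polarity
    }                                             // depends on the decode array
    ctx.putImageData(imgData, 0, 0);
  }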
Assignee: nobody → n.nethercote
Status: NEW → ASSIGNED
Attachment #8359053 - Flags: feedback?(bdahl)
I've done some measurements.
> I've done some measurements.

... and here they are.

First, my process.
- Start Firefox
- Start |peak-rss|, a script which measures RSS ten times a second
- Load the PDF, scroll quickly to the bottom then back to the top
- Open about:memory and hit measure, and look at the "elements/non-asm.js"
  entry for the pdf.js compartment

0015.pdf
--------
            RSS-peak    JS-elements-peak
before      ~ 650 MiB   ~1100 MiB
after       ~ 540 MiB   ~ 400 MiB

37355_1_A.pdf
-------------
            RSS-peak    JS-elements-peak
before      ~ 800 MiB    ~800 MiB
after       ~2400 MiB   ~1000 MiB


The JS-elements-peak results are excellent. The RSS-peak results are less
impressive. It's weird that the non-compact arrays don't contribute much to
RSS... even if big chunks of them aren't written to, when they are read I would
expect them to page in. Maybe the Linux kernel is being really clever and is
managing to not page in all-zero pages when they are read. But this behaviour
might not occur on other platforms.

bdahl suggested that the non-compact array is created in the worker thread to
avoid janking the main thread, which complicates things. We might be able to
speed up the unpacking loop by handling 8 bits at a time, since it seems common
that all 8 bits will have the same value.
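
Something along these lines, perhaps (a sketch only; |mask| is the compact 1-bit-per-pixel array, |pixels| is the ImageData buffer, and the name is made up):

  // Sketch of an 8-pixels-per-iteration unpacking loop. When a whole mask byte
  // is 0x00 or 0xFF -- apparently the common case -- we can skip per-bit tests.
  function unpackMaskByBytes(mask, pixels, npixels, inverseDecode) {
    var invert = inverseDecode ? 0xFF : 0;
    var p = 0;
    for (var i = 0; p < npixels; i++) {
      var b = mask[i] ^ invert;
      if (b === 0 || b === 0xFF) {
        var alpha = b ? 255 : 0;
        for (var k = 0; k < 8 && p < npixels; k++, p++) {
          pixels[p * 4 + 3] = alpha;
        }
      } else {
        for (var j = 7; j >= 0 && p < npixels; j--, p++) {
          pixels[p * 4 + 3] = ((b >> j) & 1) ? 255 : 0;
        }
      }
    }
  }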
Oh, this is interesting: I tried flipping inverseDecode just to see what happens.  Various images appear inverted (white-on-black instead of black-on-white). Also, the peak-RSS for 0015.pdf (without my patch) jumps to 1200 MiB and for 37355_1_A.pdf it jumps to 2400 MiB. So my theory about the kernel optimizing zero pages seems correct.
This works well on the cases I've been testing.

Yury, can you please take this over? I don't know how to test and land pdf.js
patches. Thanks!
Attachment #8359544 - Flags: review?(ydelendik)
Attachment #8359053 - Attachment is obsolete: true
Attachment #8359053 - Flags: feedback?(bdahl)
BTW, when/how do pdf.js updates get imported into Firefox? I see various bugs like bug 947260 for updating -- when do they occur? This is a pretty big memory win, it'd be great to get it into Firefox quickly once it lands in the git repo.
When I feel like it or when I'm asked. Given that I'm following the progress on this bug, I'll be sure to get a patch up as soon as it's landed :)
Oh, wait! I was making the new array 8x bigger than necessary, because I
computed its length as width * height, even though we only need 1 bit per
pixel.
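
In code terms the change is roughly just the length calculation (a sketch; the real patch may differ slightly):

  // Before (sketch): length computed as one byte per pixel -- 8x too big.
  //   var data = new Uint8Array(width * height);
  // After (sketch): one bit per pixel, rounded up to whole bytes.
  var data = new Uint8Array((width * height + 7) >> 3);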

With this change, I'm struggling to get peak-RSS higher than ~400 MiB for both
files. Cool!

(BTW, in comment 17 I accidentally switched the before and after lines for
37355_1_A.pdf.)
Attachment #8359574 - Flags: review?(ydelendik)
Attachment #8359544 - Attachment is obsolete: true
Attachment #8359544 - Flags: review?(ydelendik)
> (BTW, in comment 17 I accidentally switched the before and after lines for
> 37355_1_A.pdf.)

Actually, I totally botched the results for 37355_1_A.pdf.  Here are new results for both files:


0015.pdf
--------
            RSS-peak    JS-elements-peak
before      ~650 MiB    ~1100 MiB
after       ~400 MiB    ~ 200 MiB

37355_1_A.pdf
-------------
            RSS-peak    JS-elements-peak
before      ~800 MiB    ~2400 MiB
after       ~400 MiB    ~ 500 MiB


Pretty good! Shrinking those big arrays by 32x really makes a difference.
Even with the fix, there's still some inefficiency.  For 0015.pdf, the vast majority of the typed array allocations remaining follow this pattern:

- buildPaintImageXObject() calls DecodeStream_getBytes() with a length of 451704.
- It calls DecodeStream.ensureBuffer, which rounds that up to the nearest power of two (524288), and allocates a Uint8Array of that rounded-up size.
- buildPaintImageXObject() calls createMask(), which allocates a Uint8Array of length 451704 and copies (and possibly inverts) the data from the first Uint8Array into it (sketched below).
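
For illustration, that pattern looks roughly like this (a sketch, not the actual DecodeStream code):

  var length = 451704;                                  // bytes actually needed
  // ensureBuffer-style rounding up to the next power of two:
  var rounded = Math.pow(2, Math.ceil(Math.log(length) / Math.LN2));  // 524288
  var decoded = new Uint8Array(rounded);                // first allocation
  // ... the stream decodes into |decoded| ...
  var mask = new Uint8Array(length);                    // second allocation
  mask.set(decoded.subarray(0, length));                // copy (maybe inverted)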

So, two inefficiencies:

- The rounding up to a power of two isn't necessary.

- It would be even better if we could decode (and possibly invert) directly into the second typed array. But this seems tricky, because there are several kinds of streams, and in some of them the first typed array might not be used.
(In reply to Ryan VanderMeulen [:RyanVM UTC-5] from comment #21)
> When I feel like it or when I'm asked. Given that I'm following the progress
> on this bug, I'll be sure to get a patch up as soon as it's landed :)

Yury landed it on GitHub:
https://github.com/yurydelendik/pdf.js/commit/3f533a1cb04adbf422d58ce71eed6d4e851611ee

I think that's all we need to do here.
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Hasn't actually been committed to the pdf.js repo yet. I assume Yury will submit a pull request for it soon, though :)
Will resolve this when it's merged to m-c from upstream.
Status: RESOLVED → REOPENED
Flags: needinfo?(ryanvm)
Resolution: FIXED → ---
Depends on: 960051
Comment on attachment 8359574 [details] [diff] [review]
Use a more compact typed array to pass the image mask from the worker to the main thread.

Looks good. The nbytes parameter can be calculated in the createMask function; I changed that and landed it as https://github.com/yurydelendik/pdf.js/commit/3f533a1cb04adbf422d58ce71eed6d4e851611ee
Attachment #8359574 - Flags: review?(ydelendik) → review+
Whiteboard: [pdfjs-c-performance] → [pdfjs-c-performance][pdfjs-f-fixed-upstream] https://github.com/mozilla/pdf.js/pull/4122
Flags: needinfo?(ryanvm)
Bug 960051 merged this into m-c.
Status: REOPENED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 29
Keywords: verifyme
Verified on Beta 29.0b6 (Build ID: 20140407135746) on Mac OS X 10.9.2 using STR from comment 17 and a memory usage script to get the RSS peak. Comparison in memory usage was done against the Firefox 29 Nightly build from January 11.
- RSS-peak 
     - drops by ~500 MB for both PDFs 
- JS-elements-peak 
     - 0015.pdf - drops from ~1100 MB (Jan 11 Nightly) to ~250 MB (29.0b6)
     - 37355_1_A.pdf - drops from ~2200 MB (Jan 11 Nightly) to ~500 MB (29.0b6)
Status: RESOLVED → VERIFIED
Keywords: verifyme