Closed Bug 962351 Opened 11 years ago Closed 10 years ago

pdf.js: if you scroll quickly through a scanned document, all the decoded pixel data ends up in memory

Categories

(Firefox :: PDF Viewer, defect, P3)

RESOLVED INCOMPLETE

People

(Reporter: n.nethercote, Unassigned)

Details

(Whiteboard: [pdfjs-c-performance])

(https://github.com/mozilla/pdf.js/issues/1887 is open for this problem. This bug is for tracking the problem on the Mozilla side.)

Consider http://pbadupws.nrc.gov/docs/ML0909/ML090920467.pdf, a scanned PDF with 226 pages, each of which is approximately 34,000,000 bytes when represented in 32-bit RGBA form.

Follow these steps:
- Start measuring resident memory consumption (RSS), e.g. via top.
- Open the PDF.
- Scroll to the document's end as fast as possible by holding down the PgDn key.
- Wait for all the pages to be decoded and rendered.

On my fast desktop Linux machine, the waiting takes approximately one minute. More relevantly for this bug, RSS gradually climbs in an ascending sawtooth pattern to about 7,800 MiB, whether or not the patches I've written for this bug are applied. And guess what? 226 × 34,000,000 bytes is just over 7,328 MiB.
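
For reference, the arithmetic as a standalone check (nothing here is pdf.js code):

  // Back-of-envelope check using the numbers reported above.
  const pages = 226;
  const bytesPerPage = 34000000;               // ~34,000,000 bytes of RGBA per page
  const totalMiB = (pages * bytesPerPage) / (1024 * 1024);
  console.log(totalMiB.toFixed(0));            // "7328", vs. the ~7,800 MiB observed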

Now, there is a mechanism to discard the imageData: PDFPageProxy.complete(), which calls _tryDestroy(). However, when you scroll quickly, the inter-thread message ordering somehow works out so that none of the pages get to complete() until all the subsequent pages have been displayed. The problem is with continueCallback. If defined, it gets called when a page's image is received by the main thread. At this point the page is almost done, and we'd really like to finish processing the last few ops in order to reach complete(), whereupon the page's pixel data can be freed.
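
To make the control flow concrete, here is a simplified sketch of that loop: rendering proceeds in ~15 ms slices, and continueCallback, if present, decides when the next slice runs. Only continueCallback, complete() and the 15 ms slice come from the description here; everything else is illustrative, not pdf.js's actual code.

  // Simplified sketch of the page render loop (illustrative names throughout,
  // except continueCallback and complete).
  function renderLoop(executeOp, opCount, continueCallback, complete) {
    let opIdx = 0;
    function next() {
      const deadline = Date.now() + 15;   // process ops in ~15 ms slices
      while (opIdx < opCount && Date.now() < deadline) {
        executeOp(opIdx++);
      }
      if (opIdx === opCount) {
        complete();               // reaches PDFPageProxy.complete() -> _tryDestroy();
                                  // only here can the decoded pixel data be freed
      } else if (continueCallback) {
        continueCallback(next);   // hand control to the viewer; the next slice
                                  // runs only when the viewer calls next()
      } else {
        setTimeout(next, 0);      // no callback: just yield and carry on
      }
    }
    next();
  }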

However, viewer.js tries to be clever. It makes continueCallback (which is called after every 15 ms of operation processing) return early, without resuming, if the page being rendered isn't the "most important" page, which I guess is the page that is currently in the viewer. So we end up postponing the completion of every page prior to the final page until the final page is rendered, but that doesn't happen until all the previous pages have sent their image data to the main thread, because the image data is sent in sequential page order.

If we just ignore continueCallback, the peak RSS is about 880 MB instead of 7,800 MB. That's not a reasonable solution, but it shows the potential for improvement.
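
That viewer-side policy is roughly the following shape. This is a sketch with hypothetical names (only continueCallback itself is real), written to show why parked pages hold on to their memory:

  // Sketch of the viewer.js policy described above (hypothetical names).
  function makeContinueCallback(page, isMostImportantPage, parked) {
    return function continueCallback(cont) {
      if (isMostImportantPage(page)) {
        cont();              // the visible page keeps rendering
      } else {
        parked.push(cont);   // parked: this page never reaches complete(),
      }                      // so its decoded pixel data stays alive
    };
  }

  // The ~880 MB experiment amounts to dropping the policy entirely:
  function alwaysContinue(cont) {
    cont();                  // every page runs to complete() and frees its imageData
  }
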
Yury and I discussed this on IRC:

21:58	njn	yury: any thoughts on https://bugzilla.mozilla.org/show_bug.cgi?id=959925#c25 ?
22:01	yury	njn: continueCallback is needed for smooth scrolling
22:02	njn	yury: my patch wasn't intended to be the final solution, but to demonstrate the problem
22:03	yury	njn: yeah, we know that
22:03	yury	otherwise, on slow machine it will take long time to render the single page
22:03	njn	yury: do you have any thoughts about how to fix the high memory consumption without losing the responsiveness that continueCallback provides?
22:03	yury	and when we render single page at one time there is no problem
22:04	yury	njn: bdahl wanted to look into better prioritize render requests
22:05	njn	yury: sounds wise; currently requests get pushed to the back of the queue if they're not high priority; perhaps just pushing them back slightly would be better, if that's possible
22:06	njn	yury: I only partially understand the request/prioritization code
22:06	yury	so the bottom line: we have to give control to the main thread and continueCallback is doing that
22:10	yury	we need to limit the amount of simultaneous requests sent to the worker and have a way to abort existing ones
22:10	njn	yeah, sounds reasonable
22:10	njn	if you scroll way past a page, no point continuing to work on it
22:11	yury	not sure, when people use two-finger scroll they kinda scroll back and forward to find the page
22:12	njn	yury: hence the "way past"
22:12	yury	sure
22:13	yury	that means we have to break getOperatorList execution
22:13	yury	which is recursive by nature
22:14	njn	yury: I think this change is beyond my abilities, but hopefully one of you guys can move forward with it
22:14	yury	it's not simple task to do, yes
22:18	njn	yury: should I file an issue in GitHub for this?
22:18	yury	njn: there might be a way to trigger earlier clean up
22:18	yury	for images
22:18	njn	that might be good enough
22:23	yury	njn: https://github.com/mozilla/pdf.js/issues/1887
22:25	njn	yury: two years old :(
22:26	yury	njn: 1.5 :)
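
For concreteness, the "limit simultaneous requests, abort stale ones" idea Yury sketches might look something like the following. Everything here is hypothetical (the names, both thresholds, the cancel() method); as noted in the log, actually aborting a render would require breaking up the recursive getOperatorList.

  // Sketch of a bounded render queue with abort (hypothetical API).
  const MAX_IN_FLIGHT = 2;
  const inFlight = new Map();                  // pageNumber -> render task

  function requestRender(pageNumber, startRender) {
    // Abort renders for pages the user has scrolled way past.
    for (const [num, task] of inFlight) {
      if (Math.abs(num - pageNumber) > 5) {    // "way past": threshold made up
        task.cancel();                         // needs an interruptible worker side
        inFlight.delete(num);
      }
    }
    if (inFlight.size >= MAX_IN_FLIGHT) {
      return false;                            // caller re-queues and retries later
    }
    inFlight.set(pageNumber, startRender(pageNumber));
    return true;
  }
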
Bug 962933 is an alternative approach that will help a lot with this, at least for some documents.
Whiteboard: [pdfjs-c-performance]
Since https://github.com/mozilla/pdf.js/issues/1887 exists, there's not much point having this bug open.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → INCOMPLETE