Closed Bug 962351 Opened 6 years ago Closed 5 years ago
.js: if you scroll quickly through a scanned document, all the decoded pixel data ends up in memory
(https://github.com/mozilla/pdf.js/issues/1887 is open for this problem. This bug is for tracking the problem on the Mozilla side.) Consider http://pbadupws.nrc.gov/docs/ML0909/ML090920467.pdf, a scanned PDF with 226 pages, and each page is approximately 34,000,000 bytes when represented in 32-bit RGBA form. Follow these steps: - Start measuring resident memory consumption (RSS), e.g. via top. - Open the PDF. - Scroll to the document's end as fast as possible by holding down the PgDn key. - Wait for all the pages to be decoded and rendered. On my fast desktop Linux machine, the waiting takes approximately one minute. More relevantly for this bug, RSS gradually climbs in an ascending sawtooth pattern to about 7,800 MiB, whether or not the patches I've written for this bug are applied. And guess what? 226 * 34,000,000 is over 7,328 MiB. Now, there is a mechanism to discard the imageData: PDFPageProxy.complete(), which calls _tryDestroy(). However, somehow when you scroll quickly the intra-thread messaging ordering occurs in such a way that none of the pages get to complete() until all the subsequent pages have been displayed. The problem is with continueCallback. If defined, it gets called when a page's image is received by the main thread. At this point, the page is almost done, and we'd really like to finish processing the last few ops in order to reach complete(), whereupon the page's pixel data can be freed. However, viewer.js tries to be clever. It makes continueCallback (which is called after every 15ms of operation processing) return early if the page being rendered isn't the "most important" page, which I guess is the page that is currently in the viewer. So we end up postponing the completion of every page prior to the final page until the final page is rendered, but that doesn't happen until all the previous pages have sent their image data to the main thread, because the image data sending is in sequential page order. If we just ignore callback, the peak RSS is about 880 MB instead of 7,800. That's not a reasonable solution, but it shows the potential for improvement.
Yury and I discussed this on IRC: 21:58 njn yury: any thoughts on https://bugzilla.mozilla.org/show_bug.cgi?id=959925#c25 ? 22:01 yury njn: continueCallback is needed for smooth scrolling 22:02 njn yury: my patch wasn't intended to be the final solution, but to demonstrate the problem 22:03 yury njn: yeah, we know that 22:03 yury otherwise, on slow machine it will take long time to render the single page 22:03 njn yury: do you have any thoughts about how to fix the high memory consumption without losing the responsiveness that continueCallback provides? 22:03 yury and when we render single page at one time there is no problem 22:04 yury njn: bdahl wanted to look into better prioritize render requests 22:05 njn yury: sounds wise; currently requests get pushed to the back of the queue if they're not high priority; perhaps just pushing them back slightly would be better, if that's possible 22:06 njn yury: I only partially understand the request/prioritization code 22:06 yury so the bottom line: we have to give control to the main thread and continueCallback is doing that 22:10 yury we need to limit amount of simultaneous requests send to the worker and have a way to abort existing 22:10 njn yeah, sounds reasonable 22:10 njn if you scroll way past a page, no point continuing to work on it 22:11 yury not sure, when people use two-finger scroll they kinda scroll back and forward to find the page 22:12 njn yury: hence the "way past" 22:12 yury sure 22:13 yury that means we have to break getOperatorList execution 22:13 yury which is recursive by nature 22:14 njn yury: I think this change is beyond my abilities, but hopefully one of you guys can move forward with it 22:14 yury it's not simple task to do, yes 22:18 njn yury: should I file an issue in GitHub for this? 22:18 yury njn: there might be a way to trigger earlier clean up 22:18 yury for images 22:18 njn that might be good enough 22:23 yury njn: https://github.com/mozilla/pdf.js/issues/1887 22:25 njn yury: two years old :( 22:26 yury njn: 1.5 :)
Bug 962933 is an alternative approach that will help a lot with this, at least for some documents.
Since https://github.com/mozilla/pdf.js/issues/1887 exists, there's not much point having this bug open.
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → INCOMPLETE
You need to log in before you can comment on or make changes to this bug.