Last Comment Bug 711388 - WORKER_RUNTIME_HEAPSIZE too small for big PDFs in PDF.JS
: WORKER_RUNTIME_HEAPSIZE too small for big PDFs in PDF.JS
Status: VERIFIED FIXED
[qa!]
:
Product: Core
Classification: Components
Component: JavaScript Engine (show other bugs)
: 9 Branch
: Other Other
: -- normal (vote)
: mozilla12
Assigned To: Ben Turner (not reading bugmail, use the needinfo flag!)
:
Mentors:
: 708308 714923 722658 (view as bug list)
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2011-12-16 04:54 PST by Julian Viereck
Modified: 2012-03-05 07:02 PST (History)
14 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
affected
-
affected
verified
verified
unaffected
11+
verified


Attachments
Patch, v1 (14.89 KB, patch)
2011-12-29 16:09 PST, Ben Turner (not reading bugmail, use the needinfo flag!)
jonas: review+
christian: approval‑mozilla‑aurora+
christian: approval‑mozilla‑beta-
Details | Diff | Splinter Review

Description Julian Viereck 2011-12-16 04:54:35 PST
Starting from FF9, it's not possible to use WebWorkers to process a 32mb big PDF in PDF.JS. This was working before. I have tested this failing on Mac only so far and it was confirmed by another Mac user (for me that's OsX 10.7.2).

I was able to fix this in the current Nightly by increasing the `WORKER_RUNTIME_HEAPSIZE` number in https://mxr.mozilla.org/mozilla-central/source/dom/workers/RuntimeService.cpp#82 to

    // The size of the worker runtime heaps in bytes.
    #define WORKER_RUNTIME_HEAPSIZE 128 * 1024 * 1024 

Also, hitting this WORKER_RUNTIME_HEAPSIZE limit doesn't result in an error during execution. try {...}catch() doesn't catch anything, nor does worker.onerror is called.


=== Detailed explanation of failing in PDF.JS ===

What happens in PDF.JS:
1. the main thread loads the PDF file
2. the main thread spawns a new WebWorker and postMessage the PDF content to it
3. the worker creates a new PDFDoc instance based on the posted data
4. during the construction of the PDFDoc object, some data of the PDF is read
5. the JS engine stops execution at a reproduceable point. There is no error thrown. This means, surrounding the code that breaks with a try{ }catch() doesn't catch anything, nor does a worker.onerror callback on the main thread gets called.
6. if the main thread does another postMessage afterwards to the same worker, the worker acts "as normal", but as it "died" during the former step, rendering the PDF is broken

I was able to track down the point of "dying" to this line:

  https://github.com/jviereck/pdf.js/blob/master/src/parser.js#L506

The following switch statement isn't evaluated anymore, which can be seen from this console output (as copied from the WebConsole):

[22:52:17.407] enter lexerGetObj @ http://localhost:8888/src/worker.js:12
[22:52:17.430] post comments: true @ http://localhost:8888/src/worker.js:12
[22:52:17.454] page_request1 @ http://localhost:8888/src/worker.js:12

The "post comments: true" is from the parser.js#L506 source line. The next entry "page_request1" is from the next postMessage the main thread sends to the worker. That means after line 506 in parser.js, no other console log statements are executed anymore, although there should be as there are console.logs for any case-statement after line 506 in parser.js.

=== Way to reproduce this ===

1. Checkout the current PDF.JS code from git@github.com:mozilla/pdf.js.git
2. In the new directory, run "make server". This will download a bunch of PDFs on the first run
3. Once the files are downloaded, point Firefox to http://localhost:8888/web/viewer.html?file=/test/pdfs/pdf.pdf

= Expected result:

After some time (30sec+), you see the first page of the pdf.pdf document rendered.

= Actual result:

You see an error indicator at the top of the viewer "An error occurred while rendering the page.". Clicking on "More Information" pops up 

  PDF.JS Build: undefined 
  Message: pdfDoc is null 
  Stack: wphSetupPageRequest(1)@http://localhost:8888/src/worker.js:115 
  messageHandlerComObjOnMessage([object MessageEvent])@http://localhost:8888/src/worker.js:44

The `pdfDoc` object is null, as the execution "dies" during the construction of the "PDFDoc" instance, that is normally assigned to the `pdfDoc` variable.
Comment 1 Boris Zbarsky [:bz] (TPAC) 2011-12-16 08:46:47 PST
What's the JS engine issue here?  Lack of a catchable exception?  The heap size is a DOM issue, yes?
Comment 2 Julian Viereck 2011-12-16 08:57:42 PST
Boris, feel free to move this bug into a different component. As this has something to do with JS, I thought it may be suitable to fill it for the "JavaScript Engine" component.
Comment 3 Boris Zbarsky [:bz] (TPAC) 2011-12-16 09:44:32 PST
I'm just trying to figure out whether the bug is about the heap size not being big enough or about the way that cap is enforced.  Is it just the former?
Comment 4 Julian Viereck 2011-12-16 10:04:29 PST
This bug is about the heap size not being big enough, which makes PDF.JS fail on big files. Guess I should fill another bug that there should be an error message or another kind of "notification" if the head size limit is reached.

Sorry for that confusion :/
Comment 5 Boris Zbarsky [:bz] (TPAC) 2011-12-16 10:11:18 PST
OK, thanks for the clarification.

I guess we used to just use the browser runtime's heap for this, which is basically unlimited in size...

Ben, I don't see anything in bug 649537 discussing the max heap size to use.
Comment 6 Kyle Huey [:khuey] (Exited; not receiving bugmail, email if necessary) 2011-12-16 10:13:20 PST
Workers should almost certainly take the OOM exception and fire something at worker.onerror.
Comment 7 Ben Turner (not reading bugmail, use the needinfo flag!) 2011-12-16 13:42:54 PST
Yeah, there are two bugs here.

1. The memory limit is set based on the limit we give the main thread's runtime. However, there's another hook that I didn't know about which basically removes the cap on the main thread's heap size. I think we should do that for workers too.

2. We need to make sure that OOM gets reported properly. I think what's happening is that OOM is being confused with workers' force-kill mechanism.
Comment 8 Ben Turner (not reading bugmail, use the needinfo flag!) 2011-12-29 16:09:01 PST
Created attachment 584878 [details] [diff] [review]
Patch, v1

Removes the cap on memory usage and ties it to the same pref as the main runtime. Also fixes the problem where we weren't reporting the OOM via a bubbling error event.
Comment 9 Ben Turner (not reading bugmail, use the needinfo flag!) 2012-01-03 13:02:55 PST
*** Bug 714923 has been marked as a duplicate of this bug. ***
Comment 10 Ben Turner (not reading bugmail, use the needinfo flag!) 2012-01-04 11:14:23 PST
https://hg.mozilla.org/integration/mozilla-inbound/rev/0e8af7eb5f55
Comment 11 Ed Morley [:emorley] 2012-01-04 17:28:53 PST
https://hg.mozilla.org/mozilla-central/rev/0e8af7eb5f55
Comment 12 Ben Turner (not reading bugmail, use the needinfo flag!) 2012-01-04 17:36:10 PST
Comment on attachment 584878 [details] [diff] [review]
Patch, v1

[Approval Request Comment]

Regression caused by (bug #):

bug 649537

User impact if declined:

Workers can't use nearly as much memory as the main thread. This is a problem for PDF.js in particular but any worker that needs more than 32mb heap will silently fail.

Testing completed (on m-c, etc.): 

Intentionally running tinderboxes out of memory is not wise, however local testing confirms that this patch is correct.

Risk to taking this patch (and alternatives if risky):

Almost no risk with workers. It is possible that malicious or dumb workers may cause the main thread to run out of memory more often, though we're willing to accept that behavior.
Comment 13 christian 2012-01-05 15:08:02 PST
Comment on attachment 584878 [details] [diff] [review]
Patch, v1

[triage comment]
Denied for beta but approved for aurora. The benefit seems very edge-case (workers aren't used over a large portion of the web, let alone workers hitting this high memory limit) but the risk also seems contained to the affected code. PDF.js isn't shipping in Fx yet and there is a way to work around (disable web workers), though of course the experience degrades.

If you disagree, please renominate with additional justification.
Comment 14 Jonas Sicking (:sicking) No longer reading bugmail consistently 2012-01-05 20:50:17 PST
Christian: Since FF10 is going to turn into an ESR release, I suspect this code will end up susriving into an era where workers will be used more. Would it make sense to take this patch for an ESR dot-release once it has baked in FF11 or some such?

If so I think it's totally fine to not take it for FF10 for now.
Comment 15 Ben Turner (not reading bugmail, use the needinfo flag!) 2012-01-06 09:11:26 PST
https://hg.mozilla.org/releases/mozilla-aurora/rev/551447a921f3
Comment 16 Alex Keybl [:akeybl] 2012-01-09 11:46:51 PST
(In reply to Jonas Sicking (:sicking) from comment #14)
> Christian: Since FF10 is going to turn into an ESR release, I suspect this
> code will end up susriving into an era where workers will be used more.
> Would it make sense to take this patch for an ESR dot-release once it has
> baked in FF11 or some such?
> 
> If so I think it's totally fine to not take it for FF10 for now.

We'll be tracking top issues for FF10 throughout it's >1year lifespan. We can reconsider taking this in the future if necessary.
Comment 17 Jonas Sicking (:sicking) No longer reading bugmail consistently 2012-01-09 13:21:09 PST
So how do we track this for ESR if not through the firefox10 flag?
Comment 18 Ben Turner (not reading bugmail, use the needinfo flag!) 2012-01-11 18:46:13 PST
*** Bug 708308 has been marked as a duplicate of this bug. ***
Comment 19 Alex Keybl [:akeybl] 2012-01-12 12:40:47 PST
(In reply to Jonas Sicking (:sicking) from comment #17)
> So how do we track this for ESR if not through the firefox10 flag?

This will only be a top issue for ESR if enterprise IT/users complain about it. Let's wait for their feedback before tracking.
Comment 20 Ben Turner (not reading bugmail, use the needinfo flag!) 2012-01-31 09:11:17 PST
*** Bug 722658 has been marked as a duplicate of this bug. ***
Comment 21 lazierthanthou 2012-01-31 20:53:16 PST
(In reply to Jonas Sicking (:sicking) from comment #14)
> Christian: Since FF10 is going to turn into an ESR release, I suspect this
> code will end up susriving into an era where workers will be used more.
> Would it make sense to take this patch for an ESR dot-release once it has
> baked in FF11 or some such?
> 
> If so I think it's totally fine to not take it for FF10 for now.

I do not know how much it counts but I agree with the view above.
Bug 722658 (which is a duplicate of this) affects the SQLite Manager addon which our web developers use extensively in our organization.
Comment 22 Alex Keybl [:akeybl] 2012-02-16 15:35:55 PST
Please land on mozilla-esr10 as soon as possible. For more information on the landing process, please see https://wiki.mozilla.org/Release_Management/ESR_Landing_Process
Comment 23 Ben Turner (not reading bugmail, use the needinfo flag!) 2012-02-16 16:24:35 PST
https://hg.mozilla.org/releases/mozilla-esr10/rev/14f36ee6e58c
Comment 24 Virgil Dicu [:virgil] [QA] 2012-02-29 04:44:42 PST
Mozilla/5.0 (X11; Linux x86_64; rv:11.0) Gecko/20100101 Firefox/11.0
Mozilla/5.0 (X11; Linux x86_64; rv:12.0a2) Gecko/20120227 Firefox/12.0a2

Verified on 11Beta4 and Firefox 12 Aurora on Ubuntu 11.10 (used steps in comment 0).
No error message displayed, pdf loaded entirely. Could previously reproduce on F9.0.1.
Comment 25 Virgil Dicu [:virgil] [QA] 2012-03-05 07:02:43 PST
Mozilla/5.0 (X11; Linux x86_64; rv:10.0.3) Gecko/20100101 Firefox/10.0.3

Verified on Firefox 10.0.3 ESR on Ubuntu 11.10 with the same steps from comment 0. No error message received, page loaded completely.

Marking as verified across branches (also see comment 24).

Note You need to log in before you can comment on or make changes to this bug.