Closed Bug 876906 Opened 11 years ago Closed 11 years ago

[B2G][PDF viewer] PDF viewer can't load all W2 form pages

Categories

(Firefox OS Graveyard :: Gaia::PDF Viewer, defect, P1)

ARM
Gonk (Firefox OS)
defect

Tracking

(blocking-b2g:-, b2g18 affected)

RESOLVED FIXED
1.1 QE4 (15jul)
blocking-b2g -
Tracking Status
b2g18 --- affected

People

(Reporter: sarsenyev, Assigned: bdahl)

References

Details

(Whiteboard: leorun3 [TD-47901][MemShrink:P2][js4b2g] leorun4)

Attachments

(2 files, 1 obsolete file)

Description:
PDF viewer loads only three W2 form pages from http://www.irs.gov/, then shuts down, and the user is returned to the browser

Repro Steps:
1) Updated to Inari Build ID: 20130528070204
2) Open Browser from the home screen
3) Open this link http://www.irs.gov/pub/irs-pdf/fw2.pdf 

Actual:
The first three pages load, and after a few seconds the viewer shuts down

Expected:
All pages are loaded and displayed

Environmental Variables:
Unagi Build ID: 20130528070209
Gecko: http://hg.mozilla.org/releases/mozilla-b2g18/rev/611d727390a4
Gaia: 5f499ab1e4a274d3d28e20139c9cdc546dc4fc69

Environmental Variables:
Gecko: http://hg.mozilla.org/releases/mozilla-b2g18_v1_0_1/rev/9aaf82b015cf
Gaia: 50abeb2c622bfe48803b732776daecb2fa34e575

Notes:
Repro frequency: 100%
Whiteboard: leorun3
Talking with Leo, memory usage in our PDF viewer is currently questionable per https://bugzilla.mozilla.org/show_bug.cgi?id=885287#c1. From what I've heard from test runs so far, it's relatively easy to OOM the PDF viewer.
Whiteboard: leorun3 → leorun3 [MemShrink]
Attached file logcat
Whiteboard: leorun3 [MemShrink] → leorun3 [MemShrink] [TD-47901]
blocking-b2g: --- → leo+
Priority: -- → P1
Target Milestone: --- → 1.1 QE3 (24jun)
bdahl, are you still working on pdf?

can you do a memory use profile of this?
Flags: needinfo?(bdahl)
Justin Lebar recently created a well-documented Memory Debugging wiki that should help:

https://wiki.mozilla.org/B2G/Debugging_OOMs
Thanks for the link Mike. I'll take this for now and run a memory profile.
Assignee: nobody → mhenretty
Flags: needinfo?(bdahl)
Bytes Used      %      Count       Symbol Name
  44.58 MB      93.9%	12912	 	malloc
  38.50 MB      81.1%	4680	 	 js_malloc
  38.50 MB      81.1%	4680	 	  js::detail::BumpChunk::new_(unsigned long)
  38.50 MB      81.1%	4680	 	   js::LifoAlloc::getOrCreateChunk(unsigned long)
  38.50 MB      81.1%	4680	 	    js::LifoAlloc::alloc(unsigned long)
  16.25 MB      34.2%	4160	 	     js::frontend::ParseNodeAllocator::allocNode()
  13.47 MB      28.3%	431	 	     js::analyze::Bytecode* js::LifoAlloc::new_<js::analyze::Bytecode>()
   8.38 MB      17.6%	13	 	     js::analyze::Bytecode** js::LifoAlloc::newArray<js::analyze::Bytecode*>(unsigned long)
 168.00 KB       0.3%	42	 	     js::frontend::ObjectBox* js::LifoAlloc::new_<js::frontend::ObjectBox, JSObject*, js::frontend::ObjectBox*>(JSObject*, js::frontend::ObjectBox*)

Launching the PDF viewer application causes a memory allocation spike of around 45MB. Diving into this a little further, it seems the majority of the allocation happens in the JS parsing (34%), and JS analyzing (28.3%, and 17.6%). Gregor says the analyzing part is JS type inference. Brendan, is there any way we can do some lazy loading of the JS in PDF.js to avoid this spike?
Flags: needinfo?(bdahl)
See Also: → 874678
Same symptoms as bug 874678. Loading pdf.js causes OOM because the JS parser allocates too much memory.
Naveed, can we get some help here?
Flags: needinfo?(nihsanullah)
Note:

Testing on the Peak (Twist): I was able to load pdf
Sorry, accidental posting earlier. To continue, testing on the Peak, I was able to load PDF files just fine, but I did notice proportional memory grow to around 60MB for the PDF viewer process on initial load:

                          |     megabytes     |
           NAME  PID NICE  USS  PSS   RSS VSIZE OOM_ADJ USER
     PDF Viewer 3300    1 56.2 58.8  71.0 287.0       2 app_3300


But as the bug mentions, scrolling past page 3 of http://www.irs.gov/pub/irs-pdf/fw2.pdf produces another memory spike, this time around 155MB, which causes it to get killed.

                          |      megabytes      |
           NAME  PID NICE   USS   PSS   RSS VSIZE OOM_ADJ USER
     PDF Viewer 3300    1 154.0 156.6 168.2 372.3       2 app_3300
NBP is going to help investigate the JS memory utilization and bring in anyone he needs.
Flags: needinfo?(nihsanullah)
(In reply to Michael Henretty [:mikehenrty] from comment #10)
> Sorry, accidental posting earlier. To continue, testing on the Peak, I was
> able to load PDF files just fine, but I did notice proportional memory grow to
> around 60MB for the PDF viewer process on initial load:

Briefly looking at a profile taken on the browser while loading one of the pages, I noticed that a large portion of the memory was in objects-extra/elements (memory allocated for object element arrays, which are used to represent indexed object properties).

(In reply to Gregor Wagner [:gwagner] from comment #8)
> Same symptoms as bug 874678. Loading pdf.js causes OOM because the JS parser
> allocates too much memory.

PdfJS does not include additional scripts and it does not use any eval.  Thus, the JS parser has no reason to OOM unless the phone is already running other applications.  If you are able to load the first page of the PDF, it would surprise me if this were an OOM in the parser.

If the kernel is killing the newly opened application, this might be caused by a kernel policy: it kills the application which is doing the most allocations instead of killing a background application.

@mikehenrty: Does the bug reproduce if, before going to the next page, you switch to the home screen and come back to the application by using the task switcher?  Switching applications should cause memory pressure, which will reclaim the memory.  If you cannot reproduce it that way, then we could solve a part of this problem with the patch I am making for Bug 869263.

@mikehenrty: how did you do the profile, with the DMD?

I'll see if I can reproduce it on an unagi.
(In reply to Nicolas B. Pierron [:nbp] from comment #12)
> @mikehenrty: Does the bug reproduce if […].  If you cannot reproduce it that way, then we could
> solve a part of this problem with the patch I am making for Bug 869263.

Sorry, "If you cannot" -> "If you can"

Also, note that this depends on the phone's kernel.  If the kernel does not provide a way to trigger memory pressure visible from the application, then the patch in Bug 869263 would be of no help.  For example, the Unagi does not provide this feature.
(just got back from vacation)
It would be good to sync the current version of pdf.js with the version that is in Gaia.  We've had quite a few changes since then.  If that doesn't help, we can try to be more aggressive about clearing out caches, or not caching things at all.

I'll work on putting together a pull request.
Flags: needinfo?(bdahl)
(In reply to Brendan Dahl from comment #14)
> I'll work on putting together a pull request.

Sweet!
Assignee: mhenretty → bdahl
Blocks: 881974
Status: NEW → ASSIGNED
Whiteboard: leorun3 [MemShrink] [TD-47901] → leorun3 [TD-47901]
(In reply to Nicolas B. Pierron [:nbp] from comment #12)
> If you are able to load the first page of the PDF, it would
> surprise me if this were an OOM in the parser.

Makes sense. Parsing pdf.js (1.4MB) does cause a memory spike of around 60MB, but you are right, the app does not immediately crash.


> @mikehenrty: Does the bug reproduce if before going to the next page, you
> switch to the home screen and come back to the application by using the task
> switcher?

Unfortunately, when I go to the homescreen, it immediately closes the PDF viewer process.


> @mikehenrty: how have you done the profile, with the DMD?

I did the memory profile three different ways. I used b2g-desktop (IPC disabled) with Mac Instruments to get the call tree with the allocation details that I posted in my first comment. Then on the Unagi I ran b2g-procrank in a while loop to observe the allocations. And then on the Peak, I ran b2g-info in a while loop. The stats I posted in comment 10 were from the Peak.
Blocks: 885287
Target Milestone: 1.1 QE3 (26jun) → 1.1 QE4 (15jul)
(In reply to Michael Henretty [:mikehenrty] from comment #7)
> […] JS analyzing (28.3%, and
> 17.6%). Gregor says the analyzing part is JS type inference. […]

The analyzing work is the bytecode analysis, which was used to annotate the bytecode with information needed for the JIT.  Sadly, we cannot backport its removal as it is needed by JM, but this issue would not appear as much on mozilla-central builds. (see Bug 865059 & Bug 875276)
(In reply to Michael Henretty [:mikehenrty] from comment #7)
> JS parsing (34%)
> and JS analyzing (28.3%, and 17.6%).

Just for the record, these two sets of allocations are done in two different LifoAllocs: the first one in "cx->tempLifoAlloc()" and the second one in "cx->analysisLifoAlloc()".  When the analysis runs, the parser is supposed to have finished its job, and the script is starting to run.  So this memory spike is not critical, as the next GC will get rid of the tempLifoAlloc (see PurgeRuntime in jsgc.cpp) previously used by the parser.
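For intuition, a LifoAlloc is a bump allocator; here is a generic sketch (not SpiderMonkey's actual code) showing why parse-node allocation is cheap and why the purge releases everything in one step:

  #include <cstddef>

  // Generic bump-arena sketch: allocations advance a cursor inside a
  // malloc'ed chunk, so thousands of parse nodes cost almost nothing,
  // and the purge releases the whole arena at once.
  struct BumpArena {
      char *base, *cursor, *end;
      void *alloc(size_t n) {
          if (cursor + n > end)
              return nullptr;  // the real code chains a new BumpChunk here
          void *p = cursor;
          cursor += n;
          return p;
      }
      void freeAll() { cursor = base; }  // O(1), like the GC purge
  };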
(In reply to Nicolas B. Pierron [:nbp] from comment #18)
> (In reply to Michael Henretty [:mikehenrty] from comment #7)
> > JS parsing (34%)
> > and JS analyzing (28.3%, and 17.6%).
> 
> Just for the record, these two sets of allocations are done in two different
> LifoAllocs: the first one in "cx->tempLifoAlloc()" and the second one
> in "cx->analysisLifoAlloc()".  When the analysis runs, the parser is
> supposed to have finished its job, and the script is starting to run.  So
> this memory spike is not critical, as the next GC will get rid of the
> tempLifoAlloc (see PurgeRuntime in jsgc.cpp) previously used by the parser.

Do we count these allocations in the mallocCounter? If not, we don't have a good way to trigger the GC. The low-memory killer seems to be too slow, and we kill the app before we can even trigger a GC.
(In reply to Gregor Wagner [:gwagner] from comment #19)
> (In reply to Nicolas B. Pierron [:nbp] from comment #18)
> > (In reply to Michael Henretty [:mikehenrty] from comment #7)
> > > JS parsing (34%)
> > > and JS analyzing (28.3%, and 17.6%).
> > 
> > Just for the record, these two sets of allocations are done in two different
> > LifoAllocs, […]
> 
> Do we count these allocations in the mallocCounter? If not, we don't have a
> good way to trigger the GC. The low-memory killer seems to be too slow, and we
> kill the app before we can even trigger a GC.

We do count them as part of the mallocCounter, as we allocate the BumpChunk with js_malloc in ds/LifoAlloc.cpp.  But I guess we do not trigger a GC because we have not yet allocated any GC cells.

The parser is supposed to free its memory once it is finished:

  Parser<ParseHandler>::~Parser()
  {
    …
    cx->tempLifoAlloc().freeAllIfHugeAndUnused();
  }

and looking at "freeAllIfHugeAndUnused()", we can see that "huge" corresponds to 50 MB, which is enormous for B2G.  I think we should change this value to prevent the parser from keeping its 16 MB when we are compiling for B2G.  The best solution would be a pref, but I don't think that would help for the backport to b2g18.  So I will make it a compile-time ifdef for this bug (so it can be backported) and make a follow-up patch to add a preference.
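In rough terms the change looks like this (a sketch only; the guard and member names approximate the code around freeAllIfHugeAndUnused() rather than quote it):

  // Sketch of the proposed compile-time threshold switch.
  #ifdef MOZ_B2G  /* hypothetical define selecting B2G builds */
  // Phones have as little as 256 MB of RAM: drop the parser's chunks
  // once they retain more than 4 MB.
  static const size_t HUGE_ALLOCATION = 4 * 1024 * 1024;
  #else
  // Desktop keeps the 50 MB threshold so the chunks can be reused by
  // subsequent parses.
  static const size_t HUGE_ALLOCATION = 50 * 1024 * 1024;
  #endif

  void
  LifoAlloc::freeAllIfHugeAndUnused()
  {
      // Free only when nothing holds a mark into the arena and the
      // retained memory has grown past the threshold.
      if (markCount == 0 && curSize_ > HUGE_ALLOCATION)
          freeAll();
  }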
Whiteboard: leorun3 [TD-47901] → leorun3 [TD-47901][MemShrink][js4b2g]
The current code explicitly frees the memory if the parser completes with more than 50 MB of allocated memory.  This patch adds a compile-time check to reduce this value to 4 MB.  PdfJS is the application with the largest file (size & lines) by far(*), and even if the files of other applications are concatenated into one, it does not matter if we free the memory just after parsing is finished.

This patch only adds an #ifdef, so it can be backported without much risk.  I will open a follow-up bug to add a preference for it.

I will check the start-up time of a few applications with/without this patch, using the builds produced by tbpl:
  before: https://tbpl.mozilla.org/?rev=e1751478d411
  after: https://tbpl.mozilla.org/?tree=Try&rev=2fd9153d466a

(*) The second one being the email application, whose js/ext/mailapi/worker-bootstrap.js weighs a third of the size.
Attachment #769166 - Flags: review?(luke)
Comment on attachment 769166 [details] [diff] [review]
Reduce purge lower bound for the JS parser on B2G.

This patch depends on Bug 852305, which introduces the freeAllIfHugeAndUnused() function that uses the 50 MB threshold; that patch would have to be backported too.
Two questions:
 - the #if condition looks reversed
 - so I understand: the cx->tempLifoAlloc is released (if nothing is running) on every GC.  This freeAllIfHugeAndUnused() just moves the free'ing earlier, so are you seeing an issue here where the GC does not follow soon enough?
(In reply to Luke Wagner [:luke] from comment #23)
> Two questions:
>  - the #if condition looks reversed

Indeed, it is reversed … :/

>  - so I understand: the cx->tempLifoAlloc is released (if nothing is
> running) on every GC.  This freeAllIfHugeAndUnused() just moves the free'ing
> earlier, so are you seeing an issue here where the GC does not follow soon
> enough?

Yes, this is exactly that.  The tempLifoAlloc is supposed to be released during the next GC, but (in Gecko 18) we start analyzing the bytecode before the execution of any script.  So reducing this value will release the memory earlier and give us the ability to load larger scripts, as their parse nodes would already have been freed at the time of the analysis.
Oops, overeager copy and paste. Thanks Luke for noticing the inverted condition.

  after²: https://tbpl.mozilla.org/?tree=Try&rev=ff52d64b1381
Attachment #769219 - Flags: review?(luke)
Attachment #769166 - Attachment is obsolete: true
Attachment #769166 - Flags: review?(luke)
Attachment #769219 - Flags: review?(luke) → review+
Extracting just page 4 of the PDF I'm still running into OOM.  The issue seems to be with the image that is 5784x1448 pixels.  I'm looking into ways we can down sample before drawing it.
Comment on attachment 769219 [details] [diff] [review]
Reduce purge lower bound for the JS parser on B2G.

I checked with and without this patch on an Unagi with 256 MB of RAM, going from starting the application to rendering the first page.  First of all, neither version crashes.

I monitored the memory usage on both versions with
  watch -n.1 adb shell b2g-info

I noticed that both versions stabilize at 50.3 MB once the first page is rendered.  In both cases the memory growth pauses when the application starts rendering the progress bar.

  Without the patch the memory stops around 39 MB.
  With the patch the memory stops around 30 MB.

The fact that both versions stabilize at the same level after the first page is rendered highlights that, without the patch, a GC collects the memory previously allocated by the parser.

I cannot reproduce a crash on the start-up of the PDF Viewer, which was the case that this patch was supposed to address, as reported by comment 7.

I will try to backport the patches on top of b2g18, as Gecko 18 allocates the bytecode analysis ahead of interpretation, which might cause more memory pressure (higher spikes) that might not directly benefit from a GC.

I also compared the start-up time of other applications; both versions have about the same start-up times.  I was hoping to test the email application, knowing that it has the second largest JS file, but I was unable to do so because of Bug 888669.

Original version: 8e3a124c9c1a (mozilla-central)
Modified version: ff52d64b1381 (try)
That sounds great! I tried to apply your patch on b2g18 but the const you changed isn't present on b2g18. Can we get the same benefit on b2g18 somehow?
(In reply to Brendan Dahl from comment #26)
> Extracting just page 4 of the PDF I'm still running into OOM.  The issue
> seems to be with the image that is 5784x1448 pixels.  I'm looking into ways
> we can down sample before drawing it.

That would be great! Page 4 needs ~170MB currently :( Far away from what we can handle.

Bytes Used      %      Count       Symbol Name
 136.00 MB      75.7%	35	 	js::ArrayBufferObject::allocateSlots(JSContext*, unsigned int, unsigned char*)
  32.39 MB      18.0%	18	 	mozilla::gfx::DrawTargetCG::Init(mozilla::gfx::BackendType, unsigned char*, mozilla::gfx::IntSize const&, int, mozilla::gfx::SurfaceFormat)
(In reply to Gregor Wagner [:gwagner] from comment #28)
> That sounds great! I tried to apply your patch on b2g18 but the const you
> changed isn't present on b2g18. Can we get the same benefit on b2g18 somehow?

This constant comes from the patch landed for Bug 852305 (comment 22), and hopefully this is a small change too.
Whiteboard: leorun3 [TD-47901][MemShrink][js4b2g] → leorun3 [TD-47901][MemShrink][js4b2g] leorun4
I tried to open the W2 form on b2g18 (with a 512MB kernel), and noticed with the following command:
  watch -n.1 adb shell b2g-procrank

that the memory spikes above 52 MB, then suddenly falls back to 30 MB, and grows to 50 MB.  The spike is likely what was reported in comment 7, and in this case a GC has probably triggered the removal of the parser's LifoAlloc chunks.

With a backported version of Bug 850523, Bug 852305 and the current patch, I observed a similar spike to 54 MB with a sudden fallback to 38 MB.

I am not sure what I am observing there; I will probably have to dig in with some spew to get more information on what is going on.
(In reply to Nicolas B. Pierron [:nbp] from comment #31)
> With a backported version of Bug 850523, Bug 852305 and the current patch, I
> observed a similar spike to 54 MB with a sudden fallback to 38 MB.

The good news is that when I tried to instrument it, I noticed that I had forgotten to apply the patch I attached to this bug :)

And I do see a spike at 40 MB which goes back down to 30 MB and later increases as expected.

> I am not sure what I am observing there; I will probably have to dig in with
> some spew to get more information on what is going on.

The bad news is that we are loading a 1.5 MB file twice.

As I wanted to be sure, I instrumented the freeAllIfHuge function and noticed that we were doing two frees of ~10 MB.  So I instrumented the parser to print the file name, and we do load app://pdfjs.gaiamobile.org/content/build/pdf.js twice during the loading of the PDF.

I know PdfJS has a worker and I would not be surprised if the file is loaded twice, once for the worker, and once in the page. I will check the thread identifier to confirm this hypothesis.
Flags: needinfo?(bdahl)
We have a worker where the file is loaded again.  It's possible to disable the worker, but even when I disable it I still get a crash.  

If I disable images, the page loads fine.  We seem to allocate too much for processing the e-file image.  It would be best to have some sort of streaming/downsampled image, but this isn't trivial to implement with the current way we decode PDF streams and images.

If you're interested in disabling the worker, add #disableWorker=true to the viewer URL or modify https://github.com/mozilla-b2g/gaia/blob/master/apps/pdfjs/content/web/viewer.js#L2657
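For example (the viewer path here is an assumption based on the app URL seen in comment 32):

  app://pdfjs.gaiamobile.org/content/web/viewer.html#disableWorker=true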
Flags: needinfo?(bdahl)
(In reply to Brendan Dahl from comment #33)
> If I disable images the page loads fine.  We seem to allocate too much for
> processing the e-file image.  It would be best to have some sort of
> streaming/down sampled image, but this isn't trivial to implement with the
> current way we decode pdf streams and images.  

Could we prevent loading any image which is highly over-sized compared to the screen resolution & zoom-level, or some bad heuristic like that?
(In reply to Nicolas B. Pierron [:nbp] from comment #34)
> Could we prevent loading any image which is highly over-sized compared to
> the screen resolution & zoom-level, or some bad heuristic like that?

Yes, that's the backup plan in case we can't get the allocations down. Such a size guard could look like the sketch below.
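(C++ purely for illustration; pdf.js itself is JavaScript, and every name here is hypothetical.)

  #include <cstdint>

  // Hypothetical size guard: skip decoding an embedded image whose
  // pixel count is far beyond what the viewport and zoom can display.
  static bool
  ShouldSkipImage(uint32_t imgW, uint32_t imgH,
                  uint32_t viewW, uint32_t viewH, double zoom)
  {
      // Pixels needed to render the page area at the current zoom.
      double needed = (viewW * zoom) * (viewH * zoom);
      // Pixels the embedded image decodes to; at 4 bytes per RGBA
      // pixel, the 5784x1448 image above is ~32 MiB per copy.
      double decoded = double(imgW) * double(imgH);
      // Allow 4x headroom for zooming in before refusing to decode.
      return decoded > 4.0 * needed;
  }

On a 320x480 handset at 1x zoom this refuses anything above ~0.6 megapixels, so the 8.4-megapixel e-file image on page 4 would be skipped.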
Whiteboard: leorun3 [TD-47901][MemShrink][js4b2g] leorun4 → leorun3 [TD-47901][MemShrink:P2][js4b2g] leorun4
Any update on a resolution to this bug?
On the pdf.js side of things we've got two fixes to lower memory usage, and we're going to disable images above a certain size for now. We should have an update to Gaia today or tomorrow.
(In reply to Dietrich Ayala (:dietrich) from comment #36)
> Any update on a resolution to this bug?

On the JS side, the patch I made for the parser's memory consumption does not solve the problem: the memory is freed on GC anyway, and I cannot reproduce the problem even with only 256 MB of RAM.  In addition, the memory consumption after the first page is rendered is higher than what is ever consumed by the parser plus the few objects, so the application would still be killed later even if we applied the patch.
Thanks! Are there separate bugs for those parts? Or applying all those patches here?
(In reply to Dietrich Ayala (:dietrich) from comment #39)
> Thanks! Are there separate bugs for those parts? Or applying all those
> patches here?

I am moving the parser patch to Bug 892097, but as :njn noticed, this is not a big concern for Gecko 26, since we use lazy parsing.  So unless there is strong interest in this patch (which I am not sure I understand), I can attach the backported versions of the previous patches to Bug 892097.
Depends on: 892721
Depends on: 892097
No longer depends on: 892721
Depends on: 892721
Setting the dependency on Bug 892097 and making that a leo nom as my understanding is we would need to backport patches of that bug to resolve this leo+ issue.
(In reply to bhavana bajaj [:bajaj] from comment #41)
> Setting the dependency on Bug 892097 and making that a leo nom as my
> understanding is we would need to backport patches of that bug to resolve
> this leo+ issue.

Bug 892097 does not fix the PdfJS issue.  It frees the memory sooner, but even then, PdfJS memory usage is still high and the crash cannot be avoided that way.
This issue no longer reproduces on today's Leo build, Build ID: 20130717070237

Gecko: http://hg.mozilla.org/releases/mozilla-b2g18/rev/582e3a7018b0
Gaia: c506c50adaaebcf729ac3c27887ba2931ab79040
Platform Version: 18.1
RIL Version: 01.01.00.019.164 

The user can view all 11 Pages of the W2 form in the PDF Viewer without issue.
Unblocking based upon comment 43.
blocking-b2g: leo+ → -
(In reply to Alex Keybl [:akeybl] from comment #44)
> Unblocking based upon comment 43.

I'm going to mark fixed actually based on comment 43 as well.
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
> > Extracting just page 4 of the PDF I'm still running into OOM.  The issue
> > seems to be with the image that is 5784x1448 pixels.  I'm looking into ways
> > we can down sample before drawing it.
> 
> That would be great! Page 4 needs ~170MB currently :( Far away from what we
> can handle.
> 
> Bytes Used      %      Count       Symbol Name
>  136.00 MB      75.7%	35	 	js::ArrayBufferObject::allocateSlots(JSContext*,
> unsigned int, unsigned char*)

I've been investigating pdf.js image handling recently. An image of that size should need 5784*1448*4 bytes, which is 31.94 MiB. So I don't know where the 136.00 MB comes from.

However, I did find in bug 959925 that when handling an image we actually have *five* copies of the pixel data live at once (5 * 31.94 MiB ≈ 160 MiB, roughly the ~170 MB reported in comment 29), and I have patches in that bug that reduce the number of copies to one.