Open Bug 1609631 Opened 9 months ago Updated 6 months ago

OOM when trying to preview large PDF file

Categories

(Core :: IPC, defect, P1)

72 Branch
Desktop
All
defect

Tracking

()

Tracking Status
firefox-esr68 --- affected
firefox72 --- affected
firefox73 --- affected
firefox74 --- affected

People

(Reporter: stevenj, Assigned: haik)

References

()

Details

User Agent: Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:72.0) Gecko/20100101 Firefox/72.0

Steps to reproduce:

We select a PDF file from an INPUT FILE control. The PDF is retrieved via an AJAX call and shown in the PDF previewer. (In JavaScript we use FileReader.readAsDataURL to pass the response to the OBJECT.)
(Difficult to share an example as it affects our internal admin system.)

Actual results:

In Firefox 71, on both Windows and Linux, this works as expected with all files so far.

Expected results:

With release 72.0.1 small files work but large files (approx >6Mb) cause Firefox to crash.

I am attempting to understand and reproduce this issue, but I am having a hard time.
I have found a large PDF file online (https://cartographicperspectives.org/index.php/journal/article/view/cp43-complete-issue/pdf) that can be used to reproduce, but I do now know what steps I must take in order to reproduce a crash.

Can you please write a list of detailed steps to reproduce?

Flags: needinfo?(stevenj)

Hi Daniel,
thank you. I'm not having issues when I load a PDF in it's own tab.

I have created a page at https://www.firstonscene.co.uk/test.html that illustrates the issue.

If I select a file that is less than around 8Mb (not 6Mb as previously noted) then it shows in the previewer and can be navigated.

If I try the same with a file larger than this then it loads and initially appears normal but it to consume resources and after a few seconds Firefox reports that the tab has crashed. During this time I notice sluggish response.

Hope this helps.

Flags: needinfo?(stevenj)

I have managed to reproduce this issue in Nightly v74.0a1, Beta v73.0b11 and Release v72.0.2. In ESR v68.4.2esr, the tab does not instantly crash, but it shows the "A page is slowing down your browser" yellow top notification bar and only after some more time, the crash also occurs, so the reproduction is also partially happening in ESR, but not the same as in the other channels.

https://crash-stats.mozilla.org/report/index/b4e9533c-678f-418e-a066-9ee690200129#tab-bugzilla

This issue was confirmed on Windows 10 and Mac OS 10.13.6.

Furthermore, I have attempted to regress the issue and find the regressor, but I could not find a good version. firefox65 and firefox50 dows not open the page and a crash could not be consistently reproduced.

Status: UNCONFIRMED → NEW
Has STR: --- → yes
Component: Untriaged → Performance
Ever confirmed: true
OS: Unspecified → All
Product: Firefox → Core
Hardware: Unspecified → Desktop

The slowdown in ESR68 shouldn't be an issue; we should reassign this for the crash to be investigated. Probably doesn't really belong here either...

Component: Performance → IPC
Summary: Firefox crashes when trying to preview large PDF file → OOM when trying to preview large PDF file

bp-b4e9533c-678f-418e-a066-9ee690200129 is a failure to allocate 140MB, in a single allocation, while deserializing a SimpleURIParams. I'm assuming that's because of an extremely large data: URL.

I'm not an expert on the DOM side of things, but is it possible to get the data as a Blob and use createObjectURL instead?

I don't know the code base at all, nor how to use Mercurial, so apologies if this is unrelated.
We didn't have any problems in Firefox 71 so I had a look for changes to FileReader that were first implemented in Firefox 72. Found this:

https://hg.mozilla.org/mozilla-central/rev/617d82c93a06e8903b769d0bc7055cbaf91ed410

I'll try to find time to (1) use the call noted by Jed above (2) test with a different file type.
Cheers

The priority flag is not set for this bug.
:jld, could you have a look please?

For more information, please visit auto_nag documentation.

Flags: needinfo?(jld)
Component: IPC → DOM: Content Processes
Flags: needinfo?(jld)

I think the main problem here is that we use an infallible string SetLength operation when we try to deserialize the message: https://hg.mozilla.org/releases/mozilla-esr68/annotate/e215d711213984716b1c44db7d1254397de6ef4b/ipc/glue/IPCMessageUtils.h#l375

We should instead do a fallible allocation, and return false if it fails, so we kill the content process that sends the huge message rather than crashing the parent.

Component: DOM: Content Processes → IPC

:kmag Just to note that I first saw this issue Firefox 72, 71 was fine. The link you include seems to refer a change made way back in 49?

The crash report in comment 3 is a bit strange. There's plenty of virtual and physical memory, but only 41MB of page file. Maybe something weird is happening with the page file?

The link is to the crashing line in the crash report.

The priority flag is not set for this bug.
:jld, could you have a look please?

For more information, please visit auto_nag documentation.

Flags: needinfo?(jld)
Assignee: nobody → haftandilian
Flags: needinfo?(jld)
Priority: -- → P1

In addition to changing the SetLength calls to be fallible, we also would need to change deserialization in the generated protocol message handlers to not call FatalError() when ReadIPDLParam() fails due to OOMs for large messages (for some definition of large messages).

Crashing the tab seems like a good choice when the tab process is sending huge messages and the parent is not about to OOM anyway.

However, if the parent is low on memory and close to OOM'ing, is killing the tab process going to free up enough memory to be useful? We might end up killing lots of tabs. It might be worthwhile to experiment and see if this reduces parent OOM crashes.

To get an idea of which large messages we're sending, I trimmed down the PDF from comment 1 to the first 10 pages and it ends up being 9MB. Here are the large messages we send using the FileReader.readAsDataURL() test case from comment 2.

  PBrowser::Msg_OnStateChange | seqno       -151 | from pid  20848 | size 25161060 (24MB)
       PBrowser::Msg_VisitURI | seqno       -154 | from pid  20848 | size 12580524 (12MB)
 Browser::Msg_NewWindowGlobal | seqno       -157 | from pid  20848 | size 12580812 (12MB)
PBrowser::Msg_NewWindowGlobal | seqno       -157 | from pid  20848 | size 12580812 (12MB)

For Msg_VisitURI, we send the URI over IPC from the child to the parent to populate the history, but in the parent we throw away the URI because it's too large to pass nsNavHistory::CanAddURI(). (It's also not a supported type in this case being a data URI.) In this example we could trim off 12MB of IPC traffic and allocations by checking the URI suitability in the child before sending it over. We could also skip sending the URI for sessions where the history is disabled.

(In reply to Haik Aftandilian [:haik] from comment #13)

In addition to changing the SetLength calls to be fallible, we also would need to change deserialization in the generated protocol message handlers to not call FatalError() when ReadIPDLParam() fails due to OOMs for large messages (for some definition of large messages).

I think Nika was working on something for that.

(In reply to Haik Aftandilian [:haik] from comment #14)

  PBrowser::Msg_OnStateChange | seqno       -151 | from pid  20848 | size 25161060 (24MB)

For Msg_OnStateChange, we are sending the same data URI twice in one message.

You need to log in before you can comment on or make changes to this bug.