Open Bug 1531088, Opened 5 years ago, Updated 2 years ago

Memory allocated by a content web app (5 GB+) is not released soon enough after the page reloads, eventually causing OOM

Categories

(Core :: JavaScript: GC, defect, P3)

Tracking

Tracking Status
firefox67 --- affected

People

(Reporter: mayhemer, Unassigned)

References

(Depends on 1 open bug, Blocks 1 open bug)

Details

Nightly x64 (official builds), Windows 10 with 32 GB of RAM.

I have a web app that processes ~1 GB of logs and converts them to an in-memory representation. Memory consumption may reach several gigabytes, which is expected.

The problem: a later reload of the app, which immediately triggers another round of parsing of the same set of log files (from domcontentloaded), causes more and more memory to be consumed. Doing such a reload two or three times leads to:

  1. a huge slowdown of the parsing process (apparently GC interruptions)
  2. the parsing script eventually failing with an OUT OF MEMORY exception
  3. the memory not being released even after a longer period without activity (I didn't try closing the tab or doing a hard reload, to avoid restarting the parsing automatically - sometimes that's not even possible because of the huge load)

Private bytes allocated for the content process handling the tab go up to 10 GB. I have to kill that content process to be able to work again.
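
For context, the app's loading path has roughly the following shape (a hypothetical sketch, not the actual backtrack-analyzer source; loadBaselineLogs and parseLogLine are stand-ins for the real parsing code):

async function loadBaselineLogs() {
  // stand-in for reading the ~1 GB of raw log text selected as the baseline
  return ["sample line 1\nsample line 2"];
}

function parseLogLine(line) {
  // stand-in: each line becomes a small object, so millions of lines become
  // millions of GC-heap objects, arrays and strings
  return { raw: line, fields: line.split(" ") };
}

document.addEventListener("DOMContentLoaded", async () => {
  const model = [];
  for (const text of await loadBaselineLogs()) {
    for (const line of text.split("\n")) {
      model.push(parseLogLine(line));
    }
  }
  // every reload rebuilds this graph from scratch, while the previous
  // window's copy may still be waiting for the GC
  window.analyzerModel = model;
});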

STR:

  1. go to: https://mayhemer.github.io/backtrack-analyzer/
  2. download, unpack and load (as baseline) all logs at: https://janbambas.cz/dwnlds/backtrack-logs-1551291722.tgz
  3. wait for the message to go from "Loading..." to "Select objective" (will take some time)
  4. then reload the page, the "loading..." message will appear again
  5. repeat until the time to finish loading is either unbearable (minutes) or you get an "out of memory" exception

ER:
We should run GC more promptly, so the content process doesn't pile up JS objects and eventually make JS run out of memory.

This is quite an edge case, but I think worth reporting.

I don't know how actionable this is, but it could also be a recent leak regression: I have been working with these large log files and the app for quite some time, and this reloads+out-of-memory problem appeared only recently. Or it may just be that my logs have grown recently :)

Yeah, this sort of thing has been an issue for a while. The main example I'm aware of is asm.js applications, which allocate a large block of memory. There was a fix for those: if the attempt to allocate the big block fails, we immediately do a GC. It is harder to fix for something like this, where presumably the memory is scattered across a lot of different allocations. I'm not sure what a good fix is.
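
The closest content-side analogue of that pattern would be something like the following (a rough sketch of the allocate-fail-collect-retry idea, not the engine fix itself; the retry loop, count, and delay are my own assumptions):

async function allocateLargeBuffer(bytes, retries = 3) {
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return new ArrayBuffer(bytes);  // one big block, like an asm.js heap
    } catch (e) {
      // likely an out-of-memory error; back off so an incremental GC slice
      // has a chance to run, then try again
      await new Promise(resolve => setTimeout(resolve, 1000));
    }
  }
  throw new Error("could not allocate " + bytes + " bytes");
}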

This is the relevant part of a diff of two memory reports, one taken before loading any files and the other after loading the files. We can see that most of the memory is indeed in the JS heap, and quite a lot of it is strings.

├──2,949.03 MB (94.77%) -- top(https://mayhemer.github.io/backtrack-analyzer/, id=NNN)
│ │ ├──1,627.79 MB (52.31%) -- active/window(https://mayhemer.github.io/backtrack-analyzer/)
│ │ │ ├──1,610.80 MB (51.77%) -- js-realm(https://mayhemer.github.io/backtrack-analyzer/)
│ │ │ │ ├──1,610.73 MB (51.76%) -- classes
│ │ │ │ │ ├──1,022.50 MB (32.86%) -- class(Object)/objects
│ │ │ │ │ │ ├────675.63 MB (21.71%) ── gc-heap
│ │ │ │ │ │ └────346.87 MB (11.15%) ── malloc-heap/slots
│ │ │ │ │ ├────588.15 MB (18.90%) -- class(Array)/objects
│ │ │ │ │ │ ├──520.94 MB (16.74%) ── gc-heap [+]
│ │ │ │ │ │ └───67.21 MB (02.16%) -- malloc-heap
│ │ │ │ │ │ ├──67.21 MB (02.16%) ── elements/normal
│ │ │ │ │ │ └───0.00 MB (00.00%) ── slots [+]
│ │ │ │ │ └──────0.08 MB (00.00%) ++ (4 tiny)
│ │ │ │ └──────0.07 MB (00.00%) ++ (4 tiny)
│ │ │ └─────16.99 MB (00.55%) ++ (3 tiny)
│ │ └──1,321.24 MB (42.46%) -- js-zone(0xNNN)
│ │ ├──1,288.19 MB (41.40%) -- strings
│ │ │ ├──1,255.41 MB (40.34%) ++ (2905 tiny)
│ │ │ └─────32.79 MB (01.05%) ++ string(length=118, copies=122791, "RunScript [https://www.redditstatic.com/desktop2x/vendorsChatClientGiftsGovernance~Reddit.31686eb897e3abe26481.js]")
│ │ └─────33.05 MB (01.06%) ++ (10 tiny)

(In reply to Paul Bone [:pbone] from comment #2)

│ │ │ └─────32.79 MB (01.05%) ++ string(length=118, copies=122791, "RunScript [https://www.redditstatic.com/desktop2x/vendorsChatClientGiftsGovernance~Reddit.31686eb897e3abe26481.js]")

Curious, I don't think I had a reddit tab open when I captured this. Maybe a service worker...

I recorded a profile of 3 reloads of the data:

http://bit.ly/2UeaFrO

Between the 1st and 2nd loads a GC starts; it finishes after the 2nd load begins. But because the GC uses snapshot-at-the-beginning marking, it can't free the memory that I assume was released when the 2nd load started.

Compare this to the gap between the 2nd and 3rd loads, where the GC starts after the 3rd load begins; that GC is able to free a fair amount of memory (maybe I got lucky).

Regardless, the 2nd GC will have been able to free what was released after the 1st GC started, so this shouldn't build up memory and OOM indefinitely (a more aggressive test is needed). Bug 1519298 will help us understand this better.
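
To illustrate why snapshot-at-the-beginning behaves that way, here is a toy sketch (purely illustrative, nothing to do with the actual SpiderMonkey implementation): marking works from the set of roots captured when the GC started, so anything reachable at that moment survives even if it becomes garbage before the sweep.

class ToyHeap {
  constructor() { this.objects = new Set(); this.roots = new Set(); }
  alloc(obj) { this.objects.add(obj); return obj; }
  startGC() { this.snapshot = new Set(this.roots); }     // snapshot-at-the-beginning
  finishGC() {
    const live = new Set();
    const mark = (o) => { if (!live.has(o)) { live.add(o); (o.refs || []).forEach(mark); } };
    this.snapshot.forEach(mark);
    this.objects = live;                                  // sweep everything unmarked
  }
}

const heap = new ToyHeap();
const firstLoad = heap.alloc({ refs: [heap.alloc({}), heap.alloc({})] });  // data from the 1st load
heap.roots.add(firstLoad);

heap.startGC();               // GC begins: firstLoad is in the snapshot
heap.roots.delete(firstLoad); // the 2nd load starts, the old data becomes unreachable...
heap.finishGC();              // ...but it still survives this GC
console.log(heap.objects.size); // 3 - only the *next* collection can free it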

I also noticed that the GC collected only a single zone, at least in my profile; that could limit what can be freed, but it didn't seem to for the 2nd GC.

Depends on: 1519298
Priority: -- → P3

I wonder if we should have more aggressive GC/CC scheduling for this reloading case. People love to spam reload, but it causes real problems for our usual heuristics, which assume that a page is going to live for more than a few seconds.

(In reply to Andrew McCreight [:mccr8] from comment #5)

I wonder if we should have more aggressive GC/CC scheduling for this reloading case. People love to spam reload, but it causes real problems for our usual heuristics, which assume that a page is going to live for more than a few seconds.

I think that would be great. We're also seeing this in the devtools tests. Reftests work around it with WindowUtils.runNextCollectorTimer, but it would be nice if we didn't need that kind of thing.
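
For reference, that reftest-style workaround looks roughly like this from chrome-privileged code (a sketch; runNextCollectorTimer is the nsIDOMWindowUtils method mentioned above, but the helper name and the iteration count are assumptions):

function forcePendingCollections(win) {
  const utils = win.windowUtils;  // nsIDOMWindowUtils, chrome-only
  // each call runs whichever GC/CC timer is due next; a few iterations
  // are usually enough to drain the queued collector work after a reload
  for (let i = 0; i < 10; i++) {
    utils.runNextCollectorTimer();
  }
}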

One small nit: when I load large logs into the web app, work with them for a while, and then load a different set of logs that is orders of magnitude smaller (a few MB), the content process later starts to hang periodically. I can see a full CPU core load and movement in memory (it seems to go up and later down - but I have to look more closely to confirm!). Note that all the expected initial allocations of the new web app reload (and of the new set of logs) had already finished at least 30+ seconds earlier.

(In reply to Honza Bambas (:mayhemer) from comment #7)

One small nit: when I load large logs into the web app, work with them for a while, and then load a different set of logs that is orders of magnitude smaller (a few MB), the content process later starts to hang periodically. I can see a full CPU core load and movement in memory (it seems to go up and later down - but I have to look more closely to confirm!). Note that all the expected initial allocations of the new web app reload (and of the new set of logs) had already finished at least 30+ seconds earlier.

Er... I'm not sure whether I reloaded the page before loading the new (smaller) set of logs. I will observe more closely next time.

Hi Honza, it'd also be great to get a profile of that particular behaviour - it sounds odd to me.

(In reply to Paul Bone [:pbone] from comment #9)

Hi Honza, it'd also be great to get a profile of that particular behaviour - it sounds odd to me.

ni->me to not forget.

Flags: needinfo?(honzab.moz)
Blocks: 1533449
No longer blocks: GCScheduling

stalled on my side.

Flags: needinfo?(honzab.moz)
Severity: normal → S3