Closed Bug 847185 Opened 11 years ago Closed 7 years ago

B2G: OOM for basic HTML games

Categories

(Core :: JavaScript Engine, defect)

x86
macOS
defect
Not set
normal

Tracking

()

RESOLVED INVALID

People

(Reporter: gwagner, Unassigned)

References

()

Details

(Whiteboard: [MemShrink:P2])

Attachments

(2 files)

We are trying to bring HTML games to B2G phones, but currently we hit OOM before some of them even start. A simple example is Monster Dash: http://chrome.monsterdashgame.com/

These are the numbers from desktop Firefox Nightly when starting the game. The TI (type inference) memory usage seems very high:

│  ├──303.37 MB (51.34%) -- top(http://chrome.monsterdashgame.com/, id=7)
│  │  ├──302.61 MB (51.21%) -- active
│  │  │  ├──293.15 MB (49.61%) -- window(http://chrome.monsterdashgame.com/)
│  │  │  │  ├──290.38 MB (49.14%) -- js/compartment(http://chrome.monsterdashgame.com/)
│  │  │  │  │  ├──140.82 MB (23.83%) -- objects-extra
│  │  │  │  │  │  ├──140.53 MB (23.78%) ── elements [2]
│  │  │  │  │  │  └────0.29 MB (00.05%) ── slots [2]
│  │  │  │  │  ├──140.06 MB (23.70%) -- type-inference
│  │  │  │  │  │  ├──125.06 MB (21.16%) ── analysis-pool
│  │  │  │  │  │  ├───10.29 MB (01.74%) ── type-scripts [2]
│  │  │  │  │  │  └────4.71 MB (00.80%) ++ (3 tiny)
│  │  │  │  │  ├────6.33 MB (01.07%) ++ gc-heap
│  │  │  │  │  └────3.17 MB (00.54%) ++ (6 tiny)
│  │  │  │  └────2.78 MB (00.47%) ++ (4 tiny)
│  │  │  └────9.46 MB (01.60%) ++ (3 tiny)
│  │  └────0.76 MB (00.13%) ++ cached/window(about:home)


On the akami phone I see the following before we hit OOM:
Browser (pid 1431)

Explicit Allocations
91.08 MB (100.0%) -- explicit
├──81.16 MB (89.11%) -- window-objects/top(http://chrome.monsterdashgame.com/, id=1)
│  ├──81.16 MB (89.11%) -- active/window(http://chrome.monsterdashgame.com/)
│  │  ├──80.96 MB (88.89%) -- js/compartment(http://chrome.monsterdashgame.com/)
│  │  │  ├──70.11 MB (76.97%) -- objects-extra
│  │  │  │  ├──70.02 MB (76.88%) ── elements
│  │  │  │  └───0.09 MB (00.09%) ── slots
│  │  │  ├───6.71 MB (07.37%) ── script-data
│  │  │  ├───2.68 MB (02.94%) ── analysis-temporary
│  │  │  ├───1.04 MB (01.14%) ++ gc-heap
│  │  │  └───0.44 MB (00.48%) ++ (4 tiny)
│  │  └───0.20 MB (00.22%) ++ (4 tiny)
│  └───0.00 MB (00.00%) ── cached/window(http://chrome.monsterdashgame.com/)/dom/other
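
To check how much of the explicit total a single entry accounts for, the rows of a report like the one above can be parsed mechanically. The following is a minimal sketch, not part of any Mozilla tooling: the `MemoryRow` type, the `MEMORY_ROW` pattern, and the helper names are all made up for illustration, and the pattern only assumes the "size MB (percent%) separator label" layout visible in this report.

```typescript
// One parsed row of an about:memory tree. Field names are this sketch's own.
interface MemoryRow {
  megabytes: number;
  percent: number;
  label: string;
}

// Matches rows such as "│  │  ├──80.96 MB (88.89%) -- js/compartment(...)".
// The separator is "--" (expanded node), "──" (leaf) or "++" (collapsed node).
const MEMORY_ROW = /^[│├└─\s]*(\d+(?:\.\d+)?) MB \((\d+(?:\.\d+)?)%\) (?:--|──|\+\+) (.+)$/;

function parseMemoryRow(line: string): MemoryRow | null {
  const match = MEMORY_ROW.exec(line);
  if (match === null) {
    return null; // not a report row (blank line, heading, ...)
  }
  return {
    megabytes: Number(match[1] ?? "NaN"),
    percent: Number(match[2] ?? "NaN"),
    label: (match[3] ?? "").trim(),
  };
}

function parseMemoryReport(text: string): MemoryRow[] {
  const rows: MemoryRow[] = [];
  const lines = text.split("\n");
  for (let i = 0; i < lines.length; i++) {
    const row = parseMemoryRow(lines[i] ?? "");
    if (row !== null) {
      rows.push(row);
    }
  }
  return rows;
}

// Returns the first row with exactly this label, if any.
function findRow(rows: MemoryRow[], label: string): MemoryRow | undefined {
  return rows.filter((row) => row.label === label)[0];
}

// A few rows copied from the akami report above.
const akamiExcerpt = [
  "91.08 MB (100.0%) -- explicit",
  "│  │  │  ├──70.11 MB (76.97%) -- objects-extra",
  "│  │  │  │  ├──70.02 MB (76.88%) ── elements",
  "│  │  │  ├───6.71 MB (07.37%) ── script-data",
].join("\n");

const akamiRows = parseMemoryReport(akamiExcerpt);
const explicitRow = findRow(akamiRows, "explicit");
const elementsRow = findRow(akamiRows, "elements");
if (explicitRow !== undefined && elementsRow !== undefined) {
  const share = (100 * elementsRow.megabytes) / explicitRow.megabytes;
  console.log(`elements: ${share.toFixed(2)}% of explicit`);
}
```

Recomputing the share from the two sizes gives the same 76.88% the report prints, i.e. about three quarters of the process's explicit allocations sit in `elements`.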
Whiteboard: [MemShrink]
I don't see many GCs before we close the browser. I will investigate in this direction.
Doing a GC should pretty much wipe out the analysis-pool memory.
The high level of objects-extra/elements is reminiscent of bug 839631.
Attached file about:memory
attached: whole about:memory

The game simulates the c++ heap with a 64 MB JS array.

b2g-ps:
APPLICATION      USER     PID   PPID  VSIZE  RSS     WCHAN    PC         NAME
b2g              root      108   1     184312 52736 ffffffff 400d4330 S /system/b2g/b2g
Browser          app_3826  3826  108   197212 112444 ffffffff 4099a228 R /system/b2g/plugin-container
> The game simulates the c++ heap with a 64 MB JS array.

AIUI, the B2G phones have ~60 MiB available for all user apps.  If that's true, requesting a 64 MiB array up-front is pretty hopeless.
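
For readers unfamiliar with the pattern: a C++-to-JS compiler typically backs the whole C++ heap with one buffer that is allocated when the page loads, and "pointers" are just offsets into it. The sketch below illustrates that idea under the assumption of a typed-array-backed heap; it is not Mandreel's actual runtime, and `SimulatedHeap`, `malloc`, and `HEAP_BYTES_IN_BUG` are hypothetical names.

```typescript
// Size of the up-front allocation discussed in this bug (64 MiB).
const HEAP_BYTES_IN_BUG = 64 * 1024 * 1024;

// Hypothetical stand-in for the heap a C++-to-JS compiler might emit.
class SimulatedHeap {
  private readonly buffer: ArrayBuffer;
  readonly i32: Int32Array; // view used for int loads/stores
  readonly u8: Uint8Array;  // view used for byte loads/stores
  private next = 8;         // address 0 stays unused so it can serve as NULL

  constructor(byteLength: number) {
    // The entire block is reserved here, before any game code has run.
    this.buffer = new ArrayBuffer(byteLength);
    this.i32 = new Int32Array(this.buffer);
    this.u8 = new Uint8Array(this.buffer);
  }

  get byteLength(): number {
    return this.buffer.byteLength;
  }

  // Bump allocator: returns a byte offset ("pointer") into the buffer.
  malloc(size: number): number {
    const aligned = (size + 7) & ~7; // round up to 8 bytes
    if (this.next + aligned > this.buffer.byteLength) {
      throw new RangeError("simulated heap exhausted");
    }
    const address = this.next;
    this.next += aligned;
    return address;
  }
}

// A small heap keeps the demo cheap; the game would pass HEAP_BYTES_IN_BUG.
const demoHeap = new SimulatedHeap(64 * 1024);
const scorePointer = demoHeap.malloc(4);
demoHeap.i32[scorePointer >> 2] = 1234; // equivalent of *(int*)scorePointer = 1234
```

The point for this bug is the constructor: the full size is requested in one piece regardless of how much of it the game ever touches, so the engine has no smaller working set to fall back to on a low-memory device.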
Whiteboard: [MemShrink] → [MemShrink:P2]
(In reply to Nicholas Nethercote [:njn] from comment #5)
> > The game simulates the c++ heap with a 64 MB JS array.
> 
> AIUI, the B2G phones have ~60 MiB available for all user apps.  If that's
> true, requesting a 64 MiB array up-front is pretty hopeless.

This sounds like this bug is more an evangelism issue than a JS issue, unless proven wrong.
Especially if the game does not even start.
Agreed, but separately, the 140mb of TI is probably too high.  We should file a separate bug on that.
(In reply to Justin Lebar [:jlebar] from comment #7)
> Agreed, but separately, the 140mb of TI is probably too high.  We should
> file a separate bug on that.

This is a known issue, and we already have 2 bugs(*) open on it.  One question remains: which one should we choose?

(*) Doing it on our current implementation (Bug 778724), or doing it after the removal of TI from IonMonkey (Bug 804676).  We will get rid of JM; the question is when we switch to Ion, relative to Baseline landing and the B2G fork.
FWIW, on the full RAM Unagi kernel, the game actually loads, but it shows the desktop UI and is *super* slow.
Browser (pid 967)

Explicit Allocations
145.40 MB (100.0%) -- explicit
├──106.75 MB (73.41%) -- window-objects/top(http://chrome.monsterdashgame.com/, id=1)/active
│  ├──104.16 MB (71.64%) -- window(http://chrome.monsterdashgame.com/)
│  │  ├──103.90 MB (71.45%) -- js/compartment(http://chrome.monsterdashgame.com/)
│  │  │  ├───70.42 MB (48.43%) -- objects-extra
│  │  │  │   ├──70.26 MB (48.32%) ── elements
│  │  │  │   └───0.15 MB (00.10%) ── slots
│  │  │  ├───20.96 MB (14.41%) ── analysis-temporary
│  │  │  ├────6.82 MB (04.69%) ── script-data
│  │  │  ├────2.65 MB (01.82%) -- type-inference
│  │  │  │    ├──2.59 MB (01.78%) ── script-main
│  │  │  │    └──0.06 MB (00.04%) ++ (2 tiny)
│  │  │  ├────1.91 MB (01.31%) ++ gc-heap
│  │  │  └────1.14 MB (00.78%) ++ (4 tiny)
│  │  └────0.27 MB (00.18%) ++ (4 tiny)
│  ├────1.49 MB (01.02%) ++ window(https://plusone.google.com/_/+1/fastbutton?bsv&hl=en-US&origin=http%3A%2F%2Fchrome.monsterdashgame.com&url=http%3A%2F%2Fchrome.monsterdashgame.com%2F&ic=1&jsh=m%3B%2F_%2Fscs%2Fapps-static%2F_%2Fjs%2Fk%3Doz.gapi.en_US.lBnyOLjA3G0.O%2Fm%3D__features__%2Fam%3DqQ%2Frt%3Dj%2Fd%3D1%2Frs%3DAItRSTMOxyDN3ZjCXp_L4ay39dcDvU9LkQ#_methods=onPlusOne%2C_ready%2C_close%2C_open%2C_resizeMe%2C_renderstart%2Concircled&id=I0_1362696548739&parent=http%3A%2F%2Fchrome.monsterdashgame.com&rpctoken=23708123)
│  └────1.10 MB (00.76%) ++ (2 tiny)
├───17.25 MB (11.87%) -- images
│   ├──17.25 MB (11.87%) -- content
│   │  ├──17.25 MB (11.87%) -- used
│   │  │  ├──11.33 MB (07.79%) ── uncompressed-heap
│   │  │  ├───5.92 MB (04.07%) ── raw
│   │  │  └───0.00 MB (00.00%) ── uncompressed-nonheap
│   │  └───0.00 MB (00.00%) ++ unused
│   └───0.00 MB (00.00%) ++ chrome
├───13.11 MB (09.02%) -- js-non-window
│   ├───6.02 MB (04.14%) -- runtime
│   │   ├──4.01 MB (02.76%) ── jaeger-code
│   │   └──2.01 MB (01.38%) ++ (13 tiny)
│   ├───4.90 MB (03.37%) -- gc-heap
│   │   ├──4.76 MB (03.27%) ── decommitted-arenas
│   │   └──0.14 MB (00.10%) ++ (3 tiny)
│   └───2.19 MB (01.51%) ++ compartments
├────5.99 MB (04.12%) ── heap-unclassified
└────2.30 MB (01.58%) ++ (13 tiny)
Watch out, this will crash your browser if you try to load it in the Cleopatra UI (bug 849040).
(In reply to Reuben Morais [:reuben] from comment #9)
> FWIW, on the full RAM Unagi kernel, the game actually loads, but it shows
> the desktop UI and is *super* slow.

Can you enable the DMD in your custom build and check if this is related to Bug 848615?
Where did you find this custom kernel?
(In reply to Reuben Morais [:reuben] from comment #11)
> Created attachment 722541 [details]
> profile_1293_Browser.sym
> 
> Watch out, this will crash your browser if you try to load it in the
> Cleopatra UI (bug 849040).

The profiles show that the application has been compiled with Mandreel[1] and that only JM is running. If this is not a recent build, none of the Typed Array optimizations are running.

Which version of gecko are you using? Can you try with IonMonkey?

[1] http://arewefastyet.com/#machine=10&view=breakdown&suite=octane, see octane-Mandreel
(In reply to Nicolas B. Pierron [:nbp] from comment #13)
> (In reply to Reuben Morais [:reuben] from comment #11)
> > Created attachment 722541 [details]
> > profile_1293_Browser.sym
> > 
> > Watch out, this will crash your browser if you try to load it in the
> > Cleopatra UI (bug 849040).
> 
> The profiles show that the application has been compiled with Mandreel[1]
> and that only JM is running. If this is not a recent build, none of the
> Typed Array optimizations are running.
> 
> Which version of gecko are you using? Can you try with IonMonkey?
> 
> [1] http://arewefastyet.com/#machine=10&view=breakdown&suite=octane, see
> octane-Mandreel

It's for our B2G release so it's version 18. And yes, they are compiling with mandreel.
(In reply to Nicolas B. Pierron [:nbp] from comment #13)
> (In reply to Reuben Morais [:reuben] from comment #11)
> > Created attachment 722541 [details]
> > profile_1293_Browser.sym
> > 
> > Watch out, this will crash your browser if you try to load it in the
> > Cleopatra UI (bug 849040).
> 
> The profiles show that the application has been compiled with Mandreel
> and that only JM is running. If this is not a recent build, none of the
> Typed Array optimizations are running.

We should backport Bug 837347 to b2g18!
(In reply to Nicolas B. Pierron [:nbp] from comment #12)
> Can you enable the DMD in your custom build and check if this is related to
> Bug 848615.

Doesn't look like it. The DMD report is too large to be attached, so here you go: https://dl.dropbox.com/u/10968786/dmd-browser-2384.txt

> Where did you find this custom kernel?

https://intranet.mozilla.org/B2G_Team/Unagi
(In reply to Nicolas B. Pierron [:nbp] from comment #15)
> We should backport Bug 837347 to b2g18!

So I just tested with the patch in that bug, and it makes things better. The slow script dialog is not shown anymore, and the profile[0] shows that most of the time is now spent in painting. Memory consumption is unchanged.

[0] https://people.mozilla.com/~bgirard/cleopatra/#report=312bcb091b7ce52bdcf3ebe86fd48e0f890c85a8
(In reply to Reuben Morais [:reuben] from comment #17)
> (In reply to Nicolas B. Pierron [:nbp] from comment #15)
> > We should backport Bug 837347 to b2g18!
> 
> So I just tested with the patch in that bug, and it makes things better. The
> slow script dialog is not shown anymore, and the profile[0] shows that most
> of the time is now spent in painting. Memory consumption is unchanged.
> 
> [0]
> https://people.mozilla.com/~bgirard/cleopatra/
> #report=312bcb091b7ce52bdcf3ebe86fd48e0f890c85a8

Looking at the profile, the problem seems to be a system function in /system/lib/egl/libGLESv2_adreno200.so, which accounts for 54.2% of the time (probably glFinish, as jgilbert mentioned over IRC). Can you confirm that they are using requestAnimationFrame for rendering frames?

Running their application in the browser shows around 85% - 110%(*) CPU usage on one core of an i7-2820QM CPU @ 2.30GHz. (*) Temperature-based dynamic overclocking of i7 cores.

Per-frame results are:
 - 47 ms under js::RunScript
 - 340 ms under Paint::PresShell::Paint (including 262ms for glFinish)
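
To put these per-frame numbers in perspective, here is a small back-of-the-envelope calculation. It is only a sketch: the variable names are made up, and it assumes the script and paint phases run one after the other without overlapping, which the profile does not state.

```typescript
// Per-frame timings quoted above, in milliseconds.
const scriptMsPerFrame = 47;   // under js::RunScript
const paintMsPerFrame = 340;   // under Paint::PresShell::Paint
const glFinishMsPerFrame = 262; // part of the paint time

// Assumption: the two phases are sequential, so their costs add up.
const totalMsPerFrame = scriptMsPerFrame + paintMsPerFrame;

function framesPerSecond(msPerFrame: number): number {
  return 1000 / msPerFrame;
}

const achievedFps = framesPerSecond(totalMsPerFrame);
const glFinishShareOfPaint = glFinishMsPerFrame / paintMsPerFrame;
const budgetMsAt60Fps = 1000 / 60;

console.log(`total per frame: ${totalMsPerFrame} ms`);
console.log(`frame rate: ${achievedFps.toFixed(1)} fps`);
console.log(`glFinish share of paint: ${(100 * glFinishShareOfPaint).toFixed(0)}%`);
console.log(`60 fps budget: ${budgetMsAt60Fps.toFixed(1)} ms`);
```

Under that assumption a frame takes 387 ms, or roughly 2.6 frames per second, against a budget of about 16.7 ms for 60 fps, and glFinish alone is about 77% of the paint time. The JS side is the smaller part of the problem.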

At this point they should probably use fewer draw calls (as reported by jgilbert), which would reduce both the JS and the GL costs; there is not much we can improve on our side.
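
For reference, a render loop driven by requestAnimationFrame looks roughly like the sketch below. This is an illustration of the pattern being asked about, not the game's code: `startRenderLoop` and the two type aliases are hypothetical, and the scheduler is passed in as a parameter so the loop can also be exercised outside a browser.

```typescript
type FrameCallback = (timestampMs: number) => void;
type FrameScheduler = (callback: FrameCallback) => number;

// Calls renderFrame once per scheduled frame until it returns false.
// deltaMs is the time since the previous frame (0 for the first one).
function startRenderLoop(
  renderFrame: (deltaMs: number) => boolean,
  schedule: FrameScheduler
): void {
  let previous: number | null = null;
  const tick: FrameCallback = (now) => {
    const deltaMs = previous === null ? 0 : now - previous;
    previous = now;
    if (renderFrame(deltaMs)) {
      schedule(tick); // ask for exactly one more frame
    }
  };
  schedule(tick);
}

// In a page, the scheduler would be the browser's own:
//   startRenderLoop(drawScene, (cb) => window.requestAnimationFrame(cb));
// where drawScene is the game's (hypothetical) drawing function.
```

Unlike a fixed setInterval or setTimeout timer, this lets the browser decide when the next frame is due, so the page does not queue up paints faster than the GPU can finish them.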
I wonder if things are better now.
I can confirm that the TI issue has gone away on desktop at least.

│  ├──108.98 MB (13.08%) -- top(http://chrome.monsterdashgame.com/, id=358)
│  │  ├──103.41 MB (12.41%) -- active
│  │  │  ├───89.57 MB (10.75%) -- window(http://chrome.monsterdashgame.com/)
│  │  │  │   ├──86.30 MB (10.36%) -- js-compartment(http://chrome.monsterdashgame.com/)
│  │  │  │   │  ├──71.40 MB (08.57%) -- objects
│  │  │  │   │  │  ├──70.41 MB (08.45%) -- malloc-heap
│  │  │  │   │  │  │  ├──70.23 MB (08.43%) ── elements/non-asm.js
│  │  │  │   │  │  │  └───0.17 MB (00.02%) ── slots
│  │  │  │   │  │  └───0.99 MB (00.12%) ++ (2 tiny)
│  │  │  │   │  ├───9.73 MB (01.17%) ++ baseline
│  │  │  │   │  └───5.18 MB (00.62%) -- (6 tiny)
│  │  │  │   │      ├──2.48 MB (00.30%) ++ type-inference
│  │  │  │   │      ├──1.39 MB (00.17%) ++ shapes
│  │  │  │   │      ├──0.70 MB (00.08%) ── ion-data
│  │  │  │   │      ├──0.58 MB (00.07%) ++ scripts
│  │  │  │   │      ├──0.02 MB (00.00%) ++ sundries
│  │  │  │   │      └──0.02 MB (00.00%) ── regexp-compartment
│  │  │  │   └───3.27 MB (00.39%) ++ (4 tiny)
│  │  │  └───13.84 MB (01.66%) ++ (4 tiny)
│  │  └────5.57 MB (00.67%) ++ js-zone(0x16b90d800)

I still get an OOM crash on my hamachi device, and I have confirmed via adb dmesg that the JS helper / browser process was killed due to too much memory usage:

<6>[84526.413670] [25178] 35172 25172    18710     3907   0       8           534 JS GC Helper
<6>[84526.413691] [25372] 35372 25372    41770    21962   0       2           134 Browser
<3>[84526.413706] Out of memory: Kill process 25178 (JS GC Helper) score 619 or sacrifice child
<3>[84526.413721] Killed process 25178 (JS GC Helper) total-vm:74840kB, anon-rss:15600kB, file-rss:28kB
<4>[84526.720401] select 1148 (b2g), adj 0, size 13468, to kill
<4>[84526.720430] select 25372 (Browser), adj 2, size 25582, to kill
<4>[84526.720443] send sigkill to 25372 (Browser), adj 2, size 25582

I am unable to get an about:memory snapshot out of the device via the helper python script due to the crash happening too quickly and the script being unable to get all of the files it wants. If someone has a device with more RAM we could probably get a legit snapshot. My assumption is the 64MB JS array is still the culprit.
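
When the device dies too quickly for a full snapshot, the kill lines in the dmesg output are often the only evidence, so it can help to extract them automatically. The following is a minimal sketch with made-up names (`KillEvent`, `parseKillEvents`); the two patterns are written against the exact line shapes in the log above, and attributing the "send sigkill" line to Android's low-memory killer is my reading of the log rather than something stated in this bug.

```typescript
// One process kill found in a kernel log. Field names are this sketch's own.
interface KillEvent {
  source: "kernel-oom" | "lowmemorykiller";
  pid: number;
  processName: string;
}

// "Out of memory: Kill process 25178 (JS GC Helper) score 619 or sacrifice child"
const OOM_KILL = /Out of memory: Kill process (\d+) \((.+?)\) score \d+/;
// "send sigkill to 25372 (Browser), adj 2, size 25582"
const LMK_KILL = /send sigkill to (\d+) \((.+?)\), adj -?\d+, size \d+/;

function parseKillEvents(log: string): KillEvent[] {
  const events: KillEvent[] = [];
  const lines = log.split("\n");
  for (let i = 0; i < lines.length; i++) {
    const line = lines[i] ?? "";
    const oom = OOM_KILL.exec(line);
    if (oom !== null) {
      events.push({ source: "kernel-oom", pid: Number(oom[1]), processName: oom[2] ?? "" });
      continue;
    }
    const lmk = LMK_KILL.exec(line);
    if (lmk !== null) {
      events.push({ source: "lowmemorykiller", pid: Number(lmk[1]), processName: lmk[2] ?? "" });
    }
  }
  return events;
}

// Lines copied from the dmesg output above.
const hamachiLog = [
  "<3>[84526.413706] Out of memory: Kill process 25178 (JS GC Helper) score 619 or sacrifice child",
  "<3>[84526.413721] Killed process 25178 (JS GC Helper) total-vm:74840kB, anon-rss:15600kB, file-rss:28kB",
  "<4>[84526.720430] select 25372 (Browser), adj 2, size 25582, to kill",
  "<4>[84526.720443] send sigkill to 25372 (Browser), adj 2, size 25582",
].join("\n");

const hamachiKills = parseKillEvents(hamachiLog);
```

On this log it reports two kills: the kernel OOM killer taking out the JS GC Helper thread (pid 25178), followed by a SIGKILL sent to the Browser process (pid 25372).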
Gregor, what do you want to do with this? The improvement from comment 0 to comment 20 is excellent, but so long as it requires that 64 MiB array, getting it to work on phones with 256 MiB will be difficult.
Flags: needinfo?(anygregor)
(In reply to Nicholas Nethercote [:njn] from comment #21)
> Gregor, what do you want to do with this? The improvement from comment 0 to
> comment 20 is excellent, but so long as it requires that 64 MiB array,
> getting it to work on phones with 256 MiB will be difficult.

We should try to get some numbers on a real device. We saw that IPC communication can lead to a very different memory footprint in the past.
Eric, you should try to run it on a nexus 4 and get some real numbers there.
Flags: needinfo?(anygregor)
> Eric, you should try to run it on a nexus 4 and get some real numbers there.

Eric doesn't have a Nexus 4.
(In reply to Nicholas Nethercote [:njn] from comment #23)
> > Eric, you should try to run it on a nexus 4 and get some real numbers there.
> 
> Eric doesn't have a Nexus 4.

He should get one. It's much easier to reason about OOM on low end devices on a device with more memory. The cases where we actually have to be on the low-end device are rare (like dealing with low-memory signals)
I think this bug should be closed. The primary problem (type inference memory usage) has been fixed.
Assignee: general → nobody
(In reply to Nicholas Nethercote [:njn] from comment #25)
> I think this bug should be closed. The primary problem (type inference
> memory usage) has been fixed.

Agreed.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → INVALID