898556 - Triple-check GC tuning on FxOS (was Firefox OS Octane regression on nightly Nexus 4)

Vladimir Vukicevic [:vlad] [:vladv] (needinfo me, slow to respond)

Reporter

Description

•

11 years ago

On a Nexus 4, Octane scores:

Android:      Firefox 22: 1616
            Nightly (25): 2163

Firefox OS: Nightly (25): 1125

This looks like something flat-out bad is going on; I would expect Octane numbers to be identical between Android and Firefox OS.

(There's a Nexus 4 build config ("mako") as part of stock b2g now.)

Here are the individual test numbers, both for Nightly; Android first, FxOS second:

Richards:    3703  3834
Deltablue:   1958   874 (!)
Crypto:      3150  2756
Raytrace:    2367   542 (!)
EarleyBoyer: 2669  1020 (!)
Regexp:       362   286
Splay:       1874  1076 (!)
NavierStokes:5634  5617
pdf.js:      1306   731 (!)
Mandreel:    2051  2133
GB Emu:      3701  1794 (!)
CodeLoad:    2841  2126 (!)
Box2DWeb:    1458  1650

It looks like something is going seriously wrong with JS perf when built in a b2g config.
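
A rough way to reproduce the (!) markers in the table above is to compute each test's FxOS/Android ratio. This is only a sketch: the scores are copied from the table, but the 0.75 cutoff and the helper names are assumptions made for illustration, not something stated in this bug.

```python
# Octane sub-scores from the table above: (Android Nightly, FxOS Nightly).
SCORES = {
    "Richards": (3703, 3834),
    "Deltablue": (1958, 874),
    "Crypto": (3150, 2756),
    "Raytrace": (2367, 542),
    "EarleyBoyer": (2669, 1020),
    "Regexp": (362, 286),
    "Splay": (1874, 1076),
    "NavierStokes": (5634, 5617),
    "pdf.js": (1306, 731),
    "Mandreel": (2051, 2133),
    "GB Emu": (3701, 1794),
    "CodeLoad": (2841, 2126),
    "Box2DWeb": (1458, 1650),
}

# Assumed cutoff: flag a test when FxOS reaches less than 75% of Android.
REGRESSION_CUTOFF = 0.75


def fxos_ratio(android, fxos):
    """Return the FxOS score as a fraction of the Android score."""
    return fxos / android


def flag_regressions(scores, cutoff=REGRESSION_CUTOFF):
    """Return the names of tests whose FxOS/Android ratio is below cutoff."""
    return [
        name
        for name, (android, fxos) in scores.items()
        if fxos_ratio(android, fxos) < cutoff
    ]


if __name__ == "__main__":
    flagged = set(flag_regressions(SCORES))
    for name, (android, fxos) in SCORES.items():
        marker = " (!)" if name in flagged else ""
        print(f"{name:<13}{fxos_ratio(android, fxos):5.2f}{marker}")
```

With this cutoff the flagged tests are the same seven that carry a (!) above; a different cutoff would flag a different set.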

Boris Zbarsky [:bzbarsky]

Comment 1

•

11 years ago

Well, so for a start, the GC prefs are different on FxOS and Fennec.  Specifically:

1)  Fennec has pref("javascript.options.gc_on_memory_pressure", false).
2)  FxOS changes the mem.gc_incremental_slice_ms from 10ms to 30ms.
3)  FxOS changes various GC frequency parameters, which can change GC timing.
4)  FxOS sets a high water mark of 6MB, whereas Fennec uses 32MB (or maybe 16MB on
    low-memory devices).
5)  FxOS sets the allocation threshold to 1MB (the default value, as used on Fennec, is
    20MB... except in said low-memory device config, where it's 3MB).

So my money is on FxOS doing a lot more GC during benchmarks...
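
The numeric differences listed above can be laid out side by side. In this sketch, `mem.gc_incremental_slice_ms` is the name used in the list; `high_water_mark_mb` and `allocation_threshold_mb` are illustrative keys rather than real pref names, and the values are the ones quoted above (using 32MB for the Fennec high water mark).

```python
# Values quoted in the list above. The two *_mb keys are illustrative
# labels, not actual pref names.
FENNEC_DEFAULT = {
    "mem.gc_incremental_slice_ms": 10,
    "high_water_mark_mb": 32,
    "allocation_threshold_mb": 20,
}
FXOS = {
    "mem.gc_incremental_slice_ms": 30,
    "high_water_mark_mb": 6,
    "allocation_threshold_mb": 1,
}


def diff_profiles(base, other):
    """Map each key whose value differs to a (base, other) pair."""
    return {key: (base[key], other[key]) for key in base if base[key] != other[key]}


if __name__ == "__main__":
    for key, (fennec, fxos) in diff_profiles(FENNEC_DEFAULT, FXOS).items():
        print(f"{key:<30} Fennec={fennec:<4} FxOS={fxos}")
```

The allocation threshold stands out: at 1MB versus 20MB, FxOS would cross it far more often during an allocation-heavy benchmark.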

Vladimir Vukicevic [:vlad] [:vladv] (needinfo me, slow to respond)

Reporter

Comment 2

•

11 years ago

Let me update prefs and see what happens!

Vladimir Vukicevic [:vlad] [:vladv] (needinfo me, slow to respond)

Reporter

Comment 3

•

11 years ago

Okay, definitely GC.  If I take the low memory settings from Fennec (with a few changes, since I think I screwed up; 32mb high water mark instead of 16mb and 3mb alloc threshold), the overall result drops to 635. Some benchmarks like Deltablue drop down to 129.

Here are the numbers again, with the third column being FxOS with the same JS GC prefs as Firefox for Android; mostly in line, with two significant drops and one surprising win.

Overall:     2163  1125  2109
Richards:    3703  3834  3689
Deltablue:   1958   874  1831
Crypto:      3150  2756  2908
Raytrace:    2367   542  1945 (still significant drop)
EarleyBoyer: 2669  1020  3062
Regexp:       362   286   389
Splay:       1874  1076  1259 (still significant drop)
NavierStokes:5634  5617  5606
pdf.js:      1306   731  1366
Mandreel:    2051  2133  2141
GB Emu:      3701  1794  3525
CodeLoad:    2841  2126  2751
Box2DWeb:    1458  1650  1792 (a little surprising win?)

Not sure where this leaves us for B2G.
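
The "drop" and "win" annotations in the table above can be approximated by comparing the third column against the Android column. This is a sketch under stated assumptions: the 0.90 and 1.20 cutoffs and the function name are hypothetical, and the comparison is taken relative to Android.

```python
# (Android, FxOS with stock prefs, FxOS with Fennec GC prefs), from the table above.
SCORES = {
    "Richards": (3703, 3834, 3689),
    "Deltablue": (1958, 874, 1831),
    "Crypto": (3150, 2756, 2908),
    "Raytrace": (2367, 542, 1945),
    "EarleyBoyer": (2669, 1020, 3062),
    "Regexp": (362, 286, 389),
    "Splay": (1874, 1076, 1259),
    "NavierStokes": (5634, 5617, 5606),
    "pdf.js": (1306, 731, 1366),
    "Mandreel": (2051, 2133, 2141),
    "GB Emu": (3701, 1794, 3525),
    "CodeLoad": (2841, 2126, 2751),
    "Box2DWeb": (1458, 1650, 1792),
}

DROP_CUTOFF = 0.90  # assumed: below 90% of Android counts as a drop
WIN_CUTOFF = 1.20   # assumed: above 120% of Android counts as a win


def classify(scores, drop=DROP_CUTOFF, win=WIN_CUTOFF):
    """Split tests into (drops, wins) by the ratio of column 3 to column 1."""
    drops, wins = [], []
    for name, (android, _stock, with_fennec_prefs) in scores.items():
        ratio = with_fennec_prefs / android
        if ratio < drop:
            drops.append(name)
        elif ratio > win:
            wins.append(name)
    return drops, wins


if __name__ == "__main__":
    drops, wins = classify(SCORES)
    print("drops:", drops)
    print("wins: ", wins)
```

With these cutoffs, the drops are Raytrace and Splay and the only win is Box2DWeb, matching the annotations.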

Summary: Firefox OS Octane regression on nightly Nexus 4 → Triple-check GC tuning on FxOS (was Firefox OS Octane regression on nightly Nexus 4)

Boris Zbarsky [:bzbarsky]

Comment 4

•

11 years ago

One interesting question is whether the tuning should be different for different FxOS processes.  I can maybe see us wanting less aggressive GC in the browser process than elsewhere.

Nicolas B. Pierron [:nbp]

Comment 5

•

11 years ago

The GC settings were tuned on an Unagi, which, as far as I know, is closer to our target market (Bug 863398).  The changes made at the time are still visible on AWFY [1].  Can you check whether reverting these settings locally improves benchmarks on the Nexus 4?

One of the constraints on the choice of these GC settings was that we at least get a chance to GC before we run out of memory.  As Bug 863398 comment 21 details, this limit is low because of our target market.

[1] http://arewefastyet.com/#machine=14

(In reply to Vladimir Vukicevic [:vlad] [:vladv] from comment #3)
> Raytrace:    2367   542  1945 (still significant drop)
> Box2DWeb:    1458  1650  1792 (a little surprising win?)

These can be caused by a shift in the scheduling of GCs.  This happens a lot on the Unagi running AWFY.  Have a look at the breakdown of Octane.

> Splay:       1874  1076  1259 (still significant drop)

This benchmark had a lot of GC noise on its own when GCs were disabled.  Can you double-check this one on both platforms?

Mike Lee [:mlee]

Updated

•

11 years ago

Keywords: perf

Whiteboard: [c= ]

Vladimir Vukicevic [:vlad] [:vladv] (needinfo me, slow to respond)

Reporter

Comment 6

•

11 years ago

Cc'ing a few folks for visibility -- there's a lot of talk about Octane in marketing, and right now running Octane on FxOS will give a significantly worse result than Android no matter how low or high end the device is.

nbp, reverting the settings on FxOS did make a difference -- the perf recovered; that's what the third column of numbers in comment #3 shows.

Justin Lebar (not reading bugmail)

Comment 7

•

11 years ago

The Nexus 4 has 2gb of RAM.  The most powerful device we're shipping B2G on today has 512mb of RAM.

I don't think it's a useful exercise to tune B2G to run on a device with 4x as much memory as the highest-end device we're shipping on.  Can we instead compare B2G to Android on 256mb and 512mb devices and tune there?

Andrew McCreight [:mccr8]

Comment 8

•

11 years ago

For comparison, we don't even support Android Firefox with less than 384mb, 50% more than the FxOS minimum.  Furthermore, on FxOS with 256mb, we have like 100mb available for apps, so adding another 128mb is really more than doubling the amount of available RAM for the benchmark.  (I'm not sure how much memory is available to Firefox on Android.)
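
The arithmetic behind that comparison, as a quick check. The 100 MB figure is the rough estimate given above rather than a measurement, and the sketch assumes that all of the extra RAM would end up available to apps.

```python
FXOS_MIN_RAM_MB = 256
ANDROID_MIN_RAM_MB = 384
FXOS_AVAILABLE_FOR_APPS_MB = 100  # approximate, per the comment above

# Extra RAM on the smallest supported Android device.
extra_mb = ANDROID_MIN_RAM_MB - FXOS_MIN_RAM_MB

# Relative increase in total RAM.
total_ram_increase = extra_mb / FXOS_MIN_RAM_MB

# Growth in app-available RAM if all of the extra RAM went to apps (assumption).
available_ratio = (FXOS_AVAILABLE_FOR_APPS_MB + extra_mb) / FXOS_AVAILABLE_FOR_APPS_MB

print(f"extra RAM: {extra_mb} MB ({total_ram_increase:.0%} more in total)")
print(f"app-available RAM grows by a factor of {available_ratio:.2f}")
```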

Luke Wagner [:luke]

Comment 9

•

11 years ago

Can we set the GC params based on the size of the device's memory?  Doing this in an ad hoc manner sounds bad; are there more cases like this where we'd like to set system parameters based on device resources?  (Bug 892097 is a recent example I've seen although not compelling enough on its own.)

Justin Lebar (not reading bugmail)

Comment 10

•

11 years ago

> Can we set the GC params based on the size of the device's memory?

I don't want to beg the question as to whether or not that's necessary, but I don't think there's anything standing in the way of doing this.  But again, I think we should be focusing on device configurations that we actually ship on, so the question should be whether 256mb B2G devices need different params from 512mb B2G devices, noting that those may need different params from 256mb/512mb Android.

With regard to comparing Android and B2G, note also that B2G has multiple Gecko processes running at the same time, and this necessitates being more conservative with the max allowable JS heap size, on a per-process basis.
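
Choosing GC params from the device's memory size could look roughly like the sketch below. Everything structural here is hypothetical: the `GCParams` type, the `gc_params_for` function, the tier boundaries, and the mapping of tiers to settings are made up for illustration. Only the numeric values come from comment 1. Real tiers would have to come from benchmarking on shipping devices, and the sketch does not model the per-process heap limit mentioned above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GCParams:
    high_water_mark_mb: int
    allocation_threshold_mb: int


# Numeric values quoted in comment 1.
FXOS_CURRENT = GCParams(high_water_mark_mb=6, allocation_threshold_mb=1)
FENNEC_LOW_MEMORY = GCParams(high_water_mark_mb=16, allocation_threshold_mb=3)
FENNEC_DEFAULT = GCParams(high_water_mark_mb=32, allocation_threshold_mb=20)

# Hypothetical tiers: (largest device RAM in MB for the tier, params to use).
TIERS = [
    (256, FXOS_CURRENT),
    (512, FENNEC_LOW_MEMORY),
]


def gc_params_for(device_ram_mb):
    """Pick the first tier whose RAM ceiling covers the device."""
    for max_ram_mb, params in TIERS:
        if device_ram_mb <= max_ram_mb:
            return params
    return FENNEC_DEFAULT  # anything larger than the last tier


if __name__ == "__main__":
    for ram in (256, 512, 2048):
        print(ram, gc_params_for(ram))
```

A table like this keeps the policy in one place, which avoids the ad hoc per-device tweaks that comment 9 warns about.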

Vladimir Vukicevic [:vlad] [:vladv] (needinfo me, slow to respond)

Reporter

Comment 11

•

11 years ago

I'm not suggesting that we tune for the Nexus 4, or any other high-end device.  I was going to make a suggestion similar to Luke's.

I think the core issue is this: right now, if you compare FxOS to Firefox on Android on Octane, on identical/similar devices, FxOS will look significantly worse.  Many of our partners are doing exactly this, and they're looking at next gen FxOS devices.

Justin Lebar (not reading bugmail)

Comment 12

•

11 years ago

Okay, it sounds like we're all in agreement.  Just one more piece of clarification:

> I would expect Octane numbers to be identical between Android and Firefox OS.

I'm not sure we should expect this.  I don't know whether Android or the FFOS browser has more RAM available to it, on a device with Xmb of RAM, but that could reasonably affect the scores or how we tune the GC.

Mike Lee [:mlee]

Updated

•

11 years ago

Whiteboard: [c= ] → [MemShrink] [c= ]

Jet Villegas (inactive)

Comment 13

•

11 years ago

Our boot image for the Nexus 4 throttles the device down (Nexus4-HW-Shrink-Helix-like-boot.img):

  1. CPU Cores: 2
  2. CPU Freq.: 1 GHz
  3. GPU Freq.: 320 MHz
  4. Framebuffer: 480 x 800
  5. Memory: 512 MB

Original Nexus 4 Spec:
  1. CPU Cores: 4
  2. CPU Freq.: 1.7 GHz
  3. GPU Freq.: 400 MHz
  4. Framebuffer: 720 x 1280
  5. Memory: 2 GB

This doesn't appear to be a MemShrink issue.

Whiteboard: [MemShrink] [c= ] → [c= ]

Vladimir Vukicevic [:vlad] [:vladv] (needinfo me, slow to respond)

Reporter

Comment 14

•

10 years ago

Bumping this to ? again -- we're shipping 1.4 on newer/faster/more modern devices, and the GC tuning can make a huge difference in performance.  Let's not be shooting ourselves in the foot.

blocking-b2g: --- → 1.4?

Ben Kelly [:bkelly, not reviewing]

Comment 15

•

10 years ago

(In reply to Vladimir Vukicevic [:vlad] [:vladv] from comment #14)
> Bumping this to ? again -- we're shipping 1.4 on newer/faster/more modern
> devices, and the GC tuning can make a huge difference in performance.  Let's
> not be shooting ourselves in the foot.

Does this accurately describe tarako?  That seems like a more constrained device.  I wonder if we need multiple sets of tuning parameters somehow based on the hardware.

Nicolas B. Pierron [:nbp]

Comment 16

•

10 years ago

(In reply to Vladimir Vukicevic [:vlad] [:vladv] from comment #14)
> Bumping this to ? again -- we're shipping 1.4 on newer/faster/more modern
> devices, and the GC tuning can make a huge difference in performance.  Let's
> not be shooting ourselves in the foot.

I already added a function which currently distinguishes between desktop and mobile, so that we can emulate the GC settings of a phone in the JS shell.  It is just a matter of plumbing to add a preference which sets the GC settings based on the memory available on the device.

(In reply to Ben Kelly [:bkelly] from comment #15)
> I wonder if we need multiple sets of tuning parameters somehow
> based on the hardware.

As I discussed with Vlad, this is not the ideal solution, but this is an easy target as soon as we can make a good benchmark for looking at GC pauses.  The benchmark I made for tweaking the results for the Unagi was extremely noisy and this made everything hard to tune.

I am sure there are more variables in the problem than the memory available on the device, such as the memory latency, but we should gather many more devices before investigating these other variables.

Assignee: general → nicolas.b.pierron

Flags: needinfo?(nicolas.b.pierron)

Dave Huseby [:huseby]

Comment 17

•

10 years ago

Do we need this for the tarako memory work?  It seems like there is work here to use the benchmark for the Unagi tweaking on the Tarako for the same purpose.

Flags: needinfo?(jcheng)

Flags: needinfo?(bkelly)

Ben Kelly [:bkelly, not reviewing]

Comment 18

•

10 years ago

I imagine re-evaluating our tuning for a device with half the memory of our other devices would be a good idea.

I think the open question, though, is how to support tarako and the larger devices with a single code base.  If I understand correctly, Nicolas is working on this.  He'll need to indicate if it's possible in the short term for tarako.

Flags: needinfo?(bkelly)

Kyle Huey (Exited; not receiving bugmail, old account, do not use)

Comment 19

•

10 years ago

The GC parameters really need to be tuned per-device (or at least tuned separately for each amount of memory) for best performance.

Nicolas B. Pierron [:nbp]

Comment 20

•

10 years ago

(In reply to Dave Huseby [:huseby] from comment #17)
> Do we need this for the tarako memory work?  It seems like there is work
> here to use the benchmark for the Unagi tweaking on the Tarako for the same
> purpose.

All I need for tuning for a device is:
 - A device.
 - Good benchmarks.
 - A month of benchmarking (to plot this 7-dimensional space).
 - A week to understand and refine the previous search.

At the moment, I have none of these.

Ben Kelly [:bkelly, not reviewing]

Comment 21

•

10 years ago

Mike, what can we do to get Nicolas a tarako device?

Flags: needinfo?(mlee)

Ben Kelly [:bkelly, not reviewing]

Comment 22

•

10 years ago

Nicolas, what benchmarks did you use when tuning for the unagi?  Are those not applicable for other devices?  Maybe I'm not sure what you mean by "benchmark".

Nicolas B. Pierron [:nbp]

Comment 23

•

10 years ago

I used Octane, and a modified version of the incremental GC test made by Bill, which I called snappy [1].  Snappy was not really useful as it was extremely noisy, and recently Octane is becoming noisy too.  These benchmarks should be refined to ensure that there is as little noise as possible.

[1] http://people.mozilla.org/~npierron/snappy-bench/
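
One simple way to put a number on "noisy" is the coefficient of variation across repeated runs of the same configuration. This is a minimal sketch, not the method used for the Unagi tuning; the function name is hypothetical and the run scores below are made up for illustration and are not measurements from this bug.

```python
from statistics import mean, stdev


def coefficient_of_variation(scores):
    """Sample standard deviation of the scores divided by their mean."""
    return stdev(scores) / mean(scores)


# Made-up scores for illustration only.
stable_runs = [1000, 1010, 990, 1005, 995]
noisy_runs = [1000, 700, 1300, 850, 1150]

if __name__ == "__main__":
    print(f"stable: {coefficient_of_variation(stable_runs):.1%}")
    print(f"noisy:  {coefficient_of_variation(noisy_runs):.1%}")
```

If the variation between repeated runs is as large as the effect of changing a GC parameter, the benchmark cannot tell two settings apart, which is the tuning problem described above.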

Mike Lee [:mlee]

Comment 24

•

10 years ago

Joe, can we get Nicolas a Tarako device?

Status: NEW → ASSIGNED

Flags: needinfo?(mlee)

Whiteboard: [c= ] → [c=benchmark p= s= u=]

Joe Cheng [:jcheng] (please needinfo)

Comment 25

•

10 years ago

1.3T? to discuss this during Tarako triage

blocking-b2g: 1.4? → 1.3T?

Flags: needinfo?(jcheng)

Joe Cheng [:jcheng] (please needinfo)

Comment 26

•

10 years ago

triage: not going to be in time for tarako timeframe. minus

blocking-b2g: 1.3T? → -

Geo Mealer [:geo] -- This account is inactive after 2015-07-07

Updated

•

10 years ago

Priority: -- → P3

Nicolas B. Pierron [:nbp]

Updated

•

9 years ago

Depends on: 1216286

Nicolas B. Pierron [:nbp]

Updated

•

9 years ago

Assignee: nicolas.b.pierron → nobody

Status: ASSIGNED → NEW

Component: JavaScript Engine → JavaScript: GC

Nicolas B. Pierron [:nbp]

Updated

•

9 years ago

Flags: needinfo?(nicolas.b.pierron)

Jon Coppeard (:jonco)

Updated

•

7 years ago

Status: NEW → RESOLVED

Closed: 7 years ago

Resolution: --- → INVALID