Closed Bug 795985 Opened 12 years ago Closed 12 years ago

Optimize gecko to run unmodified peacekeeper benchmark in 256MB RAM

Categories: Core :: General, defect
Version: 18 Branch
Hardware: ARM
OS: Gonk (Firefox OS)
Type: defect
Priority: Not set
Severity: normal

Tracking:
Status: RESOLVED WORKSFORME
blocking-basecamp: -

People: Reporter: jsmith; Assignee: Unassigned

Whiteboard: [MemShrink:P3]

Attachments: 2 files

Steps:

1. Go to the browser
2. Go to http://peacekeeper.futuremark.com/run.action

Expected:

The Peacekeeper benchmark should finish with no errors while running on WiFi.

Actual:

When you get to the first WebGL test (it shows a bubble with rocks going back and forth), it fails to load saying that the page is not responding.
Nominating to block mainly because this is a web-based phone and we're likely to be measured on how well we do on these common tests - it doesn't look good if we can't finish a particular test.
blocking-basecamp: --- → ?
Trying to run this on an Otoro causes the browser to crash at some point during the tests.  I can't really identify where, because the tests don't change the URL or otherwise display anything that's visible on the display (other than the gfx results) while they're running.
APITrace? Else we'll probably need to rip it apart manually.
Attached file Logcat
Attached a logcat with this reproducing. Nothing really stands out as obvious in the logcat though.
(In reply to Jeff Gilbert [:jgilbert] from comment #4)
> APITrace? Else we'll probably need to rip it apart manually.

Can you explain how I could get an APITrace?
(In reply to Jason Smith [:jsmith] from comment #6)
> (In reply to Jeff Gilbert [:jgilbert] from comment #4)
> > APITrace? Else we'll probably need to rip it apart manually.
> 
> Can you explain how I could get an APITrace?

I am not that familiar with running APITrace on mobile devices. Maybe BenWa or Vlad knows more?
Let's block on figuring out why this fails and if it indicates a larger problem.
blocking-basecamp: ? → +
Summary: Cannot complete the peacekeeper benchmark test - fails to load the webgl test on FF OS → Determine why peacekeeper benchmark fails
I'll grab this.
Assignee: nobody → vladimir
So, WebGL is failing in general in many cases and causing the browser to crash.  The cube on get.webgl.com works; WebGL Aquarium causes a crash.

My current guess is that an OOM condition is caused by WebGL usage, which ends up taking down the browser.  Demos such as:

https://www.khronos.org/registry/webgl/sdk/demos/google/nvidia-vertex-buffer-object/index.html

also cause a crash.  (Ignore the "bound vertex attribute buffers do not have sufficient size for given indices from the bound element array" errors there; that's a separate issue.)
Chris, is this benchmark a priority from a product standpoint?
blocking-basecamp: + → ?
Flags: needinfo?(clee)
(In reply to Andrew Overholt [:overholt] from comment #11)
> Chris, is this benchmark a priority from a product standpoint?

I think I'm less concerned right now about the benchmark issue itself and more concerned about comment 10. I'd be inclined to finish off the analysis in comment 10, as Vlad's comments imply there's quite a problem in the WebGL world right now that we should break out into separate bugs once that analysis is complete. If we finish the analysis in comment 10, I think that's sufficient to close this bug. But based on what I'm hearing in comment 10, I would caution against skipping that analysis.
Yep, I'm still on the case here, but got detoured by some other work.  It's going to be a little tricky to figure out if this is indeed the case though, without doing some extensive memory logging.

Do we have anyone who knows how to do fairly low-level debugging on these devices (otoro or unagi)?  Something is causing a reboot -- I'd like to be able to set a breakpoint in the kernel at that point and just figure out how we got there.
(In reply to Vladimir Vukicevic [:vlad] [:vladv] from comment #13)
> Yep, I'm still on the case here, but got detoured by some other work.  It's
> going to be a little tricky to figure out if this is indeed the case though,
> without doing some extensive memory logging.
> 
> Do we have anyone who knows how to do fairly low-level debugging on these
> devices (otoro or unagi)?  Something is causing a reboot -- I'd like to be
> able to set a breakpoint in the kernel at that point and just figure out how
> we got there.

cjones, dhylands, likely mwu, probably jlebar
vlad, reboot sounds like a new symptom.  Can you STR it?
I think reboot may have been something else.

Peacekeeper runs fine on Unagi, so I think I'm going to close this as WORKSFORME.  I suspect Otoro is just running out of memory.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → WORKSFORME
How much USS does "Browser" end up taking on unagi?

Btw, the only reason this doesn't fail on unagi is that we purposely shipped a misconfigured kernel.  When the first FOTA update goes out, we'll ship the properly-configured kernel.
To be clearer, the otoro actually has 256MB of physical RAM.  The unagi has 512MB of physical RAM.

However, the configuration we care about is 256MB.  There's a kernel update we didn't ship for the dogfooders which configures the unagi to pretend to have 256MB.  We'll be installing that as soon as we can roll out FOTA updates.

I'd like to know how much memory the "Browser" uses, because we have a substantial amount of memory win about to land.  We may be able to load this benchmark after those.  If it uses more than 110MB USS though, we have pretty much no hope of loading it.
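
(Not a definitive recipe, but as a minimal sketch: assuming the standard Android procrank tool is present in the Gonk image, the Browser's USS can be read off procrank's last column:

$ adb shell procrank | grep plugin-container

This lists every plugin-container process, so match against the Browser's PID.)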
So, watching Browser on Unagi while it's running, I see jumps like this (only showing lines where either vsize or rss changes drastically):

   VSIZE (KB)  RSS (KB)
   156540 72644 
   773760 83240 
   153084 66860 
   211424 79732 
   242160 111664 
   230064 98524 
   138684 50180 
   259884 64624
   332848 69976
   296760 73328
   197732 75956

I think either that 700MB VSIZE or the 110MB RSS is around when WebGL was running; unfortunately the test prints no progress so it's hard to tell.  However, Browser sits at around 80-90MB RSS for most of the run, so it's very close to the limit you mention.
Spikes up to 110MB RSS are going to be a challenge, but this makes for a good test case :).
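
(For reference, a rough sketch of how samples like the ones above can be gathered; the on-device ps column layout varies between Android builds, so the grep target here is an assumption:

$ while true; do adb shell ps | grep plugin-container; sleep 0.5; done

The VSIZE/RSS columns in that output correspond to the numbers in the table above.)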
Blocks: slim-fast
Status: RESOLVED → REOPENED
Flags: needinfo?(clee)
Resolution: WORKSFORME → ---
Summary: Determine why peacekeeper benchmark fails → Get peacekeeper benchmark running in 256MB RAM
Ha, nice try, buddy! ;)
Assignee: vladimir → nobody
Component: Canvas: WebGL → General
Whiteboard: [MemShrink]
Vlad, do you think we should block the release on this?
Flags: needinfo?(vladimir)
Unless we have some reason to believe that it's even /possible/ to run this benchmark within however much ram we have available, I don't think we can sanely block on this.
Tough to say if we can or can't until we figure out what's actually using that memory.  I don't know that we have good tools to do that quickly; about:memory might help, if we could get a dump of it every few seconds to see what's going on.  One useful thing to do would be to figure out why it's exiting -- that is, which allocation is failing with 256MB.
Flags: needinfo?(vladimir)
> about:memory might help, if we could get a dump of it every few seconds to see what's 
> going on.

Sure, run $B2G_ROOT/tools/get_about_memory.py in a loop.
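
A minimal sketch of such a loop, assuming $B2G_ROOT points at a B2G checkout and the device is connected over adb; the iteration count and interval are arbitrary:

$ cd "$B2G_ROOT"
$ for i in $(seq 1 60); do ./tools/get_about_memory.py; sleep 5; done

Each iteration should leave a separate set of memory reports on the host that can be compared afterwards to see which allocations grow during the benchmark.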
Just realized the bug title is ambiguous, fixed.  (I don't care about peacekeeper itself at all, it's just "fat code".)
Summary: Get peacekeeper benchmark running in 256MB RAM → Optimize gecko to run unmodified peacekeeper benchmark in 256MB RAM
If you don't care about Peacekeeper (I certainly don't), can we pick something we do care about as an example of "fat code" to optimize for?  Like Cut the Rope, or a WebGL game, or something?

I'd rather optimize for something we care about and be pleasantly surprised if we change something so Peacekeeper fits in 256mb of RAM than optimize for this benchmark and hope that it will translate to a page/app we care about.
All I want to do is see if there's something dumb that peacekeeper triggers that we can optimize.

Totally agreed it's way down on the list.
It doesn't sound like this is a terribly high priority for anyone, so I'm moving this to blocking-.

Justin offered to help anyone who is interested in doing measurements with the various tools that we have. But beyond that, his plan was to run tests on actual apps instead, which I agree is a higher priority.
blocking-basecamp: ? → -
Whiteboard: [MemShrink] → [MemShrink:P3]
With the fix from bug 798491, we make it into the WebGL tests.  I see us dying from a null pointer deref in

F/libc    ( 1174): Fatal signal 11 (SIGSEGV) at 0x00000000 (code=1)
E/OMXCodec( 1174): Attempting to allocate OMX node 'OMX.google.avc.decoder'
E/GeckoConsole( 1174): [JavaScript Warning: "Media resource http://peacekeeper.futuremark.com/resources/videos/riverfly01/riverfly01.mp4 could not be decoded." {file: "http://peacekeeper.futuremark.com/runTest.action" line: 0}]
E/GeckoConsole(  106): [JavaScript Warning: "Media resource http://peacekeeper.futuremark.com/resources/videos/riverfly01/riverfly01.mp4 could not be decoded." {file: "http://peacekeeper.futuremark.com/runTest.action" line: 0}]
D/memalloc(  106): /dev/pmem: Allocated buffer base:0x4a500000 size:348160 offset:2461696 fd:92
D/memalloc( 1174): /dev/pmem: Mapped buffer base:0x46000000 size:2809856 offset:2461696 fd:36
I/DEBUG   (  109): *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
I/DEBUG   (  109): Build fingerprint: 'toro/full_otoro/otoro:4.0.4.0.4.0.4/OPENMASTER/eng.cjones.20121026.170606:user/test-keys'
I/DEBUG   (  109): pid: 1174, tid: 5258  >>> /system/b2g/plugin-container <<<
I/DEBUG   (  109): signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 00000000
I/DEBUG   (  109):  r0 fffffffc  r1 425e4c29  r2 425e5040  r3 00000000
I/DEBUG   (  109):  r4 43ae98a0  r5 43c2bb84  r6 43c578e0  r7 00000000
I/DEBUG   (  109):  r8 00000000  r9 4382fd30  10 4106ff71  fp 4106ff6a
I/DEBUG   (  109):  ip 400317b4  sp 4382fc98  lr 425e4c11  pc 00000000  cpsr 00000010

It's certainly possible this is a failed allocation, but the last two 4Hz samples of memory usage at the crash were

Browser          app_0     1174  106   120860 68688 ffffffff 400e6594 R /system/b2g/plugin-container
Browser          app_0     1174  106   121008 64468 ffffffff 400e8330 S /system/b2g/plugin-container

so we're certainly not under memory pressure.  This looks more like one of the decoder bugs.
Attached image Victoire
With bug 810719 flipped on on beta, I finish the test and have these processes still running afterwards

$ adb shell b2g-ps
APPLICATION      USER     PID   PPID  VSIZE  RSS     WCHAN    PC         NAME
b2g              root      105   1     176672 57740 ffffffff 400c9330 S /system/b2g/b2g
FM Radio         app_0     484   105   59732  11744 ffffffff 400db330 S /system/b2g/plugin-container
Clock            app_0     557   105   62812  13136 ffffffff 4004f330 S /system/b2g/plugin-container
Calculator       app_0     582   105   59672  10792 ffffffff 400ec330 S /system/b2g/plugin-container
Feedback         app_0     599   105   58712  11168 ffffffff 400ad330 S /system/b2g/plugin-container
Cost Control     app_0     625   105   60760  14584 ffffffff 40062330 S /system/b2g/plugin-container
(Preallocated a  app_0     652   105   55508  8180  ffffffff 40097330 S /system/b2g/plugin-container
Browser          app_0     667   105   92816  38416 ffffffff 4002f330 S /system/b2g/plugin-container

I don't know whether 128 is a good score on this HW.  (Although, the screen turned off several times during the test.)  But now we can go find out :).
Status: REOPENED → RESOLVED
Closed: 12 years ago12 years ago
Resolution: --- → WORKSFORME