Closed Bug 1042305 Opened 6 years ago Closed 6 years ago

Fishietank does not run on 512MB FFOS v2.0

Categories

(Core :: Graphics, defect, P1)

ARM
Gonk (Firefox OS)
defect

Tracking

()

VERIFIED FIXED
mozilla34
blocking-b2g 2.0+
Tracking Status
firefox32 --- wontfix
firefox33 --- wontfix
firefox34 --- fixed
b2g-v1.3T --- unaffected
b2g-v1.4 --- unaffected
b2g-v2.0 --- verified
b2g-v2.1 --- verified
b2g-v2.2 --- verified

People

(Reporter: tkundu, Assigned: sotaro)

References

(Depends on 1 open bug, )

Details

(Keywords: regression, Whiteboard: [c=regression p= s= u=2.0] [caf priority: p2][CR 695311] [2.1-flame-test-run-1])

Attachments

(5 files)

It is observed that FishIETank Benchmark does not give any result. Once the URL :: http://ie.microsoft.com/testdrive/Performance/FishIETank/Default.html is typed into the default browser address bar, and enter key is hit, no score is displayed. 
The benchmark fails to run.

STEPS:: 

1.) Open Default Browser.
2.) Enter URL :: http://ie.microsoft.com/testdrive/Performance/FishIETank/Default.html

Expected Result : Benchmark should run and Result in FPS should be displayed.

Actual Result:: Benchmark Does not run.

It is still working in v1.4 gaia/gecko (512MB build config) 

But it is not working in v2.0 gaia/gecko (512MB build config): 

V2.0 gaia: https://www.codeaurora.org/cgit/quic/lf/b2g/mozilla/gaia/commit/?h=mozilla/v2.0&id=5f8b1b8a2da9e3b531eee817a669f57fa4d9b9c6

V2.0 gecko: https://www.codeaurora.org/cgit/quic/lf/b2g/mozilla/gecko/commit/?h=mozilla/v2.0&id=e00f7e464333689fcf54edb4945ece94f97f930b
[Blocking Requested - why for this release]:
blocking-b2g: --- → 2.0?
Whiteboard: [CR 695311]
Whiteboard: [CR 695311] → [caf priority: p2][CR 695311]
blocking-b2g: 2.0? → 2.0+
I'm going to push this into Performance first given that it's a bug with a benchmark. The window though will likely help us identify root cause, so we'll want to move the component here again after finding the window.
Component: General → Performance
No longer blocks: CAF-v2.0-FC-metabug
Tapas, I am trying to understand what this bug looks like when it reproduces.  I have looked at builds from 1.4, 2.0, 2.1 Flame.  Is there any way you could provide more details concerning your Actual Results?  If possible, a video of the bug would be very helpful.  Thanks!
Flags: needinfo?(tkundu)
QA Contact: ddixon
ni myself to check the regression window later
Flags: needinfo?(pchang)
Unable to provide Regression Window. 

Issue DOES occur in earliest Flame Master build we have access to. 

Environmental Variables:
Device: Flame Master
Build ID: 20140417000006
Gaia: 7591e9dc782ac2e97d63a96f9deb71c7b3588328
Gecko: e71ed4135461
Version: 31.0a1 (Master)
Firmware Version: v122
User Agent: Mozilla/5.0 (Mobile; rv:31.0) Gecko/31.0 Firefox/31.0
QA Whiteboard: [QAnalyst-Triage?]
Flags: needinfo?(jmitchell)
QA Whiteboard: [QAnalyst-Triage?] → [QAnalyst-Triage+]
Flags: needinfo?(jmitchell)
QA Whiteboard: [QAnalyst-Triage+] → [QAnalyst-Triage+][lead-review+]
Duane, can you clarify comment #5 ? Can you confirm the branches/devices this is reproducible on.

Also :NI on :mlee to get started here as its a benchmark bug.
Flags: needinfo?(mlee)
Flags: needinfo?(ddixon)
(In reply to Duane Dixon [:ddixon] from comment #3)
> Tapas, I am trying to understand what this bug looks like when it
> reproduces.  I have looked at builds from 1.4, 2.0, 2.1 Flame.  Is there any
> way you could provide more details concerning your Actual Results?  If
> possible, a video of the bug would be very helpful.  Thanks!

Here is the video : https://drive.google.com/file/d/0B1cSMS8_GuAEc0V5d3BxNjIwUWc/edit?usp=sharing
Flags: needinfo?(tkundu)
(In reply to Tapas Kumar Kundu from comment #7)
> (In reply to Duane Dixon [:ddixon] from comment #3)
> > Tapas, I am trying to understand what this bug looks like when it
> > reproduces.  I have looked at builds from 1.4, 2.0, 2.1 Flame.  Is there any
> > way you could provide more details concerning your Actual Results?  If
> > possible, a video of the bug would be very helpful.  Thanks!
> 
> Here is the video :
> https://drive.google.com/file/d/0B1cSMS8_GuAEc0V5d3BxNjIwUWc/edit?usp=sharing

Please note that this video is collected for v2.0 .
(In reply to bhavana bajaj [:bajaj] from comment #6)
> Duane, can you clarify comment #5 ? Can you confirm the branches/devices
> this is reproducible on.
> 
> Also :NI on :mlee to get started here as its a benchmark bug.


I was able to reproduce this issue with the same results seen in video: 
https://drive.google.com/file/d/0B1cSMS8_GuAEc0V5d3BxNjIwUWc/edit?usp=sharing

Branch Check:

Issue DOES reproduce on Flame 2.1, 2.0, 1.4 builds, and v122 Base image. 
Issue also reproduces on 2.1 Buri build.  

----------------------------------------
Flame 1.4 Repro? No

Environmental Variables:
Device: Flame 1.4
Build ID: 20140728085914
Gaia: eb3b185325901d4c04e2d43eb58d90835213bea9
Gecko: 8c8883bb5797
Version: 30.0 (1.4)
Firmware Version: v122
User Agent: Mozilla/5.0 (Mobile; rv:30.0) Gecko/30.0 Firefox/30.0

-------------------------------------------

Flame 2.0 Repro? Yes

Device: Flame 2.0
Build ID: 20140728104213
Gaia: 90a5a08192586f3e91852053e76fbb8321273f9d
Gecko: 9d7b4ff43468
Version: 32.0 (2.0)
Firmware Version: v122
User Agent: Mozilla/5.0 (Mobile; rv:32.0) Gecko/32.0 Firefox/32.0

--------------------------------------------

Flame 2.1 Repro? Yes

Device: Flame Master
Build ID: 20140728123640
Gaia: fadfafa17f5175203b8b9457bfb95e5816f54f58
Gecko: 75fe3b8f592c
Version: 34.0a1 (Master)
Firmware Version: v122
User Agent: Mozilla/5.0 (Mobile; rv:34.0) Gecko/34.0 Firefox/34.0

---------------------------------------------

Flame v122 Base Image  Repro? Yes

Device: Flame 1.3
Build ID: 20140616171114
Gaia: e1b7152715072d27e0880cdc6b637f82fa42bf4e
Gecko: e181a36ebafaa24e5390db9f597313406edfc794
Version: 28.0 (1.3)
Firmware Version: v122
User Agent: Mozilla/5.0 (Mobile; rv:28.0) Gecko/28.0 Firefox/28.0

----------------------------------------------

Buri 2.1 Repro? Yes

Device: Buri Master
Build ID: 20140728123640
Gaia: fadfafa17f5175203b8b9457bfb95e5816f54f58
Gecko: 75fe3b8f592c
Version: 34.0a1 (Master)
Firmware Version: v1.2device.cfg
User Agent: Mozilla/5.0 (Mobile; rv:34.0) Gecko/34.0 Firefox/34.0
Flags: needinfo?(ddixon)
Triage: Hi Naveed, is it possible that someone on the JS team can look into this issue? Thanks!
Flags: needinfo?(nihsanullah)
Whiteboard: [caf priority: p2][CR 695311] → [caf priority: p2][CR 695311][c=memory p= s= u=2.0]
NI some folks with b2g/ARM hardware. It'd be good to know if this is an OOM or another issue in the JS engine/JITs.
Flags: needinfo?(nihsanullah)
Flags: needinfo?(nicolas.b.pierron)
Flags: needinfo?(mrosenberg)
Flags: needinfo?(dtc-moz)
I tried the following to reproduce this:

Flash Flame to recent build of B2G 2.1.
Limit RAM to 512MB with fastboot oem mem command
Load page in browser.

Results:

The initial screen is drawn incorrectly as in the video and no benchmark results are shown, although fish sometimes appear and disappear from the tank.
However, putting the phone into landscape mode causes the screen to be redrawn correctly.  Rotating back to portrait gets a correct display in this orientation too and after a couple of seconds a score appears.

I don't see anything to indicate this is a JS problem so far.
With a flame v2.1 running with 512mb,

I think this is an OOM.  I am able to get the benchmarks run either (1) after loading the page, quickly zooming on the FPS counter (only the FPS counter) while textures are being loaded.  The JS is still being executed, which means that there is no logical fault involved in the way the benchmark is run by the JavaScript VM; (2) after loading the page, moving the phone to landscape and to portait mode.  The benchmark run fine with the 20 fishes.

I think this is a OOM which is properly handled because (as seen in the video and reproduced on the phone) the animation is restarted.  A few rare times the OOM is not properly handled and an error page is rendered instead.

Zooming out while the benchmark believes to be at 101x140 show 28FPS on the flame with 512mb.  And the benchmarks runs perfectly.  This might be a bug, because when we are zoomed on the FPS counter the fishes are not rendered, and we would expect to only have the FPS of the JavaScript and no overhead from the WebGL context.

101x140: 9 FPS (FPS counter visible)
101x140: 28 FPS (zoomed out, while the benchmark still render at 101x140)
504x701: 13 FPS
685x952: 13 FPS
980x1362: (*)above 7 FPS  (after switching from landscape and back to portait)

Note: Why the hell do we have a screen of 980x1362 on a phone ?

(*) This is just a terrible benchmark which does not output any stable nor deterministic results, why are we still using this benchmark?
Flags: needinfo?(nicolas.b.pierron)
I see the same problem with a flame v2.1 with 1gb.
The same work-around are working too.

So this might not be an OOM after all, but just the fact that the resizing of the windows works around the initialization bug.
I do not see anything alarming in this memory report, this only werid thing is that the Addon-SDK toolkit/loader.js is loaded even if I haven't enabled/used the devtools on the phone.
Flags: needinfo?(mlee)
Not sure if it's related but debug builds give an assertion failure on both Firefox OS and Firefox for Android:

Firefox OS:
F/MOZ_Assert( 1193): Assertion failure: !isStackAddress (Please don't pass stack arrays to the GL. Consider using HeapCopyOfStackArray. See bug 1005658.), at gfx/gl/GLContext.cpp:1818

Android:
F/MOZ_Assert(12080): Assertion failure: !isStackAddress (Please don't pass stack arrays to the GL. Consider using HeapCopyOfStackArray. See bug 1005658.), at gfx/gl/GLContext.cpp:1818

Might be an issue in GrGpuGL::uploadTexData().

(gdb) back
#0  0xb4f85242 in AssertNotPassingStackBufferToTheGL (ptr=<optimized out>) at gfx/gl/GLContext.cpp:1815
#1  mozilla::gl::GLContext::AssertNotPassingStackBufferToTheGL (ptr=<optimized out>) at gfx/gl/GLContext.cpp:1791
#2  0xb4f85514 in raw_fTexImage2D (pixels=0xbee1ccb8, type=5121, format=32993, border=0, height=313, width=3, internalformat=<optimized out>, level=0, target=3553, this=0xb1e52800)
    at gfx/gl/GLContext.h:1560
#3  mozilla::gl::GLContext::fTexImage2D (this=0xb1e52800, target=3553, level=0, internalformat=<optimized out>, width=3, height=313, border=0, format=32993, type=5121, pixels=0xbee1ccb8)
    at gfx/gl/GLContext.h:1576
#4  0xb4f951b4 in glTexImage2D_mozilla (target=3553, level=0, internalformat=32993, width=<optimized out>, height=313, border=0, format=32993, type=5121, pixels=0xbee1ccb8)
    at gfx/gl/SkiaGLGlue.cpp:460
#5  0xb5e57db4 in GrGpuGL::uploadTexData (this=0xb1e54800, desc=..., isNewTexture=<optimized out>, left=0, top=0, width=Cannot access memory at address 0x0
) at gfx/skia/trunk/src/gpu/gl/GrGpuGL.cpp:665
#6  0xb5e5bec8 in GrGpuGL::onCreateTexture (this=0xb1e54800, desc=..., srcData=0xae7bdd78, rowBytes=19648) at gfx/skia/trunk/src/gpu/gl/GrGpuGL.cpp:961
#7  0xb5e260e8 in GrGpu::createTexture (this=0xb1e54800, desc=..., srcData=0xae7bdd78, rowBytes=19648) at gfx/skia/trunk/src/gpu/GrGpu.cpp:122
#8  0xb5e33ae4 in GrContext::createTexture (this=0xb24a03e0, params=<optimized out>, desc=..., cacheID=..., srcData=0xae7bdd78, rowBytes=19648, cacheKey=0xbee20e44)
    at gfx/skia/trunk/src/gpu/GrContext.cpp:384
#9  0xb5e3bd5a in sk_gr_create_bitmap_texture (origBitmap=..., params=0xbee212d8, cache=<optimized out>, ctx=0xb24a03e0) at gfx/skia/trunk/src/gpu/SkGr.cpp:176
#10 GrLockAndRefCachedBitmapTexture (ctx=0xb24a03e0, bitmap=..., params=0xbee212d8) at gfx/skia/trunk/src/gpu/SkGr.cpp:227
#11 0xb5e3be34 in set (params=0xbee212d8, bitmap=..., device=0xb1e7cd00, this=0xbee20f08) at gfx/skia/trunk/src/gpu/SkGpuDevice.cpp:103
#12 SkGpuDevice::SkAutoCachedTexture::SkAutoCachedTexture (this=0xbee20f08, device=0xb1e7cd00, bitmap=..., params=0xbee212d8, texture=0xbee20f04) at gfx/skia/trunk/src/gpu/SkGpuDevice.cpp:83
#13 0xb5e4a6cc in SkGpuDevice::internalDrawBitmap (this=0xb1e7cd00, bitmap=..., srcRect=..., params=..., paint=..., flags=SkCanvas::kNone_DrawBitmapRectFlag, bicubic=false)
    at gfx/skia/trunk/src/gpu/SkGpuDevice.cpp:1505
#14 0xb5e4ad72 in SkGpuDevice::drawTiledBitmap (this=0xb1e7cd00, bitmap=..., srcRect=..., clippedSrcIRect=<optimized out>, params=..., paint=..., flags=SkCanvas::kNone_DrawBitmapRectFlag, tileSize=1024, 
    bicubic=false) at gfx/skia/trunk/src/gpu/SkGpuDevice.cpp:1440
#15 0xb5e4b0da in SkGpuDevice::drawBitmapCommon (this=0xb1e7cd00, draw=<optimized out>, bitmap=..., srcRectPtr=0xbee213f0, dstSizePtr=0xbee213d8, paint=..., flags=SkCanvas::kNone_DrawBitmapRectFlag)
    at gfx/skia/trunk/src/gpu/SkGpuDevice.cpp:1370
#16 0xb5e4b25e in SkGpuDevice::drawBitmapRect (this=0xb1e7cd00, origDraw=<optimized out>, bitmap=..., src=<optimized out>, dst=..., paint=..., flags=SkCanvas::kNone_DrawBitmapRectFlag)
    at gfx/skia/trunk/src/gpu/SkGpuDevice.cpp:1707
#17 0xb5dc8968 in SkCanvas::internalDrawBitmapRect (this=0xb1e37500, bitmap=..., src=0xbee21630, dst=..., paint=<optimized out>, flags=SkCanvas::kNone_DrawBitmapRectFlag)
    at gfx/skia/trunk/src/core/SkCanvas.cpp:2078
#18 0xb4f6c3d2 in mozilla::gfx::DrawTargetSkia::DrawSurface (this=0xb29e74c0, aSurface=<optimized out>, aDest=..., aSource=<optimized out>, aSurfOptions=..., aOptions=...)
    at gfx/2d/DrawTargetSkia.cpp:340
#19 0xb5380626 in mozilla::dom::CanvasRenderingContext2D::DrawImage (this=0xb1edb600, image=<optimized out>, sx=<optimized out>, sy=<optimized out>, sw=<optimized out>, sh=<optimized out>, dx=-153.5, dy=
    -156.5, dw=307, dh=313, optional_argc=6 '\006', error=...) at dom/canvas/CanvasRenderingContext2D.cpp:3432
#20 0xb510d538 in DrawImage (error=..., dh=313, dw=<optimized out>, dy=<optimized out>, dx=<optimized out>, sh=<optimized out>, sw=<optimized out>, sy=<optimized out>, sx=<optimized out>, image=..., this=
    0xb1edb600) at ../../dist/include/mozilla/dom/CanvasRenderingContext2D.h:276
Flags: needinfo?(dtc-moz)
Jeff, do you have any clue if the assertion seen in a debug build (comment 16) might be related to the flickering that we see in the video (comment 7) and on the phones?
Flags: needinfo?(jgilbert)
(In reply to Nicolas B. Pierron [:nbp] from comment #17)
> Jeff, do you have any clue if the assertion seen in a debug build (comment
> 16) might be related to the flickering that we see in the video (comment 7)
> and on the phones?

Probably unrelated, but maybe not.
Depends on: 1005658
Flags: needinfo?(jgilbert)
It looks like it's a temporary buffer which Skia builds on the stack, at:
http://mxr.mozilla.org/mozilla-central/source/gfx/skia/trunk/src/gpu/gl/GrGpuGL.cpp#552

:snorp or :gw280 would know better if we can change all these to use (scoped) heap buffers.
Flags: needinfo?(gwright)
(In reply to Jeff Gilbert [:jgilbert] from comment #18)
> (In reply to Nicolas B. Pierron [:nbp] from comment #17)
> > Jeff, do you have any clue if the assertion seen in a debug build (comment
> > 16) might be related to the flickering that we see in the video (comment 7)
> > and on the phones?
> 
> Probably unrelated, but maybe not.

That is, it's probably worth testing.
Whiteboard: [caf priority: p2][CR 695311][c=memory p= s= u=2.0] → [c=regression p= s= u=2.0] [caf priority: p2][CR 695311]
Flags: needinfo?(pchang)
ni? Thinker for performance bug.
Flags: needinfo?(tlee)
The Skia code in question is an allocator that allocates on the stack if below a certain threshold (128*128 in this case). So we will see a bunch of allocations on the stack for small stuff.

We can probably switch it to use the heap all the time, but I'm not sure what sort of performance implications that will bring.

Snorp - what do you think?
Flags: needinfo?(gwright)
(In reply to George Wright (:gw280) from comment #22)
> The Skia code in question is an allocator that allocates on the stack if
> below a certain threshold (128*128 in this case). So we will see a bunch of
> allocations on the stack for small stuff.
> 
> We can probably switch it to use the heap all the time, but I'm not sure
> what sort of performance implications that will bring.
> 
> Snorp - what do you think?
Flags: needinfo?(snorp)
(In reply to Mike Lee [:mlee] from comment #23)
> (In reply to George Wright (:gw280) from comment #22)
> > The Skia code in question is an allocator that allocates on the stack if
> > below a certain threshold (128*128 in this case). So we will see a bunch of
> > allocations on the stack for small stuff.
> > 
> > We can probably switch it to use the heap all the time, but I'm not sure
> > what sort of performance implications that will bring.
> > 
> > Snorp - what do you think?

I would think one malloc shouldn't kill us when doing texture upload, so I think it's fine.
Flags: needinfo?(snorp)
Flags: needinfo?(tlee)
:gw280, can you please help with a plausible fix here given snorp's response ?
Flags: needinfo?(gwright)
Incidentally, I don't think that this stack/heap issue is anything to do with this actual bug. I suspect that it's an unrelated problem that just happens to be showing up in the logs.
Flags: needinfo?(gwright)
Attached patch heap.patchSplinter Review
So here's a quick and dirty patch to use the heap instead of the stack always for temp storage in GrGL*. I'll test it locally but if someone in QA could also verify it because I'm not sure I have the right devices on me to check.
Tapas -- when you get a chance please try this patch.
Flags: needinfo?(tkundu)
So with the heap allocator I'm still seeing the same results as the video in comment 7, so that's not the cause of the issue. Would like QA to double check for me though as I could have some other bustedness in my setup!
The debug build no longer crashes, but there is still an issue as described in comment 12.
(In reply to George Wright (:gw280) from comment #29)
> So with the heap allocator I'm still seeing the same results as the video in
> comment 7, so that's not the cause of the issue. Would like QA to double
> check for me though as I could have some other bustedness in my setup!

The feedback from Tapas here should be able to verify this for us.
Assignee: nobody → gwright
Inder, you also mentioned this is a 2.0 regression that CAF observed, so leaving the NI on you to help clairfy as discussed on the call.
Just to confirm, the reason the debug build no longer crashes is because we are no longer failing the assert.
I think we're going to need a more accurate regression window, because as far as I'm aware, nothing has gone into Skia or Canvas2D that would account for this regression.
(In reply to George Wright (:gw280) from comment #27)
> Created attachment 8470203 [details] [diff] [review]
> heap.patch
> 
> So here's a quick and dirty patch to use the heap instead of the stack
> always for temp storage in GrGL*. I'll test it locally but if someone in QA
> could also verify it because I'm not sure I have the right devices on me to
> check.

this patch does not solve this issue and we are still hitting same issue .

This isssue is not present in v1.4 and v1.3 . And its a regression in 2.0 .
Flags: needinfo?(tkundu)
I also see "Flame 1.4 Repro? No" in Comment 9 which is confirmed my observation in Comment 35
here is the exact sha1 for v1.4 which can run fishietank benchmark:

https://www.codeaurora.org/cgit/quic/lf/b2g/mozilla/gaia/tag/?h=mozilla/v1.4&id=AU_LINUX_GECKO_B2G_KK_3.5.01.04.00.113.176
https://www.codeaurora.org/cgit/quic/lf/b2g/mozilla/gecko/tag/?h=mozilla/v1.4&id=AU_LINUX_GECKO_B2G_KK_3.5.01.04.00.113.176

But I am unable to find any SHA1 in v2.0 which can run fishietank benchmark. Most likely, this regression is in v2.0 from very beginning.
OK, so it was probably the Skia update in Bug 985217 that caused this.
I'm going to start investigating this to gather as much information about it as possible before I'm away on Friday, but is it possible to get a minimal testcase for this? I'm going to see if I can get it to reproduce on Fennec as that will make debugging significantly easier.
Flags: needinfo?(tkundu)
Filed bug 1053492 relating to the heap/stack allocation issue
Flags: in-moztrap?(ychung)
New test case needs to be added. There is no existing test case.
QA Whiteboard: [QAnalyst-Triage+][lead-review+] → [QAnalyst-Triage?][lead-review+]
Flags: needinfo?(ktucker)
(In reply to George Wright (:gw280 PTO Aug 15th - 24th) from comment #39)
> I'm going to start investigating this to gather as much information about it
> as possible before I'm away on Friday, but is it possible to get a minimal
> testcase for this? I'm going to see if I can get it to reproduce on Fennec
> as that will make debugging significantly easier.

Please try STR (Comment 0) on Flame 512MB device.
Flags: needinfo?(tkundu)
Possibly related: I don't see any fish being rendered on FishIE Tank on my HTC One running stock Android (debug build). This is before the Skia rebase landed.

The FPS counter is showing about 8-9fps, but no fish being rendered.
Oh I may have spoken too soon. They're being rendered, just they all happened to be off-screen for a very long time :)
QA Wanted - branch checks need to re-analyzed here, as it's suspected that this should not be reproducing on 1.3.
QA Whiteboard: [QAnalyst-Triage?][lead-review+] → [QAnalyst-Triage?]
Keywords: qaurgent, qawanted
I performed a re-check on the latest base image. 
Issue DOES reproduce on v123 base image. 

Actual Results: FishIEtank does not fill entire display of the phone and does not run correctly. 

New Video: https://www.youtube.com/watch?edit=vd&v=vjidNth65Lc

Environmental Variables:
Device: Flame 1.3
Build ID: 20140627162151
Gaia: 5c43d012a9351e8aaeacda0b87b0886b7f67d33d
Gecko: e181a36ebafaa24e5390db9f597313406edfc794
Version: 28.0 (1.3)
Firmware Version: v123
User Agent: Mozilla/5.0 (Mobile; rv:28.0) Gecko/28.0 Firefox/28.0
Flags: needinfo?(jmitchell)
Keywords: qawanted
QA Whiteboard: [QAnalyst-Triage?] → [QAnalyst-Triage+]
Flags: needinfo?(jmitchell)
Keywords: qaurgent, regression
QA Whiteboard: [QAnalyst-Triage+] → [QAnalyst-Triage+][2.0-signoff-need+]
Test case added in moztrap:

https://moztrap.mozilla.org/manage/case/14333/
Flags: needinfo?(ktucker)
Flags: in-moztrap?(ychung)
Flags: in-moztrap+
(In reply to Duane Dixon [:ddixon] from comment #46)
> I performed a re-check on the latest base image. 
> Issue DOES reproduce on v123 base image. 
> 
> Actual Results: FishIEtank does not fill entire display of the phone and
> does not run correctly. 
> 
> New Video: https://www.youtube.com/watch?edit=vd&v=vjidNth65Lc
> 
> Environmental Variables:
> Device: Flame 1.3
> Build ID: 20140627162151
> Gaia: 5c43d012a9351e8aaeacda0b87b0886b7f67d33d
> Gecko: e181a36ebafaa24e5390db9f597313406edfc794
> Version: 28.0 (1.3)
> Firmware Version: v123
> User Agent: Mozilla/5.0 (Mobile; rv:28.0) Gecko/28.0 Firefox/28.0

Hey :ddixon, that is not really the issue we are trying to find out. Goal was to check if the URL reported in the screenshot using 1.3 and 2.0 you should:

Actual result : Fishes should be moving and fps counter mush show valid fps

Can you please try this and see if there is any difference you are seeing in 1.3 vs 2.0 ?
Flags: needinfo?(ddixon)
Keywords: qaurgent
Bhavana, yes, there is a difference in behaviour between the 1.3 and 2.0 builds.  

Actual Results for the v123 base image:  
Fish Tank image does not fill display, fish are on screen, and the FPS counter reads between 20-30 frames per second. 

(Flame Device tested with 512 and 319 MB memory.)
 
Device: Flame 1.3
Build ID: 20140627162151
Gaia: 5c43d012a9351e8aaeacda0b87b0886b7f67d33d
Gecko: e181a36ebafaa24e5390db9f597313406edfc794
Version: 28.0 (1.3)
Firmware Version: v123
User Agent: Mozilla/5.0 (Mobile; rv:28.0) Gecko/28.0 Firefox/28.0
----------------------------------------------------------------------------------------------
Actual Results for 2.0 Flame build:
Fish Tank image does not fill display, fish do not stay on screen, FPS counter does not work correctly. 

Device: Flame 2.0
Build ID: 20140815000200
Gaia: 295c7f50707372e5af6d8e83148d2d970076dfd6
Gecko: 879c5208084f
Version: 32.0 (2.0)
Firmware Version: v123
User Agent: Mozilla/5.0 (Mobile; rv:32.0) Gecko/32.0 Firefox/32.0
QA Whiteboard: [QAnalyst-Triage+][2.0-signoff-need+] → [QAnalyst-Triage?][2.0-signoff-need+]
Flags: needinfo?(ddixon) → needinfo?(pbylenga)
Keywords: qaurgent
QA Whiteboard: [QAnalyst-Triage?][2.0-signoff-need+] → [QAnalyst-Triage+][2.0-signoff-need+]
Flags: needinfo?(pbylenga)
We need a window here focusing on what regressed here, which is involving the fact that the FPS counter failing.
QA Whiteboard: [QAnalyst-Triage+][2.0-signoff-need+] → [2.0-signoff-need+]
the regressing cset will definitely help here, to further narrow it down keeping comment 37 and 38 in mind which have the suspected landing info.

Also, discussed this with :snorp offline and he will help investigate this.
Do we have any actual apps that are showing a canvas performance regression? FishIE tank is just a demo. I agree that regressing performance there might indicate there is a larger problem, but that alone should not block a release.

I don't have a Flame device, and this seems to be working fine under Android, so I'm not sure what else I can do at this moment. If this reproduces on a hamachi I can try to resurrect that.
So, to be clear, FishIE demo on 512mb devices is a CAF blocker?  This bug contains a lot of issues in it, from no score to asserts and OOMs, etc., it's a bit confusing. Anyway, George is back on Monday, the only thing left is to see if unaccelerated canvas gives us the correct result, I imagine at bad performance.
Changing this to graphics so that we don't lose it.
Component: Performance → Graphics
Product: Firefox OS → Core
QA Whiteboard: [2.0-signoff-need+] → [2.0-signoff-need+] [QAnalyst-Triage?]
Flags: needinfo?(dharris)
Whiteboard: [c=regression p= s= u=2.0] [caf priority: p2][CR 695311] → [c=regression p= s= u=2.0] [caf priority: p2][CR 695311] [2.1-flame-test-run-1]
QA Whiteboard: [2.0-signoff-need+] [QAnalyst-Triage?] → [2.0-signoff-need+] [QAnalyst-Triage+]
Flags: needinfo?(dharris)
QA Whiteboard: [2.0-signoff-need+] [QAnalyst-Triage+] → [2.0-signoff-need+]
Right now, we're operating under the assumption that this is a duplicate of bug 1049195 - where in 1.4 we stopped canvases of certain size from getting created under SkiaGL, instead using Skia software, we have since started creating some of those larger canvases under SkiaGL, which takes more memory, and increases the chance of OOM - but gives us better performance when we don't OOM.  See bug 999841 as well.
Issue DOES NOT occur in latest 2.1 Flame build with 512 MB memory. 

Actual Results: After user enters "FishIETank" web address then zooms-in slightly, the FPS counter reads 1-10 FPS. 

Environmental Variables:
Device: Flame Master
BuildID: 20140820042626
Gaia: df39c463259d348396ef7f143c2c780eeb8f02d8
Gecko: f7e9920fe407
Version: 34.0a1 (Master) 
Firmware Version: v123
User Agent: Mozilla/5.0 (Mobile; rv:34.0) Gecko/34.0 Firefox/34.0
QA Whiteboard: [2.0-signoff-need+] → [QAnalyst-Triage?][2.0-signoff-need+]
Flags: needinfo?(jmitchell)
Flags: needinfo?(mvikram)
- b2g-v1.4:The benchmark is started automatically.
           FPS is 20-30.
           The fishtank does not fill the screen automatically.
           SkiaGL is not used.

- b2g-v2.0:The benchmark is not started automatically.
           But it is started by doing pinch and zoom.
           FPS is lower than 10fps.
           The fishtank does not fill the screen automatically.
           By doing pinch and zoom, it can fill the screen.
           SkiaGL is used.

- B2G-v2.1:The benchmark is not started automatically.
           But it is started by doing pinch and zoom.
           FPS is lower than 10fps.
           The fishtank does not fill the screen automatically.
           By doing pinch and zoom, it can fill the screen.
           SkiaGL is used.
Comment 57 is my checked result on each b2g version on flame.
(In reply to Duane Dixon [:ddixon] from comment #56)
> Issue DOES NOT occur in latest 2.1 Flame build with 512 MB memory. 
> 
> Actual Results: After user enters "FishIETank" web address then zooms-in
> slightly, the FPS counter reads 1-10 FPS. 

From Comment 57, it seems same to v2.0. On b2g-v2.0, fps is shown after zooms-in.
When SkiaGL is disabled, the fishtank shows FPS automatically on v2.0 flame.
I confirmed that attachment 8470203 [details] [diff] [review] did not change the symptom on v2.0 flame.
I got a profile on master flame. Most of the time is spending memcpy().

http://people.mozilla.org/~bgirard/cleopatra/#report=78192d75375501f815a70382f035fdd98349b328
Attached file memcpy call stack
Got call stack of memcpy() during Fishtank running.
Got a profile by disabling SkiaGL. memcpy() occupy only 0.1%.

http://people.mozilla.org/~bgirard/cleopatra/#report=4088992004e42711bc8ffc175f408532599ca6cb
When SkiaGL is disabled, Skia is still used for Canvas rendering. When GPU is not used, code patch becomes different at DrawTargetSkia::OptimizeSourceSurface() and memcpy() does not happen. The related code is the following.

http://mxr.mozilla.org/mozilla-central/source/gfx/2d/DrawTargetSkia.cpp#644
In current canvas implementation with SkigGL, each drawImage() causes two image copies. From SourceSurface to SkBitmap and SkBitmap to GPU. It seems ideal if we could remove SourceSurface to SkBitmap data copy.
George, snorp, is it possible to implement like Comment 66?
Flags: needinfo?(snorp)
Flags: needinfo?(gwright)
(In reply to Sotaro Ikeda [:sotaro] from comment #67)
> George, snorp, is it possible to implement like Comment 66?

I forget how this works, but my recollection is that we need to create a SourceSurfaceSkia here, which holds a SkBitmap, which is responsible for the lifetime of the final texture. It may be possible to just pass ownership of the buffer here to the SourceSurfaceSkia and avoid the memcpy, but I doubt it.

I think what might be happening is that the resulting SourceSurfaceSkia is not being cached. I vaguely remember some changes on b2g about this.
Flags: needinfo?(snorp)
CanvasImageCache is what I think may be going bad here.
(In reply to James Willcox (:snorp) (jwillcox@mozilla.com) from comment #69)
> CanvasImageCache is what I think may be going bad here.

Thanks for the advice! The CanvasImageCache seems not work correctly, the cache does not hit at all. I am going to investigate more about it.
When Element and SourceSourface are added to cache by CanvasImageCache::NotifyDrawImage(), the Element was expired because of hitting cache limit. In b2g, it is set to 10MB by the following code.
 http://dxr.mozilla.org/mozilla-central/source/b2g/app/b2g.js#57 

When I locally increase the cache limit to 30MB, the problem is fixed.
That pref was added for 1.3, I believe. I guess we don't see this problem in 1.4 because for some reason SkiaGL isn't being used? Was that from the size limitation I guess?
On v1.4, SkiaGL is not used if the canvas size is more than display size. Fishtank hits this case. Since b2g-v2.0, it is changed by Bug 999841.
I locally tested "canvas.image.cache.limit" by 10MB, 15MB, 20MB, 30MB. Since 20MB, the Fishietank works correctly.
(In reply to Sotaro Ikeda [:sotaro] from comment #73)
> On v1.4, SkiaGL is not used if the canvas size is more than display size.
> Fishtank hits this case. Since b2g-v2.0, it is changed by Bug 999841.

Alright, makes sense. I think it would be silly to raise the cache limit for this one demo, but maybe there are other reasons to do so?

Separately, we should consider demoting to a software canvas if we detect cache thrashing. That should probably be part of the mythical heuristic described in bug 1042291.
Snorp, CanvasImageCache does not release cached when "memory-pressure" even happens. Is it by design?
Flags: needinfo?(snorp)
Flags: needinfo?(gwright)
(In reply to Sotaro Ikeda [:sotaro] from comment #76)
> Snorp, CanvasImageCache does not release cached when "memory-pressure" even
> happens. Is it by design?

Probably an oversight. CanvasImageCache predates that event, I'm sure.
Flags: needinfo?(snorp)
Thanks, it becomes clear :-)
This is turning into a large change.  I'd like to hear from QC that they would take this kind of a patch - where we increase the size of the cache for canvas images, and introduce the new code for reacting to the memory pressure for that cache, and not automatically uplift this.  Michael, can you get us somebody that can make this call?
Assignee: gwright → sotaro.ikeda.g
Flags: needinfo?(mvines)
(In reply to Milan Sreckovic [:milan] from comment #79)
> This is turning into a large change.  I'd like to hear from QC that they
> would take this kind of a patch - where we increase the size of the cache
> for canvas images, and introduce the new code for reacting to the memory
> pressure for that cache, and not automatically uplift this.  Michael, can
> you get us somebody that can make this call?

Yeah, and this is just a demo. Unless we have a real app that runs into these problems, my opinion is that this should just be punted for 2.0.
Not that simple :) OEMs down the pipeline run this demo as a part of their acceptance criteria.
The main concern is not that Fishietank is broken on v2.0 (although it would be nice for it to work no doubt), but rather that we've broken other parts of the web that worked fine with v1.3 and earlier.  I don't know how we could practically assess the risk of that in the timeframe that we'd need to make a decision though, which to me means favoring picking up a fix.

What do you mean by "not automatically uplift this" Milan?  We'll be able to pick up a patch for testing here if that's what you mean?
Flags: needinfo?(mvines)
Flags: needinfo?(mvikram)
Can you please explain the following:
- If the size of the cache should support 980*480, why isn't the canvas being created using the cache?
- If SkiaGL is not used, it should still work (like v1.3).

The fact that CanvasImageCache is not responding to memory pressure events seems like something that needs to be fixed anyway.
(In reply to Mandyam Vikram from comment #83)
> Can you please explain the following:
> - If the size of the cache should support 980*480, why isn't the canvas
> being created using the cache?

The cache is for the images drawn into the canvas, not the canvas itself. What's happening here is that the total size for the images drawn in FishIE (i.e., the fish, background, etc) is larger than the image cache size (10MB), so we end up creating a new SourceSurfaceSkia each time. This apparently results in a lot of memcpy (and later texture upload).

> - If SkiaGL is not used, it should still work (like v1.3).

The problem here is that SkiaGL *is* used. With a software canvas we don't need to copy the image or upload it as a texture, so performance is better -- not as good as what SkiaGL could do with a larger cache, however.

> 
> The fact that CanvasImageCache is not responding to memory pressure events
> seems like something that needs to be fixed anyway.

Yes, that seems like a separate issue. I doubt it would help this bug, though.
(In reply to James Willcox (:snorp) (jwillcox@mozilla.com) from comment #84)
> (In reply to Mandyam Vikram from comment #83)
...
> > The fact that CanvasImageCache is not responding to memory pressure events
> > seems like something that needs to be fixed anyway.
> 
> Yes, that seems like a separate issue. I doubt it would help this bug,
> though.

It helps indirectly, in the sense that if we had memory pressure event handling, we'd feel better about turning up the cache size to, say 20MB, and the thrashing would go away.
(In reply to Milan Sreckovic [:milan] from comment #85)
> (In reply to James Willcox (:snorp) (jwillcox@mozilla.com) from comment #84)
> > (In reply to Mandyam Vikram from comment #83)
> ...
> > > The fact that CanvasImageCache is not responding to memory pressure events
> > > seems like something that needs to be fixed anyway.
> > 
> > Yes, that seems like a separate issue. I doubt it would help this bug,
> > though.
> 
> It helps indirectly, in the sense that if we had memory pressure event
> handling, we'd feel better about turning up the cache size to, say 20MB, and
> the thrashing would go away.

Yup, good point.
(In reply to Milan Sreckovic [:milan] from comment #81)
> Not that simple :) OEMs down the pipeline run this demo as a part of their
> acceptance criteria.

Really? That's terrible. Did they choose this themselves or did we recommend it?
QA Whiteboard: [QAnalyst-Triage?][2.0-signoff-need+] → [2.0-signoff-need+]
Flags: needinfo?(jmitchell)
Attachment #8477675 - Flags: review?(gwright)
George, can you review the patch as soon as possible?
(In reply to Sotaro Ikeda [:sotaro] from comment #89)
> George, can you review the patch as soon as possible?

This bug is b2g-v2.0+. Need to hurry. Thanks!
Blocks: 1049195
Attachment #8477675 - Flags: review?(gwright) → review+
When m-i is re-opened, I am going to check-in soon. m-i is closed by Bug 1058073.
:sotaro,

Can you confirm this landing here fixes the issue and does the patch in comment #93 apply to 2.0 code base as well ? If this works, we can ask CAF to pick it up.
Flags: needinfo?(sotaro.ikeda.g)
It is necessary to create a patch for b2g-v2.0. I am going to create a patch for v2.0.
Flags: needinfo?(sotaro.ikeda.g)
Confirmed that the patch works on v2.0 flame.
https://hg.mozilla.org/mozilla-central/rev/db8e5ad4a738
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla34
Blocks: 1049251
Depends on: 1090309
The issue is verified fixed in Flame 2.0, Flame 2.1, and Flame 2.2:

Environmental Variables:
Device: Flame 2.2 Master (Full flash)(KK)(319mb)
BuildID: 20141029040208
Gaia: 35e87ac4324f0f3abd93dcc70d61c9f37256a0f5
Gecko: 7e3c85754d32
Gonk: 48835395daa6a49b281db62c50805bd6ca24077e
Version: 36.0a1 (2.2 Master)
Firmware: V188
User Agent: Mozilla/5.0 (Mobile; rv:36.0) Gecko/36.0 Firefox/36.0
  
Device: Flame 2.1 (Full flash)(KK)(319mb)
BuildID: 20141029001202
Gaia: eb0aab0f13c78c7ac378ad860e865c4b6eaf669f
Gecko: 318019f80a8e
Gonk: 48835395daa6a49b281db62c50805bd6ca24077e
Version: 34.0 (2.1)
Firmware: V188
User Agent: Mozilla/5.0 (Mobile; rv:34.0) Gecko/34.0 Firefox/34.0

Device: Flame 2.0 (Full flash)(KK)(319mb)
BuildID: 20141029000205
Gaia: 9f5b6f025e528fabfcc068782cb9b492cb51a7f9
Gecko: de8cfd54bf93
Gonk: 48835395daa6a49b281db62c50805bd6ca24077e
Version: 32.0 (2.0)
Firmware: V188
User Agent: Mozilla/5.0 (Mobile; rv:32.0) Gecko/32.0 Firefox/32.0

Result:
The FPS counter is working and giving results when the page is initially loaded.
Status: RESOLVED → VERIFIED
QA Whiteboard: [2.0-signoff-need+] → [QAnalyst-Triage?][2.0-signoff-need+]
Flags: needinfo?(ktucker)
QA Whiteboard: [QAnalyst-Triage?][2.0-signoff-need+] → [QAnalyst-Triage+][2.0-signoff-need+]
Flags: needinfo?(ktucker)
Flags: needinfo?(marty.rosenberg)
You need to log in before you can comment on or make changes to this bug.