Closed Bug 873378 Opened 6 years ago Closed 6 years ago

Crash Report [@ gfxImageSurface::CopyFrom(gfxImageSurface*) ]

Categories

(Core :: Graphics, defect, critical)

ARM
Android
defect
Not set
critical

Tracking

()

VERIFIED FIXED
mozilla26
Tracking Status
firefox21 --- affected
firefox22 --- affected
firefox23 --- affected
firefox24 --- affected
firefox25 --- affected

People

(Reporter: paul.feher, Assigned: BenWa)

References

()

Details

(Keywords: crash, reproducible, Whiteboard: [native-crash])

Crash Data

Attachments

(4 files, 4 obsolete files)

This bug was filed from the Socorro interface and is 
report bp-f8c343c1-8e20-41c2-82cc-d079f2130517 .
============================================================= 
Aurora 23.0a2 2013-05-16
LG Nexus 4 (Android 4.2.2)

Steps to reproduce:
1) Go to http://people.mozilla.com/~nhirata/html_tp/accel_bug615597.html
2) Rotate the device several times portrait/landscape

Crash stack:
0 	libc.so 	libc.so@0x18688 	
1 	libxul.so 	gfxImageSurface::CopyFrom 	gfx/thebes/gfxImageSurface.cpp:289
2 	libxul.so 	gfxReusableSurfaceWrapper::GetWritable 	gfxReusableSurfaceWrapper.cpp:67
3 	libxul.so 	mozilla::layers::TextureClientTile::LockImageSurface 	gfx/layers/client/TextureClient.cpp:217
4 	libxul.so 	mozilla::layers::TextureClientTile::LockImageSurface 	gfx/layers/client/TextureClient.cpp:215
5 	libxul.so 	mozilla::layers::BasicTiledLayerBuffer::ValidateTileInternal 	gfx/layers/client/TiledContentClient.cpp:213
6 	libxul.so 	mozilla::FrameLayerBuilder::DrawThebesLayer 	layout/base/FrameLayerBuilder.cpp:3324
7 		@0x6de10ffe 	
8 	libxul.so 	nsRefPtr<mozilla::layers::ImageContainer>::assign_assuming_AddRef 	obj-firefox/dist/include/mozilla/mozalloc.h:225
9 	libmozglue.so 	arena_dalloc 	jemalloc.c:4675
10 	libxul.so 	nsRegion::SetToElements 	nsRegion.cpp:278
11 	libxul.so 	nsRegion::Copy 	nsRegion.cpp:606
12 	libxul.so 	nsRegion::And 	nsRegion.cpp:732
13 	libxul.so 	mozilla::layers::BasicTiledLayerBuffer::ValidateTile 	gfx/layers/client/TiledContentClient.cpp:268
14 	libxul.so 	nsRegion::SetToElements 	nsRegion.cpp:278
15 	libxul.so 	mozilla::layers::TiledLayerBuffer<mozilla::layers::BasicTiledLayerBuffer, mozill 	gfx/layers/TiledLayerBuffer.h:416
Component: General → Graphics
Product: Firefox for Android → Core
Hardware: All → ARM
Whiteboard: [native-crash]
Version: Firefox 23 → 23 Branch
Version: 23 Branch → Trunk
Duplicate of this bug: 884786
100% reproducible visiting this URL in Nightly for Android (06/19)

http://kieranhealy.org/philcites/

--
LG Nexus 4 (Android 4.2.2)
HTC One (Android 4.1)
Since this is reproducible is this worth having an owner?
Flags: needinfo?(joe)
Flags: needinfo?(joe)
*sigh* that is not what I wanted to do, bugzilla.
Flags: needinfo?(milan)
Let me make sure we have a device, then I'll reassign.
Assignee: nobody → milan
Flags: needinfo?(milan)
BenWa, can you take a look at this? I reproduced it on Nexus 4 Kats has using the Comment 2 STR.
Assignee: milan → bgirard
Just a quick update since it's been two weeks. I have an environment and I'm starting to work on this issue.
(In reply to Aaron Train [:aaronmt] from comment #2)
> 100% reproducible visiting this URL in Nightly for Android (06/19)
> 
> http://kieranhealy.org/philcites/
> 
> --
> LG Nexus 4 (Android 4.2.2)
> HTC One (Android 4.1)

I'm seeing a crash with this spammed:
W/Adreno200-GSL(21135): <sharedmem_gpumem_alloc:991>: sharedmem_gpumem_alloc: mmap failed errno 12 Out of memory

Looks like I'll have to debug that first.
Some progress on the real bug now that Comment 8 went away for now.

It seems like the write lock is set on all the tiles. It's most likely not reset correctly because we should only be locking one tile at a time. This forces us to copy the entire screen leading to an allocation failing and SIGSEGV.
Alright my previous guess was wrong. The problem is that we allocate 200 layers of size 512^2 which is about 200MB so we OOM.

I think we need to have a cap on our layers after which we just flatten everything.
Attached patch Enforce a max of 20 layers (obsolete) — Splinter Review
I'll be testing this out tomorrow
Attachment #775436 - Flags: review?(matt.woodrow)
Attachment #775436 - Attachment is obsolete: true
Attachment #775436 - Flags: review?(matt.woodrow)
Attachment #775711 - Flags: review?(matt.woodrow)
Comment on attachment 775711 [details] [diff] [review]
Enforce a max of 20 layers on mobile

Asking roc instead to review this to get another opinion.
Attachment #775711 - Flags: review?(matt.woodrow) → review?(roc)
Comment on attachment 775711 [details] [diff] [review]
Enforce a max of 20 layers on mobile

Review of attachment 775711 [details] [diff] [review]:
-----------------------------------------------------------------

This is actually restricting the maximum number of active, non-static-content child layers any layer can have. That's fine, but you should document that more carefully. Also this pref should be add explicitly to all.js with the value -1, and its behavior documented in a comment there.
Attachment #775711 - Flags: review?(roc) → review+
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #14)
> Comment on attachment 775711 [details] [diff] [review]
> Enforce a max of 20 layers on mobile
> 
> Review of attachment 775711 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> This is actually restricting the maximum number of active,
> non-static-content child layers any layer can have.

Ohh right, matt pointed that out but I forgot to fix it. Either will fix this bug but I imagine a global max-layer per tree is simpler to understand and more useful to have. I'll change it to that unless you object.
I actually think your current behavior is better and easier to understand.
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #16)
> I actually think your current behavior is better and easier to understand.

It's not what I'd expect when setting layers.max-active and I think we can miss cases where do you want to restrict the layer count. But I'm fine with landing this patch and changing this later as needed.
Attached patch patch (obsolete) — Splinter Review
Added pref to all.js
Attachment #775711 - Attachment is obsolete: true
Attachment #776656 - Flags: review+
Blocks: 889966
Blocks: 877557
Attached patch patch - ready to land. r=roc (obsolete) — Splinter Review
Tree is closed.
Attachment #776656 - Attachment is obsolete: true
Attachment #777947 - Flags: review+
Attachment #777947 - Flags: checkin?
Keywords: checkin-needed
Attachment #777947 - Flags: checkin? → checkin+
Bustage fix for unexpected tests:
https://hg.mozilla.org/integration/mozilla-inbound/rev/5efda3c30be3

I didn't push to try because I didn't think we really had tests with 20 layers. We should consider forcing the page to render in a single thebes and seeing which tests pass/fail unexpectedly.
https://hg.mozilla.org/mozilla-central/rev/4c89002460da
https://hg.mozilla.org/mozilla-central/rev/5efda3c30be3
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla25
Depends on: 896250
+/* static */ int32_t
+nsDisplayItem::MaxActiveLayers()
+{
+  static int32_t sMaxLayers = false;
+  static bool sMaxLayersCached = false;
+
+  if (!sMaxLayersCached) {
+    Preferences::AddIntVarCache(&sMaxLayers, "layers.max-active", -1);
+    sMaxLayersCached = true;
+  }
+
+  return sMaxLayersCached;
+}
                    ^^^^^^ !!

This effectively returns 1 every time.
AAhh thanks! I knew it was something silly about this pref not working causing these regressions.
Depends on: 896198
No longer depends on: 896198
Depends on: 896198
Once it did start returning 20 on mobile, it started failing reftest-2: 
https://tbpl.mozilla.org/php/getParsedLog.php?id=25623153&tree=Mozilla-Inbound, possibly because of the filtering at the edges.  The test and the reference result end up with different layers tree and I'm guessing that's where the differences come from.  Maybe - it would explain why 1 worked, and 20 does not (with 1, the layer trees are the same), but the trees are also different with the old code, yet the test passed.  Perhaps something changed in the layerization algorithm during the three days we had this code in?
Flags: needinfo?(bgirard)
Milan, do you have those layer tree dumps handy?

In particular being able to compare the one with a 20 limit, vs no-limit would be useful.
we likely want to blackout my bustage fix which was incorrect.

(In reply to Benoit Girard (:BenWa) from comment #21)
> Bustage fix for unexpected tests:
> https://hg.mozilla.org/integration/mozilla-inbound/rev/5efda3c30be3
Flags: needinfo?(bgirard)
Blocks: 897239
This changeset causes bug 897239 (crash when trying to open bluetooth or wifi in FirefoxOS settings app).
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment on attachment 777947 [details] [diff] [review]
patch - ready to land. r=roc

Review of attachment 777947 [details] [diff] [review]:
-----------------------------------------------------------------

::: layout/base/nsDisplayList.cpp
@@ +1507,5 @@
> +    Preferences::AddIntVarCache(&sMaxLayers, "layers.max-active", -1);
> +    sMaxLayersCached = true;
> +  }
> +
> +  return sMaxLayersCached;

Shouldn't this be sMaxLayers ?
Right - see bug 896250 that fixed that.  There are regressions in Android tests with the "proper" fix though, so that got backed out.
with the layer limit set to 1: https://tbpl.mozilla.org/?tree=Try&rev=32105e7ca6ee
with the layer limit set to -1: https://tbpl.mozilla.org/?tree=Try&rev=6e0879b39110
I believe we get the same result with the layer limit set to 20, but I don't have a try run for that.
It's just that we never actually tested this fix, because of the typo, we were setting the layer limit to 1, rather than 20.
(In reply to Matt Woodrow (:mattwoodrow) from comment #26)
> Milan, do you have those layer tree dumps handy?
> 
> In particular being able to compare the one with a 20 limit, vs no-limit
> would be useful.

Not handy, and I'm in two days of training/meetings so it will be a couple of days before I get them back.  I do have a few images, I will attach those.
I landed a patch to fix MaxActiveLayers in bug 897239 so I'll reclose this bug.
Status: REOPENED → RESOLVED
Closed: 6 years ago6 years ago
Resolution: --- → FIXED
Depends on: 897334
Can we revert this patchset?

Its screwing up windows and b2g.
I'm sorry, I backed this out for causing bug 897239 and the other regressions.

https://hg.mozilla.org/integration/mozilla-inbound/rev/d0a9de949678
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Looks like this backout is causing Android reftest orange, because we didn't yet back out the other cset from this bug, from comment 21.
I backed out comment 21, so now this should be "fully backed out" (conceptually), and the long stream of R2 orange should now turn green: 
 https://hg.mozilla.org/integration/mozilla-inbound/rev/de06fec529a5
Looks like the backout of this is causing the same kinds of asserts we were getting the last time bug 896250 attempted to land. Inbound's closed until this is resolved.
I filed bug 898713 to track fixing the orange resulting from this backout.
It seems like any relanding of this should include, at minimum:
https://hg.mozilla.org/mozilla-central/rev/4c89002460da
https://hg.mozilla.org/mozilla-central/rev/da5fa63e90cb
though I'm not confident that the second covers all the regressions from the first.

(I'd expect it to *not* need to include https://hg.mozilla.org/mozilla-central/rev/5efda3c30be3 .)
Thanks for the help on this bug. I spent the weekend trapped in an airports moving from flight to flight. It took 24 hours for American airlines to get me from LAX to YYZ from an initially direct flight.
Attached patch patchSplinter Review
https://tbpl.mozilla.org/?tree=Try&rev=9ac8f21619bc
Attachment #777947 - Attachment is obsolete: true
Attachment #784638 - Flags: review+
We managed to hit the asserts from bug 898713 that were keeping the tree closed today after pushing bug 777196 to b2g-inbound. I'll point out that it was green on Try. Just as with bug 887868, it seems unlikely to have directly been at fault. Any idea what underlying issue is happening here?

https://tbpl.mozilla.org/php/getParsedLog.php?id=26032309&tree=B2g-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=26032281&tree=B2g-Inbound
(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #45)
> We managed to hit the asserts from bug 898713 that were keeping the tree
> closed today after pushing bug 777196 to b2g-inbound. I'll point out that it
> was green on Try. Just as with bug 887868, it seems unlikely to have
> directly been at fault. Any idea what underlying issue is happening here?
> 
> https://tbpl.mozilla.org/php/getParsedLog.php?id=26032309&tree=B2g-Inbound
> https://tbpl.mozilla.org/php/getParsedLog.php?id=26032281&tree=B2g-Inbound

I don't know what the problem is but it looks like we have a bug that isn't triggered right now but changing the code sufficiently will trigger it which is at least what this patch did. This should be investigated in a new bug.
https://hg.mozilla.org/mozilla-central/rev/b5319d82251e
Status: REOPENED → RESOLVED
Closed: 6 years ago6 years ago
Resolution: --- → FIXED
Target Milestone: mozilla25 → mozilla26
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.