Last Comment Bug 694964 - crash [@ gfxSharedImageSurface::Open] when a layer is painted twice in the same transaction
: crash [@ gfxSharedImageSurface::Open] when a layer is painted twice in the sa...
Status: RESOLVED FIXED
[mobile-crash][native-crash][STR in c...
: crash, regression, reproducible, topcrash
Product: Core
Classification: Components
Component: Graphics (show other bugs)
: 10 Branch
: ARM Android
: -- critical with 1 vote (vote)
: mozilla14
Assigned To: Ali Juma [:ajuma]
:
Mentors:
: 701617 (view as bug list)
Depends on: 720353
Blocks: 690469
  Show dependency treegraph
 
Reported: 2011-10-17 06:16 PDT by Robert Kaiser
Modified: 2012-06-11 03:39 PDT (History)
17 users (show)
See Also:
Crash Signature:
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
affected
affected
fixed
fixed
12+
fixed
-


Attachments
Crash fix on assumption from comment 21 (1.17 KB, patch)
2012-01-11 20:41 PST, Oleg Romashin (:romaxa)
romaxa: review-
Details | Diff | Splinter Review
Updated patch, drop obsolete buffer carefully (2.49 KB, patch)
2012-01-12 15:13 PST, Oleg Romashin (:romaxa)
ajuma.bugzilla: review+
akeybl: approval‑mozilla‑aurora+
akeybl: approval‑mozilla‑beta+
Details | Diff | Splinter Review
Backout Attachment 588206 (2.49 KB, patch)
2012-01-23 07:45 PST, Ali Juma [:ajuma]
bugzilla: approval‑mozilla‑beta+
Details | Diff | Splinter Review
Don't generate a Thebes Paint edit in an incomplete transaction. (1.21 KB, patch)
2012-03-29 12:08 PDT, Ali Juma [:ajuma]
bgirard: review+
akeybl: approval‑mozilla‑esr10+
Details | Diff | Splinter Review

Description Robert Kaiser 2011-10-17 06:16:00 PDT
This bug was filed from the Socorro interface and is 
report bp-e1096e16-7bc6-458c-9ef3-a527b2111017 .
============================================================= 

Top frames:

0 	libxul.so 	gfxSharedImageSurface::Open 	BaseSize.h:55
1 	libxul.so 	mozilla::layers::ShadowLayerForwarder::OpenDescriptor 	gfx/layers/ipc/ShadowLayers.cpp:464
2 	libxul.so 	mozilla::layers::BasicShadowThebesLayer::Swap 	gfx/layers/basic/BasicLayers.cpp:2792
3 	libxul.so 	mozilla::layers::ShadowLayersParent::RecvUpdate 	PLayers.h:1199
4 	libxul.so 	mozilla::layers::PLayersParent::OnMessageReceived 	obj-firefox/ipc/ipdl/PLayersParent.cpp:220
5 	libxul.so 	mozilla::dom::PContentParent::OnMessageReceived 	obj-firefox/ipc/ipdl/PContentParent.cpp:1464
6 	libxul.so 	mozilla::ipc::SyncChannel::OnDispatchMessage 	ipc/glue/SyncChannel.cpp:175
7 	libxul.so 	mozilla::ipc::RPCChannel::OnMaybeDequeueOne 	ipc/glue/RPCChannel.cpp:431
8 	libxul.so 	RunnableMethod<mozilla::ipc::RPCChannel, bool , Tuple0>::Run 	ipc/chromium/src/base/task.h:308
9 	libxul.so 	mozilla::ipc::RPCChannel::DequeueTask::Run 	RPCChannel.h:487
10 	libxul.so 	MessageLoop::RunTask 	ipc/chromium/src/base/message_loop.cc:319


This crash started popping up on October 12 with Nightly Fennec builds of that day, from looking at https://crash-stats.mozilla.com/report/list?signature=gfxSharedImageSurface%3A%3AOpen it seems to happen on different Android versions (at least that's what I derive from seeing different kernel versions, Naoki or someone might correct me here) and all Nightly builds since then - but it only happens on Fennec and only on Nightly.

This also seems to be the #1 crash on Fennec Trunk right now - at least in my own stats it's the only one of which we have 10 or more crashes every day.
Comment 1 Ali Juma [:ajuma] 2011-10-17 06:35:24 PDT
This crash is happening in code that was recently added by Bug 690469.
Comment 2 Oleg Romashin (:romaxa) 2011-10-17 08:15:15 PDT
Is there are some way to reproduce this crash? could it be related to the same problem we've workaround-ed for OGL port (order of ThebesLayer Front/Back sync)...
BasicShadowableThebesLayer::SetBackBufferAndAttrs
Not sure should we just call SyncFrontBufferToBackBuffer from that place for all backends?
Comment 3 Oleg Romashin (:romaxa) 2011-10-17 08:22:43 PDT
but from other side, that problem could not endup with crash... and descriptor open could fail only if Shmem was give to Chrome process and destroyed at the same time on Child process side.
Comment 4 Robert Kaiser 2011-10-17 08:29:37 PDT
(In reply to Oleg Romashin (:romaxa) from comment #2)
> Is there are some way to reproduce this crash?

No idea, I only saw crash reports coming in frequently enough that it's worth filing this.
Comment 5 Martijn Wargers [:mwargers] (not working for Mozilla) 2011-10-19 14:37:39 PDT
I was hitting this crash once in current trunk build on the EEE Transformer, while playing around with the file input and the camera. Fennec crashed a while after I closed that page:
https://crash-stats.mozilla.com/report/index/bp-da19582c-6b42-4d88-abcc-0a0e92111019
Comment 6 Ludovic Hirlimann [:Usul] 2011-10-22 23:20:14 PDT
I got the crash whl3 loading www.leparisien.fr . The graphics on the pagw started to flicker a lot and then fennec crashed.
Comment 7 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2011-10-24 17:23:27 PDT
Ludovic, I couldn't reproduce on a Galaxy S II Epic 4G on today's (10/23) nightly.  What phone were you using?

Can anyone else repro?
Comment 8 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2011-10-24 17:33:39 PDT
(In reply to Chris Jones [:cjones] [:warhammer] from comment #7)
> Ludovic, I couldn't reproduce on a Galaxy S II Epic 4G on today's (10/23)
> nightly.  What phone were you using?

Er, phone *and* build.
Comment 9 Ludovic Hirlimann [:Usul] 2011-10-25 00:08:45 PDT
Galaxy-S one - build was from the day I posted on bugzilla. When I visited the page again it didn't crash. I haven't see this crash since.
Comment 10 Benoit Girard (:BenWa) 2011-11-01 12:58:37 PDT
The crash volume is significant considering how small user base for fennec nightly. I think we should back out bug 690469 until it's fixed.
Comment 11 Oleg Romashin (:romaxa) 2011-11-01 15:16:18 PDT
Sounds like yes we should do that.
Comment 12 Oleg Romashin (:romaxa) 2011-11-01 15:22:00 PDT
One possible solution to make change smaller, is to make this condition
http://mxr.mozilla.org/mozilla-central/source/gfx/layers/basic/BasicLayers.cpp#2262
true always, so we will switch back to previous Swap/Sync/Create order and keep API change.
we had similar problem in bug 694140, but it was not leading to crash, and was causing some rendering problems... sounds like for software mode we have unreproducible crashes...
Comment 13 Josh Matthews [:jdm] 2011-11-11 13:58:04 PST
Apparently bug 701617 contains rock-steady STR.
Comment 14 Joe Drew (not getting mail) 2011-11-12 19:20:51 PST
*** Bug 701617 has been marked as a duplicate of this bug. ***
Comment 15 John Hammink 2011-11-22 15:00:50 PST
I am still able to reproduce this crash also by surfing to my battery crash page:
people.mozilla.com/~jhammink/webapi_test_pages/BatteryCrashv2.html

Actual crash report is here:
https://crash-stats.mozilla.com/report/index/bp-4e92cbbf-54fc-4874-af32-034262111122
Comment 16 Scoobidiver (away) 2011-12-17 06:37:01 PST
It's #2 top crasher in 10.0a2 (only about 700 ADU).
Comment 17 Scoobidiver (away) 2012-01-06 00:23:30 PST
It's #1 top crasher in 10.0b2 with about 12,000 ADU.
Comment 18 Scoobidiver (away) 2012-01-10 06:47:57 PST
It's #1 top crasher in 10.0b3 (about 11,000 ADU) with 16% of all crashes.
There have been no crashes in 11.0a2 and 12.0a1 (about 500 ADU).
Comment 19 Joe Drew (not getting mail) 2012-01-10 19:58:05 PST
Perhaps George can take a look at this when he has a moment.
Comment 20 Oleg Romashin (:romaxa) 2012-01-11 14:20:39 PST
If it is easy to reproduce, then it would be nice to get it crashed on debug build with NSPR_LOG_MODULES=Layers:5 exported... so we can see order of layers transactions/ctor/dtor calls.
Comment 21 Oleg Romashin (:romaxa) 2012-01-11 15:03:47 PST
We have here mFrontBuffer which is coming from child process, stick in UI process, and somehow get's destroyed in child process in such way that UI process does not know about that...
I see one way how it might happen:
BasicShadowableThebesLayer::PaintBuffer pushing mBackBuffer into transaction queue, but don't move that buffer into read-only state... while transaction in pending state time between ::PaintBuffer and BasicShadowableThebesLayer::SetBackBufferAndAttrs, we do receive another BasicShadowableThebesLayer::CreateBuffer call with different size... after that we have mBackBuffer  destroyed without notifying UI process.

I guess the right proposal here is to set mBackBuffer = NULL right after PaintedThebesBuffer call, or move it to mROFrontBuffer... so there is no chance that mBuffer will be destroyed after that.
But in that case we should handle situation where we can have 3 buffers, and destroy 3-rd buffer correctly
Comment 22 Oleg Romashin (:romaxa) 2012-01-11 20:41:58 PST
Created attachment 587937 [details] [diff] [review]
Crash fix on assumption from comment 21

Kind of blind fix, because I'm not able to reproduce this crash, will create try build probably someone could run manual test round and check if it still reproducible or not, or we can just push it and check if crash has disappear
Comment 23 Joe Drew (not getting mail) 2012-01-12 11:37:47 PST
Comment on attachment 587937 [details] [diff] [review]
Crash fix on assumption from comment 21

This looks perfectly reasonable to me, but I don't know whether this is a thing that should actually happen in the wild. I'm going to cede review to Chris and/or Ali, who have more recent experience with cross-process things.
Comment 24 Oleg Romashin (:romaxa) 2012-01-12 12:57:56 PST
Forgot to attach try build:
https://tbpl.mozilla.org/?tree=Try&rev=ed468272c38e
Comment 25 Oleg Romashin (:romaxa) 2012-01-12 13:12:08 PST
Comment on attachment 587937 [details] [diff] [review]
Crash fix on assumption from comment 21


>+  } else {
>+    BasicManager()->ShadowLayerForwarder::DestroySharedSurface(&mBackBuffer);

Oh, here we should destroy aBuffer.

>   }
Comment 26 Oleg Romashin (:romaxa) 2012-01-12 15:13:21 PST
Created attachment 588206 [details] [diff] [review]
Updated patch, drop obsolete buffer carefully
Comment 27 Oleg Romashin (:romaxa) 2012-01-12 16:14:10 PST
Pushed into:
https://hg.mozilla.org/mozilla-central/rev/964b118ac852

hope it will help.
if not then we should reopen and investigate again
Comment 28 Scoobidiver (away) 2012-01-16 23:18:43 PST
(In reply to Oleg Romashin (:romaxa) from comment #27)
> hope it will help.
As it doesn't happen in Nightly or Aurora (small sample group of about 800 ADU), we don't know if it's fixed using crash stats.
It's still #1 top crasher in Fennec 10.0b4 (about 10,000 ADU).
Comment 29 Oleg Romashin (:romaxa) 2012-01-17 00:06:43 PST
but, aurora does not have that fix... any crashes from nightly?
Comment 30 Scoobidiver (away) 2012-01-17 01:14:11 PST
(In reply to Oleg Romashin (:romaxa) from comment #29)
> but, aurora does not have that fix... any crashes from nightly?
It has never crashed in Fennec Native Nightly and Aurora.
It crashed in Fennec Aurora 11.0a2/20120111.
People using Nightly and Aurora are not statistically representative.

It seems risky to land this patch in the latest Beta as it also affects the desktop version.
Comment 31 Oleg Romashin (:romaxa) 2012-01-17 01:49:33 PST
Ok, let's wait when 12 will be representative
Comment 32 Ali Juma [:ajuma] 2012-01-17 06:46:21 PST
(In reply to Scoobidiver from comment #30) 
> It seems risky to land this patch in the latest Beta as it also affects the
> desktop version.

Note that this patch only affects shadow layers, which aren't used by default anywhere on desktop right now. (Shadow layers are currently only used for e10s and for off-main-thread compositing.)

I'd advocate at least taking this on Aurora, so that the next Beta for tablets (which, as I understand it, will still be XUL Fennec rather than native Fennec) has this.

Landing this on Beta right away is indeed risky (given Comment 22), but might be worth doing if there's enough time to land it, measure the effect it's having, and back out if it's making things worse.
Comment 33 Ali Juma [:ajuma] 2012-01-17 08:51:01 PST
I can consistently reproduce this crash on Nightly prior to this landing, but not on the latest Nightly. So I feel reasonably confident that the patch that landed fixed the issue.

Steps to reproduce:
1) Visit http://people.mozilla.com/~ajuma/fennec/test.html
2) Click on "one"
3) Click on OK
Comment 34 Oleg Romashin (:romaxa) 2012-01-17 09:56:40 PST
> Steps to reproduce:
> 1) Visit http://people.mozilla.com/~ajuma/fennec/test.html
> 2) Click on "one"
> 3) Click on OK
oh, great, we have test for that crash.
then we definitely should land it to 11, and ajuma is right, this affect only shadowLayers which are currently used only in e10s fennec
Comment 35 Scoobidiver (away) 2012-01-17 10:06:35 PST
(In reply to Ali Juma [:ajuma] from comment #32)
> might be worth doing if there's enough time to land it, measure the effect
> it's having, and back out if it's making things worse.
As it's reproducible, impacts only the mobile version and there will be a Beta 5 (this night) and 6 (in one week), it's worth asking a Beta approval ASAP.
Comment 36 Ali Juma [:ajuma] 2012-01-17 10:36:51 PST
Comment on attachment 588206 [details] [diff] [review]
Updated patch, drop obsolete buffer carefully

[Approval Request Comment]
Regression caused by (bug #): 690469

User impact if declined: This is the top crash on XUL Fennec, both on Beta and on Aurora.

Testing completed (on m-c, etc.): This has been on m-c since Jan. 12, but this is of limited relevance given the small number of XUL Fennec users we have on Nightly -- the number of users is small enough that we weren't even seeing this crash before the patch landed. In my own testing on a Galaxy Tab, the patch does fix the crash (see Comment 33).

Risk to taking this patch (and alternatives if risky): The patch is sufficiently non-trivial that there is some risk here. The risk is only to XUL Fennec -- this code isn't used by default anywhere else right now. We should definitely take this on Aurora. Given that this is the top crash, we should also consider taking it on Beta despite the risk (ideally though, we should only do this if there's still time to back it out if it causes regressions).
Comment 37 Alex Keybl [:akeybl] 2012-01-17 16:46:36 PST
Comment on attachment 588206 [details] [diff] [review]
Updated patch, drop obsolete buffer carefully

[Triage Comment]
Our understanding of this issue is that although this is in shared code, XUL Fennec is the only client of this change. Approving for Aurora/Beta.
Comment 39 Ali Juma [:ajuma] 2012-01-23 07:32:49 PST
This looks to be the most likely cause of Bug 720353, which is causing more crashes on beta (0.52 crashes/ADU) than the number of crashes fixed by this bug (0.24 crashes/ADU).
Comment 40 Ali Juma [:ajuma] 2012-01-23 07:45:41 PST
Created attachment 590713 [details] [diff] [review]
Backout Attachment 588206 [details] [diff]

Since the landing of this bug on mozilla-beta is the most likely cause of Bug 720353, we should back out this bug from beta.

[Approval Request Comment]
Regression caused by (bug #): The landing of this bug (Attachment 588206 [details] [diff]) on beta.
User impact if declined: Attachment 588206 [details] [diff] is suspected of causing more crashes on beta (0.52 crashes/ADU) than it resolved (0.24 crashes/ADU), so declining to back it out would mean a net increase of 0.28 crashes/ADU.
Testing completed (on m-c, etc.): Bug 720353 has only been seen on beta, likely due to the small number of XUL Fennec users on Aurora and Nightly, so landing on m-c wouldn't give us any useful information.
Risk to taking this patch (and alternatives if risky): Low-risk. This is just a backout of a patch than landed on beta last week.
Comment 41 Ali Juma [:ajuma] 2012-01-23 08:35:05 PST
Backout landed on beta:
https://hg.mozilla.org/releases/mozilla-beta/rev/ebe77c75acab
Comment 42 Alex Keybl [:akeybl] 2012-01-31 16:54:32 PST
Backed out for beta 11 as well: https://hg.mozilla.org/releases/mozilla-beta/rev/788ea1ef610b
Comment 43 Scoobidiver (away) 2012-02-02 08:26:23 PST
It's #2 top crasher in 10.0.
Comment 44 Alex Keybl [:akeybl] 2012-02-05 13:14:06 PST
(In reply to Scoobidiver from comment #43)
> It's #2 top crasher in 10.0.

I think this should be re-opened since it was backed out of Beta 10 and Beta 11. Joe - can you find somebody to take a look?
Comment 45 Ali Juma [:ajuma] 2012-02-05 14:19:25 PST
It's conceivable that this crash was really just the problem described in Bug 718150 (or at least closely related). Boris/Chris/Oleg, do you think that patch there might have fixed this crash?
Comment 46 Oleg Romashin (:romaxa) 2012-02-05 14:48:12 PST
In thebes layers case we do not have internal mSize or mBounds variables, and do check of FrontSurface size using actual buffers structures.
I've noticed one possible problem is that BasicShadowThebesLayer::Swap check only surface size, but not ContentType of buffer.
And another possible problem described in bug 720353#c6
would be nice to find testcase which reproduce that problem
Comment 47 Boris Zbarsky [:bz] 2012-02-06 09:14:49 PST
Bug 718150 should not have led to crashes if I understand the code correctly...
Comment 48 Joe Drew (not getting mail) 2012-02-06 13:39:52 PST
Oleg, can you proceed with your idea in comment 46, and maybe we can resolve the crash in bug 720353?

Alternately, someone able to reproduce that crash would be helpful...
Comment 49 Scoobidiver (away) 2012-02-10 23:49:01 PST
(In reply to Joe Drew (:JOEDREW!) from comment #48)
> Alternately, someone able to reproduce that crash would be helpful...
There are STR in comment 13 and comment 15.

Is it reproducible in Native Fennec?
Comment 50 Ali Juma [:ajuma] 2012-02-11 07:29:29 PST
(In reply to Scoobidiver from comment #49)
> (In reply to Joe Drew (:JOEDREW!) from comment #48)
> > Alternately, someone able to reproduce that crash would be helpful...
> There are STR in comment 13 and comment 15.
> 
> Is it reproducible in Native Fennec?

To be clear, there are two crashes being discussed here -- the crash for this bug, and the crash in Bug 720353.

The crash for this bug has STR in comments 13, 15, and 33. It's not reproducible in the current Native Fennec, since the current Native Fennec doesn't use shadow layers. Native Fennec with OMTC does use shadow layers, but the crash for this bug happens in Basic shadow layers while Native Fennec with OMTC uses GL shadow layers. There may turn out to be an analogous crash though, if the root cause is on the content side (since both XUL Fennec and Native Fennec with OMTC use Basic layers on the content side).

The crash in Bug 720353 doesn't have STR. It's also a shadow layers crash, but on the content side, so it may well be possible to reproduce in Native Fennec with OMTC.
Comment 51 Ali Juma [:ajuma] 2012-03-19 14:26:32 PDT
Backed out (see Bug 720353):
https://hg.mozilla.org/integration/mozilla-inbound/rev/af19e5ada310
Comment 52 Naoki Hirata :nhirata (please use needinfo instead of cc) 2012-03-19 15:07:44 PDT
I got a different Crash in 33 on Galaxy Nexus doing a 1 off:
> Steps to reproduce:
> 1) Visit http://people.mozilla.com/~ajuma/fennec/test.html
> 2) Click on "one"
> 3) Click on OK
4) Rotate device

It looks like an OOM crash. (03-19 15:06:23.419: E/GeckoApp(3020): low memory); cannot current duplicate the bug based on the repro steps.  Removing reproducible for now.
Comment 53 Robert Kaiser 2012-03-20 09:28:17 PDT
This signature is still high in XUL releases (10 ESR). Should we land this on the ESR branch for the upcoming XUL releases?
Comment 54 Scoobidiver (away) 2012-03-20 09:46:55 PDT
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #53)
> This signature is still high in XUL releases (10 ESR). Should we land this
> on the ESR branch for the upcoming XUL releases?
Fixing this bug with the current patch causes bug 720353 to appear.
Comment 55 Naoki Hirata :nhirata (please use needinfo instead of cc) 2012-03-21 12:02:08 PDT
https://bugzilla.mozilla.org/show_bug.cgi?id=694964#c52 is with nightly build.
Does not crash with nightly.

Doesn't seem to crash for Galaxy Nexus with the ESR release using the steps to repro given here: https://bugzilla.mozilla.org/show_bug.cgi?id=694964#c33
Comment 56 Daniel Veditz [:dveditz] 2012-03-23 11:15:02 PDT
We're going to minus this for ESR10 because this patch caused a worse regression, and there's no patch that fixes both.
Comment 57 Ali Juma [:ajuma] 2012-03-23 13:16:44 PDT
I've been able to reproduce this on Native Fennec on an HTC Desire, by rotating the phone back and forth while a page is loading (the crash doesn't always happen while doing this, but happens eventually when I keep repeating this).

Here's what seems to be the problem:

The crash happens when a Thebes layer gets painted twice within a single transaction. This happens when a transaction is set as incomplete (via BasicLayerManager::SetTransactionIncomplete). The result of painting the Thebes layer twice is that we have two TOpPaintThebesBuffer Edits for the same layer within the same transaction. Both of these have the same newFrontBuffer. This means that the ShadowThebesLayer's Swap function gets called twice with the same value of aNewFront. After the first such call, the ShadowThebesLayer's mFrontBufferDescriptor is aNewFront.buffer(). During the second such call, aNewBack's buffer is set to mFrontBufferDescriptor, but mFrontBufferDescriptor's value doesn't change (since it gets set again to the same aNewFront.buffer()). The result is that after this call and the corresponding SetBackBufferAndAttrs on the shadowable layer, the ShadowableThebesLayer's mBackBuffer is the same as ShadowThebesLayer's mFrontBufferDescriptor. The crash then happens when the Shadowable layer decides to destroy mBackBuffer (during a call to CreateBuffer), and the Shadow layer subsequently tries to open mFrontBufferDescriptor.

Another thing that can go wrong when we have two paints of the same layer in the same transaction is mBackBuffer getting destroyed during the second paint. In this case, after the first Swap, mFrontBufferDescriptor will be an already destroyed surface.
Comment 58 Cristian Nicolae (:xti) 2012-03-27 08:28:18 PDT
I reproduced this crash on today's Nightly build by going to Awesomebar and back into main view repeatedly:

https://crash-stats.mozilla.com/report/index/bp-e89ea478-0b13-40ec-8665-874a22120327

--
Firefox 14.0a1 (2012-03-27)
Device: Samsung Galaxy S
OS: Android 2.2
Comment 59 Scoobidiver (away) 2012-03-27 08:38:37 PDT
It's a regression from Fx 10.
Comment 60 Naoki Hirata :nhirata (please use needinfo instead of cc) 2012-03-27 09:27:06 PDT
I was able to reproduce this in Fennec Native on Galaxy Nexus by going to http://www.kevs3d.co.uk/dev/ and rotating from portrait to landscape.
Comment 61 Ali Juma [:ajuma] 2012-03-29 12:08:43 PDT
Created attachment 610636 [details] [diff] [review]
Don't generate a Thebes Paint edit in an incomplete transaction.

The problem is actually that we're generating a TOpPaintThebesBuffer edit when we shouldn't be. Here's how this happens. When BasicShadowableThebesLayer::PaintBuffer gets called, it in turn calls BasicThebesLayer::PaintBuffer with the same arguments. When argument aCallback is null, BasicThebesLayer::PaintBuffer sets the transaction as incomplete, and returns immediately without actually painting anything. But BasicShadowableThebesLayer::PaintBuffer nevertheless goes ahead and generates a spurious TOpPaintThebesBuffer edit (by calling PaintedThebesBuffer). This patch fixes that by making BasicShadowableThebesLayer::PaintBuffer check if the transaction got set as incomplete.
Comment 62 Benoit Girard (:BenWa) 2012-03-29 12:10:23 PDT
Comment on attachment 610636 [details] [diff] [review]
Don't generate a Thebes Paint edit in an incomplete transaction.

ajuma++!
Comment 64 Martijn Wargers [:mwargers] (not working for Mozilla) 2012-03-30 07:22:12 PDT
I got this crash twice on the HTC Desire HD in current trunk build, while having these 2 tabs open:
http://people.mozilla.com/~mwargers/tests/positonabsolute_redraw_crash.htm
http://planet.mozilla.org/
And while I was performing some panning/zooming actions in the first tab.
Comment 65 Martijn Wargers [:mwargers] (not working for Mozilla) 2012-03-30 07:33:15 PDT
Ok, I think I can reproduce this crash (sort of) with only http://people.mozilla.com/~mwargers/tests/positonabsolute_redraw_crash.htm
While being on that page, I first try to pinch zoom out as far as possible, then I pan to the right like crazy to scroll to the left as to make some scroll overflow area visible at the left (this is difficult, because the page has javascript in there that scrolls to the right)
Comment 66 Martijn Wargers [:mwargers] (not working for Mozilla) 2012-03-30 07:56:13 PDT
Video here: http://www.youtube.com/watch?v=JCtwByGwgmk
Comment 67 Naoki Hirata :nhirata (please use needinfo instead of cc) 2012-03-30 10:24:49 PDT
Another STR:
1. Set Fennec -> Settings -> Plugins -> Enabled
2. restart Fennec
3. go to http://people.mozilla.org/~mwargers/tests/flashcrash_ics/buttonopencrash1.htm
4. Select Always Show for popups
5. During the time that it's doing the test, go to menu-> settings

Crash should occur.
Comment 68 Martijn Wargers [:mwargers] (not working for Mozilla) 2012-03-31 13:30:17 PDT
I seem to reproducibly crash on the Samsung Galaxy Nexus on this url: http://people.mozilla.org/~mwargers/tests/plugins/flash/scroll.html
Comment 69 Ed Morley [:emorley] 2012-03-31 19:40:38 PDT
(In reply to Benoit Girard (:BenWa) from comment #62)
> Comment on attachment 610636 [details] [diff] [review]
> Don't generate a Thebes Paint edit in an incomplete transaction.
> 
> ajuma++!

https://hg.mozilla.org/mozilla-central/rev/d5a7628eed87
Comment 70 Robert Kaiser 2012-04-10 07:27:02 PDT
This is still one of the major topcrashes in the current Fennec "releases" which are from ESR10, should the backout or the fix be ported there?
Comment 71 Ali Juma [:ajuma] 2012-04-10 07:36:52 PDT
Comment on attachment 610636 [details] [diff] [review]
Don't generate a Thebes Paint edit in an incomplete transaction.

[Approval Request Comment]
Regression caused by (bug #): Bug 690469
User impact if declined: This is a Fennec top crasher (see Comment 70).
Testing completed (on m-c, etc.): This has been on m-c since March 31 and has fixed the crash there.
Risk to taking this patch (and alternatives if risky): Low risk to Fennec, no risk to desktop (since ShadowLayers aren't used on desktop).
String changes made by this patch: None.
Comment 72 Alex Keybl [:akeybl] 2012-04-11 16:14:48 PDT
Comment on attachment 610636 [details] [diff] [review]
Don't generate a Thebes Paint edit in an incomplete transaction.

(In reply to Ali Juma [:ajuma] from comment #71)
> Comment on attachment 610636 [details] [diff] [review]
> Don't generate a Thebes Paint edit in an incomplete transaction.
> 
> [Approval Request Comment]
> Regression caused by (bug #): Bug 690469
> User impact if declined: This is a Fennec top crasher (see Comment 70).
> Testing completed (on m-c, etc.): This has been on m-c since March 31 and
> has fixed the crash there.
> Risk to taking this patch (and alternatives if risky): Low risk to Fennec,
> no risk to desktop (since ShadowLayers aren't used on desktop).
> String changes made by this patch: None.

Approved for the ESR branch. Does bug 720353 need to come along for the ride?
Comment 73 Ali Juma [:ajuma] 2012-04-11 17:10:03 PDT
(In reply to Alex Keybl [:akeybl] from comment #72)
 
> Approved for the ESR branch. Does bug 720353 need to come along for the ride?

No, Bug 720353 backs out on 14 a patch that had already been backed out on 10 (see Comment 41 above).
Comment 74 Alex Keybl [:akeybl] 2012-04-12 09:34:47 PDT
(In reply to Ali Juma [:ajuma] from comment #73)
> (In reply to Alex Keybl [:akeybl] from comment #72)
>  
> > Approved for the ESR branch. Does bug 720353 need to come along for the ride?
> 
> No, Bug 720353 backs out on 14 a patch that had already been backed out on
> 10 (see Comment 41 above).

Thanks Ali. We spent a few minutes trying to untangle the status flags and backouts without success during triage, so we appreciate your confirmation.
Comment 75 Ali Juma [:ajuma] 2012-04-12 10:22:35 PDT
Landed on esr10:
https://hg.mozilla.org/releases/mozilla-esr10/rev/52183a4ceb7c

along with a follow-up bustage fix (BasicLayerManager::IsTransactionIncomplete wasn't already defined on esr10):
https://hg.mozilla.org/releases/mozilla-esr10/rev/2329d4a9d3d1
Comment 76 Scoobidiver (away) 2012-06-11 03:39:11 PDT
There are crashes at a lower volume in XUL Fennec 10.0.4esr (#31 top crasher), 10.0.5esr (#37) and 14.0b6 (#42) that contain the fix: https://crash-stats.mozilla.com/report/list?signature=gfxSharedImageSurface%3A%3AOpen

Note You need to log in before you can comment on or make changes to this bug.