[Browser] System crash when navigates https://plus.google.com/explore

VERIFIED FIXED in Firefox 44

Status

()

Core
Graphics: Layers
VERIFIED FIXED
3 years ago
3 years ago

People

(Reporter: Alison Shiue, Assigned: sotaro)

Tracking

({regression})

unspecified
mozilla44
ARM
Gonk (Firefox OS)
regression
Points:
---

Firefox Tracking Flags

(blocking-b2g:2.5+, firefox44 fixed, b2g-v2.2 unaffected, b2g-master affected)

Details

Attachments

(4 attachments)

(Reporter)

Description

3 years ago
Created attachment 8675534 [details]
system_crash.log

STR:
1. Open a browser window and navigate to https://plus.google.com/explore
2. Tap "<" icon on left top (please refer: https://youtu.be/hiVgsVwd9uI)

Expected result:
1. Can navigate without problem

Actual result:
1. System crashed

Test build info:
[Flame]
Build ID               20151018150205
Gaia Revision          f75a7e01912cee313fed92ff2089586f507b2ba5
Gaia Date              2015-10-16 13:00:48
Gecko Revision         https://hg.mozilla.org/mozilla-central/rev/d1a89632277fbaaf470c90a35573776048988f2d
Gecko Version          44.0a1
Device Name            flame
Firmware(Release)      4.4.2
Firmware(Incremental)  eng.cltbld.20151018.183443
Firmware Date          Sun Oct 18 18:34:55 EDT 2015
Bootloader             L1TC000118D0

[Aries]
Build ID               20151018193102
Gaia Revision          f75a7e01912cee313fed92ff2089586f507b2ba5
Gaia Date              2015-10-16 13:00:48
Gecko Revision         https://hg.mozilla.org/mozilla-central/rev/d1a89632277fbaaf470c90a35573776048988f2d
Gecko Version          44.0a1
Device Name            aries
Firmware(Release)      4.4.2
Firmware(Incremental)  eng.worker.20151018.185054
Firmware Date          Sun Oct 18 18:51:02 UTC 2015
Bootloader             s1
(Reporter)

Comment 1

3 years ago
[Blocking Requested - why for this release]:
system crash
blocking-b2g: --- → 2.5?
Happens during regular navigation on the URL from the description as well.
blocking-b2g: 2.5? → 2.5+
Component: Gaia::Browser → Graphics: Layers
Product: Firefox OS → Core
Flags: needinfo?(milan)
:nical, any recent changes to IPC that could explain this?
Flags: needinfo?(milan) → needinfo?(nical.bugzilla)
Keywords: regressionwindow-wanted
(In reply to Milan Sreckovic [:milan] from comment #4)
> :nical, any recent changes to IPC that could explain this?

Not that I know of.

Interestingly, PCompositable's __delete__ message always comes from the child so it should not be possible for a message coming from the child to race with __delete__ and cause the failure to deserialize on the parent side. Looking at IPDL's generated code (pasted here because generated code don't show up in dxr):

> auto PLayerTransactionParent::Read(
>         PCompositableParent** v__,
>         const Message* msg__,
>         void** iter__,
>         bool nullable__) -> bool
> {
>     int32_t id;
>     if ((!(Read((&(id)), msg__, iter__)))) {
>         FatalError("Error deserializing 'id' for 'PCompositableParent'");
>         return false;
>     }
>     if (((1) == (id)) || (((0) == (id)) && ((!(nullable__))))) {
>         mozilla::ipc::ProtocolErrorBreakpoint("bad ID for PLayerTransaction");
>         return false;
>     }
> 
>     if ((0) == (id)) {
>         (*(v__)) = nullptr;
>         return true;
>     }
> 
>     ChannelListener* listener = Lookup(id);
>     if ((!(listener))) {
>         mozilla::ipc::ProtocolErrorBreakpoint("could not look up PCompositable");
>         return false;
>     }
> 
>     if ((PCompositableMsgStart) != ((listener)->GetProtocolTypeId())) {
>         mozilla::ipc::ProtocolErrorBreakpoint("actor that should be of type PCompositable has different type");
>         return false;
>     }
> 
>     (*(v__)) = static_cast<PCompositableParent*>(listener);
>     return true;
> }

Each path that returns false would trigger this assertion. I am assuming ProtocolErrorBreakpoint doesn't crash because I know FatalError does and the stack shows that this method returned false.
Flags: needinfo?(nical.bugzilla)
QA Contact: sleedavid
NO REPRO: 
Environmental Variables:
Device: Aries 2.5
BuildID: 20151021111747
Gaia: 32d827a70af90a05918f234e5b16b35d5d2a07e8
Gecko: d43374e6970311c0a14f25f3ec09d2a30448a2b2
Gonk: 2916e2368074b5383c80bf5a0fba3fc83ba310bd
Version: 44.0a1 (2.5) 
Firmware Version: D5803_23.1.A.1.28_NCB.ftf
User Agent: Mozilla/5.0 (Mobile; rv:44.0) Gecko/44.0 Firefox/44.0

REPRO: 
Environmental Variables:
Device: Aries 2.5
BuildID: 20151016122951
Gaia: 8999f0ba6326d815c8366e3c1155b7e4e9763b40
Gecko: ccf288f658211b6cfab33c458aaf033baed2375b
Gonk: 2916e2368074b5383c80bf5a0fba3fc83ba310bd
Version: 44.0a1 (2.5) 
Firmware Version: D5803_23.1.A.1.28_NCB.ftf
User Agent: Mozilla/5.0 (Mobile; rv:44.0) Gecko/44.0 Firefox/44.0

Repro Rate: (1/5) 

NOTE: Only crashed on first attempt when scrolling through the googlePlus website and not when pressing the < back-navigation icon.
QA Whiteboard: [QAnalyst-Triage?]
Flags: needinfo?(jmercado)
QA Contact: sleedavid
Let's get some more eyes on this.
QA Whiteboard: [QAnalyst-Triage?]
Flags: needinfo?(jmercado)
Comment 6 suggests this is now fixed?
(In reply to Milan Sreckovic [:milan] from comment #8)
> Comment 6 suggests this is now fixed?

If it has been fixed, bug 1215508, bug 1215900, bug 1213671 would look interesting...

Comment 10

3 years ago
This bug is definitely not fixed. With the latest OTA, Build 20151021064220 Commit 32d827a7, my Flame restarted while using G+. 

(In reply to Milan Sreckovic [:milan] from comment #9)
> (In reply to Milan Sreckovic [:milan] from comment #8)
> > Comment 6 suggests this is now fixed?
> 
> If it has been fixed, bug 1215508, bug 1215900, bug 1213671 would look
> interesting...
Comment 6 was using a newer build than you have, I imagine flashed manually instead of OTA, but let's clear that up.
Flags: needinfo?(sleedavid)

Comment 12

3 years ago
Created attachment 8677815 [details]
GDB backtrace for assertion failure on TV simulator

When opening https://plus.google.com/explore with the browser on TV simulator, I encountered an assertion failure as attached gdb backtrace log.

Not sure if it's similar issue as comment 0.  BTW, there is a bug 1156238 for such assertion.

Updated

3 years ago
Blocks: 1214243

Comment 13

3 years ago
I just retested this with the OTA I just received 10 minutes ago (20151023030241 commit 410e91dd) and Google+ is still crashing the whole OS.
To reproduce: open Google+ (i have it pinned as a site) and the click on the name of one of the Google+ users, to go to their profile. At this point the OS crashes.
(Reporter)

Comment 14

3 years ago
B2g Inbound Regression Window

Last Working
Build ID               20151015020003
Gaia Revision          87014859cb97cc8a75c614aadbf2b91d99908296
Gaia Date              2015-10-15 08:23:52
Gecko Revision         https://hg.mozilla.org/integration/b2g-inbound/rev/116ff033b0351a3de7667ae58b610f0fcfd235b4
Gecko Version          44.0a1
Device Name            flame
Firmware(Release)      4.4.2
Firmware(Incremental)  eng.cltbld.20151015.052838
Firmware Date          Thu Oct 15 05:28:51 EDT 2015
Bootloader             L1TC000118D0

First Broken
Build ID               20151015031447
Gaia Revision          20809a0d85bc55bfec8737aeb7cd0dfc92344ec7
Gaia Date              2015-10-15 09:37:54
Gecko Revision         https://hg.mozilla.org/integration/b2g-inbound/rev/e0bb9665832484b39c30496d369566a39a673985
Gecko Version          44.0a1
Device Name            flame
Firmware(Release)      4.4.2
Firmware(Incremental)  eng.cltbld.20151015.065533
Firmware Date          Thu Oct 15 06:55:42 EDT 2015
Bootloader             L1TC000118D0


Last Working gaia / First Broken gecko - This issue DOES occur with broken Gecko
Gaia: 87014859cb97cc8a75c614aadbf2b91d99908296
Gecko: e0bb9665832484b39c30496d369566a39a673985

Last Working gecko / First Broken gaia - This issue does NOT occur with broken Gaia
Gecko: 116ff033b0351a3de7667ae58b610f0fcfd235b4
Gaia: 20809a0d85bc55bfec8737aeb7cd0dfc92344ec7

B2g Inbound Pushlog: 
http://hg.mozilla.org/integration/b2g-inbound/pushloghtml?fromchange=116ff033b0351a3de7667ae58b610f0fcfd235b4&tochange=e0bb9665832484b39c30496d369566a39a673985
status-b2g-v2.2: --- → unaffected
status-b2g-master: --- → affected
Keywords: regressionwindow-wanted → regression
Sotaro, can you take a look at this?
Assignee: nobody → sotaro.ikeda.g
(Assignee)

Comment 16

3 years ago
Ok, I investigate it.
(Assignee)

Comment 17

3 years ago
I confirmed the symptom.
(Assignee)

Comment 18

3 years ago
When the following functions are called in order within ClientLayerManager::EndTransactionInternal().
-[2] ShadowLayerForwarder::RemoveTextureFromCompositableAsync() was called.
   CompositableClient was not destroyed yet. Then OpRemoveTextureAsync was added to layer transaction.
- CompositableClient::Destroy() of the above CompositableClient was called.
  In the Destroy(), PCompositableChild::Send__delete__(mCompositableChild) message was sent to parent side.

Therefore, when layer transaction was forwarded to parent side, the CompositableChild was already deleted.
(Assignee)

Comment 19

3 years ago
Gonk sent OpRemoveTextureAsync a lot to receive fence from parent side. It seems to trigger the problem.
(Assignee)

Comment 20

3 years ago
Created attachment 8679313 [details] [diff] [review]
patch - Always sent OpRemoveTextureAsync as pending async Messages

By sending OpRemoveTextureAsync as pending async Messages. The messages are handled independently from layer transaction massage. Then, SendPendingAsyncMessges() could send them to parent. It could handle unexpected CompositableChild removal.

There seems intrinsic problem between layer transaction message and CompositableChild removal. But the patch just bypass it.
(Assignee)

Updated

3 years ago
Attachment #8679313 - Flags: review?(nical.bugzilla)

Updated

3 years ago
Attachment #8679313 - Flags: review?(nical.bugzilla) → review+
(In reply to Sotaro Ikeda [:sotaro] from comment #20)
> There seems intrinsic problem between layer transaction message and
> CompositableChild removal.

Yes, at some point I planned to have PCompositable use the same kind of deallocation handshake as PTexture (Send a destroy message from child to parent, and have the parent send __delete__) which would make it easy to avoid this kind of race. I don't know why I ended up not doing it (probably that we just didn't run into issues until recently).
I'll do that in a new bug.

Updated

3 years ago
Status: NEW → ASSIGNED

Comment 24

3 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/b7dd8bf95c82
Status: ASSIGNED → RESOLVED
Last Resolved: 3 years ago
status-firefox44: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla44
(Reporter)

Updated

3 years ago
Keywords: verifyme
(Reporter)

Comment 25

3 years ago
Verified on

[Flame]
Build ID               20151101004501
Gaia Revision          91cac94948094cfdcd00cba5c6483e27e80cb3b0
Gaia Date              2015-10-28 20:32:15
Gecko Revision         https://hg.mozilla.org/releases/mozilla-b2g44_v2_5/rev/36d8fc192b45d26eb4d32bbeae19f79cb2aebb65
Gecko Version          44.0a1
Device Name            flame
Firmware(Release)      4.4.2
Firmware(Incremental)  eng.cltbld.20151101.031103
Firmware Date          Sun Nov  1 03:11:16 EST 2015
Bootloader             L1TC000118D0

[Aries]
Build ID               20151101012023
Gaia Revision          91cac94948094cfdcd00cba5c6483e27e80cb3b0
Gaia Date              2015-10-28 20:32:15
Gecko Revision         https://hg.mozilla.org/mozilla-central/rev/96377bdbcdf3e444a22aeaa677da696243b00d98
Gecko Version          45.0a1
Device Name            aries
Firmware(Release)      4.4.2
Firmware(Incremental)  eng.worker.20151101.004008
Firmware Date          Sun Nov  1 00:40:17 UTC 2015
Bootloader             s1
Status: RESOLVED → VERIFIED
Flags: needinfo?(sleedavid)
Keywords: verifyme
You need to log in before you can comment on or make changes to this bug.