Closed Bug 881723 Opened 6 years ago Closed 6 years ago

crash in mozilla::dom::PContentChild::Write with abort message: "actor has been |delete|d"

Categories

(Core :: DOM: Core & HTML, defect, critical)

ARM
Gonk (Firefox OS)
defect
Not set
critical

Tracking

()

RESOLVED FIXED
1.1 QE5
blocking-b2g leo+
Tracking Status
firefox23 --- unaffected
firefox24 --- unaffected
firefox25 --- unaffected
b2g18 --- fixed
b2g18-v1.0.0 --- wontfix
b2g18-v1.0.1 --- wontfix
b2g-v1.1hd --- fixed

People

(Reporter: scoobidiver, Assigned: jdm)

References

Details

(Keywords: crash, reproducible, topcrash, Whiteboard: [b2g-crash] [STR: Comment 20])

Crash Data

Attachments

(1 file)

Seen on Unagi, Peak, and Keon.

Here is a crash report: bp-a431163f-b4ce-4112-ae4e-8c63e2130610.

Frame 	Module 	Signature 	Source
0 	libxul.so 	mozalloc_abort 	mozalloc_abort.cpp:30
1 	libxul.so 	NS_DebugBreak_P 	nsDebugImpl.cpp:423
2 	libxul.so 	mozilla::dom::PContentChild::Write 	PContentChild.cpp:5766
3 	libxul.so 	mozilla::dom::PContentChild::Write 	PContentChild.cpp:5415
4 	libxul.so 	mozilla::dom::PContentChild::Write 	PContentChild.cpp:5216
5 	libxul.so 	mozilla::dom::PContentChild::SendPStorageConstructor 	PContentChild.cpp:900
6 	libxul.so 	mozilla::dom::StorageChild::CloneFrom 	StorageChild.cpp:280
7 	libxul.so 	nsDOMStorage::CloneFrom 	nsDOMStorage.cpp:1439
8 	libxul.so 	nsDOMStorage2::Clone 	nsDOMStorage.cpp:1581
9 	libxul.so 	nsWindowWatcher::OpenWindowInternal 	nsWindowWatcher.cpp:972
10 	libxul.so 	nsWindowWatcher::OpenWindow2 	nsWindowWatcher.cpp:471
11 	libxul.so 	nsGlobalWindow::OpenInternal 	nsGlobalWindow.cpp:9383
12 	libxul.so 	nsGlobalWindow::OpenInternal 	nsGlobalWindow.cpp:9266
13 	libxul.so 	nsGlobalWindow::OpenJS 	nsGlobalWindow.cpp:5966
14 	libxul.so 	NS_InvokeByIndex_P 	xptcinvoke_arm.cpp:160
15 	libxul.so 	XPCWrappedNative::CallMethod 	XPCWrappedNative.cpp:3084
16 	libxul.so 	XPC_WN_CallMethod 	XPCWrappedNativeJSOps.cpp:1469
17 	libxul.so 	js::InvokeKernel 	jscntxtinlines.h:364
18 	libxul.so 	js::Interpret 	jsinterp.cpp:2475
19 	libxul.so 	js::RunScript 	jsinterp.cpp:324
20 	libxul.so 	UncachedInlineCall 	InvokeHelpers.cpp:363
21 	libxul.so 	js::mjit::stubs::UncachedCallHelper 	InvokeHelpers.cpp:451
22 	libxul.so 	js::mjit::CallCompiler::update 	MonoIC.cpp:1220
23 	libxul.so 	js::mjit::ic::Call 	MonoIC.cpp:1298 
...

More reports at:
https://crash-stats.mozilla.com/report/list?signature=mozalloc_abort+|+NS_DebugBreak_P+|+mozilla%3A%3Adom%3A%3APContentChild%3A%3AWrite
Component: General → DOM: Core & HTML
Product: Boot2Gecko → Core
Use the new implementation :D

OK, seriously.  Is this reproducible?
Presumably this is fallout from the changes that neutered the IPC connection when the owning window was being destroyed. Maybe we shouldn't neuter the session storage until later, if it's the only kind that gets cloned.
It's #2 top crasher in B2G 18.0.
blocking-b2g: --- → leo?
Keywords: topcrash
Setting notes to follow up on this bug for myself.
Flags: needinfo?(nhirata.bugzilla)
18 crashes on the 6/3 , 6/9 build on geeksphone and 2 on unagi.

Something odd with cloning the Domstorage.  Need to look into when this occurs (for blackbox reproduction)

Might be fixed with a null check?  Not sure.  I can't find where the PContentChild.cpp is located...

I'm not sure how other devices will be affected; it doesn't look like a device specific crash.
Flags: needinfo?(nhirata.bugzilla)
It's something to do with storing data in sessionStorage, then causing opening a new window to the same domain to open as the window is being closed (maybe in an unload handler?)
Not blocking due to lack of STR here and at this time we are only blocking on crash issues which are actionable..Please renominate if we have reliable STR here.
blocking-b2g: leo? → -
(In reply to bhavana bajaj [:bajaj] from comment #7)
> Not blocking due to lack of STR here
Why was bug 871574 marked as leo+ two days later while it has also no STR? In addition, this bug is higher than bug 871574 (more than twice the volume).
It's #1 unfixed top crasher in B2G 18.0 so either it's specific to FxOS 1.0 (fixed in 1.1) or there's a big hole in the test coverage.
(In reply to Scoobidiver from comment #9)
> It's #1 unfixed top crasher in B2G 18.0

On which devices? If it's the Geeksphones, they are running 1.0.1 (unless people compile themselves, but then they'd not get symbols resolved, i.e. not the signature in this bug).
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #10)
> On which devices?
If you have 100 Geekphones for one non-GP in the wild, you have more chance to see crashes on GP. Crashes for not-yet-released devices are in the crash stat noise.
(In reply to Scoobidiver from comment #11)
> If you have 100 Geekphones for one non-GP in the wild

Don't make assumptions you don't have numbers for. I know this is not the case due to numbers I have actually seen (but which probably cannot be public at this point).
Also, that comment doesn't change that 1) the "topcrash" list for "B2G 18.0" is mostly bogus and 2) that you cannot argue about what should be cared about with 1.1 when using data from Geeksphones which run on 1.0.1.
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #12)
> 1) the "topcrash" list for "B2G 18.0" is mostly bogus
What does mean bogus in that context?
If current crash stats mean nothing, the topcrash keyword for B2G shouldn't be set and https://wiki.mozilla.org/CrashKill/Topcrash should be updated accordingly.

> 2) that you cannot argue about what should be cared about with 1.1 when using data
> from Geeksphones which run on 1.0.1.
What does mean "either it's specific to FxOS 1.0 (fixed in 1.1)" for you?
Here is crash on Unagi: bp-a431163f-b4ce-4112-ae4e-8c63e2130610.
(In reply to Scoobidiver from comment #13)
> (In reply to Robert Kaiser (:kairo@mozilla.com) from comment #12)
> > 1) the "topcrash" list for "B2G 18.0" is mostly bogus
> What does mean bogus in that context?

http://en.wiktionary.org/wiki/bogus

> If current crash stats mean nothing, the topcrash keyword for B2G shouldn't
> be set and https://wiki.mozilla.org/CrashKill/Topcrash should be updated
> accordingly.

Doesn't need to, as we basically have no crashes at all yet from "what we do support" anyhow. We don't officially support Geeksphones at this time, and the various test devices are only half-supported.
Also, that document is a guideline, not a 100% tight roleset.

> > 2) that you cannot argue about what should be cared about with 1.1 when using data
> > from Geeksphones which run on 1.0.1.
> What does mean "either it's specific to FxOS 1.0 (fixed in 1.1)" for you?

Nothing. If it's specific to 1.0, then we should just ignore it and discussing it at all is a waste of our precious time (and I mean both of us, as you are doing awesome work around here).


(In reply to Scoobidiver from comment #14)
> Here is crash on Unagi: bp-a431163f-b4ce-4112-ae4e-8c63e2130610.

That's quite valuable. https://crash-stats-django.mozilla.org/report/index/a431163f-b4ce-4112-ae4e-8c63e2130610 nicely shows that this crash is actually from 1.1, so we can confirm that it also happens there. Given that 1.0.1 development is done and 1.1 will be the (security and otherwise) update for 1.0.1 users, seeing this on 1.1 means that we still need to work on it. Thanks for finding this one!
Some exploratory work around comment 6 is probably the most effective avenue of attack at this point.
Duplicate of this bug: 893200
Bug 893200 has STR to reproduce this crash as well as my latest crash report.
Keywords: reproducible
blocking-b2g: - → leo?
Nominating mainly cause this is now a top crash that we can reproduce. Marcia has a test account on facebook that does reproduce this crash, so we do have a way to replicate this.

FWIW - I also the theory in comment 6 sounds right.
https://https://www.facebook.com/LeicaCamera?ref=stream&hc_location=timeline - I clicked the link on the second post on that page in order to reproduce the crash: 

"The Montreux Jazz Festival is in full swing! We have more images featuring Green Day, The Lumineers, Gregory Porter and more. Check them out here: http://bit.ly/16uwH8R."
Triage - Leo+ per comment 19, top crasher with STR.
blocking-b2g: leo? → leo+
jst, can you tag somebody to take a look at this?
Assignee: nobody → jst
(In reply to Alex Keybl [:akeybl] from comment #23)
> jst, can you tag somebody to take a look at this?

I talked to jdm about this in IRC - I think he said he would look into this.
Assignee: jst → josh
Can't reproduce in the emulator; I'll try on a device tomorrow.
I think this should cover it. Some memory will be retained longer than necessary if we never need to clone some storages, but I don't think we have any way of determining that beforehand.

That being said, while testing on an unagi I saw the browser process OOM at some point after the link from the STR finished loading, which isn't that much better than a segfault.
Attachment #777526 - Flags: review?(honzab.moz)
Would there be value spinning a try build here to do some exploratory testing with Marcia's test FB test account and the STR used to repro the crash?
Absolutely.
If you want me to kick it off, let me know; I'm operating under the assumption that you'll make the build yourself at this point.
(In reply to Josh Matthews [:jdm] from comment #29)
> If you want me to kick it off, let me know; I'm operating under the
> assumption that you'll make the build yourself at this point.

Oh. I actually would prefer if you would kick off a try build. Can you kick off a try build with this patch?
Flags: needinfo?(mozillamarcia.knous)
Comment on attachment 777526 [details] [diff] [review]
Keep all session storage IPC connections alive until there's no chance of forking/cloning them

Review of attachment 777526 [details] [diff] [review]:
-----------------------------------------------------------------

Good catch.  r=honzab

Note: I don't think this needs to be ported to the new domstore code.  Session storages are not shared between processes, only live as memory storage caches per-process.  And go away when docshell goes away.
Attachment #777526 - Flags: review?(honzab.moz) → review+
I tested the tryserver build in Comment 31 on an unagi and I was not able to reproduce the Facebook crash using the same URL as in Comment 20 (http://bit.ly/16uwH8R). I repeated logging in and out several times and launching the link just to be sure, since that was one way I was consistently able to reproduce the crash on Leo.

I tested again using the latest Leo nightly and I can consistently crash the first time I click the link when I start a Facebook session.
Flags: needinfo?(mozillamarcia.knous)
Keywords: checkin-needed
Whiteboard: [b2g-crash] → [b2g-crash] [only checkin to b2g18]
https://hg.mozilla.org/releases/mozilla-b2g18/rev/9c6987190007
Keywords: checkin-needed
Whiteboard: [b2g-crash] [only checkin to b2g18] → [b2g-crash]
Target Milestone: --- → 1.1 QE5
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Whiteboard: [b2g-crash] → [b2g-crash] [STR: Comment 20]
You need to log in before you can comment on or make changes to this bug.