Closed Bug 881723 Opened 6 years ago Closed 6 years ago
crash in mozilla::dom::PContent
Child::Write with abort message: "actor has been |delete|d"
Seen on Unagi, Peak, and Keon. Here is a crash report: bp-a431163f-b4ce-4112-ae4e-8c63e2130610. Frame Module Signature Source 0 libxul.so mozalloc_abort mozalloc_abort.cpp:30 1 libxul.so NS_DebugBreak_P nsDebugImpl.cpp:423 2 libxul.so mozilla::dom::PContentChild::Write PContentChild.cpp:5766 3 libxul.so mozilla::dom::PContentChild::Write PContentChild.cpp:5415 4 libxul.so mozilla::dom::PContentChild::Write PContentChild.cpp:5216 5 libxul.so mozilla::dom::PContentChild::SendPStorageConstructor PContentChild.cpp:900 6 libxul.so mozilla::dom::StorageChild::CloneFrom StorageChild.cpp:280 7 libxul.so nsDOMStorage::CloneFrom nsDOMStorage.cpp:1439 8 libxul.so nsDOMStorage2::Clone nsDOMStorage.cpp:1581 9 libxul.so nsWindowWatcher::OpenWindowInternal nsWindowWatcher.cpp:972 10 libxul.so nsWindowWatcher::OpenWindow2 nsWindowWatcher.cpp:471 11 libxul.so nsGlobalWindow::OpenInternal nsGlobalWindow.cpp:9383 12 libxul.so nsGlobalWindow::OpenInternal nsGlobalWindow.cpp:9266 13 libxul.so nsGlobalWindow::OpenJS nsGlobalWindow.cpp:5966 14 libxul.so NS_InvokeByIndex_P xptcinvoke_arm.cpp:160 15 libxul.so XPCWrappedNative::CallMethod XPCWrappedNative.cpp:3084 16 libxul.so XPC_WN_CallMethod XPCWrappedNativeJSOps.cpp:1469 17 libxul.so js::InvokeKernel jscntxtinlines.h:364 18 libxul.so js::Interpret jsinterp.cpp:2475 19 libxul.so js::RunScript jsinterp.cpp:324 20 libxul.so UncachedInlineCall InvokeHelpers.cpp:363 21 libxul.so js::mjit::stubs::UncachedCallHelper InvokeHelpers.cpp:451 22 libxul.so js::mjit::CallCompiler::update MonoIC.cpp:1220 23 libxul.so js::mjit::ic::Call MonoIC.cpp:1298 ... More reports at: https://crash-stats.mozilla.com/report/list?signature=mozalloc_abort+|+NS_DebugBreak_P+|+mozilla%3A%3Adom%3A%3APContentChild%3A%3AWrite
Component: General → DOM: Core & HTML
Product: Boot2Gecko → Core
Use the new implementation :D OK, seriously. Is this reproducible?
Presumably this is fallout from the changes that neutered the IPC connection when the owning window was being destroyed. Maybe we shouldn't neuter the session storage until later, if it's the only kind that gets cloned.
It's #2 top crasher in B2G 18.0.
blocking-b2g: --- → leo?
Setting notes to follow up on this bug for myself.
18 crashes on the 6/3 , 6/9 build on geeksphone and 2 on unagi. Something odd with cloning the Domstorage. Need to look into when this occurs (for blackbox reproduction) Might be fixed with a null check? Not sure. I can't find where the PContentChild.cpp is located... I'm not sure how other devices will be affected; it doesn't look like a device specific crash.
It's something to do with storing data in sessionStorage, then causing opening a new window to the same domain to open as the window is being closed (maybe in an unload handler?)
Not blocking due to lack of STR here and at this time we are only blocking on crash issues which are actionable..Please renominate if we have reliable STR here.
blocking-b2g: leo? → -
(In reply to bhavana bajaj [:bajaj] from comment #7) > Not blocking due to lack of STR here Why was bug 871574 marked as leo+ two days later while it has also no STR? In addition, this bug is higher than bug 871574 (more than twice the volume).
It's #1 unfixed top crasher in B2G 18.0 so either it's specific to FxOS 1.0 (fixed in 1.1) or there's a big hole in the test coverage.
(In reply to Scoobidiver from comment #9) > It's #1 unfixed top crasher in B2G 18.0 On which devices? If it's the Geeksphones, they are running 1.0.1 (unless people compile themselves, but then they'd not get symbols resolved, i.e. not the signature in this bug).
(In reply to Robert Kaiser (:email@example.com) from comment #10) > On which devices? If you have 100 Geekphones for one non-GP in the wild, you have more chance to see crashes on GP. Crashes for not-yet-released devices are in the crash stat noise.
(In reply to Scoobidiver from comment #11) > If you have 100 Geekphones for one non-GP in the wild Don't make assumptions you don't have numbers for. I know this is not the case due to numbers I have actually seen (but which probably cannot be public at this point). Also, that comment doesn't change that 1) the "topcrash" list for "B2G 18.0" is mostly bogus and 2) that you cannot argue about what should be cared about with 1.1 when using data from Geeksphones which run on 1.0.1.
(In reply to Robert Kaiser (:firstname.lastname@example.org) from comment #12) > 1) the "topcrash" list for "B2G 18.0" is mostly bogus What does mean bogus in that context? If current crash stats mean nothing, the topcrash keyword for B2G shouldn't be set and https://wiki.mozilla.org/CrashKill/Topcrash should be updated accordingly. > 2) that you cannot argue about what should be cared about with 1.1 when using data > from Geeksphones which run on 1.0.1. What does mean "either it's specific to FxOS 1.0 (fixed in 1.1)" for you?
Here is crash on Unagi: bp-a431163f-b4ce-4112-ae4e-8c63e2130610.
(In reply to Scoobidiver from comment #13) > (In reply to Robert Kaiser (:email@example.com) from comment #12) > > 1) the "topcrash" list for "B2G 18.0" is mostly bogus > What does mean bogus in that context? http://en.wiktionary.org/wiki/bogus > If current crash stats mean nothing, the topcrash keyword for B2G shouldn't > be set and https://wiki.mozilla.org/CrashKill/Topcrash should be updated > accordingly. Doesn't need to, as we basically have no crashes at all yet from "what we do support" anyhow. We don't officially support Geeksphones at this time, and the various test devices are only half-supported. Also, that document is a guideline, not a 100% tight roleset. > > 2) that you cannot argue about what should be cared about with 1.1 when using data > > from Geeksphones which run on 1.0.1. > What does mean "either it's specific to FxOS 1.0 (fixed in 1.1)" for you? Nothing. If it's specific to 1.0, then we should just ignore it and discussing it at all is a waste of our precious time (and I mean both of us, as you are doing awesome work around here). (In reply to Scoobidiver from comment #14) > Here is crash on Unagi: bp-a431163f-b4ce-4112-ae4e-8c63e2130610. That's quite valuable. https://crash-stats-django.mozilla.org/report/index/a431163f-b4ce-4112-ae4e-8c63e2130610 nicely shows that this crash is actually from 1.1, so we can confirm that it also happens there. Given that 1.0.1 development is done and 1.1 will be the (security and otherwise) update for 1.0.1 users, seeing this on 1.1 means that we still need to work on it. Thanks for finding this one!
Some exploratory work around comment 6 is probably the most effective avenue of attack at this point.
Nominating mainly cause this is now a top crash that we can reproduce. Marcia has a test account on facebook that does reproduce this crash, so we do have a way to replicate this. FWIW - I also the theory in comment 6 sounds right.
https://https://www.facebook.com/LeicaCamera?ref=stream&hc_location=timeline - I clicked the link on the second post on that page in order to reproduce the crash: "The Montreux Jazz Festival is in full swing! We have more images featuring Green Day, The Lumineers, Gregory Porter and more. Check them out here: http://bit.ly/16uwH8R."
Triage - Leo+ per comment 19, top crasher with STR.
blocking-b2g: leo? → leo+
jst, can you tag somebody to take a look at this?
Assignee: nobody → jst
(In reply to Alex Keybl [:akeybl] from comment #23) > jst, can you tag somebody to take a look at this? I talked to jdm about this in IRC - I think he said he would look into this.
Can't reproduce in the emulator; I'll try on a device tomorrow.
I think this should cover it. Some memory will be retained longer than necessary if we never need to clone some storages, but I don't think we have any way of determining that beforehand. That being said, while testing on an unagi I saw the browser process OOM at some point after the link from the STR finished loading, which isn't that much better than a segfault.
Attachment #777526 - Flags: review?(honzab.moz)
Would there be value spinning a try build here to do some exploratory testing with Marcia's test FB test account and the STR used to repro the crash?
If you want me to kick it off, let me know; I'm operating under the assumption that you'll make the build yourself at this point.
(In reply to Josh Matthews [:jdm] from comment #29) > If you want me to kick it off, let me know; I'm operating under the > assumption that you'll make the build yourself at this point. Oh. I actually would prefer if you would kick off a try build. Can you kick off a try build with this patch?
Comment on attachment 777526 [details] [diff] [review] Keep all session storage IPC connections alive until there's no chance of forking/cloning them Review of attachment 777526 [details] [diff] [review]: ----------------------------------------------------------------- Good catch. r=honzab Note: I don't think this needs to be ported to the new domstore code. Session storages are not shared between processes, only live as memory storage caches per-process. And go away when docshell goes away.
Attachment #777526 - Flags: review?(honzab.moz) → review+
I tested the tryserver build in Comment 31 on an unagi and I was not able to reproduce the Facebook crash using the same URL as in Comment 20 (http://bit.ly/16uwH8R). I repeated logging in and out several times and launching the link just to be sure, since that was one way I was consistently able to reproduce the crash on Leo. I tested again using the latest Leo nightly and I can consistently crash the first time I click the link when I start a Facebook session.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Whiteboard: [b2g-crash] → [b2g-crash] [STR: Comment 20]
You need to log in before you can comment on or make changes to this bug.