Closed Bug 976479 (IToplevelProtocol) Opened 6 years ago Closed 6 years ago

Intermittent test_browserElement_oop_XFrameOptionsAllowFrom.html,test_browserElement_oop_GetScreenshot.html,test_autoIncrement.html,test_browserElement_oop_PromptConfirm.html | application crashed [@ mozilla::ipc::IToplevelProtocol::~IToplevelProtocol()]

Categories

(Core :: DOM: Core & HTML, defect, critical)

x86_64
macOS
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla31
Tracking Status
firefox29 --- unaffected
firefox30 --- fixed
firefox31 --- fixed
firefox-esr24 --- unaffected

People

(Reporter: cbook, Assigned: mrbkap)

References

(Depends on 1 open bug, )

Details

(Keywords: crash, intermittent-failure)

Crash Data

Attachments

(1 file)

Rev5 MacOSX Mountain Lion 10.8 mozilla-inbound opt test mochitest-2 on 2014-02-24 20:31:00 PST for push 12661b7737d4

slave: talos-mtnlion-r5-057

https://tbpl.mozilla.org/php/getParsedLog.php?id=35188220&tree=Mozilla-Inbound



TEST-UNEXPECTED-FAIL | /tests/dom/browser-element/mochitest/test_browserElement_oop_XFrameOptionsAllowFrom.html | application terminated with exit code 1
PROCESS-CRASH | /tests/dom/browser-element/mochitest/test_browserElement_oop_XFrameOptionsAllowFrom.html | application crashed [@ mozilla::ipc::IToplevelProtocol::~IToplevelProtocol()]

20:35:38  WARNING -  PROCESS-CRASH | /tests/dom/browser-element/mochitest/test_browserElement_oop_XFrameOptionsAllowFrom.html | application crashed [@ mozilla::ipc::IToplevelProtocol::~IToplevelProtocol()]
20:35:38     INFO -  Crash dump filename: /var/folders/48/lhw5r9md7k706sy9km0t8zkm00000w/T/tmpZ2gQdS/minidumps/0E1CC770-3D48-446A-BB58-A5899049BE0C.dmp
20:35:38     INFO -  Operating system: Mac OS X
20:35:38     INFO -                    10.8.0 12A269
20:35:38     INFO -  CPU: amd64
20:35:38     INFO -       family 6 model 42 stepping 7
20:35:38     INFO -       8 CPUs
20:35:38     INFO -  Crash reason:  EXC_BAD_ACCESS / KERN_INVALID_ADDRESS
20:35:38     INFO -  Crash address: 0x0
20:35:38     INFO -  Thread 0 (crashed)
20:35:38     INFO -   0  XUL!mozilla::ipc::IToplevelProtocol::~IToplevelProtocol() [LinkedList.h:12661b7737d4 : 204 + 0x8]
20:35:38     INFO -      rbx = 0x0000000112891818   r12 = 0x00000001100df008
20:35:38     INFO -      r13 = 0x00007fff5fbfd2f0   r14 = 0x0000000112891800
20:35:38     INFO -      r15 = 0x00000000000000c1   rip = 0x00000001016775a8
20:35:38     INFO -      rsp = 0x00007fff5fbfd1b0   rbp = 0x00007fff5fbfd1b0
20:35:38     INFO -      Found by: given as instruction pointer in context
20:35:38     INFO -   1  XUL!mozilla::dom::PContentParent::~PContentParent() [PContentParent.cpp:12661b7737d4 : 191 + 0x10]
20:35:38     INFO -      rbx = 0x0000000112891818   r12 = 0x00000001100df008
20:35:38     INFO -      r13 = 0x00007fff5fbfd2f0   r14 = 0x0000000112891800
20:35:38     INFO -      r15 = 0x00000000000000c1   rip = 0x0000000101723759
20:35:38     INFO -      rsp = 0x00007fff5fbfd1c0   rbp = 0x00007fff5fbfd1e0
20:35:38     INFO -      Found by: call frame info
20:35:38     INFO -   2  XUL!mozilla::dom::ContentParent::~ContentParent() [ContentParent.cpp:12661b7737d4 : 1503 + 0x4]
20:35:38     INFO -      rbx = 0x0000000112891800   r12 = 0x00000001100df008
20:35:38     INFO -      r13 = 0x00007fff5fbfd2f0   r14 = 0x000000000000000c
20:35:38     INFO -      r15 = 0x0000000000000024   rip = 0x00000001020f9f8e
20:35:38     INFO -      rsp = 0x00007fff5fbfd1f0   rbp = 0x00007fff5fbfd200
20:35:38     INFO -      Found by: call frame info
20:35:38     INFO -   3  XUL!mozilla::dom::ContentParent::Release() [ContentParent.cpp:12661b7737d4 : 1979 + 0x5]
20:35:38     INFO -      rbx = 0x0000000000000000   r12 = 0x00000001100df008
20:35:38     INFO -      r13 = 0x00007fff5fbfd2f0   r14 = 0x000000000000000c
20:35:38     INFO -      r15 = 0x0000000000000024   rip = 0x00000001020fabc4
20:35:38     INFO -      rsp = 0x00007fff5fbfd210   rbp = 0x00007fff5fbfd220
20:35:38     INFO -      Found by: call frame info
20:35:38     INFO -   4  XUL!nsFrameLoader::~nsFrameLoader() [nsAutoPtr.h:12661b7737d4 : 900 + 0x8]
20:35:38     INFO -      rbx = 0x000000010d2acc40   r12 = 0x00000001100df008
20:35:38     INFO -      r13 = 0x00007fff5fbfd2f0   r14 = 0x000000000000000c
20:35:38     INFO -      r15 = 0x0000000000000024   rip = 0x000000010259cfa5
20:35:38     INFO -      rsp = 0x00007fff5fbfd230   rbp = 0x00007fff5fbfd240
Duplicate of this bug: 976645
Component: General → DOM
Severity: normal → critical
Crash Signature: [@ mozilla::ipc::IToplevelProtocol::~IToplevelProtocol()]
Hardware: x86 → x86_64
Summary: Intermittent | test_browserElement_oop_XFrameOptionsAllowFrom.html | application crashed [@ mozilla::ipc::IToplevelProtocol::~IToplevelProtocol()] → Intermittent test_browserElement_oop_XFrameOptionsAllowFrom.html | application crashed [@ mozilla::ipc::IToplevelProtocol::~IToplevelProtocol()]
https://tbpl.mozilla.org/php/getParsedLog.php?id=35280262&tree=Fx-Team
Summary: Intermittent test_browserElement_oop_XFrameOptionsAllowFrom.html | application crashed [@ mozilla::ipc::IToplevelProtocol::~IToplevelProtocol()] → Intermittent test_browserElement_oop_XFrameOptionsAllowFrom.html, test_browserElement_oop_GetScreenshot.html | application crashed [@ mozilla::ipc::IToplevelProtocol::~IToplevelProtocol()]
https://tbpl.mozilla.org/php/getParsedLog.php?id=35298960&tree=Mozilla-Inbound
Summary: Intermittent test_browserElement_oop_XFrameOptionsAllowFrom.html, test_browserElement_oop_GetScreenshot.html | application crashed [@ mozilla::ipc::IToplevelProtocol::~IToplevelProtocol()] → Intermittent test_browserElement_oop_XFrameOptionsAllowFrom.html,test_browserElement_oop_GetScreenshot.html,test_autoIncrement.html | application crashed [@ mozilla::ipc::IToplevelProtocol::~IToplevelProtocol()]
Summary: Intermittent test_browserElement_oop_XFrameOptionsAllowFrom.html,test_browserElement_oop_GetScreenshot.html,test_autoIncrement.html | application crashed [@ mozilla::ipc::IToplevelProtocol::~IToplevelProtocol()] → Intermittent test_browserElement_oop_XFrameOptionsAllowFrom.html,test_browserElement_oop_GetScreenshot.html,test_autoIncrement.html,test_browserElement_oop_PromptConfirm.html | application crashed [@ mozilla::ipc::IToplevelProtocol::~IToplevelProtocol()]
https://tbpl.mozilla.org/php/getParsedLog.php?id=35612641&tree=Mozilla-Inbound

Kyle, any suggestions for who might be able to take a look at this semi-frequent OSX IPC crash?
Flags: needinfo?(khuey)
Olli, perhaps?
Flags: needinfo?(khuey)
Tag, you're it! /me ducks
Flags: needinfo?(bugs)
https://tbpl.mozilla.org/php/getParsedLog.php?id=35804570&tree=Fx-Team

Guess I've got one for the platform meeting next week at this rate.
This seems to be an OSX-only issue, and I don't have any such device.
Flags: needinfo?(bugs)
I haven't found anything obviously wrong causing this.
Bug 974356 landed after this started to happen.
Ah, Bug 956218 landed recently.
(I hate when hg doesn't show reasonable dates.)
The full stack of the crash is last-release via the SnowWhiteKiller:

0  XUL!mozilla::ipc::IToplevelProtocol::~IToplevelProtocol() [LinkedList.h:341c371eb41b : 204 + 0x8]
1  XUL!mozilla::dom::PContentParent::~PContentParent() [PContentParent.cpp:341c371eb41b : 200 + 0x10]
2  XUL!mozilla::dom::ContentParent::~ContentParent() [ContentParent.cpp:341c371eb41b : 1479 + 0x4]
3  XUL!mozilla::dom::ContentParent::Release() [ContentParent.cpp:341c371eb41b : 1964 + 0x5]
4  XUL!nsFrameLoader::~nsFrameLoader() [nsAutoPtr.h:341c371eb41b : 900 + 0x8]
5  XUL!nsFrameLoader::~nsFrameLoader() [nsFrameLoader.cpp:341c371eb41b : 293 + 0x4]
6  XUL!SnowWhiteKiller::~SnowWhiteKiller() [nsCycleCollector.cpp:341c371eb41b : 2352 + 0xd]
7  XUL!nsCycleCollector::FreeSnowWhite(bool) [nsCycleCollector.cpp:341c371eb41b : 2345 + 0x4]
8  XUL!nsCycleCollector_doDeferredDeletion() [nsCycleCollector.cpp:341c371eb41b : 3788 + 0x4]
9  XUL!AsyncFreeSnowWhite::Run() [XPCJSRuntime.cpp:341c371eb41b : 211 + 0x4]
10  XUL!nsThread::ProcessNextEvent(bool, bool*) [nsThread.cpp:341c371eb41b : 694 + 0x5]
... eventloop

Kinda sucks and it's also weird that this is opt-only; it feels like double deletion of the ContentParent, although it could also be that we've screwed up the linked list which is IToplevelProtocol::mOpenActors.

It appears that IToplevelProtocol makes assumptions about single-threading mOpenActors. I wonder if the PBackground bug changes that assumption...
Flags: needinfo?(mrbkap)
Flags: needinfo?(bent.mozilla)
(In reply to Olli Pettay [:smaug] from comment #68)
> Ah, Bug 956218 landed recently.
> (I hate when hg doesn't show reasonable dates.)

Good call. Retriggers confirm it as the cause.
Attached patch Possible fixSplinter Review
bent and I looked at this, we don't think that the PBackground patch is at fault; however, it exposed a bug in the gfx code. For NUWA, we now maintain a list of toplevel protocols, adding to the list when a protocol actor is created and removing it when the actor is destroyed (in the destructor of LinkedListElement<T>). Before the PBackground patch, the only toplevel protocols that we created were PContentParent and gfx stuff. The gfx stuff was guaranteed to be destroyed before PContentParent, even though it was destroyed on the IO thread instead of the main thread. Now, it races with PBackground being destroyed on the main thread. The current NUWA stuff requires that it be destroyed in a synchronized way with the main thread.

Of note here as well: with this patch, we still release one pair of ImageBridge{Child,Parent} off the main thread: the static ref pointers in ImageBridgeChild.cpp seem to only be cleared when the main thread is waiting (and not spinning the event loop) for the image thread to be destroyed, so there shouldn't be a race there.

bjacob, what do you think of this?
Attachment #8393248 - Flags: review?(bjacob)
Assignee: nobody → mrbkap
Flags: needinfo?(mrbkap)
Flags: needinfo?(bent.mozilla)
bjacob is on PTO, anybody else who could look at this?
Flags: needinfo?(mrbkap)
Comment on attachment 8393248 [details] [diff] [review]
Possible fix

Nical, would you mind taking a look here? Please see comment 94 for an explanation of what's going on.
Attachment #8393248 - Flags: review?(nical.bugzilla)
Flags: needinfo?(mrbkap)
Comment on attachment 8393248 [details] [diff] [review]
Possible fix

Review of attachment 8393248 [details] [diff] [review]:
-----------------------------------------------------------------

It looks good except I have a doubt about one thing: I am not sure we can still schedule events in the main thread's event loop this late in the shutdown sequence. This may not be a problem in b2g because we don't shut down image bridge at the same time as when shutting down the phone, but will probably be in other flavors of Gecko. In fact we seem to be hitting this problem already in bug 924622.
(In reply to Nicolas Silva [:nical] from comment #105)
> It looks good except I have a doubt about one thing: I am not sure we can
> still schedule events in the main thread's event loop this late in the
> shutdown sequence.

Hm, that's possible I guess but we need to fix that as part of bug 924622 I would think.
(In reply to ben turner [:bent] (use the needinfo? flag!) from comment #110)
> Hm, that's possible I guess but we need to fix that as part of bug 924622 I
> would think.

The fix for bug 924622 might end up being something where we can schedule an event in the IPDL's chromium thread event loop (with the main thread blocked on it) but still not be able to post to the main thread's event loop.
I don't know much about gecko's shutdown sequence. Just warning that there may be (potentially intermittent) issues with doing this. I don't know who'd be the best person to ask about shutdown stuff.
Comment on attachment 8393248 [details] [diff] [review]
Possible fix

Review of attachment 8393248 [details] [diff] [review]:
-----------------------------------------------------------------

So my conclusion: r=me because the layers-side of the logic looks correct, but I am lacking knowledge about Gecko's shutdown to be 100% sure that this won't cause problems.
Attachment #8393248 - Flags: review?(nical.bugzilla) → review+
(In reply to comment #111)
> (In reply to ben turner [:bent] (use the needinfo? flag!) from comment #110)
> > Hm, that's possible I guess but we need to fix that as part of bug 924622 I
> > would think.
> 
> The fix for bug 924622 might end up being something where we can schedule an
> event in the IPDL's chromium thread event loop (with the main thread blocked on
> it) but still not be able to post to the main thread's event loop.
> I don't know much about gecko's shutdown sequence. Just warning that there may
> be (potentially intermittent) issues with doing this. I don't know who'd be the
> best person to ask about shutdown stuff.

bsmedberg?
If we're so late in shutdown that NS_DispatchToMainThread fails then we'll simply leak a bit of memory for another few instructions. That's safer than what we're doing now, certainly.
https://hg.mozilla.org/mozilla-central/rev/100798d1cc7b
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla31
Looks like it's working :). Please request Aurora approval on this when you get a chance.
Flags: needinfo?(mrbkap)
Comment on attachment 8393248 [details] [diff] [review]
Possible fix

This applies cleanly to Aurora.

[Approval Request Comment]
Bug caused by (feature/regressing bug #): n/a
User impact if declined: Random crashes on shutdown (or possibly whenever we shut down a child process).
Testing completed (on m-c, etc.): On m-c for a few days.
Risk to taking this patch (and alternatives if risky): Low risk.
Attachment #8393248 - Flags: approval-mozilla-aurora?
Flags: needinfo?(mrbkap)
Attachment #8393248 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Depends on: 999221
Component: DOM → DOM: Core & HTML
You need to log in before you can comment on or make changes to this bug.