Closed Bug 824069 Opened 12 years ago Closed 11 years ago

Intermittent test_bug582412-1.html, test_bug582412-1.html, 640272.html | application crashed [@ mozilla::plugins::PluginModuleChild::UnregisterActorForNPObject(NPObject*)] [@ mozalloc_abort(char const*)]

Categories

(Core Graveyard :: Plug-ins, defect, P2)

x86_64
macOS
defect

Tracking

(firefox20 unaffected, firefox21 fixed, firefox22 fixed)

RESOLVED FIXED
mozilla21
Tracking Status
firefox20 --- unaffected
firefox21 --- fixed
firefox22 --- fixed

People

(Reporter: emorley, Assigned: gfritzsche)

References

Details

(Keywords: crash, intermittent-failure)

Crash Data

Attachments

(1 file)

Rev5 MacOSX Mountain Lion 10.8 mozilla-inbound opt test mochitest-1 on 2012-12-21 01:42:31 PST for push 53accd955c2b

slave: talos-mtnlion-r5-012

https://tbpl.mozilla.org/php/getParsedLog.php?id=18156129&tree=Mozilla-Inbound

{
120816 INFO TEST-START | /tests/content/html/content/test/test_bug582412-1.html
120817 INFO TEST-PASS | /tests/content/html/content/test/test_bug582412-1.html | the method/formmethod attribute doesn't have the correct behavior
120818 INFO TEST-PASS | /tests/content/html/content/test/test_bug582412-1.html | the method/formmethod attribute doesn't have the correct behavior
120819 INFO TEST-PASS | /tests/content/html/content/test/test_bug582412-1.html | the method/formmethod attribute doesn't have the correct behavior
120820 INFO TEST-PASS | /tests/content/html/content/test/test_bug582412-1.html | the method/formmethod attribute doesn't have the correct behavior
120821 INFO TEST-PASS | /tests/content/html/content/test/test_bug582412-1.html | the method/formmethod attribute doesn't have the correct behavior
120822 INFO TEST-PASS | /tests/content/html/content/test/test_bug582412-1.html | the method/formmethod attribute doesn't have the correct behavior
120823 INFO TEST-PASS | /tests/content/html/content/test/test_bug582412-1.html | the method/formmethod attribute doesn't have the correct behavior
120824 INFO TEST-PASS | /tests/content/html/content/test/test_bug582412-1.html | the method/formmethod attribute doesn't have the correct behavior
120825 INFO TEST-PASS | /tests/content/html/content/test/test_bug582412-1.html | the method/formmethod attribute doesn't have the correct behavior
120826 INFO TEST-PASS | /tests/content/html/content/test/test_bug582412-1.html | the method/formmethod attribute doesn't have the correct behavior
120827 ERROR TEST-UNEXPECTED-FAIL | /tests/content/html/content/test/test_bug582412-1.html | This test left crash dumps behind, but we weren't expecting it to!
...
...
PROCESS-CRASH | Main app process exited normally | application crashed [@ mozilla::plugins::PluginModuleChild::UnregisterActorForNPObject(NPObject*)]
Crash dump filename: /var/folders/6g/6wv8f19j2bs70hgxfy8253fm00000w/T/tmpoBeHiN/minidumps/6B33CB17-0E55-490C-950A-09915D245C8D.dmp
Operating system: Mac OS X
                  10.8.0 12A269
CPU: amd64
     family 6 model 42 stepping 7
     8 CPUs

Crash reason:  EXC_BAD_ACCESS / KERN_INVALID_ADDRESS
Crash address: 0x18

Thread 0 (crashed)
 0  XUL!mozilla::plugins::PluginModuleChild::UnregisterActorForNPObject(NPObject*) [PluginModuleChild.cpp : 753 + 0x0]
    rbx = 0x0000000105aff420   r12 = 0x0000000000000000
    r13 = 0x0000000000000002   r14 = 0x0000000108f2f160
    r15 = 0x0000000105a81540   rip = 0x0000000100f1fca9
    rsp = 0x00007fff5fbfc070   rbp = 0x00007fff5fbfc070
    Found by: given as instruction pointer in context
 1  XUL!mozilla::plugins::PluginScriptableObjectChild::ScriptableDeallocate(NPObject*) [PluginScriptableObjectChild.cpp : 577 + 0x10]
    rbx = 0x0000000105aff420   r12 = 0x0000000000000000
    r13 = 0x0000000000000002   r14 = 0x0000000108f2f160
    r15 = 0x0000000105a81540   rip = 0x0000000100f26fa0
    rsp = 0x00007fff5fbfc080   rbp = 0x00007fff5fbfc090
    Found by: call frame info
 2  XUL!mozilla::plugins::PluginModuleChild::DeallocNPObject(NPObject*) [PluginModuleChild.cpp : 2169 + 0x4]
    rbx = 0x0000000108f2f160   r12 = 0x0000000000000000
    r13 = 0x0000000000000002   r14 = 0x0000000108f3bc40
    r15 = 0x0000000105a81540   rip = 0x0000000100f229af
    rsp = 0x00007fff5fbfc0a0   rbp = 0x00007fff5fbfc0b0
    Found by: call frame info
 3  XUL!DeleteObject [PluginInstanceChild.cpp : 3948 + 0x7]
    rbx = 0x0000000100f1c200   r12 = 0x0000000000000000
    r13 = 0x0000000000000002   r14 = 0x0000000108f3bc40
    r15 = 0x0000000105a81540   rip = 0x0000000100f1bfea
    rsp = 0x00007fff5fbfc0c0   rbp = 0x00007fff5fbfc0c0
    Found by: call frame info
 4  XUL!PL_DHashTableEnumerate [pldhash.cpp : 717 + 0xf]
    rbx = 0x0000000100f1c200   r12 = 0x0000000000000000
    r13 = 0x0000000000000002   r14 = 0x0000000108f3bc40
    r15 = 0x0000000105a81540   rip = 0x00000001010ce0e6
    rsp = 0x00007fff5fbfc0d0   rbp = 0x00007fff5fbfc120
    Found by: call frame info
 5  XUL!mozilla::plugins::PluginInstanceChild::AnswerNPP_Destroy(short*) [nsTHashtable.h : 237 + 0x7]
    rbx = 0x0000000100f1c230   r12 = 0x0000000108fda400
    r13 = 0x00007fff5fbfc710   r14 = 0x0000000108fda400
    r15 = 0x0000000000000000   rip = 0x0000000100f1be99
    rsp = 0x00007fff5fbfc130   rbp = 0x00007fff5fbfc180
    Found by: call frame info
 6  XUL!mozilla::plugins::PPluginInstanceChild::OnCallReceived(IPC::Message const&, IPC::Message*&) [PPluginInstanceChild.cpp : 2355 + 0x12]
    rbx = 0x00007fff5fbfc710   r12 = 0x0000000000000007
    r13 = 0x00007fff5fbfc710   r14 = 0x0000000108fda400
}
Steven, bug 804606 was the last to touch PluginModuleChild.cpp, could it be responsible?
This is almost certainly from the prior test_bug579079.html. I believe that this is a Java plugin shutting down, and probably caused by bug 823559 which forces Java OOPP (just landed on inbound).

I'm not sure what to do about bugs in the Java plugin which show up in our test suites, though, other than disabling Java.
Summary: Intermittent test_bug582412-1.html | This test left crash dumps behind, but we weren't expecting it to! [@ mozilla::plugins::PluginModuleChild::UnregisterActorForNPObject(NPObject*)] → Intermittent test_bug582412-1.html, test_bug582412-1.html, 640272.html | This test left crash dumps behind, but we weren't expecting it to! [@ mozilla::plugins::PluginModuleChild::UnregisterActorForNPObject(NPObject*)]
Blocks: 823559
gfritzsche, this may be sorta-fixable. See bug 544074 for some prior art related to this (and even further back, bug 540868). What we're doing here is:

after NPP_Destroy is called on the instance, we collect all the NPObjects related to that instance: http://hg.mozilla.org/mozilla-central/annotate/feb09cb872f1/dom/plugins/ipc/PluginInstanceChild.cpp#l4121 and then invalidate them and delete them.

We collect them off into a separate hash because the plugins were mutating the main hash while we were destroying the objects. But what appears to be happening now is, while we're enumerating the deleting hash, the plugin is doing additional work (maybe from C++ destructors) and deleting objects itself. So we end up here: http://hg.mozilla.org/mozilla-central/annotate/feb09cb872f1/dom/plugins/ipc/PluginModuleChild.cpp#l753 and the object is no longer "alive" (not in the hash) so we're dereferencing near-null.
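For readers who don't know this code, here is a minimal sketch of the "collect into a separate container, then destroy" pattern described above. It is not the Gecko implementation: `ObjStub`, `InstanceSketch`, `PluginRelease` and `DestroyAll` are hypothetical names, and a `std::unordered_set` stands in for the real hash tables. The liveness check inside the loop is the kind of guard that is meant to skip objects the plugin has already released during teardown.

```cpp
#include <cassert>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for an NPObject; not the real NPAPI type.
struct ObjStub { int id; };

class InstanceSketch {
public:
    void Track(ObjStub* o) { mLive.insert(o); }

    // Models the plugin releasing an object on its own during teardown.
    // This mutates the live set while destruction is in progress.
    void PluginRelease(ObjStub* o) { mLive.erase(o); }

    bool IsLive(ObjStub* o) const { return mLive.count(o) != 0; }

    // Snapshot the live set first, then walk the snapshot, so that
    // mutations of mLive made by the callback cannot invalidate the
    // iteration. Returns how many objects this loop destroyed itself.
    template <typename Callback>
    int DestroyAll(Callback onDestroy) {
        std::vector<ObjStub*> doomed(mLive.begin(), mLive.end());
        int destroyed = 0;
        for (ObjStub* o : doomed) {
            if (!IsLive(o)) {
                continue;  // already released by an earlier callback
            }
            mLive.erase(o);
            ++destroyed;
            onDestroy(*this, o);  // may call PluginRelease() on other objects
        }
        return destroyed;
    }

private:
    std::unordered_set<ObjStub*> mLive;
};
```

If the callback releases every other object as soon as the first one is destroyed, the loop destroys exactly one object itself and skips the rest rather than touching them a second time.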

I'm not exactly sure why this is happening: the check at http://hg.mozilla.org/mozilla-central/annotate/feb09cb872f1/dom/plugins/ipc/PluginInstanceChild.cpp#l3935 is supposed to guard against this, I think. But in any case, we can probably wallpaper over the problem by adding a null-check (and probably an assertion so this doesn't get completely lost) in UnregisterActorForNPObject.
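A rough sketch of what such a null-check could look like, assuming a plain `std::unordered_map` in place of the real object map. All names here (`ObjectMapSketch`, `NPObjectStub`, `ActorStub`, `UnregisterActor`, `Forget`) are hypothetical and this is not the attached patch. The point is only that a failed lookup is reported and skipped instead of being dereferenced; where the real code would assert, the sketch logs to stderr so it stays runnable.

```cpp
#include <cassert>
#include <cstdio>
#include <unordered_map>

// Hypothetical stand-ins; these are not the real Gecko/NPAPI types.
struct NPObjectStub { int id; };
struct ActorStub { bool registered = true; };

struct ObjectData {
    ActorStub* actor = nullptr;
};

class ObjectMapSketch {
public:
    void Register(NPObjectStub* obj, ActorStub* actor) {
        assert(obj && actor);  // precondition of this sketch
        mObjectMap[obj].actor = actor;
    }

    // Models the plugin removing an object from the map behind our back.
    void Forget(NPObjectStub* obj) { mObjectMap.erase(obj); }

    // Returns true if the entry existed and its actor was cleared.
    // A missing entry is reported and skipped rather than dereferenced.
    bool UnregisterActor(NPObjectStub* obj) {
        auto it = mObjectMap.find(obj);
        if (it == mObjectMap.end()) {
            std::fprintf(stderr, "UnregisterActor: object not in map\n");
            return false;
        }
        it->second.actor = nullptr;
        return true;
    }

    bool HasActor(NPObjectStub* obj) const {
        auto it = mObjectMap.find(obj);
        return it != mObjectMap.end() && it->second.actor != nullptr;
    }

private:
    std::unordered_map<NPObjectStub*, ObjectData> mObjectMap;
};
```

Returning a status instead of crashing keeps the failure visible to the caller, in the same spirit as the assertion suggested above.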
Assignee: nobody → georg.fritzsche
(Might clean up TBPL's matching)
Summary: Intermittent test_bug582412-1.html, test_bug582412-1.html, 640272.html | This test left crash dumps behind, but we weren't expecting it to! [@ mozilla::plugins::PluginModuleChild::UnregisterActorForNPObject(NPObject*)] → Intermittent test_bug582412-1.html, test_bug582412-1.html, 640272.html | application crashed [@ mozilla::plugins::PluginModuleChild::UnregisterActorForNPObject(NPObject*)]
I won't be able to look into this before January. Does this need backout or is this ok until then?
(In reply to Georg Fritzsche [:gfritzsche] [away Dec 24 - Jan 1] from comment #9)
> I won't be able to look into this before January. Does this need backout or
> is this ok until then?

Sorry, I was away or I would have replied sooner.
Ideally this would have had a backout, but if it's only going to be a few more days we might as well leave it in at this point.

Thank you for looking at this :-0
s/0/\)/

:-)
(In reply to Georg Fritzsche [:gfritzsche] from comment #47)
> Created attachment 697076 [details] [diff] [review]
> Check if entry is in object map
> 
> https://tbpl.mozilla.org/?tree=Try&rev=b27a902708c9

This leads to another crash stack which i'll look into later today. E.g.:

https://tbpl.mozilla.org/php/getParsedLog.php?id=18404979&tree=Try&full=1#error1

 0  libmozalloc.dylib!mozalloc_abort(char const*) [mozalloc_abort.cpp : 23 + 0x0]
 1  XUL!Abort [nsDebugImpl.cpp : 423 + 0x4]
 2  XUL!NS_DebugBreak_P [nsDebugImpl.cpp : 380 + 0x7]
 3  XUL!mozilla::plugins::PPluginScriptableObject::Transition(mozilla::plugins::PPluginScriptableObject::State, mozilla::ipc::Trigger, mozilla::plugins::PPluginScriptableObject::State*) [PPluginScriptableObject.cpp : 34 + 0x1f]
 4  XUL!mozilla::plugins::PPluginScriptableObjectChild::SendUnprotect() [PPluginScriptableObjectChild.cpp : 530 + 0x16]
 5  XUL!mozilla::plugins::PluginScriptableObjectChild::ScriptableDeallocate(NPObject*) [PluginScriptableObjectChild.cpp : 580 + 0x7]
 6  XUL!mozilla::plugins::PluginModuleChild::DeallocNPObject(NPObject*) [PluginModuleChild.cpp : 2173 + 0x4]
 7  XUL!DeleteObject [PluginInstanceChild.cpp : 3948 + 0x7]
 8  XUL!PL_DHashTableEnumerate [pldhash.cpp : 717 + 0xf]
 9  XUL!mozilla::plugins::PluginInstanceChild::AnswerNPP_Destroy(short*) [nsTHashtable.h : 237 + 0x7]
 [...]
Status: NEW → ASSIGNED
(In reply to Georg Fritzsche [:gfritzsche] from comment #55)
> (In reply to Georg Fritzsche [:gfritzsche] from comment #47)
> > Created attachment 697076 [details] [diff] [review]
> > Check if entry is in object map
> > 
> > https://tbpl.mozilla.org/?tree=Try&rev=b27a902708c9
> 
> This leads to another crash stack which i'll look into later today. E.g.:

Did you get a chance to have another look at this? :-)
(In reply to Ed Morley [:edmorley UTC+0] from comment #117)
> (In reply to Georg Fritzsche [:gfritzsche] from comment #55)
> > (In reply to Georg Fritzsche [:gfritzsche] from comment #47)
> > > Created attachment 697076 [details] [diff] [review]
> > > Check if entry is in object map
> > > 
> > > https://tbpl.mozilla.org/?tree=Try&rev=b27a902708c9
> > 
> > This leads to another crash stack which i'll look into later today. E.g.:
> 
> Did you get a chance to have another look at this? :-)

Sorry Ed, i was mostly sick since then - i'm looking into it again today.
Quick update: I am trying to track down the cause of the other issue and hope to resolve it this week. I got slowed down by my Mac dying on me.
Summary: Intermittent test_bug582412-1.html, test_bug582412-1.html, 640272.html | application crashed [@ mozilla::plugins::PluginModuleChild::UnregisterActorForNPObject(NPObject*)] → Intermittent test_bug582412-1.html, test_bug582412-1.html, 640272.html | application crashed [@ mozilla::plugins::PluginModuleChild::UnregisterActorForNPObject(NPObject*)] [@ mozalloc_abort(char const*)]
Via try runs I traced it down to the following:
We end up doing PluginScriptableObjectChild::DropNPObject()/PPluginScriptableObjectChild::SendUnprotect() twice on the same instance.

The final (aborting) one is from AnswerNPP_Destroy(), with PluginScriptableObjectChild::mObject being 0:

 3  XUL!mozilla::plugins::PPluginScriptableObject::Transition(mozilla::plugins::PPluginScriptableObject::State, mozilla::ipc::Trigger, mozilla::plugins::PPluginScriptableObject::State*) [PPluginScriptableObject.cpp : 34 + 0x1f]
 4  XUL!mozilla::plugins::PPluginScriptableObjectChild::SendUnprotect() [PPluginScriptableObjectChild.cpp : 530 + 0x16]
 5  XUL!mozilla::plugins::PluginScriptableObjectChild::DropNPObject() [PluginScriptableObjectChild.cpp : 609 + 0x7]
 6  XUL!mozilla::plugins::PluginScriptableObjectChild::ScriptableDeallocate(NPObject*) [PluginScriptableObjectChild.cpp : 66 + 0x4]
 7  XUL!mozilla::plugins::PluginModuleChild::DeallocNPObject(NPObject*) [PluginModuleChild.cpp : 2176 + 0x4]
 8  XUL!DeleteObject [PluginInstanceChild.cpp : 3951 + 0x7]
 9  XUL!PL_DHashTableEnumerate [pldhash.cpp : 717 + 0xf]
10  XUL!mozilla::plugins::PluginInstanceChild::AnswerNPP_Destroy(short*) [nsTHashtable.h : 237 + 0x7]


... before that we end up there from NPP_SetWindow(), Java calls back into the browser and from the event queue we reenter with a second NPP_SetWindow():

 0 - 0   XUL                                 0x0000000100f3c72e mozilla::plugins::PluginScriptableObjectChild::DropNPObject()   78
 1 - 1   XUL                                 0x0000000100f3c6b4 mozilla::plugins::PluginScriptableObjectChild::ScriptableDeallocate(NPObject*)   84
 2 - 2   XUL                                 0x0000000100f37fa0 mozilla::plugins::PluginModuleChild::DeallocNPObject(NPObject*)   48
 3 - 3   XUL                                 0x0000000100f369f5 mozilla::plugins::PluginModuleChild::NPN_ReleaseObject(NPObject*)   181
 4 - 4   XUL                                 0x0000000100f36e9e mozilla::plugins::child::_releasevariantvalue(_NPVariant*)   62
 5 - 5   JavaAppletPlugin                    0x000000010579cbf6 getPageURL   264
 6 - 6   JavaAppletPlugin                    0x000000010579c960 NPP_SetWindow   305
 7 - 7   XUL                                 0x0000000100f3020a mozilla::plugins::PluginInstanceChild::UpdateWindowAttributes(bool)   186
 8 - 8   XUL                                 0x0000000100f300a9 mozilla::plugins::PluginInstanceChild::DoAsyncSetWindow(gfxASurface::gfxSurfaceType const&, mozilla::plugins::NPRemoteWindow const&, bool)   393
 9 - 9   XUL                                 0x0000000100f2fedb mozilla::plugins::PluginInstanceChild::RecvAsyncSetWindow(gfxASurface::gfxSurfaceType const&, mozilla::plugins::NPRemoteWindow const&)   347
10 - 10  XUL                                 0x0000000100fcc3f9 mozilla::plugins::PPluginInstanceChild::OnMessageReceived(IPC::Message const&)   1081
11 - 11  XUL                                 0x0000000100fc5f67 mozilla::plugins::PPluginModuleChild::OnMessageReceived(IPC::Message const&)   247
12 - 12  XUL                                 0x0000000100f474b8 mozilla::ipc::AsyncChannel::OnDispatchMessage(IPC::Message const&)   88
13 - 13  XUL                                 0x0000000100f502ba mozilla::ipc::RPCChannel::Call(IPC::Message*, IPC::Message*)   954
14 - 14  XUL                                 0x0000000100fd19a7 mozilla::plugins::PPluginScriptableObjectChild::CallGetParentProperty(mozilla::plugins::PPluginIdentifierChild*, mozilla::plugins::Variant*, bool*)   279
15 - 15  XUL                                 0x0000000100f3cd0c mozilla::plugins::PluginScriptableObjectChild::ScriptableGetProperty(NPObject*, void*, _NPVariant*)   124
16 - 16  XUL                                 0x0000000100f36c52 mozilla::plugins::child::_getproperty(_NPP*, NPObject*, void*, _NPVariant*)   114
17 - 17  JavaAppletPlugin                    0x000000010579cb78 getPageURL   138
18 - 18  JavaAppletPlugin                    0x000000010579c960 NPP_SetWindow   305
19 - 19  XUL                                 0x0000000100f3020a mozilla::plugins::PluginInstanceChild::UpdateWindowAttributes(bool)   186
20 - 20  XUL                                 0x0000000100f300a9 mozilla::plugins::PluginInstanceChild::DoAsyncSetWindow(gfxASurface::gfxSurfaceType const&, mozilla::plugins::NPRemoteWindow const&, bool)   393
21 - 21  XUL                                 0x0000000100f2fedb mozilla::plugins::PluginInstanceChild::RecvAsyncSetWindow(gfxASurface::gfxSurfaceType const&, mozilla::plugins::NPRemoteWindow const&)   347
22 - 22  XUL                                 0x0000000100fcc3f9 mozilla::plugins::PPluginInstanceChild::OnMessageReceived(IPC::Message const&)   1081
23 - 23  XUL                                 0x0000000100fc5f67 mozilla::plugins::PPluginModuleChild::OnMessageReceived(IPC::Message const&)   247
24 - 24  XUL                                 0x0000000100f474b8 mozilla::ipc::AsyncChannel::OnDispatchMessage(IPC::Message const&)   88
25 - 25  XUL                                 0x0000000100f4f53b mozilla::ipc::RPCChannel::OnMaybeDequeueOne()   555
26 - 26  XUL                                 0x0000000100f51bb6 RunnableMethod::Run()   38
27 - 27  XUL                                 0x0000000101212e64 MessageLoop::DeferOrRunPendingTask(MessageLoop::PendingTask const&)   132
28 - 28  XUL                                 0x000000010121315a MessageLoop::DoWork() 

My best guess so far is that the reentry leads to us later still keeping track of that PluginScriptableObjectChild and trying to clean it up.
The latest try run makes me sure that we are accessing a torn-down PluginScriptableObjectChild (scroll up a little to see logging and backtraces):
https://tbpl.mozilla.org/php/getParsedLog.php?id=19349438&tree=Try&full=1#error0

All problematic runs end up with problems in DropNPObject() or below it, with mObject=0, mObject->_class wrong or the Transition() from SendUnprotect() failing.
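To illustrate the failure mode only (this is not a proposed patch, and the real protocol state machine is generated from IPDL), here is a toy model in which an object may be unprotected exactly once. `State`, `TransitionOnUnprotect` and `ScriptableObjectSketch` are hypothetical names. The unguarded path reaches an error state on the second drop, which corresponds to the failing Transition() seen under SendUnprotect(); the guarded path ignores the repeat.

```cpp
#include <cassert>

// Hypothetical two-state protocol: an object may be unprotected once.
enum class State { Protected, Unprotected, Error };

// Returns the next state. A second Unprotect is an invalid transition.
inline State TransitionOnUnprotect(State s) {
    return s == State::Protected ? State::Unprotected : State::Error;
}

class ScriptableObjectSketch {
public:
    // Unguarded: every call attempts the Unprotect transition.
    void DropUnguarded() { mState = TransitionOnUnprotect(mState); }

    // Guarded: only the first call attempts the transition.
    // Returns false if the object had already been dropped.
    bool DropGuarded() {
        if (mDropped) {
            return false;
        }
        mDropped = true;
        mState = TransitionOnUnprotect(mState);
        return true;
    }

    State state() const { return mState; }

private:
    State mState = State::Protected;
    bool mDropped = false;
};
```

A guard like this would only hide the double drop; it says nothing about why the same actor is reached twice in the first place.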

However we do remove the entry for those PluginScriptableObjectChilds properly here:
http://dxr.mozilla.org/mozilla-central/dom/plugins/ipc/PluginModuleChild.cpp.html?string=PluginModuleChild.cpp#l2177
... and add them here:
http://dxr.mozilla.org/mozilla-central/dom/plugins/ipc/PluginModuleChild.cpp.html?string=PluginModuleChild.cpp#l2109
... and neither place looks suspicious regarding reentry problems, so I'm just not sure why we end up with two NPObjects getting mapped to the same PluginScriptableObjectChild.

This is probably the same property being accessed from the Java plugin, so we could probably come up with a very specialized, crude workaround that keeps this specific instance alive for the duration of the plugin instance's lifetime, but I'd really like to avoid that.

bsmedberg, do you have any ideas at this point? See comment 204 for more info, although the mObject=0 constraint mentioned there doesn't hold.
Flags: needinfo?(benjamin)
(In reply to Ed Morley [:edmorley UTC+0] from comment #39)
> (In reply to Georg Fritzsche [:gfritzsche] [away Dec 24 - Jan 1] from
> comment #9)
> > I won't be able to look into this before January. Does this need backout or
> > is this ok until then?
> 
> Sorry, I was away or I would have replied sooner.
> Ideally this would have had a backout, but if it's only going to be a few
> more days we might as well leave in at this point
> 
> Thank you for looking at this :-0

We're now 6 weeks further on with this.

Please can we back out bug 823559 (or whatever else is deemed most appropriate) until this is fixed.
Flags: needinfo?(georg.fritzsche)
(In reply to Ed Morley [:edmorley UTC+0] from comment #266)
> Please can we backout bug 823559 (or whatever else is deemed most
> appropriate until this is fixed please).

Ok, checking the options today.
Flags: needinfo?(georg.fritzsche)
Backed out on Aurora for further investigation (so beware that the OSX perma-leaks will be back), bug 823559, comment 9.
I backed the backout out. Sorry, but I'd rather be starring this intermittent crash rather than 4 permaleaks on each push to Aurora.
The other option is to hide OSX 10.7 debug crashtests and 10.8 debug M1, M3, and crashtests if we want to back it out.
(In reply to Ryan VanderMeulen [:RyanVM] from comment #294)
> The other option is to hide OSX 10.7 debug crashtests and 10.8 debug M1, M3,
> and crashtests if we want to back it out.

Or you back it out in time for the merge to beta?
Beta is where it probably shouldn't land before we have this and related issues figured out.
Flags: needinfo?(ryanvm)
My preference would be to back this out from beta after the next merge and we can hide the leaking suites there. My reasoning is that the number of pushes (and therefore possibly missed regressions) will be lower there.
Flags: needinfo?(ryanvm)
Alright, that works for me.
Priority: -- → P2
Comment on attachment 697076 [details] [diff] [review]
Check if entry is in object map

I only now realized that the bug I still saw with this patch applied is already occurring now (just mostly hidden among all those other reports).
I'd like to land this band-aid as discussed for now as it should greatly reduce the orange-count while still producing clearly starrable entries.
Attachment #697076 - Flags: review?(benjamin)
Comment on attachment 697076 [details] [diff] [review]
Check if entry is in object map

gfritzsche says that we aren't operating on dead memory here, just a state machine error, so I can mark r+ on this and we'll see what this does to volume. There appear to be two bugs here.
Attachment #697076 - Flags: review?(benjamin) → review+
Flags: needinfo?(benjamin)
https://hg.mozilla.org/integration/mozilla-inbound/rev/c1700967353b

Let's leave the bug open until the whole situation is sorted.
Whiteboard: [leave open]
All the remaining trunk occurrences seem to be bug 845735; marking this as fixed to reduce the number of mis-stars (of which I am just as guilty :-s).

Georg, would you mind requesting approval to uplift this patch to aurora?
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Whiteboard: [leave open]
Comment on attachment 697076 [details] [diff] [review]
Check if entry is in object map

[Approval Request Comment]
Bug caused by (feature/regressing bug #): Bug 823559 (Make Java OOP again on non-Windows platforms).
User impact if declined: Crash from null-pointer & higher orange count on OS X >=10.7
Testing completed (on m-c, etc.): Landing fine on m-c, see comment 609.
Risk to taking this patch (and alternatives if risky): Unlikely, but low risk of shuffling issues around.
String or UUID changes made by this patch: None.
Attachment #697076 - Flags: approval-mozilla-aurora?
Attachment #697076 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Setting the status flags in the spirit of comment #609
Blocks: 849675
Depends on: 878195
Target Milestone: --- → mozilla21
Product: Core → Core Graveyard