Closed Bug 864112 Opened 11 years ago Closed 6 years ago

crash in mozilla::ipc::RPCChannel::DebugAbort with abort message: "mismatched CxxStackFrame ctor/dtors"

Categories

(Core Graveyard :: Plug-ins, defect, P2)

21 Branch
defect

Tracking

(firefox20 unaffected, firefox21- affected, firefox22- affected, firefox23- affected, firefox24 affected, firefox25 affected)

RESOLVED WONTFIX
Tracking Status
firefox20 --- unaffected
firefox21 - affected
firefox22 - affected
firefox23 - affected
firefox24 --- affected
firefox25 --- affected

People

(Reporter: scoobidiver, Unassigned)

References

Details

(4 keywords, Whiteboard: [metro-crash])

Crash Data

Attachments

(1 obsolete file)

It has replaced bug 637596 since 21.0b3 but ranks higher: it is currently #2 in 21.0b3, while bug 637596 was #14 in 21.0b2.
The Beta regression range is:
http://hg.mozilla.org/releases/mozilla-beta/pushloghtml?fromchange=c4dfe07f855c&tochange=e845a10035f2
It might be a regression from bug 858800.

Signature 	mozalloc_abort(char const*) | NS_DebugBreak | _ZZL11toHexStringPKhjR19nsACString_internalE6digits
UUID	a6e63918-5997-44ac-b533-777cb2130421
Date Processed	2013-04-21 06:10:18
Uptime	83
Last Crash	more than 3 months before submission
Install Age	16.8 hours since version was first installed.
Install Time	2013-04-20 13:18:18
Product	Firefox
Version	23.0a1
Build ID	20130419030957
Release Channel	nightly
OS	Mac OS X
OS Version	10.6.8 10K549
Build Architecture	amd64
Build Architecture Info	family 6 model 23 stepping 10
Crash Reason	EXC_BAD_ACCESS / KERN_INVALID_ADDRESS
Crash Address	0x0
App Notes 	
AdapterVendorID: 0x10de, AdapterDeviceID: 0x 8a0
GL Layers? GL Context? GL Context+ GL Layers+
xpcom_runtime_abort(###!!! ABORT: mismatched CxxStackFrame ctor/dtors: file ../../../../ipc/glue/RPCChannel.cpp, line 656)
Processor Notes 	sp-processor03.phx1.mozilla.com_17395:2012; exploitability tool failed: 127
EMCheckCompatibility	True
Adapter Vendor ID	0x10de
Adapter Device ID	0x 8a0

Frame 	Module 	Signature 	Source
0 	libmozalloc.dylib 	mozalloc_abort 	mozalloc_abort.cpp:30
1 	XUL 	NS_DebugBreak 	nsDebugImpl.cpp:387
2 	XUL 	_ZZL11toHexStringPKhjR19nsACString_internalE6digits 	

More reports at:
https://crash-stats.mozilla.com/report/list?signature=mozalloc_abort%28char+const*%29+|+Abort+|+NS_DebugBreak_P
https://crash-stats.mozilla.com/report/list?signature=mozalloc_abort%28char+const*%29+|+Abort
https://crash-stats.mozilla.com/report/list?signature=mozalloc_abort%28char+const*%29+|+NS_DebugBreak+|+_ZZL11toHexStringPKhjR19nsACString_internalE6digits
https://crash-stats.mozilla.com/report/list?signature=mozalloc_abort%28char+const*%29+|+Abort
... doesn't have those aborts.

All the stacks I looked at in the other reports only have the three top-most frames :/
Crash Signature: [@ mozalloc_abort(char const*) | Abort | NS_DebugBreak_P] [@ mozalloc_abort(char const*) | Abort] [@ mozalloc_abort(char const*) | NS_DebugBreak ] [@ mozalloc_abort(char const*) | NS_DebugBreak | _ZZL11toHexStringPKhjR19nsACString_internalE6digits ] → [@ mozalloc_abort(char const*) | Abort | NS_DebugBreak_P] [@ mozalloc_abort(char const*) | NS_DebugBreak ] [@ mozalloc_abort(char const*) | NS_DebugBreak | _ZZL11toHexStringPKhjR19nsACString_internalE6digits ]
Bug 858800 affected Windows-specific code. Given that these crashes are all on OS X, it is not a regression from that bug.
Oh, right. So there is nothing standing out so far here.
Checking the date though, Adobe apparently released Flash on April 9th [1] (the start of the pushlog), which could explain the spike.

The toHexString part is definitely noise, so we don't have any frames below NS_DebugBreak in those crashes.
All we have is that we hit the |mCxxStackFrames.empty()| assert here:
https://hg.mozilla.org/releases/mozilla-beta/annotate/cc63217713dc/ipc/glue/RPCChannel.cpp#l80
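
As a rough illustration of what an assertion like this guards against, here is a minimal sketch. It is not the code in RPCChannel.cpp: the names SketchChannel, StackFrame, SafeToDestroy and Depth are made up for this example, and it assumes the check runs when the channel is torn down. An RAII guard records every call into the channel that is still on the C++ stack, so a non-empty record at teardown means the channel is going away underneath a live call.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an RPC channel; not the real RPCChannel API.
class SketchChannel {
public:
    // RAII guard: records a frame on construction, removes it on destruction.
    class StackFrame {
    public:
        explicit StackFrame(SketchChannel& channel) : mChannel(channel) {
            mChannel.mFrames.push_back(this);
        }
        ~StackFrame() {
            // Frames are expected to unwind in LIFO order.
            assert(!mChannel.mFrames.empty() && mChannel.mFrames.back() == this);
            mChannel.mFrames.pop_back();
        }
        StackFrame(const StackFrame&) = delete;
        StackFrame& operator=(const StackFrame&) = delete;

    private:
        SketchChannel& mChannel;
    };

    // True only when no call into this channel is still on the C++ stack.
    bool SafeToDestroy() const { return mFrames.empty(); }

    // Number of guarded calls currently on the stack.
    std::size_t Depth() const { return mFrames.size(); }

private:
    std::vector<const StackFrame*> mFrames;
};
```

In this sketch, tearing the channel down while SafeToDestroy() is false corresponds to the situation the abort message describes.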
Priority: -- → P2
Attached file Testcase (obsolete) —
Well, here's a test case, but I'd believe that there are multiple ways to trigger this, so it may or may not be the root cause.

If it turns out that this is the cause, it's a dupe of bug 418615.
(In reply to John Schoenick [:johns] from comment #4)
> If it turns out that this is the cause, it's a dupe of bug 418615.

Well, not a dupe of, since something caused it to start being crashy, but caused by.
The fact that FF20 is unaffected makes me think this was not caused solely by an external change.

Nonetheless, needinfo on KaiRo for URLs/correlations - we should try to determine whether this was caused by a forward fix, external change (like the new Flash version), or rebuilding beta (2->3).

Assigning to Georg, johns suggested trying to back out navigator.plugins.refresh(). Does that sound like the best path forward?

Finally, marking as reproducible given the test case in comment 4.
Assignee: nobody → georg.fritzsche
Flags: needinfo?(kairo)
Keywords: reproducible
(In reply to Alex Keybl [:akeybl] from comment #6)
> Assigning to Georg, johns suggested trying to back out
> navigator.plugins.refresh(). Does that sound like the best path forward?

To clarify, bsmedberg had mentioned that he wants to tear out all the plugin stopping and document touching navigator.plugins.refresh(true) does, essentially making the param a no-op.

If this is our intention anyway, uplifting such a change to beta might be a possibility, given the nearly non-existent usefulness of this to the end-user and how broken it is anyway (see bug 418615 comment 6)
Depends on: 804936
(In reply to John Schoenick [:johns] from comment #7)
> (In reply to Alex Keybl [:akeybl] from comment #6)
> > Assigning to Georg, johns suggested trying to back out
> > navigator.plugins.refresh(). Does that sound like the best path forward?
> 
> To clarify, bsmedberg had mentioned that he wants to tear out all the
> plugin stopping and document touching navigator.plugins.refresh(true) does,
> essentially making the param a no-op.

Right, that would be a relatively quick and low-risk option.
However, we can only be somewhat sure of having fixed it when it's on beta.

Benjamin, can you confirm that we should no-op navigator.plugins.refresh(true) now?
Do you think that is something we also could speculatively uplift?
Flags: needinfo?(benjamin)
Depends on: 418615
This crash signature is shared by several bugs. After a few days, it no longer ranks that high: it is only the #7 browser crasher in 21.0b3 on Mac OS X.
Keywords: topcrash
Summary: crash in mozilla::ipc::RPCChannel::DebugAbort witha abort message: "mismatched CxxStackFrame ctor/dtors" → crash in mozilla::ipc::RPCChannel::DebugAbort with abort message: "mismatched CxxStackFrame ctor/dtors"
URLs are a bunch of ones and twos:

for mozalloc_abort(char const*) | Abort | NS_DebugBreak_P :
1 	https://www.mymicros.net/servlet/PortalLogIn/
1 	http://www.screenr.com/record
1 	about:home
1 	http://www.facebook.com/
1 	http://www.runescape.com/game.ws?j=1
1 	http://www.nongpink.com/details.php?id=21410
1 	http://www.top10bollywood.com/2012/12/bollywood-movie-calendar-2013.html#.UXJv0uuaSCZ
1 	https://obank.kbstar.com/quics?page=C027221&weblog=introLogin
1 	https://www.google.fr/search?q=dropbox+downloader&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:fr:official&client=firefox-beta
1 	https://netcom.no/appframes/nonsec/netrefill/receiptnotloggedin.html?purchaseId=<...>
1 	https://gswlcocr01.hogarthww.prv/SystemExplorer.asp
1 	http://www.wildberries.ru/catalog/20/women.aspx?sort=newly&utm_source=ad_actionpay&utm_campaign=ad_actionpay&utm_medium=21150&actionpay=ce62ef8d-aefa-d61f-fef3-013e329683cf.21150
1 	http://mac.eltima.com/FolxPages/back.html
1 	http://www.vuze.com/content/channel.php?id=104
1 	https://www.microsoft.com/Licensing/servicecenter/Downloads/DownloadsAndKeys.aspx

for mozalloc_abort(char const*) | Abort :
2 	http://www.runescape.com/game.ws?j=1
1 	http://www.redfin.com/homes-for-sale#!market=dc&max_price=450000&num_beds=3&region_id=25346&region_type=6&v=8
1 	https://www.facebook.com/dialog/oauth?client_id=108500796108&response_type=token%2Csigned_request%2Ccode&display=none&domain=society6.com&origin=1&redirect_uri=http%3A%2F%2Fstatic.ak.facebook.com%2Fconnect%2Fxd_arbiter.php%3Fversion%3D22%23cb%3Df2ae07d7c7
1 	https://of-uat-ofapp.corp.google.com:8443/dev60cgi/f60cgi?lookAndFeel=ORACLE&colorScheme=RED&digitSubstitution=CONTEXT&lang=US&env=NLS_LANG=%27AMERICAN_AMERICA.UTF8%27+FORMS60_USER_DATE_FORMAT=%27DD-MON-RRRR%27+FORMS60_USER_DATETIME_FORMAT=%27DD-MON-RRRR%
1 	https://muisca.dian.gov.co/WebArquitectura/DefCrearCertifDigital.faces
1 	https://wwws.ameritrade.com/cgi-bin/apps/u/StreamerContainer?launchids=&kctr=f
1 	http://www.incruit.com/
1 	https://www.santandernet.com.br/ibpf/transacoes/cartoes/pagamentoFaturaImprimir.asp
1 	https://obank.kbstar.com/quics?page=C018393
1 	http://www.youtube.com/watch?v=8D0B0-OjDXA
1 	https://verkkopankki.danskebank.fi/html/index.html?site=SBNBFI&secsystem=E2
1 	http://kriminal.ictv.ua/ru
1 	https://www.yousendit.com/send
1 	http://www.google.fr/
1 	https://www.huntington.com/
1 	http://www.youtube.com/watch?v=bUTei9nywY0

While filtering content, I ran into problems with the Java applet plugin at the danskebank.fi site above. I denied all requests, and eventually the plugin crashed. My crash report was throttled.

for mozalloc_abort(char const*) | NS_DebugBreak | _ZZL11toHexStringPKhjR19nsACString_internalE6digits :

http://mac.eltima.com/FolxPages/back.html
Keywords: needURLs
Bug 418615 landed Tuesday. I do think we can uplift it to Aurora, but not Beta.
Flags: needinfo?(benjamin)
Given the URL distribution, it's not too likely that we'll find STR, but we should at least try. Given the Flash release date, maybe this spiked due to the latest Flash or its update?
Keywords: qawanted
None of the links in comment 10 nor the testcase in comment 4 crashed FF 21b3, Flash 11.7.700.169, Mac OS X 10.8.3.
Keywords: qawanted
(In reply to Paul Silaghi [QA] from comment #13)
> None of the links in comment 10 nor the testcase in comment 4 crashed FF
> 21b3, Flash 11.7.700.169, Mac OS X 10.8.3.

Note that this requires the nptest plugin. I was able to crash 21b5 with:
http://crash-stats.mozilla.com/report/index/bp-254f4e9c-52be-4823-a96a-2b6e92130426
There are no crashes in 21.0b4 so either it was a red herring or it was fixed.
(In reply to Scoobidiver from comment #15)
> There are no crashes in 21.0b4 so either it was a red herring or it was
> fixed.

This is likely to be timing sensitive, so small, seemingly unrelated changes might have made this disappear.

I'm only seeing a recent crash on Aurora for this, bp-6a21e5f2-2029-4906-b0e9-ecfe92130426 (without bug 418615 having landed there yet).
(In reply to John Schoenick [:johns] from comment #14)
> (In reply to Paul Silaghi [QA] from comment #13)
> > None of the links in comment 10 nor the testcase in comment 4 crashed FF
> > 21b3, Flash 11.7.700.169, Mac OS X 10.8.3.
> 
> Note that this requires the nptest plugin. I was able to crash 21b5 with:
> > http://crash-stats.mozilla.com/report/index/bp-254f4e9c-52be-4823-a96a-2b6e92130426

Thanks for the notice. I was also able to crash FF 21b5, but only once, using the testcase: https://crash-stats.mozilla.com/report/index/bp-9acec8e2-c657-47e1-91ec-c7fc32130429
We'll retrack if/when this reappears, although it may be worthwhile to dig into comment 17 more in case it does.
Flags: needinfo?(kairo)
Crash Signature: [@ mozalloc_abort(char const*) | Abort | NS_DebugBreak_P] [@ mozalloc_abort(char const*) | NS_DebugBreak ] [@ mozalloc_abort(char const*) | NS_DebugBreak | _ZZL11toHexStringPKhjR19nsACString_internalE6digits ] → [@ mozalloc_abort(char const*) | Abort | NS_DebugBreak_P] [@ mozalloc_abort(char const*) | NS_DebugBreak ] [@ mozalloc_abort(char const*) | NS_DebugBreak | _ZZL11toHexStringPKhjR19nsACString_internalE6digits ] [@ mozalloc_abort(char const*) | Abort | N…
Crash Signature: NS_DebugBreak | mozilla::ipc::RPCChannel::DebugAbort(char const*, int, char const*, char const*, char const*, bool) const ] → NS_DebugBreak | mozilla::ipc::RPCChannel::DebugAbort(char const*, int, char const*, char const*, char const*, bool) const ] [@ mozalloc_abort(char const*) | NS_DebugBreak | mozilla::ipc::RPCChannel::DebugAbort(char const*, int, char const*, char const*,…
Crash Signature: , char const*, bool) const ] → , char const*, bool) const ] [@ mozalloc_abort(char const* const) | NS_DebugBreak | mozilla::ipc::RPCChannel::DebugAbort(char const*, int, char const*, char const*, char const*, bool) ]
Crash Signature: [@ mozalloc_abort(char const*) | Abort | NS_DebugBreak_P] [@ mozalloc_abort(char const*) | NS_DebugBreak ] [@ mozalloc_abort(char const*) | NS_DebugBreak | _ZZL11toHexStringPKhjR19nsACString_internalE6digits ] [@ mozalloc_abort(char const*) | Abort | N… → [@ mozalloc_abort(char const*) | Abort | NS_DebugBreak_P] [@ mozalloc_abort(char const*) | NS_DebugBreak ] [@ mozalloc_abort(char const* const) | NS_DebugBreak ] [@ mozalloc_abort(char const*) | NS_DebugBreak | _ZZL11toHexStringPKhjR19nsACString_intern…
I just got this crash and was able to put together the following steps, which sometimes crash with this signature. Even without a crash, large amounts of memory usage can be seen, so hopefully this helps.

These steps make use of the OOP thumbnail process and about:memory on Windows 8 desktop.

1. In a new profile (even in your normal one, I get better crash percentage with my normal one) open about:memory.
2. Measure memory. It should be quite quick.
3. Open a new blank tab which has the dashboard thing.
4. Make certain that a thumbnail is being generated. Can be done by dragging a bookmark into dashboard box.
5. Quickly go back to about:memory and measure.
6. Repeat measure every time it finishes. No point continuing if the thumbnail process ends though.

You will notice that huge amounts of memory are getting used in the main browser (I have seen 700MB+ in spike) while it's collecting the reports, and the browser will be hung, sometimes with a script timeout a couple of times per collection.

Here is a crash report I made as I was typing this in a brand new profile using steps above.
https://crash-stats.mozilla.com/report/index/43be4c82-b3ca-4c9f-ac98-b6aa52130820
The crash stack from comment 20 shows us crashing in a nested event loop under ShowSlowScriptDialog, FWIW.
OS: Mac OS X → All
Hardware: x86_64 → All
The memory increases I see are <500MB. Most of that is attributed to heap-unclassified & about:memory. I'm not sure whether that is an actual issue.

I can't trigger the abort so far, but given that this is a timing problem, that is not really surprising.
Sadly we don't see stack frames above ShowSlowScriptDialog, but if we hit that abort it means that a channel is spinning the event loop further up the stack and we're actually in an IPC call.
The plugin code deals with similar scenarios by delaying destruction until it's not nested anymore:
http://hg.mozilla.org/mozilla-central/annotate/1d6bf2bd4003/dom/plugins/ipc/PluginModuleParent.h#l112
http://hg.mozilla.org/mozilla-central/annotate/1d6bf2bd4003/dom/plugins/ipc/PluginModuleParent.cpp#l286
... it looks like ContentParent just needs the same protection here.
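
The idea of delaying destruction until the stack is no longer nested can be sketched roughly as follows. This is a minimal, hypothetical illustration rather than the PluginModuleParent code linked above: SketchActor, NestedCall, RequestDestroy and Destroyed are names invented for this example. A depth counter tracks how many calls into the actor are on the stack. A destroy request that arrives while nested is only recorded, and the actual teardown runs when the outermost call unwinds.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical actor; illustrates deferring teardown while calls are nested.
class SketchActor {
public:
    explicit SketchActor(std::function<void()> onDestroy)
        : mOnDestroy(std::move(onDestroy)) {}

    // RAII marker for "a call into this actor is on the stack".
    class NestedCall {
    public:
        explicit NestedCall(SketchActor& actor) : mActor(actor) {
            ++mActor.mDepth;
        }
        ~NestedCall() {
            // Run a deferred teardown once the outermost call has unwound.
            if (--mActor.mDepth == 0 && mActor.mDestroyPending) {
                mActor.DoDestroy();
            }
        }
        NestedCall(const NestedCall&) = delete;
        NestedCall& operator=(const NestedCall&) = delete;

    private:
        SketchActor& mActor;
    };

    // Tear down now if idle; otherwise remember the request for later.
    void RequestDestroy() {
        if (mDepth > 0) {
            mDestroyPending = true;
            return;
        }
        DoDestroy();
    }

    bool Destroyed() const { return mDestroyed; }

private:
    void DoDestroy() {
        if (mDestroyed) {
            return;  // Teardown runs at most once.
        }
        mDestroyed = true;
        mDestroyPending = false;
        if (mOnDestroy) {
            mOnDestroy();
        }
    }

    std::function<void()> mOnDestroy;
    int mDepth = 0;
    bool mDestroyPending = false;
    bool mDestroyed = false;
};
```

With this shape, a nested event loop that asks for teardown in the middle of an IPC call would not pull the object out from under the frames above it. Whether ContentParent can adopt exactly this pattern is an open question in this bug.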

bsmedberg, does that sound reasonable to you? Do you know who has an overview on ContentParent et al? hg log suggests jlebar?
Flags: needinfo?(benjamin)
> 1. In a new profile (even in your normal one, I get better crash percentage
> with my normal one) open about:memory.
> 2. Measure memory. It should be quite quick.
> 3. Open a new blank tab which has the dashboard thing.
> 4. Make certain that a thumbnail is being generated. Can be done by dragging
> a bookmark into dashboard box.
> 5. Quickly go back to about:memory and measure.
> 6. Repeat measure every time it finishes. No point continuing if the
> thumbnail process ends though.
> 
> You will notice that huge amounts of memory are getting used in the main
> browser (I have seen 700MB+ in spike) while its collecting the reports and
> the browser will be hung, sometimes with script timeout a couple of times
> per collection.

A lot of memory report information gets passed between the processes, and in a very space-inefficient manner.  I think that's the cause of the spike.  I have some plans for improving this in the medium-term.  Until then, I'm not too worried, because this is an obscure scenario and the memory spike eventually sorts itself out (I assume).
Talk to billm and bent about ContentParent. Although this bug predates any use of ContentParent and is related to plugins. But extra event-loop-spinning certainly makes sense in some cases here.
Flags: needinfo?(benjamin)
Depends on: 907804
I was able to crash Firefox 24 beta 4 using these steps:

1. Open http://silverlightgames.org/game/hamquest/play/
2. Play for about 2 minutes.
3. In the same tab, open http://www.javagameplay.com/offroadrally/rally.html

It may take a few tries, but I was able to crash FF. I was unable to crash the latest Nightly, though.

Signature:
[@ mozalloc_abort(char const* const) | NS_DebugBreak | mozilla::ipc::RPCChannel::DebugAbort(char const*, int, char const*, char const*, char const*, bool) ]

https://crash-stats.mozilla.com/report/index/42f7d3e3-4b38-4aa5-9d87-c7f752130822
https://crash-stats.mozilla.com/report/index/662d8470-cc58-4241-9fdb-e0e7a2130822
https://crash-stats.mozilla.com/report/index/8d6a8aed-d760-49d9-b24f-f817c2130822
https://crash-stats.mozilla.com/report/index/64e06475-d1ee-433a-87f9-141992130822
The testcase here still causes this after bug 418615

@gfritzsche - do you think this is bug 907804 or should I file a new bug for it?
Flags: needinfo?(georg.fritzsche)
That's interesting. I think a new bug would be best: bug 907804 should only occur since we have had OOP thumbnailing, and this issue predates it.
Flags: needinfo?(georg.fritzsche)
Depends on: 938796
Comment on attachment 740427 [details]
Testcase

Bug 938796 was filed for this testcase
Attachment #740427 - Attachment is obsolete: true
Assignee: georg.fritzsche → nobody
Flags: firefox-backlog?
Flags: firefox-backlog? → firefox-backlog+
Crash Signature: , bool) const ] [@ mozalloc_abort(char const* const) | NS_DebugBreak | mozilla::ipc::RPCChannel::DebugAbort(char const*, int, char const*, char const*, char const*, bool) ] [@ mozalloc_abort(char const* const) | NS_DebugBreak | getptd_noexit ] → , bool) const ] [@ mozalloc_abort(char const* const) | NS_DebugBreak | mozilla::ipc::RPCChannel::DebugAbort(char const*, int, char const*, char const*, char const*, bool) ] [@ mozalloc_abort(char const* const) | NS_DebugBreak | getptd_noexit ] [@ mozal…
Closing as we never shipped Metro!
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → WONTFIX
Product: Core → Core Graveyard