Closed Bug 639162 Opened 11 years ago Closed 11 years ago

Startup crash [@ XPCCallContext::NewStringWrapper]

Categories

(Core :: XPConnect, defect)

ARM
Android
defect
Not set
critical

Tracking

()

RESOLVED DUPLICATE of bug 634594
Tracking Status
fennec 2.0+ ---

People

(Reporter: dougt, Unassigned)

References

Details

(Keywords: crash, regression)

Crash Data

Attachments

(1 obsolete file)

This bug was filed from the Socorro interface and is 
report bp-fe7c9029-f954-408f-b54a-ae1ad2110305 .
============================================================= 



Startup crash in todays nightly
tracking-fennec: --- → 2.0+
Crashes first appeared in Fennec 4.0b6pre/20110302.

Signature	XPCCallContext::NewStringWrapper
UUID	fe7c9029-f954-408f-b54a-ae1ad2110305
Time 	2011-03-05 07:26:14.151129
Uptime	0
Last Crash	79 seconds before submission
Install Age	811 seconds (13.5 minutes) since version was first installed.
Product	Fennec
Version	4.0b6pre
Build ID	20110305042456
Branch	2.0
OS	Linux
OS Version	0.0.0 Linux 2.6.32.9 #3 Tue Oct 12 21:33:42 KST 2010 armv7l
CPU	arm
CPU Info	
Crash Reason	SIGBUS
Crash Address	0x0
App Notes 	samsung SCH-I800
verizon/SCH-I800/SCH-I800/SCH-I800:2.2/FROYO/DJ11:user/release-keys
Processor Notes 	WARNING: Json file missing Add-ons

Frame 	Module 	Signature [Expand] 	Source
0 	libxul.so 	XPCCallContext::NewStringWrapper 	nsTSubstring.h:593
1 	libxul.so 	XPCConvert::JSData2Native 	js/src/xpconnect/src/xpcconvert.cpp:762
2 	libxul.so 	XPCWrappedNative::CallMethod 	js/src/xpconnect/src/xpcwrappednative.cpp:2968
3 	libxul.so 	XPC_WN_CallMethod 	js/src/xpconnect/src/xpcwrappednativejsops.cpp:1613
4 	libxul.so 	js::Interpret 	js/src/jscntxtinlines.h:701
5 	libxul.so 	js::Invoke 	js/src/jsinterp.cpp:653
6 	libxul.so 	js::ExternalInvoke 	js/src/jsinterp.cpp:863
7 	libxul.so 	JS_CallFunctionValue 	js/src/jsapi.cpp:5173
8 	libxul.so 	nsXPCWrappedJSClass::CallMethod 	js/src/xpconnect/src/xpcwrappedjsclass.cpp:1672
9 	libxul.so 	nsXPCWrappedJS::CallMethod 	js/src/xpconnect/src/xpcwrappedjs.cpp:588
10 	libxul.so 	PrepareAndDispatch 	xpcom/reflect/xptcall/src/md/unix/xptcstubs_arm.cpp:132
11 	libxul.so 	libxul.so@0xd0948b 	
12 	libxul.so 	nsComponentManagerImpl::CreateInstanceByContractID 	xpcom/components/nsComponentManager.cpp:1315
...

The regression range is:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=ec8a059bea58&tochange=dd542a890eb7

More reports at:
https://crash-stats.mozilla.com/report/list?range_value=4&range_unit=weeks&signature=XPCCallContext%3A%3ANewStringWrapper
Component: General → XPConnect
Keywords: regression
OS: Linux → Android
Product: Fennec → Core
QA Contact: general → xpconnect
Hardware: All → ARM
Summary: crash [@ XPCCallContext::NewStringWrapper] → Startup crash [@ XPCCallContext::NewStringWrapper]
Putting back on nom list.
tracking-fennec: 2.0+ → ?
This is ... odd.  The crash is claimed to be a null-deref, but the relevant code is:

757 const jschar *strChars = JS_GetStringCharsZAndLength(cx, str, &strLength);
758 if (!strChars)
759   return JS_FALSE;
760
761 XPCReadableJSStringWrapper *wrapper =
762   ccx.NewStringWrapper(strChars, strLength); 

The NewStringWrapper just looks up _where_ to construct the string; it passes things through directly to the string code.... which is where the crash claims to be?

Hmm.  I wonder whether the problem is that the XPC cache thing is doing unaligned access?
Almost certainly:

  char mStringWrapperData[sizeof(StringWrapperEntry) * XPCCCX_STRING_CACHE_SIZE]; 

And the previous member is a PRUint16 and the one before that is void*, so this will tend to be aligned on a 2-byte boundary.

But note that this code has been around forever (well since late 2007), so I don't see how this can be a regression.  Unless fixing breakpad for fennec made us finally get useful stacks here?
Blocks: 408139
Assignee: nobody → doug.turner
tracking-fennec: ? → 2.0+
If this was the reason, wouldn't we expect a crash every single time? There is no randomness in what alignment this will get (modulo 4), it should be 2-byte aligned every time.
Assignee: doug.turner → azakai
Not sure what we want to do here.

On the one hand, I still don't know what is causing this crash. I would suspect ccx of being null more than I would suspect an unaligned access, since this crash is rare (unaligned access should kill us every time - unless only very few Android kernels are configured that way? Perhaps these are modified kernels?). Also not sure this is a startup crash, as other reports have uptime > 0 (in the range 1-8). No STR so hard to be sure what is going on here.

On the other hand, if unaligned access is the problem, I'm not sure how we would solve it. We can juggle the elements in that structure, or add manual padding, but (1) this would increase the size of the structure, which is a shame if this isn't really the issue, and (2) it is hard to see how to do so in a way that is ensured to work on all archs.

But, looking through the code, it doesn't look like any significant amount of work is actually done in the constructors which are avoided here - which is what this hack is trying to optimize. And unaligned access, even if nonfailing, tends to be slower. So this optimization may not be that useful anyhow (although it does save some memory). Perhaps the simplest solution here is to just remove the optimization?
If I set everything up properly, 

k_OFFSETOF_XPCCallContext_mStringWrapperData:
	.word	102

in a fairly vanilla android opt build.
But should note that the surrounding area is

	.align	2
	.type	k_OFFSETOF_XPCCallContext_mStringWrapperData, %object
	.size	k_OFFSETOF_XPCCallContext_mStringWrapperData, 4
k_OFFSETOF_XPCCallContext_mStringWrapperData:
	.word	102

I wonder if this is a THUMB/no-THUMB thing ...
Alon, string constructors are a pretty noticeable performance sink in general...  Note that it's not just the ctors but also the heap-allocation of the string that this thing optimizes away.
Attached patch patch (obsolete) — Splinter Review
If alignment is the issue, this should fix it. I am not happy with how it does it (just shuffle some members - so vulnerable to future edits of the file), but I don't see a better solution. I am also not happy that we are not sure that alignment is the cause, but there seems no harm in trying this.

(I was worried this would take more memory, but it won't since the entire structure is padded anyhow to a multiple of the alignment of the largest member, so we are just moving padding from after to before.)
Attachment #518202 - Flags: review?(bzbarsky)
(In reply to comment #9)
> Alon, string constructors are a pretty noticeable performance sink in
> general...  Note that it's not just the ctors but also the heap-allocation of
> the string that this thing optimizes away.

Yeah, I read more of the code meanwhile, and it is significant. So removing the optimization would have been bad, forget my earlier comment.
(In reply to comment #8)
> But should note that the surrounding area is
> 
>     .align    2

Just double-checked this, and the .align param on ARM is the number of low-order *bits* to which to align the location, not byte-alignment.  So this isn't unexpected.
(In reply to comment #12)
> (In reply to comment #8)
> > But should note that the surrounding area is
> > 
> >     .align    2
> 
> Just double-checked this, and the .align param on ARM is the number of
> low-order *bits* to which to align the location, not byte-alignment.  So this
> isn't unexpected.

Cool, so at least that part makes sense.

The remaining mystery is why we only crash some of the time because of this alignment issue. But, I think we can go ahead with the patch without figuring that out.
Comment on attachment 518202 [details] [diff] [review]
patch

Why don't we use a union to align this properly instead of depending on ordering of previous stuff?  See the what nsAutoArrayBase in nsTArray.h does to align its mAutoBuf, for example.
Attachment #518202 - Flags: review?(bzbarsky) → review-
Hmm... apparently all of this has happened before. I noticed now that this is fixed on tracemonkey, bug 634594.

Will nom that bug.
Depends on: 634594
Attachment #518202 - Attachment is obsolete: true
There have been no crashes since 4.0b6pre/20110308, I close it as "workforme".

The working range is:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=1240004b90e6&tochange=89429f5cbf81
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → WORKSFORME
This is really just a dupe of bug 634594.
No longer depends on: 634594
Resolution: WORKSFORME → DUPLICATE
Duplicate of bug: 634594
> This is really just a dupe of bug 634594.
Are you sure? The patch in bug 634594 has not been landed yet in m-c.
(In reply to comment #18)
> > This is really just a dupe of bug 634594.
> Are you sure? The patch in bug 634594 has not been landed yet in m-c.

I'm not sure what the patch landing or not has to do with this being a duplicate of the other bug - maybe I didn't understand what you meant here.

In any case, the patch in that bug has landed a few minutes ago :)
Crash Signature: [@ XPCCallContext::NewStringWrapper]
Assignee: azakai → nobody
You need to log in before you can comment on or make changes to this bug.