Closed Bug 785096 Opened 7 years ago Closed 7 years ago

Crash with Components.lookupMethod(sel.options, "add")

Categories

(Core :: DOM: Core & HTML, defect, critical)

x86_64
macOS
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla18

People

(Reporter: jruderman, Assigned: gkrizsanits)

References

(Blocks 1 open bug)

Details

(Keywords: assertion, crash, testcase)

Crash Data

Attachments

(5 files)

Attached file testcase
###!!! ASSERTION: function object has parent of unknown class!: 'Error', file js/xpconnect/src/XPCWrappedNative.cpp, line 1845

Followed by a debug-crash in XPCCallContext::Init while trying to evaluate the following assertion:

NS_ABORT_IF_FALSE(!mFlattenedJSObject || IS_SLIM_WRAPPER(mFlattenedJSObject),
                              "should have a slim wrapper");
Attached file stack traces
Severity: normal → critical
Crash Signature: [@ NS_DebugBreak_P | XPCWrappedNative::GetWrappedNativeOfJSObject]
So what happens here is that mozilla::dom::oldproxybindings::HTMLOptionsCollection_Add initializes an XPCCallContext, which ends up calling XPCWrappedNative::GetWrappedNativeOfJSObject.

When it calls that last, the "obj" is a proxy.  Also, funObjParent == obj.  So funObjParentClass is a proxy class, which is not any of proto, wrapper, tearoff.  So we return null from GetWrappedNativeOfJSObject.  Then the logic in XPCCallContext::Init sets mWrapper to null, null-checks it, and when it's null does this:

154        NS_ABORT_IF_FALSE(!mFlattenedJSObject || IS_SLIM_WRAPPER(mFlattenedJSObject),
155                          "should have a slim wrapper");

But this is being called from the XPCCallContext constructor that doesn't take a flattened JS object, so mFlattenedJSObject is not initialized at this point (should that constructor null-initialize it?) and we try to test something about its JSClass and crash because that pointer is garbage.
Gabor, do you have time to look into this?
(In reply to Boris Zbarsky (:bz) from comment #3)
> Gabor, do you have time to look into this?

Yes, I will check it out.
(In reply to Boris Zbarsky (:bz) from comment #2)
> So what happens here is that
> mozilla::dom::oldproxybindings::HTMLOptionsCollection_Add initializes an
> XPCCallContext, which ends up calling
> XPCWrappedNative::GetWrappedNativeOfJSObject.
> 
> When it calls that last, the "obj" is a proxy.  Also, funObjParent == obj. 

In case of regular xrays this funObjParent should be the wrapper (hack in XPCWrappedNativeXrayTraits::resolveNativeProperty) and it should end up in the IS_WRAPPER_CLASS branch because of it (ignoring the passed in obj).

I checked the case of 
sel.options.add.bind(sel.options)(); and then the obj is the window... Which is totally wrong but since mWrapper of ccx is never used in HTMLOptionsCollection_Add so it does not really matter. So probably we could just fix this by handling the case where
funObjParent is an old dom proxy, or dealing with the case where we return null here. Why do we need a call context here exactly by the way?

What worries me is that LookupMethod find the method through an xray, and then binds it to the old dom proxy (HTMLOptionsCollection). But the parent object ends up being different than if we do the same thing from javscript...
(new XPCNativeWrapper(sel.options).add.bind(sel.options)();)
I still would like to figure why we get that dom proxy there before trying to fix this.
> Why do we need a call context here exactly by the way?

We don't in practice, but the codegen is dumb.  It sees the nsIVariant and assumes we'll need a ccx.  Which we do in quickstubs, but apparently not here.

So we could try changing anyParamRequiresCcx in js/xpconnect/src/qsgen.py to just return false and seeing what happens too.
So the only bits that requires an XPCCallContext here is the variant conversion. And the only real usage of the ccx there is this dom string optimization. As it seems this code could be simply moved to XPCJSRuntime (bug 408139), what I did in the first patch. In the second it's really only about replacing ccx with cx at many places, to finally being able to change the codegen, not to create a ccx.
Attachment #660048 - Flags: review?(bzbarsky)
Bobby you told me that I can flag you as the reviewer of this patch. By the way it was green on try.
Attachment #660049 - Flags: review?(bobbyholley+bmo)
I just realized that I did not create a test from the attached testcase, will do that later.
Comment on attachment 660048 [details] [diff] [review]
part1: Moving dom string cache from XPCCallContext to XPCJSRuntime

Yeah, I guess with mainthread only xpconnect this works.  r=me
Attachment #660048 - Flags: review?(bzbarsky) → review+
Comment on attachment 660049 [details] [diff] [review]
part2: Removing XPCCallContext dependency from XPCVariant


> 
>     // XXX check IsPtr - esp. to handle array of nsID (as opposed to nsID*)
> 
>+    JSObject* scope = lccx.GetScopeForNewJSObjects();
>+
>     switch (type.TagPart()) {

Uh, what?



>--- a/js/xpconnect/src/XPCWrappedNative.cpp
>+++ b/js/xpconnect/src/XPCWrappedNative.cpp

>         } else if (isSizedString) {
>-            if (!XPCConvert::NativeStringWithSize2JS(mCallContext, &v,
>+            if (!XPCConvert::NativeStringWithSize2JS(mCallContext.GetJSContext(), &v,
>                                                      (const void*)&dp->val,

>             // we actually assured this before doing the invoke
>             NS_ASSERTION(mArgv[i].isObject(), "out var is not object");
>-            if (!JS_SetPropertyById(mCallContext,
>+            if (!JS_SetPropertyById(mCallContext.GetJSContext(),
>                                     &mArgv[i].toObject(),
>                                     mIdxValueId, &v)) {


Why are these changes necessary. Shouldn't the implicit conversion take care of it? In particular, JS_SetPropertyById(mCallContext, ...) seemed to work just fine before this patch...


> def anyParamRequiresCcx(member):
>-    for p in member.params:
>-        if isVariantType(p.realtype):
>-            return True

Nice. Can we get another patch on top of this one that removes anyParamRequiresCcx and fixes any consumers just to assume false there?

Awesome patch! r=bholley with the above comments addressed. :-)
Attachment #660049 - Flags: review?(bobbyholley+bmo) → review+
(In reply to Bobby Holley (:bholley) from comment #11)
> Comment on attachment 660049 [details] [diff] [review]
> >+    JSObject* scope = lccx.GetScopeForNewJSObjects();
> >+
> >     switch (type.TagPart()) {
> 
> Uh, what?
Remaining from a previous version where I had a NativeData2JS that takes a scope and cx insted of a ccx... Anyway I'll clean it up.

> 
> >--- a/js/xpconnect/src/XPCWrappedNative.cpp
> >+++ b/js/xpconnect/src/XPCWrappedNative.cpp
> 
> >         } else if (isSizedString) {
> >-            if (!XPCConvert::NativeStringWithSize2JS(mCallContext, &v,
> >+            if (!XPCConvert::NativeStringWithSize2JS(mCallContext.GetJSContext(), &v,
> >                                                      (const void*)&dp->val,
> 
> >             // we actually assured this before doing the invoke
> >             NS_ASSERTION(mArgv[i].isObject(), "out var is not object");
> >-            if (!JS_SetPropertyById(mCallContext,
> >+            if (!JS_SetPropertyById(mCallContext.GetJSContext(),
> >                                     &mArgv[i].toObject(),
> >                                     mIdxValueId, &v)) {
> 
> 
> Why are these changes necessary. Shouldn't the implicit conversion take care
> of it? In particular, JS_SetPropertyById(mCallContext, ...) seemed to work
> just fine before this patch...

So I just made these changes to make it explicit for this patch where is this all starting from. I will remove them before commit.

> 
> Nice. Can we get another patch on top of this one that removes
> anyParamRequiresCcx and fixes any consumers just to assume false there?

Sure!
Also, I modified qsgen.py and codegen.py as well, but apparently qsgen is never run... Do we need it in the repo at all? Or is there some other script that is using it?
(In reply to Gabor Krizsanits [:krizsa :gabor] from comment #12)
> Also, I modified qsgen.py and codegen.py as well, but apparently qsgen is
> never run... Do we need it in the repo at all? Or is there some other script
> that is using it?

Sorry my bad. Peter explained me how it works and which file it generates.
Comment on attachment 660048 [details] [diff] [review]
part1: Moving dom string cache from XPCCallContext to XPCJSRuntime

Review of attachment 660048 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/xpconnect/src/XPCJSRuntime.cpp
@@ +1035,5 @@
>  
> +XPCReadableJSStringWrapper *
> +XPCJSRuntime::NewStringWrapper(const PRUnichar *str, PRUint32 len)
> +{
> +    for (PRUint32 i = 0; i < XPCCCX_STRING_CACHE_SIZE; ++i) {

Don't forget to run the script to make this uint32_t.

::: js/xpconnect/src/xpcprivate.h
@@ +959,5 @@
> +    struct StringWrapperEntry
> +    {
> +        StringWrapperEntry() : mInUse(false) { }
> +
> +        js::AlignedStorage2<XPCReadableJSStringWrapper> mString;

Make this mozilla::AlignedStorage2 while you're here.
Comment on attachment 660049 [details] [diff] [review]
part2: Removing XPCCallContext dependency from XPCVariant

Review of attachment 660049 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/xpconnect/src/codegen.py
@@ +214,3 @@
>                  "    if (!${name}) {\n"
> +                "        nsresult rv;"
> +                "        xpc_qsThrowBadArg(cx, rv, vp, %d);\n"

You're throwing an uninitialized rv here, aren't you?

::: js/xpconnect/src/qsgen.py
@@ +543,3 @@
>                  "    if (!${name}) {\n"
> +                "        nsresult rv;"
> +                "        xpc_qsThrowBadArg(cx, rv, vp, %d);\n"

Same here

::: js/xpconnect/src/xpcprivate.h
@@ +4023,5 @@
>  {
>    public:
> +    ArrayAutoMarkingPtr(JSContext* cx)
> +      : AutoMarkingPtr(cx), mPtr(nullptr), mCount(0) {}
> +    ArrayAutoMarkingPtr(JSContext* cx, T** ptr, PRUint32 count, bool clear)

uint32_t
> Comment on attachment 660048 [details] [diff] [review]
> part1: Moving dom string cache from XPCCallContext to XPCJSRuntime
> 
> Yeah, I guess with mainthread only xpconnect this works.  r=me

It does work, however... I'm hesitating to commit it because I noticed some serious performance drop back on the debug build locally. So thinking about this change, pre-patch this optimization was on the level of ccx. So in case of some deep call stack, we could have multiple ccx around, with multiple string caches, so optimistically all the dom string could be effected by this optimization. While post path we have always only one XPCJSRuntime, so top most XPCCCX_STRING_CACHE_SIZE (2) dom string could be effected at any given time.
Now I can increase that number ad hoc and see what's the result of it (and for validation I did that but didn't see it helping much) but I would like to do some real performance check. What do you think? Does this make sense? Or are there some automatic perf. checks on try already, that should have alerted if my theory checked out?
I might be wrong, and it can be just some glitch on my local env.,  but would prefer double checking it before commit over causing some nasty backouts later.

(In reply to :Ms2ger from comment #15)
> Comment on attachment 660049 [details] [diff] [review]
> You're throwing an uninitialized rv here, aren't you?

Whoops... fixed and thanks.(In reply to Boris Zbarsky (:bz) from comment #10)
Well, debug builds aren't great for perf testing, but that's a totally valid theory (increasing the string cache size should do the trick though, I'd think).

You can just push to try with -t all and see what happens. If it's actually noticeable for you on your local build, I'd expect the Talos numbers to go bonkers.
(In reply to Bobby Holley (:bholley) from comment #17)
> Well, debug builds aren't great for perf testing, but that's a totally valid
> theory (increasing the string cache size should do the trick though, I'd
> think).

Yeah I tried that but didn't really help.
 
> You can just push to try with -t all and see what happens. If it's actually
> noticeable for you on your local build, I'd expect the Talos numbers to go
> bonkers.

Even Talos looks pretty green to me...
https://tbpl.mozilla.org/?tree=Try&rev=d8f07d5eb75a

So shall I just commit this? (3rd patch will still need a review ofc, I mean the first 2)
(In reply to Gabor Krizsanits [:krizsa :gabor] from comment #18)
> (In reply to Bobby Holley (:bholley) from comment #17)
> > Well, debug builds aren't great for perf testing, but that's a totally valid
> > theory (increasing the string cache size should do the trick though, I'd
> > think).
> 
> Yeah I tried that but didn't really help.
>  
> > You can just push to try with -t all and see what happens. If it's actually
> > noticeable for you on your local build, I'd expect the Talos numbers to go
> > bonkers.
> 
> Even Talos looks pretty green to me...
> https://tbpl.mozilla.org/?tree=Try&rev=d8f07d5eb75a

Uh, I don't see any talos boxes on that try run. Also I don't think talos goes orange. You have to manually compare the numbers (or use the 'compare' checkbox).
> Yeah I tried that but didn't really help.

How are you determining the performance impact?  Have you verified in an opt build?  Generally, doing performance measurement in a debug build is completely pointless; large parts of the browser have different algorithmic complexity in opt and debug builds.
I'd be somewhat unhappy if debug builds suddenly got a lot slower.  It would slow down our regression and fuzz testing.
Well, yes.  But the key part is "how are you determining?"
(In reply to Boris Zbarsky (:bz) from comment #20)
> > Yeah I tried that but didn't really help.
> 
> How are you determining the performance impact?  Have you verified in an opt
> build?  Generally, doing performance measurement in a debug build is
> completely pointless; large parts of the browser have different algorithmic
> complexity in opt and debug builds.

It was significant enough at starting up, and later on as well. By simple using the browser. Everything was like 3-4 times slower. So if it's not some glitch on that local copy/build/etc then it's really bad. And I get that debug build are not for profiling but there is a limit to that rule... I'm talking about 3-4 times slower then another copy I built without these changes.

Anyway I will spend some time on building a clean version and figure out what is going on.
(In reply to Bobby Holley (:bholley) from comment #19)
> Uh, I don't see any talos boxes on that try run. Also I don't think talos
> goes orange. You have to manually compare the numbers (or use the 'compare'
> checkbox).

Ehh... my bad... so there is a TryChooser bug it seems and only the xperf test has run. Anyway will push it again with -t All this time and get back to here.
So I experienced extreme slowness with another patch on that same local copy of the tree. I have no clue why is that local copy so much slower than the rest... but apparently it is not the fault of this patch. In the meanwhile I ran the talos tests:

https://tbpl.mozilla.org/?tree=Try&rev=ed83243fbf5a

Some numbers: 
(python compare.py --revision ed83243fbf5a --branch Try --masterbranch Firefox)

Linux:
    tdhtmlr: 405.265 -> 417.118; 412.147.
    tdhtmlr_nochrome: 400.794 -> 412.971; 402.735
    ts_places_med: No results found
    ts_places_max: No results found
    dromaeo_css: 1497.81 -> 2460.26; 2447.38.
    dromaeo_dom: 76.8533 -> 162.51; 161.607.
    tscrollr: 10174.9 -> 14373.8; 14112.1.
    a11yr: 607.0 -> 642.0; 610.0.
    ts_paint: 692.263 -> 717.947; 694.895.
    tpaint: 208.0 -> 227.0; 216.0.
    tsvgr: 2762.35 -> 4028.68; 3894.45.
    tsvgr_opacity: 97.0 -> 103.0; 100.0.
    tp5n: 364.061 -> 384.449; 368.404.
Linux64:
    tdhtmlr: 377.029 -> 387.206; 378.971.
    :) tdhtmlr_nochrome: 362.147 -> 372.941; 362.
    ts_places_med: No results found
    ts_places_max: No results found
    dromaeo_css: 1739.14 -> 2848.33; 2795.13.
    dromaeo_dom: 94.9733 -> 208.807; 206.337.
    tscrollr: 12072.5 -> 14010.5; 13713.3.
    a11yr: 519.0 -> 551.5; 527.0.
    ts_paint: 639.789 -> 661.0; 652.421.
    tpaint: 197.0 -> 214.0; 201.0.
    tsvgr: 2598.25 -> 3755.82; 2608.35.
    tsvgr_opacity: 79.0 -> 85.0; 80.5.
    tp5n: 312.692 -> 325.854; 314.116.
Win:
    tdhtmlr: 377.235 -> 385.588; 378.382.
    tdhtmlr_nochrome: 340.088 -> 359.235; 345.735
    ts_places_med: No results found
    ts_places_max: No results found
    dromaeo_css: 1831.69 -> 3030.85; 3020.05.
    dromaeo_dom: 97.63 -> 211.957; 211.907.
    tscrollr: 10791.6 -> 10915.2; 10815.8.
    a11yr: 480.0 -> 501.5; 484.0.
    ts_paint: 726.632 -> 805.789; 727.632.
    tpaint: 231.0 -> 245.0; 233.0.
    tsvgr: 2165.05 -> 3332.32; 3329.09.
    tsvgr_opacity: 177.5 -> 239.0; 180.0.
    tp5n: 306.545 -> 945.505; 307.611.
WinXP:
    tdhtmlr: 567.882 -> 649.971; 637.647.
    tdhtmlr_nochrome: 704.029 -> 756.412; 742.529
    ts_places_med: No results found
    ts_places_max: No results found
    dromaeo_css: 1861.3 -> 3060.34; 3045.28.
    :) dromaeo_dom: 98.74 -> 214.31; 214.77.
    tscrollr: 10755.1 -> 10793.7; 10776.8.
    a11yr: 449.0 -> 476.5; 455.0.
    ts_paint: 575.105 -> 646.684; 587.737.
    tpaint: 186.0 -> 198.0; 187.0.
    tsvgr: 2494.25 -> 3815.0; 2638.95.
    tsvgr_opacity: 462.0 -> 479.0; 470.5.
    tp5n: 288.758 -> 297.081; 290.621.
OSX10.7:
    tdhtmlr: 406.647 -> 413.206; 409.382.
    tdhtmlr_nochrome: 377.941 -> 383.353; 382.912
    ts_places_med: No results found
    ts_places_max: No results found
    dromaeo_css: 2133.02 -> 3615.56; 3525.58.
    dromaeo_dom: 118.4 -> 283.883; 278.497.
    tscrollr: 13375.7 -> 13495.4; 13442.5.
    a11yr: 347.0 -> 363.5; 351.5.
    ts_paint: 809.053 -> 934.526; 877.0.
    tpaint: 403.0 -> 465.0; 451.0.
    tsvgr: 2136.1 -> 3410.82; 3330.77.
    tsvgr_opacity: 551.5 -> 573.0; 555.0.
    tp5n: 258.152 -> 275.616; 270.53.
OSX64:
    tdhtmlr: 403.118 -> 419.529; 411.588.
    tdhtmlr_nochrome: 362.235 -> 417.853; 381.824
    ts_places_med: No results found
    ts_places_max: No results found
    dromaeo_css: 2180.7 -> 3629.15; 3589.82.
    dromaeo_dom: 117.583 -> 279.883; 278.48.
    tscrollr: 13423.2 -> 13582.4; 13519.5.
    a11yr: 352.5 -> 371.5; 362.5.
    ts_paint: 744.053 -> 790.211; 768.895.
    tpaint: 304.0 -> 363.0; 334.0.
    tsvgr: 2153.15 -> 3384.95; 3369.14.
    tsvgr_opacity: 440.0 -> 470.0; 459.5.
    tp5n: 256.657 -> 276.035; 271.712.


If I read these numbers right, it's in the range. (except on xp this one: dromaeo_dom: 98.74 -> 214.31; 214.77. but that's not that scary)

So my conclusion: I was a bit overcautious it seems and I can just land this. Attaching the 3rd patch that is some cleanup for Bobby's request.
This patch removes only some stuff that is not used any longer.
Attachment #662465 - Flags: review?(bobbyholley+bmo)
Comment on attachment 662465 [details] [diff] [review]
part3: Removing ccx from codegen/qsgen


>-        writeCheckForFailure(f, isMethod, isGetter, haveCcx)
>+        writeCheckForFailure(f, isMethod, isGetter, False)

Shouldn't the haveCcx param to writeCheckForFailure just go away?

r=bholley with that addressed. Please keep an eye on dev-tree-management to see if anything shows up perf-wise. If so, we can try tweaking the string cache size.
Attachment #662465 - Flags: review?(bobbyholley+bmo) → review+
(In reply to Bobby Holley (:bholley) from comment #27)
> Shouldn't the haveCcx param to writeCheckForFailure just go away?

Yeah... so it was in another file, fixed it.

> r=bholley with that addressed. Please keep an eye on dev-tree-management to
> see if anything shows up perf-wise. If so, we can try tweaking the string
> cache size.

Ok, I'll do that.

Also:

http://hg.mozilla.org/integration/mozilla-inbound/rev/ed7b21f9ae6a
http://hg.mozilla.org/integration/mozilla-inbound/rev/343dde4d974d
http://hg.mozilla.org/integration/mozilla-inbound/rev/3cd3840dea0b
https://hg.mozilla.org/mozilla-central/rev/ed7b21f9ae6a
https://hg.mozilla.org/mozilla-central/rev/343dde4d974d
https://hg.mozilla.org/mozilla-central/rev/3cd3840dea0b
Assignee: nobody → gkrizsanits
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla18
Component: DOM → DOM: Core & HTML
You need to log in before you can comment on or make changes to this bug.