785096 - Crash with Components.lookupMethod(sel.options, "add")

Reporter

Description

•

13 years ago

Attached file testcase — Details

###!!! ASSERTION: function object has parent of unknown class!: 'Error', file js/xpconnect/src/XPCWrappedNative.cpp, line 1845 Followed by a debug-crash in XPCCallContext::Init while trying to evaluate the following assertion: NS_ABORT_IF_FALSE(!mFlattenedJSObject || IS_SLIM_WRAPPER(mFlattenedJSObject), "should have a slim wrapper");

Jesse Ruderman

Reporter

Comment 1

•

13 years ago

Attached file stack traces — Details

Scoobidiver (away)

Updated

•

13 years ago

Severity: normal → critical

Crash Signature: [@ NS_DebugBreak_P | XPCWrappedNative::GetWrappedNativeOfJSObject]

Boris Zbarsky [:bzbarsky]

Comment 2

•

13 years ago

So what happens here is that mozilla::dom::oldproxybindings::HTMLOptionsCollection_Add initializes an XPCCallContext, which ends up calling XPCWrappedNative::GetWrappedNativeOfJSObject. When it calls that last, the "obj" is a proxy. Also, funObjParent == obj. So funObjParentClass is a proxy class, which is not any of proto, wrapper, tearoff. So we return null from GetWrappedNativeOfJSObject. Then the logic in XPCCallContext::Init sets mWrapper to null, null-checks it, and when it's null does this: 154 NS_ABORT_IF_FALSE(!mFlattenedJSObject || IS_SLIM_WRAPPER(mFlattenedJSObject), 155 "should have a slim wrapper"); But this is being called from the XPCCallContext constructor that doesn't take a flattened JS object, so mFlattenedJSObject is not initialized at this point (should that constructor null-initialize it?) and we try to test something about its JSClass and crash because that pointer is garbage.

Boris Zbarsky [:bzbarsky]

Comment 3

•

13 years ago

Gabor, do you have time to look into this?

Gabor Krizsanits (INACTIVE)

Assignee

Comment 4

•

13 years ago

(In reply to Boris Zbarsky (:bz) from comment #3) > Gabor, do you have time to look into this? Yes, I will check it out.

Gabor Krizsanits (INACTIVE)

Assignee

Comment 5

•

13 years ago

(In reply to Boris Zbarsky (:bz) from comment #2) > So what happens here is that > mozilla::dom::oldproxybindings::HTMLOptionsCollection_Add initializes an > XPCCallContext, which ends up calling > XPCWrappedNative::GetWrappedNativeOfJSObject. > > When it calls that last, the "obj" is a proxy. Also, funObjParent == obj. In case of regular xrays this funObjParent should be the wrapper (hack in XPCWrappedNativeXrayTraits::resolveNativeProperty) and it should end up in the IS_WRAPPER_CLASS branch because of it (ignoring the passed in obj). I checked the case of sel.options.add.bind(sel.options)(); and then the obj is the window... Which is totally wrong but since mWrapper of ccx is never used in HTMLOptionsCollection_Add so it does not really matter. So probably we could just fix this by handling the case where funObjParent is an old dom proxy, or dealing with the case where we return null here. Why do we need a call context here exactly by the way? What worries me is that LookupMethod find the method through an xray, and then binds it to the old dom proxy (HTMLOptionsCollection). But the parent object ends up being different than if we do the same thing from javscript... (new XPCNativeWrapper(sel.options).add.bind(sel.options)();) I still would like to figure why we get that dom proxy there before trying to fix this.

Boris Zbarsky [:bzbarsky]

Comment 6

•

13 years ago

> Why do we need a call context here exactly by the way? We don't in practice, but the codegen is dumb. It sees the nsIVariant and assumes we'll need a ccx. Which we do in quickstubs, but apparently not here. So we could try changing anyParamRequiresCcx in js/xpconnect/src/qsgen.py to just return false and seeing what happens too.

Gabor Krizsanits (INACTIVE)

Assignee

Comment 7

•

13 years ago

Attached patch part1: Moving dom string cache from XPCCallContext to XPCJSRuntime — Details — Splinter Review

So the only bits that requires an XPCCallContext here is the variant conversion. And the only real usage of the ccx there is this dom string optimization. As it seems this code could be simply moved to XPCJSRuntime (bug 408139), what I did in the first patch. In the second it's really only about replacing ccx with cx at many places, to finally being able to change the codegen, not to create a ccx.

Attachment #660048 - Flags: review?(bzbarsky)

Gabor Krizsanits (INACTIVE)

Assignee

Comment 8

•

13 years ago

Attached patch part2: Removing XPCCallContext dependency from XPCVariant — Details — Splinter Review

Bobby you told me that I can flag you as the reviewer of this patch. By the way it was green on try.

Attachment #660049 - Flags: review?(bobbyholley+bmo)

Gabor Krizsanits (INACTIVE)

Assignee

Comment 9

•

13 years ago

I just realized that I did not create a test from the attached testcase, will do that later.

Boris Zbarsky [:bzbarsky]

Comment 10

•

13 years ago

Comment on attachment 660048 [details] [diff] [review] part1: Moving dom string cache from XPCCallContext to XPCJSRuntime Yeah, I guess with mainthread only xpconnect this works. r=me

Attachment #660048 - Flags: review?(bzbarsky) → review+

Bobby Holley (:bholley)

Comment 11

•

13 years ago

Comment on attachment 660049 [details] [diff] [review] part2: Removing XPCCallContext dependency from XPCVariant > > // XXX check IsPtr - esp. to handle array of nsID (as opposed to nsID*) > >+ JSObject* scope = lccx.GetScopeForNewJSObjects(); >+ > switch (type.TagPart()) { Uh, what? >--- a/js/xpconnect/src/XPCWrappedNative.cpp >+++ b/js/xpconnect/src/XPCWrappedNative.cpp > } else if (isSizedString) { >- if (!XPCConvert::NativeStringWithSize2JS(mCallContext, &v, >+ if (!XPCConvert::NativeStringWithSize2JS(mCallContext.GetJSContext(), &v, > (const void*)&dp->val, > // we actually assured this before doing the invoke > NS_ASSERTION(mArgv[i].isObject(), "out var is not object"); >- if (!JS_SetPropertyById(mCallContext, >+ if (!JS_SetPropertyById(mCallContext.GetJSContext(), > &mArgv[i].toObject(), > mIdxValueId, &v)) { Why are these changes necessary. Shouldn't the implicit conversion take care of it? In particular, JS_SetPropertyById(mCallContext, ...) seemed to work just fine before this patch... > def anyParamRequiresCcx(member): >- for p in member.params: >- if isVariantType(p.realtype): >- return True Nice. Can we get another patch on top of this one that removes anyParamRequiresCcx and fixes any consumers just to assume false there? Awesome patch! r=bholley with the above comments addressed. :-)

Attachment #660049 - Flags: review?(bobbyholley+bmo) → review+

Gabor Krizsanits (INACTIVE)

Assignee

Comment 12

•

13 years ago

(In reply to Bobby Holley (:bholley) from comment #11) > Comment on attachment 660049 [details] [diff] [review] > >+ JSObject* scope = lccx.GetScopeForNewJSObjects(); > >+ > > switch (type.TagPart()) { > > Uh, what? Remaining from a previous version where I had a NativeData2JS that takes a scope and cx insted of a ccx... Anyway I'll clean it up. > > >--- a/js/xpconnect/src/XPCWrappedNative.cpp > >+++ b/js/xpconnect/src/XPCWrappedNative.cpp > > > } else if (isSizedString) { > >- if (!XPCConvert::NativeStringWithSize2JS(mCallContext, &v, > >+ if (!XPCConvert::NativeStringWithSize2JS(mCallContext.GetJSContext(), &v, > > (const void*)&dp->val, > > > // we actually assured this before doing the invoke > > NS_ASSERTION(mArgv[i].isObject(), "out var is not object"); > >- if (!JS_SetPropertyById(mCallContext, > >+ if (!JS_SetPropertyById(mCallContext.GetJSContext(), > > &mArgv[i].toObject(), > > mIdxValueId, &v)) { > > > Why are these changes necessary. Shouldn't the implicit conversion take care > of it? In particular, JS_SetPropertyById(mCallContext, ...) seemed to work > just fine before this patch... So I just made these changes to make it explicit for this patch where is this all starting from. I will remove them before commit. > > Nice. Can we get another patch on top of this one that removes > anyParamRequiresCcx and fixes any consumers just to assume false there? Sure! Also, I modified qsgen.py and codegen.py as well, but apparently qsgen is never run... Do we need it in the repo at all? Or is there some other script that is using it?

Gabor Krizsanits (INACTIVE)

Assignee

Comment 13

•

13 years ago

(In reply to Gabor Krizsanits [:krizsa :gabor] from comment #12) > Also, I modified qsgen.py and codegen.py as well, but apparently qsgen is > never run... Do we need it in the repo at all? Or is there some other script > that is using it? Sorry my bad. Peter explained me how it works and which file it generates.

:Ms2ger (he/him; ⌚ UTC+1/+2)

Comment 14

•

13 years ago

Comment on attachment 660048 [details] [diff] [review] part1: Moving dom string cache from XPCCallContext to XPCJSRuntime Review of attachment 660048 [details] [diff] [review]: ----------------------------------------------------------------- ::: js/xpconnect/src/XPCJSRuntime.cpp @@ +1035,5 @@ > > +XPCReadableJSStringWrapper * > +XPCJSRuntime::NewStringWrapper(const PRUnichar *str, PRUint32 len) > +{ > + for (PRUint32 i = 0; i < XPCCCX_STRING_CACHE_SIZE; ++i) { Don't forget to run the script to make this uint32_t. ::: js/xpconnect/src/xpcprivate.h @@ +959,5 @@ > + struct StringWrapperEntry > + { > + StringWrapperEntry() : mInUse(false) { } > + > + js::AlignedStorage2<XPCReadableJSStringWrapper> mString; Make this mozilla::AlignedStorage2 while you're here.

:Ms2ger (he/him; ⌚ UTC+1/+2)

Comment 15

•

13 years ago

Comment on attachment 660049 [details] [diff] [review] part2: Removing XPCCallContext dependency from XPCVariant Review of attachment 660049 [details] [diff] [review]: ----------------------------------------------------------------- ::: js/xpconnect/src/codegen.py @@ +214,3 @@ > " if (!${name}) {\n" > + " nsresult rv;" > + " xpc_qsThrowBadArg(cx, rv, vp, %d);\n" You're throwing an uninitialized rv here, aren't you? ::: js/xpconnect/src/qsgen.py @@ +543,3 @@ > " if (!${name}) {\n" > + " nsresult rv;" > + " xpc_qsThrowBadArg(cx, rv, vp, %d);\n" Same here ::: js/xpconnect/src/xpcprivate.h @@ +4023,5 @@ > { > public: > + ArrayAutoMarkingPtr(JSContext* cx) > + : AutoMarkingPtr(cx), mPtr(nullptr), mCount(0) {} > + ArrayAutoMarkingPtr(JSContext* cx, T** ptr, PRUint32 count, bool clear) uint32_t

Gabor Krizsanits (INACTIVE)

Assignee

Comment 16

•

13 years ago

> Comment on attachment 660048 [details] [diff] [review] > part1: Moving dom string cache from XPCCallContext to XPCJSRuntime > > Yeah, I guess with mainthread only xpconnect this works. r=me It does work, however... I'm hesitating to commit it because I noticed some serious performance drop back on the debug build locally. So thinking about this change, pre-patch this optimization was on the level of ccx. So in case of some deep call stack, we could have multiple ccx around, with multiple string caches, so optimistically all the dom string could be effected by this optimization. While post path we have always only one XPCJSRuntime, so top most XPCCCX_STRING_CACHE_SIZE (2) dom string could be effected at any given time. Now I can increase that number ad hoc and see what's the result of it (and for validation I did that but didn't see it helping much) but I would like to do some real performance check. What do you think? Does this make sense? Or are there some automatic perf. checks on try already, that should have alerted if my theory checked out? I might be wrong, and it can be just some glitch on my local env., but would prefer double checking it before commit over causing some nasty backouts later. (In reply to :Ms2ger from comment #15) > Comment on attachment 660049 [details] [diff] [review] > You're throwing an uninitialized rv here, aren't you? Whoops... fixed and thanks.(In reply to Boris Zbarsky (:bz) from comment #10)

Bobby Holley (:bholley)

Comment 17

•

13 years ago

Well, debug builds aren't great for perf testing, but that's a totally valid theory (increasing the string cache size should do the trick though, I'd think). You can just push to try with -t all and see what happens. If it's actually noticeable for you on your local build, I'd expect the Talos numbers to go bonkers.

Gabor Krizsanits (INACTIVE)

Assignee

Comment 18

•

13 years ago

(In reply to Bobby Holley (:bholley) from comment #17) > Well, debug builds aren't great for perf testing, but that's a totally valid > theory (increasing the string cache size should do the trick though, I'd > think). Yeah I tried that but didn't really help. > You can just push to try with -t all and see what happens. If it's actually > noticeable for you on your local build, I'd expect the Talos numbers to go > bonkers. Even Talos looks pretty green to me... https://tbpl.mozilla.org/?tree=Try&rev=d8f07d5eb75a So shall I just commit this? (3rd patch will still need a review ofc, I mean the first 2)

Bobby Holley (:bholley)

Comment 19

•

13 years ago

(In reply to Gabor Krizsanits [:krizsa :gabor] from comment #18) > (In reply to Bobby Holley (:bholley) from comment #17) > > Well, debug builds aren't great for perf testing, but that's a totally valid > > theory (increasing the string cache size should do the trick though, I'd > > think). > > Yeah I tried that but didn't really help. > > > You can just push to try with -t all and see what happens. If it's actually > > noticeable for you on your local build, I'd expect the Talos numbers to go > > bonkers. > > Even Talos looks pretty green to me... > https://tbpl.mozilla.org/?tree=Try&rev=d8f07d5eb75a Uh, I don't see any talos boxes on that try run. Also I don't think talos goes orange. You have to manually compare the numbers (or use the 'compare' checkbox).

Boris Zbarsky [:bzbarsky]

Comment 20

•

13 years ago

> Yeah I tried that but didn't really help. How are you determining the performance impact? Have you verified in an opt build? Generally, doing performance measurement in a debug build is completely pointless; large parts of the browser have different algorithmic complexity in opt and debug builds.

Jesse Ruderman

Reporter

Comment 21

•

13 years ago

I'd be somewhat unhappy if debug builds suddenly got a lot slower. It would slow down our regression and fuzz testing.

Boris Zbarsky [:bzbarsky]

Comment 22

•

13 years ago

Well, yes. But the key part is "how are you determining?"

Gabor Krizsanits (INACTIVE)

Assignee

Comment 23

•

13 years ago

(In reply to Boris Zbarsky (:bz) from comment #20) > > Yeah I tried that but didn't really help. > > How are you determining the performance impact? Have you verified in an opt > build? Generally, doing performance measurement in a debug build is > completely pointless; large parts of the browser have different algorithmic > complexity in opt and debug builds. It was significant enough at starting up, and later on as well. By simple using the browser. Everything was like 3-4 times slower. So if it's not some glitch on that local copy/build/etc then it's really bad. And I get that debug build are not for profiling but there is a limit to that rule... I'm talking about 3-4 times slower then another copy I built without these changes. Anyway I will spend some time on building a clean version and figure out what is going on.

Gabor Krizsanits (INACTIVE)

Assignee

Comment 24

•

13 years ago

(In reply to Bobby Holley (:bholley) from comment #19) > Uh, I don't see any talos boxes on that try run. Also I don't think talos > goes orange. You have to manually compare the numbers (or use the 'compare' > checkbox). Ehh... my bad... so there is a TryChooser bug it seems and only the xperf test has run. Anyway will push it again with -t All this time and get back to here.

Gabor Krizsanits (INACTIVE)

Assignee

Comment 25

•

13 years ago

So I experienced extreme slowness with another patch on that same local copy of the tree. I have no clue why is that local copy so much slower than the rest... but apparently it is not the fault of this patch. In the meanwhile I ran the talos tests: https://tbpl.mozilla.org/?tree=Try&rev=ed83243fbf5a Some numbers: (python compare.py --revision ed83243fbf5a --branch Try --masterbranch Firefox) Linux: tdhtmlr: 405.265 -> 417.118; 412.147. tdhtmlr_nochrome: 400.794 -> 412.971; 402.735 ts_places_med: No results found ts_places_max: No results found dromaeo_css: 1497.81 -> 2460.26; 2447.38. dromaeo_dom: 76.8533 -> 162.51; 161.607. tscrollr: 10174.9 -> 14373.8; 14112.1. a11yr: 607.0 -> 642.0; 610.0. ts_paint: 692.263 -> 717.947; 694.895. tpaint: 208.0 -> 227.0; 216.0. tsvgr: 2762.35 -> 4028.68; 3894.45. tsvgr_opacity: 97.0 -> 103.0; 100.0. tp5n: 364.061 -> 384.449; 368.404. Linux64: tdhtmlr: 377.029 -> 387.206; 378.971. :) tdhtmlr_nochrome: 362.147 -> 372.941; 362. ts_places_med: No results found ts_places_max: No results found dromaeo_css: 1739.14 -> 2848.33; 2795.13. dromaeo_dom: 94.9733 -> 208.807; 206.337. tscrollr: 12072.5 -> 14010.5; 13713.3. a11yr: 519.0 -> 551.5; 527.0. ts_paint: 639.789 -> 661.0; 652.421. tpaint: 197.0 -> 214.0; 201.0. tsvgr: 2598.25 -> 3755.82; 2608.35. tsvgr_opacity: 79.0 -> 85.0; 80.5. tp5n: 312.692 -> 325.854; 314.116. Win: tdhtmlr: 377.235 -> 385.588; 378.382. tdhtmlr_nochrome: 340.088 -> 359.235; 345.735 ts_places_med: No results found ts_places_max: No results found dromaeo_css: 1831.69 -> 3030.85; 3020.05. dromaeo_dom: 97.63 -> 211.957; 211.907. tscrollr: 10791.6 -> 10915.2; 10815.8. a11yr: 480.0 -> 501.5; 484.0. ts_paint: 726.632 -> 805.789; 727.632. tpaint: 231.0 -> 245.0; 233.0. tsvgr: 2165.05 -> 3332.32; 3329.09. tsvgr_opacity: 177.5 -> 239.0; 180.0. tp5n: 306.545 -> 945.505; 307.611. WinXP: tdhtmlr: 567.882 -> 649.971; 637.647. tdhtmlr_nochrome: 704.029 -> 756.412; 742.529 ts_places_med: No results found ts_places_max: No results found dromaeo_css: 1861.3 -> 3060.34; 3045.28. :) dromaeo_dom: 98.74 -> 214.31; 214.77. tscrollr: 10755.1 -> 10793.7; 10776.8. a11yr: 449.0 -> 476.5; 455.0. ts_paint: 575.105 -> 646.684; 587.737. tpaint: 186.0 -> 198.0; 187.0. tsvgr: 2494.25 -> 3815.0; 2638.95. tsvgr_opacity: 462.0 -> 479.0; 470.5. tp5n: 288.758 -> 297.081; 290.621. OSX10.7: tdhtmlr: 406.647 -> 413.206; 409.382. tdhtmlr_nochrome: 377.941 -> 383.353; 382.912 ts_places_med: No results found ts_places_max: No results found dromaeo_css: 2133.02 -> 3615.56; 3525.58. dromaeo_dom: 118.4 -> 283.883; 278.497. tscrollr: 13375.7 -> 13495.4; 13442.5. a11yr: 347.0 -> 363.5; 351.5. ts_paint: 809.053 -> 934.526; 877.0. tpaint: 403.0 -> 465.0; 451.0. tsvgr: 2136.1 -> 3410.82; 3330.77. tsvgr_opacity: 551.5 -> 573.0; 555.0. tp5n: 258.152 -> 275.616; 270.53. OSX64: tdhtmlr: 403.118 -> 419.529; 411.588. tdhtmlr_nochrome: 362.235 -> 417.853; 381.824 ts_places_med: No results found ts_places_max: No results found dromaeo_css: 2180.7 -> 3629.15; 3589.82. dromaeo_dom: 117.583 -> 279.883; 278.48. tscrollr: 13423.2 -> 13582.4; 13519.5. a11yr: 352.5 -> 371.5; 362.5. ts_paint: 744.053 -> 790.211; 768.895. tpaint: 304.0 -> 363.0; 334.0. tsvgr: 2153.15 -> 3384.95; 3369.14. tsvgr_opacity: 440.0 -> 470.0; 459.5. tp5n: 256.657 -> 276.035; 271.712. If I read these numbers right, it's in the range. (except on xp this one: dromaeo_dom: 98.74 -> 214.31; 214.77. but that's not that scary) So my conclusion: I was a bit overcautious it seems and I can just land this. Attaching the 3rd patch that is some cleanup for Bobby's request.

Gabor Krizsanits (INACTIVE)

Assignee

Comment 26

•

13 years ago

Attached patch part3: Removing ccx from codegen/qsgen — Details — Splinter Review

This patch removes only some stuff that is not used any longer.

Attachment #662465 - Flags: review?(bobbyholley+bmo)

Bobby Holley (:bholley)

Comment 27

•

13 years ago

Comment on attachment 662465 [details] [diff] [review] part3: Removing ccx from codegen/qsgen >- writeCheckForFailure(f, isMethod, isGetter, haveCcx) >+ writeCheckForFailure(f, isMethod, isGetter, False) Shouldn't the haveCcx param to writeCheckForFailure just go away? r=bholley with that addressed. Please keep an eye on dev-tree-management to see if anything shows up perf-wise. If so, we can try tweaking the string cache size.

Attachment #662465 - Flags: review?(bobbyholley+bmo) → review+

Gabor Krizsanits (INACTIVE)

Assignee

Comment 28

•

13 years ago

(In reply to Bobby Holley (:bholley) from comment #27) > Shouldn't the haveCcx param to writeCheckForFailure just go away? Yeah... so it was in another file, fixed it. > r=bholley with that addressed. Please keep an eye on dev-tree-management to > see if anything shows up perf-wise. If so, we can try tweaking the string > cache size. Ok, I'll do that. Also: http://hg.mozilla.org/integration/mozilla-inbound/rev/ed7b21f9ae6a http://hg.mozilla.org/integration/mozilla-inbound/rev/343dde4d974d http://hg.mozilla.org/integration/mozilla-inbound/rev/3cd3840dea0b

Ryan VanderMeulen [:RyanVM]

Comment 29

•

13 years ago

https://hg.mozilla.org/mozilla-central/rev/ed7b21f9ae6a https://hg.mozilla.org/mozilla-central/rev/343dde4d974d https://hg.mozilla.org/mozilla-central/rev/3cd3840dea0b

Assignee: nobody → gkrizsanits

Status: NEW → RESOLVED

Closed: 13 years ago

Resolution: --- → FIXED

Target Milestone: --- → mozilla18

Nobody; OK to take it and work on it

Updated

•

7 years ago

Component: DOM → DOM: Core & HTML

testcase 13 years ago Jesse Ruderman 252 bytes, text/html		Details
stack traces 13 years ago Jesse Ruderman 19.18 KB, text/plain		Details
part1: Moving dom string cache from XPCCallContext to XPCJSRuntime 13 years ago Gabor Krizsanits (INACTIVE) 11.34 KB, patch	bzbarsky : review+	Details \| Diff \| Splinter Review
part2: Removing XPCCallContext dependency from XPCVariant 13 years ago Gabor Krizsanits (INACTIVE) 48.12 KB, patch	bholley : review+	Details \| Diff \| Splinter Review
part3: Removing ccx from codegen/qsgen 13 years ago Gabor Krizsanits (INACTIVE) 14.79 KB, patch	bholley : review+	Details \| Diff \| Splinter Review