Stop using JSPropertyOps in DOM bindings where we care about performance, because BC can't deal with that

RESOLVED FIXED in mozilla23

Status

()

RESOLVED FIXED
6 years ago
5 years ago

People

(Reporter: bzbarsky, Assigned: bzbarsky)

Tracking

unspecified
mozilla23
x86
Mac OS X
Points:
---
Dependency tree / graph
Bug Flags:
in-testsuite ?

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(3 attachments, 1 obsolete attachment)

This really only applies to the old proxy bindings and quickstubs.
(Assignee)

Updated

6 years ago
Blocks: 786126
(Assignee)

Comment 1

6 years ago
Created attachment 662301 [details] [diff] [review]
part 1.  Convert quickstubs to using JSNative getters and setters.
(Assignee)

Comment 2

6 years ago
Comment on attachment 662301 [details] [diff] [review]
part 1.  Convert quickstubs to using JSNative getters and setters.

This got r=peterv in bug 786126.
Attachment #662301 - Flags: review+
(Assignee)

Comment 3

6 years ago
Created attachment 662302 [details] [diff] [review]
part 2.  Convert old proxy bindings to JSNative getters and setters.
Attachment #662302 - Flags: review?(peterv)
(Assignee)

Comment 4

6 years ago
Comment on attachment 662302 [details] [diff] [review]
part 2.  Convert old proxy bindings to JSNative getters and setters.

Need review on the jsobj change too.
Attachment #662302 - Flags: review?(ejpbruel)
(Assignee)

Comment 5

6 years ago
Note: part 2 makes us hit unexpected passes in some of the DOM tests.  So I'll need to fix those too (by adjusting the test); I just need to sit down and check to make sure those are all due to the jsnative switch fixing bugs.
(Assignee)

Comment 6

6 years ago
Created attachment 662419 [details] [diff] [review]
With the fail annotations removed
Attachment #662419 - Flags: review?(peterv)
Attachment #662419 - Flags: review?(ejpbruel)
(Assignee)

Updated

6 years ago
Attachment #662302 - Attachment is obsolete: true
Attachment #662302 - Flags: review?(peterv)
Attachment #662302 - Flags: review?(ejpbruel)
(Assignee)

Updated

6 years ago
Whiteboard: [need review]
Comment on attachment 662419 [details] [diff] [review]
With the fail annotations removed

Review of attachment 662419 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/xpconnect/src/codegen.py
@@ +533,5 @@
> +    elif isSetter:
> +        inArgs = requiredArgs = 1
> +    else:
> +        inArgs = requiredArgs = 0
> +        

Trailing ws
Comment on attachment 662301 [details] [diff] [review]
part 1.  Convert quickstubs to using JSNative getters and setters.

Review of attachment 662301 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/xpconnect/src/XPCQuickStubs.cpp
@@ +132,5 @@
>                                             stringTable + ps->name_index,
> +                                           JSVAL_VOID,
> +                                           (JSPropertyOp)ps->getter,
> +                                           (JSStrictPropertyOp)ps->setter,
> +                                           flags | JSPROP_SHARED | JSPROP_NATIVE_ACCESSORS)) 

trailing ws

@@ +287,5 @@
> +    JSString *str = JS_InternString(cx, memberName);
> +    if (!str) {
> +        // Now what?  Just throw something
> +        XPCThrower::BuildAndThrowException(cx, rv, "Failed to intern prop name");
> +        return false;

I think jseng has already set an exception at this point

@@ +393,5 @@
> +    JSString *str = JS_InternString(cx, propName);
> +    if (!str) {
> +        // Now what?  Just throw something
> +        XPCThrower::BuildAndThrowException(cx, rv, "Failed to intern prop name");
> +        return;

Same here
(Assignee)

Comment 9

6 years ago
Applied those changes.
Has this been checked in?  I'm wondering if I should update to tip and rebase my current patch queue.

I have Bug 786126 building in debug mode, but when I build with optimizations on I'm getting a compile error about an ambiguous call to:

xpc_qsThrowBadSetterValue(JSContext *, nsresult, JSObject *, int);

That last argument can apparently be a uint16_t or a jsid, and the compiler can't tell what to convert the int to.

The issue seems to be in qsgen.py:writeArgumentUnboxing, around lines 563 or so:

f.write("        xpc_qsThrowBadSetterValue(cx, rv, JSVAL_TO_OBJECT(vp[1]), %s);\n" % propIndex)


I'm locally patching it here to explicitly cast that last argument to uint16_t, as I suspect that's the intent.  It seems to be building OK now, but I can't say for sure until the build finishes.
Nothing's been checked in yet.  I could land part 1, but part 2 is still waiting on reviews.

You could just import these into the start of your patch queue for now if you want, then qrm them when they get pushed...  Let me know if you want to do that and I'll upload exactly what I plan to push.

> getting a compile error about an ambiguous call

Er, yes.  I fixed that locally after uploading the already-reviewed patch, based on try failures, but failed to upload the updated patch, I think...  In my local tree that %s is explicitly cast to uint16_t as you describe.
Comment on attachment 662419 [details] [diff] [review]
With the fail annotations removed

Review of attachment 662419 [details] [diff] [review]:
-----------------------------------------------------------------

r+ on the jsobj.cpp part
Attachment #662419 - Flags: review?(ejpbruel) → review+
Attachment #662419 - Flags: review?(peterv) → review+
(Assignee)

Updated

6 years ago
Blocks: 560072
(Assignee)

Updated

6 years ago
Flags: in-testsuite+
Whiteboard: [need review]
>-ReifyPropertyOps(JSContext *cx, JSObject *obj, jsid id, unsigned orig_attrs,

Nice one.

Comment 15

6 years ago
Bug 647423 - Remove __defineGetter__ and __defineSetter__ support
https://bugzilla.mozilla.org/show_bug.cgi?id=647423

__defineGetter__ is non-standard method. Why not using Object.defineProeprty ?
These patches was backed out by ryanvm yesterday, for suspected Windows m-oth leaks.
See: https://hg.mozilla.org/integration/mozilla-inbound/rev/0ad22be0b52a

I had a panic moment when I updated to tip today and my perf gains in bug 786126 disappeared.

There's no bug posted for that backout.  What's the resolution path?
I'm sorry for forgetting to update the bug with the backout changeset. It was late and slipped my mind. That said, the backout did fix the Windows leaks, so the path forward is figuring out the cause and re-landing. bz said that this was green on Try, though, so I'm not sure what to make of that (unless it was due to a bad interaction with something else that landed in the mean time).
Flags: in-testsuite+
No worries.

The only allocation semantics that would be changed by these patches is the creation of a new JSFunction to hold JSNative function pointers.  If there's a leak, the obvious check is to see if these are becoming forgotten.

Can you give me any idea of where you were getting these suspected leaks showing up, or how I can replicate them?  Is it only in Windows that these leaks showed up?
It was basically leaking the world, primarily on Windows XP (though I did see it once or twice on Windows 7 as well). The leaks were only occurring in the mochitest-other suite.

Here's an example log:
https://tbpl.mozilla.org/php/getParsedLog.php?id=15536449&tree=Mozilla-Inbound
TEST-UNEXPECTED-FAIL | automationutils.processLeakLog() | leaked 58152251 bytes during test execution
TEST-UNEXPECTED-FAIL | automationutils.processLeakLog() | leaked 2 instances of AsyncStatement with size 48 bytes each (96 bytes total)
TEST-UNEXPECTED-FAIL | automationutils.processLeakLog() | leaked 4998 instances of AtomImpl with size 24 bytes each (119952 bytes total)
TEST-UNEXPECTED-FAIL | automationutils.processLeakLog() | leaked 1 instance of BackstagePass with size 24 bytes
TEST-UNEXPECTED-FAIL | automationutils.processLeakLog() | leaked 801 instances of BodyRule with size 16 bytes each (12816 bytes total)
TEST-UNEXPECTED-FAIL | automationutils.processLeakLog() | leaked 1 instance of CSSRuleListImpl with size 16 bytes 

Also interesting (and possibly related) is that it also showed an assertion failure on some runs that went away with the backout:
https://tbpl.mozilla.org/php/getParsedLog.php?id=15539124&tree=Mozilla-Inbound&full=1
Assertion failure: compartment()->activeInference, at e:/builds/moz2_slave/m-in-w32-dbg/build/js/src/jsinfer.cpp:2400
TEST-UNEXPECTED-FAIL | chrome://mochitests/content/chrome/widget/tests/test_mouse_scroll.xul | Exited with code -2147483645 during test run
PROCESS-CRASH | chrome://mochitests/content/chrome/widget/tests/test_mouse_scroll.xul | application crashed (minidump found)

Hope that helps.
The good news is that 58megs is such a gigantic leak that it should hopefully be easy to track down. Pretty odd that it only shows up on XP, though. 442 out of 2644 nsGlobalWindows leaked.

If you can reproduce it locally, then you can try looking at the CC logs. If not, then you have to try doing something like dumping the CC log to the logfile. I think Bill has some experience with doing that? Once you have the CC log, I can help you figure out what is keeping things alive.
teramako, I have no idea what you're asking.  Please feel free to mail me privately with questions about the __defineGetter__ stuff, but let's not clutter this bug up with them.

Apart from that, general status:

1)  I'm off today, so I won't start looking into this until tomorrow.
2)  I pushed some things to try to test them out on WinXP Moth.  That was 17 hours ago. 
    Those test runs are still pending.  So any experiments using try should be expected to
    have at least 20+ hour lag time.
3)  I have only two real ideas how this could possibly have caused leaks.  Especially
    platform-specific ones.  The first is JS engine bugs, of course; those assertions
    about type inference are interesting.  The second possibility I can think of is that
    I missed one of the broken define/lookupGetter/Setter callsites in our chrome.  That
    seems unlikely, since I double-checked that last night, but I'll try to put together
    a version of the patch tomorrow that leaves those hooks in and push it to try.  Maybe
    sometime on Friday I'll know results from that. 
4)  I'll resurrect my Windows build tomorrow morning, but if I recall the build times 
    correctly that means no actual Windows builds until end of day tomorrow for me.
5)  We need to get this landed before we branch, which is in about two weeks.

So if someone has a Windows build and can reproduce the problem, that would be _very_ helpful....
And yes, if I can dump the cc log to a log file on try, please tell me how!  The earlier I kick that off, the more likely we are to get a result back this month.  ;)
So the jsinfer assert is because someone is calling TypeCompartment::setPendingNukeTypesNoReport on a compartment with no active inference, looks like.  Wish we had a stack.  :(
Depends on: 794974
Ok, bad news.  I tried pushing with the defineGetter/defineSetter bits restored, and we still leak.  See https://tbpl.mozilla.org/?tree=Try&rev=b07f00c8b0c6
I've nominally been looking into the leaks here, but I've been hung up trying to figure out how to get leak logs on XP...
I've got a build running that will hopefully produce a useful shutdown CC log...

https://tbpl.mozilla.org/?tree=Try&rev=d88ba372462e
The shutdown log extraction process in bug 794974 seems to work, but my push didn't end up leaking, so that's not very useful. I'll retrigger it a few times in case it is intermittent. I applied it to a somewhat old tree with changeset 107863:e327e66a027d which is from last Monday. I can try updating my tree if you think that will help.
The WinXP oth didn't leak, but Win oth did, right?
In your try push in comment 24, there were WinXP debug leaks and Win debug timeouts, and comment 19 says it was primarily XP.

Though now that I look at it, there is a 3.8MB leak on Win 7 in my try push, in addition to the timeouts, but it looks like bug 753225 just causes leaks, so I think that's a different issue.
I updated my tree and pushed to try again.

https://tbpl.mozilla.org/?tree=Try&rev=4876758b6234

(There was a merge conflict in the file dom/imptests/failures/webapps/DOMCore/tests/approved/test_interfaces.html.json.rej but the lines it removed didn't seem to be there, so I just ignored it.)
Yeah, if you merged to tip then the part 2 patch is sorta a no-op: that code is not really being used anymore on tip as of a few days ago.  Hence the todo()s for that code (which that patch had removed because using JSNatives fixed them) are gone on tip...
Should I push to try with the patches on top of the version that leaked? Or are you interested in the behavior against tip? I don't really know what is going on in this bug, I'm just qpushing.
For me, just part 1 leaked, which should still apply cleanly to tip.

But mostly I'm interested in the tip behavior, yeah...  If we don't leak there anymore, so much the better.  ;)
It didn't leak against tip either. I'll retrigger a few times in case it is intermittent.

I also pushed without my patches in the unlikely event they are somehow fixing the leaks or causing leaks to fail to be detected:
https://tbpl.mozilla.org/?tree=Try&rev=403c7121911a
It won't show the log, with this error:

SQLSTATE[08S01]: Communication link failure: 1153 Got a packet bigger than 'max_allowed_packet' bytes

So maybe I just flooded the log with the CC dumps, preventing leak detection.
Oops didn't mean to do every platform
https://tbpl.mozilla.org/?tree=Try&rev=37d5828f6ba6
Created attachment 668052 [details]
what is holding nsGlobalWindows alive

I finally managed to get a log. This is only from the third of four shutdown CCs, and I'm working on getting one from the forth instead, but I don't imagine there will be much difference.

This is the raw output of what is holding every nsGlobalWindow alive. I'm going to look it over and do some analysis of common patterns.
This seems like the most common thing rooting windows: some FragmentOrElement (XUL), from various .xul URLs. The path holding the window alive goes up GetParent(), then down the children, into the nsEventListenerManager, into JS via the global, into the type proto, into the getter, then follow the parent edge to a JS Object (ChromeWindow), which holds the nsGlobalWindow alive.

2C4D39F0 [FragmentOrElement (XUL) vbox class='update-content' chrome://mozapps/content/update/updates.xul]
    --[GetParent()]-> 2CA2BE20 [FragmentOrElement (XUL) wizardpage id='finished' chrome://mozapps/content/update/updates.xul]
    --[GetParent()]-> 6968E2E0 [FragmentOrElement (XUL) wizard id='updates' chrome://mozapps/content/update/updates.xul]
    --[mAttrsAndChildren[i]]-> 2D076348 [FragmentOrElement (XUL) wizardpage id='license' chrome://mozapps/content/update/updates.xul]
    --[mAttrsAndChildren[i]]-> 447FE3B0 [FragmentOrElement (XUL) vbox class='update-content' chrome://mozapps/content/update/updates.xul]
    --[mAttrsAndChildren[i]]-> 2C1FF0A8 [FragmentOrElement (XUL) radiogroup id='acceptDeclineLicense' chrome://mozapps/content/update/updates.xul]
    --[[via hash] mListenerManager]-> 2C1FF198 [nsEventListenerManager]
    --[mListeners[i] mListener]-> 2C1FF210 [nsJSEventListener handlerName=onselect]
    --[mContext]-> 51E6C1A8 [nsJSContext]
    --[mContext]-> 64771C38 [JSContext]
    --[[global object]]-> 5B679010 [JS Object (Proxy)]
    --[type_proto]-> 5B6A8040 [JS Object (XPC_WN_ModsAllowed_NoCall_Proto_JSClass - ChromeWindow)]
    --[setter]-> 5B6E9880 [JS Object (Function)]
    --[parent]-> 5B6D3040 [JS Object (ChromeWindow)]
    --[xpc_GetJSPrivate(obj)]-> 2D17C440 [XPCWrappedNative (ChromeWindow)]
    --[mIdentity]-> 466F6078 [nsGlobalWindow #2376]

    Root 2C4D39F0 is a ref counted object with 1 unknown edge(s).
    known edges:
       2CA2BE20 [FragmentOrElement (XUL) wizardpage id='finished' chrome://mozapps/content/update/updates.xul] --[mAttrsAndChildren[i]]-> 2C4D39F0
nsJSEventListeners are another common root. The path is similar, except that it skips all the stuff going up and down the XUL document.

2C752118 [nsJSEventListener handlerName=onclick]
    --[mContext]-> 51E6C1A8 [nsJSContext]
    --[mContext]-> 64771C38 [JSContext]
    --[[global object]]-> 5B679010 [JS Object (Proxy)]
    --[type_proto]-> 5B6A8040 [JS Object (XPC_WN_ModsAllowed_NoCall_Proto_JSClass - ChromeWindow)]
    --[setter]-> 5B6E9880 [JS Object (Function)]
    --[parent]-> 5B6D3040 [JS Object (ChromeWindow)]
    --[xpc_GetJSPrivate(obj)]-> 2D17C440 [XPCWrappedNative (ChromeWindow)]
    --[mIdentity]-> 466F6078 [nsGlobalWindow #2376]

    Root 2C752118 is a ref counted object with 1 unknown edge(s).

nsNodeInfoManager is another fairly common source of rooting (about 1700), that looks a little different.
OK, so the key is that now we have actual functions for the getter and setter, right?

But shouldn't the JS engine already know about those edges from the Proto_JSClass JSObject through its property getters and setters?  Those are the only new edges we added to the graph, and I would have expected that object to have an edge to the ChromeWindow anyway, directly via its own [parent]...
It looks like the JS engine is tracing through the type proto to the the getter and setter (the examples I have here only show paths through setters, but there are just as many getters), because they are in these paths. The question is, what is holding onto the nsJSEventListeners/FragmentOrElement/nsXULPrototypeNode/nsNodeInfoManager/nsDocument that we're somehow not tracking?

Is there some change an nsTimeout could be keeping one of these alive? We don't put them in the purple buffer yet, so that's the only thing I can think of as a possible source of a weird leak. I could try putting together a hail-mary patch based on bug 774874 to see if it helps, though last time I tried it, it was making things crashy, so I'm not sure I can get a working one together...
An nsTimeout can keep alive all sorts of stuff via its function argument, which can close over things, right?

But the real question is why adding these function objects for the getters and setters (which is all the patch does) makes any difference...  :(
It could make something require the CC to die that didn't need it before, which would change when something goes away, and changing the timing of when things are last touched can cause leaks, when things aren't being added to the purple buffer properly. Bill had some nasty problems like this with IGC, though his leaks were really tiny. But of course, for him the class that wasn't being purple buffered properly was only involved in small cycles.

If in fact it is nsTimeout causing the problem, then we would see these huge leaks. It does seem odd to me that there are so many, but oh well. Maybe I can hack up something to test my theory...
Depends on: 798165
I've been poking around, but I haven't made any useful conclusions.

My latest scapegoat is bug 794420, which apparently causes WinXP mochitest-chrome (and not anything else!) to hold onto a bunch of windows for too long.

A try run with bholley's patch: https://tbpl.mozilla.org/?tree=Try&rev=8bf79350c313
bholley's patch does appear to have fixed it.  All three WinXP debug Moth runs were green in my try run.  I have some other weird stuff on my patch queue that didn't fix anything, so it shouldn't matter, but I'll push with just your patch and his, and a more up-to-date tip, to confirm.

If you'd landed this patch 2 days earlier, it would have been bholley who ended up backed out.
Depends on: 794420
No longer depends on: 794974, 798165
Here's a run using a just updated m-c, with only the part 1 patch from here, plus bholley's fix, as landed in m-i:
https://tbpl.mozilla.org/?tree=Try&rev=ab89ecdd84c8
Moth is still broken, though in a different way:
29187 ERROR TEST-UNEXPECTED-FAIL | chrome://mochitests/content/chrome/toolkit/devtools/webconsole/test/test_consoleapi.html | Test timed out.
29464 ERROR TEST-UNEXPECTED-FAIL | chrome://mochitests/content/chrome/toolkit/devtools/webconsole/test/test_object_actor.html | filename: /test_consoleapi/
29465 ERROR TEST-UNEXPECTED-FAIL | chrome://mochitests/content/chrome/toolkit/devtools/webconsole/test/test_object_actor.html | property 'functionName' - got onAttach, expected doConsoleCalls
29469 ERROR TEST-UNEXPECTED-FAIL | chrome://mochitests/content/chrome/toolkit/devtools/webconsole/test/test_object_actor.html | 'type' is undefined
29470 ERROR TEST-UNEXPECTED-FAIL | chrome://mochitests/content/chrome/toolkit/devtools/webconsole/test/test_object_actor.html | 'actor' is undefined
29471 ERROR TEST-UNEXPECTED-FAIL | chrome://mochitests/content/chrome/toolkit/devtools/webconsole/test/test_object_actor.html | 'className' is undefined
29475 ERROR TEST-UNEXPECTED-FAIL | chrome://mochitests/content/chrome/toolkit/devtools/webconsole/test/test_object_actor.html | 'className' is undefined
29476 ERROR TEST-UNEXPECTED-FAIL | chrome://mochitests/content/chrome/toolkit/devtools/webconsole/test/test_object_actor.html | 'objectProperties' is undefined
29538 ERROR TEST-UNEXPECTED-FAIL | chrome://mochitests/content/chrome/toolkit/devtools/webconsole/test/test_object_actor.html | [SimpleTest.finish()] this test already called finish!
...but across all platforms. TBPL is angry so I wasn't able to retrigger in case it was some kind of infra problem.
(Assignee)

Updated

6 years ago
Blocks: 653526
I'm just going to wontfix this.  Instead of trying to fix up quickstubs, we're just removing them as fast as we get things converted to WebIDL.
Status: NEW → RESOLVED
Last Resolved: 6 years ago
Resolution: --- → WONTFIX
Reopening.  Would be good to avoid forcing BC to implement jspropertyop ICs, and we're not close enough on nuking quickstubs yet.
Status: RESOLVED → REOPENED
Resolution: WONTFIX → ---
(Assignee)

Updated

6 years ago
Summary: Stop using JSPropertyOps in DOM bindings where we care about performance, because Ion can't deal with that → Stop using JSPropertyOps in DOM bindings where we care about performance, because BC can't deal with that
Part 2 is no longer relevant, since all the lists are on WebIDL bindings.

Landed part 1:

https://hg.mozilla.org/integration/mozilla-inbound/rev/dc6c1d4bff95
Flags: in-testsuite?
Target Milestone: --- → mozilla23
https://hg.mozilla.org/mozilla-central/rev/dc6c1d4bff95
Status: REOPENED → RESOLVED
Last Resolved: 6 years ago6 years ago
Resolution: --- → FIXED
Looks like this also made us pass the 'name' and 'self' tests on <http://www.w3c-test.org/web-platform-tests/master/html/browsers/the-window-object/window-properties.html>; we used to throw "Illegal operation on WrappedNative prototype object".
Yeah, not too surprising.  ;)
Depends on: 859482

Updated

5 years ago
Blocks: 849602
You need to log in before you can comment on or make changes to this bug.