The whack-a-mole between bug 655878 and bug 604196 was the result of a lot of confusing logic in XPCWrappedNative argument conversion. I think I've got enough of a handle on things to fix bug 655878, but I'd like to make all this code simpler and more understandable while I'm at it.
Brave man! Adding tests too? ;)
Update: I'm 10 patches deep, making good progress. See the github branch for the most up-to-date work: https://github.com/bholley/mozilla-central/commits/xpcwn_refactor
This passes the test suite I wrote in bug 684327, and is ready for review (though there's still more work to do).
mrbkap is reviewing it directly on github.
Thanks for splitting the patch up into bite-sized pieces. I have two patches to go.
We talked about Remove "mAutoString micro-optimization. v1": r=me if its effect was truly unnoticeable.
Bug 683802 - Move the loop contents of ConvertIndependentParams into a separate method (without re-indentation). v1
Bug 683802 - Separate reindentation from previous patch for easier review. No other changes. v1
Bug 683802 - Factor dipper handling out into a helper method. v2
Bug 683802 - Always store jsvals directly within the val union (fixes bug 655878). v3
Bug 683802 - Define and restrict the semantics of PTR_IS_DATA. v1
Bug 683802 - Use an explicit indicator for direct vs indirect calling semantics. v1
+ // Specify the correct storage/calling semantics.
+ if (paramInfo.IsIndirect())
Nit: if( in XPConnect. (This occurs in a couple of the patches).
Bug 683802 - Add jsval to the XPTCMiniVariant union for type safety. v1
Bug 683802 - Coalesce type-specific cleanup indicators. v2
@@ -350,11 +350,12 @@ txParamArrayHolder::~txParamArrayHolder()
+ if (variant.DoesValNeedCleanup())
+ if (variant.type == eSTRING)
+ delete (nsAString*)variant.val.p;
Nit: the bracing style is wrong for xslt/*.
Less of a nit: Please add an assertion that the else case is actually looking at an interface type.
Bug 683802 - Eliminate questionable support for [shared] parameters. v1
Bug 683802 - Remove XPC_JSArgumentFormatter and friends. v1
r=mrbkap on the final two patches as well.
One concern that I have is that we're removing more micro-optimizations here and we don't really know how important they are. We're going to probably want to keep a close eye on chrome performance and make sure that none of these optimizations are still being used on hot paths in the DOM.
bholley: oh wait, higher is _better_ on dromaeo?
bholley: Then my regression was in fact an improvement! Oh happy day!
Here was the initial push that led me to believe there was a regression.
linux64 opt with my patches:
dromaeo_jslib: 129.61 (details)
dromaeo_dom: 252.31 (details)
dromaeo_v8: 350.21 (details)
dromaeo_basics: 705.44 (details)
dromaeo_sunspider: 1119.85 (details)
dromaeo_css: 3399.98 (details)
linux64 opt without my patches (parent commit of my try push):
dromaeo_jslib: 125.87 (details)
dromaeo_dom: 246.6 (details)
dromaeo_v8: 343.81 (details)
dromaeo_basics: 694.1 (details)
dromaeo_sunspider: 1114.38 (details)
dromaeo_css: 3370.52 (details)
But since higher is better, this indicates that there is in fact a 1-3% improvement in my patch set.
I don't know how noisy the numbers are, so I don't know if this is significant. But they all seem to trend in the same direction. My guess is that it's mainly the result of reducing branches, but I'm not sure.
Since I thought this was a regression, I made a commit that adds mAutoString back in at the end, this time factored out neatly into HandleDipperParams(). I pushed it to try, this time on all platforms.
The relevant URLs are:
linux64 opt with my patches (and mAutoString):
dromaeo_jslib: 137.3 (details)
dromaeo_dom: 281.12 (details)
dromaeo_v8: 333.72 (details)
dromaeo_basics: 741.48 (details)
dromaeo_sunspider: 1135.43 (details)
dromaeo_css: 3684.73 (details)
linux64 opt without my patches:
dromaeo_jslib: 138.06 (details)
dromaeo_dom: 280.97 (details)
dromaeo_v8: 326.55 (details)
dromaeo_basics: 731.59 (details)
dromaeo_sunspider: 1109.38 (details)
dromaeo_css: 3665.84 (details)
linux opt with my patches (and mAutoString):
dromaeo_jslib: 134.99 (details)
dromaeo_dom: 234.38 (details)
dromaeo_v8: 430.31 (details)
dromaeo_basics: 740.44 (details)
dromaeo_sunspider: 1230.59 (details)
dromaeo_css: 3432.5 (details)
linux opt without my patches:
dromaeo_jslib: 135.59 (details)
dromaeo_dom: 237.41 (details)
dromaeo_v8: 426.94 (details)
dromaeo_basics: 734.88 (details)
dromaeo_sunspider: 1217.69 (details)
dromaeo_css: 3428.35 (details)
osx opt with my patches (and mAutoString):
dromaeo_jslib: 133.6 (details)
dromaeo_dom: 258.43 (details)
dromaeo_v8: 425.34 (details)
dromaeo_basics: 687.25 (details)
dromaeo_sunspider: 1225.08 (details)
dromaeo_css: 3324.12 (details)
osx opt without my patches:
dromaeo_jslib: 132.75 (details)
dromaeo_dom: 259.57 (details)
dromaeo_v8: 416.34 (details)
dromaeo_basics: 687.59 (details)
dromaeo_sunspider: 1221.1 (details)
dromaeo_css: 3297.12 (details)
osx64 with my patches (and mAutoString):
dromaeo_jslib: 128.57 (details)
dromaeo_dom: 278.45 (details)
dromaeo_v8: 329.58 (details)
dromaeo_basics: 707.15 (details)
dromaeo_sunspider: 1096.48 (details)
dromaeo_css: 3343.4 (details)
osx64 without my patches:
dromaeo_jslib: 127.48 (details)
dromaeo_dom: 280.67 (details)
dromaeo_v8: 322.53 (details)
dromaeo_basics: 689.17 (details)
dromaeo_sunspider: 1096.4 (details)
dromaeo_css: 3329.11 (details)
Win Talos results unavailable for some reason. :\
These seem somewhere between modest gains and a wash. In particular, the linux64 numbers (the only ones I have for the push with mAutoString removed) seem to indicate that mAutoString isn't helping us.
So it looks like killing mAutoString is the way to go. I'll do a full try push before landing to get perf numbers on all platforms.
+// XPIDL decides which types to make dippers. The list of these types
+// is given in the DIPPER_TYPE() macro in xpidl.h, and is currently
+// limited to 4 string types.
Shouldn't this point to typelib.py?
(In reply to Ms2ger from comment #7)
> +// XPIDL decides which types to make dippers. The list of these types
> +// is given in the DIPPER_TYPE() macro in xpidl.h, and is currently
> +// limited to 4 string types.
> Shouldn't this point to typelib.py?
Indeed. Updated to reflect that.
Perf results from my push last night look good. Here's the lowdown:
With khuey's help, I fixed a tricky little issue where the test interface typelib wasn't ending up on tinderbox, leading to test failures. Should be fixed now. I've done one final try push:
If this goes green, I'll push to mozilla-inbound at around 2PM. \o/
Landed on mozilla-inbound: http://hg.mozilla.org/integration/mozilla-inbound/rev/3bfef7f630dc (and ancestors)
This passed on try: https://tbpl.mozilla.org/?tree=Try&rev=82561a8c8594 , and did well on Talos as well (see comment 9).
For some reason this is burning on Maemo. The error says "extra ";"". The code in question is a NS_GENERIC_FACTORY_CONSTRUCTOR that has a trailing ;. None of the other instances of this have the trailing ;, so I'm going to try removing it, pushing to try, and see what happens.
I landed a followup that just removes the three trailing ; that Maemo complained about. Hopefully that will fix the bustage.
Backed out of inbound since bholley's 23-changeset push caused xpcshell orange, even after an in-place bustage fix (!!). Was just easier to back the whole push out.
Main backout: https://hg.mozilla.org/integration/mozilla-inbound/rev/91b82bc49470
Bustage fix backout: https://hg.mozilla.org/integration/mozilla-inbound/rev/e9f4b05cfbaa
Comment 11 try run didn't complete on Windows, which was the platform that failed (and even failed on Maemo, yet was still landed!). Please make sure that all platform pass try before landing a whopping 23 changesets (which when combined with a non-consecutive maemo bustage fix, proved a hassle to backout). Thanks :-)
The failure in question here was actually the final issue I was sorting out on this stuff. Thought to be solved, but clearly not! ;-) It was previously appearing on all try platforms (though never locally until I replicated the precise environment with which buildbot runs the test suite). I investigated it, figured out the issue, and (with khuey's help) came up with a patch:
This patch made sense, worked in local testing, and fixed the problem on all of the platforms that completed the try run. Windows didn't complete for some reason, but I maintain that it was reasonable to assume that the problem, platform-neutral as it was previously, would be fixed there too. Poor luck, but not poor judgement. ;-)
The Maemo thing was certainly my fault. I didn't see it - my bad. I'll certainly watch out for it in the future. :-)
> since bholley's 23-changeset push
> a whopping 23 changesets
Why so much negativity here? This kind of work is routinely squashed into a single patch, to the detriment of reviewer sanity, regression finding, and useful history. We know how to back out multiple changesets without resorting to O(n) techniques (Ehsan has a blog post about this). So the vitriol seems a bit uncalled for IMHO.
Regardless, we now have a bit of a problem. The open web apps team needs this stuff landed before the branch, and I don't have time this weekend to spin up a windows VM and figure out what's going on. But reality, the failure in question is in the test code (the build system for the test code, even!). This is of course stuff that should land with the refactor. But we know from running these tests on the different platforms that the refactor is fundamentally sound. So, assuming there are no objections, I don't think there's too much harm in landing the refactor minus the test code, and sorting out the test code after the branch.
So I rebased the refactor off directly onto master: https://github.com/bholley/mozilla-central/tree/refactor-notests
and pushed it to try: https://tbpl.mozilla.org/?tree=Try&usebuildbot=1&rev=c33f04e55afe
I'll let it do its full thing overnight. I'm not going to be at my computer until tomorrow evening, but anyone else who wants them sooner (ie, fabrice) is welcome to land the changes too. I'm attaching the exact patch set to the bug. If all goes well, just hg qimport *, hg qpush -a, hg qfinish -a, and hg push!
Created attachment 562225 [details]
patchset for comment 15
(In reply to Bobby Holley (:bholley) from comment #15)
> Why so much negativity here?
I'm sorry. I was tired and unduly vented the products of a bad day in your direction. I should not have done this, as it was not productive, nor fair on you. Thank you for the clarification of the try situation + the reality check :-)
The line at http://mxr.mozilla.org/mozilla-central/source/xpcom/components/nsComponentManager.cpp#770 is failing. Looks like we can't handle "interfaces ../foo.xpt" on windows.
/me notes that this is another instance where our macros that dump stuff to stderr turned a multi-hour debugging session into 10 seconds of reading a log ...
Oh, and please do keep splitting patches like this! ;)
(In reply to Ed Morley [:edmorley] from comment #17)
> I'm sorry. I was tired and unduly vented the products of a bad day in your
> direction. I should not have done this, as it was not productive, nor fair
> on you. Thank you for the clarification of the try situation + the reality
> check :-)
No worries. I really appreciate all the work you do sheriffing the tree, and I know that stuff can get pretty hectic sometimes. :-)
(In reply to Kyle Huey [:khuey] (email@example.com) from comment #18)
> The line at
> nsComponentManager.cpp#770 is failing. Looks like we can't handle
> "interfaces ../foo.xpt" on windows.
Oh, awesome! Thanks for looking into that. I've updated the patch to just symlink the xpt file into both directories: https://github.com/bholley/mozilla-central/commits/xpcwn_refactor
I pushed this whole thing to try again here: https://tbpl.mozilla.org/?tree=Try&usebuildbot=1&rev=8c30ddd8b511
Once it (all!) goes green, the whole set (including the tests) should be good to land. I can do that this evening, but in the mean time I'm attaching the patchset that can be pushed by anyone who wants to. Just follow the instructions in comment 15.
/me heads off for the day
Created attachment 562240 [details]
patchset for comment 21
Hm, 12 hours later and the Windows results still haven't come in. Anyone have any ideas?
Ed - what do you think about the rest of the results - are there any other failures showing up there that I should look into?
Ok, so according to the self-serve api it doesn't look like the windows tests results are coming. I kicked the above submission to build again, just in case.
This makes twice in a row where I don't get windows test results on try, but everyone else seems to. My hypothesis is that they might be miscategorized as talos rather than unit tests, meaning my '-t none' kills them somehow. So I tried a trychooser set that does everything for windows:
(In reply to Bobby Holley (:bholley) from comment #23)
> Ed - what do you think about the rest of the results - are there any other
> failures showing up there that I should look into?
The gesture is appreciated - though after my grumpiness the other day I probably don't deserve it ;-)
The windows tests have finally shown up on the original try run (presumably from the 3rd rebuild request) and look fine to me, but it's strange they hadn't twice in a row. The new try run suffered the same fate as well.
A skim of recent try pushes on TBPL shows that the windows tests seems to be triggering fine for almost all the other pushes, though I did find one other (https://tbpl.mozilla.org/?tree=Try&rev=a13f81776b22), maybe you've just been unlucky!
Either way, don't see any reason why this shouldn't land :-)
I'll ask on #build/file a bug about the win tests not being triggered.
Relanded on inbound :-)
This caused a significant perf regression on the N900s (>50% on Tp4), and we've had no android perf data for several days ... if we can't get confirmation that this didn't regress android before we branch I think this has to bounce from 9.
Filed bug 689350 on the lack of Android Tp4m data since last week.
> and we've had no android perf data for several days
Including over the harness changes? Or no? As in, was this the first perf result on android after the harness changes?
We stopped getting Tp4m data on Android on 9/19. See bug 689350 for details. Other perf tests (including Tp4m NoChrome) are still reporting data.
OK, so that definitely covers the range across which we made harness changes. Doesn't quite answer my last question from comment 30....
The performance regression blamed on this bug was detected on N900. There was no gap in the N900 data; the regression clearly took place in this changeset range:
There's no N900 Talos coverage on mozilla-inbound so we can't easily narrow that range.
(In reply to Matt Brubeck (:mbrubeck) from comment #33)
> The performance regression blamed on this bug was detected on N900. There
> was no gap in the N900 data; the regression clearly took place in this
> changeset range:
Well, then I'm blameless, no? That was my first attempt to land, and it includes Ed's backout of my changes.
(In reply to Bobby Holley (:bholley) from comment #34)
> Well, then I'm blameless, no? That was my first attempt to land, and it
> includes Ed's backout of my changes.
Yes, you're right of course. I'll post to mozilla.dev.tree-management to start finding the real regression.
https://hg.mozilla.org/mozilla-central/rev/44d1fafa07d0 looks odd to me, shouldn't
(datum_type.TagPart() == nsXPTType::T_IID ||
- datum_type.TagPart() == nsXPTType::T_PSTRING_SIZE_IS)) ||
+ datum_type.TagPart() == nsXPTType::T_PSTRING_SIZE_IS) ||
+ datum_type.TagPart() == nsXPTType::T_PWSTRING_SIZE_IS) ||
(datum_type.TagPart() == nsXPTType::T_IID ||
- datum_type.TagPart() == nsXPTType::T_PSTRING_SIZE_IS)) ||
+ datum_type.TagPart() == nsXPTType::T_PSTRING_SIZE_IS ||
+ datum_type.TagPart() == nsXPTType::T_PWSTRING_SIZE_IS)) ||
Good point - I'm not totally sure it matters (specifically, I'm not sure if the IsPointer check is necessary), but I've filed bug 690362 to fix it.