Closed
Bug 539553
Opened 16 years ago
Closed 16 years ago
Correctness regression on the r-tree benchmark
Categories
(Core :: JavaScript Engine, defect)
Core
JavaScript Engine
Tracking
()
RESOLVED
FIXED
mozilla1.9.3a1
| Tracking | Status | |
|---|---|---|
| status1.9.2 | --- | final-fixed |
| status1.9.1 | --- | unaffected |
People
(Reporter: bzbarsky, Assigned: jorendorff)
References
()
Details
(Keywords: regression, Whiteboard: [sg:critical?] fixed-in-tracemonkey)
Attachments
(3 files, 1 obsolete file)
The benchmark in the url bar throws an exception in 3.6rc1 instead of completing; works fine in 3.5, according to comments at http://hacks.mozilla.org/2010/01/javascript-speedups-in-firefox-3-6/
I can certainly reproduce on m-c. Trying to find some regression ranges now.
Flags: blocking1.9.2?
| Reporter | ||
Comment 1•16 years ago
|
||
So this used to break on "Performing 1k window queries.." but then started working there and now breaks on the "Deleting..." part. That change was in this range:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=fd765c193770&tochange=21972540dbf4
Tracemonkey merge in there, as expected. I'll see if I can pin this down more.
Still not sure when the first (bigger) breakage appeared.
| Reporter | ||
Comment 2•16 years ago
|
||
Original regression is in this range:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=38754465ffde&tochange=3079370d6597
With again a tm merge, huge one.
| Reporter | ||
Comment 3•16 years ago
|
||
My initial guess from that big merge is one of the closure fixes... trying to bisect now.
Updated•16 years ago
|
Flags: blocking1.9.2? → blocking1.9.2+
| Reporter | ||
Comment 4•16 years ago
|
||
OK, bisecting for that first thing totally failed, because 64-bit jit was enabled _after_ the regression was introduced. So the bisect found that changeset.
I'll try to find where the partial fix happened...
| Reporter | ||
Comment 5•16 years ago
|
||
I have just confirmed that I do see this problem with Fx 3.6rc1 on Mac, by the way.
| Reporter | ||
Comment 6•16 years ago
|
||
OK, the partial fix was due to the fix for bug 525028 looks like.
| Reporter | ||
Comment 7•16 years ago
|
||
OK, bisect on t-m on my mac shows that the original regression was due to the landing of http://hg.mozilla.org/tracemonkey/rev/2db233807725
That's pretty darned surprising, but I double-checked it via local backout (updated to that changeset, then took out the #include, and the problem went away).
Can anyone explain that? :(
| Reporter | ||
Comment 8•16 years ago
|
||
And of course starting with the parent of that changeset and applying that changeset's patch _also_ works. Sounds like there are dependency problems somewhere.... :(
On the other hand, then applying the patch for bug 505591 makes things fail. Not sure whether that exposes an existing problem by making us trace more or introduces a problem.
Blocks: 505591
Keywords: regression
Comment 9•16 years ago
|
||
Can I get blocking rationale? Not saying that I disagree, per se, but I'm not sure I understand the actual impact to users of this problem, nor the scope or potential for a fix, and marking it as a 1.9.2 blocker effectively holds the release date for Firefox 3.6 at this time, so I'll need to understand and communicate this out to all affected parties.
Comment 10•16 years ago
|
||
It's a JavaScript-runs-correctly issue on at least one benchmark; we don't know how widely the triggering pattern is used on the web, but history indicates that we almost never *over*estimate the frequency of such things, unfortunately.
Does bisecting on 1.9.2 work more cleanly? Certainly less changeset traffic, but maybe not more helpfully organized.
| Assignee | ||
Comment 11•16 years ago
|
||
| Reporter | ||
Comment 12•16 years ago
|
||
> Does bisecting on 1.9.2 work more cleanly?
It'll take me another 3-4 hours of building to test, unfortunately. I'll try to do that sometime today if no one else gets to it first.
| Reporter | ||
Comment 13•16 years ago
|
||
OK. So I put a breakpoint in the relevant JS_SetPendingException, and I get:
0 anonymous(b = undefined, a = [object Object]) ["http://stackulator.com/rtree/rtree.js":689]
1 anonymous(root = undefined, obj = undefined, rect = [object Object]) ["http://stackulator.com/rtree/rtree.js":103]
t = undefined
ltree = undefined
i = undefined
tree = undefined
ret_obj = undefined
from DumpJSStack(). |ltree| is what's passed for |b|, so that makes sense, sort of. But |ltree| is set by doing:
var ltree = tree.nodes[i];
right before making that call to the function that takes a and b. And if |tree| is undefined, why did that not throw? And i should definitely not be undefined...
It feels like something has scribbled on memory somewhere. Or something.
| Assignee | ||
Comment 14•16 years ago
|
||
The previous test case actually found an old bug, not the bug we're looking for now. Bogus test reduction.
This one throws a TypeError with -j in 1.9.2 tip shell and passes without -j (though it takes a few seconds to run).
Attachment #421642 -
Attachment is obsolete: true
| Reporter | ||
Comment 15•16 years ago
|
||
| Assignee | ||
Updated•16 years ago
|
Assignee: general → jorendorff
| Assignee | ||
Comment 16•16 years ago
|
||
TraceRecorder::record_JSOP_LENGTH()
{
...
if (STOBJ_GET_CLASS(obj) == &js_ArgumentsClass) {
...
LIns* v_ins = lir->ins1(LIR_i2f, INS_CONST(afp->argc));
But fp->argc does not change when a script assigns to arguments.length.
Note that by the time we get on trace, it may already be out of sync, so we can't fix this by detecting at record time if this has happened and aborting.
Note that this is not just a correctness bug. In bz's test case, argc is an int and arguments.length is an int. If instead arguments.length is a different type:
function f() {
var x = arguments;
arguments.length = {};
for (var i = 0; i < 9; i++)
print(x.length); // segfault near 0
}
f();
TraceRecorder puts a small integer, argc, on the stack, but the interpreter puts an object on the stack. Hilarity ensues.
Group: core-security
| Assignee | ||
Comment 17•16 years ago
|
||
Filed bug 539766 for somewhat-related shenanigans involving arguments and Object.defineProperty.
| Assignee | ||
Comment 18•16 years ago
|
||
Attachment #421700 -
Flags: review?(dmandelin)
Comment 19•16 years ago
|
||
(In reply to comment #18)
> Created an attachment (id=421700) [details]
> v1
I have questions about one feature: with this patch, for JSOP_ARGCNT, we record LIR that guards that the length has not changed and then returns the constant fp->argc, while for JSOP_LENGTH on |argsobj|, we record LIR that guards that the length has not changed and then returns the length out of the argsobj. Should they be different? Shouldn't they just both return the length out of the argsobj? And if so, do we need the guard, or can we just read then length out and return that?
It looks correct, though. The comments above just relate to how much we can stay on trace.
Comment 20•16 years ago
|
||
(In reply to comment #16)
> Note that this is not just a correctness bug. In bz's test case, argc is an int
> and arguments.length is an int. If instead arguments.length is a different
> type:
Are we any closer to understanding the potential web compatibility effect here, or when the regression appeared in betas? Our crash data for Firefox 3.6rc1 is actually quite encouraging, and we haven't had reports of major websites not functioning properly, so I'm wondering if we may have a correctness issue that affects things which - so far - the vast majority of websites aren't hitting?
Comment 21•16 years ago
|
||
(In reply to comment #20)
> (In reply to comment #16)
> > Note that this is not just a correctness bug. In bz's test case, argc is an int
> > and arguments.length is an int. If instead arguments.length is a different
> > type:
>
> Are we any closer to understanding the potential web compatibility effect here,
> or when the regression appeared in betas? Our crash data for Firefox 3.6rc1 is
> actually quite encouraging, and we haven't had reports of major websites not
> functioning properly, so I'm wondering if we may have a correctness issue that
> affects things which - so far - the vast majority of websites aren't hitting?
This regression must have been introduced by one of the patches for tracing using the |arguments| keyword, most likely the first.
The bug is that we can go wrong if a use of |arguments| is traced, the JS program modifies the |arguments| object in a way that changes its length, and then the program accesses the |arguments| length indirectly (i.e., args=arguments; args.length). That code makes me very sad and is probably not used by most pages, *but* IME chasing down various regressions, advanced library code (e.g. jQuery) uses all kinds of weird things.
Updated•16 years ago
|
Attachment #421700 -
Flags: review?(dmandelin) → review+
| Assignee | ||
Comment 22•16 years ago
|
||
(In reply to comment #20)
> Are we any closer to understanding the potential web compatibility effect here,
> or when the regression appeared in betas? Our crash data for Firefox 3.6rc1 is
> actually quite encouraging, and we haven't had reports of major websites not
> functioning properly, so I'm wondering if we may have a correctness issue that
> affects things which - so far - the vast majority of websites aren't hitting?
Maybe, but we discovered this because a Web page triggered it in the wild. The particular idiom that triggers the bug is reasonable, even if it does make dmandelin sad. I think the potential web compatibility effect is significant.
It's also a crasher.
Comment 23•16 years ago
|
||
(In reply to comment #22)
> (In reply to comment #20)
> > Are we any closer to understanding the potential web compatibility effect
> > here, or when the regression appeared in betas? Our crash data for Firefox
> > 3.6rc1 is actually quite encouraging, and we haven't had reports of major
> > websites not functioning properly, so I'm wondering if we may have a
> > correctness issue that affects things which - so far - the vast majority of
> > websites aren't hitting?
>
> Maybe, but we discovered this because a Web page triggered it in the wild.
> The particular idiom that triggers the bug is reasonable, even if it does
> make dmandelin sad. I think the potential web compatibility effect is
> significant.
> It's also a crasher.
I agree. I guess I couldn't resist editorializing (and I still don't think it's reasonable :-D ), but I tried to say at the end that I think there is a high risk that big important stuff like jQuery or FB could trigger this bug.
| Assignee | ||
Comment 24•16 years ago
|
||
(In reply to comment #19)
> (In reply to comment #18)
> I have questions about one feature: with this patch, for JSOP_ARGCNT, we record
> LIR that guards that the length has not changed and then returns the constant
> fp->argc, while for JSOP_LENGTH on |argsobj|, we record LIR that guards that
> the length has not changed and then returns the length out of the argsobj.
> Should they be different? Shouldn't they just both return the length out of the
> argsobj?
We discussed on IRC, but for the record:
In the case of JSOP_ARGCNT, we know that we're talking about the arguments object for the current frame. In JSOP_LENGTH we have only guarded enough to know that it's an arguments object.
Since ARGCNT *can* safely use argc here, I figured it should, since that might enable some constant folding or eliminate a data dependency.
> And if so, do we need the guard, or can we just read then length out
> and return that?
No: if the bit is set, arguments.length is now just a regular property, and the value in that slot is no longer relevant.
| Assignee | ||
Comment 25•16 years ago
|
||
Whiteboard: fixed-in-tracemonkey
| Reporter | ||
Comment 26•16 years ago
|
||
For what it's worth, a number of the codepaths in the Prototype library mess with the arguments object; I haven't looked at in a few days, but I'm pretty sure some of those can trigger this bug. Of course it might not always immediately break on all sites. Depends on what the arguments were and what the site is doing.
Comment 27•16 years ago
|
||
Added another test case to make the release safer:
http://hg.mozilla.org/tracemonkey/rev/a39e0b557864
Comment 28•16 years ago
|
||
I tested the |arguments| microbenchmark in the hacks article with a fresh TM build that includes this patch. Performance was not affected.
Comment 29•16 years ago
|
||
(In reply to comment #21)
> This regression must have been introduced by one of the patches for tracing
> using the |arguments| keyword, most likely the first.
Do we know when those patches were merged over to mozilla-1.9.2? I'm perhaps thinking about this wrong, but we've had about 400,000+ ADUs since Beta 3, and are now at about 1M ADUs. So far, aside from the r-tree benchmark page, I'm not aware of us having linked this problem to a page in the wild.
Allow me to be clear: I agree that this is a bad bug, and one that needs to be fixed. What I'm not sure of is whether it is a bug that we should fix before shipping 3.6.0 as opposed to fixing (along with other bad bugs, like not tracing web workers) in 3.6.1 in four to six weeks.
> args=arguments; args.length). That code makes me very sad and is probably not
> used by most pages, *but* IME chasing down various regressions, advanced
> library code (e.g. jQuery) uses all kinds of weird things.
Do we know of additional pages that regress due to this bug other than the r-tree benchmark? Again, I'm not trying to be difficult, I'm trying to make sure I understand the actual effect on users.
Comment 30•16 years ago
|
||
bz helped clear some of the questions up in IRC:
- the original problem has existed since 1.9.2a1
- it was exacerbated first in beta 1 by bug 505591
- it was further exacerbated in beta 3 by bug 525028
Comment 31•16 years ago
|
||
I don't see any arguments.length (or a.length for a possibly an arguments object) setting in prototype-1.6.0.2, FWIW.
/be
Comment 32•16 years ago
|
||
a1921=beltzner
Whoever sees this first, olease land this on mozilla-1.9.2 ASAP so we can get some testing cycles on it. If we decide to do a 2nd RC for this, we'll then land it on the relbranch.
Comment 33•16 years ago
|
||
http://hg.mozilla.org/mozilla-central/rev/6c9304d3aa15
http://hg.mozilla.org/mozilla-central/rev/269b08fc3508
Status: NEW → RESOLVED
Closed: 16 years ago
OS: Mac OS X → All
Hardware: x86 → All
Resolution: --- → FIXED
Target Milestone: --- → mozilla1.9.2
Comment 34•16 years ago
|
||
http://hg.mozilla.org/releases/mozilla-1.9.2/rev/19477e5bfa1b
http://hg.mozilla.org/releases/mozilla-1.9.2/rev/2b86bfc51d3b
status1.9.2:
--- → final-fixed
Updated•16 years ago
|
Target Milestone: mozilla1.9.2 → mozilla1.9.3a1
Comment 35•16 years ago
|
||
This appears to be burning the 3.6 tree down.
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox3.6/1263541404.1263542720.10358.gz&fulltext=1
e:/builds/moz2_slave/mozilla-1.9.2-win32/build/js/src/jstracer.cpp(14027) : error C3861: 'RETURN_STOP_A': identifier not found
Comment 36•16 years ago
|
||
(In reply to comment #35)
> This appears to be burning the 3.6 tree down.
>
> http://tinderbox.mozilla.org/showlog.cgi?log=Firefox3.6/1263541404.1263542720.10358.gz&fulltext=1
>
> e:/builds/moz2_slave/mozilla-1.9.2-win32/build/js/src/jstracer.cpp(14027) :
> error C3861: 'RETURN_STOP_A': identifier not found
I had to hand-patch 1.9.2 because of various function and type name differences between trunk and 1.9.2, but I guess I missed that macro difference. s/RETURN_STOP_A/ABORT_TRACE/ to fix.
http://hg.mozilla.org/releases/mozilla-1.9.2/rev/08a91a7b080a
Comment 37•16 years ago
|
||
Are we sure that fixed it? While I thank Reed for jumping on top of things, I actually assumed that the patch was already built to work with mozilla-1.9.2 and am now worried that this isn't the case.
Things are looking good, but I'd also like a confirmation from someone from the JS team or Shaver.
Comment 38•16 years ago
|
||
(In reply to comment #37)
> Are we sure that fixed it? While I thank Reed for jumping on top of things, I
> actually assumed that the patch was already built to work with mozilla-1.9.2
> and am now worried that this isn't the case.
Reed's fix is the right one, but never-compiled patches are never good.
| Assignee | ||
Comment 39•16 years ago
|
||
Yep, Reed's fix looks good. Thanks, Reed.
Updated•16 years ago
|
status1.9.1:
--- → unaffected
Whiteboard: fixed-in-tracemonkey → [sg:critical?] fixed-in-tracemonkey
Comment 40•16 years ago
|
||
Based on local backout this caused bug 540404 (which is a blocker at least to my
development work).
Updated•16 years ago
|
Flags: in-testsuite?
Comment 41•16 years ago
|
||
Who on this bug or with privs to view this bug was responsible for the description being posted here: http://ffextensionguru.wordpress.com/2010/01/17/firefox-3-6rc2-released/ ?
Comment 42•16 years ago
|
||
Ah, he got it out of the HG log, I suspect.
Updated•16 years ago
|
Group: core-security
Updated•13 years ago
|
Flags: in-testsuite?
You need to log in
before you can comment on or make changes to this bug.
Description
•