Closed Bug 818023 Opened 11 years ago Closed 11 years ago

IonMonkey: function.caller does not always work (and in interpreter and JM)

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla21
blocking-b2g tef+
Tracking Status
firefox18 + verified
firefox19 + verified
firefox20 + verified
firefox21 --- verified
b2g18 --- fixed
b2g18-v1.0.0 --- fixed

People

(Reporter: jandem, Assigned: nbp)

References

Details

Attachments

(1 file)

My patch to add support for GETPROP to the baseline compiler introduces a new jit-test failure. However, after modifying the testcase a bit I can reproduce the problem with IonMonkey.

The following test fails with --ion-eager and passes without Ion:
---
Function.prototype.callX = Function.prototype.call;
function f() {
    return f.caller;
}
function g() {
    return f.callX(null);
}
assertEq(g(), g);
---

$./js --ion-eager test.js
test.js:8:0 Error: Assertion failed: got function call() {
test.js:8:0     [native code]
test.js:8:0 }, expected function g() {
test.js:8:0     return f.callX(null);
test.js:8:0 }

The following test is a bit more complicated, it tries to confuse TI so that it's pretty similar to what the baseline compiler does:
---
Function.prototype.callX = Function.prototype.call;
var foo = {callX: function() { return 3; }};
function bar() {
    return f.caller;
}
function g() {
    return f.callX(null);
}
f = foo;
assertEq(g(), 3);
f = bar;
assertEq(g(), g);
assertEq(g(), g);
---

$ ./js --ion-eager test.js
test.js:13:0 Error: Assertion failed: got null, expected function g() {
test.js:13:0     return f.callX(null);
test.js:13:0 }
(In reply to Jan de Mooij [:jandem] from comment #0)
> My patch to add support for GETPROP to the baseline compiler introduces a
> new jit-test failure. However, after modifying the testcase a bit I can
> reproduce the problem with IonMonkey.
> 
> The following test fails with --ion-eager and passes without Ion:
> ---
> Function.prototype.callX = Function.prototype.call;
> function f() {
>     return f.caller;
> }
> function g() {
>     return f.callX(null);
> }
> assertEq(g(), g);
> ---
> 
> $./js --ion-eager test.js
> test.js:8:0 Error: Assertion failed: got function call() {
> test.js:8:0     [native code]
> test.js:8:0 }, expected function g() {
> test.js:8:0     return f.callX(null);
> test.js:8:0 }

It looks like the expected behavior from my point of view.  We are calling the call primitive, which is then calling the f function.  Still Ion is breaking this when we inline the call or the funapply functions.
(In reply to Nicolas B. Pierron [:pierron] [:nbp] from comment #1)
> (In reply to Jan de Mooij [:jandem] from comment #0)
> It looks like the expected behavior from my point of view.  We are calling
> the call primitive, which is then calling the f function.

You'd think that, and back in the day that would have been correct, when native functions got call stack entries (when there were slow natives, and these things were all slow natives).  Nowadays call and all those are fast natives -- natives, even, now -- so they shouldn't show up in stacks.  Or at least that's what I remember; someone else can point out if I'm wrong.
(In reply to Nicolas B. Pierron [:pierron] [:nbp] from comment #1)
> 
> It looks like the expected behavior from my point of view.  We are calling
> the call primitive, which is then calling the f function.

But that's not how it behaves when running in the interpreter or JM (and also not what V8 and probably other engines do). Right now, there are at least three possible results for fun.caller when running under Ion, depending on type information and what we inline:

(1) The scripted function, when inlining the function.call. This matches the rest of the engine + V8.

(2) The Function.prototype.call function if we use LCallKnown (call has a single target).

(3) null if we don't have good type information and use LCallGeneric.

I don't think we should regress this. It may not be common with Ion because it inlines |call| in most cases, but the baseline compiler always hits (3) and it breaks tests.
Blocks: 828319
Looks like there's a good chance that we did in fact regress this and that it breaks websites.  :(
(In reply to Boris Zbarsky (:bz) from comment #4)
> Looks like there's a good chance that we did in fact regress this and that
> it breaks websites.  :(

https://developer.mozilla.org/en-US/docs/JavaScript/Reference/Global_Objects/Function/caller
Assignee: general → nicolas.b.pierron
Status: NEW → ASSIGNED
This patch filters out native & self-hosted frames iterated by StackIter in fun_getProperty.

https://tbpl.mozilla.org/?tree=Try&rev=e1aa7ea64281
Attachment #700089 - Flags: review?(jwalden+bmo)
Comment on attachment 700089 [details] [diff] [review]
fun_getProperty: Use non-buitin script iterator.

Review of attachment 700089 [details] [diff] [review]:
-----------------------------------------------------------------

This matches up with my comment 2 understanding.
Attachment #700089 - Flags: review?(jwalden+bmo) → review+
(In reply to Boris Zbarsky (:bz) from comment #4)
> Looks like there's a good chance that we did in fact regress this and that
> it breaks websites.  :(

Do we have examples of any major web-site breakages here & if the patch has fixed them ?
(In reply to bhavana bajaj [:bajaj] from comment #8)
> (In reply to Boris Zbarsky (:bz) from comment #4)
> > Looks like there's a good chance that we did in fact regress this and that
> > it breaks websites.  :(
> 
> Do we have examples of any major web-site breakages here & if the patch has
> fixed them ?

Yes, on Bug 828319, the indexed forum suggest looking at http://adhoc.moo.com/ionmonkey-regression-test.html and to follow instructions.
Tracking this for release . We are willing to consider an uplift for a possible 18.0.1 if it happens , based on the risk profile/testing of this patch and considering the user feedback/input on any major website breakages due to this.
(In reply to bhavana bajaj [:bajaj] from comment #11)
> Tracking this for release . We are willing to consider an uplift for a
> possible 18.0.1 if it happens.

The uplift patch cannot use “NonBuiltinScriptFrameIter”, but it should use “ScriptFrameIter” as the non-builint version as been added with the addition of self-hosting, and AFAIK self-hosting was not yet present in the 18.
https://hg.mozilla.org/mozilla-central/rev/67f60ef5c92f
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla21
Backed out for causing bug 829512:
https://tbpl.mozilla.org/?tree=Mozilla-Inbound&jobname=Rev3 WINNT 6.1 mozilla-inbound debug test mochitest-1&tochange=67f60ef5c92f&fromchange=31d8d897cf4b

https://hg.mozilla.org/integration/mozilla-inbound/rev/b60130e11a59
Status: RESOLVED → REOPENED
Depends on: 829512
Resolution: FIXED → ---
Target Milestone: mozilla21 → ---
(In reply to bhavana bajaj [:bajaj] from comment #11)
> Tracking this for release . We are willing to consider an uplift for a
> possible 18.0.1 if it happens , based on the risk profile/testing of this
> patch and considering the user feedback/input on any major website breakages
> due to this.

+1 for release.

Well, yesterday, one of the first Firefox 18 customers/users of our app (a big JS app) just got errors. Anyway, this was already noticed [1] by some devs and Sencha itself (and linked back to this issue, actually). There is currently no solution except turning of IonMonkey or switching to another browser (including downgrading which won't fit with auto update). 

I guess because of the nature of JIT, several errors outside will be still hidden because tracking them down is not easy for the support level.

Slighty OT, but..
Because I'm not familiar with your build setup and integration: How can I/we as developers get a (nightly) build/release in which the patch is available (if it is ready to test)? Will a robot comment this bug with a build number? Or do you have an issues-fixed-by-this-release map? I do know where your nightlies are, but I don't know the state of integration of a specific issue and a specific build.

[1]: http://www.sencha.com/forum/showthread.php?253345
(In reply to knalli from comment #17)
> Slighty OT, but..
> Because I'm not familiar with your build setup and integration: How can I/we
> as developers get a (nightly) build/release in which the patch is available
> (if it is ready to test)?

The change has since been backed out for test regressions (it will be in today's Nightly, but not after that, once the backout merges in later today), however you can test a build with it at:

Win32:
http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-central-win32/1357912691/firefox-21.0a1.en-US.win32.zip

OSX:
http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-central-macosx64/1357912691/firefox-21.0a1.en-US.mac.dmg

Linux32:
http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-central-linux/1357912691/firefox-21.0a1.en-US.linux-i686.tar.bz2

Linux64:
http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-central-linux64/1357912691/firefox-21.0a1.en-US.linux-x86_64.tar.bz2

Jeff is looking into the test failure, once they are resolved, it will reland, bug will be marked fixed once it merge to mozilla-central & then the next Nightly after that will include the fix.
s/Jeff/Jeff & Nicolas/
I was looking in the hopes I'd see an obvious issue, so we wouldn't have to ship a fix in nightlies, then not ship it.  But I didn't see anything obvious (or if there is something, it's in Mochitest harness code), so I returned to my own things.  :-(  Hopefully Nicolas will find something on deeper examination than simply test-reading.
Fwiw, I've just triggered a half dozen more tests on the push where this landed and the one before, just to be sure.
I am still trying to reproduce it locally, but just by reading the test and the harness gives no clues where fun_getProperty might be called from.  Apparently the bug appeared on Bug 827490 and become more frequent with this patch.

By looking at the test case I cannot see any array mixed with slots, the only thing suspicious I can notice is:

  new Array(10000).join(" -huge WebSocket message- ")
(In reply to Nicolas B. Pierron [:pierron] [:nbp] from comment #22)
> I am still trying to reproduce it locally, but just by reading the test and
> the harness gives no clues where fun_getProperty might be called from. 
> Apparently the bug appeared on Bug 827490 and become more frequent with this
> patch.
> 
> By looking at the test case I cannot see any array mixed with slots, the
> only thing suspicious I can notice is:
> 
>   new Array(10000).join(" -huge WebSocket message- ")

Hum, this cannot be this line as this line appear in the 4th test, and the reported timeout is in the 3rd.
I don't understand how this modification might have increase in any way the likely hood of this failure.  The function fun_getProperty is never called during the execution of "content/base/test/test_websocket_basic.html".

I suspect this error might be related to an allocation issue caused by one of the previous tests.

I tried to run valgrind on this test case, but without success, apparently the --disable-jsmalloc does not seems to work as I still get failures in jemalloc.
Comment on attachment 700089 [details] [diff] [review]
fun_getProperty: Use non-buitin script iterator.

nbp in person requests some fuzzing on this. :)
Attachment #700089 - Flags: feedback?(gary)
Attachment #700089 - Flags: feedback?(choller)
I realize this isn't adding much information, but we've got a big ExtJS based application on a large site (alexa top 500). Since the release of Firefox 18, we've had floods of exceptions being reported that seem impossible. A function's .caller is suddenly NULL when it worked on previous identical invocations. I'm unable to produce a small test case of this since it's happening so intermittently, but wide spread enough that it's now our #1 reported bug.
Same as Kevin MFC here. We have a big ExtJS application and our software is now unusable with the latest firefox version. Please fix this in 18.0.1 !!!
(In reply to Kevin MFC from comment #27)
> I'm unable to produce a small test case of this since
> it's happening so intermittently, but wide spread enough that it's now our
> #1 reported bug.

Thanks for reporting that you want this feature fixed, the patch already exists (attached to this bug) but fails in our test suite with a reasons which still have to be determined.  Until we can find out an explanation and a fix for this other reason, the patch cannot be applied to any branch, and thus it cannot be back-ported to 18.

Believe me, we are working on it to reproduce and tackle this other failure such as this patch can make it for the next update.
(In reply to Nicolas B. Pierron [:pierron] [:nbp] from comment #29)
> Thanks for reporting that you want this feature fixed, the patch already
> exists (attached to this bug) but fails in our test suite with a reasons
> which still have to be determined.  Until we can find out an explanation and
> a fix for this other reason, the patch cannot be applied to any branch, and
> thus it cannot be back-ported to 18.
> 
> Believe me, we are working on it to reproduce and tackle this other failure
> such as this patch can make it for the next update.

Sorry, I probably phrased that badly. I was just trying to make sure it was clear how much this was affecting production sites. I know you guys are working on it. :)
(In reply to Nicolas B. Pierron [:pierron] [:nbp] from comment #22)
> Apparently the bug appeared on Bug 827490 and become more frequent with this
> patch.

I started a try run[1] on top the landed patch in inbound[2] and did the same as Ed to check if the bug still appeared when I revert Bug 827490 which is the previous patch[3] on which this patch[2] has landed on.

My conclusion is that Bug 827490 caused the original failures in an infrequent manner[3], then this Bug landed just on top[2] of it and made the same error appear extremely more frequently, which caused this patch to be backed out (comment 16).  Testing without Bug 827490 patch makes this error disappear on try[1], which should sustain the idea that this patch is not the source of the error reported in Bug 829512.  Thus we should back out Bug 827490, and land this patch again(*).

[1] https://tbpl.mozilla.org/?tree=Try&rev=685ef9795250
[2] https://tbpl.mozilla.org/?tree=Mozilla-Inbound&rev=67f60ef5c92f
[3] https://tbpl.mozilla.org/?tree=Mozilla-Inbound&rev=f4671ccc4502

(*) except if one of the remaining try runs reports the *same error* on test_websocket_basic from mochitest-1.
> My conclusion is that Bug 827490 caused the original failures in an
> infrequent manner[3], then this Bug landed just on top[2] of it and made the
> same error appear extremely more frequently, which caused this patch to be
> backed out (comment 16).  Testing without Bug 827490 patch makes this error
> disappear on try[1], which should sustain the idea that this patch is not
> the source of the error reported in Bug 829512.  Thus we should back out Bug
> 827490, and land this patch again(*).

bhackett patched bug 827490. Moreover the fuzzers found some regressions from that bug which just landed on inbound (but not on m-c as of time of writing this comment).

We could try relanding after those fixes make it to m-c and when the fuzzers report that everything is ok from this patch.
I've rebase this change to see if fixes related to Bug 827409 are fixing this perma-orange on try:

https://tbpl.mozilla.org/?tree=Try&rev=e4171782cde1

If it does, we should land this patch again and think about back-porting it.
(In reply to Nicolas B. Pierron [:pierron] [:nbp] from comment #33)
> I've rebase this change to see if fixes related to Bug 827409 are fixing
> this perma-orange on try:
> 
> https://tbpl.mozilla.org/?tree=Try&rev=e4171782cde1
> 
> If it does, we should land this patch again and think about back-porting it.

Apparently, this error is not fixed as https://tbpl.mozilla.org/?rev=c8bc9da133f5&tree=Mozilla-Inbound still have the same failure.
Comment on attachment 700089 [details] [diff] [review]
fun_getProperty: Use non-buitin script iterator.

jsfunfuzz didn't find anything particularly wrong with this patch after a few hours' of fuzzing.
Attachment #700089 - Flags: feedback?(gary) → feedback+
Comment on attachment 700089 [details] [diff] [review]
fun_getProperty: Use non-buitin script iterator.

No patch-specific problems found :)
Attachment #700089 - Flags: feedback?(choller) → feedback+
Would be great to have this fixed ASAP. Breaks our ExtJS application and lots of others. Firefox is the browser that we promote to use it (honestly - I'm not trolling here). I'll try the fixes listed in this thread first http://www.sencha.com/forum/showthread.php?253345-FF-18-problem before we have to make a temp switch to (gulp) another browser. Thanks.
Depends on: 830920
https://hg.mozilla.org/integration/mozilla-inbound/rev/0a10e0de8a48

This is the same patch but Bug 830290 landed before, so this should now be fine for this patch.
I don't expect any trouble for to backport this patch, otherwise we might need Bug 830290's patch (possibly on aurora, but unlikely on beta and release).
Status: REOPENED → ASSIGNED
Comment on attachment 700089 [details] [diff] [review]
fun_getProperty: Use non-buitin script iterator.

[Approval Request Comment]
Regression caused by (bug #): Bug 768446
User impact if declined: fun.caller does not work as expected compared to 
previous versions. Impact ExtJS users.
Testing completed (on m-c, etc.): https://hg.mozilla.org/integration/mozilla-inbound/rev/0a10e0de8a48 & decoder+ & gkw+
Risk to taking this patch (and alternatives if risky): None
String or UUID changes made by this patch: N/A
Attachment #700089 - Flags: approval-mozilla-release?
Attachment #700089 - Flags: approval-mozilla-beta?
Attachment #700089 - Flags: approval-mozilla-b2g18?
Attachment #700089 - Flags: approval-mozilla-aurora?
(In reply to Nicolas B. Pierron [:pierron] [:nbp] from comment #39)
> Comment on attachment 700089 [details] [diff] [review]
> fun_getProperty: Use non-buitin script iterator.
> 
> [Approval Request Comment]
> Regression caused by (bug #): Bug 768446
> User impact if declined: fun.caller does not work as expected compared to 
> previous versions. Impact ExtJS users.
> Testing completed (on m-c, etc.):
> https://hg.mozilla.org/integration/mozilla-inbound/rev/0a10e0de8a48 &
> decoder+ & gkw+
> Risk to taking this patch (and alternatives if risky): None
> String or UUID changes made by this patch: N/A

nbp : can you please explain a bit more on why would this be a zero risk patch & also ensuring that the patch has indeed fixed the issue.We are slightly worried about the urgent uplift needed here if we take this in 18.0.1 vs fallout regressions it may have because of the limited testing bake-time here.
(In reply to bhavana bajaj [:bajaj] from comment #40)
> (In reply to Nicolas B. Pierron [:pierron] [:nbp] from comment #39)
> > Comment on attachment 700089 [details] [diff] [review]
> > fun_getProperty: Use non-buitin script iterator.
> > 
> > [Approval Request Comment]
> > Regression caused by (bug #): Bug 768446
> > User impact if declined: fun.caller does not work as expected compared to 
> > previous versions. Impact ExtJS users.
> > Testing completed (on m-c, etc.):
> > https://hg.mozilla.org/integration/mozilla-inbound/rev/0a10e0de8a48 &
> > decoder+ & gkw+
> > Risk to taking this patch (and alternatives if risky): None
> > String or UUID changes made by this patch: N/A
> 
> nbp : can you please explain a bit more on why would this be a zero risk
> patch

This is a zero risk, because this restore what was the previous behavior expected for the script.  The only risk (if we can consider it as one) is that people are starting to get used to the bad behavior, which might be worse than fixing it.

> also ensuring that the patch has indeed fixed the issue.

This patch comes with its own test case, which was failing before this modification and was working in fx12, as well as after this modification.  So the test added here is effective and proved that the behavior is correct again.

> We are
> slightly worried about the urgent uplift needed here if we take this in
> 18.0.1 vs fallout regressions it may have because of the limited testing
> bake-time here.

I agree, but all person I asked are confident with this small change and even suggest me to back-port it before landing to inbound.
Jus(In reply to Nicolas B. Pierron [:pierron] [:nbp] from comment #12)
> (In reply to bhavana bajaj [:bajaj] from comment #11)
> > Tracking this for release . We are willing to consider an uplift for a
> > possible 18.0.1 if it happens.
> 
> The uplift patch cannot use “NonBuiltinScriptFrameIter”, but it should use
> “ScriptFrameIter” as the non-builint version as been added with the addition
> of self-hosting, and AFAIK self-hosting was not yet present in the 18.

Correction, the class in question has already made it to fx18, so the current patch should work in all branches.  All the implementation related to this class is located in js/src/vm/Stack.h (bottom of the file) and differ from StackIter (previous line in fun_getProperty) by the settle function which is skipping non-observable frames.
Comment on attachment 700089 [details] [diff] [review]
fun_getProperty: Use non-buitin script iterator.

Approving on all branches, considering we found a low risk for a bug breaking alexa websites & also will be helpful to all the users who have been reaching out about this issue.

Please land on mozilla-release/beta ASAP to make it into 18.0.1/19.0b2 .
Attachment #700089 - Flags: approval-mozilla-release?
Attachment #700089 - Flags: approval-mozilla-release+
Attachment #700089 - Flags: approval-mozilla-beta?
Attachment #700089 - Flags: approval-mozilla-beta+
Attachment #700089 - Flags: approval-mozilla-b2g18?
Attachment #700089 - Flags: approval-mozilla-b2g18+
Attachment #700089 - Flags: approval-mozilla-aurora?
Attachment #700089 - Flags: approval-mozilla-aurora+
Tracking it for blocking-b2g : tef+ as this patch was critical enough to go into 18.0.1 & hence would be good to land in b2g as well
blocking-b2g: --- → tef+
IonMonkey is disabled in the default b2g config for v1.
blocking-b2g: tef+ → -
(In reply to Chris Jones [:cjones] [:warhammer] from comment #46)
> IonMonkey is disabled in the default b2g config for v1.

And this also affect the Interpreter and JM.  If you have any doubt, you can try the test case added in this patch.
blocking-b2g: - → tef?
blocking-b2g: tef? → tef+
Summary: IonMonkey: function.caller does not always work → IonMonkey: function.caller does not always work (and in interpreter and JM)
https://hg.mozilla.org/mozilla-central/rev/0a10e0de8a48
Status: ASSIGNED → RESOLVED
Closed: 11 years ago11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla21
(In reply to Kevin MFC from comment #27)
> I realize this isn't adding much information, but we've got a big ExtJS
> based application on a large site (alexa top 500). Since the release of
> Firefox 18, we've had floods of exceptions being reported that seem
> impossible. A function's .caller is suddenly NULL when it worked on previous
> identical invocations. I'm unable to produce a small test case of this since
> it's happening so intermittently, but wide spread enough that it's now our
> #1 reported bug.

Hi, can you get our latest beta and help us verify the issue is resolved for you .You ccan get the beta from ftp://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/19.0b2-candidates/build1/ .Thanks !
(In reply to bhavana bajaj [:bajaj] from comment #50)
> Hi, can you get our latest beta and help us verify the issue is resolved for
> you .You ccan get the beta from
> ftp://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/19.0b2-candidates/
> build1/ .Thanks !


It does appear to be fixed in that build. I'll do some further testing to confirm, but my quick tests are working where they don't in the last release.
Can i get an update on when this fix will be released? We use ExtJS and we have received lot of reports from our customers that the application is unusable because of this issue. 

Thanks
(In reply to Srikant from comment #53)
> Can i get an update on when this fix will be released? We use ExtJS and we
> have received lot of reports from our customers that the application is
> unusable because of this issue. 
> 
> Thanks

Hello, our Fx 19.0b2 released today has a fix to this bug already(can you confirm that resolves your issue ? ) & we will be releasing a 18.0.1(targetting tomorrow) for our Release channel users with this fix.
I'd say, both 19b02 and 21.0a1 (2013-01-11) working as expected. Using a special test case and our affected (bigger) webapp w/ Ext JS, I can confirm that this issue was resolved.
Can someone confirm that this really did make it into 18.0.1? I don't see this bug listed here:

http://www.mozilla.org/en-US/firefox/18.0.1/releasenotes/buglist.html
Looks like it per comment 45 (note the mozilla-release changeset)
My test case no longer fails in 18.0.1 so I'd also say that it's fixed.
Depends on: 833414
I reproduced the problem in nightly 2012-11-17.
Verified fixed in FF 19b3 using the testcases in comment 0 and https://bugzilla.mozilla.org/show_bug.cgi?id=811577#c0.
Verified fixed in FF 18 based on comment 58.
Landed on mozilla-b2g18/gaia master prior to the 1/25 branching to mozilla-b2g18_v1_0_0/v1.0.0, updating status-b2g-v1.0.0 to fixed.
Verified fixed jsshell-linux-i686, jsshell-win32, jsshell-mac FF 20 RC.
Verified fixed jsshell-win32 FF 21b7
You need to log in before you can comment on or make changes to this bug.