Closed
Bug 1167883
Opened 9 years ago
Closed 9 years ago
Twitter.com reply field fails to open due to stack overflow
Categories
(Core :: JavaScript Engine, defect)
Tracking
()
RESOLVED
FIXED
mozilla42
People
(Reporter: kael, Assigned: jandem, NeedInfo)
References
()
Details
Attachments
(3 files)
82.53 KB,
image/png
|
Details | |
2.06 KB,
patch
|
nbp
:
review+
kglazko
:
approval-mozilla-aurora+
kglazko
:
approval-mozilla-beta+
|
Details | Diff | Splinter Review |
163 bytes,
text/html
|
Details |
For the past week (or more), the reply box on twitter.com has been failing to open around 90% of the time for me. Looking in the JS console shows that it's overflowing the stack. It doesn't happen 100% of the time, but it's quite consistent. I've tried repeatedly and been unable to reproduce this in x86 builds of FF Aurora (39.0a2 (2015-04-19) and 40.0a2 (2015-05-23) both seem fine); it only seems to happen in x64 builds. My guess is this is more of an evangelism issue than a JS engine one - they probably shipped some completely broken JS - but that x64 builds are worse off here because pointers are larger on the stack? My simple reproduction steps are to log in, visit the test URL, and click in the reply box. In an x64 build it almost always fails, and fails repeatedly. I've yet to get it to fail in an x86 build.
Reporter | ||
Comment 1•9 years ago
|
||
This page: https://twitter.com/antumbral/status/602307485416820736 Seems to consistently trigger the stack overflow right as it loads in x86-64 but not x86. You can tell it happened because the inline GIFs are broken.
Assignee | ||
Updated•9 years ago
|
Flags: needinfo?(jdemooij)
Assignee | ||
Comment 2•9 years ago
|
||
I can't repro this on OS X. Stack limits are very platform dependent though, will try it on Windows later.
Reporter | ||
Comment 3•9 years ago
|
||
Twitter pushed some code changes a little while back that fixed some other Firefox bugs (like image upload UI being totally busted), and this issue went away for a day or two. But now it's back again. :/
Reporter | ||
Comment 4•9 years ago
|
||
My repro URL doesn't appear to work anymore because they totally changed how that page is laid out (the other GIFs don't autoload). My other main repro step does still work but is less consistent (click the reply arrow when viewing a tweet page)
Reporter | ||
Comment 5•9 years ago
|
||
I'm getting similar stack overflow errors when trying to save changes to calendar events in Google Calendar now. Should I file a separate issue or should I expand this into a general issue about too much recursion issues in win64 Firefox?
Reporter | ||
Comment 6•9 years ago
|
||
Did some manual testing of the calendar issue, looks similar. Decided to write it up, and tested out a workaround: Open an existing google calendar event, change the start and/or end time, hit save Results: broken (silent failure, 'too much recursion' in console when clicking save) 40.0a2 (2015-06-08) x64 40.0a2 (2015-06-09) x64 works 40.0a2 (2015-06-09) Workaround: editbin.exe .\firefox.exe /STACK:4194304
Assignee | ||
Comment 7•9 years ago
|
||
Katelyn, thanks for the info. I can get you a Try build that logs the JS stack when we hit an overrecursion. That way we can see how many frames there are actually on the stack. I assume this also affects Nightly?
Reporter | ||
Comment 8•9 years ago
|
||
Haven't tried x64 nightly yet, I'll make a note to do that.
Assignee | ||
Comment 9•9 years ago
|
||
Here's an Aurora Win64 build that should log the JS stack to stderr when we overrecurse: https://treeherder.mozilla.org/#/jobs?repo=try&revision=51506e427bf4 http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/jandemooij@gmail.com-51506e427bf4/try-win64/ If you don't have time to try it I can also do it soon :)
Reporter | ||
Comment 10•9 years ago
|
||
That try build doesn't run out of stack >_< It also doesn't seem to log anything to stdout or stderr when I run it from cmd.exe, though.
Reporter | ||
Comment 11•9 years ago
|
||
Does this help at all? I'll see if I can dump a JS stacktrace from the debugger, but I remember it being prohibitively difficult...
Comment 12•9 years ago
|
||
If the try build is an opt build, you should be able to get commandline output by launching it with --attach-console from cmd.exe or --console otherwise.
Assignee | ||
Comment 13•9 years ago
|
||
(In reply to K. Gadd (:kael) from comment #10) > That try build doesn't run out of stack >_< Ugh, we probably need a PGO build, I'll trigger one. (In reply to K. Gadd (:kael) from comment #11) > Created attachment 8620744 [details] > Native stack trace during google calendar stack overflow > > Does this help at all? Hm that looks pretty worst-case: interpreter -> fun.apply -> bound function -> parser to delazify. Each of those eat stack space. Unfortunately we don't see the whole stack due to JIT code. > I'll see if I can dump a JS stacktrace from the debugger, but I remember it > being prohibitively difficult... You could try to call DumpJSStack() or js::DumpBacktrace(cx), but not sure offhand if they work in opt builds.
Assignee | ||
Comment 14•9 years ago
|
||
Win64 Aurora PGO build, this will easily take 4-5 hours: https://treeherder.mozilla.org/#/jobs?repo=try&revision=38a8b8a65b32
Assignee | ||
Comment 15•9 years ago
|
||
(In reply to Jan de Mooij [:jandem] from comment #14) > Win64 Aurora PGO build, this will easily take 4-5 hours: Bah, it hit a weird hang or timeout and the build took 5.5 hours :( I just retriggered it: the only difference with the previous push is that I enabled PGO so I doubt it's a problem with the patch.
Reporter | ||
Comment 16•9 years ago
|
||
How do I actually download that build? It shows as 'complete' but the ftp folder just has two huge .txt files in it.
Assignee | ||
Comment 17•9 years ago
|
||
(In reply to K. Gadd (:kael) from comment #16) > How do I actually download that build? It shows as 'complete' but the ftp > folder just has two huge .txt files in it. Well this one failed too, in a different way. I just did two more retriggers; two so that we get at least one different slave, in case it's something with that machine.
Reporter | ||
Comment 18•9 years ago
|
||
Did you ever get a try build for this? It's reproducing on Twitter again.
Reporter | ||
Comment 19•9 years ago
|
||
Is there a way I can do an instrumented PGO build myself? At present a handful of sites (ones important to me, unfortunately) are broken in x64 firefox, and a handful are broken in x86 firefox (address space exhaustion/compositor issues, which I also filed bugs about ages ago), so this kinda leaves me with no reasonable option other than switching over to another browser or finding some labyrinthine way of running both builds of FF side-by-side...
Assignee | ||
Comment 20•9 years ago
|
||
Sorry for the slow response due to the work week and PGO. Is this also broken with the new Aurora release, 41? I just updated my Aurora tree and triggered a new instrumented Win64 PGO build: https://treeherder.mozilla.org/#/jobs?repo=try&revision=ebd81d3a3722 If this fails again I'm going to open a RelEng bug; it's silly if we can't do PGO builds on Try.
Reporter | ||
Comment 21•9 years ago
|
||
It comes and goes intermittently but I still hit this pretty often on Twitter and Google Calendar. Once it starts happening I see it for at least a day or so (presumably until they push new code or the phases of the moon change)
Assignee | ||
Comment 22•9 years ago
|
||
This now also affects our tests, bug 1180948.
Assignee | ||
Comment 23•9 years ago
|
||
So I was finally able to reproduce this locally with my own build. When we hit the overrecursion error there aren't a lot of JS frames on the stack, but there are some bound function calls and those eat a lot of stack space. This is bug 1093158. What makes this even worse here is that the Invoke + RunScript stack frames we have for each bound function invocation eat ~6.2 KB stack space because PGO inlines a ton of stuff in them. I'm now doing another build with a patch to disable PGO in Invoke* and RunScript, let's see if that helps.
Depends on: 1093158
Assignee | ||
Comment 24•9 years ago
|
||
Two small fixes: (1) Keep MSVC from optimizing RunScript too much. PGO was inlining a lot of code in it (code to enter the JITs etc) and RunScript's stack frame became > 4 KB. (2) Use cx->currentScript instead of FrameIter in Invoke. On x64, FrameIter is > 1.5 KB (also bug 1181558), so Invoke's stack frame was > 2 KB.
Assignee: nobody → jdemooij
Status: NEW → ASSIGNED
Flags: needinfo?(jdemooij)
Attachment #8631020 -
Flags: review?(nicolas.b.pierron)
Comment 25•9 years ago
|
||
Comment on attachment 8631020 [details] [diff] [review] Patch Review of attachment 8631020 [details] [diff] [review]: ----------------------------------------------------------------- This sounds good. Out of curiosity, can you report the result returned by js/xpconnect/tests/chrome/test_bug732665_meta.js, before and after this patch. I feel that this might have a big impact on our recursion depth.
Attachment #8631020 -
Flags: review?(nicolas.b.pierron) → review+
Assignee | ||
Comment 26•9 years ago
|
||
> Out of curiosity, can you report the result returned by > js/xpconnect/tests/chrome/test_bug732665_meta.js, before and after this > patch. I feel that this might have a big impact on our recursion depth. Not really on this test because (1) the recursive calls are optimized by Baseline (and also the interpreter btw) so we don't go through Invoke/RunScript and (2) I ran it in the shell, which is very different from the browser build with PGO. Attached is a bound function call benchmark, for this I get the following max recursion depths before/after the patch: - normal : 88 -> 340 - no baseline: 47 -> 86 This is still not great, but bind is the worst case scenario (see bug 1093158) and hopefully this is good enough for now until we fix bind itself.
Comment 28•9 years ago
|
||
sorry had to back this out for test failures like https://treeherder.mozilla.org/logviewer.html#?job_id=11513958&repo=mozilla-inbound
Flags: needinfo?(jdemooij)
Comment 29•9 years ago
|
||
Backout: https://hg.mozilla.org/integration/mozilla-inbound/rev/e2558e2d3c72
Assignee | ||
Comment 30•9 years ago
|
||
(In reply to Carsten Book [:Tomcat] from comment #28) > sorry had to back this out for test failures like Hm this exposed a completely unrelated bug; will investigate.
Assignee | ||
Comment 32•9 years ago
|
||
Comment on attachment 8631020 [details] [diff] [review] Patch Requesting approval so we can reopen Aurora (see bug 1180948). Approval Request Comment [Feature/regressing bug #]: Hard to say; compiler issue. [User impact if declined]: Overrecursion errors. [Describe test coverage new/current, TreeHerder]: On inbound. [Risks and why]: Low risk; very small patch with some simple tweaks to avoid large stack frames. [String/UUID change made/needed]: None.
Flags: needinfo?(jdemooij)
Attachment #8631020 -
Flags: approval-mozilla-aurora?
https://hg.mozilla.org/mozilla-central/rev/7bdb396ae9e9
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
status-firefox42:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla42
Assignee | ||
Comment 34•9 years ago
|
||
Comment on attachment 8631020 [details] [diff] [review] Patch Actually let's land this on beta too - there has been some talk about our Win64 builds and this will fix some websites for people willing to try them.
Attachment #8631020 -
Flags: approval-mozilla-beta?
Comment 35•9 years ago
|
||
I went ahead and pushed this to Aurora ahead of the formal approval so we can get it reopened ASAP. https://hg.mozilla.org/releases/mozilla-aurora/rev/113007b56abf
status-firefox41:
--- → fixed
Updated•9 years ago
|
status-firefox40:
--- → affected
Comment 36•9 years ago
|
||
Comment on attachment 8631020 [details] [diff] [review] Patch Low risk, on inbound, already fixed in Aurora.
Attachment #8631020 -
Flags: approval-mozilla-beta?
Attachment #8631020 -
Flags: approval-mozilla-beta+
Attachment #8631020 -
Flags: approval-mozilla-aurora?
Attachment #8631020 -
Flags: approval-mozilla-aurora+
Comment 38•9 years ago
|
||
:kael, could you confirm the fix works with the latest Firefox 40 Beta 4 build: ftp://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/40.0b4-candidates/build1/?
Flags: needinfo?(kg)
You need to log in
before you can comment on or make changes to this bug.
Description
•