Closed Bug 966173 Opened 7 years ago Closed 3 years ago

"too much recursion" on Linux trying to file U.S. income tax in Firefox 28

Categories

(Core :: JavaScript Engine, defect)

Version: 28 Branch
Platform: x86 Linux
Type: defect
Priority: Not set
Severity: normal

Tracking

RESOLVED DUPLICATE of bug 1244280
Tracking Status
firefox27 + affected
firefox28 - affected
firefox29 - affected
firefox30 - affected
relnote-firefox --- -

People

(Reporter: jidanni, Unassigned)

Details

(Whiteboard: [bad-incentives])

Attachments

(2 files)

User Agent: Mozilla/5.0 (X11; Linux i686; rv:28.0) Gecko/20100101 Firefox/28.0 (Beta/Release)
Build ID: 20140125004003

Steps to reproduce:

Today is the first day of the US tax season.
And wouldn't you know it?
One cannot fill in the fields of the 1040 form on https://www.freefilefillableforms.com .
Chromium on the other hand works fine.
This affects millions of taxpayers.
Severity: normal → major
Priority: -- → P1
https://www.freefilefillableforms.com/#/fd/faqs says

System Requirements

    What are the recommended browsers for using Free File Fillable Forms?

    Free File Fillable Forms will run best using any of the following browsers:
        Google Chrome version 31: http://www.google.com/chrome
        Mozilla Firefox version 26: http://www.firefox.com/
        Internet Explorer version 11: http://windows.microsoft.com/ie
Severity: major → critical
Summary: https://www.freefilefillableforms.com cannot be used → Americans will not be able to file their taxes this year with Firefox
# update-flashplugin-nonfree --status
Flash Player version installed on this system  : 11.2.202.335
Flash Player version available on upstream site: 11.2.202.335
I made this "critical" thinking it would get somebody to look at it.
As that doesn't seem to have any effect, I have restored this to 'major'.
I have also alerted the IRS to this bug number so they might weigh in with their opinion.
Severity: critical → major
Component: Untriaged → Layout: Form Controls
Product: Firefox → Core
Hmm.  This stuff used to be Flash, but apparently they've changed their site around to be all HTML now.

In any case, I just tried and I can enter numbers in a 1040 on the site just fine in a current nightly and in Firefox 26 (and in Firefox 22, just for good measure).  This is on a Mac, though.

Dan, I assume you're clicking in the gray area to the left of the white bit that actually looks like a textbox?  Do you see it take focus but typing doesn't work, or does it not even take focus for you?
Severity: major → normal
Flags: needinfo?(jidanni)
Priority: P1 → --
Just tried Firefox 26 on Linux, and it works just fine as well as far as I can tell.

To make sure we're testing the same thing, what are the exact steps to reproduce this bug?
*** Please test with Firefox 28 as indicated above, not 26 ***
I cannot be responsible for older versions. Thank you.


   "vendor": "Mozilla",
    "name": "Firefox",
    "id": "{ec8030f7-c20a-464f-9b0e-13a3a9e97384}",
    "version": "28.0a2",
    "appBuildID": "20140203004003",
    "platformVersion": "28.0a2",
    "platformBuildID": "20140203004003",
    "os": "Linux",
    "xpcomabi": "x86-gcc3",
    "updateChannel": "aurora"

I used
# set https://www.freefilefillableforms.com/
# su - nobody -c 'HOME=/tmp /home/jidanni/tmp/firefox/firefox-bin '$@ &
to ensure the cleanest test environment possible.

Indeed you will find that the 1040 that you filled in with NON FF 28 browsers comes up AS IF IT WAS NEVER filled in, in FF28. And either way, you won't be able to fill anything in.
Flags: needinfo?(jidanni)
Summary: Americans will not be able to file their taxes this year with Firefox → Americans will not be able to file their taxes this year with Firefox Aurora
Dan, please note I said "current nightly" in comment 5, in addition to the release version I tested.

But just for good measure, I just tried an Aurora 28 build.  Specifically, a 20140203004003 build on Mac.  The site works fine in that build as well, in that I can type in the fields on the 1040.  I can also click the "Save" button, log out, log back in, and the numbers I typed are still there.

Still waiting for the equivalent Linux build to finish downloading.
Aha.  So on Linux, with Aurora 28, I do see the problem.  Presumably this error in the error console is relevant:

  too much recursion underscore.js:8

So the good news is that 28 is not shipped to users yet, so we can fix the regression.  ;)
Status: UNCONFIRMED → NEW
Component: Layout: Form Controls → JavaScript Engine
Ever confirmed: true
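For anyone trying to reproduce the console message outside the site: the sketch below is not taken from the site or from underscore.js. It simply recurses without a base case and reports what the engine throws. The function name `overflowErrorInfo` is made up for illustration. Firefox reports this condition as an InternalError with the message "too much recursion", while V8-based browsers such as Chromium throw a RangeError instead.

```javascript
// Hypothetical helper (not from the site): trigger the engine's
// stack-overflow error on purpose and report how it is surfaced.
function overflowErrorInfo() {
  // Unbounded recursion: every call adds a frame until the limit is hit.
  function recurse() {
    recurse();
  }
  try {
    recurse();
  } catch (e) {
    // Firefox: InternalError, "too much recursion".
    // V8-based engines: RangeError with a different message.
    return { name: e.name, message: String(e.message) };
  }
  return null;
}

var info = overflowErrorInfo();
console.log(info.name + ": " + info.message);
```

The error is catchable, so a page that swallows it can fail quietly, which would be consistent with fields that simply do not accept input.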
OK glad the problem got found.

One last thing to please check for me:
On the "STEP 2. E-File Your Tax Forms" tab,
"Complete one electronic 1099-R for every 1099-R you received. Click the
Add button for each 1099-R you received. . ."

Well it turns out no matter how much one can try,
that 1099-R form will not get into the "Print Return" button results,
and will thus never get sent to the IRS, and the user will get a
rejection email several days later. Tested even with Chromium.
Though even in older builds that let me type in the field I get too much recursion in formloader.js on load and it doesn't show the saved values.  So it looks like all that happened is the exact location where the too much recursion happens just moved a bit to be more obvious...
> One last thing to please check for me:

That's hard to do without an actual 1099-R, especially if you want me to try submitting it to the IRS.  ;)

So I just looked around, and bug 960523 is looking a bit similar.  Except on the freefilefillableforms page I'm getting errors back to Firefox 25...
(In reply to Boris Zbarsky [:bz] from comment #12)
> That's hard to do without an actual 1099-R
Just type a name into the 1040, SAVE, DONE WITH THIS FORM, etc.
then on the "STEP 2. E-File Your Tax Forms" tab, click 1099-R,
type a word into that 1099-R form, click SAVE, DONE WITH THIS FORM,
now at the top, click PRINT RETURN, voila, *no* 1099-R attached!
OK, so the first "too much recursion" error on Linux appears in this range:

http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=b197bed90a98&tochange=3d16d59c9317

Anything obvious in there that could use more stack?

Still working on getting more recent Linux nightlies to see what things are like on trunk.
Dan, just to check, this is a 64-bit build you're using?
BUG: the 32/64 information should be detected by bugzilla when the user enters the bug...
All I know is I use
# lshw
    description: Notebook
    product: 20150 (Type1Sku0)
    vendor: LENOVO
    version: Lenovo G580
    serial: 3019826502787
    width: 32 bits
I've created a test account for testing this, but after I get to https://www.freefilefillableforms.com/#/fd/home when clicking either one of the "Start 1040EZ", "Start 1040A" or "Start 1040" buttons, I get the Page Not Found Error.

Does anyone have any suggestions?
Flags: needinfo?(jidanni)
This seems prescient now: http://quotes.burntelectrons.org/4390
Summary: Americans will not be able to file their taxes this year with Firefox Aurora → "too much recursion" trying to file U.S. income tax in Firefox 28
Whiteboard: [bad-incentives]
The numbers vary a bit, but typically I see ~300k for nesting depth of "simple interpreter calls" on Mac, and ~80k for function.call().

On Linux, those numbers for me are closer to ~7-40k and ~500 (!).
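The attached testcase itself is not reproduced in this thread. As a rough sketch of how numbers like these could be measured (an assumption about the approach, not the actual attachment), one can count how many nested calls succeed before the engine throws, once with a direct call and once going through Function.prototype.call. The names `measureDirect` and `measureViaCall` are hypothetical.

```javascript
// Sketch only: count nested calls until the engine raises its
// stack-overflow error. Not the testcase attached to this bug.

// Direct recursion: recurse() calls recurse() by name.
function measureDirect() {
  var depth = 0;
  function recurse() {
    depth++;
    recurse();
  }
  try {
    recurse();
  } catch (e) {
    // Stack limit reached; depth holds the number of calls that started.
  }
  return depth;
}

// Same recursion, but every level goes through Function.prototype.call.
function measureViaCall() {
  var depth = 0;
  function recurse() {
    depth++;
    recurse.call(null);
  }
  try {
    recurse();
  } catch (e) {
    // Stack limit reached.
  }
  return depth;
}

var results = { direct: measureDirect(), viaCall: measureViaCall() };
console.log("simple interpreter calls: " + results.direct);
console.log("via function.call: " + results.viaCall);
```

The absolute values depend on the engine, the JIT tier reached, the build, and the platform stack size, so only comparisons taken on the same build are meaningful.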
More data.

Local bisect shows for me consistently that a build from rev ccd298a9db28 works while a build from rev ce43d28276e4 (bug 678037) fails.

On the testcase in attachment 8370190 [details], both builds show about the same behavior, though...
Blocks: LazyBytecode
Summary: "too much recursion" trying to file U.S. income tax in Firefox 28 → "too much recursion" on Linux trying to file U.S. income tax in Firefox 28
Also, the bisect was done with clobber builds.  With dep builds the result is all confused. For example, if I build rev ce43d28276e4 and then dep build ccd298a9db28 I get a failing build, but if I clobber build ccd298a9db28 I get a passing build?  Or something.
(In reply to Boris Zbarsky [:bz] from comment #20)
> The numbers vary a bit, but typically I see ~300k for nesting depth of
> "simple interpreter calls" on Mac, and ~80k for function.call().
> 
> On Linux, those numbers for me are closer to ~7-40k and ~500 (!).

You probably know this already but just to be sure: if you have a direct function call, the interpreter will push an "inline frame" and reuse the C++ Interpret activation, and the Baseline JIT will be able to optimize the call so that you stay in JIT code.

With fun.call() both the interpreter and Baseline go through a lot of C++ (fun_call, RunScript, etc) on each call, but at least for Baseline-compiled code we shouldn't enter that huge Interpret frame.

What numbers do you get on Linux with javascript.options.baselinejit.content = false?
(And once you reach Ion compilation, use count 1000, you get smaller stack frames and probably inlining a few levels deep. That's probably happening on OS X (hence the awesome numbers) but sadly not on Linux: we reach the stack limit before we enter Ion.)
> we reach the stack limit before we enter Ion.

Ah, that might explain the behavior I'm seeing, yes.  Nonlinearity of recursion depth with step depth, fun.

On Linux on the attached testcase, in an opt build from around when this regression appeared, I get 534 frames via function.call with baseline enabled and 345 with baseline disabled.

On a current (2014-01-31) nightly I get 362 frames with baseline enabled and 174 with baseline disabled.

Now I really wonder what our Windows builds do on this site and on this testcase...

But also, note that we went from working to not working with no change to the stack depth numbers back in June.
(In reply to Manuela Muntean [:Manuela] [QA] from comment #17)
> I've created a test account for testing this, but after I get to
> https://www.freefilefillableforms.com/#/fd/home when clicking either one of
> the "Start 1040EZ", "Start 1040A" or "Start 1040" buttons, I get the Page
> Not Found Error.
> 
> Does anyone have any suggestions?

Double check with Google Chromium to eliminate any short-term site issues.
Flags: needinfo?(jidanni)
Attached image e.png
All I know is that there is also no way to attach any of the forms circled there on the Step two page so that they show up in "Print Return", no matter how much any of the other buttons are pressed.
> Now I really wonder what our Windows builds do on this site and on this testcase...

On the site, it works in both a current nightly and Firefox 27 on Windows.  So this bug really seems Linux-specific, on this particular site.

On the testcase, Firefox 27 on Windows gives me:

  simple interpreter calls: 57063
  via function.call: 775

and the current nightly on Windows gives me:

  simple interpreter calls: 59457
  via function.call: 750

so a bit higher than on Linux, and that might happen to be enough to allow the script on this site to work....

I'm really not sure what we can do to fix this issue, other than increasing the default stack size on Linux on the main thread?
bz - do you want to try the stack size increase you mention in comment 28?  We've got one Beta (thursday) left to try speculative fixes, otherwise this will remain unfixed for 28 as well.
Flags: needinfo?(bzbarsky)
I have no idea how to do that and whether it would break too many other things.  :(  Benjamin, thoughts?
Flags: needinfo?(bzbarsky) → needinfo?(benjamin)
Are we talking about the native stack size or the JS boundary size or both? The primary effects of increasing the stack size are to make infinite-recursion crashes take longer before crashing and to increase the size of crash report minidumps. So in general I'd like to avoid increasing the stack size.

Is this all content JS and not our own code?
Flags: needinfo?(benjamin)
We're talking about the native stack size, specifically for our main UI thread.  Note that it would really only help this one specific site which seems to be close to the edge of fitting onto our existing stack.

> Is this all content JS and not our own code?

Yes.
I think we should WONTFIX this. Perhaps we should add telemetry for JS hitting the stack limits, though!
If we don't plan to fix this on our end somehow, we should make it tech evangelism, I guess...
It looks to me like this is no longer a blocker for release (comment 33) and that also we could put it in our release notes as a known issue at least throughout tax season.
The patch in bug 960523 should increase the max recursion depth here at least 10x.
Depends on: 960523
On the testcase attached by bz I get:

Firefox 28 (Beta)
simple interpreter calls: 54929
via function.call: 773

Nightly 30.0a1 (2014-03-16) - with bug 960523
simple interpreter calls: 63239
via function.call: 30388
Firefox 28:
<script>
    var f1 = function (length) {
      return new Array(length);
    };
    var f = function (s) {
      if (s === "1") {
        return f1(1);
      }
      return f1(0);
    };
    setInterval(function () {
      f("1").length;
      if (f("0").length !== 0) {
        alert("Hello, Mozilla!");
      }
    }, 1);
</script>
Open an HTML page containing the code above and wait about 10 seconds for the alert to appear.
That has nothing to do with this bug.  Please file a separate bug.
That looks like bug 989586 and (as Boris said) is unrelated to this bug. Before you file a new bug, please test it in Firefox 29 (it should be fixed there).
Results for Firefox 38 on CentOS7:
simple interpreter calls: 49999
via function.call: 189
Should be fixed with bug 1244280.
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → DUPLICATE
Duplicate of bug: 1244280