Closed Bug 1537609 Opened 5 years ago Closed 5 years ago

Games on pogo.com are slow because they use an obfuscator that deliberately triggers stack overflow exceptions

Categories

(Core :: JavaScript: GC, defect, P1)

66 Branch
defect

Tracking

()

VERIFIED FIXED
mozilla68
Tracking Status
relnote-firefox --- 66+
firefox-esr60 --- unaffected
firefox66 blocking verified
firefox67 + verified
firefox68 + verified

People

(Reporter: u635799, Assigned: jorendorff)

References

(Blocks 1 open bug, Regression)

Details

(Keywords: regression)

Attachments

(2 files)

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:66.0) Gecko/20100101 Firefox/66.0

Steps to reproduce:

Cleaned out cashie,

Actual results:

I got new update for firefox and since pogo.com is slow and pause during play.
The pogo site since firefox update runs so slow. Please fix this problem, it was not there until the firefox update.

Expected results:

the games should run smooth and not pause and jump during play.

Status: UNCONFIRMED → NEW
Ever confirmed: true

Went to file a bug on this issue and found this waiting for us!

Flags: needinfo?(vchin)
Flags: needinfo?(luke)

A specific link that reproduces this is: https://www.pogo.com/games/lottso-express

A few initial findings:

  • We get like 4fps; Chrome runs totally smooth.
  • The profiler shows that we're doing multiple major GCs per frame.
  • Even before FF66, this appears to be the case, but the GCs got somehow more expensive in FF66
  • The games use some form of cheating prevention that, it looks like, inserts tons of debugger statement calls

So I think there are three questions to answer here:

  1. Is the debugger statement causing all the GCs?
  2. If so, why (note the console isn't even open) and, if not, what is?
  3. Why did the GCs get more expensive with FF66.
Component: Untriaged → JavaScript: GC
Flags: needinfo?(luke)
Product: Firefox → Core
Attached patch don't emit JSOP_DEBUGGER — — Splinter Review

This patch should disable any effects from debugger statements. I didn't do any profiling but I see similar intermittent choppiness with and without this patch.

Keywords: regression
Summary: Since the update when you go on a game site like pogo.com the games are slow and pause in the middle of play. → Slow games because of GC called triggered by debugger
Assignee: nobody → bhackett1024

From Looking at the forums at https://games-forum.pogo.com/search.php?sid=c40fe75c121c1f146a550caa3c054e80 we may want to call this a blocker for going to 100% updates on release. This many reports with 66 barely released is significant.

I'm not available to work on this, I was just wondering if the debugger statements were having a noticeable effect on performance.

Assignee: bhackett1024 → nobody

Can confirm that this affects all supported versions >=66.

INFO: Last good revision: 5ab46880d921f1c948902976e832e995cdaa8104
INFO: First bad revision: e4550e8b5a79edd04b29b77903412a94a949bbfc
INFO: Pushlog:
https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=5ab46880d921f1c948902976e832e995cdaa8104&tochange=e4550e8b5a79edd04b29b77903412a94a949bbfc

Blocks: 256180
Has Regression Range: --- → yes
Flags: needinfo?(hsivonen)

The change implicated by the regression range does two things:

  1. Bad HTML (primarily hundereds of <font> start tags without end tags) causes deeper DOM trees than before. This can affect JS perf if JS walks the tree. (Probably not what's happening here.)

  2. The run-time stack is deeper, so it's possible to run more deeply recursive code before getting stopped by the SpiderMonkey recursion limit, which, IIRC, changes to match the actual stack size setting on Windows, but now I fail to locate the relevant code on searchfox.

It would be worthwhile to check if the game relies on unlimited recursion getting limited by the JS engine.

Flags: needinfo?(hsivonen)

bhackett: thanks for doing that experiment. For more details, the developer says they're using https://obfuscator.io and they specifically observe the regression when they toggle the "debugProtectionInterval" option.

ISTR debugger statements used to deopt some stuff in the front end, regardless of whether we actually emit the JSOP_DEBUGGER bytecode.

(In reply to Henri Sivonen (:hsivonen) from comment #7)

[...] the SpiderMonkey recursion limit, which, IIRC, changes to match the actual stack size setting on Windows, but now I fail to locate the relevant code on searchfox.

https://searchfox.org/mozilla-central/rev/4eaf7c637a303e0f7adcf1ae45064b2d4bef9eb0/js/xpconnect/src/XPCJSContext.cpp#1144

Apparently we always generate arguments objects in functions that contain debugger statements. This alone could account for extra GC traffic, as well as general slowness, but still doesn't explain why it was ever OK.

Looking at https://github.com/javascript-obfuscator/javascript-obfuscator/blob/master/README.md it looks like "Debug Protection Interval" requires "Debug protection". If we have contact with the developer, what does the behavior looks like if "Debug protection" is on but "Debug Protection Interval" is off?

Still trying to figure out what the "Debug protection" option actually does by digging through the source, but based on https://github.com/javascript-obfuscator/javascript-obfuscator/blob/master/src/templates/debug-protection-nodes/debug-protection-function-interval-node/DebugProtectionFunctionIntervalTemplate.ts it adds a

  setInterval(debugProtectionFunctionName, 4000);

to ... where? How many of these interval timers end up live?

"debugProtectionFunctionName" is defined in https://github.com/javascript-obfuscator/javascript-obfuscator/blob/master/src/templates/debug-protection-nodes/debug-protection-function-node/DebugProtectionFunctionTemplate.ts which when the timer fires will presumaby call debugProtectionFunctionName with ret undefined, which will call debuggerProtection(0).

What that does I think depends on the "ObfuscationTarget.BrowserNoEval" option per https://github.com/javascript-obfuscator/javascript-obfuscator/blob/62168969786dbfed949012397cc06d0d9ce4c262/src/custom-nodes/debug-protection-nodes/DebugProtectionFunctionNode.ts#L63-L65. So either https://github.com/javascript-obfuscator/javascript-obfuscator/blob/master/src/templates/debug-protection-nodes/debug-protection-function-node/DebuggerTemplate.ts or https://github.com/javascript-obfuscator/javascript-obfuscator/blob/master/src/templates/debug-protection-nodes/debug-protection-function-node/DebuggerTemplateNoEval.ts.

The latter seems to boil down to either an infinite loop if counter is a string (not in this case), or a bunch of debugger calls. But note that the caller in "debuggerProtection" looks like this:

            function debuggerProtection (counter) {            
                {debuggerTemplate}
                debuggerProtection(++counter);
            }

and gets called with "0" as the arg. So yes, this will in fact do infinite recursion, with a debugger; at every level (in the no-eval case; in the eval case it does something more complicated like:

                (function () {return false;}.constructor('debu' + 'gger').apply('stateObject'));

or whatnot. Anyway, this will stop when the stack is exhausted. If we started allowing a much deeper recursion level, that could definitely significantly increase the time until the stack exhausts...

(I think I'm glad we don't implement tail recursion; seems to me that debuggerProtection is an infinite loop in a browser with tail recursion....)

That said, I just tried https://www.pogo.com/games/lottso-express and I do not see any intervals getting set for 4000ms... So either they're using a different version of that obfuscator code or something is up. :(

RyanVM says the difference is night-and-day on Windows. I can't be sure there's any regression at all on Mac.

If so, this is probably the stack size change. We can either back it out or add a counter-hack at the point where we set the JS engine stack size.

Filed bug 1538375 for debugger statements being slow.

I'm seeing lots of 5000ms setTimeouts.

I don't entirely understand why I'm seeing such a limited window of time in the profiler, but I'll ignore that.

I'm seeing a major GC per frame, and multiple minor GCs that are also very slow. The major GCs are around 200ms. The minors vary widely, but can get up to 90ms. They all claim to be triggered by GCBytes allocations, and the majors are clearing at least 50MB of junk per frame. I think the bulk of the time spent is in marking the stack, which I guess makes sense with a deeply nested stack. It also means the GC time is strongly influenced by the stack size change.

It looks like hsivonen's patch actually decreases the stack size on Windows, from 2MB to 1.5MB. So a counter-hack to keep the JS stack size unchanged is not an option. bz pointed out that that's not what the patch does; it actually changes the stack size from 2MB to 8MB, but the diff is confusing. The patch changes the meaning of the else line that appears not to have changed.

n/m you got it (and I can't delete a comment)

Ryan, could you grab a build from this try run and see if the problem is still present?

(That's Nightly plus a small patch, easy to backport to 66 and 67.)

Assignee: nobody → jorendorff
Flags: needinfo?(ryanvm)
Priority: -- → P1

In bug 256180, the size of the stack on 64-bit Windows was changed from 2MB to
8MB, and on 32-bit Windows, from 1MB to 1.5MB. This is so large that it takes
significantly longer for a runaway recursive function to throw "too much
recursion", which causes terrible performance in scripts obfuscated using
obfuscator.io.

This patch leaves the actual stack size as-is, but changes the
JS-engine-specific stack quota back to 2MB on 64-bit Windows (6MB if ASAN is
enabled). 32-bit Windows is unaffected by the new cap.

I suppose the hack that would speed this up the most is to detect the case where a function's entry point is postdominated by a recursive call to itself, and throw an over-recursed exception in place of the recursive call.

An even simpler version would only handle the subset that has no branches at all.

It would change behavior, by only running the body once instead of N times, but I assume there's spec language saying our stack can bottom out arbitrarily?

Probably not worth the effort, code, or slight bytecode compilation slowdown for every other javascript function in the universe.

Oh, dammit:

function normal(depth) {
    print(depth);
    if (depth == 7)
        normal = () => null;
    normal(depth + 1);
}

normal(0);
Flags: needinfo?(vchin)

This looks similar to bug 1475013, only that now the increased stack size made things even worse.

Hey, the Firefox Debugger is great for finding debugger statements. Here are a couple from the actual page linked in comment 2:

eval(" try { (function b () { debugger; b(); })(); } catch (e) { setTimeout(a, 5000); } "); 
        try {
          var P60 = 2;
          while (P60 !== 1) {
            switch (P60) {
              case 2:
                (function t90() {
                  var X60 = 2;
                  while (X60 !== 5) {
                    switch (X60) {
                      case 2:
                        debugger;
                        t90();
                        X60 = 5;
                        break;
                    }
                  }
                }()); P60 = 1; break;
              }
          }
        } catch (g90) {
        }

If someone wants to burn the world, they're going to find a way to do it. Obfuscators gonna 'scate...

This patch leaves the actual stack size as-is, but changes the
JS-engine-specific stack quota back to 2MB on 64-bit Windows (6MB if ASAN is
enabled).

Did this game ever work on Mac or Linux? Bug 256180 didn't change the stack size for them, and SpiderMonkey appears to allow an 8-megabyte stack on Linux.

Severity: normal → major

(In reply to Jason Orendorff [:jorendorff] from comment #21)

Ryan, could you grab a build from [this try
run](https://treeherder.mozilla.org/#/
jobs?repo=try&revision=55b83f8844a9d4098830702e0da8b48a44a98875&searchStr=B&s
electedJob=235586899) and see if the problem is still present?

(That's Nightly plus a small patch, easy to backport to 66 and 67.)

Yes, this is much better.

Flags: needinfo?(ryanvm)

(In reply to Henri Sivonen (:hsivonen) from comment #26)

Did this game ever work on Mac or Linux? Bug 256180 didn't change the stack size for them, and SpiderMonkey appears to allow an 8-megabyte stack on Linux.

I think the regression is Windows-only. I tried both versions on Mac and couldn't tell the difference.

Well, Jan has reviewed the patch, but now I'm getting cold feet, because of this review comment:

Does 3 MB or 4 MB still work? Asking because sometimes websites break because they run out of stack space in Firefox and not in other browsers... I was excited about the 8 MB change for that reason and it's a bit unfortunate to go back.

This is dumb. We shouldn't optimize for garbage code at the expense of normal code.

We have to figure out another way forward, though. Maybe patch the javascript-obfuscator package not to be so dumb? Hmm.

(In reply to Jason Orendorff [:jorendorff] from comment #29)

Does 3 MB or 4 MB still work? Asking because sometimes websites break because they run out of stack space in Firefox and not in other browsers... I was excited about the 8 MB change for that reason and it's a bit unfortunate to go back.

This is dumb. We shouldn't optimize for garbage code at the expense of normal code.

I agree, we should not penalize website which are not using this workaround.

We have to figure out another way forward, though. Maybe patch the javascript-obfuscator package not to be so dumb? Hmm.

This would not fix existing websites.

One idea, would be to prevent debugger statement to be executed after a specific stack depth, and throw a JSMSG giving our apology to web developers and and offering them an about:config flag to convert this JSMSG to a console warning.

Has STR: --- → yes
Summary: Slow games because of GC called triggered by debugger → pogo.com is slow because it uses an obfuscator that deliberately triggers stack overflow exceptions
Summary: pogo.com is slow because it uses an obfuscator that deliberately triggers stack overflow exceptions → Games on pogo.com are slow because they use an obfuscator that deliberately triggers stack overflow exceptions

OK. Well, I've gone back and forth on this 30 times now but it's no use fighting the web. We have to revert. I'm going to land the revert now, backport to 67 and 66, then file a follow-up bug to rationalize JS stack quota across platforms.

<Waldo> jorendorff: I have no problems fighting the web, but we should fight deliberately, not by stumbling into it

I think in the followup we should evaluate the allowed-depth behavior of other browsers and see how it compares to ours. Because there are various dimensions of compat here, and ideally we would hit all of them. If that means increasing stack sizes and fixing the "debugger;" case to not be super-painful in that situation, that would be great.

Note also that rationalizing JS stack quota across platforms used to be tough in terms of web-perceived behavior because the stack space the engine used (esp js::Interpret) was so vastly different across platforms. Maybe that's better now that we use the same compiler everywhere...

Pushed by jorendorff@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/79f99b532456
Cap the stack size at 2MB on Windows. r=jandem

<tcampbell> jorendorff: Win64 is much better than OSX for same machine on 65.0.2

Ted checked. This confirms my guess: performance on OS X was bad for this site all along, due to the large stack on OS X. The regression is Windows-only.

Comment on attachment 9053033 [details]
Bug 1537609 - Cap the stack size at 2MB on Windows. r?jandem

Beta/Release Uplift Approval Request

  • Feature/Bug causing the regression: Bug 256180
  • User impact if declined: Sites that use the javascript-obfuscator package will be slow in Firefox on Windows.
  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): This just puts the JS stack quota on Windows back where it was before bug 256180 landed.
  • String changes made/needed: none
  • Do you want to request approval of these patches as well?: on
Attachment #9053033 - Flags: approval-mozilla-release?
Attachment #9053033 - Flags: approval-mozilla-beta?

This applies cleanly to both branches.

QA Whiteboard: [qa-triaged]
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla68

Comment on attachment 9053033 [details]
Bug 1537609 - Cap the stack size at 2MB on Windows. r?jandem

Fix a blocking regression in 66, let's take it in the next beta 6 and check that this is fixed there and Liz can then decide if she takes it for the next 66 dot release (if we have one). Thanks

Attachment #9053033 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Build ID: 20190327175114
User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0

The issue still occurs on the latest Nightly build.

On the latest Beta build (67b6) the issue is fixed.

Comment on attachment 9053033 [details]
Bug 1537609 - Cap the stack size at 2MB on Windows. r?jandem

This has so many user complaints that I'd like to take the fix for 66.0.3.
Verified in beta and the patch should apply cleanly to m-r.

Attachment #9053033 - Flags: approval-mozilla-release? → approval-mozilla-release+

Adding to 66.0.3 release notes as "FIXED: Performance issues with some HTML5 games"

Blocks: 1542995

Build ID: 20190409155332
User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:66.0) Gecko/20100101 Firefox/66.0

Verified as fixed on the latest Release build (66.0.3)

Status: RESOLVED → VERIFIED
Flags: qe-verify+
Blocks: 1578350
See Also: → 1578220
Flags: needinfo?(softwebtuts)

Please specify a root cause for this bug. See :tmaity for more information.

Root Cause: --- → ?

The Grendel's mother of External Software Affecting Firefox bugs.

Root Cause: ? → External Software Affecting Firefox
No longer blocks: 256180
Regressed by: 256180
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: